The Masakhane Playbook is a community-driven, openly licensed guide for building NLP datasets, models, and tools for African languages.
Mission
To make African languages first-class citizens of modern AI by providing practical, reproducible, and culturally grounded guidance for everyone involved in the dataset lifecycle — from data collection through annotation, modelling, evaluation, and responsible release.
Anchor institutions
The project is anchored at:
- Bayero University, Kano (Nigeria) — coordinates Hausa and West African language workstreams
- Bahir Dar University ICT4D Research Center (Ethiopia) — coordinates Amharic, Ethiopic-script, and East African language workstreams
Partner communities
The Playbook is built in collaboration with grassroots networks across the continent:
- Masakhane — pan-African NLP community
- EthioNLP — Ethiopian NLP research and language coverage
- HausaNLP — Hausa NLP research and bot-based collection
- Lanfrica — discoverability and knowledge sharing
- Black in AI — outreach and community amplification
- Zindi Africa — competition-driven, incentivised annotation
What we're building
Two complementary public goods:
- The Playbook — this site. An end-to-end guide covering data collection, annotation design, quality assurance, modality-specific tasks, documentation, governance, evaluation, lifecycle management, and community collaboration. Translated into 6 languages.
- MasakhaneTool — an Apache 2.0, mobile-first, offline-capable annotation platform adapted for African contexts: low-bandwidth, multi-script, community-led workflows.
Both are open from day one: there is no closed version and no commercial fork.
Get in touch
- 💬 Discord — join the community at discord.gg/ChNPHV2PPS
- 🗣️ GitHub Discussions — github.com/MasakhaneHubNLP/MasakhanePlaybook/discussions
- 🐛 Bugs / feature requests — GitHub Issues
- 📰 Newsletter — /newsletter (launching shortly)
How to contribute
We welcome contributions from researchers, practitioners, students, language experts, and translators. Pick whichever fits:
- Write a chapter — fill a gap in the Playbook. See How to contribute a chapter.
- Translate — adapt an existing chapter into Hausa, Amharic, Swahili, French, or Portuguese.
- Review — open issues or PRs against existing chapters; correct technical errors, clarify language, suggest references.
- Share a case study — a short post on our blog about a real-world Masakhane project.
How to cite
If you reference the Playbook in research, teaching, or a project, please cite it. See the citation page for BibTeX, APA, MLA, Chicago, and the machine-readable CITATION.cff.
Code of conduct
We follow the Contributor Covenant. Be respectful, especially across language and cultural boundaries, and assume good faith. Collaboration across those boundaries is the whole point of the Playbook.
License
- Site content & Playbook chapters — community-maintained, openly licensed (see the repository for full terms)
- MasakhaneTool annotation platform — Apache 2.0
- Datasets produced through the project — licensing handled per dataset; see the relevant chapter on documentation and release
Acknowledgments
The Playbook is the result of contributions from researchers, practitioners, students, language experts, and translators across the continent. A full contributors list is maintained in CITATION.cff and on the GitHub repository.