The Masakhane Playbook is a community-driven, openly licensed guide for building NLP datasets, models, and tools for African languages.

Mission

To make African languages first-class citizens of modern AI by providing practical, reproducible, and culturally grounded guidance for everyone involved in the dataset lifecycle — from data collection through annotation, modelling, evaluation, and responsible release.

Anchor institutions

The project is anchored at:

  • Bayero University, Kano (Nigeria) — coordinates Hausa and West African language workstreams
  • Bahir Dar University ICT4D Research Center (Ethiopia) — coordinates Amharic, Ethiopic-script, and East African language workstreams

Partner communities

Built in collaboration with grassroots networks across the continent:

  • Masakhane — pan-African NLP community
  • EthioNLP — Ethiopian NLP research and language coverage
  • HausaNLP — Hausa NLP research and bot-based collection
  • Lanfrica — discoverability and knowledge sharing
  • Black in AI — outreach and community amplification
  • Zindi Africa — competition-driven, incentivised annotation

What we're building

Two complementary public goods:

  1. The Playbook — this site. An end-to-end guide covering data collection, annotation design, quality assurance, modality-specific tasks, documentation, governance, evaluation, lifecycle management, and community collaboration. Translated into 6 languages.
  2. MasakhaneTool — an Apache 2.0, mobile-first, offline-capable annotation platform adapted for African contexts: low-bandwidth, multi-script, community-led workflows.

Both are open from day one — there is no closed version, no commercial fork.

How to contribute

We welcome contributions from researchers, practitioners, students, language experts, and translators. Pick whichever fits:

  • Write a chapter — fill a gap in the Playbook. See How to contribute a chapter.
  • Translate — adapt an existing chapter into Hausa, Amharic, Swahili, French, or Portuguese.
  • Review — open issues or PRs against existing chapters; correct technical errors, clarify language, suggest references.
  • Share a case study — write a short post for our blog about a real-world Masakhane project.

How to cite

If you reference the Playbook in research, teaching, or a project, please cite it. See the citation page for BibTeX, APA, MLA, Chicago, and the machine-readable CITATION.cff.

Code of conduct

We follow the Contributor Covenant. Be respectful — especially across language and cultural boundaries — and assume good faith. That's the whole point of the Playbook.

License

  • Site content & Playbook chapters — community-maintained, openly licensed (see the repository for full terms)
  • MasakhaneTool annotation platform — Apache 2.0
  • Datasets produced through the project — licensing handled per dataset; see the relevant chapter on documentation and release

Acknowledgments

The Playbook is the result of contributions from researchers, practitioners, students, language experts, and translators across the continent. A full contributors list is maintained in CITATION.cff and on the GitHub repository.