The Rise of AfricaNLP: 20 Years of Progress in One Paper
When you spend years working inside a research community, you accumulate a sense of how it's changing — but you rarely get to see the change quantified. A new survey we published on arXiv tries to do exactly that for African NLP: map two decades of contributions, contributors, and impact, end to end.
Read the paper: The Rise of AfricaNLP: A Survey of Contributions, Contributors, Community Impact, and Bibliometric Analysis
What we did
We assembled and analysed a dataset of 2.2K NLP papers, 4.9K contributing authors, and 7.8K human-annotated contribution sentences spanning 2005–2025. The goal was to examine how the field has grown along four dimensions:
- Publications and venues — where AfricaNLP work appears, and how that's shifted
- NLP topics and tasks — what's being worked on, and what's missing
- Contributors — authors, institutions, geographic distribution
- Funding and community structures — who funds what, who collaborates with whom
We also built a research explorer tool so the community can query the dataset and track emerging trends without re-doing the bibliometric work.
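For readers curious what the counting behind this kind of analysis looks like in practice, here is a minimal Python sketch. It is not the survey's pipeline or the explorer's API: the record fields (`id`, `year`, `languages`) and the toy records are assumptions made up for illustration, and the real dataset may be structured differently.

```python
from collections import Counter

# Toy records with a hypothetical schema; the survey's actual fields may differ.
papers = [
    {"id": "paper-a", "year": 2019, "languages": ["Hausa"]},
    {"id": "paper-b", "year": 2023, "languages": ["Amharic", "Hausa"]},
    {"id": "paper-c", "year": 2023, "languages": ["Swahili"]},
]


def papers_per_year(records):
    """Count papers by publication year."""
    return Counter(r["year"] for r in records)


def papers_per_language(records):
    """Count papers per language; a multilingual paper counts once per language."""
    return Counter(lang for r in records for lang in set(r["languages"]))


by_year = papers_per_year(papers)
by_language = papers_per_language(papers)

print(sorted(by_year.items()))      # yearly publication counts, oldest first
print(by_language.most_common())    # languages ordered by paper count
```

Counting a multilingual paper once per language is one possible convention; fractional counting (splitting a paper's weight across its languages) would give a different picture of the same data.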
Why it matters
A field's reputation is often built on a handful of headline projects. The reality is broader and messier — and the only way to see it is to count carefully.
A few things stood out to us in the data (the paper has the full picture):
- Growth in AfricaNLP publications over the last five years has been non-linear, but contributions remain unevenly distributed across both languages and institutions. Closing those gaps is the field's work for the next decade.
- Community-led collaborations like Masakhane, HausaNLP, EthioNLP, and others account for a meaningful share of the recent growth, especially for low-resource languages that don't have institutional NLP labs of their own.
- Annotation, dataset construction, and evaluation remain disproportionately under-cited relative to the engineering effort they require. That is a structural issue this survey hopes to help address.
Why this is on the Masakhane Playbook blog
Several of the survey's authors — including the lead authors and reviewers — are active in the Masakhane community. The work intersects directly with what the Playbook documents: how annotation, dataset, and benchmark contributions accumulate over time and shape what's possible.
If you're choosing what to work on next, or framing a grant or a thesis around African-language NLP, we'd encourage you to read the survey, query the explorer, and write back to us with what you find missing. The dataset is open; gaps are an invitation, not a complaint.
Read & contribute
- 📄 Paper: arxiv.org/abs/2509.25477
- 🤝 Discuss the findings: join #research in our Discord
- ✏️ Suggest corrections to the dataset: the explorer tool accepts community input; see the paper for details
The next two decades of AfricaNLP will look very different if we're deliberate about how we build infrastructure, fund work, and credit contribution. This survey is one snapshot — we'd like to keep taking them.
Cite the survey:
Belay, T. D., Hussen, K. Y., Imam, S. H., Ahmad, I. S., Inuwa-Dutse, I., Haile, A. B., Sidorov, G., Vazquez, E. R., Ameer, I., Abdulmumin, I., Gwadabe, T., Marivate, V., Yimam, S. M., & Muhammad, S. H. (2026). The Rise of AfricaNLP: A Survey of Contributions, Contributors, Community Impact, and Bibliometric Analysis. arXiv:2509.25477.
