1. Orange, OpenAI & Meta Join the Language Inclusion Race

In November 2024, Reuters reported that Orange would collaborate with OpenAI and Meta to adapt large language models (LLMs) to support regional African languages. The initiative plans to build on models like OpenAI’s Whisper (speech recognition) and Meta’s Llama (text generation), starting with West African languages such as Wolof and Pulaar.
More recently, in August 2025, Reuters reaffirmed this work, stating that Orange intends to make its custom models freely available to governments and public institutions. Steve Jarrett, Orange’s Chief AI Officer, described it as a blueprint for digital inclusion, “collaborating with local startups and communities” so that African languages become first-class citizens in AI.
Earlier, Orange had struck deals to access pre-release AI models from OpenAI, and committed data samples (in Wolof and Pulaar) to train localized versions of Whisper and Llama. These developments matter not just to tech enthusiasts, but to millions of Africans whose languages have long been excluded from the digital conversation.
2. Why Language Justice Matters in Africa
- Over 2,000 African languages exist across the continent, many spoken by millions yet excluded from AI systems.
- Language is identity, cognition, and culture. If AI tools don’t understand local idioms or dialects, they misinterpret meaning, reduce trust, and exclude users.
- Many public services, such as health, agriculture, and legal aid, are delivered via digital interfaces. If they ignore the language needs of rural or marginalized groups, those populations lose access.
- Language exclusion replicates colonial legacies: English, French, Arabic, and other colonial languages dominate global tech. Justice demands an inversion: technology that centers local tongues.
Orange’s move is notable because it’s one of only a handful of large-scale efforts by a telecom company to localize AI beyond token translation.
3. Technical and Ethical Challenges
3.1 Data Scarcity & Quality
Many African languages lack large corpora: text, audio, annotated datasets. Training robust models demands volume, diversity, and quality.
For example, a proof-of-concept Wolof voicebot jointly built by Orange Senegal and partners achieved approximately 22% word error rate (WER) on ASR, and 78% F1 on NLU tasks—encouraging, but not production-grade.
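For readers unfamiliar with the metric, word error rate is simply the word-level edit distance between a reference transcript and the system output, divided by the reference length. A minimal sketch (not the paper's evaluation code; the example strings are illustrative, not drawn from the dataset):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[-1][-1] / len(ref)

# 1 substitution + 1 deletion over 4 reference words -> WER 0.5
print(wer("salaam aleekum nanga def", "salaam malekum def"))
```

A 22% WER means roughly one word in five is wrong, which is why the authors call the result encouraging but not production-grade.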
3.2 Dialects, Code-Switching & Phonetics
Even within a language like Wolof or Pulaar, regional dialects, loanwords, and code-switching (mixing local language + English/French) present challenges. Modeling must account for this variation rather than assuming linguistic uniformity.
3.3 Ownership, Consent & Governance
Who collects voice and text samples? Under what terms? Are speakers compensated or credited?
Past data extraction practices have often sidelined community participation or benefit, repeating patterns of digital colonialism.
3.4 Transparency & Auditability
When systems translate or transcribe, users should know how decisions were made: which model weights, training datasets, and error rates are behind them. Such transparency is rare in proprietary AI.
3.5 Deployment, Hosting & Compute
Will the models be served locally (on-device) or via cloud? Local hosting supports sovereignty, data privacy, and lower latency; but demands local infrastructure, energy, and compute capacity.
4. Supporting Research & Models in African Languages

- Preuve de concept d’un bot vocal dialoguant en Wolof (proof of concept of a Wolof-speaking voicebot): voice assistant built in Senegal using ASR + NLU, with results noted above.
- Cheetah: Natural Language Generation for 517 African Languages: demonstrates that large-scale multilingual generation is possible with optimized techniques.
- AfroDigits: A Community-Driven Spoken Digit Dataset: community-sourced speech dataset for 38 African languages—targeting minimal tasks like digit recognition but foundational for future speech models.
- The African Languages Lab (All Lab) (2025): collects multi-modal text/speech corpora across approximately 40 languages, aiming to close resource gaps.
These efforts illustrate both the promise and the resource gap: models can be built, but only with sustained investment, open governance, and local capacity.
5. Reconciling the Promise & the Peril: Risks of Poorly Designed Localizations
- Algorithmic homogenization: Reducing complexity to simple translation might erase cultural nuance.
- Overfitting & brittleness: Models might work for test data but fail in real-world, noisy settings.
- Data sovereignty erosion: If Orange or OpenAI exerts too much control over localized models, communities may lose agency in language representation.
- Maintenance burden: Models drift over time (new slang, new usage). Without ongoing funding or community maintenance, they degrade.
6. Policy & Institutional Imperatives for Africa
6.1 National AI Strategies Must Prioritize Language Inclusion
Design documents, regulatory frameworks, and funding budgets should explicitly require local languages in all AI systems deployed by governments.
6.2 Open Licensing & Auditable Models
Localized models should carry open or permissive licenses (e.g., Apache, MIT) and come with transparency reports, model cards, and evaluation metrics by language.
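A per-language model card can be as lightweight as a structured file published alongside each release. The sketch below uses a hypothetical model name and illustrative values (none of these figures or fields come from Orange's actual releases):

```python
import json

# Hypothetical model card for a localized ASR model; all values illustrative.
model_card = {
    "model": "whisper-wolof-v0",              # assumed name, not a real release
    "license": "Apache-2.0",
    "training_data": ["community-recorded Wolof speech (with documented consent)"],
    "evaluation": {
        # per-language metrics, as the section recommends
        "wolof": {"wer": 0.22, "test_set": "held-out broadcast and phone speech"},
    },
    "known_limitations": [
        "degrades on heavy Wolof-French code-switching",
        "urban dialect over-represented in training data",
    ],
}

# Publish the card as JSON next to the model weights
print(json.dumps(model_card, indent=2))
```

The point is not the exact schema but that license, data provenance, per-language metrics, and known limitations are disclosed in a machine-readable, auditable form.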
6.3 Community-Driven Data Collection
Set up language labs in universities, communities, and civic tech groups to collect, validate, and annotate local language data with consent and attribution.
6.4 Regional Collaboration & Standards
Bodies like Smart Africa, the African Union, and African Languages Research Institutes should coordinate interoperability, standards, and resource sharing.
6.5 Funding & Capacity Building
Governments, donors, and private sector must invest not just in model building, but in computing infrastructure, training programs, and maintenance ecosystems.
7. You Might Be Wondering: What Does This Mean for Governance?
1. Data Governance: Ownership, Consent, and Localization
When companies like Orange, Meta, and OpenAI collect voice or text data in African languages, they shape the future of those languages in AI.
Governance implication: Who owns this data?
Without clear data governance laws, there’s a risk of digital colonialism, where local voices train global models but communities have no control or benefit.
Policy entry points:
- Governments can require data sovereignty clauses in any AI or telco partnership.
- Mandate community consent and fair data sharing agreements before collection.
- Require data collected locally to be stored on national servers under local jurisdiction.
2. National AI Strategies and Language Inclusion
Most African countries’ AI policies (if they exist) focus on innovation, startups, or economic growth — not inclusion.
Language justice makes a strong case for embedding equity and cultural representation in AI strategies.
Policy entry points:
- Explicitly include language inclusion goals in national AI strategies.
- Require public-sector AI tools (e.g., e-governance, education, health apps) to support local languages.
- Fund National Language AI Programs that build open datasets and models for major local languages.
3. Ethical and Transparent Partnerships
Telcos and Big Tech partnerships (like Orange–OpenAI) should not operate in a policy vacuum.
Without transparency mandates, governments and citizens won’t know:
- How models are trained
- Which data is used
- What biases or limitations exist
Policy entry points:
- Introduce AI transparency laws requiring model documentation (datasets, metrics, error rates).
- Establish independent AI ethics review boards that oversee such partnerships.
- Encourage public–private–community agreements that give universities and civic tech groups oversight roles.
4. Education, Research, and Capacity Building
Most AI research for African languages is happening outside Africa; governments can catalyze homegrown innovation.
Policy entry points:
- Fund language technology labs at universities.
- Offer grants for NLP research in underrepresented languages.
- Build national AI fellowships for researchers and linguists.
5. Regional Collaboration and Standards
Language data often crosses borders; Hausa, Swahili, Yoruba, Wolof are regional, not just national.
Policy entry points:
- The African Union (AU) or Smart Africa Alliance can set continental standards for language data collection, licensing, and model evaluation.
- Promote interoperable datasets and open repositories that serve multiple countries.
6. Civic Participation and Accountability
AI governance shouldn’t be top-down. People whose languages are being digitized deserve a say.
Policy entry points:
- Mandate public consultations before adopting national AI frameworks.
- Create community councils to advise on ethical data use and inclusivity.
- Protect freedom of expression and digital rights in all AI systems that process human speech.
Long Story Short:
Language justice connects directly to governance because language is political power; it defines who gets to participate in the digital state.
If governments don’t step in now, the governance of African languages in AI will be decided by corporations.
But if they act strategically, they can make Africa a leader in ethical, inclusive, multilingual AI governance.
8. Recommendations

- Launch National Language AI Funds to subsidize corpora creation, baseline models, and community annotation efforts.
- Require Explainability & Error Disclosure by default in any AI model deployed in local language settings.
- Mandate Inclusive Testing across gender, dialect, and rural/urban users during model evaluation.
- Fund Local NLP Labs in universities in language regions (e.g., Yoruba, Igbo, Hausa, Fulfulde, Wolof).
- Open Public Audits of localized models—hold regular third-party reviews of performance drift, bias, and errors.
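Inclusive testing means reporting metrics per subgroup rather than a single aggregate number. A minimal sketch, assuming each evaluation record is tagged with a subgroup label (the dialect labels and records here are hypothetical):

```python
from collections import defaultdict

def error_rate_by_group(records, group_key):
    """Compute per-subgroup error rates from tagged evaluation records."""
    totals, errors = defaultdict(int), defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if not rec["correct"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Illustrative records; in practice these come from a labeled eval set.
records = [
    {"dialect": "urban", "correct": True},
    {"dialect": "urban", "correct": True},
    {"dialect": "urban", "correct": False},
    {"dialect": "rural", "correct": False},
    {"dialect": "rural", "correct": False},
    {"dialect": "rural", "correct": True},
]
print(error_rate_by_group(records, "dialect"))
# A large gap between groups (here 1/3 vs 2/3) signals the model
# under-serves one population, even if the aggregate number looks fine.
```

The same disaggregation applies to gender and rural/urban splits, and gives third-party auditors a concrete artifact to review for performance drift.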
9. Conclusion: Making Voices Heard
Language justice is more than translation; it’s representation, dignity, and power in the AI era.
Orange’s work with OpenAI and Meta is a promising step, but unless African communities hold the reins, we risk repeating old models of extraction and exclusion.
The future of AI in Africa must speak with our tongues, not about us.
To write that future, we must invest in data, governance, and equity, and ensure every voice is heard.
References & Further Reading
- Reuters: “Orange enlists Meta and OpenAI to develop AI language models in Africa” (Nov 2024)
- Reuters: “Orange to use OpenAI’s latest models to work with African languages” (Aug 2025)
- arXiv: “Preuve de concept d’un bot vocal dialoguant en Wolof”
- arXiv: “Cheetah: Natural Language Generation for 517 African Languages”
- arXiv: “AfroDigits: A Community-Driven Spoken Digit Dataset”
- arXiv: “The African Languages Lab”
- Connecting Africa: “Orange expands OpenAI partnership adding African language focus”



