Explore and listen to the sounds of Kiswahili, a major lingua franca of East and Central Africa.
Kiswahili is a major lingua franca of East Africa, used in schools, workplaces, the media (newspapers, radio, TV) and popular culture. In 2022, it was adopted as an official working language of the African Union.
Transcript: Habari, unaendeleaje? (Hello, how are you doing?)
Modern Swahili is written in the Latin alphabet and uses straightforward, phonemic spelling where most letters map closely to the sounds they represent. Common digraphs like ch, sh, ng, ng’ and ny capture sounds that don’t have single-letter equivalents. Historical texts also appear in Ajami (Arabic script), but contemporary education, media, and government overwhelmingly use the Roman script. Swahili does not mark tone, and long vowels are usually not indicated unless a teaching text or dictionary needs to highlight pronunciation.
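As a rough illustration of the spelling conventions described above, the short Python sketch below splits a word into spelling units and keeps the listed digraphs (ch, sh, ng, ng’, ny) together. The function name `split_graphemes` and the `DIGRAPHS` list are hypothetical, chosen only for this example. The greedy matching is a simplification: it does not know about morpheme boundaries or about how a given "ng" is actually pronounced.

```python
# Digraphs named in the paragraph above. The longer "ng'" forms come first
# so they are matched before plain "ng". Both the straight and the
# typographic apostrophe are accepted (an assumption for this sketch).
DIGRAPHS = ("ng'", "ng’", "ch", "sh", "ny", "ng")


def split_graphemes(word: str) -> list[str]:
    """Split a word into spelling units, keeping digraphs together."""
    lowered = word.lower()
    units = []
    i = 0
    while i < len(lowered):
        for digraph in DIGRAPHS:
            if lowered.startswith(digraph, i):
                units.append(digraph)
                i += len(digraph)
                break
        else:
            # No digraph starts here, so take a single letter.
            units.append(lowered[i])
            i += 1
    return units


print(split_graphemes("ng'ombe"))  # ["ng'", 'o', 'm', 'b', 'e']
print(split_graphemes("chakula"))  # ['ch', 'a', 'k', 'u', 'l', 'a']
```

A splitter along these lines could be a first step when aligning transcripts with audio, although a real pipeline would likely need a pronunciation lexicon rather than spelling rules alone.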
Swahili is under‑represented online; adding clear, community‑reviewed audio can help build tools for local broadcasting, education, and assistive technologies.
Speech recognition datasets teach AI systems to accurately understand and transcribe African languages. By training models on diverse accents and tones, we make voice technology more inclusive and effective for real-world communication.
Access high-quality Swahili translation datasets featuring paired text and voice samples. These resources support language research, model training, and cultural preservation. Sign in to request access or contribute your own translations.
Information access datasets help AI systems bridge the language gap, making online knowledge, education, and public information available in African languages. They promote digital inclusion and empower communities through localized, AI-driven access to information.
Version: 1.0
Size: 2GB
License: CC-BY 4.0
DOI: 10.1234/swahili.001

Our platform digitally preserves Africa’s rich linguistic diversity by collecting audio, text, and community contributions to build a comprehensive database for research, learning, and AI model training.
Contact us if you are interested in collaborating.
© 2025 All Rights Reserved.
Request access to the Ogiek language datasets. Sign in to view and download curated audio and text resources for AI research, language preservation, and educational purposes.
Contribute your recordings and transcripts to help preserve the Ogiek language. Submit audio, text, and consent forms to support AI research, education, and cultural preservation.