Voice of Bukusu (Lubukusu)

Explore the sounds of Bukusu, a language widely spoken in Western Kenya, especially in Bungoma and Trans‑Nzoia, with connections to related speech across the Kenya–Uganda border.

Overview

Bukusu is a Bantu language spoken primarily by the Bukusu people, a sub-group of the Luhya ethnic community in western Kenya. It has over 2.5 million speakers in Kenya and Uganda and is closely related to the Bagisu and Masaba varieties spoken in Uganda. The Bukusu trace their roots to Central Africa and are believed to have migrated from the Congo Basin during the Bantu migrations. Bukusu society is organized by clan, with strong age-set traditions and communal rites of passage. Culturally, the Bukusu are known for initiation ceremonies, rich oral traditions, and traditional governance through elders.

Sample Audio

Transcript: Habari, unaendeleaje? (Hello, how are you doing?)

Writing system

The Bukusu writing system uses the Latin alphabet and follows a phonemic approach, meaning that words are spelled the way they sound. In written Bukusu, letter combinations such as digraphs, and the use of “n” before certain consonants, represent common sounds and clusters found in everyday speech. To show vowel length, especially when it affects meaning or rhythm, some writers double the vowel (e.g., “aa”, “oo”). Although Bukusu is a tonal language, tone markings are usually left out in everyday writing and typically appear only in academic or linguistic texts. Most community publications, such as folktales, notices, and educational materials, aim for a simple, tone-free style that is easy to read and widely accessible.
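For anyone preparing transcripts for the datasets, the conventions described in this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the digraph list, the helper names `strip_tone_marks` and `segment`, and the sample strings are hypothetical and are not taken from an official Bukusu orthography. The sketch removes tone diacritics to get the everyday tone-free spelling, then splits a word into doubled vowels, digraphs, “n” plus consonant clusters, and single letters.

```python
import unicodedata

# Assumed inventory, for illustration only (not an official orthography).
VOWELS = "aeiou"
DIGRAPHS = ("kh", "ch", "sh", "ny", "ng")


def strip_tone_marks(text):
    """Drop combining diacritics so the text matches tone-free everyday spelling."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def segment(word):
    """Greedily split a word into orthographic units, two letters at a time."""
    word = strip_tone_marks(word).lower()
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        is_pair = len(pair) == 2
        if is_pair and pair[0] in VOWELS and pair[0] == pair[1]:
            units.append(pair)  # doubled vowel, read as a long vowel
            i += 2
        elif pair in DIGRAPHS:
            units.append(pair)  # digraph from the assumed list
            i += 2
        elif is_pair and pair[0] == "n" and pair[1].isalpha() and pair[1] not in VOWELS:
            units.append(pair)  # "n" written before a consonant
            i += 2
        else:
            units.append(word[i])  # any other single character
            i += 1
    return units


if __name__ == "__main__":
    # Sample strings are illustrative spellings, not verified vocabulary.
    for sample in ("omundu", "baana", "khaanda"):
        print(sample, segment(sample))
```

A greedy two-letter lookahead keeps the sketch short; a real tool would take its unit inventory from a community-reviewed orthography guide rather than from the assumed list above.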

What’s Here Now

Urban Dialogue

Transcript: Habari, unaendeleaje? (Hello, how are you doing?)

Market Talk

Transcript: Habari, unaendeleaje? (Hello, how are you doing?)

Community Radio

Transcript: Habari, unaendeleaje? (Hello, how are you doing?)

Why It Matters for AI

Bukusu is under‑represented online; adding clear, community‑reviewed audio can help build tools for local broadcasting, education, and assistive technologies.

Speech Recognition

Speech recognition datasets teach AI systems to accurately understand and transcribe African languages. By training models on diverse accents and tones, we make voice technology more inclusive and effective for real-world communication.

Translation

Access high-quality Bukusu translation datasets featuring paired text and voice samples. These resources support language research, model training, and cultural preservation. Sign in to request access or contribute your own translations.

Information Access

Information access datasets help AI systems bridge the language gap, making online knowledge, education, and public information available in African languages. They promote digital inclusion and empower communities through localized, AI-driven access to information.

Bukusu Datasets

Bukusu Corpus v1.0

Version: 1.0
Size: 2GB
License: CC-BY 4.0
DOI: 10.1234/swahili.001

Our platform digitally preserves Africa’s rich linguistic diversity by collecting audio, text, and community contributions to build a comprehensive database for research, learning, and AI model training.

Collaborators

Contact us if you are interested in collaborating.

© 2025 All Rights Reserved.

Request Access

Request access to the Bukusu language datasets. Sign in to view and download curated audio and text resources for AI research, language preservation, and educational purposes.

Contribute Data

Contribute your recordings and transcripts to help preserve the Bukusu language. Submit audio, text, and consent forms to support AI research, education, and cultural preservation.
