
Nadhari AI Lab

@NadhariAI
AI research lab advancing frontier AI research and applications in Sub-Saharan Africa (and beyond).





Introducing the Swahili Thinking Dataset. Excited to release the first open-source chain-of-thought reasoning dataset for Swahili. Following OpenAI's Harmony response format, the dataset comprises high-quality Swahili conversational AI responses along with their chain-of-thought. While such datasets exist for English, French, Spanish, etc., there were no publicly accessible high-quality reasoning datasets for African languages. Until now! This dataset enables researchers and developers to build Swahili language models with native reasoning capabilities, advancing AI for 200+ million Swahili speakers. Release announcement: nadhari.ai/swahili-thinki… Dataset: huggingface.co/datasets/Nadha… The dataset builds upon the excellent work on @huggingface H4's Multilingual-Thinking dataset. We intend to extend the dataset in the future, and we welcome further contributions.




O'Shaughnessy Ventures Backs 22 Innovators With $220,000 in Grant Funding prnewswire.com/news-releases/…



Introducing Gemma-3n-Swahili preview: Over the past two weeks I have been working on Swahili variants of Gemma-3n. The Gemma-3n models are multimodal and have a very efficient architecture that enables them to run locally on most devices, which is amazing. However, we found that while the models have a fundamental understanding of Swahili and can generate Swahili text, they at times make up non-existent words and fail on basic Swahili prompts. 🧵









