
CognitiveLab
@cognitivelab_ai
Democratizing Generative AI


🚀 Thrilled to announce that CognitiveLab has won a six-figure Llama Impact Grant from @Meta! 🇮🇳 As India's only recipient, we're powering Nayana, a multimodal, multilingual, multi-task AI model family! #AI #MultilingualAI #Innovation










In my opinion, it's going to be hard to adapt Llama 3 for Indic languages. Here are a few reasons why:
👉🏼 The tokenizer is tiktoken-based, and it does not tokenize Indic text efficiently despite having a vocabulary size of 128k.
👉🏼 Unlike SentencePiece-based models, it might be difficult to perform vocabulary expansion.
I ran some quick tokenization tests on Devanagari and Kannada. Llama 3 performed better than Llama 2 on Devanagari, but it does not tokenize other Indic languages like Kannada and Tamil efficiently. Here are some quick comparison screenshots.
I need to do more testing to gauge its base-level performance, and I will try to add it to the Indic LLM leaderboard soon: huggingface.co/spaces/Cogniti…
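One common way to put a number on "does not tokenize efficiently" is tokenizer fertility, the average number of tokens per word (lower is better). The sketch below is only an illustration and not the method behind the screenshots. The function names are hypothetical. It relies on one fact: tiktoken-style tokenizers are byte-level BPE, so a script with few learned merges degrades toward one token per UTF-8 byte. Kannada and Devanagari characters each take 3 bytes in UTF-8, so `byte_fallback_tokenize` gives a worst-case upper bound. To measure a real tokenizer, you could pass something like `lambda s: tok.encode(s, add_special_tokens=False)` for a Hugging Face tokenizer you have access to.

```python
from typing import Callable, List


def fertility(text: str, tokenize: Callable[[str], List[int]]) -> float:
    """Average number of tokens per whitespace-separated word (lower is better)."""
    words = text.split()
    if not words:
        return 0.0
    return len(tokenize(text)) / len(words)


def byte_fallback_tokenize(text: str) -> List[int]:
    """Hypothetical worst case for a byte-level BPE tokenizer: no learned merges
    for this script, so every UTF-8 byte becomes its own token."""
    return list(text.encode("utf-8"))


def char_tokenize(text: str) -> List[int]:
    """Hypothetical per-character tokenizer (whitespace dropped), for comparison."""
    return [ord(c) for c in text if not c.isspace()]


if __name__ == "__main__":
    # "ಕನ್ನಡ ಭಾಷೆ" written as escapes: 9 Kannada code points plus one space.
    kannada = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1 \u0cad\u0cbe\u0cb7\u0cc6"
    print(fertility(kannada, byte_fallback_tokenize))  # 28 bytes / 2 words -> 14.0
    print(fertility(kannada, char_tokenize))           # 9 chars / 2 words -> 4.5
    print(fertility("hello world", byte_fallback_tokenize))  # 11 bytes / 2 words -> 5.5
```

In the worst case, an ASCII word costs about one token per character, but a Kannada word costs about three. Learned merges are what close this gap, and a fixed byte-level vocabulary with few Kannada merges cannot easily gain more of them.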






