Joseph
390 posts

@RealJosephus
How dare I teach robots how to learn.



Interesting, Baidu has a better OCR than Whale



Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. thinkingmachines.ai/blog/modular-m… We explore a fundamental understanding of the geometry of neural network optimization.



Honestly, it's sickening to see people with no linguistics background pontificating on the future of LLMs, or those with no neuroscience background holding forth on AI and AGI as if they're experts.




Tools & Agents Upgrades 🧰
📈 Better results on SWE / Terminal-Bench
🔍 Stronger multi-step reasoning for complex search tasks
⚡️ Big gains in thinking efficiency
3/5



if you loved kimi k2, you will love what a certain chinese team is about to release which is highly competitive with 1M context length



Announcing 🎙️ Kimi-Audio! Our new open-source audio foundation model advances capabilities in audio understanding, generation, and conversation.

Key Features & Achievements:
✅ Universal audio foundation model handles diverse tasks like speech recognition, audio understanding, audio-to-text chat, speech-to-speech conversation.
✅ Large-scale pre-training on >13 Million hours of diverse audio data (speech, music, sounds).
✅ Unique 12.5Hz tokenizer & hybrid architecture for rich perception and efficient generation.
✅ SOTA on 10+ audio benchmarks: excels in Speech Recognition (LibriSpeech 1.28/2.42 WER), Audio Understanding (MMAU, VocalSound), and Conversation (VoiceBench).

We're also releasing our comprehensive evaluation toolkit to foster fair benchmarking! 🛠️

📄 Dive into the details in our Technical Report: github.com/MoonshotAI/Kim…
🌟 Explore the Code, Models & Eval Toolkit on GitHub: github.com/MoonshotAI/Kim…
HuggingFace: huggingface.co/moonshotai/Kim…

Excited to see the innovative audio applications the community will build!

This suggests that, in reality, NO 'serious' LLM training is actually centered around the Hugging Face ecosystem - many who claim to surpass Meta LLaMA3.1 don't even know how to train a model properly - script kiddies








