
Arjun Reddy
324 posts

Pinned Tweet

We liberated the @claudeai Opus-distilled @Alibaba_Qwen 3.6 27B by @KyleHessling1 & Jackrong with the Heretic abliteration toolkit. Quantized models with vision preserved:
huggingface.co/osmapi/osmQwop…
huggingface.co/osmapi/osmQwop…
huggingface.co/osmapi/osmQwop…
Thanks to @osmAPI_off @TervPro's student research team
Arjun Reddy retweeted

A must-attend webinar on what to build with our SoTA speech recognition model and the recent upgrades to diarisation and accuracy.
Sarvam @SarvamAI
We're hosting a webinar on Saaras V3, our speech-to-text model for teams building Voice AI for India.
Date: Thursday, May 21
Time: 5:00 PM IST
In this session, we'll break down what it takes to ship Voice AI that works reliably across noisy environments, regional accents, mixed-language speech, and multiple speakers.
Register here: links.sarvam.io/speech-to-text…

@ajay_2512x I’m a co-founder of Ohm.Doctor, an IIT-incubated AI healthcare company. How can I be of service?

@jun_song At osmAPI.com we are working on exactly this, and guess what: we have so many people contributing from Apple Studios and clusters.

Some guy really said Macs are bad for local LLMs.
Zero heat compared to GPUs, no PC building hassle, and unified memory lets you run massive models.
Plus, the power bill is cut in half—that’s hundreds of dollars saved a month.
macOS is basically perfect for agent workflows, and MLX is clearly the winning format right now.
The ONLY downside is prefill speed.
Name a better $4k setup than the upcoming M5 Max Mac Studio (128GB RAM, 600GB/s bandwidth). You can’t.
Recommending a DGX Spark instead is wild.
Good luck running agents properly on 273GB/s. (And yeah, I don't recommend the Mac Mini either, bandwidth is too low).
If you're going to criticize, at least do your research first.
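
A rough way to see why the bandwidth figures in the post above matter: during decoding, each new token needs roughly one full read of the model weights, so bandwidth divided by model size gives an upper bound on tokens per second. The sketch below assumes a hypothetical dense 27B-parameter model at 4 bits per weight (about 13.5 GB). That model size is an illustration, not a figure from the post, and real throughput will be lower. Prefill, which the post names as the weak spot, is compute-bound and is not captured here.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Assumption (not from the post): 27B parameters at 4 bits per weight.
# 27e9 weights * 4 bits / 8 bits-per-byte = 13.5e9 bytes = 13.5 GB.
MODEL_GB = 27e9 * 4 / 8 / 1e9

# Bandwidth figures as quoted in the post.
estimates = {
    "600 GB/s": decode_tokens_per_second(600, MODEL_GB),
    "273 GB/s": decode_tokens_per_second(273, MODEL_GB),
}

for label, tokens_per_second in estimates.items():
    print(f"{label}: at most ~{tokens_per_second:.1f} tokens/s")
```

Under this assumption the ratio between the two machines is simply the bandwidth ratio, about 2.2x, whatever model size is plugged in.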

@jun_song @dealignai Thank you for your sincere efforts Jun! Fitting either MiniMax M3 or K2.6 on a 128GB MBP (I’ve two of these so I’m extra happy!) would really change the way people see local AI vs $200 per month max plans

In a few weeks, everyone with a 128GB Mac will have uncensored Opus-4.6 locally.
It will be Minimax-M3.0-JANGTQ-CRACK by @dealignai
The open-source community is working hard on fitting them into 24GB VRAM.
The future of Local LLM is so bright.

@Alibaba_Qwen We are @osmAPI_off, the OpenRouter of India, building a strong user base for Qwen models here. We’d love to be a Qwen Ambassador.

📣We're calling for ambassadors!
Whether you're a developer with great technical taste or a local community leader who loves bringing people together, we'd love to have you join us.
Visit the website below for more details and to apply. In return, ambassadors will receive early access to Qwen models, API credits, annual merchandise, and more.
Come and check it out!👇
qwen.ai/ambassador


@OfficialLoganK We run osmAPI.com and would love to partner to serve our 30k+ college student users together

Tencent has killed fine-tuning and RL with an $18 budget.
Right now, if you want an AI agent to become an expert at a specific, complex real-world task, you have to use Reinforcement Learning.
You let it try, fail, and update its internal parameters over and over again.
This is the exact optimization technique (GRPO) that DeepSeek used to build their massive reasoning models.
But there is a massive problem.
Updating model weights is insanely expensive. It requires massive GPU clusters. And worst of all, when you train a model to be highly specialized at one thing, it often "overfits" and forgets how to be good at everything else.
Tencent killed this bottleneck forever by building Training-Free GRPO.
Instead of spending thousands of dollars to permanently alter the AI's brain, they asked a simple question: What if we just distill the experience of learning, and inject it as a memory?
Here is how it works.
They run the AI through the exact same trial-and-error process. But instead of updating the weights, they extract the "semantic advantage"—the actual logic of why one answer was better than another.
They compress this winning logic into a "token prior", a tiny package of high-quality experiential knowledge.
Then, they just attach that knowledge directly into the API call.
The results are staggering.
Tested on DeepSeek-V3, this method required only a few dozen training samples to turn the AI into a specialized expert in complex math and web searching.
It didn't just compete with models that were actually fine-tuned. It outperformed them.
Zero parameter updates. Zero expensive training runs. Zero base-model amnesia.
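
The loop described above can be sketched in a few lines. This is a minimal toy under stated assumptions, not Tencent's implementation: `toy_policy` and `toy_summarize` are hypothetical stand-ins for LLM API calls, the reward is a simple exact-match check, and the "token prior" is modelled as a plain list of text lessons prepended to the prompt.

```python
import itertools
from typing import Callable, List


def build_prompt(question: str, experiences: List[str]) -> str:
    """Attach learned lessons to the prompt; no model weights are touched."""
    if not experiences:
        return question
    notes = "\n".join(f"- {lesson}" for lesson in experiences)
    return f"Lessons from earlier attempts:\n{notes}\n\nTask: {question}"


def training_free_step(question: str, answer: str,
                       policy: Callable[[str], str],
                       summarize: Callable[[str, str, str], str],
                       experiences: List[str],
                       group_size: int = 4) -> List[float]:
    """Run one group of rollouts and update the experience list in place."""
    prompt = build_prompt(question, experiences)
    rollouts = [policy(prompt) for _ in range(group_size)]
    rewards = [1.0 if r == answer else 0.0 for r in rollouts]
    # A group is informative only if it mixes successes and failures.
    if 0.0 < sum(rewards) < group_size:
        best = rollouts[rewards.index(1.0)]
        worst = rollouts[rewards.index(0.0)]
        lesson = summarize(question, best, worst)
        if lesson not in experiences:
            experiences.append(lesson)
    return rewards


# Hypothetical stand-ins for real LLM calls, for illustration only.
_guesses = itertools.cycle(["32", "42"])


def toy_policy(prompt: str) -> str:
    # The toy "model" answers correctly once the lesson is in its prompt.
    return "42" if "carry" in prompt else next(_guesses)


def toy_summarize(question: str, best: str, worst: str) -> str:
    return f"For '{question}', remember to carry the ten ({best} beat {worst})."


experiences: List[str] = []
first = training_free_step("17 + 25", "42", toy_policy, toy_summarize, experiences)
second = training_free_step("17 + 25", "42", toy_policy, toy_summarize, experiences)
print(first, second, experiences)
```

The design point is that the only state that changes between steps is `experiences`, which travels with the prompt, so the same frozen model can be reused for other tasks.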


Is it true that @deepseek_ai is dropping prices like it’s hot because they want to drive up their traction and, ipso facto, their valuation before their merger with @Kimi_Moonshot?

fully automated end to end pipeline ready and deployed
scaled to 22 workers w/ one master
now just gonna let it run forever

ron @RonxldWilson
How I built a search engine from scratch.
Here's what I have been building over the last month: over 55 million unique domains visited, a 130 GB SQLite DB, 200 million rows, and over 4 million unique Indian B2B businesses. 1/n

@bindureddy Once Kimi K3 perfects a 1M context length without rot, there’ll be an exodus from OpenAI and Anthropic. Enterprise AI will work towards making it LTS.

@AlicanKiraz0 Salt-of-the-earth believers in an open-source future, helping us fight closed AGI overlords! Thank you, brother.

@jun_song @Kimi_Moonshot and @Alibaba_Qwen, as they are the only ones that make agentic-ready models with vision. While we are happy that there are amazing GLM, MiniMax and DeepSeek open-source releases, it would be awesome to have more vision-ready models.

@runsonai Maybe this is a stupid question, but can it also handle some Claude Opus distill fine-tunes?

I open-sourced DDTree-MLX: tree-based speculative decoding for Apple Silicon.
Now you can run Qwen 3.5 27B on your Apple machines 1.5x faster than normal. Expect even bigger speedups on smaller models.
It runs Qwen 3.5 27B locally with MLX, extends DFlash with draft trees, and gets ~10-15% faster than DFlash alone on code + structured prompts while keeping output lossless.
Built on the work of @bstnxbt @liranringel @yaniv_romano
github.com/humanrouter/dd…
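
To illustrate what "draft trees" and "lossless" mean together, here is a toy sketch of greedy tree verification. It is not the DDTree-MLX or DFlash code: `target_next`, `toy_draft_tree`, and the token sequence are hypothetical, and a real system would score every tree node in one batched forward pass instead of calling the target model once per token.

```python
SEQ = ["the", "cat", "sat", "on", "a", "mat", "."]


def target_next(prefix):
    """Toy target model: a deterministic greedy next-token function."""
    return SEQ[len(prefix) % len(SEQ)]


def toy_draft_tree(prefix):
    """Hypothetical drafter: nested dicts mapping token -> subtree of guesses."""
    i = len(prefix)
    first, second = SEQ[i % len(SEQ)], SEQ[(i + 1) % len(SEQ)]
    return {first: {second: {}, "dog": {}}, "dog": {}}


def verify_tree(prefix, tree, target_next):
    """Keep draft tokens only while they match the target's greedy choice."""
    accepted, node = [], tree
    while True:
        token = target_next(list(prefix) + accepted)
        accepted.append(token)      # the target's own token is always kept
        if token not in node:       # no draft branch predicted it: stop
            break
        node = node[token]          # follow the matching branch deeper
    return accepted


def greedy_decode(prompt, target_next, max_new):
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(prompt, draft_tree_fn, target_next, max_new):
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < max_new:
        out.extend(verify_tree(out, draft_tree_fn(out), target_next))
        rounds += 1
    return out[len(prompt):len(prompt) + max_new], rounds


tokens, rounds = speculative_decode(["<s>"], toy_draft_tree, target_next, 10)
print(tokens == greedy_decode(["<s>"], target_next, 10), rounds)
```

Because every emitted token is the target model's own greedy choice, a bad drafter can only cost speed, never change the output. A tree helps over a single draft chain because several candidate branches are checked in the same verification round.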

Hi @SarvamAI @SarvamForDevs
We used TurboQuant to make Sarvam30B ~3.5 times smaller while losing only 2.5% performance compared to BF16.

Compress the model. Keep the intelligence.
Sarvam built a remarkable 32-billion parameter model for a billion Indian language speakers. Our job was making it run anywhere.
64 gigabytes is a lot of weight to carry. We applied TurboQuant. The core insight is simple. Rotate the weights so every coordinate carries equal information. Now each number looks like every other number. No outliers. No wasted bits. Quantize each to one of eight optimal values. Three bits instead of sixteen. Add two tiny scale factors per block to recover the fine detail.
The experts — 90% of the model — got the aggressive treatment. Attention weights got slightly more room. Router weights stayed untouched. Routing is a discrete decision. You don't approximate discrete decisions.
64 gigabytes became 18.6. One GPU instead of four. Cosine similarity above 0.99. The weights changed. The intelligence didn't.
Specific knowledge plus leverage. Know which weights can compress. Apply the math. Ship it
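
The rotate-then-quantize idea in the post can be sketched in pure Python. This is a simplified illustration, not the TurboQuant implementation: the rotation is a swapped-in random-sign Hadamard transform, the eight levels are approximate Lloyd-Max values for a unit Gaussian, only one scale per block is stored (the post mentions two), and the weights are synthetic. The last lines just redo the size arithmetic from the post.

```python
import math
import random

# Assumption: approximate Lloyd-Max levels for a unit Gaussian, 8 outputs (3 bits).
LEVELS = [-2.152, -1.344, -0.756, -0.245, 0.245, 0.756, 1.344, 2.152]


def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x, h = list(x), 1
    while h < len(x):
        for start in range(0, len(x), 2 * h):
            for i in range(start, start + h):
                x[i], x[i + h] = x[i] + x[i + h], x[i] - x[i + h]
        h *= 2
    return [v / math.sqrt(len(x)) for v in x]


def quantize_block(block, signs):
    """Rotate a block, then store a 3-bit code per value plus one scale."""
    rotated = hadamard([s * v for s, v in zip(signs, block)])
    scale = math.sqrt(sum(v * v for v in rotated) / len(rotated)) or 1.0
    codes = [min(range(8), key=lambda k: abs(v / scale - LEVELS[k])) for v in rotated]
    return codes, scale


def dequantize_block(codes, scale, signs):
    """Map codes back to values and undo the rotation (the transform is its own inverse)."""
    rotated = [LEVELS[c] * scale for c in codes]
    return [s * v for s, v in zip(signs, hadamard(rotated))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


rng = random.Random(0)
BLOCK = 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(BLOCK)]
weights = [rng.gauss(0, 0.02) for _ in range(BLOCK * 16)]  # synthetic weights

restored = []
for i in range(0, len(weights), BLOCK):
    codes, scale = quantize_block(weights[i:i + BLOCK], signs)
    restored.extend(dequantize_block(codes, scale, signs))
print(f"cosine similarity: {cosine(weights, restored):.4f}")

# Size arithmetic using the figures quoted in the post.
ratio = 64 / 18.6        # overall compression ratio
avg_bits = 16 / ratio    # implied average bits per weight, starting from 16
print(f"{ratio:.2f}x smaller, about {avg_bits:.2f} bits per weight on average")
```

The implied average of roughly 4.65 bits per weight sits above 3 bits, which is consistent with the post's note that attention weights get more room, router weights stay untouched, and scale factors add overhead.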