Vallabh

35 posts

Vallabh
@vallabh

Tech, Stocks & Real Estate.

Joined December 2007
575 Following · 39 Followers
Vallabh retweeted
Sebastian Raschka
Sebastian Raschka@rasbt·
While waiting for DeepSeek V4 we got two very strong open-weight LLMs from India yesterday. There are two size flavors, Sarvam 30B and Sarvam 105B (both reasoning models).

Interestingly, the smaller 30B model uses "classic" Grouped Query Attention (GQA), whereas the larger 105B variant switched to DeepSeek-style Multi-Head Latent Attention (MLA). As I wrote in my analyses before, both are popular attention variants for reducing KV cache size (the longer the context, the more you save compared to regular attention). MLA is more complicated to implement, but it can give you better modeling performance if we go by the ablation studies in the 2024 DeepSeek V2 paper (as far as I know, still the most recent apples-to-apples comparison).

Speaking of modeling performance, the 105B model is on par with LLMs of similar size: gpt-oss 120B and Qwen3-Next (80B). Sarvam is better on some tasks and worse on others, but roughly the same on average. It's not the strongest coder in SWE-Bench Verified terms, but it is surprisingly good at agentic reasoning and task completion (Tau2). There it's even better than DeepSeek R1 0528.

As for the smaller Sarvam 30B, perhaps the most comparable model is Nemotron 3 Nano 30B, which is slightly ahead in coding per SWE-Bench Verified and agentic reasoning (Tau2) but slightly worse in some other respects (LiveCodeBench v6, BrowseComp). Unfortunately, Qwen3-30B-A3B, which is, as far as I know, the most popular model of that size class, is missing from the benchmarks. Interestingly, though, the Sarvam team compared their 30B model to Qwen3-30B-A3B in a computational performance analysis, where they found that Sarvam gets 20-40% more tokens/sec throughput than Qwen3 thanks to code and kernel optimizations.

Anyway, one thing not captured by the benchmarks above is Sarvam's good performance on Indian languages. According to a judge model, the Sarvam team found that their model is preferred 90% of the time over others on Indian texts. (Since they also built and trained the tokenizer from scratch, Sarvam comes with a 4x higher token efficiency on Indian languages.)
Sebastian Raschka tweet media
Pratyush Kumar@pratykumar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

45
690
4.1K
254.3K
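To make the KV-cache comparison in the post above concrete, here is a minimal back-of-the-envelope sketch. All numbers (48 layers, 128-dim heads, 32 vs. 8 KV heads, a 512-dim latent) are made-up illustrative values, not Sarvam's or DeepSeek's actual configurations:

```python
# Per-token KV cache size under regular multi-head attention (MHA),
# grouped-query attention (GQA), and an MLA-style compressed latent cache.
# A sketch with made-up numbers, not any real model's configuration.

def kv_cache_bytes_per_token(n_layers, d_head, n_kv_heads, bytes_per_val=2):
    # Each layer stores one K and one V vector per KV head (fp16 = 2 bytes).
    return n_layers * 2 * n_kv_heads * d_head * bytes_per_val

def mla_cache_bytes_per_token(n_layers, d_latent, bytes_per_val=2):
    # MLA caches a single compressed latent per token per layer; full K and V
    # are reconstructed from it by up-projection when attention is computed.
    return n_layers * d_latent * bytes_per_val

# Hypothetical 30B-ish config: 48 layers, 128-dim heads.
mha = kv_cache_bytes_per_token(48, 128, n_kv_heads=32)  # cache all 32 heads
gqa = kv_cache_bytes_per_token(48, 128, n_kv_heads=8)   # 4x fewer KV heads
mla = mla_cache_bytes_per_token(48, d_latent=512)       # one latent per token

for name, b in [("MHA", mha), ("GQA", gqa), ("MLA", mla)]:
    # Total cache grows linearly with sequence length: per-token * seq_len.
    print(f"{name}: {b / 1024:.0f} KiB/token, "
          f"{b * 128_000 / 2**30:.1f} GiB at 128k context")
```

The takeaway: GQA shrinks the cache by the ratio of query heads to KV heads (4x in this sketch), while MLA caches only a small per-token latent that is up-projected into K and V at attention time. In both cases the savings scale linearly with context length, which is why the difference matters most at long contexts.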
Vallabh
Vallabh@vallabh·
@Geiger_Capital @Citrini7 If margins get eroded and the rupee depreciates, any leftover white-collar work will probably shift to India at a fraction of the cost. In this scenario it's a lose-lose for everyone.
1
0
0
501
Geiger Capital
Geiger Capital@Geiger_Capital·
My favorite part of the @Citrini7 piece: India is going to be absolutely decimated due to their entire economy being reliant on providing cheap white-collar workers to the West. Probably spot on, actually.
Geiger Capital tweet media
459
580
6.2K
1.2M
Vallabh
Vallabh@vallabh·
@michaelxbloch If that happens and all SWEs lose their jobs, then nothing will be safe. Expect more competition for the "safe" businesses.
0
0
0
20
Michael Bloch
Michael Bloch@michaelxbloch·
This is one of the most underrated observations in tech right now. If AI commoditizes software, what's actually safe?
- Regulated and liability-bearing businesses (someone has to be on the hook)
- Anything touching the physical world (hardware, manufacturing, energy)
- Proprietary data sets (AI makes your data more valuable, not less)
- Marketplaces and businesses with network effects (liquidity > software)
- Operationally intense businesses (the "bad" businesses become the best ones)
- Cybersecurity and physical security (more AI = more attack surface)
BuccoCapital Bloke@buccocapital

For 50 yrs we treated the supremacy of asset-light businesses as a permanent economic law. But if AI commoditizes asset-light businesses, we'd just be reverting to the historical mean, where value accrued to atoms, infrastructure, energy. It would be a 50-year blip. An anomaly

78
130
1.4K
356.8K
Vallabh retweeted
Nick Collins
Nick Collins@nickcollins1953·
Burgeoning Indian trade with the Roman Empire financed almost half the Empire's defence expenses from Augustus' reign. When this trade collapsed after the empire-wide plague of 166, which killed perhaps a third of its population, Europe's maritime world fell silent. By 700 AD, Indian Ocean goods had vanished from Europe.

But trade didn't die; it shifted east. Indian, Persian and Omani merchants forged vast sea routes from the Gulf to China. A 6th-century mural at Ajanta depicts a three-masted ship, a design Europe wouldn't use for another 900 years. An 8th-century Tamil text describes ships "bent to the point of breaking" under heaps of spices, pepper, ginger and gems. The 8th-century Phanom Surin shipwreck near Thailand carried ivory, antler horn and Abbasid jars: proof that long before Europe's "Age of Discovery," the Indian Ocean was already the world's greatest highway of trade.
Nick Collins tweet media
38
850
3K
205.5K
Vallabh retweeted
VV
VV@visualizevalue·
Starting Again
VV tweet media
17
42
264
27.8K
Nozz
Nozz@NoahEpstein_·
Tech Twitter just admitted everyone's building the same AI wrappers. But they buried the real story: A 22-year-old sends $73K invoices monthly selling basic automation to dentists. I tested this "intelligence gap" framework for 3 months and it prints $10K+/month. Here are the 5 steps to your first enterprise client (no coding required):
71
27
581
73.3K
Vallabh retweeted
GDP
GDP@bookwormengr·
Writing this as an Indian who works on AI in a leadership role for one of the largest companies in the world (though this is strictly my personal opinion, based on verifiable data). You heard it first here:
—————————-
First, some more shocks: You heard about DeepSeek. Wait till you hear about Qwen (Alibaba), MiniMax, Kimi, and DuoBao (ByteDance), all from China. Within China, DeepSeek is not unique, and their competition is close behind (not far behind). IMHO, China has 10 labs comparable to OpenAI/Anthropic and another 50 tier-2 labs. The world will discover them in the coming weeks in awe and shock.

AI is not hard (I am not high)
————————————
Ignore Sam Altman. Many teams that built foundation models have fewer than 50 people (e.g. Mixtral). In AI, the LLM science part is actually quite easy. All these models are "Transformer decoder-only models," an architecture invented in late 2017. There have been improvements since then (FlashAttention, RoPE, MoE, PPO/DPO/GRPO), but they are relatively minor, open source, and easy to implement.

Since building foundation models is easy and Nvidia is there to help you (if not directly, then by sharing software like Megatron, an assembly line for building AI models), there are many foundation models built by Chinese labs as well as global labs. It is machines that learn by themselves... if you give them data and compute. This is unlike writing operating-system or database software. Also, everyone trains on the same data for the first stage, called "pre-training": internet archives, books, GitHub code.

What part is hard then?
———————————-
It is the parallel and distributed computing needed to run AI training jobs across thousands of GPUs that is hard. DeepSeek did a lot of innovation here to save on FLOPs and network calls. They used an innovative architecture called Mixture of Experts and a new approach called GRPO with verifiable rewards, both of which have been in the open domain through 2024. Also, a lot of data curation is needed, particularly for "post-training," to teach the model the proper style of answering (SFT/DPO) or to teach it to reason (GRPO with verifiable rewards). SFT/DPO is where "stealing" from existing models to save the cost of manual labor may happen.

Building an LLM is nothing that Indian engineers living in India cannot pull off. Don't worry about the Indians who have left; there are plenty in the country as of today.

Then why does India not have foundation models?
———————
For the same reason India does not have a Google or Facebook of its own. You need to be able to walk before you can run. There is no protected market in which to practice your craft in the early days; you will get replaced by American service providers, as they are cheaper and better every single time. That is not the case for Chinese players. They have a protected market and leadership that treats this skillset as existential due to geopolitics. So even if Chinese models are not good in the early days, they will continue to get funding from their conglomerates as well as provincial governments. Darwinian competition ensures the best rise to the top. Recall that DeepSeek took 2 years to get here without much revenue; they were funded by their parent. Also, most of their engineers are not PhDs.

There is nothing that the engineers who built Ola/Swiggy/Flipkart cannot build. Remember, these services are second to none compared to their Bay Area counterparts. Also, don't trivialize those services; there is brilliant engineering behind making them work at the price points at which they work.

An Indian DARPA with 3B USD in funding over 3 years
———————-
What we need is a mentality that treats this skillset as existential. We need a national fund that will fund such teams, where the only expected output is benchmark performance, with benchmarks becoming harder every 6 months. No revenue needed to survive for the first 3 years. That money is loose change for the GoI and the world's richest men living in India. @protosphinx @balajis @vikramchandra @naval
221
2K
5.6K
810.1K
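As a concrete illustration of the post's claim that tweaks like MoE are small and easy to implement, here is a minimal top-2 Mixture-of-Experts feed-forward layer in plain numpy. This is a sketch under my own assumptions (made-up shapes, a naive per-token loop, no load-balancing loss), not any particular lab's implementation:

```python
# Minimal top-2 Mixture-of-Experts FFN sketch in numpy.
# Shapes and expert count are illustrative, not any real model's.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

# Router plus per-expert FFN weights (w1: up-projection, w2: down-projection).
router = rng.normal(0, 0.02, (d_model, n_experts))
w1 = rng.normal(0, 0.02, (n_experts, d_model, d_ff))
w2 = rng.normal(0, 0.02, (n_experts, d_ff, d_model))

def moe_ffn(x):
    # x: (tokens, d_model). Router scores -> softmax -> keep top-k experts.
    logits = x @ router
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # expert ids per token
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            h = np.maximum(x[t] @ w1[e], 0.0)          # ReLU FFN per expert
            out[t] += probs[t, e] * (h @ w2[e])        # gate-weighted sum
    return out

tokens = rng.normal(size=(4, d_model))
# Output keeps the input shape while using only top_k of n_experts per token.
print(moe_ffn(tokens).shape)  # (4, 64)
```

Production versions differ mainly in engineering, not math: gates are usually renormalized over just the selected top-k experts, and tokens are batched and dispatched to experts in parallel across devices instead of looped over one by one, which is exactly the parallel and distributed computing the post calls the hard part.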
Vallabh retweeted
carried_no_interest
carried_no_interest@carrynointerest·
@NiceQuarterGuys agreed: no one in our industry (PE) deserves sympathy. To be clear, I wasn't asking for it. I slightly feel bad for the SaaS CEOs who built a wonderful product, employ tons of people, and were convinced by VCs to value their businesses in a way that makes exits difficult
2
2
54
9.2K
Vallabh
Vallabh@vallabh·
@shyamsek The majority of Zerodha's profits come from F&O traders. Zero brokerage is only for stock trades.
0
0
0
268
Vallabh retweeted
professional hog groomer
professional hog groomer@bidetmarxman·
It is impossible to understand the current existential threat the US feels from China without first understanding what happened to Japan 37 years ago. This is the story of the Plaza Accord 🧵
215
3.1K
9.9K
0
Vallabh
Vallabh@vallabh·
US June CPI inflation shocker comes in at 9.1% vs 8.8% estimated. More Fed hikes coming!
1
0
0
0
Vallabh retweeted
Epic Maps 🗺️
Epic Maps 🗺️@theepicmap·
Topographic Map of the Indian Subcontinent
Epic Maps 🗺️ tweet media
79
1.7K
12.9K
0
Vallabh retweeted
Universal Curiosity
Universal Curiosity@UniverCurious·
This animated map shows what Earth will look like in 250 million years via @TechInsider
570
1.5K
8.9K
0
Eternal fool 🤡
Eternal fool 🤡@passivefool·
What's something that's clearly a scam but Indian investors have been conditioned to believe is "normal"?
30
1
19
0