Vinay Babu

6.7K posts

@min2bro

Foodie | Love Music | Blogger | Data Scientist | Responsible for what I say, not for what you understand, and of course I’m real and I hope some of those I’m following are too

India · Joined January 2010
434 Following · 173 Followers
Vinay Babu retweeted
How To AI @HowToAI_
The entire RAG industry is about to get cooked.
Researchers have built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
It's called PageIndex. Instead of chunking your docs and stuffing them into pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book.
hit 98.7% on financebench. beats every vector RAG on the leaderboard.
no embeddings. no chunking. no vector DB. 100% open source.
224
780
6.9K
611.5K
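A rough illustration of the tree-index idea from the PageIndex post above. This is not PageIndex's code or API: the `Node` class, `pick_child`, `retrieve`, and the toy report are all made up for this sketch, and the word-overlap scorer is only a stand-in for the step where an LLM would read the section summaries and choose a branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document: a title, a short summary, and child sections."""
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def pick_child(question: str, children: list[Node]) -> Node:
    """Stand-in for the LLM call (assumption): pick the child whose title and
    summary share the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(node: Node) -> int:
        words = set((node.title + " " + node.summary).lower().split())
        return len(q_words & words)

    return max(children, key=overlap)


def retrieve(question: str, root: Node) -> tuple[list[str], str]:
    """Walk from the root to a leaf, choosing one branch per level.
    Returns the path of section titles and the leaf text."""
    path, node = [root.title], root
    while node.children:
        node = pick_child(question, node.children)
        path.append(node.title)
    return path, node.text


# Toy document tree (hypothetical content).
report = Node("Annual Report", "company annual report", children=[
    Node("Business Overview", "products markets strategy", text="We sell widgets."),
    Node("Financial Statements", "revenue income balance sheet cash flow", children=[
        Node("Income Statement", "revenue expenses net income", text="Revenue was $10M."),
        Node("Balance Sheet", "assets liabilities equity", text="Total assets were $25M."),
    ]),
])

path, answer = retrieve("what was the revenue", report)
print(" > ".join(path))
print(answer)
```

No embeddings or similarity search appear anywhere: retrieval is a sequence of "which section next?" decisions over the table of contents, which is the part the post says the LLM handles.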
Vinay Babu retweeted
Andrej Karpathy @karpathy
I spent more test time compute and realized that my micrograd can be dramatically simplified even further. You just return local gradients for each op and get backward() to do the multiply (chaining) with global gradient from loss. So each op just expresses the bare fundamentals of what it needs to: the forward computation and the backward gradients for it. Huge savings from 243 lines of code to just 200 (~18%). Also, the code now fits even more beautifully to 3 columns and happens to break just right:
Column 1: Dataset, Tokenizer, Autograd
Column 2: GPT model
Column 3: Training, Inference
Ok now surely we are done.
90
176
2.6K
265.5K
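The simplification described in the micrograd post above can be sketched roughly as follows. This is not the author's code: the `Value` class and its attribute names are assumptions made for illustration. The point it shows is that each op records only its local gradients, while `backward()` does the chain-rule multiply with the gradient flowing back from the loss.

```python
class Value:
    """Scalar autograd node. Each op stores its inputs and the *local*
    gradient of its output with respect to each input."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order so a node's grad is complete before use.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            # The chain rule lives only here: global grad times local grad.
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += v.grad * local


a, b, c = Value(2.0), Value(3.0), Value(4.0)
loss = a * b + c * a
loss.backward()
print(loss.data, a.grad, b.grad, c.grad)
```

Adding a new op under this design means writing one forward expression and one tuple of local derivatives, with no per-op backward closure.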
Vinay Babu retweeted
Chubby♨️ @kimmonismus
This is such an important article: "Because the honest version sounds like I've lost my mind. And for a while, I told myself that was a good enough reason to keep what's truly happening to myself. But the gap between what I've been saying and what is actually happening has gotten far too big." 2026 is the year when AI becomes tangible for everyone. And Matt Shumer has summarized it in a way that everyone, especially those who aren't living in the bubble like us, should read.
Matt Shumer @mattshumer_
x.com/i/article/2021…
21
22
413
58.7K
Vinay Babu retweeted
Lorwen Harris Nagle, PhD @LORWEN108
I’m American. After my PhD, I went to India. What I experienced dismantled my Western worldview. Here are 8 lessons that permanently rewired how I see life:
54
123
1.1K
207K
Vinay Babu retweeted
Andrew Ng @AndrewYNg
Job seekers in the U.S. and many other nations face a tough environment. At the same time, fears of AI-caused job loss have — so far — been overblown. However, the demand for AI skills is starting to cause shifts in the job market. I’d like to share what I’m seeing on the ground. First, many tech companies have laid off workers over the past year. While some CEOs cited AI as the reason — that AI is doing the work, so people are no longer needed — the reality is AI just doesn’t work that well yet. Many of the layoffs have been corrections for overhiring during the pandemic or general cost-cutting and reorganization that occasionally happened even before modern AI. Outside of a handful of roles, few layoffs have resulted from jobs being automated by AI. Granted, this may grow in the future. People who are currently in some professions that are highly exposed to AI automation, such as call-center operators, translators, and voice actors, are likely to struggle to find jobs and/or see declining salaries. But widespread job losses have been overhyped. Instead, a common refrain applies: AI won’t replace workers, but workers who use AI will replace workers who don’t. For instance, because AI coding tools make developers much more efficient, developers who know how to use them are increasingly in-demand. (If you want to be one of these people, please take our short courses on Claude Code, Gemini CLI, and Agentic Skills!) So AI is leading to job losses, but in a subtle way. Some businesses are letting go of employees who are not adapting to AI and replacing them with people who are. This trend is already obvious in software development. Further, in many startups’ hiring patterns, I am seeing early signs of this type of personnel replacement in roles that traditionally are considered non-technical. 
Marketers, recruiters, and analysts who know how to code with AI are more productive than those who don’t, so some businesses are slowly parting ways with employees that aren’t able to adapt. I expect this will accelerate. At the same time, when companies build new teams that are AI native, sometimes the new teams are smaller than the ones they replace. AI makes individuals more effective, and this makes it possible to shrink team sizes. For example, as AI has made building software easier, the bottleneck is shifting to deciding what to build — this is the Product Management (PM) bottleneck. A project that used to be assigned to 8 engineers and 1 PM might now be assigned to 2 engineers and 1 PM, or perhaps even to a single person with a mix of engineering and product skills. The good news for employees is that most businesses have a lot of work to do and not enough people to do it. People with the right AI skills are often given opportunities to step up and do more, and maybe tackle the long backlog of ideas that couldn’t be executed before AI made the work go more quickly. I’m seeing many employees in many businesses step up to build new things that help their business. Opportunities abound! I know these changes are stressful. My heart goes out to every family that has been affected by a layoff, to every job seeker struggling to find the role they want, and to the far larger number of people who are worried about their future job prospects. Fortunately, there’s still time to learn and position yourself well for where the job market is going. When it comes to AI, the vast majority of people, technical or nontechnical, are at the starting line, or they were recently. So this remains a great time to keep learning and keep building, and the opportunities for those who do are numerous! [Original text; deeplearning.ai/the-batch/issu… ]
228
591
2.9K
482.1K
Vinay Babu retweeted
Andrej Karpathy @karpathy
nanochat can now train GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node). GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100. Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try. A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce are here: github.com/karpathy/nanoc… Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning. 
The biggest improvements of things that worked out of the box and simply produced gains right away were 1) Flash Attention 3 kernels (faster, and allows window_size kwarg to get alternating attention patterns), 2) Muon optimizer (I tried for ~1 day to delete it and only use AdamW and I couldn't), 3) residual pathways and skip connections gated by learnable scalars, and 4) value embeddings. There were many other smaller things that stack up. Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!
331
621
7.4K
1.3M
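The cost figures in the nanochat post above can be checked with a few lines of arithmetic. All inputs are the numbers quoted in the post; the variable names and the implied hourly node price are derived here and are not stated by the author.

```python
# Figures quoted in the post.
tpu_chips = 32
hours_2019 = 168
usd_per_chip_hour = 8
cost_2019 = tpu_chips * hours_2019 * usd_per_chip_hour  # 43,008 USD

cost_now = 73          # approximate USD for the 3.04-hour run
hours_now = 3.04
years = 7

reduction = cost_2019 / cost_now
per_year = reduction ** (1 / years)   # constant yearly factor over 7 years

# Implied hourly price of the 8XH100 node (derived, not from the post).
node_usd_per_hour = cost_now / hours_now

print(f"2019 cost: ${cost_2019:,}")
print(f"reduction: {reduction:.0f}x, per year: {per_year:.2f}x")
print(f"implied node price: ${node_usd_per_hour:.0f}/hour")
```

The result lands close to the post's rounded claims of about $43K, roughly 600X overall, and roughly 2.5X per year.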
Vinay Babu retweeted
Dan Kornas @DanKornas
Hands down, this is the clearest and most effective explanation of Transformers on YouTube. The lecture was delivered by Professor Bryce as part of Davidson’s CSC 381: Deep Learning (Fall 2022). If your goal is to truly understand Transformers and self-attention, this is the only video you need and there’s no reason to watch anything else.
39
482
4.5K
266.2K
Vinay Babu retweeted
Simons @Simon_Ingari
I asked my father what kept him motivated to serve in the corporate sector for 35 years. He replied: Keep your vacations planned. Twice or thrice a year, short or long trips. You need them because 9-5 is exhausting and drains the life out of you. At least there should be something to look forward to. Me: Hey Papa, where do I get the leave? Dad: I have served as the head of the HR department for so many years, and the only tip I can share with you regarding your leave is that vacations aren't requested; they're informed. You need to tell them that you won't be available from this day to this day. No manager will happily approve your leave if you ask them. He is the manager for a reason. Damn! Okay.
35
842
6.3K
363.1K
Vinay Babu @min2bro
@AirIndiaX Is a mini computer like the Apple Mac mini allowed in hand baggage?
1
0
0
33
Vinay Babu retweeted
Andrew Ng @AndrewYNg
Is there an AI bubble? With the massive number of dollars going into AI infrastructure such as OpenAI’s $1.4 trillion plan and Nvidia briefly reaching a $5 trillion market cap, many have asked if speculation and hype have driven the values of AI investments above sustainable values. However, AI isn’t monolithic, and different areas look bubbly to different degrees.
- AI application layer: There is underinvestment. The potential is still much greater than most realize.
- AI infrastructure for inference: This still needs significant investment.
- AI infrastructure for model training: I’m still cautiously optimistic about this sector, but there could also be a bubble.
Caveat: I am absolutely not giving investment advice!
AI application layer. There are many applications yet to be built over the coming decade using new AI technology. Almost by definition, applications that are built on top of AI infrastructure/technology (such as LLM APIs) have to be more valuable than the infrastructure, since we need them to be able to pay the infrastructure and technology providers. I am seeing many green shoots across many businesses that are applying agentic workflows, and am confident this will grow. I have also spoken with many Venture Capital investors who hesitate to invest in AI applications because they feel they don’t know how to pick winners, whereas the recipe for deploying $1B to build AI infrastructure is better understood. Some have also bought into the hype that almost all AI applications will be wiped out merely by frontier LLM companies improving their foundation models. Overall, I believe there is significant underinvestment in AI applications. This area remains a huge focus for my venture studio, AI Fund.
AI infrastructure for inference. Despite AI’s low penetration today, infrastructure providers are already struggling to fulfill demand for processing power to generate tokens.
Several of my teams are worried about whether we can get enough inference capacity, and both cost and inference throughput are limiting our ability to use even more. It is a good problem to have that businesses are supply-constrained rather than demand-constrained. The latter is a much more common problem, when not enough people want your product. But insufficient supply is nonetheless a problem, which is why I am glad our industry is investing significantly in scaling up inference capacity. As one concrete example of high demand for token generation, highly agentic coders are progressing rapidly. I’ve long been a fan of Claude Code; OpenAI Codex also improved dramatically with the release of GPT-5; and Gemini 3 has made Google CLI very competitive. As these tools improve, their adoption will grow. At the same time, overall market penetration is still low, and many developers are still using older generations of coding tools (and some aren’t even using any agentic coding tools). As market penetration grows — I’m confident it will, given how useful these tools are — aggregate demand for token generation will grow. I predicted early last year that we’d need more inference capacity, partly because of agentic workflows. Since then, the need has become more acute. As a society, we need more capacity for AI inference. Having said that, I’m not saying it’s impossible to lose money investing in this sector. If we end up overbuilding — and I don’t currently know if we will — then providers may end up having to sell capacity at a loss or at low returns. I hope investors in this space do well financially. The good news, however, is that even if we overbuild, this capacity will get used, and it will be good for application builders! AI infrastructure for model training. I am happy to see the investments going into training bigger models. But, of the three buckets of investments, this seems the riskiest. 
If open-source/open-weight models continue to grow in market share, then some companies that are pouring billions into training models might not see an attractive financial return on their investment. Additionally, algorithmic and hardware improvements are making it cheaper each year to train models of a given level of capability, so the “technology moat” for training frontier models is weak. (That said, ChatGPT has become a strong consumer brand, and so it enjoys a strong brand moat, while Gemini, assisted by Google's massive distribution advantage, is also making a strong showing.) I remain bullish about AI investments broadly. But what is the downside scenario — that is, is there a bubble that will pop? One scenario that worries me: If part of the AI stack (perhaps in training infra) suffers from overinvestment and collapses, it could lead to negative market sentiment around AI more broadly and an irrational outflow of interest away from investing in AI, despite the field overall having strong fundamentals. I don’t think this will happen, but if it does, it would be unfortunate since there’s still a lot of work in AI that I consider highly deserving of much more investment. Warren Buffett popularized Benjamin Graham’s quote, “In the short run, the market is a voting machine, but in the long run, it is a weighing machine.” He meant that in the short term, stock prices are driven by investor sentiment and speculation; but in the long term, they are driven by fundamental, intrinsic value. I find it hard to forecast sentiment and speculation, but am very confident about the long-term health of AI’s fundamentals. So my plan is just to keep building! [Original text: deeplearning.ai/the-batch/issu… ]
263
684
3.1K
404.5K
Vinay Babu retweeted
Gaurav Sen @gkcs_
AI Agents will fade away, like microservices did. Painful to scale, and difficult to deploy. Eventually, you will see them hidden behind a wall of well-engineered solutions. Hype doesn't survive complexity.
183
84
1.6K
81K
Vinay Babu retweeted
Philosophy Of Physics @PhilosophyOfPhy
"Mathematical Thinking: For People Who Hate Math" - ✍️ Albert Rutherford
21
341
2.5K
97.3K
Vinay Babu retweeted
Paras Chopra @paraschopra
What an absolute gem of a #book on the philosophy of mathematics! The book asks an innocuous question - what are we doing when we’re doing mathematics? Are we discovering entities, or creating new ones? What is a proof anyway? What happens when we find out that mathematics is incomplete and cannot be formalised? The view taken by the author (which I 100% agree with) is that math is a socio-cultural artifact. Doing math is like inventing rules of a game and figuring out its consequences. Of course, some math has practical utility but the truth of mathematical objects isn’t out there, but rather is in the minds of other mathematicians who give us necessary feedback or validation. I strongly recommend this book. It goes into my read-it-again list.
55
278
2.6K
126.6K
Vinay Babu retweeted
Sebastian Raschka @rasbt
Implemented Olmo 3 from scratch (in a standalone notebook) this weekend! If you are a coder, probably the best way to read the architecture details at a glance: github.com/rasbt/LLMs-fro…
Sebastian Raschka @rasbt
Olmo models are always a highlight due to them being fully transparent and their nice, detailed technical reports. I am sure I'll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks. In the meantime, here's the side-by-side architecture comparison with Qwen3.
1) As we can see, the Olmo 3 architecture is relatively similar to Qwen3. However, it's worth noting that this is essentially likely inspired by the Olmo 2 predecessor, not Qwen3.
2) Similar to Olmo 2, Olmo 3 still uses a post-norm flavor instead of pre-norm, as they found in the Olmo 2 paper that it stabilizes the training.
3) Interestingly, the 7B model still uses multi-head attention similar to Olmo 2. However, to make things more efficient and shrink the KV cache size, they now use sliding window attention (e.g., similar to Gemma 3.)
Next, let's look at the 32B model.
4) Overall, it's the same architecture but just scaled up. Also, the proportions (e.g., going from the input to the intermediate size in the feed forward layer, and so on) roughly match the ones in Qwen3.
5) My guess is the architecture was initially somewhat smaller than Qwen3 due to the smaller vocabulary, and they then scaled up the intermediate size expansion from 5x in Qwen 3 to 5.4 in Olmo 3 to have a 32B model for a direct comparison.
6) Also, note that the 32B model (finally!) uses grouped query attention.
17
287
2K
166.1K
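Point 3 in the quoted post above mentions sliding window attention as a way to shrink the KV cache. A minimal sketch of the masking rule follows. The function name and the toy sizes are assumptions for illustration and do not reflect Olmo 3's actual configuration or code.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within the last
    `window` positions (i - j < window)."""
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


# Toy sizes, chosen only so the pattern is easy to read.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# With a full causal mask a new token can attend to every earlier position,
# so all past keys/values must be cached. With a window, no query ever
# attends to more than `window` positions.
max_keys_per_query = max(sum(row) for row in mask)
```

Because each query looks at no more than `window` keys, older entries can be dropped from the cache in layers that use this pattern, which is where the memory saving comes from.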
Vinay Babu @min2bro
@svpino What about the cost of training an LLM from scratch? Could everyone afford it?
0
0
0
14
Santiago @svpino
The best way to learn how to work with large language models is to build one yourself.
38
14
289
23.8K
Vinay Babu retweeted
Swapna Kumar Panda @swapnakpanda
Stanford's Courses on AI & ML (FREE):
❯ CS221 - AI
❯ CS229 - ML
❯ CS229M - ML Theory
❯ CS230 - DL
❯ CS234 - RL
❯ CS236 - Deep Generative Models
❯ CS336 - LLM from Scratch
❯ CS224N - NLP with DL
Course links inside:
21
244
1.8K
97.7K
Vinay Babu retweeted
Madhura @madhurahoval
From 0 → $1M: The Reddit growth strategy that worked ↓
29
68
1.1K
81.5K
Vinay Babu retweeted
Santiago @svpino
Today, you can make $300,000/year by working a few days every week, advising companies on how to use AI for coding. People outside this bubble have no idea of what's going on. I'm telling you: go and talk to folks working at companies out there. They are dinosaurs! Most people aren't using AI for coding at all. And it's not even about writing code. It's about automation in general. There are so many things you can automate using AI today and free up your team's time to do better work! I've spoken with several companies willing to pay $300+/hr consulting fees for anyone who can help their team become more efficient. $300/hr x 20 hours per week x 50 weeks = $300,000. Go and talk to people.
73
42
1K
146.8K