Himanshu Yadav
@himanshuy_

4.9K posts
https://t.co/rW1LgkhBIH
Boston, MA · Joined November 2007
494 Following · 259 Followers

Pinned Tweet
Himanshu Yadav @himanshuy_
Never give up on a dream just because of the time it will take to accomplish it. The time will pass anyway - Earl Nightingale

Celtics on CLNS @CelticsCLNS
QUESTION for #Celtics Fans: Would you trade Jaylen Brown for Giannis Antetokounmpo?🤔 Let us know! ⬇️

Himanshu Yadav retweeted
Kevin Carpenter @kejca
Warren Buffett: "You can make mistakes with people. I mean, look at the divorce rate. That's more important than whether you've got the right CEO or anything else." "Now, they spend five years [together dating] and they still make the same mistakes." 🤣

☘️ @SmokedBy0
This was deadass the most magical Celtics season I’ve felt in years 💔

Noa Dalzell 🏀 @NoaDalzell
Joe Mazzulla is doing a virtual media availability at 12:30pm today. What do you want to know?

Himanshu Yadav @himanshuy_
@CapitalOne what the heck did you do with your travel support? They can't answer simple questions.

Himanshu Yadav @himanshuy_
@BillSimmons Agreed. Not sure why Baylor, Garza, Hugo and Harper Jr aren't getting enough time.

Bill Simmons @BillSimmons
The Jays came through when it mattered - but 42 mins for JT and 40 for JB, not really sustainable. They gotta get Queta and the bench swings going.

Himanshu Yadav retweeted
Bill Simmons @BillSimmons
I’m sure there’s a good reason for Vucevic playing crunch time in a 2026 playoff game but - my bad, I can’t come up with it.

Marc D'Amico @Marc_DAmico
This broadcast has been roughhhhh

Celtics on CLNS @CelticsCLNS
Payton Pritchard on Joel Embiid being upgraded to doubtful: "Nice." "If he plays, he plays. We'll figure that out. It's not like we're sitting here worried if he's playing or not ... we haven't even game-planned for him yet." @CLNSMedia | Q: @RealBobManning

Himanshu Yadav retweeted
NBA Communications @NBAPR
Boston Celtics guard Derrick White has been named the 2025-26 NBA Sportsmanship Award winner, earning the Joe Dumars Trophy. Presented annually since the 1995-96 season, the NBA Sportsmanship Award honors a player who best represents the ideals of sportsmanship on the court.

Himanshu Yadav retweeted
SportsCenter @SportsCenter
The Celtics are heading to the playoffs ☘️ That's 12 straight seasons that Boston has made the playoffs, the longest active streak in the NBA.

Himanshu Yadav retweeted
Jaylen Brown @FCHWPO
50 wins in a gap year ☘️

Himanshu Yadav retweeted
Isaiah Thomas @isaiahthomas
Jaylen Brown needs more love for MVP!!!!

NBA @NBA
The @Kia Defensive Players of the Month for February! West: Victor Wembanyama (@spurs) East: Derrick White (@celtics)

Himanshu Yadav retweeted
Keith Smith @KeithSmithNBA
There are a lot of good candidates, but Joe Mazzulla has to be Coach of the Year. What he's doing night to night with this roster is incredible. He's pulling the right levers all the time. He made like five huge, important decisions tonight just in the overtime periods.

Himanshu Yadav @himanshuy_
@bcherny That is such a useful feature. After creating the plan, I ask Claude to save it to an md file, review it, and then start a new context window to implement it.

Boris Cherny @bcherny
Now in Claude Code: when you accept a plan, Claude automatically clears your context, so your plan gets a fresh context window. We found this helps keep Claude on track longer, and significantly improves plan adherence. If you prefer not to clear your context when accepting a plan, that option is still available too.
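
The manual loop @himanshuy_ describes above (plan, save to markdown, review, implement in a fresh context) can also be scripted around Claude Code's headless print mode. A minimal sketch, assuming the `claude -p` CLI flag; the prompts, TODO.md, and the plan.md filename are illustrative, not from either tweet:

```python
import subprocess
from pathlib import Path

PLAN_FILE = Path("plan.md")  # illustrative filename, not from the tweets

def run_claude(prompt: str) -> str:
    """Run one headless Claude Code invocation (`claude -p` print mode).
    Each invocation starts from a fresh context window."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# 1. Plan in one context, and persist the plan to disk.
plan = run_claude(
    "Draft a step-by-step implementation plan for the feature "
    "described in TODO.md. Output only the plan, no code."
)
PLAN_FILE.write_text(plan)

# 2. Review and edit plan.md by hand, then ...

# 3. ... implement in a fresh context that sees only the saved plan.
print(run_claude(f"Implement the plan in {PLAN_FILE}, step by step."))
```

The in-product behavior Cherny describes automates exactly the step the sketch makes explicit: the implementation call shares no context with the planning call except the saved plan file.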

Himanshu Yadav retweeted
Andrej Karpathy @karpathy
New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws, and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent.

For the first public release of nanochat my focus was on an end-to-end pipeline that runs all of the LLM stages. Now, after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models fixing the FLOPs budget. (For every FLOPs target you can train a small model for a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots (a baby version of the corresponding plot from Chinchilla). Very importantly and encouragingly, the exponent on N (parameters) and D (tokens) is equal at ~0.5, so just like Chinchilla we get a single (compute-independent) constant that relates model size to the token training horizon. In Chinchilla this was measured to be 20; in nanochat it seems to be 8!

Once we can train compute-optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M batch sizes on an 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size.

Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so instead I use the CORE score (from the DCLM paper). I calculated it for GPT-2 and estimated it for GPT-3, which allows us to finally put nanochat nicely on the same scale. The total cost of this miniseries is only ~$100 (~4 hours on 8XH100). These experiments give us confidence that everything is working fairly nicely and that if we pay more (turn the dial), we get increasingly better models.

TLDR: we can train compute-optimal miniseries and relate them to GPT-2/3 via objective CORE scores, but further improvements are desirable and needed. E.g., matching GPT-2 currently needs ~$500, but imo should be possible to do <$100 with more work.

Full post with a lot more detail is here: github.com/karpathy/nanoc…

All of the tuning and code is pushed to master, and people can reproduce these with the scaling_laws.sh and miniseries.sh bash scripts.
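
The sizing logic in the post reduces to two relations: the standard training-compute approximation C ≈ 6·N·D, and the compute-independent ratio D = k·N implied by the equal ~0.5 exponents (k ≈ 20 for Chinchilla, k ≈ 8 measured for nanochat). A minimal sketch of what that implies for picking N and D; the function, the 6·N·D approximation, and the FLOPs budget below are illustrative, not from the nanochat repo:

```python
import math

def compute_optimal(flops_budget: float, tokens_per_param: float = 8.0):
    """Compute-optimal model size N (params) and token horizon D for a
    fixed FLOPs budget C, under the standard approximation C ~= 6*N*D
    and a compute-independent ratio D = k*N (Chinchilla: k ~= 20;
    nanochat, per the post: k ~= 8). Solving 6*k*N^2 = C gives N."""
    n = math.sqrt(flops_budget / (6.0 * tokens_per_param))  # parameters
    d = tokens_per_param * n                                # training tokens
    return n, d

# Hypothetical budget: 1e19 FLOPs with the nanochat ratio k = 8.
n, d = compute_optimal(1e19, tokens_per_param=8.0)
print(f"N ~= {n:.3g} params, D ~= {d:.3g} tokens")  # ~4.6e8 params, ~3.7e9 tokens
```

With the Chinchilla ratio (k = 20) the same budget buys a smaller model trained on more tokens; k is the single compute-independent constant the post is measuring.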

Himanshu Yadav retweeted
bab @1xeraz
Jaylen Brown tells DDG that being BORING is GOOD… It keeps you out of TROUBLE. “You got all the MONEY in the WORLD you need BORING”