Nihal Kurth

1.9K posts

@nihalkurth

Where does reasoning live now, I wonder. → https://t.co/GfT3zVqkEJ

Joined July 2010
644 Following, 229 Followers
Nihal Kurth retweeted
@jason
@jason@Jason·
Starlink will be the largest subscription product ever created. Bigger than Windows (400m), Netflix (300m) and Spotify (260m), combined. @grok remind me of this in ten years. @grok build a model that shows the path of a 1B subscriber product globally
English
192
121
2.5K
276.7K
Nihal Kurth
Nihal Kurth@nihalkurth·
@BrandonSeverin Hard to pick a favourite but 4:29 is very memorable for me. Three of you walking together, where is it by the way? And you were probably telling sth like "imagine me telling sth important here..." :)
English
0
0
1
9
Nihal Kurth retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
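To make the "unit tests passed yes or no" point concrete, here is a minimal sketch of a verifiable reward in Python. The function name `unit_test_reward`, the test cases, and the two candidate functions are all hypothetical illustrations, not anything from the tweet or from a real training pipeline.

```python
def unit_test_reward(candidate, test_cases):
    """Return 1.0 if candidate passes every (args, expected) case, else 0.0.

    Hypothetical sketch: a binary reward that a program can check
    automatically, with no human judgment involved.
    """
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test.
            return 0.0
    return 1.0


# Illustrative test cases for an "add two numbers" task (assumed).
cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

reward_good = unit_test_reward(lambda a, b: a + b, cases)
reward_bad = unit_test_reward(lambda a, b: a * b, cases)
print(reward_good, reward_bad)
```

A reward like this is cheap to compute and unambiguous, which is the contrast the tweet draws with writing quality, where no such pass/fail check exists.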
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

English
1K
2.4K
19.8K
4M
Nihal Kurth retweeted
Jeremy
Jeremy@ManaByte·
This is one of the greatest photos ever taken by a human…so far.
Jeremy tweet media
English
1.3K
9.4K
107.8K
1.7M
Nihal Kurth retweeted
NASA
NASA@NASA·
Hello, Moon. It’s great to be back. Here’s a taste of what the Artemis II astronauts photographed during their flight around the Moon. Check out more photos from the mission: nasa.gov/artemis-ii-mul…
NASA tweet media (4 images)
English
10K
174.1K
810.4K
28.5M
MrComputerScience
MrComputerScience@MrComputerSci·
Check this out. My Substack FINALLY got a subscriber from the SEO articles I've been writing. Ironically, GPT sent it & not Google. I'm so happy I could cry because I suck at getting subscribers. This proves that SOMETHING I did worked. (Thank you to whoever just signed up.)
MrComputerScience tweet media
English
5
1
13
342
Nihal Kurth
Nihal Kurth@nihalkurth·
All we want is the fastest take off possible.
Dwarkesh Patel@dwarkesh_sp

You'd think the race to AGI would mean training the biggest possible model. But parameter scaling had stalled for a long time after GPT-4's trillion+ parameters, and only now are models getting bigger again. What gives? Partially it’s RL scaling, as @dylan522p explains. A 5T parameter model takes 5x longer to generate RL rollouts than a 1T model. Even if the bigger model is 2x more sample-efficient, the smaller model finishes RL faster, gets deployed to research sooner, and starts helping build the next model before the big one is even done training.
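The arithmetic behind that claim can be sketched as follows, using only the tweet's ratios (5x slower rollouts, 2x better sample efficiency). The function name, the baseline of 1,000 samples, and the assumption that rollout time scales linearly with parameter count are illustrative choices, not figures from the source.

```python
def rl_rollout_time(params_trillions, samples_needed, time_per_rollout_per_trillion=1.0):
    """Total rollout time = samples needed x time per rollout.

    Assumes (per the tweet's ratio) that time per rollout grows
    linearly with parameter count. Units are arbitrary.
    """
    return samples_needed * params_trillions * time_per_rollout_per_trillion


# Hypothetical baseline: the 1T model needs 1,000 samples.
small = rl_rollout_time(params_trillions=1, samples_needed=1000)

# The 5T model is 2x more sample-efficient, so it needs half as many samples,
# but each rollout takes 5x longer.
big = rl_rollout_time(params_trillions=5, samples_needed=500)

ratio = big / small
print(ratio)  # 2.5
```

Under these assumptions the bigger model still spends 2.5x as long on RL rollouts, which is why the smaller one finishes first.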

English
0
0
2
42
Nihal Kurth retweeted
Paul Graham
Paul Graham@paulg·
I'm glad she chose this excerpt about how to make a convincing Demo Day presentation. Founders would be so much more effective at fundraising if they gave their pitches YC-style "vertebrae".
Jessica Livingston@jesslivingston

Paul Graham is back in the latest Social Radars, talking about what went on behind the scenes in the early days of YC. If you like the fly-on-the-wallness of Social Radars interviews, this is the most fly-on-the-wall of all. pod.link/1677066062/epi…

English
32
10
318
61.3K
Nihal Kurth retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
English
1.7K
2.4K
31.2K
3.4M
Nihal Kurth retweeted
Vivid Void
Vivid Void@vividvoid·
Vivid Void tweet media
ZXX
68
3.3K
22.7K
409.3K
Nihal Kurth retweeted
Marc Andreessen 🇺🇸
OpenClaw and Pi together are in the top 10 of all time software breakthroughs.
Chrys Bader@chrysb

folks who are calling @openclaw pure hype are telling on themselves
openclaw is like the early internet, it's raw, unrefined, and takes a little doing to get things to work, but when you figure it out, it's transformative.
here are some real use cases that are having material impact on our $2.5M ARR business:
1. ad creative pipeline. our head of growth @ArjunShukl95550 built an end-to-end creative pipeline to go from ideation to publishing ads to meta, greatly increasing our creative iteration speed. it's producing winning creatives. it lives in slack, and anyone on the team can share their ideas and have them enter the pipeline.
2. data analytics agent. another bot lives in our slack that connects to bigquery and lets our team ask any questions of the data, it produces charts and answers questions in real time. no one needs to write SQL anymore.
3. recruiting. i told my agent about a role we're hiring for, and it scoured linkedin and the web, found 30 candidates, portfolio, email addresses, and stack ranked them based on fit with our criteria
this is just in the past week. i have twenty more success stories for you i can share another time.
you have to understand, this is the shittiest it will ever be. everyone is going to have one or more personal self-improving agents that they use every day, and openclaw is what revealed this future to us. if you can't see this, i encourage you to look harder
there will be many competitors (and already are), and the large labs will start to converge on this (they already are) too. openclaw may not win, but it opened pandora's box and uncorked the agentic future.

English
196
177
2.8K
641.8K
Nihal Kurth
Nihal Kurth@nihalkurth·
@paulg The ending: "The shoemaker's children finally got some shoes." :)
English
0
0
1
328
Paul Graham
Paul Graham@paulg·
Some people criticize Garry, but this excerpt from the latest Social Radars episode is a good example of how insiders feel about how he's doing. The 3 of us in this conversation are 3/5 of YC's board.
Garry Tan@garrytan

I am so thankful for @paulg @jesslivingston @bchesky @cjoneslevy for believing me and selecting me to be the President & CEO of YC. To be able to lead this brilliant band of partners is beyond the best job I’ve ever had! If you love what you do, you never work a day in your life

English
45
12
471
140.7K