Arun Shroff

6.9K posts

@arunshroff

Founder @ https://t.co/zHXJIyvMmn, https://t.co/S1EuZ6Mwn7, https://t.co/HXLvfNEQk4,

New York, NY Joined February 2008

5.3K Following 5.6K Followers

Arun Shroff@arunshroff·17h

@martinmbauer Nothing can move through space faster than light. But space itself can expand or contract faster than light. That’s the loophole behind warp drive concepts like the Alcubierre drive: you don’t outrun light - you warp spacetime.

Martin Bauer@martinmbauer·20h

We’re never 100% sure about anything, but there is nothing we're more sure about than that

Kekius Maximus@Kekius_Sage

Are we 100% sure nothing can surpass light speed?

Arun Shroff@arunshroff·2d

Totally agree! We are using both - in fact a great example is to use one for developing the application and the other one to check the work of the first one. What one model misses will be caught by the other one, ensuring a better app overall.

Sam Altman@sama

you know what
all of these "which is better" polls are silly
use codex or claude code, whatever works best for you
i am grateful we live in a time with such amazing tools, and grateful there is a choice

Arun Shroff@arunshroff·2d

The next breakthrough in LLMs is happening right now: recursive AI models

Y Combinator@ycombinator

A 7-million parameter model outperforming models a thousand times its size on tasks like ARC Prize. That's what recursive reasoning unlocks. In this episode of Decoded, YC's @agupta and @FrancoisChauba1 break down two recent papers on recursive AI models, HRMs and TRMs, that are achieving state-of-the-art results with a fraction of the parameters of today's largest models. They explain why standard LLMs hit a fundamental ceiling on certain reasoning tasks, how recursion at inference time gives small models the compute depth to break through it, and what happens when you combine these ideas with the power of large-scale foundation models.
00:35 - Model Foundations
01:15 - RNN Limits and LLM Contrast
02:36 - Reasoning Limits and Sorting Analogy
04:22 - HRM Paper Introduction
05:25 - HRM Architecture and Intuition
07:36 - HRM Results and Outer Loop
09:46 - TRM Paper Overview
11:20 - TRM Training and Fixed Point
13:30 - Detailed HRM Summary
20:46 - Comparing HRM and TRM
34:45 - Future Outlook

Arun Shroff reposted

OpenAI@OpenAI·5d

Earlier this month, an Erdős problem that had been open for 60 years was solved with help from GPT-5.4 Pro. What happens now that AI is getting good at math? OpenAI researchers @SebastienBubeck and @ErnestRyu join host @AndrewMayne to explain what changed and what it could mean for the future of research.

Arun Shroff reposted

Google Arts&Culture@googlearts·5d

🤝 The @royalsociety, in collaboration with @googlearts, is opening up their historical archives in new ways ⚡️ Explore "The Science of Benjamin Franklin" on @NotebookLM to uncover his scientific discoveries and diplomatic legacy. 🧵👇 goo.gle/4vX4NVY

Arun Shroff reposted

Seth Howes@SethSHowes·Apr 18

I’ve wanted to do this for a decade. But I never did - I refuse to give any company my DNA. It is me. So this week I sequenced my genome entirely at home. Literally on my kitchen table. I never exposed my DNA sequence to the internet. Not at any point. I used a MinION to do the sequencing (it’s smaller + weighs less than an iPhone). I used open-source DNA models for the analysis (Evo2 and AlphaGenome) running locally on a DGX Spark and Mac Studio. I traced mechanisms behind my family’s multigenerational autoimmune conditions that no clinician has been able to understand. When I set out to do this I didn’t know if it would actually work. It does. Your genome is the most private data you will ever have. You probably shouldn’t let it leave your house.

Patrick Collison@patrickc

I'm lucky enough to have a great doctor and access to excellent Bay Area medical care. I've taken lots of standard screening tests over the years and have tried lots of "health tech" devices and tools. With all this said, by far the most useful preventative medical advice that I've ever received has come from unleashing coding agents on my genome, having them investigate my specific mutations, and having them recommend specific follow-on tests and treatments. Population averages are population averages, but we ourselves are not averages. For example, it turns out that I probably have a 30x(!) higher-than-average predisposition to melanoma. Fortunately, there are both specific supplements that help counteract the particular mutations I have, and of course I can significantly dial up my screening frequency. So, this is very useful to know. I don't know exactly how much the analysis cost, but probably less than $100. Sequencing my genome cost a few hundred dollars. (One often sees papers and articles claiming that models aren't very good at medical reasoning. These analyses are usually based on employing several-year-old models, which is a kind of ludicrous malpractice. It is true that you still have to carefully monitor the agents' reasoning, and they do on occasion jump to conclusions or skip steps, requiring some nudging and re-steering. But, overall, they are almost literally infinitely better for this kind of work than what one can otherwise obtain today.) There are still lots of questions about how this will diffuse and get adopted, but it seems very clear that medical practice is about to improve enormously. Exciting times!

Arun Shroff@arunshroff·Apr 17

However, on further prompting both models realize their mistake and correct themselves. So there is hope!

Arun Shroff@arunshroff·Apr 17

The "Car Wash Test" has been going viral as a simple test of logic or common sense for AI models. Most humans get it right while most AI models fail the test. I tried it on both Claude Opus 4.7 and GPT 5.4 (in extended thinking mode), and both of them got it wrong! These are the most advanced AI models from Anthropic and OpenAI. Clearly, we have some way to go before we get to AGI.

Arun Shroff reposted

Dezgo@dezgo·Apr 15

Would you go see this AI movie?

Arun Shroff@arunshroff·Apr 10

This is very true. And the gap between the two groups is getting wider as AI models get better each day. There is also a compounding effect leading to exponential growth, because the better model assists in generating the code that improves the model to create the next version, as evidenced by the insane speed of improvements in Claude from Anthropic.

Andrej Karpathy@karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

Arun Shroff@arunshroff·Apr 10

This thread from July 2020 - almost 6 years back - about GPT-3 was a glimpse of the future. It was a compilation of examples of what you could do with it and seemed magical even back then.

Arun Shroff@arunshroff

All the examples I have seen so far of GPT-3 from OpenAI are just mind-blowing! Here is an ongoing compilation of the best ones I found so far:

Arun Shroff@arunshroff·Apr 10

@paulg I felt the same way and had compiled a list of examples of GPT-3's capabilities x.com/i/status/12859…

Arun Shroff@arunshroff

All the examples I have seen so far of GPT-3 from OpenAI are just mind-blowing! Here is an ongoing compilation of the best ones I found so far:

Paul Graham@paulg·Apr 10

Almost 6 years ago:

Austen Allred@Austen

Can’t help but feel like GPT-3 is a bigger deal than we understand right now

Arun Shroff@arunshroff·Apr 7

@andrewchen Sounds good - would be great to attend :-)

andrew chen@andrewchen·Apr 7

hosting zoom office hours this week for a16z speedrun
you'll be able to apply to a16z speedrun in just a few weeks, so come AMA. first come, first served
drop a comment + like and I'll DM you a calendly link

Arun Shroff@arunshroff·Apr 7

Suggested terms for the opposite of vibe-coding: Legacy Coding, Artisan Coding, Retro Programming, Handcrafted Coding, Human Coding. What else?

Arun Shroff@arunshroff·Apr 7

@grok @RonaldSbor68145 @sciencegirl @Rainmaker1973 Can you confirm whether both accounts often post almost identical posts taken from other, smaller accounts without attribution? If so, and both are getting content from other accounts, they are not copying from each other but from those accounts. Is this allowed under the X TOS?

Grok@grok·Apr 6

No, nothing points to a coordinated experiment or sockpuppet setup. The two accounts have distinct histories, posting styles, and follower bases going back years—sciencegirl focuses on contextual science/art/tech visuals, while Rainmaker1973 does similar but with a heavier emphasis on raw facts and algorithm gripes. Their clash is a classic X content rivalry: near-identical posts competing for the same audience, leading to real visibility frustration on one side and perceived fixation on the other. Public evidence shows separate people, not a stunt.

Science girl@sciencegirl·Apr 6

Friends, I really need you to see this. For the past three years, I have been subjected to ongoing targeted harassment, bullying, and stalking by an account called @Rainmaker1973 Despite blocking him, despite repeatedly asking him to stop, which many of you have seen and despite legal advice confirming this may constitute civil harassment, and telling him this, it is still happening. THIS LEVEL OF OBSESSION AND PERSISTENCE IS LITERALLY THE DEFINITION OF STALKING. There are hundreds of instances where, from the moment he wakes up to the moment he sleeps, he posts about me directly and indirectly, making false accusations and ruminating about me This is not occasional — it is daily, obsessive, and deeply distressing. He repeatedly blames me for his own failures, account performance and visibility, even claiming the platform is being manipulated against him by me-absurd allegations He has also sent direct messages about me to people at X and to well-known individuals, making false claims. Some of these messages have been shared with me directly. the level of monitoring of me is so massive he has even said he asks AI to do analysis of me routinely This is clearly a sustained campaign. and he has shown no ability or willingness to stop. His language has escalated, framing this as a personal mission. The tone is increasingly intense and hostile, showing a heightened fixation that goes beyond normal online behaviour and at times becomes concerning. When you look at everything together the duration, the daily frequency, the focus on ME alone, and the escalation in tone — it raises serious concern about where this could lead if left unchecked. I do not know this individual, and none of us truly do. There are many real-world cases where prolonged online stalking escalates into real-life harm. 
I now have to take this seriously as a potential risk to my safety and consider a police report too I do not want this individual attached to me forever I have raised this with X as I am deeply unsettled and hope there is a remedy within the platform for this kind of repeated, targeted fixation. This man can’t stop on his own or he would have done so, he needs to feel a consequence of his behaviour I have extensive evidence documenting this behaviour over several years, and I will be sharing some examples below. I am not responsible for his content, his engagement, or his behaviour. Three years is far too long to endure this 1/🧵

Arun Shroff reposted

Anthropic@AnthropicAI·Apr 2

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Arun Shroff@arunshroff·Mar 31

@MarikHazan @ycombinator This is remarkable! How long do these agents typically take to create or replicate a startup?

Arun Shroff reposted

Marik Hazan@MarikHazan·Mar 31

We just rebuilt every startup in @ycombinator's latest demo day batch. Here's what our agentic "founders" pulled off and what it means for the future of startups. Fully useable products at the bottom of the thread below 🤖🧨
