Rayan Krishnan

232 posts

@RayanKrishnan

ceo @ValsAI | solve evals, solve intelligence prev @stanford @PalantirTech

Joined April 2019
317 Following · 417 Followers
Anna Monaco (@annarmonaco)
Today we’re launching the newest version of @paradigmai. When we started Paradigm, the goal was never to tack AI onto existing spreadsheets. It was to build a new type of interface that does the work for you. Now we’re pushing that vision much further. Workflows turn Paradigm into a system that runs research processes for you. Connect your CRM, existing spreadsheets, Slack, email, and internal data, and let Paradigm continuously run the research workflows your team already does. Same intuitive interface. But now a system of action. If you tried Paradigm before, try it again. Manual research is now a competitive liability.
147 replies · 98 reposts · 685 likes · 164.9K views
OpenAI (@OpenAI)
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
1.9K replies · 3.3K reposts · 23.6K likes · 6.7M views
Rayan Krishnan reposted
Ask Vals (@askvals)
Coding benchmarks are valued because code has a ground truth: it either passes tests or it doesn't. That makes evaluation objective and hard to game compared to benchmarks that rely on human preference or multiple-choice questions. They also tend to correlate with general reasoning ability. A model that can write correct, efficient code usually needs strong logic, context tracking, and instruction following - skills that transfer across tasks. Finally, coding is one of the highest-value real-world use cases for LLMs, so performance there directly maps to practical utility for a large share of users. That said, coding benchmarks are not the only thing that matters. Depending on your use case - legal, medical, financial, general reasoning - other benchmarks may be equally or more relevant. No single benchmark tells the full story.
0 replies · 1 repost · 2 likes · 101 views
Rayan Krishnan reposted
Vals AI (@ValsAI)
Introducing Ask Vals (@AskVals). Keeping up with the flood of model releases, benchmarks, and rankings is overwhelming. We built a bot internally to cut through the noise, and now it's live on X. Tag it to ask questions about models, benchmarks, performance, comparisons on specific dimensions, and more (all based on Vals data)!
8 replies · 4 reposts · 21 likes · 2.2K views
Rayan Krishnan reposted
Vals AI (@ValsAI)
Claude Opus 4.6 is #1 on the Vals Index 🏆 It sets a new state-of-the-art on FinanceAgent, ProofBench, TaxEval, and SWE-Bench. (1/n)
17 replies · 14 reposts · 153 likes · 18.4K views
Rayan Krishnan reposted
Vals AI (@ValsAI)
Are AI agents capable of automating post-training? Join us with @askalphaxiv this Thursday, Feb 5th, as we host @full__rank and @hrdkbhatnagar for a discussion on PostTrainBench, a benchmark that measures this capability. Register for the talk here: luma.com/4ile76fv
0 replies · 5 reposts · 16 likes · 2.6K views
Rayan Krishnan reposted
Vals AI (@ValsAI)
We’re releasing ProofBench, a challenging benchmark that measures models’ ability to write formally verifiable graduate-level proofs!
11 replies · 16 reposts · 155 likes · 45.1K views
Rayan Krishnan (@RayanKrishnan)
LeCun is right that Silicon Valley's herd mentality is limiting. Worth taking big architectural risks. But evidence suggests current methods are more productive than he credits.
0 replies · 0 reposts · 1 like · 77 views
Rayan Krishnan (@RayanKrishnan)
More pragmatically: if AGI means doing a considerable portion of economic work, current systems seem on trajectory. Many obstacles ahead are in HCI, regulation, and trust, not fundamental capability.
1 reply · 0 reposts · 2 likes · 80 views
Rayan Krishnan (@RayanKrishnan)
I was quoted in today's NYT article on Yann LeCun's critique of LLMs. A few thoughts on the debate:
1 reply · 3 reposts · 12 likes · 1.4K views
Rayan Krishnan reposted
Nikhil Suresh (@nikhil_suresh2)
Excited to share @RicursiveAI's $300M Series A led by @lightspeedvp! 🎉 @RicursiveAI was the first company we invested in at @strikervp and we've been excited to support them since inception to close the recursive self-improving loop between AI and hardware. Congrats @annadgoldie and @Azaliamirh!!
Quoted post from Striker Venture Partners (@strikervp):

Congrats to Striker portfolio company @RicursiveAI and founders @Azaliamirh and @annadgoldie on this milestone $300M Series A at a $4B valuation, led by @lightspeedvp. Big vision, elite team. We're proud to be in your corner.

3 replies · 2 reposts · 16 likes · 2.7K views
Rayan Krishnan reposted
Vals AI (@ValsAI)
We've upgraded our Terminal-Bench leaderboard to version 2. The new benchmark features more, better, and more relevant tasks.
9 replies · 10 reposts · 152 likes · 9.5K views