Pinned Tweet
Rayan Krishnan
232 posts

@RayanKrishnan
ceo @ValsAI | solve evals, solve intelligence | prev @stanford @PalantirTech
Joined April 2019
317 Following · 417 Followers

Today we’re launching the newest version of @paradigmai
When we started Paradigm, the goal was never to tack AI onto existing spreadsheets. It was to build a new type of interface that does the work for you.
Now we’re pushing that vision much further.
Workflows turn Paradigm into a system that runs research processes for you.
Connect your CRM, existing spreadsheets, Slack, email, and internal data, and let Paradigm continuously run the research workflows your team already does.
Same intuitive interface. But now a system of action.
If you tried Paradigm before, try it again.
Manual research is now a competitive liability.
Rayan Krishnan retweeted

Coding benchmarks are valued because code has a ground truth: it either passes tests or it doesn't. That makes evaluation objective and hard to game compared to benchmarks that rely on human preference or multiple-choice questions.
They also tend to correlate with general reasoning ability. A model that can write correct, efficient code usually needs strong logic, context tracking, and instruction following: skills that transfer across tasks.
Finally, coding is one of the highest-value real-world use cases for LLMs, so performance there directly maps to practical utility for a large share of users.
That said, coding benchmarks are not the only thing that matters. Depending on your use case (legal, medical, financial, general reasoning), other benchmarks may be equally or more relevant. No single benchmark tells the full story.
Rayan Krishnan retweeted

Introducing Ask Vals — @AskVals
Keeping up with the flood of model releases, benchmarks, and rankings is overwhelming. We built a bot internally to cut through the noise, and now it's live on X.
Tag it to ask questions about models, benchmarks, performance, comparisons on specific dimensions, and more (all based on Vals data)!

The state of the art on the Finance Agent Benchmark this time last year was 20.76%. Today, Opus 4.6 is the first model to break 60%.
Vals AI @ValsAI
On Finance Agent, it scored 60% (up 5% from the previous SOTA). Previous models have plateaued around 55%; it breaks this pattern. It also performs well on coding benchmarks: #2 on VibeCodeBench (+14% from Opus 4.5), #1 on SWE-Bench, tied for #1 on Terminal Bench 2, and #9 on LiveCodeBench.
Rayan Krishnan retweeted

Check out the first episode of The Token Economy podcast with @RayanKrishnan from @ValsAI
Token Economy @tokenecopodcast
Rayan Krishnan | Unlocking AI's Potential Through Benchmarking youtu.be/s_hJxfwgTgM?si…
Rayan Krishnan retweeted

Are AI agents capable of automating post-training? Join us with @askalphaxiv this Thursday, Feb 5th, as we host @full__rank and @hrdkbhatnagar for a discussion on PostTrainBench, a benchmark that measures this capability.
Register for the talk here: luma.com/4ile76fv

Rayan Krishnan retweeted

@nikhil_suresh2 @RicursiveAI @lightspeedvp @strikervp bad day to be non recursive intelligence. wait a minute...
Rayan Krishnan retweeted

Excited to share @RicursiveAI's $300M Series A led by @lightspeedvp! 🎉
@RicursiveAI was the first company we invested in at @strikervp and we've been excited to support them since inception to close the recursive self-improving loop between AI and hardware. Congrats @annadgoldie and @Azaliamirh!!
Striker Venture Partners @strikervp
Congrats to Striker portfolio company @RicursiveAI and founders @Azaliamirh and @annadgoldie on this milestone $300m Series A at $4b valuation, led by @lightspeedvp. Big vision, elite team. We're proud to be in your corner.
Rayan Krishnan retweeted





