Sriraam

2.3K posts

@27upon2

post-training research @chakra_ai prev harvard

NYC · Joined July 2016
3.7K Following · 1.9K Followers
Pinned Tweet
Sriraam (@27upon2)
Introducing Gemini Cursor ✨ – a second multimodal AI cursor for your desktop that's open-source and free! Link below 👇 This experiment 🧪 reimagines how we interact with our computers because visual cues 👀 help us make sense of what we see on a screen. In this demo, I had my friend test it out by trying to add a payment method 💳 to Amazon. The cursor walks through the entire process 💬 while talking and pointing 🖱️ to the right parts of the website. Powered by Gemini 2.0 Flash (Experimental)⚡ from @Google and their live multimodal API. Shoutout to @alexanderchen for sharing the starter code that powers most of this app 🙌🔥
Sriraam (@27upon2)

🔥 @Google Gemini 2.0 Flash is crazy good at pointing. I was over engineering before but now I'm just gonna bet on model capabilities. This is a demo of an AI cursor explaining a diagram on @tldraw with just a prompt and an image. Streaming is also simple with @vercel AI SDK.

31 replies · 109 retweets · 1K likes · 171.1K views
Sriraam retweeted
Florian Brand (@xeophon)
i'll be talking about llm benchmarks, the infra behind it, the challenges and learnings later today at @tngtech :) will be live streamed and recorded, link in replies :)
10 replies · 26 retweets · 320 likes · 50.7K views
Sriraam retweeted
Lintang Sutawika (@lintangsutawika)
Working with EleutherAI has been such a transformational experience for me; it has opened doors that would otherwise remain shut for a person such as myself. Consider applying if you are interested in AI research but come from a rather unusual background!
EleutherAI (@AiEleuther)

The Summer of AI Research 2026 is now accepting applications! Work on an open science AI research project between July 13 and August 16. In this fully online event we invite people with little research experience to contribute to open source under the mentorship of experienced researchers.

3 replies · 8 retweets · 69 likes · 10.8K views
Sriraam (@27upon2)
@xeophon I went from almost 0 to 45% by making some params in 2 tools required instead of allowing empty args with a model
0 replies · 0 retweets · 1 like · 143 views
Florian Brand (@xeophon)
got a 10% (relative) increase in eval scores by simply changing the sampling args to the recommended ones
what are we doing man
9 replies · 2 retweets · 105 likes · 7K views
Sriraam retweeted
Prime Intellect (@PrimeIntellect)
Reward hacking is the hardest problem in RL. We design settings where hacking is predictable, and find patterns between task difficulty and hack frequency. These runs are highly efficient, using <$1 in compute. We’re launching Sprints to allow everyone to join this effort.
12 replies · 55 retweets · 547 likes · 127.8K views
Vipul Gupta (@vipul_1011)
It's getting more fun, we are co-hosting a distilled AI meetup with @trymirage during a16z @Techweek_ in NYC. Date: June 3, Wednesday. Researchers, engineers, founders - all deep in AI, all in one room. As usual - curated guest list, high signal, less noise. Join us and meet others building cool things in AI in NYC. Link below.
4 replies · 2 retweets · 27 likes · 3.9K views
Sriraam (@27upon2)
I don’t like fixing constrained decoding bugs
0 replies · 0 retweets · 6 likes · 1.9K views
Sriraam (@27upon2)
Moved to NY to work on RL. Would like to meet ppl. I take bad pics and like good food
35 replies · 0 retweets · 195 likes · 11K views
Sriraam retweeted
Harbor Framework (@harborframework)
We built Harbor to evaluate agents. But why limit ourselves to just agents? Today we're adding first-class support for evaluating skills, MCPs, prompts, and services. Ablate your agents.
0 replies · 2 retweets · 42 likes · 5.5K views
Sriraam (@27upon2)
@michellechen My puny attention span thought Charles was a new RLaaS platform. I should sleep soon
1 reply · 0 retweets · 2 likes · 317 views
michelle (@michellechen)
is house of prime rib better than 4 charles?? should i go
5 replies · 0 retweets · 9 likes · 2.7K views
Lawrence Park (@parkinfocus)
I understand the williamsburg hype now
1 reply · 0 retweets · 4 likes · 179 views
Sriraam retweeted
Harshita Chopra (@chopra_harshita)
Are you training or evaluating agents with LLM-based user simulators? Most simulators inherit the behavior of their underlying models: cooperative, clear, and homogeneous - which is unrealistic! 🤖 Humans are messy: they falter, forget, push back, and behave in ways that are difficult to define. Manually writing personas becomes brittle and hard to scale. 😣 We introduce 𝗣𝗲𝗿𝘀𝗼𝗻𝗮 𝗣𝗼𝗹𝗶𝗰𝗶𝗲𝘀 (𝗣𝗣𝗼𝗹): an evolutionary framework that automatically discovers behaviors and instructions to generate diverse human-like user personas for any given task – ✨grounded in real dialogue traces✨
3 replies · 13 retweets · 128 likes · 40.7K views
Sriraam retweeted