Sriraam

2.3K posts

Sriraam

@27upon2

post-training research @chakra_ai prev harvard

NYC Joined July 2016

3.7K Following 1.9K Followers

Pinned Tweet

Sriraam@27upon2·11 Feb

Introducing Gemini Cursor ✨ – a second multimodal AI cursor for your desktop that's open-source and free! Link below 👇 This experiment 🧪 reimagines how we interact with our computers because visual cues 👀 help us make sense of what we see on a screen. In this demo, I had my friend test it out by trying to add a payment method 💳 to Amazon. The cursor walks through the entire process 💬 while talking and pointing 🖱️ to the right parts of the website. Powered by Gemini 2.0 Flash (Experimental)⚡ from @Google and their live multimodal API. Shoutout to @alexanderchen for sharing the starter code that powers most of this app 🙌🔥

Sriraam@27upon2

🔥 @Google Gemini 2.0 Flash is crazy good at pointing. I was over engineering before but now I'm just gonna bet on model capabilities. This is a demo of an AI cursor explaining a diagram on @tldraw with just a prompt and an image. Streaming is also simple with @vercel AI SDK.

109

171.1K

Sriraam retweeted

Nirmal Krishnan@0xnirmal·3d

he's crashing out again

Nirmal Krishnan@0xnirmal

this guy just complained about how every eval is broken because of harness failures or misspecifications never been more excited about the work

858

Sriraam retweeted

Florian Brand@xeophon·4d

i'll be talking about llm benchmarks, the infra behind it, the challenges and learnings later today at @tngtech :) will be live streamed and recorded, link in replies :)

320

50.7K

Sriraam retweeted

Lintang Sutawika@lintangsutawika·4d

Working with EleutherAI has been such a transformational experience for me; it has opened doors that would otherwise remain shut for a person such as myself. Consider applying if you are interested in AI research but come from a rather unusual background!

EleutherAI@AiEleuther

The Summer of AI Research 2026 is now accepting applications! Work on an open science AI research project between July 13 and August 16. In this fully online event we invite people with little research experience to contribute to open source under the mentorship of experienced researchers.

10.8K

Sriraam@27upon2·4d

@xeophon I went from almost 0 to 45% by making some params in 2 tools required instead of allowing empty args with a model

143

Florian Brand@xeophon·4d

got a 10% (relative) increase in eval scores by simply changing the sampling args to the recommended ones what are we doing man

105

Sriraam retweeted

Prime Intellect@PrimeIntellect·5d

Reward hacking is the hardest problem in RL. We design settings where hacking is predictable, and find patterns between task difficulty and hack frequency. These runs are highly efficient, using <$1 in compute. We’re launching Sprints to allow everyone to join this effort.

547

127.8K

Jess Li@jessicafeiyali·5d

I wrote something on reward hacking 🐵 and we're also doing free compute 👀

Prime Intellect@PrimeIntellect

19.1K

Sriraam@27upon2·5d

@jessicafeiyali @willccbb 🐐

Sriraam@27upon2·5d

@vipul_1011 @trymirage @Techweek_ Applied 🔥🙌 thank u for putting this together!

132

Vipul Gupta@vipul_1011·5d

It's getting more fun, we are co-hosting a distilled AI meetup with @trymirage during a16z @Techweek_ in NYC. Date: June 3, Wednesday. Researchers, engineers, founders - all deep in AI, all in one room. As usual - curated guest list, high signal, less noise. Join us and meet others building cool things in AI in NYC. Link below.

3.9K

Sriraam@27upon2·19 May

I don’t like fixing constrained decoding bugs

1.9K

Sriraam@27upon2·19 May

@feulf Awesome thanks!

Federico Ulfo@feulf·18 May

@27upon2 here is luma.com/ai-dinner-may-…

Sriraam@27upon2·17 May

Moved to NY to work on RL. Would like to meet ppl. I take bad pics and like good food

195

11K

Sriraam@27upon2·19 May

@0xnirmal @feulf 🫡

Nirmal Krishnan@0xnirmal·18 May

@feulf @27upon2 would recommend, fed is the goat

Sriraam@27upon2·19 May

@feulf @0xnirmal Oo will check it out thank you!

Federico Ulfo@feulf·18 May

@27upon2 join aisocratic.org ask @0xnirmal

Sriraam retweeted

Harbor Framework@harborframework·19 May

We built Harbor to evaluate agents. But why limit ourselves to just agents? Today we're adding first-class support for evaluating skills, MCPs, prompts, and services. Ablate your agents.

5.5K

Sriraam@27upon2·18 May

@michellechen but srsly house of prime sounds cool @willccbb

Sriraam@27upon2·18 May

@michellechen My puny attention span thought Charles was a new RLaaS platform. I should sleep soon

317

michelle@michellechen·18 May

is house of prime rib better than 4 charles?? should i go

2.7K

Sriraam@27upon2·18 May

@parkinfocus fr

Lawrence Park@parkinfocus·17 May

I understand the williamsburg hype now

179

Sriraam retweeted

Harshita Chopra@chopra_harshita·16 May

Are you training or evaluating agents with LLM-based user simulators? Most simulators inherit the behavior of their underlying models: cooperative, clear, and homogeneous - which is unrealistic! 🤖 Humans are messy: they falter, forget, push back, and behave in ways that are difficult to define. Manually writing personas becomes brittle and hard to scale. 😣 We introduce 𝗣𝗲𝗿𝘀𝗼𝗻𝗮 𝗣𝗼𝗹𝗶𝗰𝗶𝗲𝘀 (𝗣𝗣𝗼𝗹): an evolutionary framework that automatically discovers behaviors and instructions to generate diverse human-like user personas for any given task – ✨grounded in real dialogue traces✨

128

40.7K

Sriraam retweeted

Jonas@jonaasw1·18 May

introducing "Cafe Roulette" make coffee chats fun again. discover cafes + invite someone to come. here's how it works: - spin to discover a new cafe - send a cafe postcard to a friend - (optionally) send an invitation with date/time - (optionally) auto-add to both calendars give it a try! nyccafelist.com/roulette