james

30.2K posts

@Jamesthesnake

My tweets reflect the opinions and values of American Idol season 3 winner Fantasia Barrino. 29. https://t.co/juQ8fBFUi8

austin,tx · Joined September 2010

6.2K Following · 718 Followers

Pinned Tweet

james@Jamesthesnake·2 May

Supporting the writers strike because it was the studios that canceled HBO Ballers

5.6K

james retweeted

Ming Jin@MingJin_AI·3h

Starting in ~ 1 hour (9am PT/12pm ET)! ⏳ Grab your virtual seat and join the discussion: 📍 Join: virginiatech.zoom.us/j/87872134251 🔑 Passcode: 309194

Ming Jin@MingJin_AI

Vision-Language-Action (VLA) models are evolving fast. How do we move robots from following basic instructions to executing complex, multi-stage tasks with sophisticated test-time reasoning? 🤖🧠 We are incredibly honored to host Sergey Levine @svlevine for the next AI Agent Frontier Seminar to present: "Robotic Foundation Models." Sergey will discuss the leap from first-generation VLAs to models that handle diverse data modalities and advanced reasoning, outlining the true frontiers of the field. Date: This Friday 3/27 12pm ET/9am PT 🔗 agentic-ai-frontier-seminar.github.io 📍 Join: virginiatech.zoom.us/j/87872134251 🔑 Passcode: 309194 Organizers: @yalidux @ShangdingG95714 @MingJin_AIl #Robotics #AIAgents #VLA #FoundationModels

2.1K

james retweeted

Cody Blakeney@code_star·20h

This reminds me a lot of how RecSys models at Meta worked when I was there. They moved the time for updates down from 1 hour -> ~5 minutes at the time. This probably made them insane amounts more ad money. I've kind of been waiting for this level of live updates to become a thing at companies that have sufficient volume of daily user interactions, and it's great to see it happen. RL is sort of perfectly suited for this. I imagine this will be pretty standard practice across many companies in the next couple of years.

Cursor@cursor_ai

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.

6.5K

james retweeted

Grigory Sapunov@che_shr_cat·7h

1/ Most agent deployments fail because memory and weight updates are treated as mutually exclusive. This leads to stale rewards and catastrophic forgetting. A new framework fixes this non-stationarity natively. 🧵

654

james retweeted

Harrison Chase@hwchase17·1d

if you want to learn how we think about evaluating agents - real agents, not simple llm prompts - this blog is a goldmine

Viv@Vtrivedy10

x.com/i/article/2036…

279

66.8K

james retweeted

Siddharth@Pseudo_Sid26·1d

x.com/i/article/2037…

24.5K

james retweeted

Teknium (e/λ)@Teknium·1d

Hmm it’s like someone saw hermes agent and wrote a paper on our memory/skill system

Sumanth@Sumanth_077

Let Agents Design Agents

Memento-Skills is a self-evolving agent framework where agents learn from failures and rewrite their own skills.

Most agent frameworks treat skills as static. You write them once, load them into context, and hope they work. When they fail, you debug manually or try again with the same broken skill.

Memento-Skills takes a different approach. When a skill fails, the system reflects on why it failed, locates the broken skill, rewrites it, and stores the improved version back into the skill library.

Here's how it works: The framework runs a continuous Read → Execute → Reflect → Write loop.

Read: Retrieve candidate skills from the local library instead of loading every skill into context.
Execute: Run skills in a local sandbox with actual tool calling - file operations, web search, scripts, external systems.
Reflect: When execution fails, the system records what went wrong, updates the skill's utility score, and attributes the failure to specific skills.
Write: Rewrite broken skills, optimize weak ones, or create new skills when nothing suitable exists.

This isn't about accumulating more skills. It's about building a skill library that improves through task experience.

The system was tested on HLE (Humanity's Last Exam) and GAIA (General AI Assistants) benchmarks. Performance improved over multiple learning rounds as the skill library grew from basic atomic skills into a richer set of learned capabilities.

Built for open-source LLM ecosystems - works with Kimi, MiniMax, GLM, and other OpenAI-compatible endpoints. Comes with 9 built-in skills (filesystem, web-search, PDF, docx, xlsx, pptx, image analysis, skill-creator, dependency install) that serve as the starting point for the evolving library.

It's 100% open source.

Link to Memento-Skills in comments!
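
To make the Read → Execute → Reflect → Write loop described in this tweet concrete, here is a minimal Python sketch. It is not the Memento-Skills code or API: `Skill`, `SkillLibrary`, `run_task`, and `demo_rewrite` are hypothetical names, keyword overlap stands in for whatever retrieval the framework actually uses, and the rewrite step (presumably LLM-driven in the real system) is replaced by a hard-coded callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """A hypothetical skill: a callable plus retrieval keywords and a utility score."""
    name: str
    keywords: set
    run: Callable[[str], str]  # raises an exception when the skill fails
    utility: float = 0.0


class SkillLibrary:
    """Toy in-memory skill store (an assumption, not the real library format)."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def write(self, skill: Skill) -> None:
        # Write: store a new or rewritten skill under its name.
        self.skills[skill.name] = skill

    def read(self, task: str, k: int = 1) -> List[Skill]:
        # Read: retrieve only the best-matching skills, not the whole library.
        words = set(task.lower().split())
        ranked = sorted(
            self.skills.values(),
            key=lambda s: (len(s.keywords & words), s.utility),
            reverse=True,
        )
        return [s for s in ranked[:k] if s.keywords & words]


def run_task(library: SkillLibrary, task: str, rewrite, max_rounds: int = 3) -> Optional[str]:
    for _ in range(max_rounds):
        candidates = library.read(task)
        if not candidates:
            # Write: nothing suitable exists, so create a new skill.
            library.write(rewrite(task, None, "no suitable skill"))
            continue
        skill = candidates[0]
        try:
            result = skill.run(task)  # Execute
        except Exception as err:
            # Reflect: record the failure against this specific skill.
            skill.utility -= 1.0
            # Write: replace the broken skill with a rewritten version.
            library.write(rewrite(task, skill, str(err)))
            continue
        skill.utility += 1.0
        return result
    return None


def demo_rewrite(task: str, broken: Optional[Skill], reason: str) -> Skill:
    # Hypothetical stand-in for the rewrite step: returns a fixed "sum" skill.
    def fixed(t: str) -> str:
        _, a, b = t.split()
        return str(int(a) + int(b))

    utility = broken.utility if broken else 0.0
    return Skill("sum", {"sum"}, fixed, utility)


library = SkillLibrary()
# A deliberately broken skill: it indexes past the end of the token list.
library.write(Skill("sum", {"sum"}, lambda t: str(int(t.split()[1]) + int(t.split()[3]))))
result = run_task(library, "sum 2 3", demo_rewrite)
print(result)
```

In this toy run the first attempt fails, the failure lowers the skill's utility, the rewritten skill replaces it in the library, and the second attempt succeeds. Carrying the utility score over to the rewritten skill is a design choice of this sketch, not something the tweet specifies.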

432

23.3K

james retweeted

AT@AliesTaha·20h

x.com/i/article/2037…

506

58.1K

james retweeted

Yacine Mahdid@yacinelearning·20h

every time I do a live interview like this there are at least 7 different relatively low-cost high-impact experiments a novice researcher could try out that are mentioned for free

Yacine Mahdid@yacinelearning

good morning folks we're going live in about 2h to have a jolly discussion about making models self-teach themselves hard stuff like @justinskycak would

133

7.1K

james@Jamesthesnake·21h

Nice

AVB@neural_avb

Yo check out this new article 👀 I would second so much of what he wrote here. If I stopped dogfooding my agents I'm just ngmi. That's where all my best ideas have come from. Building evals is usually Step 2. Viv breaks it all down here. Very straightforward and educational!

james retweeted

Kevin Wang@KevinWang_111·23h

RL training on large models always feels painfully slow. And sometimes I wonder — which part is actually making it slow? Is it even possible to improve? This is exactly what @alexbert135's IKP solves. Instead of treating a CUDA kernel as a black box, you instrument named regions (load → compute → store) and get per-region hardware metrics: stall reasons, memory behavior, pipeline utilization, all the way down to SASS. If you're scaling RL training or serving LLMs with custom kernels, this profiling tool could be quite helpful :D

Jianzhu Yao@alexbert135

Open-sourced IKP: Intra-Kernel Profiler for CUDA kernels. Most GPU profilers tell you what happened at the kernel level. IKP shows what happened inside the kernel, for developers, and for agents. Repo: github.com/yao-jz/intra-k… #GPU #Profiling #CUDA

5.7K

james retweeted

Elliot Arledge@elliotarledge·1d

x.com/i/article/2037…

204

11.3K

james@Jamesthesnake·1d

@iScienceLuvr Ok

159

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·1d

if you can reply to this, you're awesome and cool :)

172

362

41.6K

james retweeted

Yacine Mahdid@yacinelearning·1d

I've been studying this paradigm for the past few weeks guys and I get this feeling that this is it

Jonas Hübotter@jonashubotter

Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback and turns it into dense supervision. (1/n)

605

69K

james retweeted

AVB@neural_avb·1d

Holy moly Few companies understand RL Harnesses like the Prime Intellect guys Few companies understand browser-use like Browserbase I have such high expectations now

Browserbase@browserbase

We're excited to announce our partnership with @PrimeIntellect to allow anyone to train browser agents. General-purpose models aren't optimized for your browser workflows; BrowserEnv lets you train one that is. Check out browserenv.com and train your own custom model in a few hours.

269

34.6K

james retweeted

Vivek Galatage@vivekgalatage·2d

Stanford seminar - Nvidia’s H100 GPU youtu.be/MC223HlPdK0

Vivek Galatage@vivekgalatage

Roadmap: Understanding GPU Architecture from Cornell cvw.cac.cornell.edu/gpu-architectu…

233

14.4K

james@Jamesthesnake·1d

@tenobrus This is just “sadly, porn” right?

1.4K

Tenobrus@tenobrus·1d

you can argue forever about various life philosophies, but some are inherently self refuting due to visibly destroying the lives of their biggest proponents. it's not just that *you* would be unhappy if you acted like Clavicular, it's that *he's* made unhappy by the things he espouses. at least many christians seem genuinely quite happy with their belief system even if it's not for me personally! at least donald trump seems to enjoy the world he inhabits and the things he's decided to value! much of incel adjacent rhetoric is based on constructing a world where you definitionally cannot ever be happy as a way to cope for the fact that you currently aren't. and that leaves you doomed to endless comparison and maximization, leaves Clavicular in this soulless joyless glass zoo of his own creation, where he can never stop striving and never reaches any goal that feels genuinely satisfying to him. the affliction of Blind-posters and TC-maxxers is nearly identical, largely because those cultures are very much incel-derived. SWEs who are getting laid and have friends don't have a need to construct these kinds of game-able value systems. they have robust holistic real value in their lives reject Famine in all his forms. do not let the hunger consume you.

Maia@maiamindel

clavicular is an interesting character because he clearly does not enjoy anything he does. he doesn't like the people he hangs out with, he's definitionally unhappy with his looks, he doesn't like the women he sleeps with and doesn't enjoy the sex. he needs continental philosophy

561

30.4K

james@Jamesthesnake·1d

@CameronCorduroy @maiamindel @nikicaga The only thing she cares about now

565

Cameron 🇺🇸 🗽🦅@CameronCorduroy·1d

@maiamindel @nikicaga saw it aptly described as "she's a transphobic activist who used to write books"

598

Maia@maiamindel·2d

jk rowling discourse is insane because people treat her like she's dave chappelle making a transphobic joke but she's basically like if kanye west and elon musk were the same person

1.3K

13.4K

232.2K

james@Jamesthesnake·1d

@JimJarmuschHair How do you plan for a TV show a decade ahead of time? Also kids don’t like Harry Potter as much anymore!

Andrew Woods@JimJarmuschHair·1d

I truly feel that ratings for the Harry Potter show will be high at first but then will slowly start to trail off. There’s nothing new there! I wouldn’t be surprised if they never finish it.

930

james retweeted

Vivek Galatage@vivekgalatage·2d

Roadmap: Understanding GPU Architecture from Cornell cvw.cac.cornell.edu/gpu-architectu…

202

1.2K

111.9K

james@Jamesthesnake·2d

@Duderichy Daycare is a labor issue

the Rich@Duderichy·2d

daycare is way too expensive.

Aziz Sunderji@AzizSunderji

The NYT recently wrote about a couple who earn $500k and live in a tiny apartment with their toddler and poodle. This seemed fishy to me—is half a million dollars really not enough to live comfortably here? I have the exact same demographic profile as this couple (live Upper West Side of Manhattan, dual earners, toddler, small dog)—and, armed with a lot of data, feel qualified to weigh in…there is indeed something fishy here. 🐟🐟🐟

6.7K
