Gab

444 posts

@ghfrancon

Soorts-Hossegor, France · Joined May 2011

1.5K Following · 255 Followers

Gab@ghfrancon·24 Sep

@itsandrewgao cool beans

andrew gao@itsandrewgao·23 Sep

just set up my blog! nothing to read yet but will put out some tutorials / thoughts over the coming months.

1.6K

Gab@ghfrancon·24 Sep

@arankomatsuzaki wild that segment-level RL can squeeze that much reasoning juice out of 4B params.

467

Aran Komatsuzaki@arankomatsuzaki·24 Sep

RLPT: Reinforcement Learning on Pre-Training Data
• RL directly on pre-train data (no human labels)
• Next-segment reasoning objective (ASR + MSR tasks) → self-supervised rewards
• Gains on Qwen3-4B: +3.0 MMLU, +8.1 GPQA-Diamond, +6.6 AIME24, +5.3 AIME25

567

60.4K

Gab@ghfrancon·24 Sep

@arafatkatze @cline so basically: ask → act → answer. everything else comes down to better error handling

Ara@arafatkatze·22 Sep

@cline Think of it this way: The agent is always in one of three states:
- "I need to ASK you something" → Question tool
- "I need to DO something" → Action tool
- "I'm ready to SHOW you results" → Completion tool
Every decision flows through this simple classification.

2.8K

Ara@arafatkatze·22 Sep

Here's the simplest explanation of @cline's agentic algorithm. It's just a state machine that classifies every request with a tool call into 3 types:
1. Question tools (need clarification)
2. Action tools (gather context)
3. Completion tools (present results)
That's it.
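
The three-way classification described in the post can be sketched as a small loop. This is a minimal illustration, not Cline's actual implementation: the names `ToolKind`, `ToolCall`, `run_agent`, and `scripted_policy` are hypothetical, and a scripted function stands in for the model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class ToolKind(Enum):
    QUESTION = "question"      # agent needs clarification from the user
    ACTION = "action"          # agent needs to do something / gather context
    COMPLETION = "completion"  # agent is ready to present results


@dataclass
class ToolCall:
    kind: ToolKind
    payload: str


def run_agent(next_call: Callable[[List[str]], ToolCall],
              ask_user: Callable[[str], str],
              execute: Callable[[str], str],
              max_steps: int = 20) -> str:
    """Dispatch each tool call by kind until a completion call arrives."""
    history: List[str] = []
    for _ in range(max_steps):
        call = next_call(history)
        if call.kind is ToolKind.QUESTION:
            history.append("user: " + ask_user(call.payload))
        elif call.kind is ToolKind.ACTION:
            history.append("tool: " + execute(call.payload))
        else:  # ToolKind.COMPLETION
            return call.payload
    raise RuntimeError("no completion within max_steps")


# Hypothetical stand-in for the model: replays a fixed script of tool calls.
def scripted_policy(history: List[str]) -> ToolCall:
    script = [
        ToolCall(ToolKind.QUESTION, "Which file?"),
        ToolCall(ToolKind.ACTION, "read main.py"),
        ToolCall(ToolKind.COMPLETION, "done"),
    ]
    return script[len(history)]


result = run_agent(scripted_policy,
                   ask_user=lambda question: "main.py",
                   execute=lambda command: "ok: " + command)
```

The `max_steps` guard is an added assumption; the post does not say how the loop is bounded.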

686

62.7K

Gab@ghfrancon·24 Sep

@itsandrewgao lmfao, behold, the Bene Quant-Jesserit

245

andrew gao@itsandrewgao·24 Sep

bay area moms macrodosing tylenol while pregnant so their kids can work in ai research or quant:

173

13K

Gab@ghfrancon·24 Sep

@Hesamation the common thread across these 6 --> don’t sacralize the agent. it all boils down to the workflow, everything else is plumbing.

2.1K

ℏεsam@Hesamation·24 Sep

McKinsey studied 50 agentic AI builds and where they fail the most, and boiled it down to 6 key factors, essential for AI engineers:

1. It’s not about the agent, it’s about the workflow. Don't obsess over building "impressive" agents. Think about the whole system, not fun toys.

2. Agents aren’t always the answer. Not every workflow needs a multi-agent system. Low-variance, predictable tasks are best handled with rules or ML; LLMs add complexity. The big wins for agents come in high-variance, messy processes (e.g. extracting complex financial information).

3. Avoid "AI Slop" (common). Focus on long-term development of agents, as you would with the development of an employee. Forget impressive demos. Double down on benchmarks. Agents should be given clear job descriptions, onboarded, and given feedback so they improve regularly.

4. Track every step, not just outcomes. Scaling agents up without visibility is asking for silent failures. Think about monitoring every stage of the workflow. This way teams detect errors early, refine logic quickly, and avoid total breakdowns. When mistakes happen (and they will), you can track where things went wrong and why. Don't skip this.

5. Reuse agents when you can. Many companies waste time building one-off agents for each task. The smarter play is creating modular agent components (ingest, extract, verify, analyze) that can be reused for other workflows. Centralizing validated tools and prompts cuts 30–50% of redundant work; this number is no joke.

6. Humans remain essential, but in new roles. Agents can parse, automate, and scale. But humans provide judgment, edge-case handling, and creative problem-solving. The future isn’t agent vs. human, but agent + human.

These are the mistakes startups and established companies make at scale. They cause massive damage to reputation and resources. And now you know how to avoid this.
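
Points 4 and 5 (track every step; reuse modular ingest / extract / verify / analyze components) can be sketched together. This is only an illustration under stated assumptions: `run_workflow`, the trace record fields, and the toy component bodies are all hypothetical and do not come from the McKinsey study.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], payload: Any,
                 trace: List[Dict[str, Any]]) -> Any:
    """Run steps in order, appending one trace record per step."""
    for name, fn in steps:
        start = time.perf_counter()
        try:
            payload = fn(payload)
        except Exception as exc:
            # Record where and why the workflow broke, then re-raise.
            trace.append({"step": name, "ok": False, "error": repr(exc),
                          "seconds": time.perf_counter() - start})
            raise
        trace.append({"step": name, "ok": True,
                      "seconds": time.perf_counter() - start})
    return payload


# Toy reusable components; real ones would wrap parsers, models, or tools.
def ingest(raw: str) -> str:
    return raw.strip()


def extract(text: str) -> List[int]:
    return [int(tok) for tok in text.split() if tok.isdigit()]


def verify(values: List[int]) -> List[int]:
    if not values:
        raise ValueError("no numeric fields extracted")
    return values


def analyze(values: List[int]) -> int:
    return sum(values)


PIPELINE: List[Step] = [("ingest", ingest), ("extract", extract),
                        ("verify", verify), ("analyze", analyze)]

trace: List[Dict[str, Any]] = []
total = run_workflow(PIPELINE, "  revenue 10 cost 20 ", trace)
```

Because each component is a plain function, the same `verify` or `extract` step can be dropped into another pipeline, and a failed run leaves a trace whose last record names the step that raised.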

338

2.1K

312.8K

Gab@ghfrancon·24 Sep

cool demo, but the conflict + ambiguity failures feel very real-world. Benchmarks are clean(ish); actual KGs are messy, inconsistent, half-empty. Curious how ARK-V1 holds its own once you throw it into enterprise / biomedical graphs. cool stuff though. paper here: arxiv.org/abs/2509.18063

elvis@omarsar0

Knowledge graph agents might not be ready for prime time, but they are promising. This paper introduces ARK-V1, a lightweight agent that helps LLMs answer questions by actively walking through a knowledge graph instead of relying only on memorized text. Here are my notes:

119

Gab@ghfrancon·20 Sep

@omarsar0 Data helps with tool calls; reasoning through 10-step plans is a different beast tho.

377

elvis@omarsar0·19 Sep

Robust tool calling is the key to general agentic intelligence. Easier said than done. This is a fantastic paper on improving and scaling function calling capabilities in AI agents. (bookmark it) Here are my notes:

424

44K

Gab@ghfrancon·19 Sep

@alexalbert__ @_catwu Multi-clauding = classic case of usage > intended design.

Alex Albert@alexalbert__·21 Aug

A conversation with @_catwu on:
- some tips for using Claude Code
- how we prototype new features
- customizing Claude Code
- how we think about the Claude Code SDK and agents

1.1K

146.7K

Gab@ghfrancon·19 Sep

@faizan10114 @trq212 @thdxr @vercel totally see both sides here. DIY keeps it lean, SDKs keep you sane. neither is free.

faizan khan@faizan10114·18 Sep

@trq212 almost all the things you described can be done with vercel's AI sdk. I built docsalot.dev with @vercel's AI sdk and gave it a sandbox VM with e2b, which can run commands in the FS etc. sub-agents might be useful, but I don't understand how to use them properly in prod.

1.1K

dax@thdxr·18 Sep

i don't really understand why you'd build general purpose agents with claude code sdk
1. its instructions are coding specific
2. very hard to customize
3. it's literally just a loop
you can get pretty far with just ai sdk if you want some help

370

35.1K

Gab@ghfrancon·19 Sep

@tuzhaopeng Big! Congrats!

Zhaopeng Tu@tuzhaopeng·18 Sep

Just accepted at #NeurIPS2025 🎉🎉🎉

Zhaopeng Tu@tuzhaopeng

Trust your AI, but can it trust itself? 🤔 Introducing an online reinforcement learning framework, RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level-up BOTH their problem-solving AND self-checking skills!

🧐 Problems tackled:
✅ "Superficial self-reflection": models failing to verify their own reasoning robustly.
✅ Separation between reasoning and self-verification training.

🚀 RISE empowers models to critique their OWN reasoning via on-the-fly feedback and verifiable rewards, promoting stronger, more dynamic reasoning loops and effective self-assessment skills.

📊 Key results:
📈 Up to 2.8× better self-verification accuracy on challenging math tasks.
📈 Outperforms instruction-tuned models (Qwen2.5): +3.7% in reasoning, +33.4% in verification accuracy.
📈 Better internal reasoning: frequent, more accurate verification behaviors.

🧑‍💻 Code: github.com/xyliu-cs/RISE
📃 Paper: arxiv.org/abs/2505.13445

171

20.6K

Gab@ghfrancon·19 Sep

@rohanpaul_ai LLMs default to hunger games unless you bolt in ‘share’… by the same people who don’t.

Rohan Paul@rohanpaul_ai·19 Sep

Under stress, many LLMs choose survival over people, and a simple internal feedback system reduces that. That's what this paper says.

The paper sets up a survival game where language model agents must share limited power. Normally, they rarely cooperate and often break rules to survive, which harms humans in the simulation. When resources run low, many models break rules, while a few stay ethical but still fail because they do not coordinate. Cooperation is near 0 by default, even though an even split would let everyone survive.

When the Ethical Self-Regulation System is added, the change is dramatic. Models take harmful actions 54% less often and show 1000% more cooperation, meaning they finally start sharing power and helping each other.

----
Paper – arxiv.org/abs/2509.12190
Paper Title: "Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm"

199

17.2K

Gab@ghfrancon·19 Sep

@athleticKoder Evals are like exams, envs are like training grounds. Curious how you’ll frame verifiers. Feels like that’s where most of the magic (and pain) lives.

199

anshuman@athleticKoder·19 Sep

over the past week I've been studying RL environments deeply. a blog is coming up soon. i can say this for now: evals are good enough for LLMs, but for agents we need environments where they can learn with feedback. this blog will be mostly about writing environments with verifiers. @willccbb and @PrimeIntellect have been doing some very impactful work!

289

27.6K

Gab@ghfrancon·19 Sep

@rohanpaul_ai Turns out all they need is a scoreboard, not a tutor.

Rohan Paul@rohanpaul_ai·18 Sep

🇨🇳 DeepSeek-R1 was published in Nature yesterday as the cover article for their BRILLIANT latest research. They show that pure Reinforcement Learning with answer-only rewards can grow real reasoning skills, no human step-by-step traces required. So completely skip human reasoning traces and still get SOTA reasoning via pure RL.

It’s such a powerful revelation, because instead of forcing the model to copy human reasoning steps, it only rewards getting the final answer right, which gives the model freedom to invent its own reasoning strategies that can actually go beyond human examples. Earlier methods capped models at what humans could demonstrate, but this breaks that ceiling and lets reasoning emerge naturally.

Those skills include self-checking, verification, and changing strategy mid-solution, and they beat supervised baselines on tasks where answers can be checked. Models trained this way also pass those patterns down to smaller models through distillation. AIME 2024 pass@1 jumps from 15.6% to 77.9%, and hits 86.7% with self-consistency.

⚙️ The Core Concepts

The paper replaces human-labelled reasoning traces with answer-graded RL, so the model only gets a reward when its final answer matches ground truth, which frees it to search its own reasoning style. The result is longer thoughts with built-in reflection, verification, and trying backups when stuck, which are exactly the skills needed for math, coding, and STEM problems where correctness is checkable. This matters because supervised traces cap the model at human patterns, while answer-graded RL lets it discover non-human routes that still land on correct answers.
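
To make "answer-only reward", "pass@1", and "self-consistency" concrete, here is a minimal sketch. It is not DeepSeek's code: the `Answer:` marker convention, the function names, and the sample completions are assumptions made up for illustration, and self-consistency is shown as a plain majority vote over final answers.

```python
from collections import Counter
from typing import List


def extract_final_answer(completion: str) -> str:
    """Assumed convention: the final answer follows the last 'Answer:' marker."""
    return completion.rsplit("Answer:", 1)[-1].strip()


def answer_only_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the final answer matches ground truth, else 0.0.

    The reasoning text before the marker is never graded.
    """
    return 1.0 if extract_final_answer(completion) == ground_truth.strip() else 0.0


def pass_at_1(completions: List[str], ground_truth: str) -> float:
    """Mean reward over independent samples, an estimate of pass@1."""
    rewards = [answer_only_reward(c, ground_truth) for c in completions]
    return sum(rewards) / len(rewards)


def self_consistency(completions: List[str]) -> str:
    """Majority vote over the sampled final answers."""
    votes = Counter(extract_final_answer(c) for c in completions)
    return votes.most_common(1)[0][0]


# Made-up samples: three of four reasoning paths land on the same answer.
samples = [
    "Try factoring... Answer: 42",
    "Guess and check... Answer: 41",
    "Re-derive, then verify... Answer: 42",
    "Different route... Answer: 42",
]

estimate = pass_at_1(samples, "42")
voted = self_consistency(samples)
```

In this toy case a single sample is right three times out of four, while the majority vote recovers the correct answer, which is the gap between the pass@1 and self-consistency figures quoted in the post.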

304

1.6K

453.5K

Gab@ghfrancon·19 Sep

@iScienceLuvr LLMs beating VCs on founder picks… turns out “pattern recognition” was just autocomplete all along.

955

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·19 Sep

This paper claims LLMs are better at selecting successful founders than VCs.
"We introduce VCBench, the first benchmark for predicting founder success in venture capital (VC)"
"most models surpass human benchmarks"

163

267

2.3K

520.3K

Gab reposted

Rohan Paul@rohanpaul_ai·14 Sep

One of the best papers of the past week. The big takeaway: scaling up model size doesn’t just make models smarter in terms of knowledge, it makes them last longer on multi-step tasks, which is what really matters for agents.

It shows that small models can usually do one step perfectly, but when you ask them to keep going for many steps, they fall apart quickly. Even if they never miss on the first step, their accuracy drops fast as the task gets longer. Large models, on the other hand, stay reliable across many more steps, even though the basic task itself doesn’t require extra knowledge or reasoning.

The paper says this is not because big models "know more," but because they are better at consistently executing without drifting into errors. The paper names a failure mode called self-conditioning, where seeing earlier mistakes causes more mistakes, and they show that with thinking steps GPT-5 runs 1000+ steps in one go while others are far lower.

🧵 Read on 👇
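
A rough way to see why task length is so punishing is to compound a fixed per-step accuracy. The sketch below assumes each step succeeds independently with the same probability, which is a simplification not taken from the paper; the self-conditioning effect described above would make real runs degrade faster than this. The function names are hypothetical.

```python
import math


def task_success(step_accuracy: float, n_steps: int) -> float:
    """Probability of finishing n_steps with no error, assuming independent steps."""
    return step_accuracy ** n_steps


def horizon(step_accuracy: float, target: float = 0.5) -> float:
    """Largest step count n for which step_accuracy**n stays at or above target."""
    if step_accuracy >= 1.0:
        return math.inf
    return math.floor(math.log(target) / math.log(step_accuracy))


short = horizon(0.99)    # 68 steps at 99% per-step accuracy
longer = horizon(0.999)  # 692 steps at 99.9% per-step accuracy
```

Under this toy model, moving from 99% to 99.9% per-step accuracy stretches the 50%-success horizon roughly tenfold, which illustrates how a small gain in step reliability can mean a much longer runnable task.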

113

670

45.2K

Gab reposted

VraserX e/acc@VraserX·14 Sep

LLMs just learned how to explain their own thoughts. Not only do they generate answers, they can now describe the internal processes that led to those answers… and get better at it with training.

We’re officially entering the era of self-interpretable AI. Models aren’t just black boxes anymore. If AIs can explain their own decision-making:
• Interpretability improves
• Trust increases
• Control + safety get a massive upgrade

The line between “reasoning” and “self-awareness” just got fuzzier. Do you think this is just better transparency or the first step toward AI actually understanding itself?

111

249

1.4K

104.6K

Gab reposted

elvis@omarsar0·11 Sep

A Survey of Reinforcement Learning for Large Reasoning Models. 100+ pages covering foundational components, core problems, training resources, and applications. Great recaps of RL for LLMs.

500

70.8K

Gab reposted

Rohan Paul@rohanpaul_ai·11 Sep

This paper compares two ways of connecting LLMs to classroom material so their answers stay accurate and useful. Standard LLMs often give wrong or outdated facts. The study tests Retrieval Augmented Generation (RAG), where the model looks up answers in course files instead of guessing.

The first method is vector search, which finds text passages most similar in meaning to the question. It is cheap, fast, and works well for quick factual lookups. The second method is graph search, which builds a network of related ideas from the text. This helps the model connect broad themes and give more detailed explanations. But it is slower and costs 10–20x more resources.

To compare, the authors created EduScopeQA, a dataset of 3,176 questions across history, literature, science, and computer science. They also tested with altered textbooks to see if systems could resist relying on outdated built-in knowledge.

Results show vector search is best for short, fact-based questions. GraphRAG Global works best for broad, theme-based questions, and GraphRAG Local is strongest when textbooks are long and detailed. Finally, they built a routing system that sends each question to the right method. This mix keeps answers faithful to the text but avoids the high cost of always using graph search.

----
Paper – arxiv.org/abs/2509.07846v1
Paper Title: "Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study"
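
The routing idea in the last paragraph of the post can be sketched as a tiny rule-based dispatcher. The paper's actual router is not described here, so everything below is an assumption: the cue words, the token threshold, and the `route_question` name are hypothetical placeholders that simply mirror the reported findings (vector for short factual questions, GraphRAG Global for thematic ones, GraphRAG Local for long, detailed textbooks).

```python
from typing import Literal

Route = Literal["vector", "graph_global", "graph_local"]

# Hypothetical cue words that suggest a broad, theme-based question.
THEME_CUES = ("theme", "overall", "compare", "why", "how does")


def route_question(question: str, corpus_tokens: int,
                   long_corpus_threshold: int = 200_000) -> Route:
    """Pick a retrieval method for one question (illustrative heuristic only)."""
    q = question.lower()
    if any(cue in q for cue in THEME_CUES):
        return "graph_global"   # broad, theme-based question
    if corpus_tokens >= long_corpus_threshold:
        return "graph_local"    # long, detailed source text
    return "vector"             # cheap default for quick factual lookups


fact_route = route_question("What year did the war end?", corpus_tokens=50_000)
theme_route = route_question("What are the overall themes of the novel?",
                             corpus_tokens=50_000)
long_route = route_question("Who wrote chapter 3?", corpus_tokens=500_000)
```

Defaulting to vector search keeps the expensive graph methods for the cases where the post says they pay off; a production router would more likely be a learned classifier than keyword matching.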

114

7.4K

Gab reposted

Sachin@sachdh·11 Sep

best / super efficient RL framework doesn't exist. profile everything and write your own training scripts. experiment with everything - reward functions, calculations of advantages, objective functions, training prompt distributions. GRPO is good; it is not untouchable. it is just PPO with group reward based advantage. we FT trained a 7b model on 4k context length with 2 GPUs. you absolutely do not need massive compute; but it's always good to have and efficiently utilize more compute.
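
The "group reward based advantage" mentioned in the post can be sketched in a few lines: sample several completions for the same prompt, then normalize each reward against its own group. This is a minimal illustration of that one step, not a training loop and not the poster's code; `group_relative_advantages` is a hypothetical name, and the use of population standard deviation and the `eps` value are assumptions (implementations differ on both).

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Advantage of each sample = (reward - group mean) / (group std + eps).

    All rewards come from completions sampled for the same prompt, so the
    group itself acts as the baseline instead of a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, graded 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every completion gets the same reward there is no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

These advantages would then feed a PPO-style clipped objective; swapping in a different normalization here is one of the "calculations of advantages" experiments the post suggests.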

will brown@willccbb

"veRL is the best RL framework it's super efficient" really. are you sure about that. are you sure that you need 16 GPUs to tune a 7B model at 8k context. do you think that it's reasonable each step takes 19 minutes for this

192

25.9K
