Isaac Fino (@GhxIsaac) - Twitter Profile

Isaac Fino retweeted

Zhuofeng Li@zhuofengli96475·1d

🚀 Excited to share our ACL 2026 paper ReviewGrounder: AI peer review that grounds feedback in rubrics, paper evidence, and prior work. eigentom.github.io/ReviewGrounder/ arxiv.org/abs/2604.14261 🤖 ReviewGrounder is a rubric-guided, tool-integrated multi-agent reviewer: specialist agents examine the paper, search related work, check evidence, and synthesize grounded feedback. 💡 The core idea: Give LLMs two things human reviewers rely on — explicit rubrics and contextual grounding in the literature. 📈 The result: ReviewGrounder outperforms top baselines: + 41% over GPT-4.1 + 135% over GPT-4o 🎯 Evaluated across 8 rubric-based review quality dimensions Try it yourself! 🧠 Code: github.com/EigenTom/Revie… 💻 Demo: huggingface.co/spaces/ReviewG… #agentic #LLMs #MultiAgent #tooluse #PeerReview #AcademicAI

English

10

27

2.6K

Isaac Fino retweeted

Shibo Hao@Ber18791531·14 Apr

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs. Some takeaways: • Even the best agent system reaches only 45.1% on CocoaBench v1.0. • Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering. • Stronger agents tend to push more of the work into code. • Open source models still lag behind leading frontier models on these general tasks. 👇More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench

Shibo Hao@Ber18791531

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

English

2

35

78

10K

Isaac Fino@GhxIsaac·9 Apr

@DongfuJiang Nice work! Couldn't agree more about VLMs reasoning without vision. Here's another recent work: arxiv.org/abs/2603.21687

English

0

10

Dongfu Jiang@DongfuJiang·9 Apr

How much of “video understanding” is actually… not about video? We found that 40–60% of questions in popular benchmarks (VideoMME, MMVU) can be answered without watching the video. And it gets worse as models scale. 🧵 This problem doesn’t just affect evaluation. It’s baked into post-training data. So when you do SFT / RL, a large portion of the “gain” actually comes from better language priors, not better visual grounding. We propose a simple fix: VidGround 👉 Filter out text-only-answerable questions 👉 Keep only visually grounded data That’s it. Surprisingly, less data works better: • Only 69.1% of the data • +6.2 pts improvement • Outperforms more complex RL pipelines Key takeaway: - If your data allows shortcutting, your model will learn shortcuts. - For video understanding: Grounding signal > data scale > algorithm tricks 📄 huggingface.co/papers/2604.05… 🌐 vidground.etuagi.com

English

3

6

22

1.2K
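The filtering step described in the VidGround tweet above (drop questions that can be answered from text alone, keep the visually grounded ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tweet does not say how text-only answerability is decided, so the exact-match check, the function names, and the toy answerer below are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def filter_visually_grounded(
    examples: List[Example],
    text_only_answer: Callable[[str], str],
) -> List[Example]:
    """Keep only examples that a text-only answerer gets wrong.

    Assumption: an example counts as "text-only answerable" when the
    answerer, given the question without the video, matches the reference
    answer after simple normalization.
    """
    kept = []
    for ex in examples:
        guess = text_only_answer(ex["question"])
        if guess.strip().lower() != ex["answer"].strip().lower():
            kept.append(ex)
    return kept


def toy_text_only_answer(question: str) -> str:
    """Hypothetical stand-in for a language model that never sees the video."""
    priors = {"What color is the sky on a clear day?": "blue"}
    return priors.get(question, "unknown")


examples = [
    {"question": "What color is the sky on a clear day?", "answer": "blue"},
    {"question": "What does the person pick up first?", "answer": "a red cup"},
]

# Only the second question needs the video, so only it survives the filter.
kept = filter_visually_grounded(examples, toy_text_only_answer)
```

In practice the answerer would be a real model queried without visual input, and the match criterion would likely need to be more tolerant than exact string equality.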

Isaac Fino retweeted

Zhuofeng Li@zhuofengli96475·24 Mar

🚀 OpenResearcher paper is finally released! 🔥 We explore how to synthesize long-horizon research trajectories for deep-research agents — fully offline, scalable, and low-cost, without relying on live web APIs. 📄 huggingface.co/papers/2603.20… 🧩Two key ideas: Offline Corpus — One-time bootstrapping seeds 10K gold passages + 15M-doc FineWeb corpus. 📚 Explicit Browsing Primitives — Just 3 ops: search / open / find. The agent learns not just what to retrieve, but how to inspect docs and localize evidence at multiple scales. 🔎 📊 Results: 54.8% on BrowseComp-Plus with our 30B-A3B — #1 open-source under the same search engine setup. Beating much larger models like GPT-4.1, Claude-Opus-4, Gemini-2.5-Pro, and DeepSeek-R1. 💡 Insights: Beyond accuracy, we dissect deep research pipeline design—from data filtering and agent configuration to retrieval accuracy dynamics (RQ1-RQ5). Try it yourself: 🛠️ Code: github.com/TIGER-AI-Lab/O… 🤗 Models & data: huggingface.co/collections/TI… 🚀 Demo: huggingface.co/spaces/OpenRes… #llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT

Dongfu Jiang@DongfuJiang

🚀 Introducing OpenResearcher: a fully offline pipeline for synthesizing 100+ turn deep-research trajectories—no search/scrape APIs, no rate limits, no nondeterminism. 💡 We use GPT-OSS-120B + a local retriever + a 10T-token corpus to generate long-horizon tool-use traces (search → open → find) that look like real browsing, but are free + reproducible. 📈 The payoff: SFT on these trajectories turns Nemotron-3-Nano-30B-A3B from 20.8% → 54.8% accuracy on BrowseComp-Plus (+34.0). 🧩 What makes it work? 🔎 Offline corpus = 15M FineWeb docs + 10K “gold” passages (bootstrapped once) 🧰 Explicit browsing primitives = better evidence-finding than “retrieve-and-read” 🎯 Reject sampling = keep only successful long-horizon traces 🧵 And we’re releasing everything: ✅ code + search engine + corpus recipe ✅ 96K-ish trajectories + eval logs ✅ trained models + live demo 👨‍💻 GitHub: github.com/TIGER-AI-Lab/O… 🤗 Models & data: huggingface.co/collections/TI… 🚀 Demo: huggingface.co/spaces/OpenRes… 🔎 Eval logs: huggingface.co/datasets/OpenR… #llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT

English

11

60

310

45.9K
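The three browsing primitives named in the OpenResearcher tweets above (search / open / find) can be illustrated with a small offline sketch. Everything here is an assumption made for illustration: the class name, the in-memory corpus, and the term-count ranking are hypothetical, and the actual project uses its own retriever and corpus.

```python
import re
from typing import Dict, List


class OfflineBrowser:
    """Toy offline browser over an in-memory corpus (doc_id -> text)."""

    def __init__(self, corpus: Dict[str, str]) -> None:
        self.corpus = corpus

    def search(self, query: str, k: int = 3) -> List[str]:
        """Return up to k doc ids ranked by how often query terms occur."""
        terms = re.findall(r"\w+", query.lower())
        scored = []
        for doc_id, text in self.corpus.items():
            words = re.findall(r"\w+", text.lower())
            score = sum(words.count(t) for t in terms)
            if score > 0:
                scored.append((score, doc_id))
        # Highest score first; ties broken by doc id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]

    def open(self, doc_id: str) -> str:
        """Return the full text of one document."""
        return self.corpus[doc_id]

    def find(self, doc_id: str, pattern: str) -> List[str]:
        """Return the lines of a document that contain the pattern."""
        needle = pattern.lower()
        return [
            line for line in self.corpus[doc_id].splitlines()
            if needle in line.lower()
        ]


corpus = {
    "d1": "Paris is the capital of France.\nIt lies on the Seine.",
    "d2": "Berlin is the capital of Germany.",
}
browser = OfflineBrowser(corpus)
hits = browser.search("capital France")   # coarse: which documents?
page = browser.open(hits[0])              # medium: read one document
evidence = browser.find(hits[0], "seine") # fine: locate a line inside it
```

Splitting retrieval into these three steps mirrors the tweet's point about localizing evidence at multiple scales: search narrows to documents, open reads one, and find pinpoints a passage within it.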

Isaac Fino retweeted

TianqiaoChen@tianqiao_chen·11 Mar

Conversation is easy for AI. Solving real problems is not. Real problems — in science, finance, and engineering — require something very different: •long reasoning chains •structured exploration •verification at every step That’s the motivation behind MiroThinker. Today we’re releasing the next generation of our research agent models: MiroThinker-1.7 and MiroThinker-H1. Instead of scaling conversations, we focused on scaling effective reasoning — improving both reasoning depth and step-level accuracy. Some highlights: 🧠 Heavy-duty reasoning for long-horizon tasks 🔎 Verification-centric architecture with both local and global checks 🌐 Strong performance on BrowseComp, BrowseComp-ZH, GAIA, and Seal-0 📊 Leading results across scientific and financial evaluation benchmarks Our long-term goal is simple: build agents that can reason, verify, and solve real problems — not just generate answers. Proud of the team for pushing this forward. Explore MiroThinker: Hugging Face lnkd.in/gkEAh88G GitHub lnkd.in/eJyH4xEM The MiroMind app integration will roll out in the coming days.

English

10

11

75

56.2K

Isaac Fino@GhxIsaac·4 Mar

Having a blast with Lambda!

Lambda@LambdaAPI

What if small models with the right tools could beat large models without them? AgentFlow assembles a team of agents: Planner, Executor, Verifier, Generator. Each learns to coordinate and call tools in the flow of a task via end-to-end RL. 3B/7B models trained on 8× NVIDIA A100 GPUs outperform GPT-4o on reasoning benchmarks. + ICLR 2026 Oral (Top 1.1%) + Best Paper Nomination @ NeurIPS 2025 Efficient Reasoning Workshop + One of 12 papers Lambda co-authored at ICLR 2026 Project page: agentflow.stanford.edu

English

0

72

Isaac Fino retweeted

Pan Lu@lupantech·3 Mar

Introducing Eubiota: A multi-agent AI framework for autonomous discovery in the human microbiome. 🧬🤖🧫 👇 Explore the platform: eubiota.ai Eubiota doesn’t just chat: it plans, uses tools, verifies evidence, and drives end-to-end discovery—from hypothesis to wet-lab validation. Eubiota achieved 87.7% accuracy on mechanistic reasoning (vs. GPT-5.1 77.3%). But we went further. We used Eubiota to drive 4 discoveries with experimental validation: ✅ Gene Discovery: Identified the uvr-ruv stress axis by screening 1,945 genes & 10K papers in hours (on 2 GPUs) 🧬⚡ ✅ Therapeutics: Designed a microbial therapy that reduced colitis inflammation 🦠💊 ✅ Antibiotics: Engineered a cocktail that kills pathogens but spares commensals 🎯🛡️ ✅ Metabolites: Discovered novel anti-inflammatory molecules from large human data 🥗🧪 📄 Paper: biorxiv.org/content/10.648… 💻 Code: github.com/lupantech/Eubi… Try the live app to start your own discovery: 🎮 App: app.eubiota.ai Huge thanks to fantastic co-lead @YifanGao15, our incredible PIs @james_y_zou @LabSonnenburg, stellar advisors Kerwyn Casey Huang, @YejinChoinka, and the excellent team! 👏 #Eubiota #Microbiome #Inflammation #AI4Sci #AgenticAI #Agent

English

13

45

151

30K

Isaac Fino@GhxIsaac·3 Mar

Can't agree more, especially the Steam platform's.

Wenhu Chen@WenhuChen

Some of the CAPTCHAs are so hard that I have to use AI to assist me in solving them. What kind of world is this? Am I actually a bot?

English

0

87

Isaac Fino@GhxIsaac·3 Mar

Claude is down ( •̥́ ˍ •̀ू ). Time to remind ourselves not to over-depend on one code agent or platform. Maybe MiniMax?

English

0

465

Isaac Fino retweeted

Dongfu Jiang@DongfuJiang·9 Feb

🚀 Introducing OpenResearcher: a fully offline pipeline for synthesizing 100+ turn deep-research trajectories—no search/scrape APIs, no rate limits, no nondeterminism. 💡 We use GPT-OSS-120B + a local retriever + a 10T-token corpus to generate long-horizon tool-use traces (search → open → find) that look like real browsing, but are free + reproducible. 📈 The payoff: SFT on these trajectories turns Nemotron-3-Nano-30B-A3B from 20.8% → 54.8% accuracy on BrowseComp-Plus (+34.0). 🧩 What makes it work? 🔎 Offline corpus = 15M FineWeb docs + 10K “gold” passages (bootstrapped once) 🧰 Explicit browsing primitives = better evidence-finding than “retrieve-and-read” 🎯 Reject sampling = keep only successful long-horizon traces 🧵 And we’re releasing everything: ✅ code + search engine + corpus recipe ✅ 96K-ish trajectories + eval logs ✅ trained models + live demo 👨‍💻 GitHub: github.com/TIGER-AI-Lab/O… 🤗 Models & data: huggingface.co/collections/TI… 🚀 Demo: huggingface.co/spaces/OpenRes… 🔎 Eval logs: huggingface.co/datasets/OpenR… #llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT

English

30

208

1.3K

143.2K

Isaac Fino@GhxIsaac·2 Feb

@jianwen_xie @lupantech @YejinChoinka @zhuofengli96475 @yuz9yuz Thank you for the support! It’s been a great collaboration. Looking forward to seeing RL unlock even more potential for agentic systems 🚀

English

0

1

76

Jianwen Xie@jianwen_xie·2 Feb

Excited that AgentFlow agentflow.stanford.edu won 🏆 Best Paper Nomination at the NeurIPS Efficient Reasoning Workshop, and also got accepted by ICLR 2026! Congratulations to my collaborators from Stanford, TAMU, and UCSD! @lupantech @YejinChoinka @zhuofengli96475 @yuz9yuz @james_y_zou @Stanford @LambdaAPI @TAMU @iclr_conf @UCSD #agentic #llms #RL #tooluse #ICLR2026

English

3

5

56

6.1K

Isaac Fino@GhxIsaac·28 Dec

Recognized by GPT, aww. Should I be more relaxed and poetic myself, right?

English

0

1

83

Isaac Fino@GhxIsaac·21 Nov

When can we eat natural soda watermelon? Really need it.

English

0

77

Isaac Fino@GhxIsaac·20 Nov

@WenhuChen Human-computer interaction experience targeted finetuned? The aesthetics of openai in user experience

English

0

34

Isaac Fino@GhxIsaac·28 Oct

Golden Finger the OpenAI

English

0

127

Isaac Fino@GhxIsaac·28 Oct

Claude is Shannon, Claude is Debussy, Claude is Monet. Aesthetics.

English

0

1

61

Isaac Fino@GhxIsaac·27 Oct

I'm curious why a process even exists that allows candidates to pass so-called Machine Learning interviews based entirely on reciting a set of given 'templates.'

English

0

66

Isaac Fino@GhxIsaac·27 Oct

No offense to those who really have insight.

English

0

59

Isaac Fino@GhxIsaac·27 Oct

Here is an interesting method to determine whether your research idea in AI is up-to-date. If you open a Chinese short video app and find that even a senior IC without an AI background can teach you how to improve performance, then the idea is out.

English

1

0

69

Isaac Fino@GhxIsaac·25 Oct

Crying. It's just too beautiful an album. SON OF SPERGY @DanielCaesar

English

0

2

83
