Isaac Fino

84 posts

@GhxIsaac

Good for you's just like you very much

Tokyo, Japan · Joined December 2022
88 Following · 24 Followers
Isaac Fino retweeted
Zhuofeng Li @zhuofengli96475
🚀 Excited to share our ACL 2026 paper ReviewGrounder: AI peer review that grounds feedback in rubrics, paper evidence, and prior work.
eigentom.github.io/ReviewGrounder/
arxiv.org/abs/2604.14261
🤖 ReviewGrounder is a rubric-guided, tool-integrated multi-agent reviewer: specialist agents examine the paper, search related work, check evidence, and synthesize grounded feedback.
💡 The core idea: give LLMs the two things human reviewers rely on, explicit rubrics and contextual grounding in the literature.
📈 The result: ReviewGrounder outperforms top baselines:
+41% over GPT-4.1
+135% over GPT-4o
🎯 Evaluated across 8 rubric-based review quality dimensions
Try it yourself!
🧠 Code: github.com/EigenTom/Revie…
💻 Demo: huggingface.co/spaces/ReviewG…
#agentic #LLMs #MultiAgent #tooluse #PeerReview #AcademicAI
[4 images]
10 replies · 10 retweets · 27 likes · 2.6K views
Isaac Fino retweeted
Shibo Hao @Ber18791531
🍫 CocoaBench v1.0 is out!
CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, and 🌐 search.
Since our first research preview last December, we have expanded the benchmark substantially with community-contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open-source models still lag behind leading frontier models on these general tasks.
👇 More on the website and in the paper
#AI #Agents #LLM #Benchmark #CocoaBench
Quoting Shibo Hao @Ber18791531

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

2 replies · 35 retweets · 78 likes · 10K views
Dongfu Jiang @DongfuJiang
How much of “video understanding” is actually… not about video?
We found that 40–60% of questions in popular benchmarks (VideoMME, MMVU) can be answered without watching the video. And it gets worse as models scale. 🧵
This problem doesn’t just affect evaluation. It’s baked into post-training data. So when you do SFT / RL, a large portion of the “gain” actually comes from better language priors, not better visual grounding.
We propose a simple fix: VidGround
👉 Filter out text-only-answerable questions
👉 Keep only visually grounded data
That’s it.
Surprisingly, less data works better:
• Only 69.1% of the data
• +6.2 pts improvement
• Outperforms more complex RL pipelines
Key takeaway:
- If your data allows shortcutting, your model will learn shortcuts.
- For video understanding: Grounding signal > data scale > algorithm tricks
📄 huggingface.co/papers/2604.05…
🌐 vidground.etuagi.com
[Image]
3 replies · 6 retweets · 22 likes · 1.2K views
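The VidGround recipe in this post boils down to a filter: pose each question without the video, and drop the example if a text-only model still answers it correctly. The Python sketch below is an illustration under assumptions, not the authors' code: the `VideoQA` record, the `answer_blind` callable (any text-only model wrapped to return an option label), and the `trials` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VideoQA:
    question: str
    options: List[str]
    answer: str      # gold option label, e.g. "B"
    video_path: str  # never read by the text-only check


# Hypothetical interface: (question, options) -> predicted option label, with no video input.
TextOnlyAnswerer = Callable[[str, List[str]], str]


def filter_visually_grounded(
    data: List[VideoQA], answer_blind: TextOnlyAnswerer, trials: int = 1
) -> List[VideoQA]:
    """Drop examples that a text-only answerer gets right in any of `trials` attempts."""
    kept = []
    for ex in data:
        shortcut = any(
            answer_blind(ex.question, ex.options) == ex.answer for _ in range(trials)
        )
        if not shortcut:
            kept.append(ex)  # text-only model failed, so keep as visually grounded
    return kept


if __name__ == "__main__":
    data = [
        VideoQA("What color is the sky in the opening shot?", ["A", "B"], "A", "v1.mp4"),
        VideoQA("How many people enter after the door opens?", ["A", "B"], "B", "v2.mp4"),
    ]

    def always_a(question: str, options: List[str]) -> str:
        return "A"  # stand-in for a blind text-only model

    kept = filter_visually_grounded(data, always_a)
    print(f"kept {len(kept)}/{len(data)}: {[ex.video_path for ex in kept]}")
    # prints: kept 1/2: ['v2.mp4']
```

With a sampling-based text-only model, raising `trials` makes the filter stricter, since a single correct blind guess is enough to flag an example as answerable without the video.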
Isaac Fino retweeted
Zhuofeng Li @zhuofengli96475
🚀 OpenResearcher paper is finally released! 🔥
We explore how to synthesize long-horizon research trajectories for deep-research agents: fully offline, scalable, and low-cost, without relying on live web APIs.
📄 huggingface.co/papers/2603.20…
🧩 Two key ideas:
Offline Corpus: one-time bootstrapping seeds 10K gold passages + 15M-doc FineWeb corpus. 📚
Explicit Browsing Primitives: just 3 ops (search / open / find). The agent learns not just what to retrieve, but how to inspect docs and localize evidence at multiple scales. 🔎
📊 Results:
54.8% on BrowseComp-Plus with our 30B-A3B, #1 open-source under the same search engine setup, beating much larger models like GPT-4.1, Claude-Opus-4, Gemini-2.5-Pro, and DeepSeek-R1.
💡 Insights:
Beyond accuracy, we dissect deep research pipeline design, from data filtering and agent configuration to retrieval accuracy dynamics (RQ1-RQ5).
Try it yourself:
🛠️ Code: github.com/TIGER-AI-Lab/O…
🤗 Models & data: huggingface.co/collections/TI…
🚀 Demo: huggingface.co/spaces/OpenRes…
#llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT
[4 images]
Quoting Dongfu Jiang @DongfuJiang

🚀 Introducing OpenResearcher: a fully offline pipeline for synthesizing 100+ turn deep-research trajectories. No search/scrape APIs, no rate limits, no nondeterminism.
💡 We use GPT-OSS-120B + a local retriever + a 10T-token corpus to generate long-horizon tool-use traces (search → open → find) that look like real browsing, but are free + reproducible.
📈 The payoff: SFT on these trajectories turns Nemotron-3-Nano-30B-A3B from 20.8% → 54.8% accuracy on BrowseComp-Plus (+34.0).
🧩 What makes it work?
🔎 Offline corpus = 15M FineWeb docs + 10K “gold” passages (bootstrapped once)
🧰 Explicit browsing primitives = better evidence-finding than “retrieve-and-read”
🎯 Rejection sampling = keep only successful long-horizon traces
🧵 And we’re releasing everything:
✅ code + search engine + corpus recipe
✅ 96K-ish trajectories + eval logs
✅ trained models + live demo
👨‍💻 GitHub: github.com/TIGER-AI-Lab/O…
🤗 Models & data: huggingface.co/collections/TI…
🚀 Demo: huggingface.co/spaces/OpenRes…
🔎 Eval logs: huggingface.co/datasets/OpenR…
#llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT

11 replies · 60 retweets · 310 likes · 45.9K views
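The three browsing primitives named in these posts (search / open / find) can be illustrated with a toy in-memory browser. This is a minimal sketch assuming a plain dict corpus and keyword-overlap ranking in place of the real local retriever; `ToyBrowser` and its method signatures are hypothetical and are not the OpenResearcher API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercased word tokens; a crude stand-in for a real retriever's index."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class ToyBrowser:
    corpus: Dict[str, str]  # doc_id -> full document text
    current: str = ""       # text of the most recently opened document

    def search(self, query: str, k: int = 3) -> List[Tuple[str, int]]:
        """Return up to k (doc_id, score) pairs, scored by tokens shared with the query."""
        q = _tokens(query)
        scored = [(doc_id, len(q & _tokens(text))) for doc_id, text in self.corpus.items()]
        scored = [s for s in scored if s[1] > 0]
        scored.sort(key=lambda s: (-s[1], s[0]))  # best score first, ties broken by doc_id
        return scored[:k]

    def open(self, doc_id: str) -> str:
        """Load a document so find() can inspect it; raises KeyError for unknown ids."""
        self.current = self.corpus[doc_id]
        return self.current

    def find(self, pattern: str, window: int = 30) -> List[str]:
        """Return a snippet around each case-insensitive literal match in the open document."""
        snippets = []
        for m in re.finditer(re.escape(pattern), self.current, flags=re.IGNORECASE):
            start = max(0, m.start() - window)
            end = min(len(self.current), m.end() + window)
            snippets.append(self.current[start:end])
        return snippets


if __name__ == "__main__":
    browser = ToyBrowser({
        "d1": "BrowseComp-Plus is a benchmark for deep research agents.",
        "d2": "FineWeb is a large corpus of web documents.",
    })
    hits = browser.search("deep research benchmark")   # search: pick candidate docs
    browser.open(hits[0][0])                           # open: commit to the top hit
    print(hits, browser.find("benchmark", window=10))  # find: localize evidence inside it
```

The split mirrors the idea in the posts: `search` narrows the corpus to candidates, `open` commits to one document, and `find` localizes evidence inside it instead of reading the whole page.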
Isaac Fino retweeted
TianqiaoChen @tianqiao_chen
Conversation is easy for AI. Solving real problems is not.
Real problems in science, finance, and engineering require something very different:
• long reasoning chains
• structured exploration
• verification at every step
That’s the motivation behind MiroThinker.
Today we’re releasing the next generation of our research agent models: MiroThinker-1.7 and MiroThinker-H1.
Instead of scaling conversations, we focused on scaling effective reasoning, improving both reasoning depth and step-level accuracy.
Some highlights:
🧠 Heavy-duty reasoning for long-horizon tasks
🔎 Verification-centric architecture with both local and global checks
🌐 Strong performance on BrowseComp, BrowseComp-ZH, GAIA, and Seal-0
📊 Leading results across scientific and financial evaluation benchmarks
Our long-term goal is simple: build agents that can reason, verify, and solve real problems, not just generate answers.
Proud of the team for pushing this forward.
Explore MiroThinker:
Hugging Face: lnkd.in/gkEAh88G
GitHub: lnkd.in/eJyH4xEM
The MiroMind app integration will roll out in the coming days.
[Image]
10 replies · 11 retweets · 75 likes · 56.2K views
Isaac Fino retweeted
Pan Lu @lupantech
Introducing Eubiota: a multi-agent AI framework for autonomous discovery in the human microbiome. 🧬🤖🧫
👇 Explore the platform: eubiota.ai
Eubiota doesn’t just chat: it plans, uses tools, verifies evidence, and drives end-to-end discovery, from hypothesis to wet-lab validation.
Eubiota achieved 87.7% accuracy on mechanistic reasoning (vs. GPT-5.1 77.3%).
But we went further. We used Eubiota to drive 4 discoveries with experimental validation:
✅ Gene Discovery: Identified the uvr-ruv stress axis by screening 1,945 genes & 10K papers in hours (on 2 GPUs) 🧬⚡
✅ Therapeutics: Designed a microbial therapy that reduced colitis inflammation 🦠💊
✅ Antibiotics: Engineered a cocktail that kills pathogens but spares commensals 🎯🛡️
✅ Metabolites: Discovered novel anti-inflammatory molecules from large human data 🥗🧪
📄 Paper: biorxiv.org/content/10.648…
💻 Code: github.com/lupantech/Eubi…
Try the live app to start your own discovery:
🎮 App: app.eubiota.ai
Huge thanks to fantastic co-lead @YifanGao15, our incredible PIs @james_y_zou @LabSonnenburg, stellar advisors Kerwyn Casey Huang, @YejinChoinka, and the excellent team! 👏
#Eubiota #Microbiome #Inflammation #AI4Sci #AgenticAI #Agent
[3 images]
13 replies · 45 retweets · 151 likes · 30K views
Isaac Fino @GhxIsaac
Claude is down ( •̥́ ˍ •̀ू ). It's a reminder not to over-depend on one coding agent or platform. Maybe MiniMax?
[Image]
0 replies · 0 retweets · 0 likes · 465 views
Isaac Fino retweeted
Dongfu Jiang @DongfuJiang
🚀 Introducing OpenResearcher: a fully offline pipeline for synthesizing 100+ turn deep-research trajectories. No search/scrape APIs, no rate limits, no nondeterminism.
💡 We use GPT-OSS-120B + a local retriever + a 10T-token corpus to generate long-horizon tool-use traces (search → open → find) that look like real browsing, but are free + reproducible.
📈 The payoff: SFT on these trajectories turns Nemotron-3-Nano-30B-A3B from 20.8% → 54.8% accuracy on BrowseComp-Plus (+34.0).
🧩 What makes it work?
🔎 Offline corpus = 15M FineWeb docs + 10K “gold” passages (bootstrapped once)
🧰 Explicit browsing primitives = better evidence-finding than “retrieve-and-read”
🎯 Rejection sampling = keep only successful long-horizon traces
🧵 And we’re releasing everything:
✅ code + search engine + corpus recipe
✅ 96K-ish trajectories + eval logs
✅ trained models + live demo
👨‍💻 GitHub: github.com/TIGER-AI-Lab/O…
🤗 Models & data: huggingface.co/collections/TI…
🚀 Demo: huggingface.co/spaces/OpenRes…
🔎 Eval logs: huggingface.co/datasets/OpenR…
#llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT
[Image]
30 replies · 208 retweets · 1.3K likes · 143.2K views
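The "rejection sampling = keep only successful long-horizon traces" step in this post can be read as a post-hoc filter over generated trajectories. The sketch below assumes each trajectory carries a question id, its tool-call steps, and a final answer, and that "successful" means the normalized answer matches a gold answer; `Trajectory`, `rejection_sample`, and `min_steps` are hypothetical names, and OpenResearcher's actual acceptance criterion may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    question_id: str
    steps: List[str]   # tool calls in order, e.g. ["search: ...", "open: ...", "find: ..."]
    final_answer: str


def normalize(answer: str) -> str:
    """Case-fold and collapse whitespace so trivial formatting differences still match."""
    return " ".join(answer.lower().split())


def rejection_sample(
    trajectories: List[Trajectory], gold: Dict[str, str], min_steps: int = 0
) -> List[Trajectory]:
    """Accept a trajectory only if its answer matches gold and it has >= min_steps steps."""
    accepted = []
    for t in trajectories:
        if t.question_id not in gold:
            continue  # no reference answer: cannot verify, so reject
        if len(t.steps) < min_steps:
            continue  # too short to count as a long-horizon trace under this threshold
        if normalize(t.final_answer) == normalize(gold[t.question_id]):
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    gold = {"q1": "Paris"}
    samples = [
        Trajectory("q1", ["search: capital of France", "open: doc_17", "find: capital"], " paris "),
        Trajectory("q1", ["search: capital of France"], "Lyon"),
    ]
    kept = rejection_sample(samples, gold, min_steps=2)
    print(len(kept), kept[0].steps)
```

Exact match after normalization is the simplest acceptance test; a real pipeline might instead use a judge model or fuzzy matching, which would slot in where the `normalize(...) == ...` comparison sits.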
Isaac Fino @GhxIsaac
Recognized by GPT, aww. Maybe I should be more relaxed and poetic myself?
[2 images]
0 replies · 0 retweets · 1 like · 83 views
Isaac Fino @GhxIsaac
When can we eat natural soda watermelon? Really need it.
0 replies · 0 retweets · 0 likes · 77 views
Isaac Fino @GhxIsaac
@WenhuChen Fine-tuned specifically for the human-computer interaction experience? That's OpenAI's aesthetic in user experience.
0 replies · 0 retweets · 0 likes · 34 views
Isaac Fino @GhxIsaac
Golden Finger the OpenAI
[Image]
0 replies · 0 retweets · 0 likes · 127 views
Isaac Fino @GhxIsaac
Claude is Shannon, Claude is Debussy, Claude is Monet. Aesthetics.
0 replies · 0 retweets · 1 like · 61 views
Isaac Fino @GhxIsaac
I'm curious why a process even exists that allows candidates to pass so-called Machine Learning interviews based entirely on reciting a set of given 'templates.'
0 replies · 0 retweets · 0 likes · 66 views
Isaac Fino @GhxIsaac
No offense to those who really have insight.
0 replies · 0 retweets · 0 likes · 59 views
Isaac Fino @GhxIsaac
Here is an interesting way to tell whether your AI research idea is still up to date: if you open a Chinese short-video app and find that even a senior IC with no AI background can teach you how to use it to improve performance, the idea is already outdated.
1 reply · 0 retweets · 0 likes · 69 views