Ian Fischer

85 posts

@itfische

Ex Google DeepMind Researcher, now cofounder of Poetiq

Miami · Joined August 2007

188 Following · 717 Followers
Ian Fischer@itfische·
Thanks so much @ycombinator for hosting @poetiq_ai on @LightconePod! It was an honor and a pleasure chatting with @garrytan, @harjtaggar, @sdianahu, and @snowmaker about stilts, Humanity's Last Exam, and the bitter lesson!
Y Combinator@ycombinator

.@poetiq_ai is a new startup that recently achieved a major jump on the ARC-AGI benchmark by layering a recursive self-improvement system on top of existing models. In this episode of the @LightconePod, Poetiq's Founder & CEO @itfische joined us to discuss how small teams can build “reasoning harnesses” that outperform base models, what that means for startups, and why automating prompt engineering may be one of the most powerful levers in AI today.

00:00 – Intro
00:40 – What Is Poetiq?
01:07 – Recursive Self-Improvement Explained
02:07 – The Fine-Tuning Trap
02:59 – “Stilts” for LLMs
03:14 – Recursive Self-Improvement vs. Fine-Tuning
05:05 – Taking the Top Spot on ARC-AGI
06:37 – Beating Claude on Humanity’s Last Exam
08:40 – How the Meta-System Works
10:26 – Beyond RL: A New S-Curve
11:32 – Automating Prompt Engineering
13:37 – From 5% to 95% Performance
14:50 – Early Access & Putting Your Agent on Stilts
16:17 – From YC Founder to DeepMind Researcher
18:29 – Advice for Engineers in the AI Era

3 replies · 1 repost · 11 likes · 1.8K views
Ian Fischer reposted
Greg Kamradt@GregKamradt·
Fun to see the Poetiq team publish 5.2 X-High results. If this score holds, their system looks like it handles model swaps well. Due to API infra issues on OpenAI's side, we haven't verified it yet; we're on hold until we get the green light from OAI that X-High is ready for a big test like this.
Poetiq@poetiq_ai

We finally had a moment to run our system with GPT-5.2 X-High on ARC-AGI-2! Using the same Poetiq harness as before, we saw results as high as 75% at under $8 / problem using GPT-5.2 X-High on the full PUBLIC-EVAL dataset. This beats the previous SOTA by ~15 percentage points.

14 replies · 17 reposts · 382 likes · 43.1K views
Poetiq@poetiq_ai·
Poetiq has officially shattered the ARC-AGI-2 SOTA 🚀 @arcprize has verified our results:
- 54% accuracy – first to break the 50% barrier!
- $30.57 / problem – less than half the cost of the previous best!
We are now #1 on the leaderboard for ARC-AGI-2!
[image]
111 replies · 262 reposts · 2.4K likes · 471.4K views
Sean McDonald@seanmcdonaldxyz·
@itfische Hey Ian this is awesome. Surprised you’re not seeing more views on the post. Will share.
1 reply · 0 reposts · 2 likes · 74 views
Ian Fischer@itfische·
@FutureBuckNasty @poetiq_ai @arcprize @METR_Evals Great question! We only optimized our agent for ARC-AGI. Fortunately, writing code to solve ARC-AGI problems doesn't immediately translate into existential risk. Keeping Poetiq agents safe is important to us as well!
0 replies · 0 reposts · 2 likes · 136 views
Ian Fischer reposted
Poetiq@poetiq_ai·
Is more intelligence always more expensive? Not necessarily. Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.
[image]
58 replies · 111 reposts · 941 likes · 502.8K views
Ian Fischer reposted
Kanjun 🐙@kanjun·
Sculptor: the missing UI for Claude Code 🎨 Imagine running 5 Claudes in parallel, safely in containers, while you stay in flow. Then bring their work straight into your IDE to test/edit together. This is how one developer ships like a team. Try it with Sonnet 4.5!
211 replies · 171 reposts · 2K likes · 504.9K views
Ian Fischer reposted
AK@_akhaliq·
Google presents A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

paper page: huggingface.co/papers/2402.09…

Current Large Language Models (LLMs) are not only limited to some maximum context length, but also are not able to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments.

Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task.

We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3-20x.
[image]
7 replies · 138 reposts · 536 likes · 64.2K views
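The three-step loop the abstract describes (paginate into episodes, compress each episode into a gist memory, look up the original page when detail is needed) can be sketched in miniature. Everything below is illustrative, not ReadAgent's actual implementation: the real system prompts an LLM at each step, while this sketch substitutes a trivial `llm` stub, fixed-size sentence pagination, and word-overlap scoring for the lookup decision.

```python
def llm(prompt: str) -> str:
    """Stand-in for a real LLM call: 'summarizes' by keeping the first
    sentence of the text after the TEXT: marker."""
    text = prompt.split("TEXT:\n", 1)[-1]
    return text.split(". ")[0] + "."


def paginate(document: str, sents_per_page: int = 3) -> list[str]:
    """Step 1: split the document into memory episodes ('pages').
    ReadAgent lets the LLM pick pause points; we group fixed sentence runs."""
    sentences = [s for s in document.split(". ") if s]
    return [
        ". ".join(sentences[i:i + sents_per_page]) + "."
        for i in range(0, len(sentences), sents_per_page)
    ]


def gist(pages: list[str]) -> list[str]:
    """Step 2: compress each episode into a short gist memory."""
    return [llm("Summarize in one sentence. TEXT:\n" + p) for p in pages]


def lookup(question: str, pages: list[str], gists: list[str]) -> str:
    """Step 3: decide which page to re-read (here: the gist with the most
    word overlap with the question), then answer from the original,
    uncompressed text of that page."""
    q_words = set(question.lower().split())
    scores = [len(q_words & set(g.lower().split())) for g in gists]
    best = scores.index(max(scores))
    return pages[best]


if __name__ == "__main__":
    doc = "Alice went to Paris. " * 5 + "Bob went to Tokyo. " * 5
    pages = paginate(doc)
    gists = gist(pages)
    # The agent reasons over short gists, but re-reads full pages on demand.
    print(lookup("Which city did Bob visit?", pages, gists))
```

The point of the design is the asymmetry: the working context holds only the short gists, while the full pages stay on disk, so effective context grows with the compression ratio rather than the model's window.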
Ian Fischer reposted
Kuang-Huei Lee@kuanghueilee·
We propose ReadAgent 📖, an LLM agent that reads and reasons over text up to 20x longer than the raw context length. Like humans, it decides where to pause, keeps fuzzy episodic memories of past readings, and looks up details as needed. Just by prompting. read-agent.github.io
[image]
5 replies · 60 reposts · 293 likes · 47.8K views