Martin Szummer

13 posts

@MSzummer

Researcher, Entrepreneur, Lover of the North

London & Stockholm · Joined May 2012

139 Following · 97 Followers

Martin Szummer@MSzummer·17 Feb

Maestro (our agent) is making the most of Sonnet 4.6 already - showing great results!

iGent AI@iGent_AI

We’ve been testing Sonnet 4.6 and it has been potent in our agent, Maestro. Our primary eval is to implement a long list of features across a diverse set of use cases, iteratively across codebases, building on prior work. The result: it completed features faster, cheaper, and with a higher benchmark pass rate.

Martin Szummer retweeted

Claude@claudeai·17 Feb

This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also features a 1M token context window in beta.

Martin Szummer retweeted

Claude@claudeai·24 Nov

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

Martin Szummer@MSzummer·19 Sep

This is a historic moment for us. Our software engineering agent, Maestro, generated solutions for all 12 ICPC World Finals problems — one of the hardest team programming competitions on Earth! We're opening its solutions for the community to validate. Go break them.

iGent AI@iGent_AI

We're excited to share that our agent, Maestro, drafted solutions to all 12 problems from ICPC 2025 World Finals in ~2 hours - using current models, no human involvement, no internet access. We deeply respect the human teams' extraordinary dedication. Note: no official validation

Martin Szummer@MSzummer·12 Aug

Anthropic just made *the* LLM release we have been waiting for - two massive context Claude Sonnet models, handling up to 1M input tokens. These are the models that we used with our Maestro system @iGent_AI to build large, complex software, like a Redis-compatible database written in Rust, written entirely by AI x.com/MSzummer/statu…

Claude@claudeai

Claude Sonnet 4 now supports 1 million tokens of context on the Anthropic API—a 5x increase. Process over 75,000 lines of code or hundreds of documents in a single request.

Martin Szummer@MSzummer·8 Aug

Our agentic software engineering system, Maestro, can build large, complex software: it just finished building a Redis database from first principles in Rust, improving on its safety and performance!

iGent AI@iGent_AI

Tired of toy AI demos that fizzle in production? iGentAI built Ferrous: A Rust Redis-compatible server outperforming Valkey. 35KLOC, 100% test passing, beats benchmarks. Zero human code. Built in 70 hours of part-time direction. Toys vs. tools—here's the proof.

Martin Szummer retweeted

iGent AI@iGent_AI·22 May

You can also find out the full details on Sonnet 4.0 VibeCodeBench performance at igent.ai/sonnet4eval.pdf

Martin Szummer retweeted

iGent AI@iGent_AI·22 May

We've integrated Claude Sonnet 4 into Maestro, and the results are transformative. As our evaluations show, it maintains higher code quality even as project complexity grows. Combined with its new extended thinking capabilities, Maestro delivers an unmatched AI engineering experience. Signup at igent.ai

Martin Szummer retweeted

iGent AI@iGent_AI·22 May

@Anthropic reports Claude 4 models are 65% less likely to use shortcuts on agentic tasks. Our evaluations confirm this—Claude Sonnet 4 consistently understates feature completeness rather than overstate success. This translates to more reliable AI assistance through Maestro.

Martin Szummer retweeted

iGent AI@iGent_AI·22 May

Our VibeCodeBench evaluations affirm what @Anthropic just announced: Claude Sonnet 4 excels at autonomous multi-feature development. We've seen codebase navigation errors drop from 20% to near zero and strategic refactoring that saves ~500k tokens on multi stage, complex tasks. Proud to power Maestro with this breakthrough.

Anthropic@AnthropicAI

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

Martin Szummer retweeted

iGent AI@iGent_AI·26 Feb

"Agency > Intelligence" @karpathy nailed it, and after 18 months building Maestro, we agree. The real AI leap isn’t just smarts—it’s agency: the ability to act independently, turning assistants into partners.

Martin Szummer@MSzummer·14 Jun

3 of us are planning a hike/cycle trip in Scotland following the #ICML2012 workshops (July 2-3-4). Anyone else want to join? Bring boots!
