Theophile Sautory

40 posts

Theophile Sautory

@TSautory

applied x startups @openai views my own

San Francisco, CA Katılım Şubat 2020

204 Takip Edilen152 Takipçiler

Theophile Sautory@TSautory·3d

@Houda_nait codex keeps surprising!

English

Houda Nait El Barj@Houda_nait·3d

I wrote about it on my Substack. I’m starting a new format: SoC (Stream of Consciousness) — short musings on AI, philosophy, and where intelligence is heading. open.substack.com/pub/houdanait/…

English

365

Houda Nait El Barj@Houda_nait·3d

This weekend, Codex stunned me with beauty. It solved a problem with striking elegance, taking a path few human minds would have chosen. Yet the result was undeniably correct. I think AI reasoning may be starting to diverge from human reasoning in a deep but beautiful way.

English

932

Theophile Sautory retweetledi

Windsurf@windsurf·17 Mar

GPT-5.4 mini is now available in Windsurf!

English

273

34.5K

Theophile Sautory@TSautory·18 Şub

@realchillben reviewing all of this was tough

English

Bill Chen@realchillben·17 Şub

accelerate

Bill Chen@realchillben

kicking off codex jobs before i go for a run

Română

8.8K

Theophile Sautory@TSautory·9 Şub

right now, policy > capability

English

Theophile Sautory retweetledi

Evan Goldschmidt@evangoldschmidt·7 Oca

@OpenAI's frontier models sit at the core of our approach. recently we sat down with them to document & share some of our learnings: openai.com/index/tolan/

English

199

Theophile Sautory@TSautory·12 Ara

congrats @theodormarcu and team!!

Windsurf@windsurf

GPT-5.2 is now live in Windsurf! Available for 0x credits for a limited time (paid and trial users). The version bump undersells the jump in intelligence: - Biggest leap for GPT models in agentic coding since GPT-5 - SOTA coding model at its price point - Default in Windsurf

English

404

Theophile Sautory retweetledi

Windsurf@windsurf·12 Ara

English

488

119.6K

Theophile Sautory retweetledi

Noah MacCallum@noahmacca·4 Ara

GPT-5.1-Codex-Max is our best agentic coding model, and is now available in our API. It's a great model, but it’s also easier than ever to integrate. I’ve spent the last two weeks experimenting with our top customers - here's how to get the most out of it 🧵

English

294

30.3K

Theophile Sautory retweetledi

Alexander Embiricos@embirico·4 Ara

We just shipped GPT-5.1-Codex-Max to API! Codex agents everywhere are now faster & smarter, whether you're using Codex in the CLI via API key, or in other tools like @cursor_ai and @code. More about how Cursor updated their harness to make the most of it cursor.com/blog/codex-mod…

English

417

69.4K

Theophile Sautory retweetledi

Consensus@ConsensusNLP·23 Eki

We are thrilled to partner with @OpenAI on this launch. Scholar Agent is powered by GPT-5. @sama, @gdb, @kevinweil and team have built an amazing product that powers the future AI-enabled scientific exploration and discovery. More on our partnership⤵️ openai.com/index/consensu…

English

6.5K

Theophile Sautory retweetledi

shyamal@shyamalanadkat·20 May

getting started with evals doesn't require too much. the pattern that we've seen work for small teams looks a lot like test‑driven development applied to AI engineering: 1/ anchor evals in user stories, not in abstract benchmarks: sit down with your product/design counterpart and list out the concrete things your model needs to do for users. "answer insurance claim questions accurately", "generate SQL queries from natural language". for each, write 10–20 representative inputs and the desired outputs/behaviors. this is your first eval file. 2/ automate from day one, even if it's brittle. resist the temptation to "just eyeball it". well, ok, vibes doesn't scale for too long. wrap your evals in code. you can write a simple pytest that loops over your examples, calls the model, and asserts that certain substrings appear. it's crude, but it's a start. 3/ use the model to bootstrap harder eval data. manually writing hundreds of edge cases is expensive. you can use reasoning models (o3) to generate synthetic variations ("give me 50 claim questions involving fire damage") and then hand‑filter. this speeds up coverage without sacrificing relevance. 4/ don't chase leaderboards; iterate on what fails. when something fails in production, don't just fix the prompt – add the failing case to your eval set. over time your suite will grow to reflect your real failure modes. periodically slice your evals (by input length, by locale, etc.) to see if you're regressing on particular segments. 5/ evolve your metrics as your product matures. as you scale, you'll want more nuanced scoring (semantic similarity, human ratings, cost/latency tracking). build hooks in your eval harness to log these and trend them over time. instrument your UI to collect implicit feedback (did the user click "thumbs up"?) and feed that back into your offline evals. 6/ make evals visible. put a simple dashboard in front of the team and stakeholders showing eval pass rates, cost, latency. use it in stand‑ups. this creates accountability and helps non‑ML folks participate in the trade‑off discussions. finally, treat evals as a core engineering artifact. assign ownership, review them in code review, celebrate when you add a new tricky case. the discipline will pay compounding dividends as you scale.

English

220

27.2K

Theophile Sautory retweetledi

shyamal@shyamalanadkat·26 May

new guide on “exploring model graders for reinforcement fine-tuning” on openai’s platform. h/t @TSautory final result was a high-signal, domain-sensitive grader that guided the model toward better explanations. cookbook.openai.com/examples/reinf…

English

15.5K

Theophile Sautory retweetledi

Bill Chen@realchillben·28 May

I will be hosting this week's @OpenAI build hour this Thurs! This one will be on Image Gen! I will talk about: - cool use cases you can build with it - how you can use it together with other hosted tools - best practices - demo you can use! It will be technical. Builders welcome!

English

225

27.2K

Theophile Sautory retweetledi

Brendan Fortuner@bfortuner·8 May

Ambience's RFT-tuned model boosted ICD-10 coding accuracy 27% over expert clinicians—and it's now spotlighted in @OpenAIDevs' Reinforcement Fine-Tuning use case guide. Excited to see what the community builds with this powerful new customization technique: platform.openai.com/docs/guides/rf…

English

105

45.8K

Theophile Sautory retweetledi

Noah MacCallum@noahmacca·15 Nis

I've been loving GPT-4.1, and it's now my daily driver. We've spent hundreds of hours experimenting to get it performing at its best, and I wanted to share some key things we learned to help you get the most out of these models from day 1. Agentic workflows - This is your new starting point. Just add tools and prompt to make it plan, execute, and verify iteratively until producing a correct answer. Only add multi-agent frameworks once you've squeezed as much as you can here. - Coding agents will be particularly good with this model, especially when paired with our recommended diff output format and diff apply tool. Chain of thought - Prompting for chain of thought can help to break down the problem and improve overall intelligence. - Start with a simple instruction at the end of your prompt, as the model is already quite good at thinking through problems. Then, observe correct and incorrect eval samples to direct further changes to the thinking instructions. - Often, asking the model to spend more time understanding user intent and gathering context can help. Long context - The full context window (now 1M token in put) is much more performant/usable than before. As task complexity increases, performance might still improve by decreasing context size. - Check the guide for suggested delimiters. Improving instruction following - Performance is much better with this model, and you can add significant detail for what to do/not do in different circumstances. - Many model performance issues actually come from inconsistent, conflicting, or underspecified instructions (including conflicting information in examples as well). - Be careful with telling the model to "always" do something, you might not actually want that and it can induce hallucinations, for example if it needs to call a tool with insufficient context. - I find it's helpful to ask ChatGPT Canvas or Cursor for help iterating on prompts cohesively (e.g. asking for conflicting instructions, edge cases, or adding an instruction and updating examples at the same time) Prompt structure - tl;dr check out the recommended template (attached) - Context at or near the bottom seems to be best (and repeating instructions both above and below very long context can actually help too). - Markdown and XML both work great

English

810

114.7K

Theophile Sautory retweetledi

Houda Nait El Barj@Houda_nait·23 Oca

Today we’re launching Operator —our first AI agent that can browse the web and complete tasks for you. Available to all Pro users in the US starting today. We’re partnering with top companies to maximize impact, and we’ve built robust safeguards to keep you safe.

OpenAI@OpenAI

Introduction to Operator & Agents openai.com/index/introduc…

English

184

29.8K

Theophile Sautory retweetledi

Houda Nait El Barj@Houda_nait·17 Oca

Nick Romeo's new book starts with the story of @coreeconteam. When I joined CoreEcon in 2016, we believed in “Better Economics for a Better World” & aimed to reform the Economics curriculum. Today, our textbook is used in ~400 colleges, and constantly integrates latest research🎊

PublicAffairs@public_affairs

Capitalism causes environmental destruction and massive income inequality, impoverishing most at the expense of enriching the few. But there is an alternative. Check out Nick Romeo's new book THE ALTERNATIVE, in bookstores today. hachettebookgroup.com/titles/nick-ro…

English

2.9K

Theophile Sautory@TSautory·2 Ağu

@Houda_nait @Iceman_Hof Dès demain !

Français

Houda Nait El Barj@Houda_nait·31 Tem

@Iceman_Hof Toi bientot @TSautory

Français

Wim Hof@Iceman_Hof·30 Tem

Steady as a rock 🪨 #strong #steady #rock #armbalance #mind #body #connection #mountain #nature #waterfall #naturelove #breathe

English

632

Theophile Sautory retweetledi

Daniele Grattarola@riceasphait·6 Tem

Fresh off the press by @jacobbamberger @SPLTS2 "A topological characterisation of Weisfeiler-Leman equivalence classes" Introduces a dataset of non-isomorphic graphs that cannot be distinguished by message-passing. arxiv.org/abs/2206.11876

English

Theophile Sautory@TSautory·8 Haz

@Houda_nait @Galeries_Laf @dixhuit_prod On ira !!! <3

Euskara

Houda Nait El Barj@Houda_nait·3 Haz

@Galeries_Laf @dixhuit_prod @TSautory

QAM

Groupe Galeries Lafayette@Galeries_Laf·27 May

La Coupole des Galeries Lafayette Paris Haussmann à 360 degrés, comme vous ne l’avez jamais vue ! L’occasion de (re)découvrir ce joyau de l’Art Nouveau édifié en 1912 qui a achevé l’an dernier une rénovation majeure retrouvant ainsi toute sa splendeur originelle. 📸 @dixhuit_prod

Français

Keşfet

@Houda_nait @realchillben @OpenAI @theodormarcu @cursor_ai @code @sama @gdb