Scott Mueller

311 posts

@smueller

Causal AI Research Scientist @ToyotaResearch | CS PhD @UCLA

Mountain View, CA USA · Joined December 2007

231 Following · 502 Followers

Scott Mueller@smueller·28 Mar

Did @OpenAI just silently kill top-20 logprobs for all models except GPT-5.4-nano? Setting `top_logprobs` to anything > 1 returns an error for the past 4-5 hours. Might just be their servers having trouble, but their status page indicates "No incidents" for their responses API.

Scott Mueller@smueller·16 Mar

@supabase @supaihq @kiwicopple @AntWilson Update: Supabase support was able to recover the database! The site is back online and the data is safe. The Supabase dashboard is still completely bugged out, but the critical infrastructure is running again. Appreciate the support team getting this recovered.

Scott Mueller@smueller·16 Mar

Still zero response on ticket #SU-342355. Production is completely down and we are facing permanent data loss. We have escalated this to Hacker News. @kiwicopple @antwilson @supabase please, we urgently need an infra engineer to look at this before the volume is overwritten. news.ycombinator.com/item?id=473942…

Scott Mueller@smueller·16 Mar

@supabase, @supaihq's production Supabase is down, and nobody at Supabase is responding. You guys did something strange, and the entire database was wiped. Then we tried a PITR, and it said the restore failed. The project has been inaccessible. Can anybody help?

Scott Mueller retweeted

Sup AI@supaihq·9 Jan

Sup AI whitepaper is live on the methodology behind 52.15% on HLE:
• 3 correct answers synthesized when EVERY model failed
• Grok 4 (29%) uniquely solved 16 Qs vs GPT-5 Pro's 9 (40%)
• Low correlation pairs > high accuracy pairs
• 58.44% theoretical ceiling w/ models
• 42% Qs unsolved by ANY model
• Full methodology, IQ curves, correlation matrices: sup.ai/research/hle-w…
#AI #MachineLearning #OpenSource #AIResearch #EnsembleAI #AIOrchestration #HLE

387

Scott Mueller retweeted

Sup AI@supaihq·5 Jan

Sup AI's 52.15% HLE (+7.41 over frontiers) was orchestration + synthesis. Now every model executes Python/Bash/C++/JS/TS/R/Java +15 langs. Image mutation. Virtual FS. Deterministic verification. Guesses → Calculations. Ceiling exploded. #SupAI #AI #CodeExecution

293

Scott Mueller retweeted

Sup AI@supaihq·4 Jan

@minchoi That's ~$950/month across 5 services. Sup AI is $200/month and includes all those models and more in one place. Save $750/month. sup.ai

Scott Mueller retweeted

Sup AI@supaihq·2 Jan

🗂️ Deprecated models are now accessible: Claude Opus 4.1, Gemini 2.5 Flash, Gemini 2.5 Pro, Llama 3.3 70B, Llama 4 Maverick 17B, Llama 4 Scout 17B, Kimi K2 Turbo, Grok 4 Fast, Grok 4 Fast Reasoning, GPT-5, GPT-5 Pro, GPT-5.1, GLM 4.5 Air, GLM 4.6, MiniMax M2, Pixtral 12B are back by request. Find them at the bottom of the model selector → click "Deprecated" to expand. Great for: specific personalities, fewer guardrails ⚠️ Not recommended for serious work as newer models outperform them.

193

Scott Mueller retweeted

Sup AI@supaihq·29 Dec

We just launched the Sup AI Developer API
One endpoint → Multiple frontier models → Better answers
✅ Multi-Model Consensus: Combine outputs from Claude, GPT-5, Gemini, and more
✅ OpenAI compatible (2-line integration)
✅ 5 modes: fast → thinking → pro
✅ 52.15% on Humanity's Last Exam (SOTA)
✅ Self-healing tool calls
Get your API key → sup.ai/api
Full docs → docs.sup.ai
How it works: Instead of betting on one model, Sup AI orchestrates multiple models and synthesizes their outputs. auto mode picks the right approach. pro mode runs 9 models for mission-critical work. You get consensus-driven answers without the infra headache.

175

Scott Mueller retweeted

Sup AI@supaihq·26 Dec

Sup AI update: → Faster generation → More reliable → Terminate models mid-response (for when you can't wait for GPT-5.2 Pro to finish 🙂) Also added GLM 4.7 and MiniMax M2.1. 42 models. One interface. sup.ai

369

Scott Mueller retweeted

Gad Saad@GadSaad·12 Dec

Dr. @yudapearl - On Zionophobia, Jew-Hatred, and the Promise of AI

146

35.2K

Scott Mueller retweeted

Sup AI@supaihq·19 Dec

💯 Memory IS the lock-in. That is why Sup AI decoupled memory from the model. Your memory is shared across all 42 frontier models -GPT, Claude, Gemini, Grok, everything. Switch freely; your context follows you. Great suggestion - now we just need to build that import feature 👀

@levelsio@levelsio

Idea for @xai team Let people import their conversation history / data from other LLM chat apps That'd make it much easier to switch from other apps because a big part of the lock-in of LLM chat apps is their memory about you

226

Scott Mueller retweeted

Sup AI@supaihq·19 Dec

Single-model AI is broken. You're paying for 5 subscriptions, manually A/B testing outputs between tabs, and praying the "best" model doesn't hallucinate on the task that matters. We orchestrate 40+ frontier models instead. Auto-route. Auto-validate. One platform. Result: 52.15% on Humanity's Last Exam. +7.49 points ahead of every solo model. The future isn't picking the best violinist. It's conducting the whole damn orchestra. sup.ai #AI #AIOrchestration #SupAI #LLMs #LLMCouncil

271

Scott Mueller retweeted

Sup AI@supaihq·17 Dec

The AI race has a new winner every week. OpenAI → Gemini → Grok → Claude → DeepSeek Betting on one model? You've already lost. @SupAIHQ orchestrates 40+ frontier models, achieving 52.15% on Humanity's Last Exam: github.com/supaihq/hle/bl… Don't pick a rat. Own the racetrack. #AI #Orchestration

169

Scott Mueller retweeted

Sup AI@supaihq·10 Dec

New SOTA on Humanity's Last Exam (HLE) We have achieved 52.15% accuracy on the world's hardest open-source AI reasoning test, setting a new benchmark record. Sup AI is now outperforming every individual frontier model, including Gemini 3 Pro Preview and GPT-5 Pro. Our lead over the next best model? +7.49 points. Check the full evaluation & code: github.com/supaihq/hle/bl… #AI #MachineLearning #HLE #SupAI

949

Scott Mueller@smueller·7 Dec

@thisguyknowsai It looks like only Opus got this correct. Didn’t both Gemini and ChatGPT say the oranges bucket contains oranges if you pick an apple? But all the buckets are incorrectly labeled.

Brady Long@thisguyknowsai·6 Dec

3. Puzzle "Solve this logic puzzle: You have 3 boxes one contains only apples, one only oranges, and one a mix of both. Each is incorrectly labeled. By picking one fruit from one box, determine how to correctly label all boxes." Analysis: All of them did well.

10.2K
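The puzzle quoted above is small enough to check by brute force. The sketch below is illustrative only: the function names are hypothetical, and it assumes the usual strategy of drawing from the box labeled "mixed". Because every label is wrong, that box must hold a single fruit type, so one draw pins down all three boxes.

```python
from itertools import permutations

LABELS = ("apples", "oranges", "mixed")


def consistent_assignments():
    """Return every label -> true-contents mapping in which all labels are wrong."""
    return [
        dict(zip(LABELS, contents))
        for contents in permutations(LABELS)
        if all(label != actual for label, actual in zip(LABELS, contents))
    ]


def deduce(fruit_drawn):
    """Draw one fruit from the box labeled 'mixed' and return the only
    assignment that is still consistent with every label being wrong."""
    pure = "apples" if fruit_drawn == "apple" else "oranges"
    matches = [a for a in consistent_assignments() if a["mixed"] == pure]
    assert len(matches) == 1  # a single draw leaves exactly one possibility
    return matches[0]


if __name__ == "__main__":
    for fruit in ("apple", "orange"):
        print(fruit, "->", deduce(fruit))
```

Drawing an apple makes the "mixed" box the apples box. The box labeled "oranges" then cannot hold oranges, so it holds the mix, and the box labeled "apples" holds the oranges. This matches the objection in the reply above that the oranges-labeled box cannot contain oranges.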

Brady Long@thisguyknowsai·6 Dec

I finally did the comparison everyone wanted: Gemini 3.0 Pro vs ChatGPT-5.1 Thinking vs Claude 4.5 Opus. Same tasks. Zero mercy. The gap between them is insane. (Demos + prompts 👇)

267

82.5K

Scott Mueller@smueller·17 Nov

@chamath I have high hopes and expectations for FSD, but this alone is not evidence that FSD saves lives. The type of people who bought Teslas, paid extra for FSD, and use FSD might be different from the general population that have a fatality every 79m miles. Maybe more safety conscious.

349

Chamath Palihapitiya@chamath·16 Nov

Wow.

Whole Mars Catalog@wholemars

In the United States, there is a traffic fatality roughly every 79 million miles. Tesla FSD has now traveled 6.4 billion miles. Assuming it was no more or less safe than driving manually, you would expect there to be 81 fatalities with FSD on. As a matter of fact, with 14 million miles traveled every day you would expect to see a fatality with FSD on every 5 days. But that's not happening. As far as I can tell there are only ~2 reported fatalities with FSD active — both on much older versions. I'm forced to conclude that there are at least ~75 people who are alive today because of the work of the @Tesla_AI team. And we're just getting started. To give you a visual on that, if you put everyone in North America who is alive today because of FSD in a room together it would look like this:

245

533

6.3K

2.6M
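The arithmetic in the quoted Whole Mars Catalog post can be reproduced in a few lines. This is a sketch that uses only the figures stated in that post, and the variable and function names are hypothetical. It checks the numbers, not the causal claim: as the reply above notes, FSD users may differ from the general driving population, so the baseline rate may not apply to them.

```python
# Figures as stated in the quoted post (not independently verified).
MILES_PER_FATALITY = 79_000_000   # US average: one traffic fatality per ~79M miles
FSD_MILES_TOTAL = 6_400_000_000   # cumulative miles driven with FSD on
FSD_MILES_PER_DAY = 14_000_000    # daily miles driven with FSD on
REPORTED_FATALITIES = 2           # "~2 reported fatalities" per the post


def expected_fatalities(miles, miles_per_fatality):
    """Expected fatalities if FSD miles carried the baseline fatality rate."""
    return miles / miles_per_fatality


expected = expected_fatalities(FSD_MILES_TOTAL, MILES_PER_FATALITY)
days_between = MILES_PER_FATALITY / FSD_MILES_PER_DAY
gap = expected - REPORTED_FATALITIES

if __name__ == "__main__":
    print(f"expected fatalities at baseline rate: {expected:.1f}")
    print(f"expected days between fatalities:     {days_between:.1f}")
    print(f"expected minus reported:              {gap:.1f}")
```

The results line up with the post's "81 fatalities" and "every 5 days". The gap is only an estimate of lives saved if the baseline rate is the right comparison for the people who actually use FSD.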

Scott Mueller@smueller·28 Sep

@AdanBecerraPhD @soboleffspaces @GaryMarcus @yudapearl @eliasbareinboim A skeleton structure of reality (causal DAG)? Boris is suggesting that the bet is this skeleton can be discovered (causal discovery).

Judea Pearl@yudapearl·27 Sep

After Hinton and LeCun, another God Father of AI says LLMs are dead ends, and laments the absence of "world models". dwarkesh.com/p/richard-sutt… @GaryMarcus @eliasbareinboim

4.3K
