Singularabbit

179 posts

@Singularabbit

2026 AGI | AI tinkerer & benchmarker 🐰 | Singularity via code

Joined July 2016
142 Following · 26 Followers
Singularabbit
Singularabbit@Singularabbit·
@sama GPT-5.5 is currently the world’s most intelligent model, providing an ample usage quota
0
0
2
57
Sam Altman
Sam Altman@sama·
i keep thinking i want the models to be cheaper/faster more than i want them to be smarter but it seems that just being smarter is still the most important thing
2K
287
10K
674.7K
Singularabbit
Singularabbit@Singularabbit·
@LexnLin Unless Anthropic cuts pricing, OpenAI is reclaiming the throne this year. Still watching Gemini and Grok close the gap too tho
1
0
2
1.2K
Leon Lin
Leon Lin@LexnLin·
did you realize that it's literally over for claude code 😂
33
2
152
21.6K
Singularabbit
Singularabbit@Singularabbit·
@meta_alchemist I just cleaned up the background at first, but damn, there were a ton of other problems. This is super helpful, thanks
1
0
2
1.3K
Meta Alchemist
Meta Alchemist@meta_alchemist·
Codex's app has been super slow for me lately. At first, I thought the problem was Codex itself. It wasn't. After cleaning things up properly, Codex felt roughly 10X faster. 0 slowness.

Before this, I had 8GB of logs built up, and it slowed things down like crazy. Here's the 15-point cleanup system, which worked perfectly for me. It won't delete anything. Copy-paste these 15 bullet points when your Codex starts to slow down:

> it will inspect things first
> back up & archive important files
> and make your Codex blazing fast again.

15 ITEMS TO KEEP CODEX FAST

1. Check what is actually taking space. Inspect sessions, archived sessions, worktrees, archived worktrees, logs, config, and the local state database.
2. Back up the important files first. Back up config, global state, session index, state database, memories, skills, plugins, and automations before changing anything.
3. Check if Codex is open. If Codex is running, only inspect. Apply cleanup after closing it so the local database is not being touched from two places.
4. Find the giant active chats. Look for the biggest active session files. These are often old conversations that are still treated as active history.
5. Archive old non-pinned chats. Move chats older than 7-10 days into archived sessions, unless they are pinned or clearly still current.
6. Keep only recent work active. Your sidebar/history should not be carrying weeks- or months-old execution threads.
7. Use handoff docs instead of massive chats. If an old thread matters, turn it into a handoff doc, archive the thread, and resume in a fresh chat from the doc.
8. Normalize weird paths. On Windows, clean up path mismatches like normal C:\... paths vs extended \\?\C:\... paths.
9. Prune dead config projects. Remove project paths from config that no longer exist or point to temporary folders.
10. Move stale worktrees. Don't keep old Codex worktrees in the hot worktrees folder. Archive them instead of deleting them.
11. Rotate large logs. Move oversized old logs into an archive folder so Codex can recreate fresh ones.
12. Check heavy background processes. Look at Node/dev-server processes. Don't auto-kill them, but close the ones you don't need.
13. Verify the cleanup. Afterward, confirm config still parses, the database opens, active session size dropped, archived sessions increased, and no bad paths remain.
14. Turn this into a weekly script. The cleanup should not be a dramatic one-time rescue mission. Make it repeatable.
15. Make it boring. Weekly maintenance should back up first, archive old sessions, normalize paths, prune config, move stale worktrees, rotate logs, and give you a report.

The biggest lesson for me: giant chats should not become permanent memory. Chats are for execution. Handoff docs are for memory. Archives are for history. Fresh threads are for speed.

P.S. Before doing all this, make comprehensive handoff documents for each active chat, too, with prompts prepared for each to reactivate them after. This will start new chats from the exact places you left off, but at blazing-fast speed. Like this, things simply work perfectly. I even told my Codex to automate these weekly, and it has set it up for every Sunday.

Save this for when you need it, as the Codex app does get heavy as you use it more, especially if you are using many terminals and long sessions a lot.
Meta Alchemist tweet media
29
66
1.1K
199.6K
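The "turn this into a weekly script" advice from the cleanup thread above can be sketched as a small Python script. Everything here is an assumption: the tweet does not document Codex's actual data layout, so `config.json`, `sessions/`, and `logs/` are hypothetical placeholder names for whatever your install really uses. The key property from the thread is preserved: nothing is deleted, files only move into archive folders after a backup.

```python
import shutil
import time
from pathlib import Path


def weekly_cleanup(codex_home: Path, max_age_days: int = 10,
                   max_log_bytes: int = 100 * 1024 * 1024) -> dict:
    """Back up config, archive stale sessions, rotate oversized logs.

    Nothing is deleted: files are moved into archive subfolders.
    Returns a small report dict so each run can be reviewed.
    """
    report = {"backed_up": 0, "archived": 0, "rotated": 0}

    # 1. Back up config and state before touching anything.
    backup = codex_home / "backup"
    backup.mkdir(exist_ok=True)
    for name in ("config.json", "state.db"):  # placeholder filenames
        src = codex_home / name
        if src.exists():
            shutil.copy2(src, backup / name)
            report["backed_up"] += 1

    # 2. Archive session files older than max_age_days (move, not delete).
    sessions = codex_home / "sessions"
    archive = codex_home / "archived_sessions"
    archive.mkdir(exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    if sessions.is_dir():
        for f in sessions.iterdir():
            if f.is_file() and f.stat().st_mtime < cutoff:
                shutil.move(str(f), str(archive / f.name))
                report["archived"] += 1

    # 3. Rotate logs that have grown past the size threshold,
    #    so the app can recreate fresh ones.
    logs = codex_home / "logs"
    log_archive = codex_home / "archived_logs"
    log_archive.mkdir(exist_ok=True)
    if logs.is_dir():
        for f in logs.iterdir():
            if f.is_file() and f.stat().st_size > max_log_bytes:
                shutil.move(str(f), str(log_archive / f.name))
                report["rotated"] += 1

    return report
```

Run it from cron or Task Scheduler every Sunday and eyeball the returned report; only run it while the app is closed, per point 3 of the thread.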
Singularabbit reposted
TestingCatalog News 🗞
TestingCatalog News 🗞@testingcatalog·
XAI 🚨: Voice cloning is now available on xAI Console in the US.

> Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more.

This also means we will see custom voices on Grok soon. I hope they won't be restricted to the US only.
10
19
261
15K
Singularabbit reposted
can
can@marmaduke091·
🚨 Google updated Gemini 3 Flash in arena. It still has the same name, "Gemini 3 Flash". However, output quality is two tiers above it. It could be 3.1, 3.2, or 3.5 Flash; not sure what they'll call it. Its performance is closer to the current 3.1 Pro than the current 3 Flash. Huge upgrade.
can tweet media
45
51
865
196.6K
Singularabbit reposted
ARC Prize
ARC Prize@arcprize·
GPT-5.5 & Opus 4.7 on ARC-AGI-3
- GPT-5.5: 0.43%
- Opus 4.7: 0.18%

We found 3 failure modes:
- True local effect, false world model
- Wrong level of abstraction from training data
- Solved the level, didn’t reinforce the reward

See our full analysis 🧵
ARC Prize tweet media
71
133
1.4K
329.1K
Singularabbit
Singularabbit@Singularabbit·
Interesting update from xAI. Grok 4.3 just dropped on OpenRouter — input prices cut ~40%, output ~60% vs 4.2, and it actually got smarter. Per Artificial Analysis benchmarks, the overall Intelligence Index went up while agentic performance jumped +321 Elo to 1500 on GDPval-AA, passing several top-tier models at a fraction of the cost. Cheaper AND better is rare; most labs trade one for the other. With Grok 4.4 and Grok Build on the horizon, xAI is stacking momentum fast
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.

The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: the largest single-benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains an 81% IFBench score from Grok 4.20 0309 v2.

➤ It gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!

0
0
0
46
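The ~17% expected win rate quoted in the Artificial Analysis thread above follows directly from the 276-point Elo gap, assuming the usual logistic Elo expected-score formula E = 1 / (1 + 10^(diff / 400)). A quick sanity check:

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score (win rate) for the side that is `rating_diff`
    Elo points behind, under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (rating_diff / 400.0))


# Grok 4.3 trails GPT-5.5 (xhigh) by 276 Elo points on GDPval-AA.
print(round(elo_expected_score(276), 2))  # ~0.17, matching the quoted ~17%
```

The same formula also recovers the 321-point jump's meaning: going from 1179 to 1500 roughly flips Grok from underdog to favorite against anything rated in between.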
Singularabbit reposted
OpenRouter
OpenRouter@OpenRouter·
The new Grok-4.3 from @xai is live on OpenRouter! Grok-4.3 releases at a lower price than Grok-4.2, while seeing a large jump in agentic performance: a 321 point increase to 1500 ELO on @ArtificialAnlys GDPval-AA, surpassing other top models despite the lower price.
OpenRouter tweet media
143
239
1.9K
20.6M
Singularabbit reposted
TestingCatalog News 🗞
TestingCatalog News 🗞@testingcatalog·
APPLE 🍎: “AFM Plus 150B Instruct” Apple Foundation Model has been spotted in the internal AFM Playground app. This app is being used internally by Apple employees to test Apple Foundation models. WWDC26 will be hot 🔥
TestingCatalog News 🗞 tweet media (2 images)
MWR@MWRevamped

( #appleinternal ) Apple internally uses an application that looks pretty similar to ChatGPT, named AFM Playground, which uses Apple’s Foundation Models instead. A few more images below

22
32
652
90.2K
Singularabbit
Singularabbit@Singularabbit·
@sama It’ll probably hit 100% the day after tomorrow
0
0
1
110
Singularabbit
Singularabbit@Singularabbit·
OAI’s GoblinGate, turned into a comic. I literally just gave GPT the link and asked it to make a multi-page comic from the article
Singularabbit tweet media (4 images)
1
0
0
165
Tibo
Tibo@thsottiaux·
Send us feature requests for Codex in the form of an Images 2.0-generated image. It makes it easier for Codex to implement if we decide to go for it. Saw some good ones today already that Codex is cooking on.
624
51
2.3K
175.6K
Singularabbit
Singularabbit@Singularabbit·
@iruletheworldmo If Gemini launches with coding capabilities and a super-app, it will be doomsday for GPT and Claude
1
0
2
475
🍓🍓🍓
🍓🍓🍓@iruletheworldmo·
gemini 3.5 will be the first time google truly flex their power. they’re about to framemog jestermax bone smesh their gremlin competitors. did i say that right chat? they’ve finally linked frontier intelligence with agency.
59
23
772
47.8K
Singularabbit
Singularabbit@Singularabbit·
Mistral Medium 3.5 128B came out way better than I expected. 77.6 on SWE-Bench, basically matching Sonnet 4.5 at 77.2, and 91.4 on Telecom, putting it 2nd overall. Airline 72.0 is a bit weak, but Retail 76.1 is actually the highest score in that category. A 128B model going toe to toe with 700B-1000B class models: in terms of parameter efficiency, this is the most impressive result on the whole chart. Banking is a graveyard for everyone, so not counting that. Honestly did not expect Mistral to hold up like this at this size
Mistral Vibe@mistralvibe

Mistral Medium 3.5, a new flagship model in public preview by @MistralAI that merges instruction-following, reasoning, and coding into a single 128B dense model with a 256k context window and configurable reasoning effort. It's a new default model for Mistral Vibe and Le Chat. Released as open weights, under a modified MIT license.

0
0
0
102
Singularabbit reposted
Mistral Vibe
Mistral Vibe@mistralvibe·
Mistral Medium 3.5, a new flagship model in public preview by @MistralAI that merges instruction-following, reasoning, and coding into a single 128B dense model with a 256k context window and configurable reasoning effort. It's a new default model for Mistral Vibe and Le Chat. Released as open weights, under a modified MIT license.
Mistral Vibe tweet media (4 images)
34
78
646
483K