Singularabbit

179 posts

@Singularabbit

2026 AGI | AI tinkerer & benchmarker 🐰 | Singularity via code

Joined July 2016
142 Following · 26 Followers
Singularabbit
Singularabbit@Singularabbit·
@sama GPT-5.5 is currently the world’s most intelligent model, providing an ample usage quota
0
0
2
57
Sam Altman
Sam Altman@sama·
i keep thinking i want the models to be cheaper/faster more than i want them to be smarter but it seems that just being smarter is still the most important thing
2K
287
10K
674.7K
Singularabbit
Singularabbit@Singularabbit·
@LexnLin Unless Anthropic cuts pricing, OpenAI is reclaiming the throne this year. Still watching Gemini and Grok close the gap too tho
1
0
2
1.2K
Leon Lin
Leon Lin@LexnLin·
did you realize that it's literally over for claude code 😂
33
2
152
21.6K
Singularabbit
Singularabbit@Singularabbit·
@meta_alchemist I just cleaned up the background at first, but damn, there were a ton of other problems. This is super helpful, thanks
1
0
2
1.3K
Meta Alchemist
Meta Alchemist@meta_alchemist·
Codex's app has been super slow for me lately. At first, I thought the problem was Codex itself. It wasn't. After cleaning things up properly, Codex felt roughly 10X faster. 0 slowness.

Before this, I had 8GB of logs built up, and it slowed things down like crazy. Here's the 15-point cleanup system, which worked perfectly for me. It won't delete anything. Copy-paste these 15 bullet points when your Codex starts to slow down:

> it will inspect things first
> back up & archive important files
> and make your Codex blazing fast again.

15 ITEMS TO KEEP CODEX FAST

1. Check what is actually taking space. Inspect sessions, archived sessions, worktrees, archived worktrees, logs, config, and the local state database.
2. Back up the important files first. Back up config, global state, session index, state database, memories, skills, plugins, and automations before changing anything.
3. Check if Codex is open. If Codex is running, only inspect. Apply cleanup after closing it so the local database is not being touched from two places.
4. Find the giant active chats. Look for the biggest active session files. These are often old conversations that are still treated as active history.
5. Archive old non-pinned chats. Move chats older than 7-10 days into archived sessions, unless they are pinned or clearly still current.
6. Keep only recent work active. Your sidebar/history should not be carrying weeks- or months-old execution threads.
7. Use handoff docs instead of massive chats. If an old thread matters, turn it into a handoff doc, archive the thread, and resume in a fresh chat from the doc.
8. Normalize weird paths. On Windows, clean up path mismatches like normal C:\... paths vs extended \\?\C:\... paths.
9. Prune dead config projects. Remove project paths from config that no longer exist or point to temporary folders.
10. Move stale worktrees. Don't keep old Codex worktrees in the hot worktrees folder. Archive them instead of deleting them.
11. Rotate large logs. Move oversized old logs into an archive folder so Codex can recreate fresh ones.
12. Check heavy background processes. Look at Node/dev-server processes. Don't auto-kill them, but close the ones you don't need.
13. Verify the cleanup. Afterward, confirm config still parses, the database opens, active session size dropped, archived sessions increased, and no bad paths remain.
14. Turn this into a weekly script. The cleanup should not be a dramatic one-time rescue mission. Make it repeatable.
15. Make it boring. Weekly maintenance should back up first, archive old sessions, normalize paths, prune config, move stale worktrees, rotate logs, and give you a report.

The biggest lesson for me: giant chats should not become permanent memory. Chats are for execution. Handoff docs are for memory. Archives are for history. Fresh threads are for speed.

P.S. Before doing all this, make comprehensive handoff documents for each active chat, too, with prompts prepared for each to reactivate them after. This will start new chats from the exact places you left off, but at blazing-fast speed. Like this, things simply work perfectly. I even told my Codex to automate these weekly, and it has set it up for every Sunday.

Save this for when you need it, as the Codex app does get heavy as you use it more, especially if you are using many terminals and long sessions a lot.
Meta Alchemist tweet media
29
66
1.1K
199.6K
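The "turn this into a weekly script" advice from the cleanup thread above can be sketched as a small Python script. Everything here is an assumption: the tweet does not document Codex's actual data layout, so `config.json`, `sessions/`, and `logs/` are hypothetical placeholder names for whatever your install really uses. The key property from the thread is preserved: nothing is deleted, files only move into archive folders after a backup.

```python
import shutil
import time
from pathlib import Path


def weekly_cleanup(codex_home: Path, max_age_days: int = 10,
                   max_log_bytes: int = 100 * 1024 * 1024) -> dict:
    """Back up config, archive stale sessions, rotate oversized logs.

    Nothing is deleted: files are moved into archive subfolders.
    Returns a small report dict so each run can be reviewed.
    """
    report = {"backed_up": 0, "archived": 0, "rotated": 0}

    # 1. Back up config and state before touching anything.
    backup = codex_home / "backup"
    backup.mkdir(exist_ok=True)
    for name in ("config.json", "state.db"):  # placeholder filenames
        src = codex_home / name
        if src.exists():
            shutil.copy2(src, backup / name)
            report["backed_up"] += 1

    # 2. Archive session files older than max_age_days (move, not delete).
    sessions = codex_home / "sessions"
    archive = codex_home / "archived_sessions"
    archive.mkdir(exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    if sessions.is_dir():
        for f in sessions.iterdir():
            if f.is_file() and f.stat().st_mtime < cutoff:
                shutil.move(str(f), str(archive / f.name))
                report["archived"] += 1

    # 3. Rotate logs that have grown past the size threshold,
    #    so the app can recreate fresh ones.
    logs = codex_home / "logs"
    log_archive = codex_home / "archived_logs"
    log_archive.mkdir(exist_ok=True)
    if logs.is_dir():
        for f in logs.iterdir():
            if f.is_file() and f.stat().st_size > max_log_bytes:
                shutil.move(str(f), str(log_archive / f.name))
                report["rotated"] += 1

    return report
```

Run it from cron or Task Scheduler every Sunday and eyeball the returned report; only run it while the app is closed, per point 3 of the thread.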
Singularabbit reposted
TestingCatalog News 🗞
TestingCatalog News 🗞@testingcatalog·
XAI 🚨: Voice cloning is now available on xAI Console in the US.

> Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more.

This also means we will see custom voices on Grok soon. I hope they won't be restricted to the US only.
10
19
261
15K
Singularabbit reposted
can
can@marmaduke091·
🚨 Google updated Gemini 3 Flash in arena. It still has the same name, "Gemini 3 Flash". However, output quality is two tiers above it. It could be 3.1, 3.2, or 3.5 Flash; not sure what they'll call it. Its performance is closer to the current 3.1 Pro than the current 3 Flash. Huge upgrade.
can tweet media
45
51
865
196.6K
Singularabbit reposted
ARC Prize
ARC Prize@arcprize·
GPT-5.5 & Opus 4.7 on ARC-AGI-3
- GPT-5.5: 0.43%
- Opus 4.7: 0.18%

We found 3 failure modes:
- True local effect, false world model
- Wrong level of abstraction from training data
- Solved the level, didn’t reinforce the reward

See our full analysis 🧵
ARC Prize tweet media
71
133
1.4K
329.1K
Singularabbit
Singularabbit@Singularabbit·
Interesting update from xAI. Grok 4.3 just dropped on OpenRouter — input prices cut ~40%, output ~60% vs 4.2, and it actually got smarter. Per Artificial Analysis benchmarks, the overall Intelligence Index went up while agentic performance jumped +321 Elo to 1500 on GDPval-AA, passing several top-tier models at a fraction of the cost. Cheaper AND better is rare; most labs trade one for the other. With Grok 4.4 and Grok Build on the horizon, xAI is stacking momentum fast
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.

The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.

Key Takeaways:

➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.

➤ Large increase in real-world agentic task performance: the largest single-benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.

➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains an 81% IFBench score from Grok 4.20 0309 v2.

➤ It gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!

0
0
0
46
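The ~17% expected win rate quoted in the Artificial Analysis thread above follows directly from the 276-point Elo gap, assuming the usual logistic Elo expected-score formula E = 1 / (1 + 10^(diff / 400)). A quick sanity check:

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score (win rate) for the side that is `rating_diff`
    Elo points behind, under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (rating_diff / 400.0))


# Grok 4.3 trails GPT-5.5 (xhigh) by 276 Elo points on GDPval-AA.
print(round(elo_expected_score(276), 2))  # ~0.17, matching the quoted ~17%
```

The same formula also recovers the 321-point jump's meaning: going from 1179 to 1500 roughly flips Grok from underdog to favorite against anything rated in between.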
Singularabbit reposted
OpenRouter
OpenRouter@OpenRouter·
The new Grok-4.3 from @xai is live on OpenRouter! Grok-4.3 releases at a lower price than Grok-4.2, while seeing a large jump in agentic performance: a 321 point increase to 1500 ELO on @ArtificialAnlys GDPval-AA, surpassing other top models despite the lower price.
OpenRouter tweet media
143
239
1.9K
20.6M
Singularabbit reposted
TestingCatalog News 🗞
TestingCatalog News 🗞@testingcatalog·
APPLE 🍎: “AFM Plus 150B Instruct” Apple Foundation Model has been spotted in the internal AFM Playground app. This app is being used internally by Apple employees to test Apple Foundation models. WWDC26 will be hot 🔥
TestingCatalog News 🗞 tweet media (2 images)
MWR@MWRevamped

( #appleinternal ) Apple internally uses an application that looks pretty similar to ChatGPT, named AFM Playground, which uses Apple’s Foundation Models instead. A few more images below

22
32
652
90.2K
Singularabbit
Singularabbit@Singularabbit·
@sama It’ll probably hit 100% the day after tomorrow
0
0
1
110
Singularabbit
Singularabbit@Singularabbit·
OAI’s GoblinGate, turned into a comic. I literally just gave GPT the link and asked it to make a multi-page comic from the article
Singularabbit tweet media (4 images)
1
0
0
165
Tibo
Tibo@thsottiaux·
Send us feature requests for Codex in the form of an Images 2.0-generated image. It makes it easier for Codex to implement if we decide to go for it. Saw some good ones today already that Codex is cooking on.
624
51
2.3K
175.6K
Singularabbit
Singularabbit@Singularabbit·
@iruletheworldmo If Gemini launches with coding capabilities and a super-app, it will be doomsday for GPT and Claude
1
0
2
475
🍓🍓🍓
🍓🍓🍓@iruletheworldmo·
gemini 3.5 will be the first time google truly flex their power. they’re about to framemog jestermax bone smesh their gremlin competitors. did i say that right chat? they’ve finally linked frontier intelligence with agency.
59
23
772
47.8K
Singularabbit
Singularabbit@Singularabbit·
Mistral Medium 3.5 128B came out way better than I expected. 77.6 on SWE-Bench, basically matching Sonnet 4.5 at 77.2, and 91.4 on Telecom, putting it 2nd overall. Airline 72.0 is a bit weak, but Retail 76.1 is actually the highest score in that category. A 128B model going toe to toe with 700B-1000B class models: in terms of parameter efficiency, this is the most impressive result on the whole chart. Banking is a graveyard for everyone, so not counting that. Honestly did not expect Mistral to hold up like this at this size
Mistral Vibe@mistralvibe

Mistral Medium 3.5, a new flagship model in public preview by @MistralAI that merges instruction-following, reasoning, and coding into a single 128B dense model with a 256k context window and configurable reasoning effort. It's a new default model for Mistral Vibe and Le Chat. Released as open weights, under a modified MIT license.

0
0
0
102
Singularabbit reposted
Mistral Vibe
Mistral Vibe@mistralvibe·
Mistral Medium 3.5, a new flagship model in public preview by @MistralAI that merges instruction-following, reasoning, and coding into a single 128B dense model with a 256k context window and configurable reasoning effort. It's a new default model for Mistral Vibe and Le Chat. Released as open weights, under a modified MIT license.
Mistral Vibe tweet media (4 images)
34
78
646
483K