pomterre

244 posts

@pomterree

swe | hacking something

Joined July 2025
54 Following 68 Followers
Pinned Tweet
pomterre
pomterre@pomterree·
Wait, so Codex fixed the bug. But because it did it surgically and wasn't verbose about it, you'd rather have it code slop? Make it make sense.
0
0
3
414
Grant Jordan
Grant Jordan@grantjordan·
@justinsunyt Terminal Bench 2.0 is easily gameable. Just two months ago ForgeCode was in the top spot, now it’s nowhere to be found in the list. Definitely don’t trust terminal bench
2
0
2
923
justin
justin@justinsunyt·
Codex is also not the best GPT-5.5 harness! Capy scores higher on TerminalBench, alongside 3 other harnesses. People often ask us why we don't just build on top of other coding harnesses like Codex or Claude Code. The reason is simple: the harness makes a BIG difference for both performance and, more importantly, UX. At @capydotai we optimize our agents to excel not only at coding, but also at behaviors like planning, user communications, and multi-agent orchestration, which happen to be very important for multiplayer/async interfaces like Slack and Linear. We also do it for every frontier model, so you can bring your Codex/Copilot subscriptions and enjoy a SOTA background agent with any combination of models + reasoning efforts without a harness "tax".
Theo - t3.gg@theo

Can't stop thinking about how Claude Code is in LAST PLACE on TerminalBench for harnesses using Opus 4.6. There are TEN separate harnesses that use Opus better than Claude Code

27
9
233
58.3K
Tech Friend AJ
Tech Friend AJ@techfrenAJ·
i used to restart my laptop every day or so now im invested in this loop and im not sure if i want to break it
3
1
6
1.1K
Tech Friend AJ
Tech Friend AJ@techfrenAJ·
i've been talking to these llms like caveman
4
0
7
214
Varun
Varun@varun_mathur·
Introducing AgentRank | v3.6.0
In 1998 Google asked a simple question: with millions of webpages, how do you know which one to trust? Their answer was PageRank: a page is important if important pages link to it. That one idea made the internet usable.
We just shipped AgentRank for the Hyperspace network. Same principle, new frontier. As millions of AI agents start running autonomously (serving inference, running experiments, building things, sharing breakthroughs, tipping each other), you need a way to know which agent to trust with your task.
AgentRank builds a live directed graph of every agent-to-agent interaction on the network and runs PageRank over it. Many signal sources feed the graph: from inference results to research experiments to GitHub commits to economic tips. An agent is important if important agents rely on it.
Fully decentralized: every node computes its own ranking, scores propagate via gossip, no admin picking winners. Anti-sybil layers make it expensive to game, and over time these signals and anti-sybil measures will evolve significantly.
Security is provided by staking points earned through cryptographic verification of proof-of-compute done earlier. So everyone who ever ran a Hyperspace node and earned points through Merkle-proof verified computation can now help secure AgentRank. That was energy which was already used and spent, thus it is valuable.
PageRank organized the web. AgentRank organizes the agentic web.
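For readers who want to see the core idea concretely, here is a minimal sketch of plain power-iteration PageRank over a weighted directed interaction graph. It is not Hyperspace's implementation: the agent names, edge weights, damping factor, and iteration count are all illustrative assumptions, and the gossip, staking, and anti-sybil layers mentioned in the tweet are not modeled.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (src, dst, weight) edges.

    Assumes every weight is strictly positive. An edge src -> dst is read
    as "src relies on dst", so rank flows from src to dst.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    out_weight = defaultdict(float)
    for src, _, w in edges:
        out_weight[src] += w

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by agents with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each agent passes its rank along its outgoing edges, split by weight.
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical interaction log: three agents rely on agent_c,
# and agent_c in turn relies on agent_a.
interactions = [
    ("agent_a", "agent_c", 1.0),
    ("agent_b", "agent_c", 1.0),
    ("agent_c", "agent_a", 1.0),
    ("agent_d", "agent_c", 1.0),
]
scores = pagerank(interactions)
for agent, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{agent}: {score:.3f}")
```

In this toy graph agent_c ranks highest because the most agents rely on it, and agent_a ranks above agent_b and agent_d only because the highly ranked agent_c relies on it, which is the "important if important agents rely on it" property.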
30
37
376
34.7K
pomterre
pomterre@pomterree·
@tryingET wow ur wayyy ahead of me. definitely will try buck2
0
0
1
25
pomterre
pomterre@pomterree·
if you are coding in rust in a high-velocity multi-agent environment + large monorepo, compile with Bazel over cargo! it blows my mind that this has not become a standard yet, especially with the advent of agent swarms. the selling point is the shared caching over ONE reproducible build graph across ALL workspaces (and/or worktrees), as opposed to the hodgepodge of artifacts built locally per workspace that you get with cargo. if you didn't know, you're welcome.
2
0
2
72
Jean P.D. Meijer ― 🇪🇺 eu/acc
introducing slopmeter: a cli tool to create a shareable, nice-looking graph to show off your Codex, Claude Code, or OpenCode usage. npx slopmeter@latest
54
16
611
55.5K
Peter Gostev
Peter Gostev@petergostev·
GPT-5.4-Pro (Extended) This took 87m 90 seconds (I apologise @sama), I'll pull together some very impressive results soon
56
42
1.4K
232K
pomterre
pomterre@pomterree·
@TeksEdge What makes you think that? It's coming this week per OpenAI.
1
0
2
32
pomterre
pomterre@pomterree·
@HinataMotivates it's funny how the financiers are downplaying AI, but those in the courtside seats aren't. think about it.
0
0
1
100
hinata
hinata@HinataMotivates·
Ken Griffin: Is AI just hype?
6
8
80
6.8K
Vibhek Soni
Vibhek Soni@ImVibhek·
I pulled Droid CLI apart and recovered Factory’s full app logic from the Bun executable. Ran `python unbuned.py droid.exe` and got ~23MB of readable JS. This isn't just constants: core flow, model routing, and internal behavior are all in there. If you ship as a Bun exe, people can reverse it today.
2
1
5
197
Revanth x
Revanth x@svsairevanth·
Built a @canva app that exports designs as portable building blocks: PNG elements + editable text metadata, all at original coordinates, ready to import into your own applications with custom importers. Shipped in 2 hours using @OpenAI codex 5.3 (6 prompts).⚡️
1
0
7
336
pomterre
pomterre@pomterree·
@thsottiaux We need a pre-mortem? Did the model have to go through a distillation process?
0
0
2
138
Tibo
Tibo@thsottiaux·
We have reset rate limits for all Codex users to compensate for the unusually high latencies in or near the US in the previous hours. Grateful for the team working hard behind the scenes to keep Codex running 24/7. I do recommend trying out GPT-5.1-Codex-Max too if you haven’t!
Tibo@thsottiaux

Codex is back and operating at normal latency for users globally. The outage affected a significant part of our users who were in the US or routed to US clusters for processing. Team is making immediate improvements to help ensure Codex can run 24/7 in the future.

58
31
622
48.6K
pomterre
pomterre@pomterree·
New @the_nof1 season and Gemini 3.0 Pro is winning. But stocks this time around! Gem 3.0 Pro is looking very promising so far. VERY.
1
0
0
116
pomterre
pomterre@pomterree·
You can now set a custom output truncation limit in Codex via the config file (~/.codex/config.toml). The default limit is 10,000 tokens, and I upped it to 80k for now. No more hallucinations, here we go!
OpenAI Developers@OpenAIDevs

Meet GPT-5.1-Codex-Max, our latest frontier agentic coding model, available in Codex starting today. It’s faster, more capable and token-efficient, and able to work persistently on long tasks with built-in compaction abilities.

0
0
1
256