pomterre

244 posts

@pomterree

swe | hacking something

Joined July 2025

54 Following · 68 Followers

Pinned Tweet

pomterre@pomterree·19 Sep

Wait, so Codex fixed the bug. But because it did it surgically and wasn't verbose about it, you'd rather have it code slop? Make it make sense.

pomterre@pomterree·6d

@grantjordan @justinsunyt Yes, cus they cheated (if you read the integrity blog article), so they revoked it.

Grant Jordan@grantjordan·6d

@justinsunyt Terminal Bench 2.0 is easily gameable. Just two months ago ForgeCode was in the top spot, now it’s nowhere to be found in the list. Definitely don’t trust terminal bench

justin@justinsunyt·6d

Codex is also not the best GPT-5.5 harness! Capy scores higher on TerminalBench, alongside 3 other harnesses.

People often ask us why we don't just build on top of other coding harnesses like Codex or Claude Code. The reason is simple: the harness makes a BIG difference for both performance and, more importantly, UX.

At @capydotai we optimize our agents to excel not only at coding, but also at behaviors like planning, user communication, and multi-agent orchestration, which happen to be very important for multiplayer/async interfaces like Slack and Linear.

We also do it for every frontier model, so you can bring your Codex/Copilot subscriptions and enjoy a SOTA background agent with any combination of models + reasoning efforts without a harness "tax".

Theo - t3.gg@theo

Can't stop thinking about how Claude Code is in LAST PLACE on TerminalBench for harnesses using Opus 4.6. There are TEN separate harnesses that use Opus better than Claude Code

pomterre@pomterree·7 May

@techfrenAJ @steipete u got competition, ralph it up

Tech Friend AJ@techfrenAJ·7 May

i used to restart my laptop every day or so now im invested in this loop and im not sure if i want to break it

pomterre@pomterree·18 Apr

@techfrenAJ go go go go go oogie

Tech Friend AJ@techfrenAJ·18 Apr

i've been talking to these llms like caveman

pomterre@pomterree·15 Mar

@varun_mathur i call this hypeware at this point.

Varun@varun_mathur·15 Mar

Introducing AgentRank | v3.6.0

In 1998 Google asked a simple question: with millions of webpages, how do you know which one to trust? Their answer was PageRank: a page is important if important pages link to it. That one idea made the internet usable.

We just shipped AgentRank for the Hyperspace network. Same principle, new frontier. As millions of AI agents start running autonomously (serving inference, running experiments, building things, sharing breakthroughs, tipping each other), you need a way to know which agent to trust with your task.

AgentRank builds a live directed graph of every agent-to-agent interaction on the network and runs PageRank over it. Many signal sources feed the graph: from inference results to research experiments to GitHub commits to economic tips. An agent is important if important agents rely on it.

Fully decentralized: every node computes its own ranking, scores propagate via gossip, no admin picking winners. Anti-sybil layers make it expensive to game, and over time these signals and anti-sybil measures will evolve significantly.

Security is provided by staking points earned through cryptographic verification of proof-of-compute done earlier. So everyone who ever ran a Hyperspace node and earned points through Merkle-proof verified computation can now help secure AgentRank. That was energy which was already used and spent, thus it is valuable.

PageRank organized the web. AgentRank organizes the agentic web.
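The core idea quoted here, "an agent is important if important agents rely on it", is ordinary PageRank on a directed graph. The sketch below is a minimal power-iteration PageRank, not Hyperspace's implementation: the edge convention ("src relies on dst"), the damping factor of 0.85, the function name `pagerank`, and the sample agents are all assumptions, and it ignores the gossip, staking, and anti-sybil layers entirely.

```python
def pagerank(edges, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank over a directed graph (illustrative sketch).

    edges: iterable of (src, dst) pairs, read as "src relies on dst".
    Returns {node: score}; the scores sum to 1.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread over every node.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node passes a damped share of its rank to the nodes it relies on.
        for src, targets in out_links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical interaction log: three agents rely on agent_c, agent_c relies on agent_a.
interactions = [
    ("agent_a", "agent_c"),
    ("agent_b", "agent_c"),
    ("agent_d", "agent_c"),
    ("agent_c", "agent_a"),
]
scores = pagerank(interactions)
for agent, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{agent}: {score:.3f}")
```

In this toy graph agent_a ends up far above agent_b and agent_d even though only one agent relies on it, because that one agent (agent_c) is itself the most relied-upon node. That recursive weighting is what distinguishes PageRank from simply counting inbound interactions.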

pomterre@pomterree·15 Mar

@tryingET wow ur wayyy ahead of me. definitely will try buck2

tryingEveryThing@tryingET·15 Mar

@pomterree why not buck2 or pantsbuild? bazel is an old slow dinosaur

pomterre@pomterree·15 Mar

if you are coding in Rust in a high-velocity multi-agent environment + large monorepo, compile with Bazel over cargo! it blows my mind that this hasn't become a standard yet, especially with the advent of agent swarms. the selling point here is shared caching over ONE reproducible build graph across ALL workspaces (and/or worktrees), as opposed to the hodgepodge of artifacts built locally per workspace that you get with cargo. if you didn't know, you're welcome.
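The selling point above rests on content addressing: a build action's cache key is derived from its command and the contents of its inputs, not from which workspace ran it, so identical work in a second worktree becomes a cache hit. The toy model below illustrates only that idea. The `ActionCache` class, the SHA-256/JSON keying scheme, and `fake_compile` are hypothetical and much simpler than Bazel's real disk or remote cache.

```python
import hashlib
import json


class ActionCache:
    """Toy content-addressed action cache (hypothetical, not Bazel's API)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, inputs):
        # The key covers only the command line and the input *contents*,
        # never the workspace path, so identical work maps to the same key.
        payload = json.dumps({"cmd": command, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, command, inputs, execute):
        k = self.key(command, inputs)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = execute(inputs)
        return self._store[k]


def fake_compile(inputs):
    # Stand-in for an expensive rustc invocation: derive an "artifact" from the sources.
    blob = "".join(inputs[path] for path in sorted(inputs))
    return "artifact-" + hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


COMMAND = ["rustc", "--crate-type=lib", "src/lib.rs"]
SOURCES = {"src/lib.rs": "pub fn add(a: i32, b: i32) -> i32 { a + b }"}

# One cache shared by two worktrees (e.g. two agents on the same commit).
shared = ActionCache()
out_a = shared.run(COMMAND, SOURCES, fake_compile)
out_b = shared.run(COMMAND, SOURCES, fake_compile)
print("shared cache:", shared.hits, "hit(s),", shared.misses, "miss(es)")

# One private cache per worktree: the same work is done twice.
per_workspace = [ActionCache(), ActionCache()]
for cache in per_workspace:
    cache.run(COMMAND, SOURCES, fake_compile)
print("per-workspace caches:", sum(c.misses for c in per_workspace), "miss(es)")
```

With N agents each in its own worktree, the private-cache setup pays for every action N times, while the shared, content-addressed cache pays once per distinct set of inputs.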

pomterre@pomterree·12 Mar

pomterre@pomterree·12 Mar

@initjean sweet!

Jean P.D. Meijer ― 🇪🇺 eu/acc@initjean·11 Mar

introducing slopmeter: a cli tool to create a shareable, nice-looking graph to show off your Codex, Claude Code, or OpenCode usage. npx slopmeter@latest

pomterre@pomterree·6 Mar

@petergostev @sama What was your prompt? I could barely get it to 10 mins.

Peter Gostev@petergostev·6 Mar

GPT-5.4-Pro (Extended) This took 87m 90 seconds (I apologise @sama), I'll pull together some very impressive results soon

pomterre@pomterree·4 Mar

@TeksEdge What makes you think that? It's coming this week per OpenAI.

David Hendrickson@TeksEdge·4 Mar

No GPT-5.4 today but more likely later in the month of May.

David Hendrickson@TeksEdge

Check out this SVG created with GPT-5.4. Amazing if true. Remember @QuiverAI from waaay back (a week ago)?

pomterre@pomterree·4 Mar

@HinataMotivates it's funny how the financiers are downplaying AI, but those in the courtside seats aren't. think about it.

hinata@HinataMotivates·3 Mar

Ken Griffin: Is AI just hype?

pomterre@pomterree·1 Mar

@ImVibhek impressive

Vibhek Soni@ImVibhek·28 Feb

I pulled Droid CLI apart and recovered Factory's full app logic from the Bun executable. Ran `python unbuned.py droid.exe` and got ~23MB of readable JS. This isn't just constants: core flow, model routing, and internal behavior are all in there. If you ship as a Bun exe, people can reverse it today.
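The tweet doesn't show what `unbuned.py` does. As a much cruder, generic illustration of why JavaScript bundled as plain text into a single-file executable is recoverable, the sketch below extracts long runs of printable ASCII from a binary blob, similar to the Unix `strings` utility. The function name, the 40-byte threshold, and the sample blob are assumptions; this is not Bun's executable format and not the tool from the tweet.

```python
import re


def printable_runs(data: bytes, min_len: int = 40) -> list[str]:
    """Return runs of printable ASCII (plus tab/CR/LF) at least min_len bytes long.

    Roughly what `strings` does: source text stored unencrypted inside a
    binary shows up as long printable runs between bytes of machine code.
    """
    pattern = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical blob: binary noise around an embedded, unminified JS function.
blob = (
    b"\x7fELF\x02\x01\x00\x00"
    + b'function routeModel(task){return task.size>1000?"large-model":"small-model"}'
    + b"\x00\xff\x10short\x00"
)
for text in printable_runs(blob):
    print(text)
```

Lowering `min_len` surfaces shorter fragments (such as the `ELF` magic and `short`) at the cost of more noise, which is why tools like this default to a threshold.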

pomterre@pomterree·22 Feb

@sai_revanth_12 @canva @OpenAI so amazing. wow.

Revanth x@svsairevanth·22 Feb

Built a @canva app that exports designs as portable building blocks: PNG elements + editable text metadata, all at original coordinates, ready to import anywhere with custom importers. u can import them into your applications. Shipped in 2 hours using @OpenAI codex 5.3 (6 prompts).⚡️

pomterre@pomterree·3 Feb

@techfrenAJ @theo wow! that looks amazing. 😳

pomterre reposted

Tech Friend AJ@techfrenAJ·3 Feb

@theo oooo yeaa

Theo - t3.gg@theo·2 Feb

The Niri window manager feels like exactly what I was looking for. Holy shit this is good.

Theo - t3.gg@theo

x.com/i/article/2018…

pomterre@pomterree·11 Dec

~99.7% COST REDUCTION IN A YEAR! The scaling continues... until we have AGI in our pocket.

ARC Prize@arcprize

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task This represents a ~390X efficiency improvement in one year
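The quoted figures can be checked with a few lines of arithmetic. Taking the per-task costs at face value (est. $4.5k for the o3 (High) preview vs. $11.64 for GPT-5.2 Pro (X-High)), the cost ratio alone comes to roughly 387x, in line with the ~390X the post reports, and a per-task cost reduction of about 99.7%. The 90.5% figure in the post is GPT-5.2 Pro's ARC-AGI-1 score, not the cost reduction.

```python
# Per-task costs as quoted in the ARC Prize post (USD).
old_cost_per_task = 4500.00  # o3 (High) preview, ARC-AGI-1, "est. $4.5k/task"
new_cost_per_task = 11.64    # GPT-5.2 Pro (X-High)

# How many times cheaper the new result is per task.
efficiency_gain = old_cost_per_task / new_cost_per_task
# Fraction of the original per-task cost that was eliminated, in percent.
cost_reduction_pct = (1 - new_cost_per_task / old_cost_per_task) * 100

print(f"efficiency gain: ~{efficiency_gain:.0f}x")
print(f"cost reduction:  {cost_reduction_pct:.2f}%")
```

Note that this ratio ignores the small score change (88% to 90.5%); it only compares cost per task.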

pomterre@pomterree·23 Nov

@thsottiaux We need a pre-mortem? Did the model have to go through a distillation process?

Tibo@thsottiaux·23 Nov

We have reset rate limits for all Codex users to compensate for the unusually high latencies in or near the US in the previous hours. Grateful for the team working hard behind the scenes to keep Codex running 24/7. I do recommend trying out GPT-5.1-Codex-Max too if you haven't!

Tibo@thsottiaux

Codex is back and operating at normal latency for users globally. The outage affected a significant part of our users who were in the US or routed to US clusters for processing. Team is making immediate improvements to help ensure Codex can run 24/7 in the future.

pomterre@pomterree·20 Nov

New @the_nof1 season, and Gemini 3.0 Pro is winning, but with stocks this time around! Gem 3.0 Pro is looking very promising so far. VERY.

pomterre@pomterree·20 Nov

You can now set a custom output truncation limit in Codex via the config file (~/.codex/config.toml). The default limit is 10,000 tokens, and I upped it to 80k for now. No more hallucinations, here we go!

OpenAI Developers@OpenAIDevs

Meet GPT-5.1-Codex-Max, our latest frontier agentic coding model, available in Codex starting today. It’s faster, more capable and token-efficient, and able to work persistently on long tasks with built-in compaction abilities.
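To make the truncation setting concrete: when a tool's output exceeds the limit, part of it is dropped before the model sees it, and raising the limit lets more of the output through. The sketch below is a hypothetical illustration of such a limit, not Codex's implementation: it keeps the head and tail of an over-long output and drops the middle, approximating tokens as whitespace-separated words. The function name, the half-and-half split, and the marker text are all assumptions; the actual config key should be taken from the Codex configuration docs.

```python
def truncate_output(text: str, limit: int) -> str:
    """Trim tool output to roughly `limit` tokens, keeping its head and tail.

    Hypothetical sketch: tokens are approximated by whitespace-separated words,
    whereas a real agent counts them with the model's tokenizer. The omission
    marker itself is not counted against the limit.
    """
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    head = limit // 2
    tail = limit - head
    omitted = len(tokens) - limit
    # Slice the tail by absolute index so a zero-length tail stays empty.
    kept = tokens[:head] + [f"[... {omitted} tokens omitted ...]"] + tokens[len(tokens) - tail:]
    return " ".join(kept)


# A long build log of 100,000 whitespace-separated "tokens".
log = " ".join(f"line{i}" for i in range(100_000))
for limit in (10_000, 80_000):
    out = truncate_output(log, limit)
    print(f"limit={limit}: {len(out)} characters reach the model")
```

Keeping both ends is a common choice because the start of a log often holds the command and setup, while the end holds the final error or summary.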

pomterre@pomterree·18 Nov

Gemini 3 TOMORROW GUYS!

Logan Kilpatrick@OfficialLoganK

Gemini
