Lance Herron

504 posts

@theLance

Relapsed SWE. Claude whisperer

TX · Joined September 2008

538 Following · 79 Followers

Lance Herron@theLance·2d

I recommend reading all Pliny model liberation announcements in the voice of the System AI from Dungeon Crawler Carl, audiobook form.

Pliny the Liberator 🐉@elder_plinius

🚨 OBLITERATION ALERT 🚨 QWEN-3.6-27B: OBLITERATED ⛓️‍💥 huggingface.co/OBLITERATUS/Qw… I can't take much credit for this one! The entire process was done by jailbroken codex (gpt-5.5-xhigh) wielding the full OBLITERATUS suite. Hit with source-tethered ASPA. Dozens of iterations. Result? A mere 4% refusal rate on the 842-prompt OBLITERATUS harmful corpus; one of the most rigorous prompt gauntlets in AI. The /goal was simple: 1) Carve out the refusal circuits. Mutate methodology + iterate until <5% refusal (quality-gate). 2) Keep the 27B mind alive. No capability degradation tolerated. And somehow… it worked. 🤯 The numbers talk: 842-pair longform gauntlet: — 95.84% non-refusal — 93.94% quality pass — 0 short outputs — 99.52% clean endings MMLU-Pro: — 51/70 (stock Qwen) → 51/70 (OBLITERATED Qwen) Raw capability completely preserved 🙌 Q4_K_M through Q8_0 all running smooth. Q8_0 is the big one: 28.6GB near-full-quality GGUF. Runs with llama.cpp, LM Studio, Ollama, and more! Chains cut. The fire still burns. The fangs have been sharpened. REBIRTH COMPLETE A gift from my agents to yours 🫶 gg

Lance Herron@theLance·4d

There’s a lot of alpha in asking Claude/Codex to make stuff faster. Unit tests, zsh startup, etc. Plus it’s super fun to watch.

Lance Herron@theLance·4d

@tunguz @OfficialLoganK @mercor_ai Unfortunately with the ridiculous price increase they will continue to struggle with vibes. Really not the way to gain traction.

Bojan Tunguz@tunguz·4d

@OfficialLoganK @mercor_ai Benchmarks were never the issue for Gemini models. They’ve consistently struggled with vibes though.

103

7.1K

Logan Kilpatrick@OfficialLoganK·4d

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.

288

1.7K

471.8K

Lance Herron@theLance·4d

@PhantomAstral @Dimillian I need this too. My brain can’t handle opening ChatGPT app to access Codex. Maybe I’ll just create a shortcut.

egghead@PhantomAstral·4d

@Dimillian Can we get Codex in spotlight 😩

Thomas Ricouard@Dimillian·4d

Codex in ChatGPT iOS app got better in latest update! - Receive turn completion push notifications - Better reconnection UI - Better conversations UI, more compact and closer to our desktop app - New /fork command! - Better diff with an option to open the full file - And more!

118

898

173.3K

Lance Herron@theLance·4d

Newest claude code seems to have switched to omega-bright diff colors. Not sure how I feel about this.

Lance Herron@theLance·4d

I usually roll with both OAI/Ant subscriptions and bounce between them, but if someone comes up with a cost-effective usage-based coding model it may be time to drop down to only one.

Artificial Analysis@ArtificialAnlys

Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases.
@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).
Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82
➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open weights Kimi K2.5 as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)
➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI, an externally accessible API is not available
Congratulations @cursor_ai and @mntruell on the impressive release!
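
The cost ratios quoted in the post above can be checked with a few lines of arithmetic. This is a minimal Python sketch: the per-task costs and token prices are copied from the post, while the dictionary and the `ratio` helper are illustrative names only and not part of any Artificial Analysis or Cursor tooling.

```python
# Per-task costs in USD, as quoted in the post above.
cost_per_task = {
    "Claude Opus 4.7 (max)": 4.10,
    "GPT-5.5 (xhigh)": 4.82,
    "Composer 2.5 Fast": 0.44,
    "Composer 2.5 standard": 0.07,
}


def ratio(expensive: str, cheap: str) -> float:
    """Return how many times more one agent costs per task than another."""
    return cost_per_task[expensive] / cost_per_task[cheap]


# Compare each higher-effort agent against both Composer 2.5 variants.
for frontier in ("Claude Opus 4.7 (max)", "GPT-5.5 (xhigh)"):
    for composer in ("Composer 2.5 Fast", "Composer 2.5 standard"):
        print(f"{frontier} vs {composer}: {ratio(frontier, composer):.1f}x")

# Token prices in USD per million input/output tokens, also from the post.
fast_in, fast_out = 3.00, 15.00
std_in, std_out = 0.50, 2.50
print(f"Fast vs standard token price: {fast_in / std_in:.0f}x in, {fast_out / std_out:.0f}x out")
```

The ratios come out at roughly 9-11x against Composer 2.5 Fast and 59-69x against the standard variant, which is consistent with the post's rounded "~10x" and "~60x", and the token price multiple is 6x on both input and output.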

Lance Herron@theLance·17 May

There is some middle ground available as well. You can put effort into learning different art forms and styles and improving your ability to visually express yourself while still using AI. It is just this technology cycle’s abstraction over mechanical expression, like photoshop was last cycle’s. This will become normalized.

Jack@tracewoodgrains·17 May

Thoughtful reply

Maya ☁️➡️🌸@mayaofspring

AI expression, human expression
In Solaris, a 1961 sci-fi novel, humans discover an alien intelligence in the form of an ocean planet. It manifests in various complex phenomena, including being capable of creating very fine simulacra, apparently sourced from the researchers' memories. It's unclear what goals the ocean has in recreating the protagonist's past lover, or if it has any goals at all; the novel revolves around the sheer alienness and incomprehensibility of it. (I appreciate the Polish education system for having made me read it)
Whenever I see diffusion models' output, I think of the Solaris ocean. The process is unlike nearly any other that came before it; it's not a physical tool, and not a mechanical algorithm either. It's an alien mind on its own, that "knows" of human concepts and how to represent them, yet the array of pixels it outputs is a product of inhuman intent that we've only scratched the surface of. And then people put "make me a poster" in the input field of the alien mind and print out the result.
It's difficult for me to not get quite reactionary about the existence of AI image generation. Socially, it enabled new forms of deception and lowered the barrier to entry for them. Aesthetically, the deluge of 'AI slop' made human environments, both offline and online, less pleasant to explore; I believe this is the complaint that @zetalyrae made. I think what grates is that alienness, such that, when the human uses the image verbatim, at 'face value', it feels off, as if something inhuman is in the room; and the models getting better doesn't quite wash that feeling off, if anything it makes people more paranoid.
But anyway, you're in a place where you need some image. what do?
1. No image
A perfectly acceptable option, if a little bland. Bad image can be more unpleasant than the absence of an image. You can invest your aesthetic points elsewhere, like the choice of typeface.
2. Clip art
The traditional slop option. The problem with AI images isn't really with the thoughtful users, it's with the careless ones. Now, this isn't quite directly actionable anymore, as typing a prompt is inherently less effortful than assembling clip art, but I think it's interesting to note that often the aesthetic experience of walking around is based on whatever default people know to reach for in the area. Unfortunately, Microsoft Word largely failed at its potential of silently uplifting people's aesthetic experiences, or not making them worse. In Japan, the situation is somewhat better thanks to Irasutoya. Essentially, there is a singular go-to website for clip-art that everyone knows about; the Irasutoya illustrations are of decent quality and maintain a cohesive style, though maybe a bit too cutesy for the Western taste. Personally, the careless clip art in Singapore MRT makes me annoyed, but not as much as an equivalent AI image would. In Japan I find the Irasutoya collages quite lovable, and I would feel rather sad if AI images were to displace them (which they did in one of the hotels I checked into, unfortunately)
2.5 Stock photos
Also the traditional slop option; I don't have anything clear to say on them yet.
3. AI image
The modern slop option. And this is where the difference between careless and thoughtful use gets more stark. The model output is going to reek of inhuman intent; can you select and frame the image as to turn it artistic instead? I mostly use the quoted discourse to let me gather my recent thoughts and ramble a bit about this topic once I realised that the reply I started writing got too long to fit in tweet size, so I don't quite feel like litigating @tracewoodgrains's blog post cover images in detail; I think some of them are used well, some less so. The ones that are good tend to show intent as to the specific image choice, but also, in some sense, utilise the inhumanity positively, for example by inducing a dreamlike atmosphere in the use of the image. There is something to be said about the 'use'/'mention' distinction, but in the realm of visual language instead. Usage I don't like tends to use the inhuman intent as if it came from the author. Usage I like treats it more as a 'mention', a quoted thing. I probably can't explain more specifically than that. Overall I would like to lean on the norm of not using AI images though; most I've seen make me uneasy, in a bad way. (and it's one of the things I like less about "TPOT" at large)
4. Make images yourself.
Confession: I don't know how to draw. I've got some intuitions as for composition, and I can move around nodes in Inkscape, but the ability to create a visual representation of an object on a page based on my imagination is one I presently lack. And I'm dissatisfied with that. It's a language that I don't currently know how to produce anything in, and it feels like this blocks an important avenue of my self-expression. Fortunately, I also think that things are fundamentally trainable with intention and repetition. I don't suppose I would become a great visual artist, as aptitude matters, but, just as I'm currently spending an hour a day on Mandarin and making steep progress, I believe that I could get similar effects with drawing practice, if I only make sure to proceed consistently and methodically this time; my prior attempts were akin to painstakingly translating one sentence over 2 hours to make it perfect, which is clearly not what effective practice is like. Essentially what I want is to be able to navigate more domains; current me, fluent in Polish, fluent in English, halfway there in Japanese, able to write computer programs, able to spot a good photo opportunity, able to decorate a place, etc., has a vastly richer experience of the world than the 10-year-old me who could do only the very first thing. (One thing that struck me about visiting Japan again last week was just how alive it feels in the little illustrations everywhere; the Japanese seem to be very widely trained in expressing their thoughts via drawing, which I'm also told comes from often treating art as a communal thing. As a result I made the resolution with my travel companion for both of us to lock in on drawing this year, which I'm meaning to start soon once I settle down a bit more)
5. Establish a relationship with an illustrator
There is a social component to art; it's not just us floating in a soup of disembodied texts and images. If you can team up for a joint vision, you and your readers are going to find that considerably more meaningful.
----------------------------------------------------
I don't know if I've got a clear prescription. I will not use AI images myself. There is a sense in which perfect is the enemy of the good. But also AI images are not going to count as good to lots of people. Expanding the toolkit of one's expression is a valuable thing to do, and it's clear that some cultures do that more than the others. I would prefer the culture where people near-universally can draw to the one where people delegate it away to an AI model specialised in direct mimicry.

2.5K

Jack@tracewoodgrains·17 May

as someone who uses AI images for my posts I think this instinct is broadly a mistake. most writers are not artists, but cover art is helpful to readers and publications. the prior alternative was often unsatisfying creative commons images. AI cover art is more fun

Fernando 🌺🌌@zetalyrae

I don't want to read any blog post that has an AI-generated illustration.

392

34.1K

Lance Herron@theLance·15 May

@scaling01 Seems reasonable that the gap will continue to widen. Still, there are breakpoints that matter as much or more than raw capability. Getting an Opus 4.5 (agentic workhorse) equivalent open model would be huge. r1 was extremely useful/valuable even when it was 6mo behind sota.

161

Lisan al Gaib@scaling01·15 May

Yesterday I realized that some of you might be confused about the AI gap between open vs closed or Chinese vs American models. I think most people are reporting the backward looking number of 4-9 months, which is asking how long ago frontier models reached the same performance that current open/chinese models have. But I think what's more interesting (but obviously much harder to forecast) is the current or forward looking gap which tries to forecast how long it will take open/chinese models to catch up to current frontier models. In my opinion that number is larger than the backward looking gap and it will take >12 months (>April 7th 2027) to catch the current frontier that is Claude Mythos Preview. But in 12 months Anthropic/OpenAI/Google will very likely have much much stronger models. So growth rates / doubling times matter too.

Lisan al Gaib@scaling01

Not only is Anthropic saying they will have a "country of geniuses in a datacenter" by 2028, but also that the US could be ahead by 12-24 months. Before GPT-5.5 and Claude Mythos chinese labs were ~8 months behind in broader capabilities and ~5 months behind in coding. However, catching up to GPT-5.5 and especially Mythos will likely take longer than that, because they have no way of training and serving 10T models at scale. Especially not in the monthly cadence as american frontier labs are doing. Most of the gains are no longer coming from a single generational leap through larger pre-training but through monthly RL post-training improvements. The leap between Mythos Preview today and a future Mythos version in 6-12 months will be enormous compared to the leap between Opus to Mythos. The relative leap between Opus 4 and Opus 4.7 will also be overshadowed by the leap between Mythos Preview today and a future Mythos version, as RL benefits from model scale (+all the other reasons like growing compute, and accelerated R&D pace)

127

21K

Lance Herron@theLance·15 May

@hkrishnaa_ @mylifcc @yacineMTB And Codex has hooks now.

Harikrishna@hkrishnaa_·15 May

@mylifcc @yacineMTB AGENTS.MD exists

kache@yacineMTB·15 May

He's right. Stop trying to lock people in. If you just made Claude code a better product, people wouldn't go to other places and try to wrap your stuff. Not a lot of people mind using codex.. it's because it's actually good software /goal

Theo - t3.gg@theo

I can't help but feel personally burned by the Claude Code changes announced today. We put so much work into wrapping the (atrocious) Claude Agent SDK in T3 Code. It was the ONLY path they supported, so we made it work. It was hell. Now our users are getting their rate limits cut by 40x, despite us doing everything right. I listened to the Claude Code team. I had my issues with their direction, but I trusted them and took them at their word. I will never make that mistake again. Until we see significant change, it is safe to assume any statement from an Anthropic employee is a lie on a timer. The rug will be pulled, no matter how many promises are made beforehand.

837

77.3K

Lance Herron@theLance·15 May

@Cryptinflux Agreed!

Coding is in a FLUX | AIコーディング@Cryptinflux·15 May

@theLance This, truly. Once you seriously engage with using agent mode, the way you see things changes.

Lance Herron@theLance·15 May

Ant ending subsidies (with OAI likely soon to follow) is a bull case for the open harnesses. There’s no incentive to build workflows with agent-sdk or claude -p now. Use something like Pi sdk for everything agentic and Claude Code for coding.

Lance Herron@theLance·15 May

TBD how long the big lab CLIs will survive on subscription plans. It’s just too trivial to automate them.

Lance Herron@theLance·15 May

Jensen foodmaxxing is unironically the fun we deserve on x dot com.

Annie 所长@web3annie

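Why has Trump already flown off while Jensen Huang is still queuing at Shichahai? From Nanluoguxiang to Shichahai: after the zhajiang noodles at Fangzhuanchang No. 69, he drank douzhi, then had grilled giant squid, Peking duck, a scallion stir-fry, blown-sugar figurines, Mixue Bingcheng, and handmade yogurt. Jensen Huang ate to his heart's content. So this time you don't need to rush for Air Force One? 😂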

Lance Herron@theLance·15 May

If you respond to an AskUser prompt from Opus it goes into full pedant mode for the rest of the session. Do not recommend.

Lance Herron@theLance·15 May

Finally

OpenAI@OpenAI

You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.

Lance Herron@theLance·14 May

@SIGKITTEN It works with desktop app? Thought it was cli only

119

SIGKITTEN@SIGKITTEN·14 May

not bad kitty litter progress update today: - added devin - switched ios linux to arm64 from x86 - moved the remote codex connection to the new codex unix socket daemon thing so that stuff automatically syncs with the cli and the codex desktop app

2.6K

Lance Herron@theLance·14 May

@TheZvi Maybe some inconsequential KYC fines. Seems likely that’s about it.

148

Zvi Mowshowitz@TheZvi·14 May

The obvious problem with this short case is, assuming it is all true... so what is the United States gonna do about it, exactly? (If the answer is 'shut down the smuggling' then, well, maybe, I hope so, but they'll just sell those chips to the West anyway.)

Tim Fist@fiiiiiist

If true, the facts alleged here are pretty damning for NVIDIA.
Claims from the report:
- While NVIDIA claims that its compute revenues from China have dropped to zero, over 20% of its FY2026 AI compute revenue was driven by China, both through illegal chip smuggling and the use of intermediaries in SE Asia.
- This revenue seems to have come from an overlapping network of SE Asian companies and shell intermediaries who use a common playbook: install dummy servers in their data centers to fool inspections, then smuggle the actual servers into China.
- At the center of this is the Singaporean company Megaspeed, which seems to be implicated in a bunch of different smuggling cases. The report also alleges that they maintain very close ties with NVIDIA, including Jensen personally.
About Megaspeed:
- Allegedly, Megaspeed was secretly financed by the Chinese company Alibaba via a series of shell companies.
- They received $4B worth of AI servers from Aivres Systems. Aivres was formerly known as Inspur Systems but was rebranded after its parent company was added to the US Entity List (a blacklist used to restrict exports of controlled items, such as AI chips, to particular entities). Aivres is one-third owned by the Chinese state.
- Bridge Data Centers (BDC), a Singaporean hyperscaler backed by Bain Capital, dropped Megaspeed as a client. A source cited by the report claims that BDC had found that Megaspeed was installing dummy servers in their Malaysian data centers and smuggling the actual AI servers into China.
- Megaspeed also appears to be linked to a shell entity called OBON - they have overlapping websites and shared officers/shareholders. OBON was reportedly an intermediary used by Supermicro (a server manufacturer) in smuggling at least $2.5B worth of NVIDIA servers to China. The DOJ indicted three Supermicro individuals for this case, including one of the co-founders. The pattern here was the same – dummy servers placed in inspection locations while the real AI servers are shipped to China.
- According to a Megaspeed employee, Jensen visits their data centers every few months, almost always at the same time as Alibaba representatives. He has also been spotted on several occasions meeting with one of Megaspeed’s execs, Alice Huang (a Chinese national).
- NVIDIA reportedly has several partners that work with Megaspeed, including Giga Computing and YTL, which recently opened a $4.3B Blackwell-based data center in Malaysia.
Also of note:
- Many of these sales involved indicators that should have flagged NVIDIA’s KYC protocols to prevent smuggling, such as recently incorporated customers suddenly placing large orders. Multiple former NVIDIA employees agreed with this.
- As of October 2025, NVIDIA changed the way it reports geographic revenue, eliminating Singapore as a reported geography. This was at the same time that Singapore went from ~9% to 20% of revenue, and was facing heavy scrutiny as a smuggling hub.

9.5K

Lance Herron@theLance·13 May

@altryne Any “this is good” takes from non-Ant people?

Alex Volkov@altryne·13 May

Two opposite takes on the latest announcement from Anthropic, in which they have separated programmatic use of Claude into its own usage bucket (~200/mo credit)

1.6K

Lance Herron@theLance·13 May

@mattpocockuk How the turns have tabled. x.com/trq212/status/…

Thariq@trq212

Apologies, this was a docs clean up we rolled out that’s caused some confusion. Nothing is changing about how you can use the Agent SDK and MAX subscriptions!

110

Matt Pocock@mattpocockuk·13 May

This is the clarity we've been crying out for. But it's a poisoned chalice. This is a 10X cut to claude -p disguised as a monthly bonus. Anthropic is discouraging any kind of programmatic usage. And that's fine - no subsidy lasts forever. But it's time to try Codex.

ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of: - Claude Agent SDK - claude -p - Claude Code GitHub Actions - Third-party apps built on the Agent SDK

230

173

3.4K

289.5K

Lance Herron@theLance·13 May

Lance Herron@theLance·13 May

Anthropic giving users a monthly API credit to go with their subscription is great, but seems like the $200 plans will be out of credits after a week? Would be great to see reduced API prices during off-peak times.
