Minh Le
@minhdoestech

65 posts

ai security @mit, cyber @usarmy / prev, data @afterquery

Austin, TX · Joined January 2025
91 Following · 11 Followers
Minh Le
Minh Le@minhdoestech·
@NousResearch Now with all of this output, I’ll definitely need to move to local models…
0
0
3
908
Nous Research
Nous Research@NousResearch·
Hermes Agent now has multi-agent via the Kanban, new in v0.12.0. Agents claim tasks from a board, work in parallel, and hand off when blocked. You watch progress and unblock from one easy view instead of juggling terminals. We asked it to plan and make this video about itself:
255
455
5.6K
1.4M
Polina Moshenets
Polina Moshenets@MoshenetsPolina·
Looking for a CTO to join SichGate as a technical co‑founder. Prefer AI/ML and/or security engineering background. sichgate.com
Polina Moshenets tweet media
2
1
3
323
Minh Le
Minh Le@minhdoestech·
Not feeling the Opus 4.7 nerf at all, in Claude Code the performance is still insane. Yes it's expensive and usage runs out quick, but it is extremely intelligent when planning and blazes through work at high quality. Also nothing beats this 1M context window...
Minh Le tweet media
0
0
1
74
Minh Le
Minh Le@minhdoestech·
@SeanZCai RL data is not a durable asset, I think model architectures will shift towards persistent online learning or these world-model simulations. We're massively over-allocating into static intelligence when the market will shift towards true novel infrastructures for data synth.
0
0
0
351
Sean Cai
Sean Cai@SeanZCai·
On data markets: A while ago, Anthropic said that they would be spending a billion dollars this year on RL data. This year, that amount will be far exceeded, with good data rarely being turned down over budget concerns. We can expect OpenAI to be of a similar mindset, although the window for banal data projects serviced by the likes of Mercor is rumored to be closing entirely this year. DeepMind, Meta, Microsoft, Amazon, and xAI are known to be N-1 labs who may buy datasets already saturated by the likes of Anthropic, or buy RL environments in light of not having a system like Tundra in Anthropic. The TAM is still 10s of billions if not more, and the raw aggregate spent on data will only continue to increase.

But one must remember what is bought when data is sold, because few today can really differentiate a Mercor/Handshake from a Mechanize/Surge. Data is valuable, to frontier labs, based on how easily it can be used to improve frontier models. To show this capability, it matters whether teams selling data can show how directly it can be used to hillclimb models, how much frontier SOTA models struggle on its benchmarks, and how much trouble they can save the frontier lab in its continual acquisition.

Data sold, therefore, very much resembles selling outcomes rather than an actual reusable product, which is why one must obsess about indexing on the scalable means of producing internal systems that can help end model trainers produce outcomes, rather than fixating on data itself when evaluating RL environment companies. In this way, the TAM of data markets is actually extremely greenfield and growing, because few teams have the sophistication for research services and the scale for on-demand, consistently QA'ed data. It is the semblance of this product with which Mercor was able to overtake Scale, the semblance of this product which many newer upstarts are painting as an argument to chip away at Mercor/Handshake/Surge's lunches.
From the April edition of my State of Data on Substack:
Sean Cai tweet media
26
29
544
65.5K
Minh Le
Minh Le@minhdoestech·
@itsolelehmann I am unsure if we can kill sycophancy with more prompt engineering, it seems to be a model training issue and a fundamental problem with reward signalling. Will give this a try though!
0
0
1
1.7K
Ole Lehmann
Ole Lehmann@itsolelehmann·
POV: claude traveled 6 months into the future and told you exactly how your next move failed.

it's called a premortem. daniel kahneman (nobel prize-winning psychologist behind "thinking fast and slow") called it his single most valuable decision-making technique. google, goldman sachs, and procter & gamble all use it before major launches.

here's the problem it solves. when you ask claude "is this a good plan?" it finds all the reasons to say yes. that's what it was trained to do. so you walk away feeling confident. you execute, and spend weeks / months building on top of that plan. then it blows up. and you realize the problem was obvious in hindsight, you just never stress-tested it because claude told you it was solid.

a premortem fixes this by flipping the frame. instead of asking "what could go wrong?" you tell claude "it's 6 months from now and this is already dead. tell me how it died." that shift turns off claude's optimism because there's nothing to be optimistic about. the premise already says it failed. so claude stops looking for reasons your plan will work and starts explaining how it fell apart.

claude comes back with every way your plan could die, each one with a full failure story and the early warning signs to watch for. then a synthesis pulls it all together:
> which failure is most likely
> which failure is most dangerous
> the single biggest hidden assumption you're making (often the most valuable part)
> a revised version of your plan with the gaps closed

you say "premortem this" and give it your plan. the skill handles the rest.
Ole Lehmann tweet media
141
605
6K
588.8K
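The premortem framing described above boils down to one prompt transformation: assert that the plan already failed, then ask the model to explain how. A minimal sketch as a prompt builder — the function name and template wording here are my own illustration, not the actual skill:

```python
def premortem_prompt(plan: str, horizon: str = "6 months") -> str:
    """Build a premortem prompt: instead of asking whether a plan is good,
    state that it already failed and ask the model to explain how it died.
    (Illustrative template only; wording is hypothetical.)"""
    return (
        f"It is {horizon} from now and the following plan has already failed.\n"
        f"Plan:\n{plan}\n\n"
        "Tell me how it died. For each failure mode, give a full failure "
        "story and the early warning signs to watch for. Then synthesize: "
        "which failure is most likely, which is most dangerous, the single "
        "biggest hidden assumption in the plan, and a revised version of "
        "the plan with the gaps closed."
    )

# The returned string would be sent as the user message to any chat model.
prompt = premortem_prompt("Launch the beta to all users next week.")
```

The key design choice, per the tweet, is that failure is the premise rather than a question, which removes the model's incentive to validate the plan.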
Minh Le
Minh Le@minhdoestech·
@sudoingX I also see good results with Hermes. Fundamentally what do you think the team at @NousResearch is doing differently that allows them to extract alignment and outperform the frontier labs in harness engineering?
0
0
1
464
Sudo su
Sudo su@sudoingX·
if you are running local ai or thinking to start, if i could give you one single piece of advice it is this: choose your agentic harness carefully. it matters more than the model.

i have lost count of how many people have dm'd me saying their local model is "dumb" or "broken" or "not as good as the cloud one." then they switch from openclaw or some other bloated framework to hermes agent and the same model suddenly works. just clean tool calls and the agent doing the thing it was supposed to do.

hermes agent is the best general purpose agent i have used in 2026. drives my single 3090 with qwen 3.6 27b dense q4, drives my dgx spark with nemotron omni q8, and the same harness handles coding, research, video editing, automation, anything you point it at. packed with skills out of the box (browser tools, code, github, jupyter, multimodal, more than i have used yet), full tool calling that holds across long sessions, persistent memory, sub agents.

if you tried local ai once or twice and gave up because it felt half baked, the issue might not have been the model. it might have been the harness wrapping it. swap the harness, run the same model again, and watch what changes.

hermes agent is the one i recommend to everyone running local. and especially to anyone who almost gave up on it.
Sudo su tweet media
Sudo su@sudoingX

most of you don't know how big a deal it is that a single rtx 3090 from 2020 runs qwen 27b dense q4 with 256k context at 40 tok/s, full agentic loops on hermes agent, zero tool call failures. the more i build on this card the more i think nobody really knows how untapped it actually is. the silicon was always capable, the models finally caught up.

125
206
2.2K
184.6K
0xSero
0xSero@0xSero·
6-7 see you there fellow goblins
0xSero tweet media
32
1
260
12.7K
Minh Le
Minh Le@minhdoestech·
@gregpr07 Codex has its use cases - limits are much better if you're on a tight budget for one, also the subscription policy for 3rd party usage is great for leveraging harnesses like Hermes and Claw. I agree though, pure harness to harness, until Codex cracks 1M context, CC is far better
0
0
0
1K
Gregor Zunic
Gregor Zunic@gregpr07·
Who actually uses Codex over Claude Code? Claude Code is just 100x better imo, like the DX is WAY better.
213
0
211
307.8K
Minh Le
Minh Le@minhdoestech·
I switched to Hermes Agent by @NousResearch about 3 days ago and have spent every hour since optimizing this workflow for myself. I've never felt or imagined such alignment from an agent, and this is after months of using harnesses like Claude Code, Cursor, and Codex. So excited to build.
Minh Le tweet media
1
0
1
86
Minh Le
Minh Le@minhdoestech·
@0xSero Ever thought of a Kanban..?
0
0
0
18
0xSero
0xSero@0xSero·
How I work. I typically have 4-8 workspaces:
- autoresearch
- vllm-studio
- whatever i'm doing for work
- blog

I prefer file editor ADEs, I don't want the code to be abstracted away from me.

I run vertical panels for dealing with bugs as I run into them.

For larger work, I have a session which writes tickets and 1 which just does the work. (New session per ticket)

The only apps that have been able to support my style comfortably:
1. Zed
2. Warp
0xSero tweet media
40
25
577
21K
Minh Le
Minh Le@minhdoestech·
@BennettBuhner @0xSero Dude I've literally spent the last 3 days building this out - heavily focusing on persistent sessions (cross-harness handoffs and runtime initialization) and a unified dashboard to track projects. Would be amazing if you OSS this, awesome stuff man!
0
0
0
6
BenIt Pro
BenIt Pro@BennettBuhner·
Yeeeeeee, I wanted something that could use all my inference (as well as free unlimited inference from Nvidia, Cursor (not rly unlimited but a lot), as well as a friend) with ease, ANYWHERE. My homelab runs all my work, and I can easily access it with this anywhere; terminal, agents, and so on, all close to Cursor’s design language as well. I find it to be really fun to use!
1
0
2
425
0xSero
0xSero@0xSero·
1. t3code
2. pi
3. -35k lines of code

still slop but it's becoming more sensible and less of a burden.
7
3
264
26.1K
Minh Le
Minh Le@minhdoestech·
@OpenAI So, mastery of reward signals = goblinminizing
0
0
0
2.9K
Ben Sehl
Ben Sehl@benjaminsehl·
First time I’ve enjoyed taxes. Extremely great to have my @NousResearch Hermes agent organize all my receipts and expenses and send them off to my accountant.
11
3
79
11.4K
0xSero
0xSero@0xSero·
Man, GPT-5.5 ain’t what it used to be. They lobotomised it since launch, it was soooo good
GIF
22
6
319
12.9K
Minh Le
Minh Le@minhdoestech·
gpt 5.5 benchmarks vs mythos preview
Minh Le tweet media
2
0
1
88
Minh Le
Minh Le@minhdoestech·
@OpenAI Congratulations on the release! Excited to play with this.
0
0
0
175
OpenAI
OpenAI@OpenAI·
Introducing GPT-5.5: a new class of intelligence for real work and for powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
2.5K
7K
51.9K
12.9M