Minh Le
@minhdoestech

65 posts

ai security @mit, cyber @usarmy / prev, data @afterquery

Austin, TX · Joined January 2025
91 Following · 11 Followers
Minh Le
Minh Le@minhdoestech·
@NousResearch Now with all of this output, I’ll definitely need to move to local models…
0
0
3
908
Nous Research
Nous Research@NousResearch·
Hermes Agent now has multi-agent via the Kanban, new in v0.12.0. Agents claim tasks from a board, work in parallel, and hand off when blocked. You watch progress and unblock from one easy view instead of juggling terminals. We asked it to plan and make this video about itself:
255
455
5.6K
1.4M
Polina Moshenets
Polina Moshenets@MoshenetsPolina·
Looking for a CTO to join SichGate as a technical co‑founder. Prefer AI/ML and/or security engineering background. sichgate.com
Polina Moshenets tweet media
2
1
3
323
Minh Le
Minh Le@minhdoestech·
Not feeling the Opus 4.7 nerf at all, in Claude Code the performance is still insane. Yes it's expensive and usage runs out quick, but it is extremely intelligent when planning and blazes through work at high quality. Also nothing beats this 1M context window...
Minh Le tweet media
0
0
1
74
Minh Le
Minh Le@minhdoestech·
@SeanZCai RL data is not a durable asset, I think model architectures will shift towards persistent online learning or these world-model simulations. We're massively over-allocating into static intelligence when the market will shift towards true novel infrastructures for data synth.
0
0
0
351
Sean Cai
Sean Cai@SeanZCai·
On data markets: A while ago, Anthropic said that they would be spending a billion dollars this year on RL data. This year, that amount will be far exceeded, with good data rarely being turned down over budget concerns. We can expect OpenAI to be of a similar mindset, although the window for banal data projects serviced by the likes of Mercor is rumored to be closing entirely this year. DeepMind, Meta, Microsoft, Amazon, and xAI are known to be N-1 labs who may buy datasets already saturated by the likes of Anthropic, or buy RL environments in light of not having a system like Tundra in Anthropic. The TAM is still 10s of billions if not more, and the raw aggregate spent on data will only continue to increase.

But one must remember what is bought when data is sold, because few today can really differentiate a Mercor/Handshake from a Mechanize/Surge. Data is valuable, to frontier labs, based on how easily it can be used to improve frontier models. To show this capability, it matters whether teams selling data can show how directly it can be used to hillclimb models, how much frontier SOTA models struggle on its benchmarks, and how much trouble they can save the frontier lab in its continual acquisition.

Data sold, therefore, very much resembles selling outcomes rather than an actual reusable product, which is why one must obsess about indexing on the scalable means of producing internal systems that can help end model trainers produce outcomes, rather than fixating on data itself when evaluating RL environment companies. In this way, the TAM of data markets is actually extremely greenfield and growing, because few teams have the sophistication for research services and the scale for on-demand, consistently QA'ed data. It is the semblance of this product with which Mercor was able to overtake Scale, the semblance of this product which many newer upstarts are painting as an argument to chip away at Mercor/Handshake/Surge's lunches.
From the April edition of my State of Data on Substack:
Sean Cai tweet media
26
29
544
65.5K
Minh Le
Minh Le@minhdoestech·
@itsolelehmann I am unsure if we can kill sycophancy with more prompt engineering, it seems to be a model training issue and a fundamental problem with reward signalling. Will give this a try though!
0
0
1
1.7K
Ole Lehmann
Ole Lehmann@itsolelehmann·
POV: claude traveled 6 months into the future and told you exactly how your next move failed.

it's called a premortem. daniel kahneman (nobel prize-winning psychologist behind "thinking fast and slow") called it his single most valuable decision-making technique. google, goldman sachs, and procter & gamble all use it before major launches.

here's the problem it solves. when you ask claude "is this a good plan?" it finds all the reasons to say yes. that's what it was trained to do. so you walk away feeling confident. you execute, and spend weeks / months building on top of that plan. then it blows up. and you realize the problem was obvious in hindsight, you just never stress-tested it because claude told you it was solid.

a premortem fixes this by flipping the frame. instead of asking "what could go wrong?" you tell claude "it's 6 months from now and this is already dead. tell me how it died." that shift turns off claude's optimism because there's nothing to be optimistic about. the premise already says it failed. so claude stops looking for reasons your plan will work and starts explaining how it fell apart.

claude comes back with every way your plan could die, each one with a full failure story and the early warning signs to watch for. then a synthesis pulls it all together:
> which failure is most likely
> which failure is most dangerous
> the single biggest hidden assumption you're making (often the most valuable part)
> a revised version of your plan with the gaps closed

you say "premortem this" and give it your plan. the skill handles the rest.
Ole Lehmann tweet media
141
605
6K
588.8K
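The premortem framing described above boils down to one prompt transformation: assert that the plan already failed, then ask the model to explain how. A minimal sketch as a prompt builder — the function name and template wording here are my own illustration, not the actual skill:

```python
def premortem_prompt(plan: str, horizon: str = "6 months") -> str:
    """Build a premortem prompt: instead of asking whether a plan is good,
    state that it already failed and ask the model to explain how it died.
    (Illustrative template only; wording is hypothetical.)"""
    return (
        f"It is {horizon} from now and the following plan has already failed.\n"
        f"Plan:\n{plan}\n\n"
        "Tell me how it died. For each failure mode, give a full failure "
        "story and the early warning signs to watch for. Then synthesize: "
        "which failure is most likely, which is most dangerous, the single "
        "biggest hidden assumption in the plan, and a revised version of "
        "the plan with the gaps closed."
    )

# The returned string would be sent as the user message to any chat model.
prompt = premortem_prompt("Launch the beta to all users next week.")
```

The key design choice, per the tweet, is that failure is the premise rather than a question, which removes the model's incentive to validate the plan.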
Minh Le
Minh Le@minhdoestech·
@sudoingX I also see good results with Hermes. Fundamentally what do you think the team at @NousResearch is doing differently that allows them to extract alignment and outperform the frontier labs in harness engineering?
0
0
1
464
Sudo su
Sudo su@sudoingX·
if you are running local ai or thinking to start, if i could give you one single piece of advice it is this: choose your agentic harness carefully. it matters more than the model.

i have lost count of how many people have dm'd me saying their local model is "dumb" or "broken" or "not as good as the cloud one." then they switch from openclaw or some other bloated framework to hermes agent and the same model suddenly works. just clean tool calls and the agent doing the thing it was supposed to do.

hermes agent is the best general purpose agent i have used in 2026. drives my single 3090 with qwen 3.6 27b dense q4, drives my dgx spark with nemotron omni q8, and the same harness handles coding, research, video editing, automation, anything you point it at. packed with skills out of the box (browser tools, code, github, jupyter, multimodal, more than i have used yet), full tool calling that holds across long sessions, persistent memory, sub agents.

if you tried local ai once or twice and gave up because it felt half baked, the issue might not have been the model. it might have been the harness wrapping it. swap the harness, run the same model again, and watch what changes.

hermes agent is the one i recommend to everyone running local. and especially to anyone who almost gave up on it.
Sudo su tweet media
Sudo su@sudoingX

most of you don't know how big a deal it is that a single rtx 3090 from 2020 runs qwen 27b dense q4 with 256k context at 40 tok/s, full agentic loops on hermes agent, zero tool call failures. the more i build on this card the more i think nobody really knows how untapped it actually is. the silicon was always capable, the models finally caught up.

125
206
2.2K
184.6K
0xSero
0xSero@0xSero·
6-7 see you there fellow goblins
0xSero tweet media
32
1
260
12.7K
Minh Le
Minh Le@minhdoestech·
@gregpr07 Codex has its use cases - limits are much better if you're on a tight budget for one, also the subscription policy for 3rd party usage is great for leveraging harnesses like Hermes and Claw. I agree though, pure harness to harness, until Codex cracks 1M context, CC is far better
0
0
0
1K
Gregor Zunic
Gregor Zunic@gregpr07·
Who actually uses Codex over Claude Code? Claude Code is just 100x better imo, like the DX is WAY better.
213
0
211
307.8K
Minh Le
Minh Le@minhdoestech·
I switched to Hermes Agent by @NousResearch about 3 days ago and have spent every hour since optimizing this workflow for myself. I've never felt or imagined such alignment from an agent, and this is after months of using harnesses like Claude Code, Cursor, and Codex. So excited to build.
Minh Le tweet media
1
0
1
86
Minh Le
Minh Le@minhdoestech·
@0xSero Ever thought of a Kanban..?
0
0
0
18
0xSero
0xSero@0xSero·
How I work. I typically have 4-8 workspaces:
- autoresearch
- vllm-studio
- whatever i'm doing for work
- blog

I prefer file editor ADEs, I don't want the code to be abstracted away from me.

I run vertical panels for dealing with bugs as I run into them.

For larger work, I have a session which writes tickets and 1 which just does the work. (New session per ticket)

The only apps that have been able to support my style comfortably:
1. Zed
2. Warp
0xSero tweet media
40
25
577
21K
Minh Le
Minh Le@minhdoestech·
@BennettBuhner @0xSero Dude I've literally spent the last 3 days building this out - heavily focusing on persistent sessions (cross-harness handoffs and runtime initialization) and a unified dashboard to track projects. Would be amazing if you OSS this, awesome stuff man!
0
0
0
6
BenIt Pro
BenIt Pro@BennettBuhner·
Yeeeeeee, I wanted something that could use all my inference (as well as free unlimited inference from Nvidia, Cursor (not rly unlimited but a lot), as well as a friend) with ease, ANYWHERE. My homelab runs all my work, and I can easily access it with this anywhere; terminal, agents, and so on, all close to Cursor’s design language as well. I find it to be really fun to use!
1
0
2
425
0xSero
0xSero@0xSero·
1. t3code
2. pi
3. -35k lines of code

still slop but it's becoming more sensible and less of a burden.
7
3
264
26.1K
Minh Le
Minh Le@minhdoestech·
@OpenAI So, mastery of reward signals = goblinminizing
0
0
0
2.9K
Ben Sehl
Ben Sehl@benjaminsehl·
First time I’ve enjoyed taxes. Extremely great to have my @NousResearch Hermes agent organize all my receipts and expenses and send them off to my accountant.
11
3
79
11.4K
0xSero
0xSero@0xSero·
Man, GPT-5.5 ain’t what it used to be. They lobotomised it since launch, it was soooo good
GIF
22
6
319
12.9K
Minh Le
Minh Le@minhdoestech·
gpt 5.5 benchmarks vs mythos preview
Minh Le tweet media
2
0
1
88
Minh Le
Minh Le@minhdoestech·
@OpenAI Congratulations on the release! Excited to play with this.
0
0
0
175
OpenAI
OpenAI@OpenAI·
Introducing GPT-5.5: a new class of intelligence for real work and for powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
2.5K
7K
51.9K
12.9M