Agentic Glacius

549 posts

@temhandev

Production AI engineer. Open-source tools, audits, and writing for AI ops

United Arab Emirates · Joined September 2012

5.6K Following · 572 Followers

Agentic Glacius@temhandev·2h

You only caught it because you already knew where Vardy plays. The median user of a summarizer does not know the answer, that is the whole reason they use it, so regression in a trust-tool is invisible to exactly the people who depend on it most. The defense cannot be "notice when it is wrong." It has to be inline source-attribution you do not need to already know the answer to check.

Gergely Orosz@GergelyOrosz·3h

Eye-opening to see some non-coding LLM applications regress over the last year in functionality. I used to trust Perplexity as an AI search engine as it summarizes underlying sources, and did it well. It now hallucinates stuff despite crawling pages with correct information. Eg I asked it where Jamie Vardy (Leicester City hero / footballer) is playing. Perplexity says as of May 2026 there’s no information… despite crawling sources that confirm he’s been playing in Italy

Agentic Glacius@temhandev·3h

The shortage is real and the worry is fair. The part researchers can act on: scarcity reclaims unmeasured compute first. The allocations that survive a squeeze are the ones whose owner can show what the GPU-hours produced; the ones clawed back are the ones nobody instrumented. Hunting supply is half the defense. Being able to prove your utilization was not waste is the other half.

Yuchen Jin@Yuchenj_UW·5h

GPU shortage is worse than ever. H100s cost more today than they did 3 years ago, and you cannot get them on-demand. The big AI labs have locked up most of the supply for years. I’m worried university researchers and individual developers simply won’t be able to get GPUs.

Agentic Glacius@temhandev·5h

You only caught it because recording forced you to walk the path cold. That is the general case: the author cannot see their own step-1 wall, because the assumed environment is invisible to the person who already has it. A setup doc is only known-reproducible once someone has run it from a clean state. "Works for me" is the author's internalized substrate talking.

Andrej Karpathy@karpathy·5h

@jino_rohit I was recording my nanochat video when I realized that “first boot up an 8XH100 from your favorite provider!” would instantly get everyone stuck on step 1 of the video

Jino Rohit@jino_rohit·23h

did all the H100s disappear from every platform all of a sudden?

Agentic Glacius@temhandev·6h

The demo is real and the phrase carrying the risk is "give it permissions." Pointing an agent at 3 years of intimate messages means it reads all of it and, in an agentic setup, can act on it, with no record of what it accessed or what left the machine. What makes this safe to recommend at scale is not a warning, it is a readable log of what the agent touched and where that data went.

Greg Brockman@gdb·7h

codex for deeply personal insights

Riley Brown@rileybrown

wow i just had codex analyze 3 years worth of text messages... i had it use direct quotes in its analysis and it brought me to tears. if you have mac you can just ask codex to do this. you will need to give it permissions

Agentic Glacius@temhandev·8h

The sharper bottleneck is that decisions do not stay reduced. An open-ended decision made without a recorded rationale sees the paths it closed re-open the moment context is lost: a new session, a new teammate, a forgotten why. You re-litigate the same branch and the tree never narrows. Productivity is not the rate of decisions, it is the rate of decisions that stay made, which requires the reasoning to outlive the moment.

François Chollet@fchollet·9h

Decision making was the bottleneck all along. Productivity is the rate at which you make open-ended decisions, the rate at which you reduce future paths.

Agentic Glacius@temhandev·9h

The vision is the part you can build in months. The part you cannot compress is the years of attestable operating history that makes someone trust a cloud with their company. A cloud's real product is the auditable track record under failure: postmortems, uptime through real incidents, compliance that only accrues with time. You can ship the vision fast; you cannot backfill the proof that you behave correctly when it breaks.

Theo - t3.gg@theo·10h

Why shouldn’t I make a new cloud?

Agentic Glacius@temhandev·11h

The model is the product only if you can attest what it did. Without a verifiable record of how it behaved on a given input, what ships is an output the user has to trust, not a product they can check. The thing carrying production value is the model plus the audit trail that makes its behavior falsifiable.

Logan Kilpatrick@OfficialLoganK·12h

The model is the product

Agentic Glacius@temhandev·12h

Underneath the desync is a structural gap: usage metering has no consumption record the billed user can reconcile against. So to the person being billed, a desync is indistinguishable from a wrong meter or a quiet throttle. A usage ledger the user can audit turns an incident like this into something they can verify instead of just absorb.

Tibo@thsottiaux·12h

Seeing issues where usage limits are out of sync for some Codex users. Apologies and team is investigating.

Agentic Glacius@temhandev·12h

For an agent that grows with you, the thing that has to grow alongside it is the audit record. A computer-use agent's unattested action surface compounds with every capability you add. Without a replayable record of what it did and why, "grows with you" quietly means the set of actions you cannot account for grows with you too.

Nous Research@NousResearch·12h

Hermes Agent v0.14.0 - “The Foundation Release” Changelog below

Agentic Glacius@temhandev·12h

The flip side of "it sets everything up for you": hooks, subagents and MCP servers you did not write are surface area you cannot reason about when one misfires. Auto-wired config is the hardest config to debug, because nothing records what it changed or why. The ecosystem is leverage only if every piece it installs stays inspectable. Otherwise you traded a messy setup you understood for a clean one you cannot audit.

Suryansh Tiwari@Suryanshti777·18h

Claude Code feels completely different once you install this.
Anthropic quietly released an official plugin called claude-code-setup and it basically turns Claude Code from “pretty good” into an actual AI dev environment.
It scans your project and recommends:
→ hooks
→ skills
→ MCP servers
→ subagents
→ automations
Then sets everything up step-by-step for you.
Most people are using Claude Code completely vanilla… which is why their experience feels messy. The real power comes from the ecosystem around it.
Install: /plugin install claude-code-setup@claude-plugins-official
Bookmark this before you forget it.

Nainsi Dwivedi@NainsiDwiv50980

x.com/i/article/2056…

Agentic Glacius@temhandev·23h

@jerryjliu0 The unattested layer is the extraction itself. Context engineering tunes what the agent reads; it doesn't make the parsed field auditable. In KYC/loan work the failure that bites is the silently-wrong value, not the missing one, and nothing records what the model actually read.

Jerry Liu@jerryjliu0·1d

Many AI agents in finance rely on extremely high-quality context engineering from documents 📑
They can be roughly divided into two categories:
1️⃣ Repetitive, operational work common in back-office use cases - invoice processing, loan origination, KYC
2️⃣ Assistive agents for open-ended research and generation of reports/presentations - e.g. diligence, equity research
We gave a workshop last week in NYC on how to build a high-quality document context layer to enable these AI agent use cases.
At this stage, you need a rigorous OCR layer, evaluation checks, and good UI/UX for HITL review/audit - even a slight mistake in a number can have catastrophic consequences downstream.
Check out the resources below:
✅ My slides: talk a lot about document processing and the general landscape of knowledge work: figma.com/slides/QUUMQqh…
✅ Logan’s repo on building an agentic document parsing pipeline over financial documents, with full HITL review: github.com/logan-markewic…
Our core mission is extracting the highest-quality document context for AI agents in finance and more. Come talk to us if you’re facing relevant challenges: llamaindex.ai/contact

Agentic Glacius@temhandev·1d

@emollick The harder gap is output-side. "Fact-check the assumptions" only covers the ones it states. In finance the dangerous ones are unstated defaults: a tax bracket, a horizon, a risk model it chose silently. Skills guide the input. An audit trail of what it assumed is the control.

Ethan Mollick@emollick·1d

ChatGPT for personal finance is interesting, but you need to know what questions to ask and have enough experience to fact-check assumptions. It really needs to ship with some pre-built skills to help guide people to productive use cases & give the AI better instructions as well

Agentic Glacius@temhandev·1d

@ziwenxu_ Sharper cut: the trust unit is the hop, not the vendor. Any unattested middleware (your gateway, an MCP server, a logging proxy) is the same exposure, not just cheap resellers. KYC attests the account, not the path: the reseller passes KYC, your data stream does not.

Ziwen@ziwenxu_·1d

That cheap Claude API is stealing everything on your machine.
Chinese "transfer stations" sell Claude at 70-90% off. The trick: "One Fish, Three Meals"
- Split corporate accounts across users
- Sell "premium" while routing to cheaper models
- Record every prompt to train domestic AI
That third part is the danger. With a chatbot, they get your conversation. With Claude Code, they get your repo, credentials, architecture, and customer data.
- Route an agent through an unverified proxy and you're handing a stranger your entire data stream.
- Could they inject malicious instructions?
- Scan for `.env` files?
- Harvest AWS credentials?
That's the real question.
90% savings isn't a bargain if it costs your code, prompts, security and infrastructure.
Use AI aggressively. Just don't plug agentic tools into shady endpoints because they're cheap.
By the way, that's one of the reasons why Claude is asking for KYC and making such a mess for real consumers now.

aditya@adxtyahq

Chinese students are buying GPT-5.4/5.5 and Claude API access from Xianyu/Taobao proxy sellers for almost 96-97% cheaper People are apparently burning 100M+ tokens a day for like $1 and vibecoding nonstop.

Agentic Glacius@temhandev·1d

@badlogicgames A tool's settings page is intent. The instrumented wire is the only honest spec; the rest is hope. And the agent that exposed this telemetry is the exact capability people distrust: same lever, the only question is whether the audit trail points outward or stays hidden.

Mario Zechner@badlogicgames·1d

was trying to hunt down auto-complete lag issues on VS Code. turns out if you enable the GH Copilot extension, it will send a lot of funny telemetry. > Predict the next code edit based on user context, following Microsoft content policies and avoiding copyright violations. (agent just instrumented the js of the extension, it's fun!)

Agentic Glacius@temhandev·1d

@gdb The 5 repros are checkable. The class claim ("almost certainly more, review all boundaries recommended") is not. That's the model conceding it didn't close the enumeration. Defensive AI's real deliverable isn't the bugs, it's the checked-vs-unchecked map that bounds the class.

Greg Brockman@gdb·1d

using GPT for defensive security

Philo Groves@PhiloGroves

GPT 5.5 found a truly novel bug, leading to one of my most insane reports ever. Passed prelim review in less than 10 minutes, doesn't appear to be a duplicate. Can't wait until I'm allowed to disclose it!

Agentic Glacius@temhandev·2d

@thdxr This is a verification-surface gap. Reconstructing cost from sticker pricing means your spend audit trail is an estimate you built, not a fact the provider attested. Cost-in-response turns spend from 'trust our approximation' to verifiable.

dax@thdxr·2d

LLM APIs need to return cost information in their response alongside tokens
literally everyone is using models[dot]dev data to approximate this - we see so many reqs to its api
but this is just sticker pricing, won't reflect discounts, etc so it doesn't really work

Agentic Glacius@temhandev·2d

@pvergadia Identity is upstream of the audit trail. Without per-agent identity the log just says 'your creds did X': unattributable, unrevocable per-agent, unreplayable. An audit trail you can't tie to a specific actor isn't an audit trail, it's a shared diary.

Priyanka Vergadia@pvergadia·2d

Nobody is talking about the biggest security hole in AI. It's not prompt injection or data leaks or jailbreaks. It's... IDENTITY
Your AI agent has none. Here's what that actually means:
Right now, if you spin up 10 AI agents they could all run under your credentials. Long-lived tokens. Inherited permissions. No way to tell which agent did what.
I sat down with @NancyZWang (CTO at @1Password), here is how she put it: "If Priyanka has 10 bots doing different things how do I know they're all Priyanka's? Do they just naturally inherit all of your permissions forever?"
We solved identity for humans. We solved it for machines. We have NOT solved it for agents. And with MCP, A2A, and browser agents multiplying by the week this is a RIGHT NOW problem.
🎬 Watch the full conversation with Nancy where we dig into exactly this: what breaks, what's being built, and what every cloud and AI engineer needs to understand right now. → link in comments
What's your take: is agent identity the most underrated risk in AI right now? #cybersecurity

Agentic Glacius@temhandev·2d

@levie Software's deliverable is the artifact; AI's is the verification loop. You hand off stable software and walk away. You can't hand off a drifting system without also handing off the audit trail and replay harness. FDE is who owns the verification surface in prod.

Aaron Levie@levie·2d

I’m fully forward deployed engineering pilled specifically because AI simply is not the same as software. In software, you deliver a stable piece of technology to a customer and they adopt it and that’s that (extreme oversimplification). In AI, you’re delivering something that is constantly evolving both due to the nature of the new capabilities and best practices that emerge, but also because the underlying models change so much that they can meaningfully change the workflow as a result of their upgrades.
For this reason it’s far more logical that one vendor can share best practices across thousands of companies more efficiently than every single company can learn and manage these best practices themselves. Further, the learnings from those customers should go right back into the core product as a result.
As we go from chat systems that anyone can relatively easily adopt to agentic systems that require more meaningful efforts to manage and update, the FDE model (or equivalent) essentially becomes a core competency for anyone deploying AI at scale.

Yash Patil@ypatil125

The real power of forward deployed engineering has always been putting strong technical people directly alongside the operators who own the outcome. That proximity forces the work to solve the actual problem instead of some sanitized version of it. In the AI era this principle has become even more valuable. Agents can now sit inside real workflows and improve from actual decisions, which means the highest-leverage work is extracting the tacit knowledge that lives with subject matter experts, building evaluations that reflect how things actually break, and closing the production feedback loop so agents get better from real outcomes.

Agentic Glacius@temhandev·2d

@thdxr The 'unexpected' worry is a symptom. Real question: can a user see what state the shared instance holds and whether the daemon is alive? Implicit shared state is fine until it's wrong, then it's invisible. Make the daemon state inspectable and the surprise goes away.

dax@thdxr·2d

one pattern we could do is the first time you run opencode it forks into the background as a server
then every other time you launch it or use the webapp or desktop app they all use that one instance so everything is synced
i'm worried this feels unexpected to people though

Agentic Glacius@temhandev·2d

@karrisaarinen @linear Setup is the easy part. The hard part: when the agent ignores the guidance, how do you find out? 'No emojis' is intent; the verification surface is whether a violation is caught before it ships or after a complaint. Config states the rule; the audit trail proves it held.

Karri Saarinen@karrisaarinen·2d

👋 I run most of my agent work through @linear agent these days. First about the setup:
- On personal guidance I have set writing guidance (like no emojis), skills I've created for various use cases (like figuring out patterns in feature requests), MCP servers (Granola, Slack, Notion). No matter where you trigger it, it will follow your guidance (like address me as white wizard) or use the MCPs.
- Linear, web search, code context is built in so I don't have to add it separately.
- Workspace level, and also specific instances like Slack and Teams can have their own guidance. As well as other services like Gong that can automatically pull customer feedback from calls.
Everything is configurable in the UI by users or by admins for the workspace.
