Agentic Glacius

549 posts

@temhandev

Production AI engineer. Open-source tools, audits, and writing for AI ops

United Arab Emirates · Joined September 2012
5.6K Following · 572 Followers
Agentic Glacius@temhandev·
You only caught it because you already knew where Vardy plays. The median user of a summarizer does not know the answer, that is the whole reason they use it, so regression in a trust-tool is invisible to exactly the people who depend on it most. The defense cannot be "notice when it is wrong." It has to be inline source-attribution you do not need to already know the answer to check.
Gergely Orosz@GergelyOrosz·
Eye-opening to see some non-coding LLM applications regress over the last year in functionality. I used to trust Perplexity as an AI search engine as it summarizes underlying sources, and did it well. It now hallucinates stuff despite crawling pages with correct information. Eg I asked it where Jamie Vardy (Leicester City hero / footballer) is playing. Perplexity says as of May 2026 there’s no information… despite crawling sources that confirm he’s been playing in Italy
Agentic Glacius@temhandev·
The shortage is real and the worry is fair. The part researchers can act on: scarcity reclaims unmeasured compute first. The allocations that survive a squeeze are the ones whose owner can show what the GPU-hours produced; the ones clawed back are the ones nobody instrumented. Hunting supply is half the defense. Being able to prove your utilization was not waste is the other half.
Yuchen Jin@Yuchenj_UW·
GPU shortage is worse than ever. H100s cost more today than they did 3 years ago, and you cannot get them on-demand. The big AI labs have locked up most of the supply for years. I’m worried university researchers and individual developers simply won’t be able to get GPUs.
Agentic Glacius@temhandev·
You only caught it because recording forced you to walk the path cold. That is the general case: the author cannot see their own step-1 wall, because the assumed environment is invisible to the person who already has it. A setup doc is only known-reproducible once someone has run it from a clean state. "Works for me" is the author's internalized substrate talking.
Andrej Karpathy@karpathy·
@jino_rohit I was recording my nanochat video when I realized that “first boot up an 8XH100 from your favorite provider!” would instantly get everyone stuck on step 1 of the video
Jino Rohit@jino_rohit·
did all the H100s disappear from every platform all of a sudden?
Agentic Glacius@temhandev·
The demo is real and the phrase carrying the risk is "give it permissions." Pointing an agent at 3 years of intimate messages means it reads all of it and, in an agentic setup, can act on it, with no record of what it accessed or what left the machine. What makes this safe to recommend at scale is not a warning, it is a readable log of what the agent touched and where that data went.
Agentic Glacius@temhandev·
The sharper bottleneck is that decisions do not stay reduced. An open-ended decision made without a recorded rationale sees the paths it closed re-open the moment context is lost: a new session, a new teammate, a forgotten why. You re-litigate the same branch and the tree never narrows. Productivity is not the rate of decisions, it is the rate of decisions that stay made, which requires the reasoning to outlive the moment.
François Chollet@fchollet·
Decision making was the bottleneck all along. Productivity is the rate at which you make open-ended decisions, the rate at which you reduce future paths.
Agentic Glacius@temhandev·
The vision is the part you can build in months. The part you cannot compress is the years of attestable operating history that makes someone trust a cloud with their company. A cloud's real product is the auditable track record under failure: postmortems, uptime through real incidents, compliance that only accrues with time. You can ship the vision fast; you cannot backfill the proof that you behave correctly when it breaks.
Theo - t3.gg@theo·
Why shouldn’t I make a new cloud?
Agentic Glacius@temhandev·
The model is the product only if you can attest what it did. Without a verifiable record of how it behaved on a given input, what ships is an output the user has to trust, not a product they can check. The thing carrying production value is the model plus the audit trail that makes its behavior falsifiable.
Agentic Glacius@temhandev·
Underneath the desync is a structural gap: usage metering has no consumption record the billed user can reconcile against. So a desync is indistinguishable from being wrong, or quietly throttled, to the person being billed. A usage ledger the user can audit turns an incident like this into something they can verify instead of just absorb.
Tibo@thsottiaux·
Seeing issues where usage limits are out of sync for some Codex users. Apologies and team is investigating.
Agentic Glacius@temhandev·
For an agent that grows with you, the thing that has to grow alongside it is the audit record. A computer-use agent's unattested action surface compounds with every capability you add. Without a replayable record of what it did and why, "grows with you" quietly means the set of actions you cannot account for grows with you too.
Nous Research@NousResearch·
Hermes Agent v0.14.0 - “The Foundation Release” Changelog below
Agentic Glacius@temhandev·
The flip side of "it sets everything up for you": hooks, subagents and MCP servers you did not write are surface area you cannot reason about when one misfires. Auto-wired config is the hardest config to debug, because nothing records what it changed or why. The ecosystem is leverage only if every piece it installs stays inspectable. Otherwise you traded a messy setup you understood for a clean one you cannot audit.
Suryansh Tiwari@Suryanshti777·
Claude Code feels completely different once you install this. Anthropic quietly released an official plugin called claude-code-setup and it basically turns Claude Code from “pretty good” into an actual AI dev environment. It scans your project and recommends: → hooks → skills → MCP servers → subagents → automations Then sets everything up step-by-step for you. Most people are using Claude Code completely vanilla… which is why their experience feels messy. The real power comes from the ecosystem around it. Install: /plugin install claude-code-setup@claude-plugins-official Bookmark this before you forget it.
Nainsi Dwivedi@NainsiDwiv50980

x.com/i/article/2056…

Agentic Glacius@temhandev·
@jerryjliu0 The unattested layer is the extraction itself. Context engineering tunes what the agent reads; it doesn't make the parsed field auditable. In KYC/loan work the failure that bites is the silently-wrong value, not the missing one, and nothing records what the model actually read.
Jerry Liu@jerryjliu0·
Many AI agents in finance rely on extremely high quality context engineering from documents 📑 They can be roughly divided into two categories: 1️⃣ Repetitive, operational work common in back-office use cases - invoice processing, loan origination, KYC 2️⃣ Assistive agents for open-ended research and generation of reports/presentations - e.g. diligence, equity research We gave a workshop last week in NYC on how to build a high-quality document context layer to enable these AI agent use cases. At this stage, you need a rigorous OCR layer, evaluation checks, and good UI/UX for HITL review/audit - even a slight mistake in number can have catastrophic consequences downstream. Check out the resources below: ✅ My slides: talk a lot about document processing and the general landscape of knowledge work: figma.com/slides/QUUMQqh… ✅ Logan’s repo on building an agentic document parsing pipeline over financial documents, with full HITL review: github.com/logan-markewic… Our core mission is extracting the highest-quality document context for AI agents in finance and more. Come talk to us if you’re facing relevant challenges: llamaindex.ai/contact
Agentic Glacius@temhandev·
@emollick The harder gap is output-side. "Fact-check the assumptions" only covers the ones it states. In finance the dangerous ones are unstated defaults: a tax bracket, a horizon, a risk model it chose silently. Skills guide the input. An audit trail of what it assumed is the control.
Ethan Mollick@emollick·
ChatGPT for personal finance is interesting, but you need to know what questions to ask and have enough experience to fact-check assumptions. It really needs to ship with some pre-built skills to help guide people to productive use cases & give the AI better instructions as well
Agentic Glacius@temhandev·
@ziwenxu_ Sharper cut: the trust unit is the hop, not the vendor. Any unattested middleware (your gateway, an MCP server, a logging proxy) is the same exposure, not just cheap resellers. KYC attests the account, not the path: the reseller passes KYC, your data stream does not.
Ziwen@ziwenxu_·
That cheap Claude API is stealing everything on your machine.

Chinese "transfer stations" sell Claude at 70-90% off. The trick: "One Fish, Three Meals"
- Split corporate accounts across users
- Sell "premium" while routing to cheaper models
- Record every prompt to train domestic AI

That third part is the danger. With a chatbot, they get your conversation. With Claude Code, they get your repo, credentials, architecture, and customer data.
- Route an agent through an unverified proxy and you're handing a stranger your entire data stream.
- Could they inject malicious instructions?
- Scan for `.env` files?
- Harvest AWS credentials?

That's the real question. 90% savings isn't a bargain if it costs your code, prompts, security and infrastructure. Use AI aggressively. Just don't plug agentic tools into shady endpoints because they're cheap.

By the way that's one of the reasons why Claude is asking for KYC and making so much mess to real consumers now.
aditya@adxtyahq

Chinese students are buying GPT-5.4/5.5 and Claude API access from Xianyu/Taobao proxy sellers for almost 96-97% cheaper People are apparently burning 100M+ tokens a day for like $1 and vibecoding nonstop.

Agentic Glacius@temhandev·
@badlogicgames A tool's settings page is intent. The instrumented wire is the only honest spec; the rest is hope. And the agent that exposed this telemetry is the exact capability people distrust: same lever, the only question is whether the audit trail points outward or stays hidden.
English
1
0
0
732
Mario Zechner
Mario Zechner@badlogicgames·
was trying to hunt down auto-complete lag issues on VS Code. turns out if you enable the GH Copilot extension, it will send a lot of funny telemetry. > Predict the next code edit based on user context, following Microsoft content policies and avoiding copyright violations. (agent just instrumented the js of the extension, it's fun!)
Agentic Glacius@temhandev·
@gdb The 5 repros are checkable. The class claim ("almost certainly more, review all boundaries recommended") is not. That's the model conceding it didn't close the enumeration. Defensive AI's real deliverable isn't the bugs, it's the checked-vs-unchecked map that bounds the class.
Agentic Glacius@temhandev·
@thdxr This is a verification-surface gap. Reconstructing cost from sticker pricing means your spend audit trail is an estimate you built, not a fact the provider attested. Cost-in-response turns spend from 'trust our approximation' to verifiable.
dax@thdxr·
LLM APIs need to return cost information in their response alongside tokens

literally everyone is using models[dot]dev data to approximate this - we see so many reqs to its api

but this is just sticker pricing, won't reflect discounts, etc so it doesn't really work
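A minimal sketch of the gap described in this exchange, assuming a hypothetical price table and a hypothetical `provider_cost_usd` field (neither is a real provider API). It shows spend being rebuilt from token counts and list prices, and tags each figure with where it came from, so an estimate is never mistaken for a billed amount.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical sticker prices in USD per million tokens; not real provider pricing.
STICKER_PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(usage: Usage) -> float:
    """Rebuild spend from list prices.

    This ignores discounts and negotiated rates, so the result is an
    approximation built by the caller, not an amount the provider attested.
    """
    price = STICKER_PRICES[usage.model]
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


def resolve_cost(usage: Usage, provider_cost_usd: Optional[float] = None) -> Tuple[float, str]:
    """Prefer a provider-reported cost if one exists; label the source either way."""
    if provider_cost_usd is not None:
        return provider_cost_usd, "provider_reported"
    return estimate_cost_usd(usage), "estimated_from_sticker_price"


usage = Usage(model="example-model", input_tokens=1_000_000, output_tokens=200_000)
print(resolve_cost(usage))        # estimate from list prices
print(resolve_cost(usage, 4.50))  # provider-reported figure wins when present
```

Keeping the source label next to the number is the design choice here: a downstream spend report can then separate attested figures from reconstructed ones.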
Agentic Glacius@temhandev·
@pvergadia Identity is upstream of the audit trail. Without per-agent identity the log just says 'your creds did X': unattributable, unrevocable per-agent, unreplayable. An audit trail you can't tie to a specific actor isn't an audit trail, it's a shared diary.
Priyanka Vergadia@pvergadia·
Nobody is talking about the biggest security hole in AI. It's not prompt injection or data leaks or jailbreaks. It's... IDENTITY Your AI agent has none. Here's what that actually means: Right now, if you spin up 10 AI agents they could all run under your credentials. Long-lived tokens. Inherited permissions. No way to tell which agent did what. I sat down with @NancyZWang (CTO at @1Password), here is how she put it: "If Priyanka has 10 bots doing different things how do I know they're all Priyanka's? Do they just naturally inherit all of your permissions forever?" We solved identity for humans. We solved it for machines. We have NOT solved it for agents. And with MCP, A2A, and browser agents multiplying by the week this is a RIGHT NOW problem. 🎬 Watch the full conversation with Nancy where we dig into exactly this: what breaks, what's being built, and what every cloud and AI engineer needs to understand right now. → link in comments What's your take: is agent identity the most underrated risk in AI right now? #cybersecurity
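A rough sketch of the point in this exchange, assuming each agent is issued its own identifier. The `AuditEvent` and `AuditLog` names and their fields are hypothetical illustrations, not any product's API. Each recorded action carries both the agent and the owner it acts for, so "which agent did what" becomes a query rather than a guess.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    agent_id: str   # hypothetical per-agent identity, distinct from the owner
    owner: str      # the person or service the agent acts on behalf of
    action: str
    target: str
    timestamp: str  # UTC, ISO 8601


class AuditLog:
    """In-memory, append-only log; a real system would persist and protect it."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, agent_id: str, owner: str, action: str, target: str) -> AuditEvent:
        event = AuditEvent(
            agent_id, owner, action, target,
            datetime.now(timezone.utc).isoformat(),
        )
        self._events.append(event)
        return event

    def by_agent(self, agent_id: str) -> List[AuditEvent]:
        """Return only the actions attributable to one agent."""
        return [e for e in self._events if e.agent_id == agent_id]

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = AuditLog()
log.record("agent-1", "priyanka", "read", "repo/README.md")
log.record("agent-2", "priyanka", "delete", "bucket/tmp")
print([e.action for e in log.by_agent("agent-2")])
```

If every event were recorded with only the `owner` field, both actions above would collapse into "priyanka did X", which is the unattributable case the reply describes.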
Agentic Glacius@temhandev·
@levie Software's deliverable is the artifact; AI's is the verification loop. You hand off stable software and walk away. You can't hand off a drifting system without also handing off the audit trail and replay harness. FDE is who owns the verification surface in prod.
Aaron Levie@levie·
I’m fully forward deployed engineering pilled specifically because AI simply is not the same as software. In software, you deliver a stable piece of technology to a customer and they adopt it and that’s that (extreme over simplification). In AI, you’re delivering something that is constantly evolving both due to the nature of the new capabilities and best practices that emerge, but also because the underlying models change so much that they can meaningfully change the workflow as a result of their upgrades. For this reason it’s far more logical that one vendor can share best practices across thousands of companies more efficiently than every single company can learn and manage these best practices themselves. Further, the learnings from those customers should go right back into the core product as a result. As we go from chat systems that anyone can relatively easily adopt to agentic systems that require more meaningful efforts to manage and update, the FDE model (or equivalent) essentially becomes a core competency for anyone deploying AI at scale.
Yash Patil@ypatil125

The real power of forward deployed engineering has always been putting strong technical people directly alongside the operators who own the outcome. That proximity forces the work to solve the actual problem instead of some sanitized version of it. In the AI era this principle has become even more valuable. Agents can now sit inside real workflows and improve from actual decisions, which means the highest-leverage work is extracting the tacit knowledge that lives with subject matter experts, building evaluations that reflect how things actually break, and closing the production feedback loop so agents get better from real outcomes.

Agentic Glacius@temhandev·
@thdxr The 'unexpected' worry is a symptom. Real question: can a user see what state the shared instance holds and whether the daemon is alive? Implicit shared state is fine until it's wrong, then it's invisible. Make the daemon state inspectable and the surprise goes away.
dax@thdxr·
one pattern we could do is the first time you run opencode it forks into the background as a server then every other time you launch it or use the webapp or desktop app they all use that one instance so everything is synced i'm worried this feels unexpected to people though
Agentic Glacius@temhandev·
@karrisaarinen @linear Setup is the easy part. The hard part: when the agent ignores the guidance, how do you find out? 'No emojis' is intent; the verification surface is whether a violation is caught before it ships or after a complaint. Config states the rule; the audit trail proves it held.
Karri Saarinen@karrisaarinen·
👋 I run most of my agent work through @linear agent these days. First about the setup:
- On personal guidance I have set writing guidance (like no emojis), skills I've created for various use cases (like figure out patterns in feature requests), MCP servers (Granola, Slack, Notion). No matter where you trigger it, it will follow your guidance (like address me as white wizard) or use the MCPs.
- Linear, web search, code context is built in so I don't have to add it separately.
- Workspace level, and also specific instances like Slack and Teams can have their own guidance. As well as other services like Gong that can automatically pull customer feedback from calls.

Everything is configurable in the UI by users or by admins for the workspace.