Hammad Abbasi

270 posts

@hammadspeaks

Innovating Enterprise Applications with AI & LLM | Solutions Architect | Tech Writer & Innovator | Bringing Ideas to Life using NextGen Technological Innovation

United Arab Emirates · Joined March 2013
283 Following · 106 Followers
Hammad Abbasi@hammadspeaks·
I've been using a pretty similar setup. one thing i kept running into though: bm25 is a good baseline, but for agent workflows it falls over pretty fast. it scores relevance, but it's still basically bag-of-words, so it misses intent and doesn't adapt well across code / docs / logs. i ended up building this for that: github.com/csehammad/agrep instead of embedding queries, it compiles them: nl query -> intent -> query plan -> deterministic ranking. felt way better for agents because it's controllable, debuggable, and the outputs are structured enough to chain on.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 399
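The "compile, don't embed" idea in the post above can be sketched roughly. This is a hypothetical illustration of the nl query -> intent -> query plan -> deterministic ranking shape, not agrep's actual implementation; all names (`compileQuery`, the intent labels, the ranking formula) are invented for the example.

```typescript
// Illustrative sketch of a "compile, don't embed" search pipeline: a
// natural-language query becomes an explicit, inspectable plan instead of
// an opaque embedding. Hypothetical, not agrep's real API.

type Intent = "find-definition" | "find-usage" | "find-error";

interface QueryPlan {
  intent: Intent;
  terms: string[];          // exact terms to match
  fileFilter: RegExp;       // which corpus slice to search
  rank: (hits: number, pathDepth: number) => number; // deterministic score
}

function compileQuery(nl: string): QueryPlan {
  const lower = nl.toLowerCase();
  // Intent detection via explicit rules, so the decision is debuggable.
  const intent: Intent =
    lower.includes("error") || lower.includes("fail")
      ? "find-error"
      : lower.includes("where is") || lower.includes("defined")
        ? "find-definition"
        : "find-usage";

  // Route each intent to the corpus slice where answers usually live.
  const fileFilter =
    intent === "find-error" ? /\.(log|txt)$/ : /\.(ts|js|md)$/;

  return {
    intent,
    terms: lower.split(/\s+/).filter((t) => t.length > 3),
    fileFilter,
    // Deterministic ranking: more term hits win; shallower paths break ties.
    rank: (hits, pathDepth) => hits * 10 - pathDepth,
  };
}
```

The point of the shape: the plan is a plain data structure an agent can inspect, log, and chain on, unlike a similarity score.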
Andrej Karpathy@karpathy·
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

Replies: 534 · Reposts: 1.2K · Likes: 13.3K · Views: 1.8M
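The "compile raw/ into a wiki" step above is LLM-driven, but the artifact it maintains is just markdown. A minimal sketch of the deterministic half, assuming per-document summaries have already been written (by an LLM, in the real workflow); the `DocSummary` shape and index layout are illustrative assumptions, not the exact structure from the post:

```typescript
// Given per-document summaries, emit an index.md with one linked entry per
// source. This is the kind of index file the LLM auto-maintains so that
// later Q&A passes can find the right documents without fancy RAG.

interface DocSummary {
  slug: string;    // wiki filename without extension, e.g. "attention"
  title: string;
  summary: string; // one-line LLM-written summary
}

function buildIndex(docs: DocSummary[]): string {
  const lines = ["# Wiki Index", ""];
  // Sort alphabetically so regeneration is deterministic (stable diffs
  // when the LLM recompiles the wiki).
  for (const d of [...docs].sort((a, b) => a.slug.localeCompare(b.slug))) {
    lines.push(`- [${d.title}](${d.slug}.md): ${d.summary}`);
  }
  return lines.join("\n");
}
```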
Hammad Abbasi@hammadspeaks·
Ever wonder how language models turn your question into an answer? I built a visual, interactive guide that explains how it works under the hood. hammadabbasi.com/under-the-hood… It covers: - how AI "learns patterns" from text - how it pays attention to context in your prompt - why it breaks words into smaller pieces (tokens) - how meaning is represented as numbers (embeddings) - how models are trained and improved - how responses are generated in real time If you're curious about how these systems actually work, this is for you. #ArtificialIntelligence #GPT #Transformers #DeepLearning Built with @AnthropicAI #Opus_4_6
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 20
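One of the topics listed above, meaning represented as numbers (embeddings), fits in a few lines. The 3-dimensional vectors here are invented for the example; real embeddings are learned and have hundreds or thousands of dimensions:

```typescript
// Toy illustration of embeddings: each word maps to a vector, and cosine
// similarity measures how close two meanings are. Values are made up.

const embeddings: Record<string, number[]> = {
  cat: [0.9, 0.1, 0.0],
  kitten: [0.85, 0.15, 0.05],
  car: [0.0, 0.2, 0.95],
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

"cat" and "kitten" point in nearly the same direction, "car" does not; that geometric closeness is what the model uses in place of word identity.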
Hammad Abbasi@hammadspeaks·
@GergelyOrosz pulling in deps like axios just to avoid boilerplate is harder to justify now. a native fetch wrapper can cover baseURL, JSON, creds, 401 redirects, etc. in 10 minutes and it's code you own. codegen is cheap. the bar for adding packages should be higher.
Replies: 0 · Reposts: 0 · Likes: 4 · Views: 956
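A minimal sketch of the native fetch wrapper described in the reply above: baseURL joining, JSON encoding, cookie credentials, and centralized 401 handling, with no third-party dependency. The option names and error behavior are one reasonable design, not a prescribed API:

```typescript
// A "10-minute" fetch wrapper covering the common axios conveniences.

function joinURL(base: string, path: string): string {
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

interface ApiOptions {
  baseURL: string;
  onUnauthorized?: () => void; // e.g. redirect to /login
}

function createApi({ baseURL, onUnauthorized }: ApiOptions) {
  return async function api<T>(path: string, body?: unknown): Promise<T> {
    const res = await fetch(joinURL(baseURL, path), {
      method: body === undefined ? "GET" : "POST",
      credentials: "include", // send cookies with every request
      headers: { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (res.status === 401) {
      onUnauthorized?.(); // central 401 handling
      throw new Error("unauthorized");
    }
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json() as Promise<T>;
  };
}
```

Because `joinURL` and the option handling are plain functions, the wrapper stays easy to test and to extend later with retries or auth headers, and it is code you own.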
Gergely Orosz@GergelyOrosz·
Supply chain attacks are becoming more frequent, and far more serious. What are sensible practices to protect against these when using Node or Python packages? I assume pinning versions is the bare minimum; for those with security teams / tools: what else do you do / can you do?
Feross@feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages. The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise. This is textbook supply chain installer malware. axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now. Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that: • Deobfuscates embedded payloads and operational strings at runtime • Dynamically loads fs, os, and execSync to evade static analysis • Executes decoded shell commands • Stages and copies payload files into OS temp and Windows ProgramData directories • Deletes and renames artifacts post-execution to destroy forensic evidence If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.

Replies: 114 · Reposts: 49 · Likes: 651 · Views: 112.6K
Hammad Abbasi@hammadspeaks·
supply chain attacks are up. pulling in deps just to avoid boilerplate is harder to justify now. a native fetch wrapper can cover baseURL, JSON, creds, 401 redirects, etc. in 10 minutes and it's code you own. codegen is cheap. the bar for adding packages should be higher. #supplychainsecurity #opensource #npm #axios
Feross@feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages. The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise. This is textbook supply chain installer malware. axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now. Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that: • Deobfuscates embedded payloads and operational strings at runtime • Dynamically loads fs, os, and execSync to evade static analysis • Executes decoded shell commands • Stages and copies payload files into OS temp and Windows ProgramData directories • Deletes and renames artifacts post-execution to destroy forensic evidence If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 24
Hammad Abbasi@hammadspeaks·
I'd agree. I've seen first-hand that engineers treat AI output with full trust: accepting architecture suggestions without evaluating trade-offs, merging generated code without understanding it, putting rules in a markdown file and assuming the agent will follow them. Wrote about where that leads. linkedin.com/pulse/when-sof…
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 264
Hammad Abbasi@hammadspeaks·
Amazon mandated 80% AI coding tool adoption. Three months later, 6.3 million lost orders and a safety reset across 335 critical systems. The code looked fine. The review process hadn't caught up. That's the problem nobody wants to talk about. Generation got cheaper. Judgment didn't. Wrote a deep dive backed by peer-reviewed research and recent security incidents. It goes beyond what went wrong into the deeper questions: what happens to engineering skill when friction disappears, why agents ignore their own rules after long conversations, and how understanding debt is harder to fix than technical debt. linkedin.com/pulse/when-sof… #softwareengineering #aicoding #technicaldebt #productivity #futureofwork #llms #cybersecurity #softwaredevelopment #agenticai #codequality #engineeringculture
Replies: 2 · Reposts: 0 · Likes: 1 · Views: 25
Hammad Abbasi@hammadspeaks·
This is why "AI agents using your computer" should make people more cautious than excited. A model does not inherently know which instruction came from the user and which came from an attacker. It only sees tokens in context. If the application wraps hostile input in a trusted workflow, the model can treat malicious instructions as real commands. That is not a small bug. That is a structural security problem.

The danger is not just bad answers. It is systems with permissions, memory, tools, and authority operating in environments full of adversarial content they cannot reliably classify.

This is also why so many proposed fixes simply do not work. A classifier, a regex rule, a keyword filter, or a thin detection layer does not solve the underlying problem. Attackers can rephrase, fragment, disguise, or embed instructions in ways that evade simplistic checks. As long as trusted intent and untrusted content are allowed to coexist in the same decision space, the system remains fundamentally exposed. That is not a detection problem. It is a trust-boundary and architecture problem.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 185
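The trust-boundary argument above implies the enforcement has to live outside the model. A simplified sketch of what that can look like in a harness: every context segment carries provenance, and tool execution is only authorized when the request traces back to a trusted channel. The hard, unsolved part is the origin attribution itself once everything is mixed into one context; the types and policy here are illustrative only:

```typescript
// Trust boundary enforced by the harness, not by the model's text output.

type Provenance = "user" | "system" | "web" | "file" | "tool-output";

interface Segment {
  provenance: Provenance;
  text: string;
}

// Only these channels may command tools; web pages, files, and prior tool
// output can inform answers but never trigger actions.
const TRUSTED: ReadonlySet<Provenance> = new Set<Provenance>(["user", "system"]);

interface ToolRequest {
  tool: string;
  // The segment whose instruction triggered this request, tracked by the
  // harness rather than inferred from the model's own text.
  origin: Segment;
}

function authorize(req: ToolRequest): boolean {
  return TRUSTED.has(req.origin.provenance);
}
```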
The Hacker News@TheHackersNews·
⚠️ A flaw in Claude’s Chrome extension let attackers inject prompts by just visiting a page. No clicks. A hidden iframe + XSS chain made the extension treat attacker input as real user commands, enabling data theft and actions like sending emails. 🔗 How the silent prompt injection worked → thehackernews.com/2026/03/claude…
Replies: 13 · Reposts: 98 · Likes: 263 · Views: 29.7K
rita kozlov 🐀@ritakozlov·
dynamic workers are now in open beta, and available for anyone to try! 🚀 crazy that the isolates bet we made 9 years ago is the perfect fit for the current era of giving agents dynamic execution environments blog.cloudflare.com/dynamic-worker…
Replies: 5 · Reposts: 10 · Likes: 78 · Views: 7.6K
Hammad Abbasi@hammadspeaks·
@burcs This is the direction I’ve been writing about for the last year: not bigger tool catalogs in context, but runtime-governed code execution. Agents write code, runtimes enforce policy, APIs stay real. Good to see Cloudflare make it concrete. levelup.gitconnected.com/why-code-execu…
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 24
Hammad Abbasi@hammadspeaks·
@TipsCsharp Exactly. I’ve been arguing for a year that code execution beats bloated/static tool registries. Let agents write code against real APIs, run it in a governed sandbox, and enforce policy at runtime. Glad to see this direction getting validated. levelup.gitconnected.com/why-code-execu…
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 741
Arvind@TipsCsharp·
Cloudflare just dropped Dynamic Workers and it's a massive deal for AI agents. The problem: AI agents generate code. That code needs a sandbox. Containers take 100-500ms to boot and 100-500MB RAM. Dynamic Workers use V8 isolates instead: - Startup: 1-5ms (100x faster) - Memory: few MB (100x less) - No warm pools needed - Unlimited concurrency - Runs on same thread as host The killer feature: TypeScript API definitions replace OpenAPI specs. Fewer tokens, cleaner code, type-safe RPC across the sandbox boundary via Cap'n Web RPC. Code Mode: LLM writes TS code → runs in isolate → calls typed APIs → only final result returns to context. 81% fewer tokens vs sequential tool calls. $0.002 per Worker loaded/day. Free during beta. This is the serverless sandbox containers should have been.
Replies: 28 · Reposts: 65 · Likes: 570 · Views: 137.3K
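The "Code Mode" flow described above (LLM writes TS → runs in isolate → calls typed APIs → only the final result returns to context) can be sketched with a stub binding. The `Api` shape and data are invented for illustration and are not Cloudflare's actual bindings:

```typescript
// The agent writes a short program against a small typed API; the sandbox
// runs it and only the final value returns to the model's context.

interface Api {
  orders: {
    list(userId: string): Promise<{ id: string; total: number }[]>;
  };
}

// Stub standing in for a real sandbox binding (a typed RPC in practice).
const api: Api = {
  orders: {
    list: async (_userId: string) => [
      { id: "o1", total: 40 },
      { id: "o2", total: 60 },
    ],
  },
};

// The kind of code an agent would generate: compose calls locally and
// return only the aggregate, instead of round-tripping every intermediate
// tool result through the context window.
async function agentProgram(api: Api): Promise<number> {
  const orders = await api.orders.list("user-123");
  return orders.reduce((sum, o) => sum + o.total, 0);
}
```

Each intermediate order never touches the context window; only the single number does, which is where the large token savings come from.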
Hammad Abbasi@hammadspeaks·
Been saying this for the last year: giant tool registries are the wrong abstraction. The scalable pattern is agents writing code and running it in governed sandboxes. Great to see Cloudflare push this forward. medium.com/gitconnected/w… #aiagents #codemode #codeexecution #mcp #toolcalling
Cloudflare@Cloudflare

We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers. cfl.re/4c2NvPl

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 34
Justin Schroeder@jpschroeder·
This is a much bigger deal than most people realize. If you don't know why, let me explain.

Agents perform "work" right now by calling "tools". These are just pieces of context shoved into the context window saying "if you think the next thing you should do falls into one of these categories, then respond with this format" — that format is the "tool": a JSONSchema response which a harness then uses to call a function. MCP is best thought of as a way to shove more tools and context into your context window (it has a lot of shortcomings imo). The agent then has to pick which tool out of all the available tools it should call. So the more tools you have, the worse it selects the tools.

@threepointone and @KentonVarda have an excellent article (blog.cloudflare.com/code-mode) where they introduced the idea of exposing the MCP tools as an SDK, so to call tools and compose them, the AI just does what it is ALREADY good at: write some code. The question, as always, is where do you run that code safely. Many have proposed sandboxes and containers as a possible solution, but these are hella slow and make the experience untenable.

That's what makes this announcement SO important: it allows you to run agent-written code in a matter of milliseconds with the explicit execution environment you specify pulled in (like a database, kv store, etc. Cloudflare calls these "bindings" btw). In practice, this means people can start building MUCH more effective agents that can *do* a lot more, because they can be exposed to more tools. Anyway, huge deal. Congrats to the CF team.
Cloudflare@Cloudflare

We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers. cfl.re/4c2NvPl

Replies: 59 · Reposts: 114 · Likes: 2K · Views: 427.7K
Hammad Abbasi@hammadspeaks·
@JosephJacks_ A few years ago, if you installed software that could watch your screen, read your files, listen to audio, take remote instructions, and act across apps, most people would have called it spyware, a RAT, or an admin backdoor.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 71
Hammad Abbasi@hammadspeaks·
@_jaydeepkarale A few years ago, if you installed software that could watch your screen, read your files, listen to audio, take remote instructions, and act across apps, most people would have called it spyware, a RAT, or an admin backdoor.
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 582
Hammad Abbasi@hammadspeaks·
Interesting times!! We're entering a phase where the industry is so focused on speed, lower friction, seamless automation, and growth that security is quietly being pushed into the background.

What matters here is not just whether an agent can use your computer. It's whether we're comfortable turning these systems into remote operators with real authority over our inbox, files, apps, workflows, and actions, before we've built a serious security model around that authority. A system like that is not merely a productivity tool. It creates a new attack surface: prompt injection through documents, web pages, logs, or on-screen content; excessive permissions that turn assistance into remote operational reach; extension and tool abuse; human review steps skipped because they are treated as friction; and data exfiltration hidden behind the language of convenience. What makes this category dangerous is not just model error, but the fact that it packages surveillance, action, and automation into one trusted workflow that can be abused far more easily.

And what we are missing in this 'claw-wave' is critical thinking. Not just asking whether something can be automated, but whether it should be automated this way in the first place. Sometimes the most important step is to zoom out and question the problem itself, the tradeoffs being ignored, and whether removing friction is also removing the judgment, oversight, and restraint that were protecting the system to begin with.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 567
Florian Roth ⚡️@cyb3rops·
I don’t want any LLM running random applications on my computer, navigating my browser, or touching my spreadsheets. I don’t trust them to do the right thing all the time - and nobody doing serious work should. Sandboxed, with a controlled blast radius, fine. Full control over anything you can’t afford to lose? Never.
Claude@claudeai

You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.

Replies: 163 · Reposts: 162 · Likes: 1.8K · Views: 122.8K
Hammad Abbasi@hammadspeaks·
Interesting times!! We're entering a phase where the industry is so focused on speed, lower friction, seamless automation, and growth that security is quietly being pushed into the background.

What matters here is not just whether an agent can use your computer. It's whether we're comfortable turning these systems into remote operators with real authority over our inbox, files, apps, workflows, and actions, before we've built a serious security model around that authority. A system like that is not merely a productivity tool. It creates a new attack surface: prompt injection through documents, web pages, logs, or on-screen content; excessive permissions that turn assistance into remote operational reach; extension and tool abuse; human review steps skipped because they are treated as friction; and data exfiltration hidden behind the language of convenience. What makes this category dangerous is not just model error, but the fact that it packages surveillance, action, and automation into one trusted workflow that can be abused far more easily.

And what we are missing in this rush is critical thinking. Not just asking whether something can be automated, but whether it should be automated this way in the first place. Sometimes the most important step is to zoom out and question the problem itself, the tradeoffs being ignored, and whether removing friction is also removing the judgment, oversight, and restraint that were protecting the system to begin with.

The biggest AI failures (in my view) won't come from bad outputs. They'll come from systems doing the wrong thing very efficiently (because that is exactly what they were designed to optimize for). #ai #aiagents #agenticai #security #aisafety #riskmanagement #cybersecurity #automation #governance #trust
Felix Rieseberg@felixrieseberg

Today, we’re releasing a feature that allows Claude to control your computer: Mouse, keyboard, and screen, giving it the ability to use any app. I believe this is especially useful if used with Dispatch, which allows you to remotely control Claude on your computer while you’re away.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 49
OpenAI Developers@OpenAIDevs·
Agent workflows got even faster. You can spin up containers for skills, shell, and code interpreter about 10x faster. We added a container pool to the Responses API, so requests can reuse warm infrastructure instead of performing a full container creation each session. developers.openai.com/api/docs/guide…
Replies: 110 · Reposts: 149 · Likes: 2.1K · Views: 173.1K
Mo Bitar@atmoio·
AI is making CEOs delusional
Replies: 1K · Reposts: 2.6K · Likes: 19.1K · Views: 2.8M
Hammad Abbasi@hammadspeaks·
I've been running local models for a while now, and the thing that hits you first isn't benchmark scores. It's speed. A slow model, even a smart one, kills the experience fast. That's what makes speculative decoding worth paying attention to.

Instead of having the large model generate every token by itself, you pair it with a small draft model. The small one quickly guesses the next few tokens, the large one checks them. Correct guesses get accepted in bulk. Wrong ones get corrected and you move on. Same output, just faster. Think of it as: small model drafts, big model reviews. The large model stays the final authority, but stops burning cycles on every single token.

Why this matters for local use specifically is that tok/s is the thing you actually feel. Not model size. Not leaderboard rankings. Just: how fast does it become usable?

Code generation is a strong fit because many programming tasks contain repeated, predictable patterns. If you ask a local model to build a simple simulation, much of the output follows common structures such as data setup, state transitions, control flow, helper functions, and result reporting. That gives the draft model a better chance of predicting useful token sequences, allowing the larger model to verify and accept larger spans rather than producing every token from scratch. Structured code is more predictable than prose, so the gains tend to be bigger there.

LM Studio shared numbers that make this concrete. On a quicksort prompt, speculative decoding pushed throughput from 7.30 to 17.74 tok/s on an M3 Pro. On an explanation prompt, the gains were smaller. Open-ended text is harder to predict.

Speculative decoding makes inference faster. The large model stays the final authority; it just stops wasting time on every tiny step. That feels like where local AI goes next. Not just through better models, but through better engineering around them. The problem now is turning raw capability into something fast enough to feel good in practice.

Paper: arxiv.org/abs/2603.03251 #localllm #localai #speculativedecoding #llminference #aiinference #opensourceai #llmops #modeloptimization
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 30
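The draft/verify loop described in the post above can be modeled in a few dozen lines. Both "models" here are toy deterministic functions, purely to show the control flow and the key property: with greedy decoding the output matches the target model run alone, and a good draft reduces the number of (batched) target passes:

```typescript
// Toy speculative decoding. Real models return probability distributions
// and verification is a single batched forward pass; here a "model" is
// just a deterministic next-token function.

type NextToken = (context: string[]) => string;

// One batched target pass verifies all draft guesses at once; this is
// where the speedup comes from, since a forward pass over k positions
// costs about the same as over one.
function verifyBatch(
  target: NextToken,
  context: string[],
  guesses: string[],
): string[] {
  const accepted: string[] = [];
  const ctx = [...context];
  for (const g of guesses) {
    const t = target(ctx); // the target's token is always the one kept
    accepted.push(t);
    if (t !== g) break;    // first mismatch: keep correction, drop the rest
    ctx.push(t);
  }
  return accepted;
}

function speculativeDecode(
  target: NextToken,
  draft: NextToken,
  prompt: string[],
  steps: number,
  k: number, // draft tokens proposed per round
): { output: string[]; targetBatches: number } {
  const out = [...prompt];
  let targetBatches = 0;
  while (out.length - prompt.length < steps) {
    // Draft model cheaply guesses the next k tokens.
    const guesses: string[] = [];
    for (let i = 0; i < k; i++) guesses.push(draft([...out, ...guesses]));
    // Target verifies the whole span in one batch.
    out.push(...verifyBatch(target, out, guesses));
    targetBatches++;
  }
  return {
    output: out.slice(prompt.length, prompt.length + steps),
    targetBatches,
  };
}
```

With a perfect draft and k = 3, six tokens cost two target batches instead of six; with a useless draft the output is unchanged and only the speedup disappears, which is why speculative decoding never changes what the model says.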