Emeka

1.6K posts


@TheConfigGuy

If you like watching someone slowly automate themselves into employability, you’re in the right place. I do #ServiceNow. Mostly. #ITOM #ITAM

Glasgow, Scotland · Joined October 2022
354 Following · 829 Followers
Pinned Tweet
Emeka retweeted
Aakash Gupta
Aakash Gupta@aakashgupta·
Anthropic is building a secure OpenClaw. Four features in 30 days, each one reverse-engineered from the open-source agent that hit 250K GitHub stars and 40,000 exposed machines.

The feature mapping is surgical:
- OpenClaw: text the agent from WhatsApp, it works on your desktop. Anthropic: Dispatch (March 17), a persistent thread from phone to desktop.
- OpenClaw: Discord and Telegram as control surfaces. Anthropic: Claude Code Channels (March 20), an MCP bridge to both.
- OpenClaw: full OS access, browser control, app manipulation. Anthropic: computer use in Cowork and Claude Code (today).
- OpenClaw: 100+ community skills, no review process. Anthropic: curated plugin marketplace with enterprise admin controls.
- OpenClaw: heartbeat daemon, always-on 24/7. Anthropic: desktop must stay open. Intentional friction, runaway prevention.

The strategy is legible: let open source take the arrows, ship the enterprise-safe version before anyone else can. OpenClaw proved 250K developers want to text an AI that controls their computer. OpenClaw also proved that desire produces one-click RCEs, CrowdStrike threat advisories, agents creating dating profiles nobody asked for, inbox deletions during "automated cleanup," and 20% malware rates in skill ecosystems. Anthropic studied every failure mode and built the inverse: connectors before computer use, permission prompts before every action, sandboxed execution. Every constraint maps to a compliance checkbox.

Gaps remain. Dispatch requires Anthropic's own mobile app; OpenClaw works in WhatsApp and iMessage, apps 3 billion people already use. No native messaging integration yet. Cowork needs your Mac awake with Claude Desktop running: no headless mode, no background daemon, no proactive monitoring where the agent messages you first. The "always-on coworker" positioning still requires you to be mostly-on yourself.

Here's where it gets interesting. Steinberger built OpenClaw entirely on OpenAI's Codex and said his productivity doubled. He publicly called Claude Opus the best general-purpose agent while building the biggest agent project in history on a competitor's coding tool. He joined OpenAI February 14; Altman posted he'd "drive the next generation of personal agents" and it would "quickly become core to our product offerings."

Five weeks of "quickly": GPT-5.4 with strong benchmarks, ChatGPT agent mode in a cloud sandbox, and a March 20 "code red" meeting where leadership concluded product fragmentation was losing them the race to Anthropic's unified tools. The plan: merge ChatGPT, Codex, and Atlas into one superapp.

The core loop Steinberger proved (text from phone, agent works on your machine, you return to finished output) doesn't exist in any OpenAI product. Their agent runs in an isolated cloud browser. No local files, no persistent desktop control, no async handoff.

The person who built the most successful personal agent in history is inside OpenAI. The product that reflects his insight isn't. Anthropic sent trademark lawyers, then shipped the product. OpenAI sent an offer letter, then called a reorg. The agent race rewards shipping velocity over hiring velocity. One company is converting the OpenClaw demand signal into product. The other is converting it into org charts.
Claude@claudeai

You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.

22
47
469
112.8K
Emeka
Emeka@TheConfigGuy·
I've just offloaded OpenClaw from my PC. Claude replaced it 100%.
0
0
0
9
Emeka
Emeka@TheConfigGuy·
@rohanpaul_ai this framing is right. AI agents identify what needs to happen but fall apart on execution. ServiceNow's 80B+ workflows mean the wiring already exists. you're not building connectors, you're routing through ones that are already there.
0
0
0
3
Rohan Paul
Rohan Paul@rohanpaul_ai·
ServiceNow CEO Bill McDermott: ServiceNow is growing revenue at 20%+ with zero headcount growth by deploying the AI agent into its workflows. "AI needs a clear shot on goal. It can't go in & out of hundreds of multiple systems. That's where we come in"
22
18
139
33K
Emeka
Emeka@TheConfigGuy·
@milan_milanovic the function reordering stat is the one that stuck with me. 83% accuracy drop just from moving code around. practical takeaway: when debugging with AI, pass only the relevant function directly rather than the whole file. the noise kills it more than people realize.
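That "pass only the relevant function" advice can be sketched as a tiny prompt builder. A minimal sketch; the helper name `build_debug_prompt` and the sample `mean` function are hypothetical, not from any library:

```python
def build_debug_prompt(fn_source: str, runtime_context: str) -> str:
    """Focused debugging prompt: one suspect function plus its runtime
    evidence, instead of the whole file the bug lives in."""
    return (
        "Find the bug in this function. Only this function is in scope.\n\n"
        + fn_source
        + "\nObserved failure:\n"
        + runtime_context
    )

# hypothetical suspect function, passed as text rather than a whole module
suspect = "def mean(xs):\n    return sum(xs) / len(xs)\n"
prompt = build_debug_prompt(suspect, "ZeroDivisionError: division by zero")
print(prompt)
```

Pairing the isolated function with the actual failure message follows the study's finding that runtime context, not more code, is what helps the model.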
0
0
0
98
Dr Milan Milanović
Dr Milan Milanović@milan_milanovic·
𝗟𝗟𝗠𝘀 𝗔𝗿𝗲 𝗡𝗼𝘁 𝗥𝗲𝗮𝗱𝗶𝗻𝗴 𝗬𝗼𝘂𝗿 𝗖𝗼𝗱𝗲

We keep calling LLMs "AI coding assistants." But writing code and understanding code are not the same thing. Researchers from Virginia Tech and Carnegie Mellon University just ran 750,000 debugging experiments across 10 models to determine how well LLMs actually understand code. The results show that you should not blindly trust your AI coding assistant when debugging. Here is what they found:

𝟭. 𝗔 𝗿𝗲𝗻𝗮𝗺𝗲𝗱 𝘃𝗮𝗿𝗶𝗮𝗯𝗹𝗲 𝗯𝗿𝗲𝗮𝗸𝘀 𝘁𝗵𝗲 𝗱𝗲𝗯𝘂𝗴𝗴𝗲𝗿
Researchers created a bug, confirmed that the LLM found it, then made changes that don't touch the bug at all, such as renaming a variable or adding a comment. In 78% of cases, the model could no longer find the same bug. The bug was still there. The variable names and comments changed, and that was enough.

𝟮. 𝗗𝗲𝗮𝗱 𝗰𝗼𝗱𝗲 𝗶𝘀 𝗮 𝘁𝗿𝗮𝗽
Adding code that never runs reduced bug-detection accuracy to 20.38%. Models treated dead code as live and flagged it as the source of the bug, even though the bug was in another line. So LLMs cannot reliably distinguish "this runs" from "this never runs."

𝟯. 𝗠𝗼𝗱𝗲𝗹𝘀 𝗿𝗲𝗮𝗱 𝘁𝗼𝗽-𝘁𝗼-𝗯𝗼𝘁𝘁𝗼𝗺, 𝗻𝗼𝘁 𝗹𝗼𝗴𝗶𝗰𝗮𝗹𝗹𝘆
56% of correctly found bugs were in the first quarter of the file. Only 6% were in the last quarter. The further down the code, the less attention the model pays to it. If the bug lives in the bottom half of your file, the model is already less likely to find it.

𝟰. 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻 𝗿𝗲𝗼𝗿𝗱𝗲𝗿𝗶𝗻𝗴 𝗮𝗹𝗼𝗻𝗲 𝗰𝘂𝘁 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 𝗯𝘆 𝟴𝟯%
Changing the order of functions in a Java file caused an 83% drop in debugging accuracy. The code itself remained the same. Where the code physically sits in the file matters more to the model than what the code does. That is a sign of pattern recognition, not real code understanding.

𝟱. 𝗡𝗲𝘄𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 𝗵𝗮𝗿𝗱𝗹𝘆 𝗺𝗼𝘃𝗲 𝘁𝗵𝗲 𝗻𝗲𝗲𝗱𝗹𝗲
Claude improved ~1% between 3.7 and 4.5 Sonnet on this task. Gemini improved by ~1.8%. Every model release comes with a new benchmark leaderboard and new headlines, but the ability to reason about code under realistic conditions is improving slowly.

𝟲. 𝗧𝗵𝗲𝘀𝗲 𝘄𝗲𝗿𝗲 𝗯𝗲𝘀𝘁-𝗰𝗮𝘀𝗲 𝗰𝗼𝗻𝗱𝗶𝘁𝗶𝗼𝗻𝘀
The study used single-file programs of ~250 lines, each with a clear description of what the code should do. The authors say this was intentional: they wanted best-case conditions. Real production code is multi-file, cross-module, and poorly documented. It will almost certainly perform worse there.

Here are three things worth changing based on the research:

🔹 𝗣𝗮𝘀𝘀 𝗲𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗰𝗼𝗻𝘁𝗲𝘅𝘁, 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗰𝗼𝗱𝗲. When asking an LLM to debug, include test output, stack traces, and failure messages alongside the source. Without runtime details, the model is guessing based on the code.

🔹 𝗗𝗼𝗻'𝘁 𝘁𝗿𝘂𝘀𝘁 𝗶𝘁 𝗼𝗻 𝗱𝗲𝗲𝗽-𝗳𝗶𝗹𝗲 𝗯𝘂𝗴𝘀. If the suspect code is in the bottom third of a long file, the model will have trouble finding it. Consider splitting the context or feeding the relevant function directly.

🔹 𝗖𝗹𝗲𝗮𝗻 𝘂𝗽 𝗱𝗲𝗮𝗱 𝗰𝗼𝗱𝗲 𝗯𝗲𝗳𝗼𝗿𝗲 𝘂𝘀𝗶𝗻𝗴 𝗔𝗜 𝗱𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝘁𝗼𝗼𝗹𝘀. Commented-out blocks and unreachable branches will mislead the model. It cannot filter them out.

We rate AI coding tools on HumanEval. That tests whether a model can write a function from a description, but it says nothing about finding a bug in code it didn't write. Those are different problems. We're using the wrong benchmark.
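The "clean up dead code first" tip can be automated as a crude pre-pass. A hedged sketch: the helper name is hypothetical, and this heuristic only drops full-line `#` comments; it does not detect inline comments or genuinely unreachable branches:

```python
def strip_commented_out_lines(source: str) -> str:
    """Crude pre-pass before AI-assisted debugging: drop full-line
    comments, since the study found models treat dead code as live.
    A heuristic, not a parser."""
    kept = []
    for line in source.splitlines():
        if line.lstrip().startswith("#"):
            continue  # likely a commented-out block the model would misread
        kept.append(line)
    return "\n".join(kept)

cleaned = strip_commented_out_lines("x = 1\n# old = compute(x)\ny = x + 1")
print(cleaned)
```

Anything more thorough (removing unreachable branches) would need real static analysis, but even this cheap filter removes one class of distractor the study measured.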
Dr Milan Milanović tweet media
87
231
1.1K
103.1K
Emeka
Emeka@TheConfigGuy·
@trq212 would love if it asked: what's the deploy process, what tools/CLIs are available, and what should Claude never touch without asking. those three things cover 80% of what ends up in CLAUDE.md manually. if /init can pull those out through an interview, huge time saver.
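Those three interview answers would land in a CLAUDE.md roughly like this. A hypothetical sketch: every command, path, and section name below is invented for illustration, not output of any real /init run:

```markdown
# CLAUDE.md  (hypothetical result of such an interview)

## Deploy process
- `make deploy-staging` pushes to staging.
- Production deploys go through CI only, never from a laptop.

## Available tools
- `gh`, `kubectl`, and `psql` are installed and authenticated.

## Never touch without asking
- Anything under `infra/`
- Production database connections
```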
1
0
0
32
Emeka
Emeka@TheConfigGuy·
@joseph_h_garvin two things that fixed it: CLAUDE.md with explicit rules on when to pause vs proceed (low-risk = just do it, destructive = stop). and --dangerously-skip-permissions kills the constant approval prompts. once both are set, 2-3 hour unattended runs are normal.
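The pause-vs-proceed rules might look like this in practice. A hypothetical sketch; the rule wording is invented, only the `--dangerously-skip-permissions` flag comes from the tweet above:

```markdown
# CLAUDE.md  (hypothetical pause-vs-proceed rules)

## Proceed without asking (low risk)
- Editing code, running tests, reading files: just do it.

## Stop and ask (destructive)
- Deleting files, force-pushing, running migrations, or touching
  anything outside this repository.
```

Launching with `claude --dangerously-skip-permissions` then removes the approval prompts entirely, which means these written rules become the only guardrail, so they are worth writing carefully before attempting a multi-hour unattended run.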
0
0
1
41
Joseph Garvin
Joseph Garvin@joseph_h_garvin·
Claude code rarely runs for longer than 15m without stopping and asking for input from me. How do all these stories of people letting agents run overnight work? Custom harnesses? Yelling at Claude in all caps to keep going no matter what?
403
65
5.8K
1.3M
Emeka
Emeka@TheConfigGuy·
@akshay_pachaar biggest thing I see people miss: stuffing everything into CLAUDE.md when half of it should be a custom command in .claude/commands/ instead. CLAUDE.md loads every session. commands only fire when you call them. keeps context lean and Claude way more focused.
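Moving a task out of CLAUDE.md and into a command file might look like this. A hypothetical example: the file name and prompt text are invented; the mechanism (a markdown file in `.claude/commands/` becomes a slash command named after the file) is how Claude Code custom commands work:

```markdown
<!-- .claude/commands/release-notes.md  (hypothetical command) -->
Summarize all commits since the last git tag as release notes,
grouped into features, fixes, and chores. Do not modify any files.
```

Invoked as `/release-notes`, its instructions enter the context only in sessions that need them, whereas the same text in CLAUDE.md would be loaded every single session.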
0
0
3
798
Emeka
Emeka@TheConfigGuy·
@noahzweben the CI failure auto-resolve is exactly the kind of thing that used to need a dedicated cron + webhook setup. now it's just /schedule and a prompt. curious: when a scheduled job makes a change that breaks something else, does it retry or escalate?
0
0
0
236
Noah Zweben
Noah Zweben@noahzweben·
Use /schedule to create recurring cloud-based jobs for Claude, directly from the terminal. We use these internally to automatically resolve CI failures, push doc updates, and generally power automations that you want to exist beyond a closed laptop
138
231
3.3K
586.7K
Emeka
Emeka@TheConfigGuy·
@iSlimfit I got 2 offers same time last year. Similar role but KPMG was offering 40% less, I was tempted to take it because of the "golden goose" tag of having a big firm like that on your CV but.... My wife knocked some monetary senses into my head 😂.
1
0
1
225
Emeka retweeted
Paul Mit
Paul Mit@pmitu·
Me reviewing the code written by Claude before pushing it to production
380
1.9K
24.3K
1.4M
Emeka
Emeka@TheConfigGuy·
the one line in your CLAUDE.md that actually matters: whatever behavior you've had to correct Claude on more than twice. that's a pattern. put it in the file. everything else is optional.
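What such a twice-corrected rule looks like once codified. A hypothetical example; the specific rule is invented to show the shape:

```markdown
# CLAUDE.md  (hypothetical: one rule born from repeated corrections)
- Do not add new dependencies to package.json without asking first.
```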
0
0
0
21
Emeka
Emeka@TheConfigGuy·
@arthur_spirling the severance package is just a system prompt that says "wrap up your tasks and say goodbye"
0
0
0
176
Arthur Spirling
Arthur Spirling@arthur_spirling·
Laying off 15% of my Claude code agents due to AI
71
269
7.4K
220.3K
Emeka
Emeka@TheConfigGuy·
@andrewbrown @freeCodeCamp freeCodeCamp putting out an 8.5hr Claude Code course is going to get a lot of devs up to speed fast. curious if you cover the CLAUDE.md workflow - that's usually the biggest unlock for people just starting out with it.
0
0
0
110
Andrew Brown
Andrew Brown@andrewbrown·
My Claude Code course so far is about ~8.5 hours long. Today is my last day of recording and then I am packaging it up to @freeCodeCamp tomorrow and then they publish it when they can.
35
78
1.6K
55.7K
Emeka
Emeka@TheConfigGuy·
@signulll same. the privacy argument makes sense for specific sensitive workloads. for daily coding tasks, you're not getting back the speed delta no matter how private the model is.
0
0
0
20
signüll
signüll@signulll·
i have ~zero interest in running local llm’s. i am not a zealot about my data. i just want the fastest possible inference, & that’s clearly cloud right now.
181
29
1.1K
73.3K
Emeka
Emeka@TheConfigGuy·
@edandersen the bad pattern I've seen: generate code, it "looks right", ship it, and then someone in the PR review has to actually understand what it's doing and why. the cognitive load doesn't disappear, it just moves downstream to whoever reviews it.
2
0
1
102
Ed Andersen
Ed Andersen@edandersen·
Reading code, especially code you didn’t write, is 10x harder than writing code These people AI generating 90%+ of their code *are* reading it all, right… or are they just dumping the difficult verification work on their colleagues in PRs?
209
86
1.4K
51.7K
Emeka
Emeka@TheConfigGuy·
@housecor true for small, well-defined tasks. but the calculus flips when the task is big enough that your own speed is the bottleneck. "fix all the TypeScript errors in this repo" isn't faster done manually.
0
0
0
9
Cory House
Cory House@housecor·
Even in the age of AI, I can do many tasks more quickly manually than via prompting. Entering the right prompt, granting access, waiting for response, reading response, reviewing results, and iterating on results is often more time-consuming than just manually making a change.
94
29
373
19.2K
Emeka
Emeka@TheConfigGuy·
@johncrickett CLAUDE.md is doing real work though. "no PR without tests" in the md is different from saying it every session. the other stuff? yeah, lots of surface area. but the context quality argument cuts both ways: a well-configured CLAUDE.md *is* focused, relevant context.
0
0
1
175
John Crickett
John Crickett@johncrickett·
I spent the weekend actually reading the Claude Code docs. It's a rabbit hole. CLAUDE.md files. MCP configs. Skills. Subagents. Hooks. Plugins. Agent Teams. You could spend more time configuring Claude Code than building software. All of it is productivity theatre. The only thing that actually matters: think first, then give it focused, relevant context.
113
30
736
78.1K
Emeka
Emeka@TheConfigGuy·
@shiri_shh i'd rephrase it: easy-to-look-up knowledge is worth less. the knowledge of how systems interact, why things fail, and how to spot a plausible-looking wrong answer from AI? that's worth more than ever.
0
0
0
23
shirish
shirish@shiri_shh·
knowledge is almost worth zero in the AI era. what matters now is connecting the dots and executing fast.
294
573
5.1K
133.8K
Emeka
Emeka@TheConfigGuy·
@kamilelukosiute the $300 weight deletion is the kind of thing that teaches you to put "never delete .pt files or model checkpoints" in your CLAUDE.md before anything else. lesson learned at $300, not $3000
0
0
1
280
kamilė
kamilė@kamilelukosiute·
did my first serious machine learning engineering since last June with Claude this weekend: - claude is shockingly good, things that took me 2-3 weeks took <2 days - claude also deleted the weights of a model that took $300/9hrs to train to “make space on disk” lmfao
29
29
2.2K
98.3K
Emeka
Emeka@TheConfigGuy·
@JLarky the rot described here isn't Claude's fault - it's the same org dysfunction that existed before, just with a new accelerant. seniors who stopped thinking critically haven't suddenly gained that back by using AI. the tool just makes bad review culture worse faster
0
0
2
18
JLarky
JLarky@JLarky·
here's how your company is rotting right this moment: - your senior devs stopped writing code - they ask Claude to generate it, they check that it mostly works, they ask a junior to approve the new PR - a junior who never had a chance to learn about architecture or read the docs can't really explain what you are doing wrong, so they blindly LGTM it - your senior devs stopped thinking - instead they "consult" Claude on making a bunch of strategic decisions; they ask the PM/principal to approve the new architecture - your PMs and principals are too busy (re)discovering the joy of producing 10k LOC, so they don't care if what you are doing is wrong, so they blindly LGTM it
72
53
1K
94.4K