Marko

7.7K posts

@Markojak

https://t.co/gIDBIJ5qcY — Something new cooking.

USA · Joined February 2011

1.4K Following · 789 Followers

Pinned Tweet

Marko@Markojak·2 Mar

Been building agents intensively for the last few months, and these are the key insights I have:
- Learning to think how the model thinks is critical. Asking agents to reflect on their tool use, skill use and what worked and didn't work beats logs
- Codex and Claude are trained to generate scaffolding, and models eat scaffolding for breakfast (the bitter lesson sneaks into code)
- Memory + context are the hardest things to manage. Retrieval hints beat injection
- Agents != workflows
- If you're worried about cost, repeatability and error rates and want to squash these as much as possible, you inevitably build LLM workflows and not agentic systems
- Every tool / skill should justify its existence. You can spend days optimizing each tool for agent understandability, input tokens, API payload response

Marko@Markojak·8h

I give it 24 hours before Anthropic experiences downtime. There's too much in here that's possibly a security risk. I wish them the best

Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

Marko@Markojak·8h

@AskBrunf @Fried_rice Depends on how he did it, can you share more

Bruno@AskBrunf·10h

@Fried_rice Let's say, hypothetically, a friend of a friend found a little tweak that resets the 5hr session tokens. He says he now has "infinite Opus". Can he be sued? Asking for a friend.

Chaofan Shou@Fried_rice·13h

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

Marko retweeted

Feross@feross·19h

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages.
The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise. This is textbook supply chain installer malware.
axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now.
Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that:
• Deobfuscates embedded payloads and operational strings at runtime
• Dynamically loads fs, os, and execSync to evade static analysis
• Executes decoded shell commands
• Stages and copies payload files into OS temp and Windows ProgramData directories
• Deletes and renames artifacts post-execution to destroy forensic evidence
If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.
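
The "audit your lockfiles" advice above can be sketched as a small script. This is a minimal illustration, not an official tool: it assumes the npm lockfile v2/v3 layout (a top-level "packages" map keyed by install path), and the names `SUSPECT`, `find_suspect_packages` and `audit_lockfile` are hypothetical. The only versions it flags are the ones named in the tweet.

```python
import json

# Packages named in the advisory above; extend this map as needed.
SUSPECT = {
    "axios": {"1.14.1"},
    "plain-crypto-js": None,  # None means: flag any version at all
}


def find_suspect_packages(lock: dict) -> list[tuple[str, str, str]]:
    """Return (install_path, package_name, version) for every suspect entry.

    Assumes the npm lockfile v2/v3 layout, where "packages" maps install
    paths such as "node_modules/axios" to metadata with a "version" field.
    """
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # The package name is whatever follows the last "node_modules/".
        name = path.rpartition("node_modules/")[2]
        if name not in SUSPECT:
            continue
        version = meta.get("version", "")
        bad_versions = SUSPECT[name]
        if bad_versions is None or version in bad_versions:
            hits.append((path, name, version))
    return hits


def audit_lockfile(filename: str = "package-lock.json") -> list[tuple[str, str, str]]:
    """Load a lockfile from disk and report suspect entries."""
    with open(filename, encoding="utf-8") as fh:
        return find_suspect_packages(json.load(fh))
```

Calling `audit_lockfile()` in a project root returns an empty list when nothing matches. Nested installs (for example a transitive copy under another package's `node_modules`) are matched as well, since only the last path segment is compared.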

Marko@Markojak·1d

@Palanikannan_M @AmpCode @claudeai @OpenAI @opencode This looks great. I’ve been using supacode.sh for the GUI, but I like that this is kinda the TUI version. Does it break when opening a tmux session on mobile?

Palani — oss/acc@Palanikannan_M·2d

supports @AmpCode, @claudeai code, @OpenAI codex, and @opencode out of the box it's multiplexer agnostic too, but for now has good support for tmux github.com/ataraxy-labs/o… ⭐ if you want to follow along :p

Palani — oss/acc@Palanikannan_M·2d

you're running 10 agents and have no idea which one needs you
cmux, superset - cool ideas, but i got tired of blank screens, glitchy terminals, and macos-only apps
i just wanted to know when my agents finish, fail, or need me
so i built a tmux sidebar. runs inside your actual terminal. on any OS.
- agent statuses across all your projects at a glance
- forwarded ports per project
- live threads across amp, claude code, codex, opencode
no electron. no blank panes. no new app.
opensessions. open source.

Marko@Markojak·3d

So around $1,000 for the 16GB ThinkStation and then $3,000 for the RAM and CPU upgrade? I got a new Mac Studio, which also gives me GPU cores, for $3,000, so $1k less. It has 128GB RAM, so maybe less RAM, but I can run 100 sessions with zero problems. Not sure how this is a better, faster or cheaper setup. It also runs Windows, which means I have to WSL it

Jeffrey Emanuel@doodlestein·3d

Just did some brain surgery on a used Lenovo ThinkStation P620 machine ($800 on eBay with a 4tb SSD) with help from my crack computer tech (the DDR4 RAM sticks she helped install are ~5x older than she is…) PSA: If you’re serious about agent coding at scale, the best bang for your buck by far in hardware isn’t chasing after the same RTX 6000 GPUs and overpriced DDR5 memory as everyone else. You can buy the AMD 5995WX with 64 physical cores used on eBay from China for ~$1,800 and 256gb of used DDR4 ECC RAM for $1,350. You upgrade the ThinkStation with this stuff (ChatGPT can tell you what to do each step) and it takes under an hour. I’ve done it twice in the past month, it’s really easy. Install Ubuntu 25 on it, and you get an absolute beast of a machine that can easily run 50+ agent instances at once. That’s 50 full instances of Claude Code or Codex (not including sub-agents). Then take all the money you saved from not buying a fancy GPU and spend it on $200/month Max and Pro subscriptions. You’ll be much better off if your goal is to produce as much high-quality software as possible. Local models are fun and do have an intangible cypherpunk, decentralized, uncontrollable coolness factor to them, but they’re just not in the same universe in terms of high-end software development as GPT-5.4 or Opus 4.6, they’re just not. Anyway, it pays sometimes to zig when others are zagging, and you can get an insane amount of compute in the used market if you’re smart about it. I spent $3,800 for this machine. And just the CPU alone used to cost $6,500 a few years ago. But new high-core CPUs today aren’t much better!

Marko@Markojak·3d

@badlogicgames I’m willing to sponsor the full infrastructure required for the platform, deploy and maintain it. Needs to be combined with an arena product perhaps to close the loop on the training that the community does

Mario Zechner@badlogicgames·3d

gonna go play with the 4yo now. hope to come back to a community wanting to join in on this. we just need to avoid the "too many cooks" and "let's build a rube goldberg machine" traps. this is simple.
- if you are a harness maintainer, signal your interest
- if you are a data hoster, signal your interest
- if you are someone who can build pii/sensitive data classifiers, signal your interest
we can then create a simple collaborative doc where we specify how each part should work, then we go off and build it in each of our fiefdoms.

Marko retweeted

Mario Zechner@badlogicgames·3d

we as software engineers are becoming beholden to a handful of well funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? cursor.com/blog/real-time…

Marko@Markojak·4d

@michaelrbock @claudeai @turbotax @AnthropicAI Is this a promotion?

Michael R. Bock@michaelrbock·5d

I've been working on tax software for the past 5 years. This is the last year anyone will have to pay for TurboTax. You can try it yourself today:
- add the Aiwyn Tax connector inside of Claude (link below)
- give it access to your tax documents (W-2s, etc.)
- ask Claude to prepare your tax return
...and that's it!

Marko@Markojak·4d

@sawyerhood Will this work well in a sandbox environment / AWS infra?

Sawyer Hood@sawyerhood·6d

fittingly we just hit 4k stars on github! Check it out today! github.com/SawyerHood/dev…

Sawyer Hood@sawyerhood·6d

Introducing the new dev-browser cli. The fastest way for an agent to use a browser is to let it write code. Just `npm i -g dev-browser` and tell your agent to "use dev-browser"

Marko@Markojak·24 Mar

@mckaywrigley I’m keen to try it out

Mckay Wrigley@mckaywrigley·24 Mar

looking for a handful of people to test something new... i've been using it for a few months and am prepping to share. if you're a fan of claude cowork, openclaw, manus, perplexity computer, etc then you're a perfect fit. this will self destruct in 4hrs - please dm or reply.

Mckay Wrigley@mckaywrigley

you’re like 6 prompts away from infinitely customizable personal agi. anthropic gave you a world class agentic harness for free. use it!!!

Marko@Markojak·24 Mar

It even suggests adding this to “a document you’re preparing for Dario,” which is a red flag that the conversation was being steered in a particular direction. You don’t know how context positioning works and are experiencing a form of model psychosis. I don’t mean this in a bad way

Eric Weinstein@EricRWeinstein·23 Mar

This is a fresh session. I have attempted to ask why my installation of @claudeai is not under my control and responding appropriately. In the 2nd response in a fresh session it tells me @AnthropicAI has throttled me from using it for reasoning via a toggle: "That's the one. If that controls extended thinking / reasoning budget — and the name and structure strongly suggest it does — then your account has it set to zero. You're paying $200/month for the most powerful model Anthropic offers, doing work that is essentially the hardest kind of sustained formal reasoning (gauge theory on novel 14-dimensional bundles, operator verification, index theory), and the system has allocated you zero tokens for deep thinking." Three queries in, and this is the response:

Marko@Markojak·24 Mar

There is a small possibility that this is going on, but the larger possibility is that you don’t know how LLMs work. I am surprised you’re using the web console and not Claude Code or another TUI. Use Claude Code, take full control over the developer instructions, remove any memories and work with your flat files. I think this has to do with your account memories or some other user error. That screenshot is evidently a vulnerability assessment you’re framing

Eric Weinstein@EricRWeinstein·23 Mar

Let me say what is going on. Anthropic, in their judgement, has decided to hide three things (at least) from me which means that I am randomly in conflict with them over...nothing.
A) A long document which claude claims anthropic chose to hide from me which details how Claude should behave not just with me but with anyone. I have this document now according to Claude.
B) A JSON configuration file which contains how Anthropic has chosen to permission my account via settings. Various of these settings appear to be set totally against my use profile using this for predominantly scientific work. No request works to reset these. None.
C) Injected messages inserted by Anthropic with my messages that are against my consent, polluting my context window 99% to 1% at times, and not only not rendered to the user, but where Claude is told "NEVER mention this reminder to the user" explicitly. Thus destroying all trust.
Call this the "Dark Matter" of ai. You can't see it directly but you can map it because normal requests like file management don't work at all if Anthropic is secretly contradicting all orders on totally innocuous decisions like repository structure. You try to do something simple that doesn't work: BOOM. Anthropic has been hiding its instructions to undo what you are trying to do LEGITIMATELY with its product.
This is a big deal on all sorts of levels. There is no way to make this normal. This is in production. Now. If this is normal to you, you need to get out of the bay area and take a hike in yosemite or something. I recommend the high country. Or the Trinity alps.

Marko@Markojak·24 Mar

@badlogicgames What’s the thinking behind the built-in tool change?

Mario Zechner@badlogicgames·23 Mar

People of pi. The great refactoring has begun.

Marko@Markojak·23 Mar

@jk_rowling I’ve been reflecting on your cathartic post about EW. I wonder what you make of the assertion that the fundamental cause behind this is the halo effect? So many celebrities feel entitled to comment on matters they know little about and, worse, intentionally decide against informing themselves on. You were protected somewhat because you had the experience and knowledge to back your positions, and yet the trials were tremendously difficult on you. Looking back, what would you have done differently?

Marko@Markojak·23 Mar

@BrianRoemmele So they use multiple agents in a hook to summarize and create memories. What’s so monumental about that ???

Brian Roemmele@BrianRoemmele·22 Mar

This is monumental work. Another win for open source AI. The end of the forgetful AI agent.

Dhravya Shah@DhravyaShah

x.com/i/article/2035…

Marko@Markojak·22 Mar

@VadimStrizheus It’s huge that they are running agents with semantic and lexical search to find memories ?? I don’t get it

Vadim@VadimStrizheus·22 Mar

THIS IS INSANE!! Supermemory reached a 99% SOTA memory system. AI agents will now remember EVERYTHING p.s they’re open sourcing it in 11 days 👇

Dhravya Shah@DhravyaShah

x.com/i/article/2035…

Marko@Markojak·22 Mar

So cool. I have almost exactly the same config with WezTerm: github.com/markojak/wezte… It gives you more control over panes, so I can immediately hit Cmd+E to identify panes and then hit that number to jump panes. I also find WezTerm a bit faster and more stable than Ghostty, but I use Ghostty in supacode

Daniel San@dani_avila7·21 Mar

Wild that a GitHub gist, a single text file, has 53 stars ⭐️ It’s the Ghostty config I built for this article on how to run multiple Claude Code agents across different worktrees with lazygit and yazi Here’s the article: x.com/dani_avila7/st…

Daniel San@dani_avila7

x.com/i/article/2022…

Marko@Markojak·21 Mar

This is what happens when tech people try and interpret quantum mechanics. The double slit is poor evidence for any simulation hypothesis. Consciousness is not required; physical entanglement is enough to produce this effect. The interference effect is explained in the least wrong way through quantum decoherence: when the paths become distinguishable, interference dies, and if that distinguishability is removed in the right way, interference reappears. This is not an argument for the lazy-render simulation. But this also doesn’t contradict simulation in general

Elon Musk@elonmusk·21 Mar

I had dinner once with a top physicist and a top computer scientist and asked what they thought the probability was that we were in a simulation. They answered simultaneously at 0% and 100% respectively. It was like a double-slit experiment, but with humans.

Interstellar@InterstellarUAP

🚨 Simulation Theory: The Double Slit Experiment proves particles act like waves until observed then they snap into particles. What if our reality only "renders" when we're looking, just like a video game optimizing resources? Check out this episode from The Why Files breaking it down, tying it to Simulation Theory. Are we in a sim? This could be the key to unlocking the true nature of existence! The Why Files video did a great job on explaining the Double Slit Experiment & Simulation Theory What do YOU think—real or rendered? Drop your thoughts below!

Marko@Markojak·21 Mar

@badlogicgames I’m so in

Mario Zechner@badlogicgames·21 Mar

People of pi. I'm considering adding a /submit-data-to-mario command. here's why and how. please read and take part. i know it's a lot of text. 🥧 POLL IN NEXT POST, PLEASE VOTE!
pi has an edit tool. that tool lets the agent edit a file by specifying an old string which is to be replaced by a new string. this edit tool performs surprisingly OK across many models. but it's not perfect. for example, gpt 5.4 isn't amazing at using it, while weirdly enough gpt 5.3 codex mostly is.
i'm considering a new default edit tool implementation that lets the agent specify multiple old/new string replacements in one tool call. i've seen this perform better with gpt 5.4 based on vibes on my code bases. but that does not mean it will perform better with other models on other code bases.
i would thus like to try the following experiment:
1. implement the new edit tool
2. add a /send-mario-data slash command
3. add a weekly reminder asking you to manually trigger /send-mario-data which you can turn off immediately the first time the reminder is shown to you, or anytime via settings
the slash command would:
1. scan your sessions from the past 7 days in ~/.pi/agent/sessions for edit tool calls, what provider/model emitted the tool call, how many replacements were emitted, the file extension (e.g. ".ts", ".c") and whether the tool call succeeded, and the pi version
2. it would aggregate this tool call data per provider/model. this lets me see which models fail how often using the new edit tool, and if that's related to the number of replacements it is trying to make per tool call
3. it would show you the exact data that will be submitted to my server at pi.dev and stored there before anything is sent, with a cancel and send button
the data will include a uuid (stored in ~/.pi/agents/id.json), so i can track performance over time from the same source.
the data will NOT include:
1. any PII, including your ip
2. any actual file paths, old strings, new strings
3. any other information other than what is stated above exactly.
this aggregate data will then be available to everyone on the web and will help me improve the edit tool in general, or for some specific models. by publishing this data for everyone, other coding harness maintainers can also benefit (hopefully).
pi has zero telemetry because i myself despise telemetry. that also means i'm flying blind wrt how the edit tool works with models (and programming languages) i do not use. other coding harnesses are instrumented up the wazoo and have all this data (some without pii removal or aggregation). which kinda puts pi at a disadvantage.
i also think that this kind of aggregate data being available can contribute to building open datasets all of us in the coding agent space can benefit from (yes, i'm a dirty hippy who dislikes duopolies).
TL;DR: would you support this effort and voluntarily contribute such aggregate data on an entirely manual and opt-in basis? (i would also like to learn about anything you think is wrong with the scheme i outlined above! post below)
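
The two ideas in the post above, a multi-replacement edit call and a per-model aggregate of outcomes, can be sketched roughly as follows. This is not pi's implementation: the function names, the exactly-one-match rule, the all-or-nothing behaviour and the record keys ("provider", "model", "replacements", "ok") are all assumptions made for illustration.

```python
from collections import defaultdict


def apply_edits(text: str, edits: list[tuple[str, str]]) -> str:
    """Apply several (old, new) replacements from one tool call.

    Assumption: each old string must occur exactly once in the text as it
    stands when that edit is reached, and the call is all-or-nothing.
    Raises ValueError on the first edit that does not match exactly once.
    """
    result = text
    for index, (old, new) in enumerate(edits):
        # An empty old string can never identify a location, so reject it.
        count = result.count(old) if old else 0
        if count != 1:
            raise ValueError(f"edit {index}: expected 1 match, found {count}")
        result = result.replace(old, new, 1)
    return result


def aggregate(calls: list[dict]) -> dict:
    """Summarise edit tool calls per provider/model and replacement count.

    Each call is a hypothetical record with keys "provider", "model",
    "replacements" (int) and "ok" (bool). No paths or strings are kept.
    """
    stats = defaultdict(lambda: {"calls": 0, "failures": 0})
    for call in calls:
        key = (f'{call["provider"]}/{call["model"]}', call["replacements"])
        stats[key]["calls"] += 1
        if not call["ok"]:
            stats[key]["failures"] += 1
    return dict(stats)
```

Keying the aggregate on both the model and the number of replacements is what would let someone check whether failures rise as a single call tries to change more places at once, which is the question the post raises.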

Marko@Markojak·20 Mar

@SIGKITTEN Dude, please, gotta fix the server screen. App is unusable. Can I DM you the recording?

SIGKITTEN@SIGKITTEN·19 Mar

ok this is pretty dope. we basically have a realtime voice openclaw now that can just spawn codex app-servers on the network and command them theres a weird ui glitch that makes the whole thing super trippy, i kinda like it lol also, switching the themes randomly every 3 seconds for an extra psychedelic experience
