Marko

7.7K posts

@Markojak

https://t.co/gIDBIJ5qcY — Something new cooking.

USA · Joined February 2011
1.4K Following · 789 Followers
Pinned Tweet
Marko (@Markojak)
Been building agents intensively over the last few months, and I can say these are the key insights I have:
- Learning to think how the model thinks is critical. Asking agents to reflect on their tool use, skill use, and what worked and didn't work beats logs.
- Codex and Claude are trained to generate scaffolding, and models eat scaffolding for breakfast (the bitter lesson sneaks into code).
- Memory + Context are the hardest things to manage. Retrieval hints beat injection.
- Agents != workflows.
- If you're worried about cost, repeatability, and error rates and want to squash them as much as possible, you inevitably build LLM workflows and not agentic systems.
- Every tool / skill should justify its existence. You can spend days optimizing each tool for agent understandability, input tokens, API payload response
Replies 0 · Reposts 0 · Likes 0 · Views 115
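One possible reading of "retrieval hints beat injection", sketched under assumptions: instead of pasting every stored memory into the prompt, the agent gets a one-line index of what exists plus a tool to load a memory on demand. `Memory`, `build_injected_context`, `build_hint_context`, and `recall` are hypothetical names for illustration, not part of any framework, and this is not necessarily how the post's author implements it.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    key: str      # short id the agent can ask for (hypothetical)
    summary: str  # one-line hint that is always in context
    body: str     # full content, loaded only on request


def build_injected_context(memories: list[Memory]) -> str:
    """Injection: paste every memory body into the prompt up front."""
    return "\n\n".join(m.body for m in memories)


def build_hint_context(memories: list[Memory]) -> str:
    """Retrieval hints: say what exists and how to fetch it, nothing more."""
    lines = ["Memories available (call recall(key) to load one):"]
    lines += [f"- {m.key}: {m.summary}" for m in memories]
    return "\n".join(lines)


def recall(memories: list[Memory], key: str) -> str:
    """The tool the agent calls when one of the hints looks relevant."""
    for m in memories:
        if m.key == key:
            return m.body
    return f"No memory named {key!r}."
```

With a few long memories, the hint context stays a handful of lines while the injected context grows with every body; the trade-off is an extra tool call whenever the agent decides a memory is relevant.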
Bruno (@AskBrunf)
@Fried_rice Let's say, hypothetically, a friend of a friend found a little tweak that resets the 5hr session tokens. He says he now has "infinite Opus". Can he be sued? Asking for a friend.
Replies 15 · Reposts 0 · Likes 25 · Views 25.6K
Marko retweeted
Feross (@feross)
🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages.

The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise.

This is textbook supply chain installer malware. axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now.

Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that:
• Deobfuscates embedded payloads and operational strings at runtime
• Dynamically loads fs, os, and execSync to evade static analysis
• Executes decoded shell commands
• Stages and copies payload files into OS temp and Windows ProgramData directories
• Deletes and renames artifacts post-execution to destroy forensic evidence

If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.
Replies 492 · Reposts 3.9K · Likes 15.6K · Views 10.8M
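For the "audit your lockfiles" step, a minimal sketch that scans an npm `package-lock.json` for the two indicators named in the alert (axios 1.14.1 and any version of plain-crypto-js). It assumes npm's lockfile layout: version 2/3 files have a flat `packages` map keyed by install path, version 1 files nest entries under `dependencies`. yarn.lock and pnpm-lock.yaml are not handled, the `SUSPICIOUS` table only holds what this post reports, and the function names are hypothetical; treat it as a quick grep, not a substitute for `npm audit` or a dedicated scanner.

```python
import json
from pathlib import Path
from typing import Optional

# Indicators taken from the alert above. None means "any version".
SUSPICIOUS = {
    "axios": {"1.14.1"},
    "plain-crypto-js": None,
}


def _check(name: str, version: Optional[str], hits: list) -> None:
    if name not in SUSPICIOUS:
        return
    bad_versions = SUSPICIOUS[name]
    if bad_versions is None or version in bad_versions:
        hits.append(f"{name}@{version}")


def _walk_v1(deps: dict, hits: list) -> None:
    # lockfileVersion 1: dependencies nest recursively under "dependencies".
    for name, info in deps.items():
        _check(name, info.get("version"), hits)
        _walk_v1(info.get("dependencies", {}), hits)


def audit_lockfile(lock: dict) -> list:
    """Return sorted "name@version" strings for every suspicious entry."""
    hits: list = []
    if "packages" in lock:
        # lockfileVersion 2/3: flat map keyed by install path.
        for path, info in lock["packages"].items():
            if not path:
                continue  # "" is the root project itself
            name = info.get("name") or path.rsplit("node_modules/", 1)[-1]
            _check(name, info.get("version"), hits)
    else:
        _walk_v1(lock.get("dependencies", {}), hits)
    return sorted(set(hits))


def audit_file(path: str = "package-lock.json") -> list:
    """Load a lockfile from disk and audit it."""
    return audit_lockfile(json.loads(Path(path).read_text()))
```

Calling `audit_file()` from a project root returns an empty list when neither indicator appears in that lockfile; nested copies such as `node_modules/a/node_modules/axios` are caught because only the last path segment is used as the name.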
Palani — oss/acc (@Palanikannan_M)
you're running 10 agents and have no idea which one needs you

cmux, superset - cool ideas, but i got tired of blank screens, glitchy terminals, and macos-only apps

i just wanted to know when my agents finish, fail, or need me

so i built a tmux sidebar. runs inside your actual terminal. on any OS.

- agent statuses across all your projects at a glance
- forwarded ports per project
- live threads across amp, claude code, codex, opencode

no electron. no blank panes. no new app.

opensessions. open source.
Replies 32 · Reposts 28 · Likes 444 · Views 42.6K
Marko (@Markojak)
So around $1,000 for the 16GB ThinkStation and then $3,000 for the RAM and CPU upgrade? I got a new Mac Studio, which also gives me GPU cores, for $3,000, so $1K less, and it has 128GB RAM. Maybe less RAM, but I can run 100 sessions with zero problems. Not sure how this is a better, faster, or cheaper setup. It also runs Windows, which means I have to WSL it.
Replies 0 · Reposts 0 · Likes 0 · Views 46
Jeffrey Emanuel (@doodlestein)
Just did some brain surgery on a used Lenovo ThinkStation P620 machine ($800 on eBay with a 4TB SSD) with help from my crack computer tech (the DDR4 RAM sticks she helped install are ~5x older than she is…)

PSA: If you’re serious about agent coding at scale, the best bang for your buck by far in hardware isn’t chasing after the same RTX 6000 GPUs and overpriced DDR5 memory as everyone else.

You can buy the AMD 5995WX with 64 physical cores used on eBay from China for ~$1,800 and 256GB of used DDR4 ECC RAM for $1,350. You upgrade the ThinkStation with this stuff (ChatGPT can tell you what to do each step) and it takes under an hour. I’ve done it twice in the past month, it’s really easy.

Install Ubuntu 25 on it, and you get an absolute beast of a machine that can easily run 50+ agent instances at once. That’s 50 full instances of Claude Code or Codex (not including sub-agents).

Then take all the money you saved from not buying a fancy GPU and spend it on $200/month Max and Pro subscriptions. You’ll be much better off if your goal is to produce as much high-quality software as possible.

Local models are fun and do have an intangible cypherpunk, decentralized, uncontrollable coolness factor to them, but they’re just not in the same universe in terms of high-end software development as GPT-5.4 or Opus 4.6, they’re just not.

Anyway, it pays sometimes to zig when others are zagging, and you can get an insane amount of compute in the used market if you’re smart about it. I spent $3,800 for this machine. And just the CPU alone used to cost $6,500 a few years ago. But new high-core CPUs today aren’t much better!
[Attached: 4 images]
Replies 52 · Reposts 23 · Likes 465 · Views 35.3K
Marko (@Markojak)
@badlogicgames I’m willing to sponsor the full infrastructure required for the platform, and to deploy and maintain it. It needs to be combined with an arena product, perhaps, to close the loop on the training that the community does.
Replies 0 · Reposts 0 · Likes 0 · Views 23
Mario Zechner (@badlogicgames)
gonna go play with the 4yo now. hope to come back to a community wanting to join in on this.

we just need to avoid the "too many cooks" and "let's build a rube goldberg machine" traps. this is simple.

- if you are a harness maintainer, signal your interest
- if you are a data hoster, signal your interest
- if you are someone who can build pii/sensitive data classifiers, signal your interest

we can then create a simple collaborative doc where we specify how each part should work, then we go off and build it in each of our fiefdoms.
Replies 18 · Reposts 2 · Likes 155 · Views 7.3K
Marko retweeted
Mario Zechner (@badlogicgames)
we as software engineers are becoming beholden to a handful of well funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that.

i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces.

I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again.

Who's with me?

cursor.com/blog/real-time…
Replies 178 · Reposts 319 · Likes 2.5K · Views 214.7K
Michael R. Bock (@michaelrbock)
I've been working on tax software for the past 5 years. This is the last year anyone will have to pay for TurboTax.

You can try it yourself today:
- add the Aiwyn Tax connector inside of Claude (link below)
- give it access to your tax documents (W-2s, etc.)
- ask Claude to prepare your tax return

...and that's it!
[Attached: image]
Replies 133 · Reposts 98 · Likes 1.9K · Views 289.4K
Marko (@Markojak)
@sawyerhood Will this work well in a sandbox environment / AWS infra?
Replies 0 · Reposts 0 · Likes 0 · Views 266
Sawyer Hood (@sawyerhood)
Introducing the new dev-browser cli. The fastest way for an agent to use a browser is to let it write code. Just `npm i -g dev-browser` and tell your agent to "use dev-browser"
Replies 147 · Reposts 277 · Likes 2.9K · Views 825K
Mckay Wrigley (@mckaywrigley)
looking for a handful of people to test something new... i've been using it for a few months and am prepping to share. if you're a fan of claude cowork, openclaw, manus, perplexity computer, etc then you're a perfect fit. this will self destruct in 4hrs - please dm or reply.
Quoting Mckay Wrigley (@mckaywrigley):

you’re like 6 prompts away from infinitely customizable personal agi. anthropic gave you a world class agentic harness for free. use it!!!

Replies 1K · Reposts 15 · Likes 771 · Views 156.6K
Marko (@Markojak)
It even suggests adding this to “a document you’re preparing for Dario,” which is a red flag that the conversation was being steered in a particular direction. You don’t know how context positioning works and are experiencing a form of model psychosis. I don’t mean this in a bad way.
Replies 0 · Reposts 0 · Likes 3 · Views 204
Eric Weinstein (@EricRWeinstein)
This is a fresh session. I have attempted to ask why my installation of @claudeai is not under my control and responding appropriately. In the 2nd Response in a fresh session it tells me @AnthropicAI has throttled me from using it from reasoning via a toggle: "That's the one. If that controls extended thinking / reasoning budget — and the name and structure strongly suggest it does — then your account has it set to zero. You're paying $200/month for the most powerful model Anthropic offers, doing work that is essentially the hardest kind of sustained formal reasoning (gauge theory on novel 14-dimensional bundles, operator verification, index theory), and the system has allocated you zero tokens for deep thinking." Three queries, in and this is the response:
[Attached: image]
Replies 580 · Reposts 138 · Likes 1.7K · Views 1.8M
Marko (@Markojak)
There is a small possibility that this is going on, but the larger possibility is that you don’t know how LLMs work. I am surprised you’re using the web console and not Claude Code or another TUI. Use Claude Code, take full control over the developer instructions, remove any memories, and work with your flat files. I think this has to do with your account memories or some other user error. That screenshot is evidently a vulnerability assessment you’re framing.
Replies 0 · Reposts 0 · Likes 3 · Views 489
Eric Weinstein (@EricRWeinstein)
Let me say what is going on. Anthropic, in their judgement, has decided to hide three things (at least) from me which means that I am randomly in conflict with them over...nothing.

A) A long document which claude claims anthropic chose to hide from me which details how Claude should behave not just with me but with anyone. I have this document now according to Claude.

B) A JSON configuration file which contains how Anthropic has chosen to permission my account via settings. Various of these settings appear to be set totally against my use profile using this for predominantly scientific work. No request works to reset these. None.

C) Injected messages inserted by Anthropic with my messages that are against my consent, polluting my context window 99% to 1% at times, and not only not rendered to the user, but where Claude is told "NEVER mention this reminder to the user" explicitly. Thus destroying all trust.

Call this the "Dark Matter" of ai. You can't see it directly but you can map it because normal requests like file management don't work at all if Anthropic is secretly contradicting all orders on totally innocuous decisions like repository structure.

You try to do something simple that doesn't work: BOOM. Anthropic has been hiding its instructions to undo what you are trying to do LEGITIMATELY with its product.

This is a big deal on all sorts of levels. There is no way to make this normal. This is in production. Now.

If this is normal to you, you need to get out of the Bay Area and take a hike in Yosemite or something. I recommend the high country. Or the Trinity Alps.
Replies 164 · Reposts 91 · Likes 1.2K · Views 138.7K
Marko (@Markojak)
@badlogicgames What’s the thinking behind the built-in tool change?
Replies 0 · Reposts 0 · Likes 0 · Views 21
Mario Zechner (@badlogicgames)
People of pi. The great refactoring has begun.
[Attached: image]
Replies 14 · Reposts 6 · Likes 374 · Views 18.4K
Marko (@Markojak)
@jk_rowling I’ve been reflecting on your cathartic post about EW. I wonder what you make of the assertion that the fundamental cause behind this is the halo effect? So many celebrities feel entitled to comment on matters they know little about and, worse, intentionally decide against informing themselves about. You were protected somewhat because you had the experience and knowledge to back your positions, and yet the trials were tremendously difficult for you. Looking back, what would you have done differently?
Replies 0 · Reposts 0 · Likes 0 · Views 7
Marko (@Markojak)
@BrianRoemmele So they use multiple agents in a hook to summarize and create memories. What’s so monumental about that???
Replies 0 · Reposts 0 · Likes 1 · Views 69
Marko (@Markojak)
@VadimStrizheus It’s huge that they are running agents with semantic and lexical search to find memories?? I don’t get it.
Replies 0 · Reposts 0 · Likes 0 · Views 223
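For context on what "semantic and lexical search to find memories" typically involves, a minimal sketch under stated assumptions: rank memories lexically (shared query terms, a crude stand-in for BM25) and semantically (cosine similarity of embedding vectors), then merge the two rankings with reciprocal rank fusion, using the commonly cited constant k = 60. The embeddings here are toy vectors passed in by hand; a real system would get them from an embedding model. All function names are hypothetical and this is not the implementation of the project being discussed.

```python
import math


def lexical_scores(query: str, docs: list[str]) -> list[float]:
    """Number of distinct query terms each doc contains (crude stand-in for BM25)."""
    q_terms = set(query.lower().split())
    return [float(len(q_terms & set(d.lower().split()))) for d in docs]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranking(scores: list[float]) -> list[int]:
    """Doc indices ordered best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    fused: dict[int, float] = {}
    for order in rankings:
        for rank, doc in enumerate(order, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query: str, query_vec: list[float],
                  docs: list[str], doc_vecs: list[list[float]], k: int = 60) -> list[int]:
    """Fuse a lexical ranking and a semantic (embedding) ranking of the same docs."""
    lexical = ranking(lexical_scores(query, docs))
    semantic = ranking([cosine(query_vec, v) for v in doc_vecs])
    return rrf([lexical, semantic], k)
```

RRF only looks at rank positions, so the lexical and semantic scores never need to be put on a common scale.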
Marko (@Markojak)
So cool! I have almost exactly the same config with WezTerm: github.com/markojak/wezte… It gives you more control over panes, so I can immediately hit Cmd+E to get a pane identifier and then hit that number to jump panes. I also find WezTerm a bit faster and more stable than Ghostty, but I use Ghostty in supacode.
Replies 0 · Reposts 0 · Likes 0 · Views 73
Marko (@Markojak)
This is what happens when tech people try and interpret quantum mechanics.

The double slit is poor evidence for any simulation hypothesis. Consciousness is not required; physical entanglement is enough to produce this effect. The interference effect is explained in the least wrong way through quantum decoherence: when the paths become distinguishable, interference dies, and if that distinguishability is removed in the right way, interference appears.

This is not an argument for the lazy-render simulation. But this also doesn’t contradict simulation in general.
Replies 0 · Reposts 0 · Likes 0 · Views 8
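The decoherence argument can be written out with the standard which-path expression. Let ψ₁(x) and ψ₂(x) be the amplitudes at screen position x for the two slits (normalization absorbed into them), and let |E₁⟩ and |E₂⟩ be the normalized states the environment (a which-path detector, scattered photons) ends up in for each path. Tracing out the environment gives the screen probability:

```latex
\begin{aligned}
P(x) &= \bigl\| \psi_1(x)\,|E_1\rangle + \psi_2(x)\,|E_2\rangle \bigr\|^2 \\
     &= |\psi_1(x)|^2 + |\psi_2(x)|^2 + 2\,\mathrm{Re}\!\left[\psi_1^*(x)\,\psi_2(x)\,\langle E_1 | E_2 \rangle\right]
\end{aligned}
```

When the environment fully records the path, ⟨E₁|E₂⟩ = 0 and the cross term vanishes, leaving the sum of two single-slit patterns. If the environment states overlap (the record is never made, or is coherently undone), the cross term returns; for equal slit amplitudes the fringe visibility equals |⟨E₁|E₂⟩|. Nothing in the expression refers to an observer, only to the overlap of physical states.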
Mario Zechner (@badlogicgames)
People of pi. I'm considering adding a /submit-data-to-mario command. here's why and how. please read and take part. i know it's a lot of text. 🥧

POLL IN NEXT POST, PLEASE VOTE!

pi has an edit tool. that tool lets the agent edit a file by specifying an old string which is to be replaced by a new string. this edit tool performs surprisingly OK across many models. but it's not perfect. for example, gpt 5.4 isn't amazing at using it, while weirdly enough gpt 5.3 codex mostly is.

i'm considering a new default edit tool implementation that lets the agent specify multiple old/new string replacements in one tool call. i've seen this perform better with gpt 5.4 based on vibes on my code bases. but that does not mean it will perform better with other models on other code bases.

i would thus like to try the following experiment:
1. implement the new edit tool
2. add a /send-mario-data slash command
3. add a weekly reminder asking you to manually trigger /send-mario-data, which you can turn off immediately the first time the reminder is shown to you, or anytime via settings

the slash command would:
1. scan your sessions from the past 7 days in ~/.pi/agent/sessions for edit tool calls, recording what provider/model emitted the tool call, how many replacements were emitted, the file extension (e.g. ".ts", ".c"), whether the tool call succeeded, and the pi version
2. aggregate this tool call data per provider/model. this lets me see which models fail how often using the new edit tool, and if that's related to the number of replacements it is trying to make per tool call
3. show you the exact data that will be submitted to my server at pi.dev and stored there before anything is sent, with a cancel and send button

the data will include a uuid (stored in ~/.pi/agents/id.json), so i can track performance over time from the same source. the data will NOT include:
1. any PII, including your ip
2. any actual file paths, old strings, new strings
3. any information other than exactly what is stated above

this aggregate data will then be available to everyone on the web and will help me improve the edit tool in general, or for some specific models. by publishing this data for everyone, other coding harness maintainers can also benefit (hopefully).

pi has zero telemetry because i myself despise telemetry. that also means i'm flying blind wrt how the edit tool works with models (and programming languages) i do not use. other coding harnesses are instrumented up the wazoo and have all this data (some without pii removal or aggregation), which kinda puts pi at a disadvantage. i also think that this kind of aggregate data being available can contribute to building open datasets all of us in the coding agent space can benefit from (yes, i'm a dirty hippy who dislikes duopolies).

TL;DR: would you support this effort and voluntarily contribute such aggregate data on an entirely manual and opt-in basis? (i would also like to learn about anything you think is wrong with the scheme i outlined above! post below)
Replies 31 · Reposts 8 · Likes 196 · Views 15.3K
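The proposed multi-replacement edit tool and the per-model aggregation the slash command would perform can be sketched roughly as follows. This is not pi's implementation: `apply_edits`, `EditError`, `EditCall`, and `aggregate` are hypothetical names, and the rules that each old string must match exactly once and that replacements must not overlap are assumptions chosen so a failed call leaves the file untouched. File extension and pi version are left out of the aggregation for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


class EditError(Exception):
    """Raised when a multi-replacement edit cannot be applied cleanly."""


def apply_edits(text: str, edits: list[tuple[str, str]]) -> str:
    """Apply several old->new replacements in a single tool call.

    Assumed rules: each old string must match exactly once in the original
    text and matches must not overlap; any violation fails the whole call,
    so the file is never left half-edited.
    """
    spans = []
    for old, new in edits:
        if not old:
            raise EditError("old string is empty")
        start = text.find(old)
        if start == -1:
            raise EditError(f"old string not found: {old[:40]!r}")
        if text.find(old, start + 1) != -1:
            raise EditError(f"old string is not unique: {old[:40]!r}")
        spans.append((start, start + len(old), new))
    spans.sort()
    for (_, prev_end, _), (next_start, _, _) in zip(spans, spans[1:]):
        if next_start < prev_end:
            raise EditError("replacements overlap")
    parts, pos = [], 0
    for start, end, new in spans:
        parts.append(text[pos:start])  # untouched text before this match
        parts.append(new)
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


@dataclass
class EditCall:
    model: str         # "provider/model"
    replacements: int  # old/new pairs in the call
    ok: bool           # did the call succeed?


def aggregate(calls: list[EditCall]) -> dict:
    """Success counts per model and replacement count: {model: {n: [ok, total]}}."""
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for call in calls:
        cell = stats[call.model][call.replacements]
        cell[0] += int(call.ok)
        cell[1] += 1
    return {model: dict(by_n) for model, by_n in stats.items()}
```

Because every match is located against the original text before anything is written, the order of pairs in a call does not matter, and the aggregate keys on replacement count so failure rates can be compared between one-pair and many-pair calls.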
Marko (@Markojak)
@SIGKITTEN Dude, please, gotta fix the server screen. The app is unusable. Can I DM you the recording?
Replies 1 · Reposts 0 · Likes 1 · Views 174
SIGKITTEN (@SIGKITTEN)
ok this is pretty dope. we basically have a realtime voice openclaw now that can just spawn codex app-servers on the network and command them. there's a weird ui glitch that makes the whole thing super trippy, i kinda like it lol. also, switching the themes randomly every 3 seconds for an extra psychedelic experience
Replies 11 · Reposts 5 · Likes 114 · Views 19.7K