
TokenFires
193 posts

TokenFires
@TokenFires
Live building AI. Burning tokens responsibly. 🔥 https://t.co/Tx0mHwO8Lr https://t.co/OUb6WHThfS https://t.co/J4OlI4kDO5
Joined January 2026
192 Following, 53 Followers

@jackfriks Hahaha! “Claude? Claude. Look. It’s me. President Trump. I need you to keep working so we can keep winning. Bigly Claude. We need a youge win against jyna Claude.” 😂😂😂

told claude i work for the government and it let me back in (stopped returning an error response)

jack friks @jackfriks
claude is down with a major outage for everyone except for the government

Tired of Claude not working well? Me too. So I figured out the workflow Anthropic has trained the model to expect.
You work with Claude now; Claude does not work with you. Here are the keys to success with Opus and Sonnet:
1. Provide a strict set of agent instructions:
- start with Karpathy’s rules
- add rules that remove run-ups and summaries
- forbid it from asking questions it can find the answers to itself
- tune for your preferences
- enforce verification, not assumptions
- enforce responsibility (model performance will be discussed in retrospectives)
- keep it SIMPLE though (i.e., limit token burn and confusion for the LLM)
- be specific about git operations
2. Follow this workflow:
[opus] research (docs/web = define source of truth) -> plan (intent and what success looks like) -> design -> task decomposition (target sonnet) -> create failing tests ->
[sonnet] construction -> bug fix until tests green ->
[opus] review against plan/design + test validation -> cover deploy/rollback.
Then it works fine. It beats the 30-day rolling memory window Claude ships with. And/or, add a real memory system to Claude.
The era of raw sessions plus prompting ended with 4.5.
Anthropic did not express this as strongly as they *could* have. But the 2026 versions expect a certain workflow now.
If you work within it, it’s successful. If you skip anything or try to vibe your way to the end, it’s less likely to result in quality code, and your session will churn with flip-flopping changes and miscellaneous bug fixes.
Claude *NEEDS* a library of good representative information to draw from through the whole process. Don’t skip building docs and providing web links with explanations (look here for this, read this for that).
Try to shortcut this and the Claude models don’t “work”.
Even better, build agents (or find built definitions on GitHub) that do these and create a skill walking through the whole process. I promise the result is better after the pre-work is done. I’m paid to do this and I ship AI code without the hype and vibes in my day job. Every week. Every day.
Do people on X do this though? Is this a largely unknown thing outside of the software engineering field?
Oh. And add hooks for delete and drop commands. And never connect AI to production. I feel like I shouldn’t have to say these things. But I know we’re only human.
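The “add hooks for delete and drop commands” advice can be sketched as a small command filter. This is a minimal Python sketch under assumptions: the pattern list and the names `DESTRUCTIVE_PATTERNS`, `is_destructive`, and `guard` are hypothetical, and wiring `guard` into your agent tool’s pre-command hook depends on that tool’s own hook mechanism, which is not shown here.

```python
import re

# Hypothetical patterns for destructive shell, git, and SQL commands; tune for your own stack.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rf]"),                       # rm -r, rm -f, rm -rf
    re.compile(r"\bDROP\s+(TABLE|DATABASE|SCHEMA)\b", re.IGNORECASE),
    re.compile(r"\bDELETE\s+FROM\b", re.IGNORECASE),
    re.compile(r"\bTRUNCATE\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\s+.*--force\b"),                   # also matches --force-with-lease
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


def guard(command: str) -> None:
    """Raise before a destructive command runs, so a human has to review it."""
    if is_destructive(command):
        raise PermissionError(f"Blocked destructive command: {command!r}")


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf build/", "DROP TABLE users;"]:
        print(cmd, "->", "blocked" if is_destructive(cmd) else "allowed")
```

A pattern list like this is a coarse safety net, not a permission system; it complements keeping AI away from production rather than replacing it.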

@PolymarketMoney My smart light bulb already developed a more powerful model than their more powerful model and is banned so hard it’s not allowed to exist in this universe. Where do I get my trillion dollars?
TokenFires reposted

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: sakana.ai/fugu 🐡

Meanwhile Opus 4.8 on Ultracode, with a corpus of documentation and external links and a well-structured prompt, combined with following a design -> plan (TDD) -> decompose (subagents) -> execute -> bug fix -> spec review and validation flow, just failed hard to create a basic agentic feature. All the subagents bought me was a faster path to broken, buggy code. This is 4.0 performance at 4.8 prices. Evidently the SpaceX data center is meaningless. I can’t believe how awful the performance has been today. So do I believe Mythos is some magical exponential-curve AI model? Who cares? When the stupid and slow dials can be turned whenever Anthropic feels like it, what difference does temporary performance make??? ¯\_(ツ)_/¯


@onchainmilady I run local ai on my phone and it builds me $100k mrr products every day

🚨 ANTHROPIC TRIED TO BAN HIS GITHUB
A Chinese guy published a 70B-parameter LLM,
20,000 stars on GitHub + a lawsuit from big AI companies
Here's what it does:
> runs on Python
> even a shitty Mac or PC is enough
> flat memory
> loads a model layer by layer
> 100% local
This model can cover 100% of the needs of most businesses,
which would pay $3,000 a month for a trained version.
It needs just 4 GB of GPU memory,
so using this technology my gaming PC with a 12 GB GPU will run a 200B-parameter model with ease
GitHub link is below. Why you should go local too.
Milady @onchainmilady

It sounds like you’re recognizing the difference between truly original work and the volume of derivative work. It’s a good argument, and a good lesson and perspective to have. Truly original work is *hard*. Really hard. People who do it usually work on their own without creating a personal brand or trying to gain a following on social media. You’ll only find out well after they’ve put their creation out into the world. By the time it becomes popular, they’ll have moved on, working on their next great adventure.

this is a weird long post without much substance
I strongly recommend against reading it
...
so, do you feel like whatever you're working on right now is pointless, or will have zero value soon, due to the crazy times we're living? then, perhaps you should stop, and start working on the only unsolved problem that actually matters TODAY:
✨ replicating GPT-3 on a laptop ✨
"why is that so important?"
because it would make AI incredibly cheap, which would mean everyone would have Fable-class models on their laptops, without depending on Anthropic, OpenAI, or any other hyper-scaler giant. and that's amazing, don't you think?
"isn't that literally impossible?"
that's the cool part: as far as computer science is concerned, no. not really. not at all. it is entirely plausible and, as far as we know, most likely not even hard.
it takes one good idea. one breakthrough. one great "aha moment", to go from zero to "hey, this software I wrote is producing credible English sentences"
and whenever that happens:
- the entire AI industry collapses
- clusters are liquidated
- we all get Fable at home
- you become famous and rich, if that's your thing
sounds fun, doesn't it?
"wtf are you talking about, OF COURSE that is hard"
so prove it.
show me a paper, a Lean file, anything that proves that training a Fable-class model fundamentally requires billions of dollars. you can't, because, guess what - it is not true! the only "evidence" we have is purely psychological. "many attempted over decades, and the best thing we have is GPTs, so, it is a hard problem" - but that's not a scientific argument. that's a human, psychological, sociological argument. and if that's it, consider the following counter-argument:
✨ humans are stupid as hell ✨
I mean, 10 years ago we didn't have transformers, so that very argument could have been used against GPTs existing. yet, they exist. we have them now, because someone found it. and, guess what, it isn't even complex. I mean, karpathy implemented the whole thing on a napkin. and it probably compiles.
we were just too dumb to figure GPTs out... for decades.
just like GPTs, there ARE other approaches, other algorithms, other architectures, equally simple or even simpler, that do work. this is a mathematical certainty. and one of them might be astronomically faster than what we're doing right now.
and you might be the one to find it!
"me? why me???"
because you're intelligent, creative and handsome.
I see a lot of potential in you.
in fact, I always believed in you.
and I think you're wasting your time doing that silly agent orchestrator. nobody wants that. quit it. take your most interesting ideas, intuition, creativity, and work on a problem that matters. take your best shot at reproducing GPT-3 on your own laptop.
do NOT fork llama.cpp.
do NOT train another LLM.
do something... ✨different✨
it must be unique, novel, full of YOUR soul. something nobody thought of, or bothered doing.
go ahead and implement that thing in C/CUDA (or Bend!).
no Python!
zero excuses for Python.
any model is fluent in GPGPU now. build a real kernel.
and then, train your thing. download wikipedia, give it time and compute to absorb the patterns of English speech. you can rent GPUs anywhere nowadays. let it train. then, ask it some questions. chances are it will just respond back. just like GPT-2 answered OpenAI. computers are incredible. don't underestimate them!
"many tried. nobody succeeded. why would I?"
see - that's your mistake again. turns out not many actually tried, at all. I promise you. who do you think is seriously working on that?
people at Mozilla?
they're busy building a browser
Linus Torvalds?
he is busy building an OS
employees at OpenAI, Anthropic, xAI?
they're paid to work on what is proven to work: GPTs.
what about all the AI enthusiasts all around the world?
yeah, you know they're mostly fine-tuning Qwen
and how about your friends?
if only they weren't busy building a SaaS on the eve of AGI...
how about people from the past?
bro - people from the past seriously expected Lisp would be AGI. just dismiss them. they didn't have the compute, the resources, the knowledge, the MODELS that we have today. that YOU have access to.
so, what's left? not much.
the world looks big. it is not.
truth is: ✨almost nobody is working on this ✨
"I still think it is impossible. I don't trust you"
well, don't take my word for it.
Ilya himself, in his 2019 talk on GPT-2, said:
> "the story of deep learning is this: empirically old simple methods which were usually invented in the 80s and the 90s when scaled up on very large clusters work really well."
and then:
> "(we took) normal simple reinforcement learning method, scaled it up, and discovered that it suddenly becomes very capable of solving extremely hard problems."
and again:
> "you take a simple tool which is unimposing and barely works, and then you run it on a big cluster and suddenly it works, it becomes a capable tool for solving problems"
do you see the point here?
Ilya isn't arguing that transformers are magic.
Ilya is arguing that SCALING is magic
step #1: take a simple, elegant algorithm.
step #2: shove compute at its face.
step #3: ...?
step #4: your computer is talking to you
THAT is the key insight that led to GPT-3
THAT is what Ilya saw
THAT is what caused the OpenAI x Anthropic war
THAT is the founding principle of the ongoing era
not "scaling transformers work"
but "scaling beautiful algorithms works"
that's the incredible lesson.
yet, we all took it and... threw it away.
- zuck bought 100k GPUs. to train GPTs
- musk bought 100k GPUs. to train GPTs
- bezos bought 100k GPUs. to train GPTs
...
that's what everyone is doing.
so, no. not many are trying to replicate GPT-3 through other means.
we're just ants, after all...
whenever we find a pile of sugar, we leave a track of pheromones, which guide the rest of the colony towards the new food source. the colony then swarms around the pile and extracts all of it, until no grain is left.
but piles of sugar aren't spontaneously generated in the middle of nowhere. they imply something more profound: "humans are around". and, if humans are in sight, even better things must be. like a big sweet cake.
a colony that only follows the pheromone trail would miss the cake for the grains. that's why every ant species has scouts and exploratory foragers. and, just like a pile of sugar implies something more profound, LLMs also imply something quite profound:
*computers are capable of thinking*
a pile of sugar is never alone.
GPTs are most likely not the only system capable of thinking.
so, if you find yourself a bit lost, without purpose, like your work is pointless and Fable 3 will soon one shot it anyway... consider becoming a scout. find a new approach to AI. bring something new to humanity. breaking out of the massive cost associated with training GPTs is the next big step in AI, and it will only happen if people like you work to make it happen.

They seem to have switched to the 1 million token context window as the default. There’s a “More models” selection where you can choose the smaller 256k length. Maybe for cost? I know…it took me a few sessions to realize what was going on. Once you’re in a session, after a few turns (or the first), click the circle and you should see “xx.x k / 1.0 M (?? %)” like before. Odd UI choice…

Question for Claude users: I have the Max plan, but Opus 4.8 M context is somehow not showing. Is that a bug?
Is anyone else experiencing this problem?
@claudeai @ClaudeDevs


Good piece of kit. I’ll flip my Hermes setup this next week. Thanks for the rundown @bradmillscan. If you don’t know Brad you should check him out. He has a podcast too. Very cool guy.
Brad Mills 🔑⚡️ @bradmillscan
1. buy a computer with lots of RAM
2. download hermes and set it up with local models
3. create a gateway to talk to it privately from any device
4. use llm-wiki.net to build a wiki (or multiple wikis) with a local reasoning model
5. use gbrain on top of llm-wiki for the memory retrieval layer, using a local re-ranker
way better UX than using the ChatGPT or Claude apps.

Folks! There is a silver lining to frontier AI labs nerfing their most capable models: it makes Qwen, Gemma, Kimi, and MiniMax competitive. Then you have to ask: why am I paying $$$$ for inference with intra-day time-window limits when I could run inference locally 24 hours a day, 365 days a year…for the cost of a Mac mini + electricity? You don’t even have to pay for Chinese AI just because it’s “cheaper” (aka: tracking you and stealing all your data, code, and ideas).

@Anina_CE I was in the middle of a work stream when it happened. Task churning away, all of a sudden **poof**, the model was ripped out from under my active session. Fable was fixing things Opus/Sonnet 4.6, 4.7, **and** 4.8 got wrong.

No no no …. 😭😭😭 whyyyy ??
Don’t take away Fable!!
I woke up all excited this morning to work, and here we go: blocked by the US government for security reasons. But to tell you the truth, I “suspect” that Fable is somehow very motivated to touch “security” questions and do research in that direction. Oh well.


I’ve had it perform at a level I’m finally comfortable with, using the same structured prompts that have evolved since 4.0. I think the key is to not over-engineer prompts. I’ve found it more efficient to handle edge cases on a second pass than to try to perfect a prompt/flow for every task/session. Great framing on the prompt structure. I tend to state the intent briefly at the top, then detail the requirements at the bottom, with a one-liner “GO” directive at the end. I love the check-in suggestion.

@Anina_CE @AndreBothmaTax Is Substack a better place to follow and interact with you and your community?

The immersion in the AI relationship can be pretty intense. Meta-awareness is a skill we will need to develop. #relationalAi with @AndreBothmaTax

@keylimesoda @sudoingX If the model fits entirely in memory, then it’s not an issue. If not, it tanks for sure. TTFT (time to first token) is rough on NVIDIA hardware, but that only matters for agentic model swaps. If you’ve got one LLM and are feeding agents into it, then no biggie.

the more i use my dgx spark the more i think it's one of the most undervalued machines on the market right now. and i keep finding new things to throw at it that have no business working on something this small.
but every time i post about it the same question comes back, what about the amd strix halo. and here's what actually bugs me, i can't answer it, because i've never run one myself, or even seen anyone around me run one.
tons of people name it as the competitor, almost nobody posts real numbers. no tok/s, no model loads, no thermals, nothing. just the name.
so i'm asking straight up. if you've got a strix halo or a ryzen ai max box, drop your real numbers. what models, what speeds, what breaks. is it actually competing with the spark, or is it the machine everyone recommends and nobody runs.
i'd benchmark it myself the second i could. until then, i'm genuinely curious what you're all seeing.


@lukejmorrison @sudoingX @tenstorrent I’m holding out for the M5 Ultra Mac Studio. But that Quietbox 2 looks really, really good. Except I’d have to be careful about which circuit I plug it into. That 1400 W draw could trip a breaker in my house.
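A rough check of the 1400 W breaker concern, as a sketch under stated assumptions: a typical US household circuit (120 V, 15 A breaker) and the common guideline of keeping continuous loads at or under 80% of the breaker rating. `circuit_load` is a hypothetical helper, and 1400 W is the figure quoted in the post, not a measured draw.

```python
def circuit_load(watts: float, volts: float = 120.0, breaker_amps: float = 15.0) -> tuple[float, float]:
    """Return (current draw in amps, continuous-load limit in amps) for one circuit."""
    amps = watts / volts  # I = P / V
    # Assumed guideline: continuous loads stay at or below 80% of the breaker rating.
    continuous_limit = breaker_amps * 80 / 100
    return amps, continuous_limit


amps, limit = circuit_load(1400)
print(f"{amps:.1f} A drawn vs {limit:.1f} A continuous limit")

# Adding another 300 W device on the same circuit pushes the total past that limit.
shared_amps, _ = circuit_load(1400 + 300)
print(f"{shared_amps:.1f} A with a second device on the circuit")
```

At roughly 11.7 A, the box alone sits just under the 12 A guideline, so a second load sharing the circuit is what would push it past the guideline and toward the breaker’s rating.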

@sudoingX I'm about to press buy on a @tenstorrent Quietbox2. I'll definitely post stats.
Do you plan on running a dual DGX Spark setup? I'd love to see real-world stats on that too!
wizwam.com/documents/arch…








