Anton Kuratnik | AI Nerd

2.1K posts

Anton Kuratnik | AI Nerd

@anton_onAI

Big-time AI nerd. Founder of Expert Studio AI: we build automations and AI tools that save your team time (no hype, actual results, security/safety first).

Se unió Mayıs 2022

60 Siguiendo2.3K Seguidores

Anton Kuratnik | AI Nerd@anton_onAI·3h

@someRandomDev5 @zerohedge Ah that's fair!

English

Random Libertarian Tech Lead@someRandomDev5·5h

@anton_onAI @zerohedge I should also say I’m mostly interested in exploring open models for the purpose of running them locally, so the quants there are generally equivalent to (or better than) what I would be running in my use-case.

English

zerohedge@zerohedge·1d

"the share of tokens used for US models on OpenRouter has collapsed": Bloomberg

English

170

513

3.8K

584.8K

Anton Kuratnik | AI Nerd@anton_onAI·7h

@someRandomDev5 @zerohedge Yeah that's a fair use, though it can be misleading. I tried glm5.2 on openrouter first and wasn't impressed, then tried it on fireworks and was quite blown away.

English

Random Libertarian Tech Lead@someRandomDev5·18h

@anton_onAI @zerohedge I use OpenRouter only when there’s a newly-released open-weights model I want to do an initial tryout session with.

English

Anton Kuratnik | AI Nerd@anton_onAI·19h

@homoludens @shashankgoyal95 @zerohedge That's because each time it switches you to a different provider all your input tokens miss cache.

English

homoludens@homoludens·1d

@shashankgoyal95 @anton_onAI @zerohedge i noticed it is much more expensive using deepseek flash on openrouter than using it directly.

English

Anton Kuratnik | AI Nerd@anton_onAI·19h

Yeah, it's awful. Try running the same complex task on DeepSeek or GLM5.2 on OpenRouter vs Fireworks. They're literally different models. Openrouter doesn't have one provider for most models. It has many and it routes you to whichever one is available. Some of those providers serve a quantized model, which reduces performance. Also, if you have a long conversation (eg agentic run) and are switched to another provider, you're paying for ALL those input tokens because they miss KV cache cuz that's with the other provider. On top of that, different providers have different outputs. When I tried running anthropic models via OpenRouter, they'd work fine and suddenly error because I got switched over to Bedrock for provider and Bedrock's API output is different and it broke the app's expected input. Plus, some models require special syntax for openrouter specifically. So Qwen via Alibaba hits cache just fine, via OpenRouter you have to add a special API argument to make sure it does. So you get: random, unpredictable performance at higher cost and with more errors. There are some models it does well with, mostly single-provider ones. Hy-3 is a beast, for example, and hella cheap.

English

Shashank Goyal@shashankgoyal95·1d

@anton_onAI @zerohedge Was there anything specific you noticed with model quality on OpenRouter?

English

240

Anton Kuratnik | AI Nerd@anton_onAI·20h

@paul_dentro @zerohedge At the cost of their performance and overpaying on cache.

English

Paul from DentroAI@paul_dentro·1d

@anton_onAI @zerohedge you don't have to get separate API Keys, can switch around models in a second

English

Anton Kuratnik | AI Nerd@anton_onAI·1d

Folks: what's the best model for long agentic task? Specific use case: 2 skills, 1 mcp, and a large spec. Need a model to just follow it, not miss details, and get it done. Codex refuses to follow what's set out in the skill. Opus refuses to read documentation. Deepseek?

English

181

Anton Kuratnik | AI Nerd@anton_onAI·1d

@ahtoshkaa @zerohedge Then just get $20/month codex and you're good.

English

232

ahtoshkaa@ahtoshkaa·1d

@anton_onAI @zerohedge for small hobby projects

English

258

Anton Kuratnik | AI Nerd@anton_onAI·1d

@LordoftheMounts @zerohedge Exactly

English

222

Lord of the Mountains@LordoftheMounts·1d

@anton_onAI @zerohedge Why would anyone be using open router if they are doing meaningful work?

English

379

Anton Kuratnik | AI Nerd@anton_onAI·1d

@NoetekCo @zerohedge Depends on which provider they route you to. Try fireworks.

English

488

Noetic Co@NoetekCo·1d

@anton_onAI @zerohedge moonshot.ai api is ass, for one. you can get a much better kimi instance on openrouter

English

652

Anton Kuratnik | AI Nerd@anton_onAI·1d

It should be possible to compact an AI conversation from a specific message. Sometimes I can lose certain context but other messages need to be there in full.

English

148

Anton Kuratnik | AI Nerd@anton_onAI·2d

@Alibaba_Qwen That's such an awesome idea! Guessing 3.8 will have this baked in

English

161

Qwen@Alibaba_Qwen·3d

📣📣 Meet Qwen-AgentWorld — a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation. 🤔 LLMs are trained to be better agents — better at acting in environments. But nobody has trained them to model the environments themselves. 🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes: 1️⃣ Build a foundation model for environment simulation — outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench 2️⃣ Investigate how world modeling enhances agent training: 🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments 🧠 Learning to predict environments (LWM warm-up) makes agents stronger — remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning 📑 Paper: arxiv.org/abs/2606.24597 📖 Blog: qwen.ai/blog?id=qwen-a… 💻 GitHub: github.com/QwenLM/Qwen-Ag… 🤗 HuggingFace: huggingface.co/collections/Qw… 🧩 ModelScope: modelscope.cn/collections/Qw…

English

198

779

4.7K

1.1M

Anton Kuratnik | AI Nerd@anton_onAI·3d

@alex_grankin @OpenAI @Broadcom Loll they will 100% name the next gen chip "habanero"

English

Alex Grankin@alex_grankin·3d

@OpenAI @Broadcom Can't wait for Habanero, and than California reaper! Just hope we don't get an eggplant... 🍆

English

3.2K

OpenAI@OpenAI·3d

We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products. Chips are foundational to the AI economy. Building our own expands our full-stack platform from products to models to infrastructure, and will help us scale intelligence, serve more people, and expand access to AI.

English

1.4K

2.4K

22.7K

6.7M

Anton Kuratnik | AI Nerd@anton_onAI·4d

@robinebers @Atlassian Loom after Atlassian takeover has been awful.

English

139

Robin Ebers · AI for Small Business@robinebers·4d

my fucking god @Atlassian is such a scammy company they acquired loom and silently upgraded what used to be free guest users to paid ones (without any opt-in confirmation) only found out today because they kept spamming my inbox then trying to remove one user and the fucking site doesn't work took me a solid 10 min to cancel this hit never using Loom again

Robin Ebers · AI for Small Business tweet media

English

3.7K

Anton Kuratnik | AI Nerd@anton_onAI·5d

@pvncher Having literally the opposite problem right now. Damn thing won't listen no many how many times I tell it how to do stuff!

English

eric provencher@pvncher·20 Haz

Because codex is so good at adhering to your skill files, you have to be very intentional about how you word the description, or they can trigger more often than necessary. The coolest thing is having codex run evals for skill activation using sub agents!

English

181

12K

Anton Kuratnik | AI Nerd@anton_onAI·5d

@growing_daniel It's... not? Professional copywriter here + side hobby is fiction writing. Can get amazing results, just need good prompt engineering/process. Usually it's just not enough data.

English

Daniel@growing_daniel·6d

Why is AI writing still so bad

English

877

1.6K

321.2K

Anton Kuratnik | AI Nerd@anton_onAI·6d

Exactly. In fact I think LLMs can be made MORE creative than humans via temp/top p controls. They already have the weirdest connections between concepts baked in. The biggest issue right now is that LLMs run on a single temp/top setting per answer. And we generally want coherent/reliable answers which punishes creativity. Modulating that during a prompt or introducing a creative output mode that runs before thinking can probably unlock a lot of that.

English

239

ℏεsam@Hesamation·6d

“LLMs CAN’T COME UP WITH NEW IDEAS.” new ideas aren’t out of distribution. they come from recombination, abstraction, analogy, and search. the Wright brothers saw birds, bicycles, wings, engines, and then combined them into an airplane.

Zhu Liang@paradite_

i’m really surprised that people don’t see this. It’s mathematically true that llms can’t come up with novel ideas, because the whole point of training is to reduce loss, gain rewards so that the model adhere to rules and ground truth. if you have a model that can come up with novel ideas, it must have high loss during sft or rl.

English

168

126

1.6K

202.5K

Anton Kuratnik | AI Nerd@anton_onAI·6d

@matvelloso This is why agents are the absolutely wrong thing to hype up for businesses. Not until prompt injection and blackbox issues are resolved

English

125

Mat Velloso@matvelloso·19 Haz

-We built a sandbox for agents! -Oh, cool, so they are blocked from accessing anything outside? -Well, no, they need to access files, emails, APIs... -So... you have a sandbox with a literal port open to the internet? -Well, yeah otherwise the agents would be useless -I see... But at least they can't write and run arbitrary code, right? -What, no, of course they can do that, they are agents -So... your sandbox lets agents write and run code that can literally run anything on internet? -Yeah -Let me ask you this: Are the employees in your company running these on their machines? -Well, they are... -But...? -...but with guardrails -Guardrails? -Yeah -Let me guess: The guardrail is a prompt? -IT'S A VERY NICELY FORMATTED MARKDOWN FILE OK

English

826

100.6K

Anton Kuratnik | AI Nerd@anton_onAI·20 Haz

Claude is awful at updating its knowledge of current models even when it clearly states it knows it's out of date AND I tell it to research recent models only and not rely on its training. I always have to push back to get real results. ChatGPT is better at educating itself first

English

436

kaios@kaiostephens·19 Haz

I asked both GPT-5.5-XHigh and Opus 4.8 High to find me the best model to run on a 3090 class card. Claude said to run gpt-oss-20b, we all know this model is extremely outdated and far from local SOTA, but the thing I found interesting was ChatGPT telling me to use Qwen3.6-27B, IQ4_XS GGUF I would argue this is objectively the correct answer, even if it ran at lower decode and PP, Qwen scores 150% higher than gpt-oss does on Artificial analysis. I doubt this is a knowledge cutoff problem, very curious why this was the output, I would have guessed it would have been the opposite.

English

17.2K

Anton Kuratnik | AI Nerd@anton_onAI·20 Haz

@oleg008 A person just starting to use AI told me they told Claude "not to be dramatic" and I tried it and it actually did really well lol

English

386

Oleg | webstudio.is@oleg008·19 Haz

I have a single word I use that improves LLMs code quality by 10x. It is a simple non-technical word, but non-engineers would never use it. Engineers after decades of engineering know this too well. Guess what this word is?

English

110

151

195.8K

Descubrir

@someRandomDev5 @zerohedge @homoludens @shashankgoyal95 @paul_dentro @ahtoshkaa @LordoftheMounts @NoetekCo