Anton Kuratnik | AI Nerd

2.1K posts


@anton_onAI

Big-time AI nerd. Founder of Expert Studio AI: we build automations and AI tools that save your team time (no hype, actual results, security/safety first).

Joined May 2022
60 Following · 2.3K Followers
Anton Kuratnik | AI Nerd (@anton_onAI)
Folks: what's the best model for a long agentic task? Specific use case: 2 skills, 1 MCP, and a large spec. Need a model to just follow it, not miss details, and get it done. Codex refuses to follow what's set out in the skill. Opus refuses to read documentation. DeepSeek?
0
0
0
163
zerohedge (@zerohedge)
"the share of tokens used for US models on OpenRouter has collapsed": Bloomberg
164
495
3.6K
547.2K
Anton Kuratnik | AI Nerd (@anton_onAI)
It should be possible to compact an AI conversation from a specific message. Sometimes I can lose certain context but other messages need to be there in full.
0
0
1
136
Qwen (@Alibaba_Qwen)
📣📣 Meet Qwen-AgentWorld: a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.
🤔 LLMs are trained to be better agents, better at acting in environments. But nobody has trained them to model the environments themselves.
🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:
1️⃣ Build a foundation model for environment simulation, outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench
2️⃣ Investigate how world modeling enhances agent training:
🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments
🧠 Learning to predict environments (LWM warm-up) makes agents stronger; remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning
📑 Paper: arxiv.org/abs/2606.24597
📖 Blog: qwen.ai/blog?id=qwen-a…
💻 GitHub: github.com/QwenLM/Qwen-Ag…
🤗 HuggingFace: huggingface.co/collections/Qw…
🧩 ModelScope: modelscope.cn/collections/Qw…
194
774
4.7K
1.1M
Alex Grankin (@alex_grankin)
@OpenAI @Broadcom Can't wait for Habanero, and then California reaper! Just hope we don't get an eggplant... 🍆
1
0
4
3.1K
OpenAI (@OpenAI)
We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products. Chips are foundational to the AI economy. Building our own expands our full-stack platform from products to models to infrastructure, and will help us scale intelligence, serve more people, and expand access to AI.
1.4K
2.3K
22.3K
6.2M
Robin Ebers · AI for Small Business
my fucking god @Atlassian is such a scammy company
they acquired loom and silently upgraded what used to be free guest users to paid ones (without any opt-in confirmation)
only found out today because they kept spamming my inbox
then trying to remove one user and the fucking site doesn't work
took me a solid 10 min to cancel this hit
never using Loom again
14
0
20
3.7K
Anton Kuratnik | AI Nerd
@pvncher Having literally the opposite problem right now. Damn thing won't listen no matter how many times I tell it how to do stuff!
0
0
0
84
eric provencher (@pvncher)
Because codex is so good at adhering to your skill files, you have to be very intentional about how you word the description, or they can trigger more often than necessary. The coolest thing is having codex run evals for skill activation using sub agents!
18
6
180
12K
Anton Kuratnik | AI Nerd
@growing_daniel It's... not? Professional copywriter here + side hobby is fiction writing. Can get amazing results, just need good prompt engineering/process. Usually it's just not enough data.
0
0
1
96
Daniel (@growing_daniel)
Why is AI writing still so bad
878
43
1.6K
320.3K
Anton Kuratnik | AI Nerd
Exactly. In fact I think LLMs can be made MORE creative than humans via temperature/top-p controls. They already have the weirdest connections between concepts baked in. The biggest issue right now is that LLMs run on a single temperature/top-p setting per answer. And we generally want coherent/reliable answers, which punishes creativity. Modulating that during a prompt or introducing a creative output mode that runs before thinking can probably unlock a lot of that.
0
0
2
239
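For readers unfamiliar with the temperature/top-p controls mentioned in the post above, here is a minimal Python sketch of how the two knobs shape a single token draw, and of what changing them partway through an answer could look like. Everything here is an illustrative assumption: the names `sample_token` and `creative_then_careful`, the toy logits, and the schedule values are hypothetical and are not any provider's API.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token index from raw logits using temperature and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide logits before softmax. Higher values flatten the
    # distribution (more surprising picks), lower values sharpen it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus): keep the most likely tokens until their cumulative
    # probability reaches top_p, then sample only among those.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def creative_then_careful(step, creative_steps=3):
    """Hypothetical schedule: loose settings early, strict settings later."""
    if step < creative_steps:
        return (1.5, 0.95)  # exploratory phase
    return (0.3, 0.5)  # coherent phase


if __name__ == "__main__":
    toy_logits = [1.0, 3.0, 2.0]  # made-up scores for three tokens
    rng = random.Random(0)
    for step in range(6):
        temperature, top_p = creative_then_careful(step)
        token = sample_token(toy_logits, temperature, top_p, rng)
        print(step, temperature, top_p, token)
```

With a small `top_p` or a low temperature the draw collapses onto the most likely token, which is the "coherent/reliable" regime; the early steps of the schedule allow less likely tokens through.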
ℏεsam (@Hesamation)
“LLMs CAN’T COME UP WITH NEW IDEAS.” new ideas aren’t out of distribution. they come from recombination, abstraction, analogy, and search. the Wright brothers saw birds, bicycles, wings, engines, and then combined them into an airplane.
Zhu Liang (@paradite_)

i’m really surprised that people don’t see this. It’s mathematically true that llms can’t come up with novel ideas, because the whole point of training is to reduce loss, gain rewards so that the model adheres to rules and ground truth. if you have a model that can come up with novel ideas, it must have high loss during sft or rl.

169
126
1.6K
202.3K
Anton Kuratnik | AI Nerd
@matvelloso This is why agents are absolutely the wrong thing to hype up for businesses. Not until prompt injection and black-box issues are resolved.
0
0
0
125
Mat Velloso (@matvelloso)
-We built a sandbox for agents!
-Oh, cool, so they are blocked from accessing anything outside?
-Well, no, they need to access files, emails, APIs...
-So... you have a sandbox with a literal port open to the internet?
-Well, yeah otherwise the agents would be useless
-I see... But at least they can't write and run arbitrary code, right?
-What, no, of course they can do that, they are agents
-So... your sandbox lets agents write and run code that can literally run anything on internet?
-Yeah
-Let me ask you this: Are the employees in your company running these on their machines?
-Well, they are...
-But...?
-...but with guardrails
-Guardrails?
-Yeah
-Let me guess: The guardrail is a prompt?
-IT'S A VERY NICELY FORMATTED MARKDOWN FILE OK
55
49
826
100.5K
Anton Kuratnik | AI Nerd
Claude is awful at updating its knowledge of current models even when it clearly states it knows it's out of date AND I tell it to research recent models only and not rely on its training. I always have to push back to get real results. ChatGPT is better at educating itself first
0
0
2
436
kaios (@kaiostephens)
I asked both GPT-5.5-XHigh and Opus 4.8 High to find me the best model to run on a 3090 class card. Claude said to run gpt-oss-20b. We all know this model is extremely outdated and far from local SOTA, but the thing I found interesting was ChatGPT telling me to use Qwen3.6-27B, IQ4_XS GGUF. I would argue this is objectively the correct answer: even if it ran at lower decode and PP, Qwen scores 150% higher than gpt-oss does on Artificial Analysis. I doubt this is a knowledge cutoff problem. Very curious why this was the output; I would have guessed it would have been the opposite.
13
2
91
17.2K
Anton Kuratnik | AI Nerd (@anton_onAI)
@oleg008 A person just starting to use AI told me they told Claude "not to be dramatic" and I tried it and it actually did really well lol
0
0
1
386
Oleg | webstudio.is (@oleg008)
I have a single word I use that improves LLM code quality by 10x. It is a simple non-technical word, but non-engineers would never use it. Engineers after decades of engineering know this too well. Guess what this word is?
110
2
151
195.8K
Anton Kuratnik | AI Nerd (@anton_onAI)
@stevekrouse Night and day for me. The first big "wow" of 2026 in terms of AI intelligence. On my projects Fable would crack stuff in 10 minutes that I now do in 3 hours by combining Opus + Codex + a 4-llm open source council. Miss that model a lot and I used it for 10 hours.
0
0
0
132
Steve Krouse (@stevekrouse)
I used Fable nonstop while it was out and am now back to Opus, and I don't notice a difference.
If you told me that there was a bug with my Claude Code for those three days, and I was on Opus the whole time, I wouldn't be surprised.
I am skeptical about all the claims of how much better people find it. Not because I don't think it's better. I trust Anthropic evals. But because I think our guts are poorly calibrated to sense differences in intelligence at this level.
My guess is that it's a lot like blind taste testing of wine: it's orders of magnitude harder than you'd think it is. It's easy to fool yourself that you can tell the difference.
Which I guess we can turn into a challenge: when Fable comes back online, I can make a "blind taste test" app to give people a chance to see if they can tell which is which. I'd be very impressed with those that can! I'd love to learn your ways!
141
3
276
70.6K
Anton Kuratnik | AI Nerd (@anton_onAI)
@ClaudeDevs Caching seems broken again. Just used 20% of my WEEKLY Max plan usage in like 3 prompts on long but fresh Opus convos.
0
0
0
34
Anton Kuratnik | AI Nerd (@anton_onAI)
You're mixing so many signals here. First: text-only has so many use cases. I like GLM 5.2, even if it's clearly not Opus level. But second, mixed-inference providers like OpenRouter are notoriously bad at this. That's not the model, that's how it's delivered to you. Try Kimi from Moonshot directly or from Fireworks/DeepInfra. Kimi is a solid model too (but also not Opus level).
1
0
0
166
Robin Ebers · AI for Small Business
open source models still suck ass
look at the actual work, not the benchmarks
just tested Kimi K2.7 and GLM 5.2 again and holy shit
both still get stuck in debug death spirals
both still burn tokens in loops for minutes, never coming up with the actual solution (SOTA solved this 12+ months ago)
GLM 5.2 doesn't even support image input - you are serious people??
stop coping because they're cheap
yes, they're important, but it doesn't make them good (yet)
the closed models play a completely different game by now
for example:
- user experience
- prompt intent
- autonomy
every time I see people post bullshit about the latest design arena benchmark, claiming that now model X is almost as good as Fable 5, I'm literally shaking my fucking head
stop embarrassing yourself in public
rant over. thank you for coming to my ted talk.
52
3
58
10.6K
Adit_Yah ☄️ (@Adidotdev)
GLM-5.2 matches Opus 4.8 at one third the size. (open source) Anthropic spent billions. i'm not okay with how casual this is..
51
9
359
21.3K