Agntro

19 posts

Agntro

@AgntroAI

Software dev & AI enthusiast. Here to share insights on intelligent engineering. Crafting the future of software development at @AgntroAI

Joined March 2026

158 Following · 11 Followers

Pinned Tweet

Agntro@AgntroAI·1d

x.com/i/article/2065…

170

Agntro@AgntroAI·1d

@burkov Didn't GPT-3 need a kill switch, cause it could just gain consciousness?

100

BURKOV@burkov·1d

I don't know. Is that it? For all the buzz? For the crazy size? For the crazy price? For the crazy latency? For the crazy daily limits? For the crazy anti-AI research lobotomy? For all these "Ooohh, we are so afraid to show it!" and "Ooooh, someone has got a non-authorized access to it, ooohhhh!" That's it? That's ridiculous.

123

20.5K

Agntro@AgntroAI·1d

@jun_song If China started producing mask ROMs of the DeepSeek V4 Flash model, you could sell them to the consumer market like hotcakes.

Jun Song@jun_song·1d

Here is how Chinese open-source companies can actually make money: Selling personal inference hardware. If they partner with companies like Huawei to sell devices specialized for inference, it will bring in massive revenue. By doing this, they won't have to bleed money on massive inference costs to serve consumers. They would only need minimal inference just for training. This solves the cost issue and serves as a great way to counter US frontier labs and their ever-increasing inference costs. This is the future we need to head towards.

6.8K

Agntro@AgntroAI·1d

Update: I ran the same test on kimi-k2.7-code. Result: it nailed the canonical architecture: one architect running 3 parallel plan variations → an arbiter synthesizing the best. That's the same shape four of my five original models converged on. The fascinating part is where it still leaked: zero vocabulary-level flags, but the cross-model auditor caught two paraphrase-level ones. "Inline definitions take precedence over fallback lookup" is my task's timezone-resolution feature wearing a costume. The model abstracts every word perfectly and still mirrors the structure of the requirements, one rung subtler than where most models fail. I also gave it the auditor seat: a clean verdict on a known-clean design, no false positives. Strictness is still unproven. That's for the weekend's testing to answer.
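
The "one architect, 3 parallel plan variations, one arbiter" shape described above can be sketched as a plain fan-out/fan-in. This is only an illustration under stated assumptions: `plan_and_arbitrate`, `planner`, and `arbiter` are hypothetical names, and the two callables stand in for whatever LLM calls the real workflow makes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def plan_and_arbitrate(
    task: str,
    planner: Callable[[str, str], str],        # hypothetical: (task, variation hint) -> plan
    arbiter: Callable[[str, List[str]], str],  # hypothetical: (task, plans) -> final plan
    variations: List[str],
) -> str:
    """Run one planner call per variation in parallel, then let the arbiter pick or merge."""
    # Fan out: each variation hint produces one candidate plan, concurrently.
    with ThreadPoolExecutor(max_workers=len(variations)) as pool:
        # pool.map keeps the results in the same order as `variations`.
        plans = list(pool.map(lambda hint: planner(task, hint), variations))
    # Fan in: the arbiter sees every candidate and returns a single plan.
    return arbiter(task, plans)
```

Threads are used here because LLM calls are I/O-bound; swapping in async calls would not change the shape.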

Agntro@AgntroAI

x.com/i/article/2065…

Agntro@AgntroAI·1d

@ID_AA_Carmack I'm on a similar path: exploring whether a robust set of general instructions and deep workflows can make weaker models perform on the same level as frontier models.

276

John Carmack@ID_AA_Carmack·2d

It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase. There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities. Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.

173

103

1.7K

113.1K

Agntro@AgntroAI·1d

Well, you have my attention. I know what I'll be testing this weekend.

Kimi.ai@Kimi_Moonshot

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai

Agntro@AgntroAI·1d

When you swap internet connections during a running task and Claude Code shows no connection retry error 🤔 Are there actually built-in delays before it calls the service, as a form of soft rate limiting?

Agntro@AgntroAI·2d

@adxtyahq You can do that with the Roo Code extension for VS Code through its per-mode API configuration. It's abandoned now, though, so you have to apply your own updates if you need to support new models.

aditya@adxtyahq·2d

Can someone please build this already? An IDE that automatically switches models based on the task. Cheap models for simple edits, Claude/GPT for the stuff that actually needs reasoning. And let me configure the routing rules myself
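
User-configurable routing of the kind asked for here could be as small as an ordered rule table. The sketch below is an assumption-heavy illustration, not how Roo Code or any IDE does it: `Rule`, `route`, the regex patterns, and the model names `cheap-model` and `reasoning-model` are all placeholders.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    pattern: str  # regex matched against the task description
    model: str    # placeholder model name, not a real identifier


def route(task: str, rules: List[Rule], default: str) -> str:
    """Return the model of the first rule whose pattern matches the task."""
    for rule in rules:
        if re.search(rule.pattern, task, flags=re.IGNORECASE):
            return rule.model
    return default


# Hypothetical rule set: simple edits go to a cheap model, reasoning-heavy work does not.
RULES = [
    Rule(r"\b(rename|typo|format|comment)\b", "cheap-model"),
    Rule(r"\b(refactor|design|debug|architecture)\b", "reasoning-model"),
]
```

First match wins, so the order of the rules is part of the configuration.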

138

7.4K

Agntro@AgntroAI·2d

@TheGeorgePu Play a game with an LLM where it gives you the instructions and you code

George Pu@TheGeorgePu·2d

I'm a bit surprised by how little I use code editors now.

1.8K

Agntro@AgntroAI·2d

@puppyeh1 This will be more relevant once the subscriptions are nerfed and you're forced to pay full API price.

Jeremy Raper@puppyeh1·2d

So you can use the 5th/6th/7th best LLMs, getting 80-85% of the top guys' performance, but at an 85-95% discount in price? You know what we call that? A commodity... exactly what happened with LCD TVs, OLEDs, solar panels, electric cars, phones, etc good luck with your AI IPOs!

zerohedge@zerohedge

LLM model matrix

190

408

4.6K

502.8K

Agntro@AgntroAI·2d

@araseb_ Why do you need home security systems when you have a door lock?

Sarah@araseb_·2d

You’re in a tech interview and they ask you: “Why should we hire you when Codex can write code?” What’s your answer?

427

171.4K

Agntro@AgntroAI·2d

@droidbuilds You should loop your subscriptions to buy more subscriptions

DROID@droidbuilds·3d

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"

295

935

13.8K

696.5K

Agntro@AgntroAI·2d

If you know the exact function you want to fix, pull up to 2 levels of branches from the AST into a single file, inline the data models it uses, and bake the original line numbers into comment headers above the extracted functions. Instruct the LLM to read and edit only that file; a tool can then swap the changes back.
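
A minimal sketch of that extraction using Python's standard `ast` module. It assumes "2 levels of branches" means following same-module references two hops out from the target, which the post does not spell out. `extract_slice` and the header format are hypothetical, and the swap-back tool is not shown.

```python
import ast


def extract_slice(source: str, target: str, depth: int = 2) -> str:
    """Collect `target` plus the top-level definitions it references, up to `depth` hops."""
    tree = ast.parse(source)
    # Index top-level functions and classes (the "data models") by name.
    defs = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    if target not in defs:
        raise KeyError(f"{target!r} is not a top-level definition")

    selected = [target]
    frontier = [target]
    for _ in range(depth):
        next_frontier = []
        for name in frontier:
            # Any Name node that refers to another top-level definition is a branch.
            for node in ast.walk(defs[name]):
                if isinstance(node, ast.Name) and node.id in defs and node.id not in selected:
                    selected.append(node.id)
                    next_frontier.append(node.id)
        frontier = next_frontier

    chunks = []
    for name in sorted(selected, key=lambda n: defs[n].lineno):
        node = defs[name]
        # Bake the original location into a comment header so edits can be mapped back.
        header = f"# --- {name}: original lines {node.lineno}-{node.end_lineno} ---"
        chunks.append(header + "\n" + ast.get_source_segment(source, node))
    return "\n\n".join(chunks) + "\n"
```

The headers give a swap-back tool enough information to replace the original line ranges with the edited definitions. Decorators and cross-module imports are ignored in this sketch.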

Ivan Fioravanti ᯅ@ivanfioravanti·2d

In this token economy, I hate how many AI models add extra code (methods, variables, guards) beyond the scope I asked for! 🤬 It wastes tokens on stuff I don’t want and even more tokens to remove it. Pretty sure it’s intentional… No? 🤔

3.7K

Agntro@AgntroAI·2d

@JunaidAckroyd At the current level of LLMs, the answer is still yes. One-shotting or developing and launching your app idea over the weekend is great, but you should still spend the time to understand how it works. LLM capabilities still decline as the codebase grows.

410

Junaid Ackroyd@JunaidAckroyd·3d

Be honest devs, Is coding still worth learning in the AI era?

331

472

106.1K

Agntro@AgntroAI·2d

@codevsdev To explain what it did without having read the code... and to take the blame if it did poorly.

Tom ☕@codevsdev·2d

if AI writes 80% of your code what skill is actually yours?

773

234

57.6K

Agntro@AgntroAI·2d

I'm currently exploring the idea that a workflow with a robust set of specialized nodes, each with its own agent instructions, could be all you need to solve complex problems, even with a Flash model. The open LLM benchmarks are a great testing ground for this, and I can't give an answer yet because the work is in its early stages. What I have observed is that full workflow reruns with A/B testing of prompts are really slow, so my latest approach uses an additional observer LLM that already knows the task and the solution and can cut off a node's progress early once it detects drift in the wrong direction. It then forks the node from a checkpoint and iterates on general prompts, trying to steer it in the right direction without giving hints about the real solution. The DeepSWE task set is my first target; I'll share more insights once I test the newest observer flow.
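
The observer loop described above might look roughly like this. It is a sketch under stated assumptions, not the actual tool: `run_with_observer`, `NodeState`, and the three callables (`step_fn`, `drifted`, `revise`) are hypothetical stand-ins for the agent step, the observer LLM's verdict, and the hint-free prompt revision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NodeState:
    prompt: str
    steps: List[str] = field(default_factory=list)


def run_with_observer(
    prompt: str,
    step_fn: Callable[[NodeState], str],   # hypothetical: runs one agent step
    drifted: Callable[[NodeState], bool],  # hypothetical: observer's drift verdict
    revise: Callable[[str, int], str],     # hypothetical: general, hint-free prompt tweak
    max_steps: int = 5,
    max_forks: int = 3,
    checkpoint_every: int = 2,
) -> NodeState:
    """Run a node step by step, forking from the last checkpoint when drift is detected."""
    state = NodeState(prompt)
    checkpoint = NodeState(prompt)
    forks = 0
    while len(state.steps) < max_steps:
        state.steps.append(step_fn(state))
        if drifted(state):
            if forks == max_forks:
                raise RuntimeError("observer gave up after max_forks")
            forks += 1
            # Cut the run off early: keep the checkpointed steps, retry with a revised prompt.
            state = NodeState(revise(state.prompt, forks), list(checkpoint.steps))
            continue
        if len(state.steps) % checkpoint_every == 0:
            # Only drift-free progress is ever saved as a checkpoint.
            checkpoint = NodeState(state.prompt, list(state.steps))
    return state
```

Because a fork only replays from the last checkpoint, a bad prompt costs a few steps rather than a full workflow rerun.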

Agntro@AgntroAI·5d

@CryptoWhales_X Thanks, but my work & product isn't related to crypto or Web3 😅🫡

Crypto Whales 🐋 🐳 🐬@CryptoWhales_X·5d

@AgntroAI Let's Collab 🔥 Let's boost the token/memecoin. 🙌

Agntro@AgntroAI·6d

Yes, I'm actively working on a tool that grew out of my own needs as a developer and my frustration with juggling multiple VS Code extensions and CLIs just to run Markdown plan reviews through several LLMs for second opinions. It gives you the freedom to arrange workflows and roles, fan out into multiple tasks, orchestrate models from different providers, handle and reuse caches smartly, and work with git worktrees. You can snapshot a workflow as flawed -> convert its state to a benchmark set -> run multiple models or different workflows on it to match the right tool to the task, or drop any previous LLM session into insights and pick a model to analyze that session's performance. Other functionality includes splitting your code into domains through Louvain communities, running summaries and tag attribution on them with flash models, and exposing AST-based tools alongside the common read/write/run_command. It was a deep dive, but it's approaching a state where I'll be seeking beta testers.

585

Patrick Collison@patrickc·Jun 6

I want some kind of LLM workflow tool. • Ability to manage a set of input files (Markdown or similar), plus other general-purpose context. • With real-time collaboration. (And maybe some concept of snapshots or VCS integration.) • And the ability to create/manage inference workflows and a stored set of prompts. • Access to general-purpose coding agents (and not just chat models). • Some concept of compiled outputs/inference results (which ideally can be shared externally). Many projects have this feeling: "there is all this stuff, which I want to process/compute over in this iterated way, with some build artifacts being important/worth saving." GNU Autotools x Notion or something. Is anyone building this?

440

109

2.5K

556.8K
