Agntro

21 posts


@AgntroAI

Software dev & AI enthusiast. Here to share insights on intelligent engineering. Crafting the future of software development at @AgntroAI

Joined March 2026

161 Following · 11 Followers

Pinned Tweet

Agntro@AgntroAI·1d

x.com/i/article/2065…


Agntro@AgntroAI·1h

@MiniMax_AI Well deserved. Your model proved to be great, following instructions and even reviewing itself critically where others would let their own flaws slip through. x.com/AgntroAI/statu…

Agntro@AgntroAI

x.com/i/article/2065…


MiniMax (official)@MiniMax_AI·8h

Who’s claiming follower #100,000? 🧋👀💃


Agntro@AgntroAI·2h

@ChaoticEclipse0 Rest properly and spend time with the people closest to you. There are a lot of interesting developments in technology, and the world will need your skills.


Nightmare Eclipse@ChaoticEclipse0·1d

I think it's over. I don't think I can do it anymore. My road ends here (hopefully I succeed.)


Agntro@AgntroAI·1d

@burkov Didn't GPT-3 need a kill switch because it could just gain consciousness?


BURKOV@burkov·1d

I don't know. Is that it? For all the buzz? For the crazy size? For the crazy price? For the crazy latency? For the crazy daily limits? For the crazy anti-AI research lobotomy? For all these "Ooohh, we are so afraid to show it!" and "Ooooh, someone has got a non-authorized access to it, ooohhhh!" That's it? That's ridiculous.


Agntro@AgntroAI·1d

@jun_song If China were to start producing mask ROMs of the DeepSeek V4 Flash model, you could sell them to the consumer market like hotcakes.


Jun Song@jun_song·1d

Here is how Chinese open-source companies can actually make money: Selling personal inference hardware. If they partner with companies like Huawei to sell devices specialized for inference, it will bring in massive revenue. By doing this, they won't have to bleed money on massive inference costs to serve consumers. They would only need minimal inference just for training. This solves the cost issue and serves as a great way to counter US frontier labs and their ever-increasing inference costs. This is the future we need to head towards.


Agntro@AgntroAI·1d

Update: ran the same test on kimi-k2.7-code. Result: it nailed the canonical architecture: one architect running 3 parallel plan variations → an arbiter synthesizing the best. The same shape four of my five original models converged on. The fascinating part is where it still leaked: zero vocabulary-level flags, but the cross-model auditor caught two paraphrase-level ones. For example, "inline definitions take precedence over fallback lookup" is my task's timezone-resolution feature wearing a costume. The model abstracts every word perfectly and still mirrors the structure of the requirements. One rung subtler than where most models fail. I also gave it the auditor seat: clean verdict on a known-clean design, no false positives. Strictness still unproven. That's for the weekend's testing to answer.
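
A minimal sketch of the shape described in the update above: one architect producing three plan variations in parallel, then an arbiter synthesizing a single plan from them. The `architect` and `arbiter` callables and the variation names are hypothetical placeholders for model calls, not part of the original test setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical signatures: each callable stands in for one LLM call.
Architect = Callable[[str, str], str]        # (task, variation) -> plan
Arbiter = Callable[[str, List[str]], str]    # (task, plans) -> synthesized plan


def plan_and_arbitrate(
    task: str,
    architect: Architect,
    arbiter: Arbiter,
    variations: Sequence[str] = ("minimal", "balanced", "thorough"),
) -> str:
    """Run one architect call per variation in parallel, then let the arbiter merge them."""
    with ThreadPoolExecutor(max_workers=len(variations)) as pool:
        # pool.map keeps the input order, so plans[i] belongs to variations[i].
        plans = list(pool.map(lambda v: architect(task, v), variations))
    return arbiter(task, plans)
```

Threads are used here only because model calls are typically I/O-bound; any other fan-out mechanism would fit the same shape.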

Agntro@AgntroAI

x.com/i/article/2065…


Agntro@AgntroAI·1d

@ID_AA_Carmack I'm on a similar path. Exploring whether a robust set of general instructions and deep workflows can make weaker models perform at the same level as frontier models.


John Carmack@ID_AA_Carmack·2d

It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase. There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities. Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.


Agntro@AgntroAI·1d

Well, you have my attention. I know what I'll be testing this weekend.

Kimi.ai@Kimi_Moonshot

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai


Agntro@AgntroAI·2d

When you switch internet connections during a running task and Claude Code shows no connection-retry error 🤔 Are there actually built-in delays before it calls the service, as a form of soft rate limiting?


Agntro@AgntroAI·2d

@adxtyahq You can do that with the Roo Code plugin for VS Code through its mode API configuration. It's abandoned now, though, so you have to apply your own updates if you need to support new models.


aditya@adxtyahq·2d

Can someone please build this already? An IDE that automatically switches models based on the task. Cheap models for simple edits, Claude/GPT for the stuff that actually needs reasoning. And let me configure the routing rules myself


Agntro@AgntroAI·2d

@TheGeorgePu Play a game with an LLM where it gives you the instructions and you code


George Pu@TheGeorgePu·2d

I'm a bit surprised by how little I use code editors now.


Agntro@AgntroAI·2d

@puppyeh1 Will be more relevant once the subscriptions are nerfed and you're forced to pay full API price.


Jeremy Raper@puppyeh1·2d

So you can use the 5th/6th/7th best LLMs, getting 80-85% of the top guys' performance, but at an 85-95% discount in price? You know what we call that? A commodity... exactly what happened with LCD TVs, OLEDs, solar panels, electric cars, phones, etc good luck with your AI IPOs!

zerohedge@zerohedge

LLM model matrix


Agntro@AgntroAI·2d

@araseb_ Why do you need home security systems when you have a door lock?


Sarah@araseb_·2d

You’re in a tech interview and they ask you: “Why should we hire you when Codex can write code?” What’s your answer?


Agntro@AgntroAI·2d

@droidbuilds You should loop your subscriptions to buy more subscriptions


DROID@droidbuilds·3d

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"


Agntro@AgntroAI·2d

If you know the exact function you want to fix, pull up to 2 levels of branches from the AST and inline them, together with the data models they use, into a single file. Bake the original line numbers into comment headers above the extracted functions. Instruct the LLM to read and edit only that file; a tool can then swap the edits back.
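
A rough sketch of the extraction step described above, using Python's standard `ast` module. It rests on several assumptions: "branches" is read as functions called from the target, only top-level functions and classes of a single module are considered, and the header format is made up for illustration. The helper name `extract_context` is hypothetical, and the swap-back tool is not shown.

```python
import ast


def extract_context(source: str, target: str, max_depth: int = 2) -> str:
    """Collect `target`, its callees up to `max_depth` levels, and the classes they reference."""
    tree = ast.parse(source)
    # Assumption: only top-level functions and classes of one module are indexed.
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    classes = {n.name: n for n in tree.body if isinstance(n, ast.ClassDef)}
    if target not in funcs:
        raise KeyError(f"no top-level function named {target!r}")

    picked = {target: funcs[target]}
    frontier = [target]
    for _ in range(max_depth):  # one pass per level of the call tree
        reached = []
        for name in frontier:
            for node in ast.walk(funcs[name]):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id in funcs
                        and node.func.id not in picked):
                    picked[node.func.id] = funcs[node.func.id]
                    reached.append(node.func.id)
        frontier = reached

    # "Data models": any top-level class named inside a picked function,
    # including in its parameter annotations.
    models = {}
    for fn in picked.values():
        for node in ast.walk(fn):
            if isinstance(node, ast.Name) and node.id in classes:
                models[node.id] = classes[node.id]

    lines = source.splitlines()
    chunks = []
    for node in sorted([*models.values(), *picked.values()], key=lambda n: n.lineno):
        # Start at the first decorator, if any, so decorated definitions stay intact.
        start = min([d.lineno for d in node.decorator_list] + [node.lineno])
        header = f"# --- {node.name}: original lines {start}-{node.end_lineno} ---"
        chunks.append(header + "\n" + "\n".join(lines[start - 1:node.end_lineno]))
    return "\n\n".join(chunks) + "\n"
```

The line ranges in the headers are what would let a separate tool map the edited definitions back onto the original file.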


Ivan Fioravanti ᯅ@ivanfioravanti·2d

In this token economy, I hate how many AI models add extra code (methods, variables, guards) beyond the scope I asked for! 🤬 It wastes tokens on stuff I don’t want and even more tokens to remove it. Pretty sure it’s intentional… No? 🤔


Agntro@AgntroAI·2d

@JunaidAckroyd At the current level of LLMs, the answer is still yes. One-shotting or developing and launching your app idea over the weekend is great, but you should still spend the time to understand how it works. LLM capabilities still decline the larger the codebase grows.


Junaid Ackroyd@JunaidAckroyd·3d

Be honest devs, Is coding still worth learning in the AI era?


Agntro@AgntroAI·2d

@codevsdev To explain what it did without having read the code... and take the blame if it did poorly.


Tom ☕@codevsdev·2d

if AI writes 80% of your code what skill is actually yours?


Agntro@AgntroAI·2d

I'm currently exploring the idea that a workflow with a robust set of specialized nodes, each with different agent instructions, could be all you need to solve complex problems, even using a Flash model. The open benchmarks for LLMs are a great testing ground for the idea, and I can't give an answer yet as my work on it is in its early stages. But what I have observed is that full workflow reruns with A/B testing of prompts are really slow, so my latest approach is to use an additional observer LLM that's already aware of the task and the solution and can cut off a node's progress early, once a drift in the wrong direction is detected. It would then fork the node from a checkpoint and iterate on general prompts, trying to steer it in the right direction without providing hints to the real solution. The DeepSWE task set is my first target; I'll share more insights once I test the newest observer flow.
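
The observer loop described above could look roughly like the sketch below. Everything here is an assumption made for illustration: `step`, `drifting`, and `revise` are hypothetical callables standing in for the node's model call and the observer's two jobs, and the checkpoint is simply the last state the observer accepted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Checkpoint:
    prompt: str
    steps: List[str] = field(default_factory=list)


# Hypothetical callables, each standing in for an LLM call.
StepFn = Callable[[str, List[str]], Optional[str]]  # next step, or None when the node is done
DriftFn = Callable[[List[str]], bool]               # observer: is this trajectory drifting?
ReviseFn = Callable[[str, List[str]], str]          # observer: new general prompt, no solution hints


def run_with_observer(
    prompt: str,
    step: StepFn,
    drifting: DriftFn,
    revise: ReviseFn,
    max_steps: int = 20,
    max_forks: int = 3,
) -> Tuple[List[str], str, int]:
    """Run a node step by step; on drift, fork from the last accepted checkpoint with a revised prompt."""
    checkpoint = Checkpoint(prompt)
    steps = list(checkpoint.steps)
    forks = 0
    while len(steps) < max_steps:
        nxt = step(prompt, steps)
        if nxt is None:
            break  # node finished on its own
        candidate = steps + [nxt]
        if drifting(candidate):
            if forks == max_forks:
                break  # give up steering, keep the last accepted state
            forks += 1
            prompt = revise(prompt, candidate)
            steps = list(checkpoint.steps)  # fork: drop the drifted step
            continue
        steps = candidate
        checkpoint = Checkpoint(prompt, list(steps))
    return steps, prompt, forks
```

Checking after every step keeps a drifted run from ever completing, which is the point of avoiding full workflow reruns; a real observer would likely check less often to save tokens.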


Agntro@AgntroAI·2d


Agntro@AgntroAI·5d

@CryptoWhales_X Thanks, but my work & product isn't related to crypto or Web3 😅🫡


Crypto Whales 🐋 🐳 🐬@CryptoWhales_X·5d

@AgntroAI Let's Collab 🔥 Let's boost the token/memecoin. 🙌
