Agntro
21 posts

Agntro
@AgntroAI
Software dev & AI enthusiast. Here to share insights on intelligent engineering. Crafting the future of software development at @AgntroAI
Joined March 2026
161 Following 11 Followers
Pinned Tweet

@MiniMax_AI Well deserved. Your model proved to be great, following instructions and even reviewing itself critically where others would let their own flaws slip through.
x.com/AgntroAI/statu…
Agntro@AgntroAI

@ChaoticEclipse0 Rest properly and spend time with the people closest to you. There are a lot of interesting developments in technology, and the world will need your skills.

I don't know. Is that it? For all the buzz? For the crazy size? For the crazy price? For the crazy latency? For the crazy daily limits? For the crazy anti-AI research lobotomy? For all these "Ooohh, we are so afraid to show it!" and "Ooooh, someone got unauthorized access to it, ooohhhh!"
That's it?
That's ridiculous.

Here is how Chinese open-source companies can actually make money:
Selling personal inference hardware.
If they partner with companies like Huawei to sell devices specialized for inference, it will bring in massive revenue.
By doing this, they won't have to bleed money on massive inference costs to serve consumers.
They would only need minimal inference just for training.
This solves the cost issue and serves as a great way to counter US frontier labs and their ever-increasing inference costs.
This is the future we need to head towards.

Update: ran the same test on kimi-k2.7-code
Result: it nailed the canonical architecture — one architect running 3 parallel plan variations → an arbiter synthesizing the best. The same shape four of my five original models converged on.
The fascinating part is where it still leaked: zero vocabulary-level flags, but the cross-model auditor caught two paraphrase-level ones — "inline definitions take precedence over fallback lookup" is my task's timezone-resolution feature wearing a costume. The model abstracts every word perfectly and still mirrors the structure of the requirements.
One rung subtler than where most models fail.
I also gave it the auditor seat: clean verdict on a known-clean design, no false positives. Strictness still unproven. That's for the weekend's testing to answer.
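For anyone wondering what "vocabulary-level" vs. "paraphrase-level" means here, a minimal sketch. This is not my actual auditor: the function name, the term list, and the "leaky" sentence are hypothetical, made up for illustration. It only shows why a word-matching check returns zero flags on a paraphrased leak, which is the gap the cross-model auditor has to cover.

```python
import re

def vocabulary_flags(design_text: str, task_terms: set[str]) -> list[str]:
    """Return the task-specific terms that appear verbatim in the design text.

    Matching is whole-word and case-insensitive. Paraphrases are invisible
    to this check by construction.
    """
    words = set(re.findall(r"[a-z0-9]+", design_text.lower()))
    return sorted(term for term in task_terms if term.lower() in words)

# Hypothetical task vocabulary; the real list is not part of this thread.
TASK_TERMS = {"timezone", "resolution", "utc"}

# Hypothetical leaky sentence, followed by the paraphrase quoted above.
leaky = "Resolve the timezone before falling back to UTC."
paraphrased = "Inline definitions take precedence over fallback lookup."

print(vocabulary_flags(leaky, TASK_TERMS))        # flags the literal terms
print(vocabulary_flags(paraphrased, TASK_TERMS))  # nothing to flag
```

The second call comes back empty even though the sentence mirrors the task's structure, so catching it needs a reader that compares meaning, not words.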
Agntro@AgntroAI

@ID_AA_Carmack I'm on a similar path. Exploring whether a robust set of general instructions and deep workflows can make weaker models perform at the same level as the frontier ones.

It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase.
There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities.
Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.

Well, you have my attention. I know what I'll be testing this weekend.
Kimi.ai@Kimi_Moonshot
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai

@TheGeorgePu Play a game with an LLM where it gives you the instructions and you code

So you can use the 5th/6th/7th best LLMs, getting 80-85% of the top guys' performance, but at an 85-95% discount in price?
You know what we call that? A commodity...
exactly what happened with LCD TVs, OLEDs, solar panels, electric cars, phones, etc
good luck with your AI IPOs!
zerohedge@zerohedge
LLM model matrix

@droidbuilds You should loop your subscriptions to buy more subscriptions

@JunaidAckroyd At the current level of LLMs, the answer is still yes.
One-shotting your app idea, or developing and launching it over the weekend, is great, but you should still spend the time to understand how it works. LLM capabilities still decline as the codebase grows.

@codevsdev To explain what it did without having read the code..
And take the blame if it did poorly

I'm currently exploring the idea that a workflow built from a robust set of specialized nodes, each with its own agent instructions, could be all you need to solve complex problems, even with a Flash model.
The open LLM benchmarks are a great testing ground for this. I can't give an answer yet, as my work on the idea is in its early stages.
What I have observed is that full workflow reruns with A/B testing of prompts are really slow. So my latest approach uses an additional observer LLM that already knows the task and the solution and can cut off a node's progress early, as soon as it detects drift in the wrong direction. It then forks the node from a checkpoint and iterates on general prompts, trying to steer it in the right direction without giving hints about the real solution.
The DeepSWE task set is my first target. I'll share more insights once I test the newest observer flow.
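A rough sketch of the observer loop described above, under heavy assumptions: no real LLM is called, `step` and `drifted` are hypothetical stand-ins for the node and the observer, and the toy prompts are invented. It only illustrates the control flow: fork from a checkpoint, cut off at the first detected drift, then try the next general prompt.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: a real setup would wrap LLM calls behind these.
StepFn = Callable[[str, list[str]], str]   # (prompt, history) -> next action
DriftFn = Callable[[list[str]], bool]      # history -> True if drifting

def run_with_observer(
    prompts: list[str],
    step: StepFn,
    drifted: DriftFn,
    checkpoint: list[str],
    max_steps: int = 5,
) -> Optional[tuple[str, list[str]]]:
    """Try each general prompt from the same checkpoint.

    A run is cut off at the first drift; the first prompt whose run
    completes max_steps without drift is returned with its history.
    """
    for prompt in prompts:
        history = list(checkpoint)         # fork: copy, never mutate the checkpoint
        for _ in range(max_steps):
            history.append(step(prompt, history))
            if drifted(history):
                break                      # early cut-off, move to the next prompt
        else:
            return prompt, history         # finished without drift
    return None                            # every prompt drifted

# Toy node and observer, for illustration only.
def toy_step(prompt: str, history: list[str]) -> str:
    return "edit tests" if "tests" in prompt else "rewrite module"

def toy_drifted(history: list[str]) -> bool:
    return history[-1] == "rewrite module"

result = run_with_observer(
    ["Be thorough.", "Keep changes minimal and run the tests first."],
    toy_step,
    toy_drifted,
    checkpoint=["read issue"],
    max_steps=3,
)
print(result)
```

The point of the early cut-off is cost: a drifting run stops after one bad step instead of burning a full workflow rerun, and the checkpoint copy keeps every fork starting from the same state.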

@CryptoWhales_X Thanks, but my work & product isn't related to crypto or Web3 😅🫡

@AgntroAI Let's Collab 🔥
Let's boost the token/memecoin. 🙌