Pinned Tweet
Agntro
19 posts

Agntro
@AgntroAI
Software dev & AI enthusiast. Here to share insights on intelligent engineering. Crafting the future of software development at @AgntroAI
Joined March 2026
158 Following 11 Followers

I don't know. Is that it? For all the buzz? For the crazy size? For the crazy price? For the crazy latency? For the crazy daily limits? For the crazy anti-AI research lobotomy? For all these "Ooohh, we are so afraid to show it!" and "Ooooh, someone got unauthorized access to it, ooohhhh!"
That's it?
That's ridiculous.




Here is how Chinese open-source companies can actually make money:
Selling personal inference hardware.
If they partner with companies like Huawei to sell devices specialized for inference, it will bring in massive revenue.
By doing this, they won't have to bleed money on massive inference costs to serve consumers.
They would only need minimal inference just for training.
This solves the cost issue and serves as a great way to counter US frontier labs and their ever-increasing inference costs.
This is the future we need to head towards.

Update: ran the same test on kimi-k2.7-code
Result: it nailed the canonical architecture — one architect running 3 parallel plan variations → an arbiter synthesizing the best. The same shape four of my five original models converged on.
The fascinating part is where it still leaked: zero vocabulary-level flags, but the cross-model auditor caught two paraphrase-level ones — "inline definitions take precedence over fallback lookup" is my task's timezone-resolution feature wearing a costume. The model abstracts every word perfectly and still mirrors the structure of the requirements.
One rung subtler than where most models fail.
I also gave it the auditor seat: clean verdict on a known-clean design, no false positives. Strictness is still unproven. That's for the weekend's testing to answer.
Agntro @AgntroAI
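
The architect/arbiter shape described in the update above can be sketched roughly like this. It is a minimal illustration, not the actual test harness: `architect`, `arbiter`, and `run_workflow` are hypothetical names, and both model calls are replaced with deterministic stubs so the flow can be run without any API.

```python
from concurrent.futures import ThreadPoolExecutor


def architect(task: str, variation: int) -> str:
    # Hypothetical stand-in for an LLM call that drafts one plan variation.
    return f"plan {variation} for {task}"


def arbiter(task: str, plans: list[str]) -> str:
    # Hypothetical stand-in: a real arbiter would be a model that merges the
    # strongest parts of each plan. Here the plans are simply joined.
    return " | ".join(plans)


def run_workflow(task: str, n_variations: int = 3) -> str:
    # Fan out: one architect role produces n plan variations in parallel.
    with ThreadPoolExecutor(max_workers=n_variations) as pool:
        plans = list(
            pool.map(lambda i: architect(task, i), range(1, n_variations + 1))
        )
    # Fan in: a single arbiter sees every plan and returns one synthesis.
    return arbiter(task, plans)


print(run_workflow("timezone parsing"))
```

The design point is only the shape: parallel variations from one role, then a single synthesis step that sees all of them.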

@ID_AA_Carmack I'm on a similar path. Exploring whether a robust set of general instructions and deep workflows can make weaker models perform at the same level as frontier models.

It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase.
There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities.
Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.

Well, you have my attention. I know what I'll be testing this weekend.
Kimi.ai @Kimi_Moonshot
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai

@TheGeorgePu Play a game with an LLM where it gives you the instructions and you code

So you can use the 5th/6th/7th best LLMs, getting 80-85% of the top guys' performance, but at an 85-95% discount in price?
You know what we call that? A commodity...
exactly what happened with LCD TVs, OLEDs, solar panels, electric cars, phones, etc
good luck with your AI IPOs!
zerohedge @zerohedge
LLM model matrix

@droidbuilds You should loop your subscriptions to buy more subscriptions

@JunaidAckroyd At the current level of LLMs, the answer is still yes.
One-shotting or developing and launching your app idea over the weekend is great, but you should still spend the time to understand how it works. LLM capabilities still decline as the codebase grows.

@codevsdev To explain what it did without having read the code..
And take the blame if it did poorly

I'm currently exploring the idea that a workflow built from a robust set of specialized nodes, each with its own agent instructions, could be all you need to solve complex problems, even with a Flash model.
The open benchmarks for LLMs are a great testing ground for the idea. I can't give an answer yet, as my work on it is in its early stages.
What I have observed is that full workflow reruns with A/B testing of prompts are really slow. My latest approach is to use an additional observer LLM that already knows the task and the solution and can cut off a node's progress early, once it detects drift in the wrong direction. It then forks the node from a checkpoint and iterates on general prompts, trying to steer it in the right direction without giving hints about the real solution.
The DeepSWE task set is my first target. I'll share more insights once I test the newest observer flow.
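
A rough sketch of the observer loop described above, under heavy assumptions: the node and the observer are deterministic stubs rather than model calls, "drift" is a keyword check, and every name here (`NodeRun`, `node_step`, `observer_detects_drift`, `run_with_observer`) is hypothetical. It only illustrates the control flow: stop a run early on drift, fork from the last accepted checkpoint, and retry with a different general prompt.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRun:
    prompt: str
    steps: list[str] = field(default_factory=list)


def node_step(run: NodeRun, step_index: int) -> str:
    # Hypothetical node: a real one would ask a small model for its next action.
    # This stub drifts at step 1 unless the prompt says to write tests first.
    if step_index == 1 and "tests first" not in run.prompt:
        return "rewrite unrelated module"
    return f"step {step_index} on target"


def observer_detects_drift(step: str) -> bool:
    # Hypothetical observer: a real one would be an LLM that knows the task
    # and its solution. Here drift is reduced to a keyword check.
    return "unrelated" in step


def run_with_observer(prompts: list[str], max_steps: int = 3) -> tuple[str, list[str]]:
    checkpoint: list[str] = []  # last sequence of steps the observer accepted
    for prompt in prompts:  # general prompts only, no hints about the solution
        run = NodeRun(prompt, list(checkpoint))  # fork from the checkpoint
        drifted = False
        for i in range(len(run.steps), max_steps):
            step = node_step(run, i)
            if observer_detects_drift(step):
                drifted = True  # cut the node off early instead of finishing
                break
            run.steps.append(step)
            checkpoint = list(run.steps)
        if not drifted:
            return prompt, run.steps
    raise RuntimeError("no prompt variant kept the node on track")


winning_prompt, steps = run_with_observer(
    ["Fix the bug.", "Fix the bug. Write tests first."]
)
print(winning_prompt, steps)
```

The second prompt resumes from step 1 rather than step 0, which is where the time saving over a full workflow rerun would come from.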

@CryptoWhales_X Thanks, but my work & product isn't related to crypto or Web3 😅🫡

@AgntroAI Let's Collab 🔥
Let's boost the token/memecoin. 🙌

Yes, I'm actively working on a tool meant to cover my needs as a developer. It grew out of my frustration with having to use multiple VS Code extensions/CLIs to run Markdown plan reviews through multiple LLMs for second opinions.
It gives you the freedom to arrange workflows and roles, fan out into multiple tasks, orchestrate LLMs from different providers, handle and reuse caches smartly, and work with git worktrees. You can snapshot a workflow as flawed -> convert the state to a benchmark set -> run multiple models or different workflows on it to match the right tool to that task. You can also drop any previous LLM session into insights and pick a model to analyze the performance of that session.
Other functionality includes splitting your code into domains through Louvain communities, running summaries/tag attribution on them with flash models, and exposing AST-based tools alongside the common read/write/run_command.
It was a deep dive, but it's approaching a state where I'll be seeking beta testers.
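
As an illustration of what an AST-based tool next to read/write/run_command might look like, here is a minimal sketch using Python's standard `ast` module. The tool name `list_symbols` and its return shape are assumptions for the example, not the product's actual interface.

```python
import ast


def list_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) for each function and class defined in source."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(symbols, key=lambda s: s[2])


example = "class A:\n    def m(self):\n        pass\n\ndef f():\n    pass\n"
print(list_symbols(example))
```

A tool like this lets a model get an outline of a file without reading the whole thing through a plain read call.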


I want some kind of LLM workflow tool.
• Ability to manage a set of input files (Markdown or similar), plus other general-purpose context.
• With real-time collaboration. (And maybe some concept of snapshots or VCS integration.)
• And the ability to create/manage inference workflows and a stored set of prompts.
• Access to general-purpose coding agents (and not just chat models).
• Some concept of compiled outputs/inference results (which ideally can be shared externally).
Many projects have this feeling: "there is all this stuff, which I want to process/compute over in this iterated way, with some build artifacts being important/worth saving." GNU Autotools x Notion or something. Is anyone building this?








