
FlyMy.AI

FlyMy.AI
@FlyMy_AI
New official account; the previous one was blocked O_o. End-game agentic cloud: everything else is a plugin. Built by folks from NVIDIA AI, Stability AI, and ICPC champs.



Question for the AI engineering community: what is the current best practice for giving a single agent access to a potentially unbounded number of skills? Goals, in priority order:
1. Maximize skill-use accuracy
2. Minimize context use
3. Minimize unnecessary tool calls
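One common answer to this question is to index skill descriptions and retrieve only the top-k relevant ones into context per request, rather than loading all of them. A minimal, hypothetical sketch of that pattern (the `Skill` registry and `select_skills` helper are made up; a real system would score with embeddings, not word overlap):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str

# Toy registry; in practice this could hold thousands of skills.
SKILLS = [
    Skill("resize_image", "resize or crop an image to given dimensions"),
    Skill("transcribe_audio", "convert speech in an audio file to text"),
    Skill("send_email", "send an email to a recipient with subject and body"),
]

def score(query: str, skill: Skill) -> int:
    # Toy lexical overlap; swap in embedding similarity for real use.
    q = set(query.lower().split())
    d = set(skill.description.lower().split())
    return len(q & d)

def select_skills(query: str, k: int = 2) -> list[str]:
    # Only the top-k matching skill names enter the agent's context.
    ranked = sorted(SKILLS, key=lambda s: score(query, s), reverse=True)
    return [s.name for s in ranked[:k] if score(query, s) > 0]

print(select_skills("please resize this image"))  # ['resize_image']
```

This trades one extra retrieval step for bounded context use, which matches the stated priority order: accuracy depends on retrieval quality, but context stays O(k) regardless of how many skills exist.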


Though bash is a completely valid REPL, the amount of time coding agents lose during experimentation because they iterate on scripts instead of using a Jupyter-like in-memory REPL is basically dumb. Fixing 1 local bug should not require restarting the whole job. We need better scaffolds.
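The difference is easy to show with a toy scaffold (purely hypothetical, not any particular product): keep one in-memory namespace across iterations so a failed "cell" only loses its own work, and a fix re-runs the changed code instead of the whole job.

```python
ns: dict = {}  # shared interpreter state, like a Jupyter kernel

def run_cell(code: str) -> None:
    # An error kills only this cell; earlier state in `ns` survives.
    try:
        exec(code, ns)
    except Exception as e:
        print(f"cell failed: {e!r}")

run_cell("data = list(range(1_000_000))  # expensive 'load' runs once")
run_cell("total = sum(data) / 0          # buggy cell")
run_cell("total = sum(data)              # fixed; no need to reload data")
print(ns["total"])
```

With a script-per-iteration workflow, the buggy line would force re-running the expensive load on every attempt; here the fix costs only the one re-executed cell.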


i imagine the next breakout coding product is something that sticks a single orchestrator you talk to in front of parallel cloud agents. it's too mentally taxing to keep a high # of parallel agents in the air by yourself. plus brutal merge conflicts.
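The shape being described can be sketched in a few lines of asyncio (a toy illustration; `fake_agent` stands in for a real long-running coding agent):

```python
import asyncio

async def fake_agent(task: str) -> str:
    # Stand-in for a long-running cloud coding agent.
    await asyncio.sleep(0.01)
    return f"patch for {task!r}"

async def orchestrator(tasks: list[str]) -> list[str]:
    # Fan out to all agents concurrently; gather preserves input
    # order, which keeps merging the results deterministic.
    return await asyncio.gather(*(fake_agent(t) for t in tasks))

results = asyncio.run(orchestrator(["fix login bug", "add dark mode"]))
print(results)
```

The human talks only to `orchestrator`; the fan-out, waiting, and result-merging (including, eventually, conflict resolution) are the orchestrator's problem, not the user's.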



it’s funny seeing so many “if you hand-write code or don’t have 10 agents running, you’re falling behind” posts. at the same time, Pi is growing rapidly, and @badlogicgames wrote >50% of it manually and uses 1 agent at a time. what i’m saying is that the focus should be on improving your engineering knowledge and building a great product, rather than on building these huge, complex agent systems that get rendered useless by the next model release. and every time someone says you’re falling behind, check their bio and see the name of the AI tool they’re selling you 💀


A current best practice is that AI-generated content should be consumed by you, not shared with others. This applies to code, bug reports, emails, and so on.


GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge-work tasks. 6 months ago the models could barely make a spreadsheet or slide! progress is happening really fast


I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (i.e. lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
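The loop described above is, at its core, hill climbing on validation loss with git as the accept/reject mechanism. A toy sketch of that loop (not the actual autoresearch code; `train` here is a simulated stand-in for a real 5-minute training run, and the commit step is reduced to keeping the best candidate):

```python
import random

def train(lr: float) -> float:
    # Simulated "5-minute run": pretend val loss is minimized at lr = 0.1.
    return (lr - 0.1) ** 2 + 1.0

random.seed(0)
best_lr, best_loss = 0.5, train(0.5)

for step in range(200):
    candidate = best_lr * random.uniform(0.5, 1.5)  # agent's proposal
    loss = train(candidate)
    if loss < best_loss:
        # In the real setup this is a git commit on the feature branch.
        best_lr, best_loss = candidate, loss

print(f"best lr ~{best_lr:.3f}, val loss {best_loss:.4f}")
```

Comparing prompts or agents then amounts to comparing how fast each one drives `best_loss` down per unit of wall-clock time.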


i'm so excited for what comes next


We don’t have evidence of a widespread issue with Codex usage being drained faster than it should be, but there are enough reports that we have reset rate limits for Plus & Pro subscriptions while we work toward wrapping up our investigation over the coming 1-3 days.


@deredleritt3r I continue to think things are pretty well on track for the sort of powerful AI system defined in Machines of Loving Grace: buildable by end of 2026, running many copies in 2027. Of course, there are many reasons this might not occur, but there has been a lot of progress so far
