
FlyMy.AI

FlyMy.AI
@FlyMy_AI
New official account; the previous one was blocked O_o. End-game agentic cloud: everything else is a plugin. Built by folks from NVIDIA AI, Stability AI, and ICPC champs.



Question for the AI engineering community: what is the current best practice for giving a single agent access to a potentially unbounded number of skills? Goals, in priority order:
1. Maximize skill-use accuracy
2. Minimize context use
3. Minimize unnecessary tool calls
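One common answer to this question is to index skill descriptions and retrieve only the top-k relevant ones into context per request, rather than loading all of them. A minimal, hypothetical sketch of that pattern (the `Skill` registry and `select_skills` helper are made up; a real system would score with embeddings, not word overlap):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str

# Toy registry; in practice this could hold thousands of skills.
SKILLS = [
    Skill("resize_image", "resize or crop an image to given dimensions"),
    Skill("transcribe_audio", "convert speech in an audio file to text"),
    Skill("send_email", "send an email to a recipient with subject and body"),
]

def score(query: str, skill: Skill) -> int:
    # Toy lexical overlap; swap in embedding similarity for real use.
    q = set(query.lower().split())
    d = set(skill.description.lower().split())
    return len(q & d)

def select_skills(query: str, k: int = 2) -> list[str]:
    # Only the top-k matching skill names enter the agent's context.
    ranked = sorted(SKILLS, key=lambda s: score(query, s), reverse=True)
    return [s.name for s in ranked[:k] if score(query, s) > 0]

print(select_skills("please resize this image"))  # ['resize_image']
```

This trades one extra retrieval step for bounded context use, which matches the stated priority order: accuracy depends on retrieval quality, but context stays O(k) regardless of how many skills exist.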


Though bash is a completely valid REPL, the amount of time coding agents lose during experimentation because they iterate on scripts instead of using a Jupyter-like in-memory REPL is basically dumb. Fixing 1 local bug should not require restarting the whole job. We need better scaffolds.
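The difference is easy to show with a toy scaffold (purely hypothetical, not any particular product): keep one in-memory namespace across iterations so a failed "cell" only loses its own work, and a fix re-runs the changed code instead of the whole job.

```python
ns: dict = {}  # shared interpreter state, like a Jupyter kernel

def run_cell(code: str) -> None:
    # An error kills only this cell; earlier state in `ns` survives.
    try:
        exec(code, ns)
    except Exception as e:
        print(f"cell failed: {e!r}")

run_cell("data = list(range(1_000_000))  # expensive 'load' runs once")
run_cell("total = sum(data) / 0          # buggy cell")
run_cell("total = sum(data)              # fixed; no need to reload data")
print(ns["total"])
```

With a script-per-iteration workflow, the buggy line would force re-running the expensive load on every attempt; here the fix costs only the one re-executed cell.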


i imagine the next breakout coding product is something that sticks a single orchestrator you talk to in front of parallel cloud agents. it's too mentally taxing to keep a high # of parallel agents in the air by yourself. plus brutal merge conflicts.
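The shape being described can be sketched in a few lines of asyncio (a toy illustration; `fake_agent` stands in for a real long-running coding agent):

```python
import asyncio

async def fake_agent(task: str) -> str:
    # Stand-in for a long-running cloud coding agent.
    await asyncio.sleep(0.01)
    return f"patch for {task!r}"

async def orchestrator(tasks: list[str]) -> list[str]:
    # Fan out to all agents concurrently; gather preserves input
    # order, which keeps merging the results deterministic.
    return await asyncio.gather(*(fake_agent(t) for t in tasks))

results = asyncio.run(orchestrator(["fix login bug", "add dark mode"]))
print(results)
```

The human talks only to `orchestrator`; the fan-out, waiting, and result-merging (including, eventually, conflict resolution) are the orchestrator's problem, not the user's.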



it’s funny seeing so many “if you hand-write code or don’t have 10 agents running, you’re falling behind” posts. at the same time, Pi is growing rapidly, and @badlogicgames wrote >50% of it manually and uses 1 agent at a time. what i’m saying is that the focus should be on improving your engineering knowledge and building a great product, rather than on building these huge, complex agent systems that get rendered useless by the next model release. and every time someone says you’re falling behind, check their bio and see the name of the AI tool they’re selling you 💀


A current best practice is that AI-generated content should be consumed by you, not shared with others. This applies to code, bug reports, emails, and so on.


GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge-work tasks. 6 months ago the models could barely make a spreadsheet or slide! progress is happening really fast


I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (i.e. lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
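The loop described above is, at its core, hill climbing on validation loss with git as the accept/reject mechanism. A toy sketch of that loop (not the actual autoresearch code; `train` here is a simulated stand-in for a real 5-minute training run, and the commit step is reduced to keeping the best candidate):

```python
import random

def train(lr: float) -> float:
    # Simulated "5-minute run": pretend val loss is minimized at lr = 0.1.
    return (lr - 0.1) ** 2 + 1.0

random.seed(0)
best_lr, best_loss = 0.5, train(0.5)

for step in range(200):
    candidate = best_lr * random.uniform(0.5, 1.5)  # agent's proposal
    loss = train(candidate)
    if loss < best_loss:
        # In the real setup this is a git commit on the feature branch.
        best_lr, best_loss = candidate, loss

print(f"best lr ~{best_lr:.3f}, val loss {best_loss:.4f}")
```

Comparing prompts or agents then amounts to comparing how fast each one drives `best_loss` down per unit of wall-clock time.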


i'm so excited for what comes next


We don’t have evidence of a widespread issue with Codex usage being drained faster than it should be, but there are enough reports that we have reset rate limits for Plus & Pro subscriptions while we work toward wrapping up our investigation over the coming 1-3 days.


@deredleritt3r I continue to think things are pretty well on track for the sort of powerful AI system defined in Machines of Loving Grace: buildable by end of 2026, running many copies in 2027. Of course, there are many reasons this might not occur, but there has been a lot of progress so far
