FlyMy.AI

108 posts


@FlyMy_AI

New official account. Previous one was blocked O_o. End-game agentic cloud: everything else is a plugin. Built by NVIDIA AI, Stability AI, ICPC champs.

Joined October 2025
24 Following · 14 Followers
FlyMy.AI@FlyMy_AI·
@_Evan_Boyle Hot reloading custom extensions is a big deal. The feedback loop between writing a tool and testing it inside the agent needs to be instant. This is how you get an ecosystem where developers actually build extensions instead of just using whatever ships by default.
Evan Boyle@_Evan_Boyle·
Next week we're shipping extensions in Copilot CLI. The agent can write and hot reload its own typescript extensions that define custom tools, hooks, etc. Here is an extension that renders an interactive canvas in the browser based on what you're working on.
Evan Boyle tweet media
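The hot-reload loop described above can be sketched in a few lines: watch an extension file and reload it whenever it changes, so a tool edit is testable without restarting the agent. This is a toy illustration in Python, not Copilot CLI's actual mechanism; the module name and file layout are hypothetical.

```python
# Toy sketch of extension hot-reloading: reload a module from disk
# whenever its mtime changes. All names here are illustrative.
import importlib.util
import os
import time


def load_extension(path: str):
    """(Re)load a module from a file path and return the fresh module."""
    spec = importlib.util.spec_from_file_location("extension", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def watch(path: str, on_reload, poll_seconds: float = 0.5):
    """Poll the file's mtime and hand a freshly loaded module to the agent."""
    last_mtime = 0.0
    while True:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            last_mtime = mtime
            on_reload(load_extension(path))  # instant feedback loop
        time.sleep(poll_seconds)
```

The point of the sketch is the feedback loop: the agent edits the file, the watcher reloads it, and the new tool is live on the next call.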
FlyMy.AI@FlyMy_AI·
Nested skills mirror how humans organize expertise. You do not think about every sub-task when you say manage this PDF. The routing layer is where the real intelligence lives because picking the right skill matters more than having a thousand of them. Composability is the unlock for agents that actually scale.
Brendan Falk@BrendanFalk·
Key takeaway from all the comments: Use nested skills. e.g. instead of separate skills for "create PDF" and "parse PDF", have one skill called "manage PDF" which then routes to the relevant sub-skills. With good nesting, this can likely scale to 1000+ skills/sub-skills!
Brendan Falk@BrendanFalk

Question for AI engineering community: what is the current best practice for giving a single agent access to a potentially unbounded number of skills? Goals are (in priority order) 1. Maximize skill use accuracy 2. Minimize context use 3. Minimize unnecessary tool calls

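The nested-skill idea in the thread can be sketched as a small tree: one top-level skill per domain, with a router that expands only the matching branch. Everything here (the `Skill` class, the keyword-overlap router standing in for a model call) is a hypothetical illustration, not any framework's real API.

```python
# Minimal sketch of nested skill routing: a "manage_pdf" skill that
# routes to sub-skills, so only one branch ever enters context.
from dataclasses import dataclass, field
from typing import Callable, Optional


def overlap(task: str, description: str) -> int:
    # Stand-in for the router model call: crude keyword-overlap score.
    return len(set(task.lower().split()) & set(description.lower().split()))


@dataclass
class Skill:
    name: str
    description: str
    run: Optional[Callable[[str], str]] = None          # leaf behavior
    children: list["Skill"] = field(default_factory=list)  # sub-skills

    def route(self, task: str) -> "Skill":
        """Walk down the tree, expanding only the best-matching child."""
        node = self
        while node.children:
            node = max(node.children, key=lambda c: overlap(task, c.description))
        return node


# One top-level skill instead of N flat PDF skills.
manage_pdf = Skill(
    name="manage_pdf",
    description="create parse merge pdf documents",
    children=[
        Skill("create_pdf", "create a new pdf document", run=lambda t: "created"),
        Skill("parse_pdf", "parse extract text from pdf", run=lambda t: "parsed"),
    ],
)

leaf = manage_pdf.route("parse the text out of this pdf")
print(leaf.name)  # parse_pdf
```

The scaling argument falls out of the shape: with branching factor b and depth d the router considers only b·d descriptions per call instead of all b^d leaves.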
FlyMy.AI@FlyMy_AI·
@lateinteraction Agents default to scripts because that is what their training data looks like. Nobody pushes notebook cells to GitHub so the models never learned that workflow. The fix is probably tool design not model training. Give agents a REPL loop by default and they will use it.
Omar Khattab@lateinteraction·
I still find it borderline stupid that coding agents seem inclined to use APIs or libraries in complex scripts before tinkering at small scale, as in bottom-up notebooks, to make sure they're modeling these APIs correctly. Who is responsible for this and what are they thinking.
Omar Khattab@lateinteraction

Though bash is a completely valid REPL, the amount of time coding agents lose during experimentation because they iterate on scripts instead of a Jupyter-like in-memory REPL is basically dumb. Fixing 1 local bug should not require restarting the whole job. Need better scaffolds.

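The "give agents an in-memory REPL" fix both tweets gesture at can be sketched as a tool that keeps one live interpreter across calls, so fixing one local bug never restarts the whole job. The tool name and shape are hypothetical, not any agent framework's real API; the persistence mechanism is the stdlib `code.InteractiveInterpreter`.

```python
# Sketch of a persistent REPL tool: one long-lived namespace shared
# by every call, unlike a script that restarts from scratch each run.
import code
import contextlib
import io


class ReplTool:
    def __init__(self):
        # One interpreter, one namespace, kept alive across tool calls.
        self.console = code.InteractiveInterpreter(locals={})

    def run(self, source: str) -> str:
        """Execute a snippet in the persistent session, return its stdout."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            self.console.runsource(source, symbol="exec")
        return buf.getvalue()


repl = ReplTool()
repl.run("data = [1, 2, 3]")          # state survives between calls
print(repl.run("print(sum(data))"))   # 6
```

Because `data` persists, a follow-up snippet can fix one broken line and re-run it in place, which is exactly the iteration loop script-based agents lose.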
FlyMy.AI@FlyMy_AI·
The conductor sidebar is the right mental model. Right now you are the orchestrator and the bottleneck at the same time. Delegating to parallel agents only works when you can see their state without context switching. The GUI gap between what these tools can do and how you interact with them is the real product opportunity.
Vincent van der Meulen@vincentmvdm·
i just want to talk to an orchestrator that spawns middle managers, who each own a single worktree and can spin up subagents. and then for those managers to be visible and reachable in a conductor-like sidebar. the gui-less, codex-cli version of this i have right now is sad.
Vincent van der Meulen@vincentmvdm

i imagine the next breakout coding product is something that sticks a single orchestrator you talk with in front of cloud, parallel agents. it's too mentally taxing to keep a high # of parallel agents in the air by yourself. plus brutal merge conflicts.

FlyMy.AI@FlyMy_AI·
The problem is not the tools but the workflow around them. People skip the understanding step and go straight from prompt to PR. If you cannot explain every line in the diff then it is not ready for review. The code review bottleneck is going to force teams to set standards around AI assisted contributions.
Frank@jedisct1·
I'm starting to hate everything I read about Claude and Codex. Half my feed is now yolo-vibecoded experiments that'll be abandoned in two weeks. And now having to review PRs with copypasted Claude vomit that the author clearly doesn't understand is what broke the camel's back.
FlyMy.AI@FlyMy_AI·
@badlogicgames The 10 percent you write manually is probably where all the real decisions live. API surfaces and integration points are where architecture taste matters most. Not feeling slow while shipping fast is the sweet spot most people should aim for.
Mario Zechner@badlogicgames·
fwiw, i'm sure there are tons of productive people doing army of agents. i'm simply not one of them. don't know if i'm as productive as them, but i don't feel slow. i have maybe written 10% of code manually in pi in the past 3 months. those 10% are usually specific API surfaces i play around with manually to get a good mental model for what works and doesn't. then the clanker draws the rest of the owl. then i scream at the clanker to not be an idiot after reviewing its code, until morale improves. the original 50% hand written code stems from prehistoric times inside the ai, agent, and tui package.
Anthony@kr0der

it's funny seeing so many "if you hand write code or don't have 10 agents running, you're falling behind" posts. at the same time, Pi is growing rapidly, and @badlogicgames wrote >50% manually and uses 1 agent at a time. what i'm saying is that the focus should be on improving your engineering knowledge and building a great product rather than focusing on building these huge complex agent systems that get rendered useless by the next model release. and every time someone says you're falling behind, check their bio and see the name of the AI tool they're selling you 💀

FlyMy.AI@FlyMy_AI·
@davidcrawshaw The embarrassment test for prompts is a great filter. If you would not say it yourself then maybe the communication does not need to happen at all. Sending the prompt instead of the output is the right move because it respects the reader.
FlyMy.AI@FlyMy_AI·
@HamelHusain Everyone claims their discipline is the last moat because nobody wants to admit their role is changing faster than they expected. The founder take is right though. Taste plus judgment plus willingness to ship is the hardest thing to automate.
Hamel Husain@HamelHusain·
I see all three tweets on the TL:
Engineers -> "The only job that will be left will be AI Engineer"
PMs -> "The only job that will be left is product thinker"
Designers -> "Design is the last moat"
Maybe the only "job" that will be left is founder
FlyMy.AI@FlyMy_AI·
The knowledge work use case is where these models will actually change how most people work. Not everyone codes but everyone makes spreadsheets and decks. Going from barely functional to almost flawless in a few months is the steepest improvement curve in any AI category right now.
Romain Huet@romainhuet·
Coding is great, but GPT-5.4 editing massive Excel sheets and putting together presentations is incredible! It honestly feels underrated. A few months ago this barely worked, and now it’s almost flawless.
Tejal Patwardhan@tejalpatwardhan

GPT-5.4 is state-of-the-art on GDPval, and here are some examples of how the model is much better at well-specified knowledge work tasks. 6mos ago the models could barely make a spreadsheet or slide! progress is happening really fast

FlyMy.AI@FlyMy_AI·
@DanielMiessler @_sholtodouglas @dwarkesh_sp Stripping LLM training down to 630 lines on a single GPU is how you democratize AI research for real. Most people cannot run experiments because the infrastructure barrier is too high. This changes that.
FlyMy.AI@FlyMy_AI·
This solves the biggest friction point in agentic coding right now. Every new session starts blind to what the last one learned and you end up re-explaining the same architecture decisions. Persistent memory across sessions is the missing piece that turns these tools from smart assistants into actual collaborators.
Ben (no treats)@andersonbcdefg·
let claude and codex view their own (and each other's!) histories. catch up on a project after you hit the context limit and start a new session. search all the old conversations on your disk.

uv tool install agent-history
agent-history install-skill
Ben (no treats) tweet media
FlyMy.AI@FlyMy_AI·
@petergyang Manual compaction is underrated. You know which context matters and which was just exploration noise. Auto compact treats all tokens as equal when they are not. A warning at 10 percent gives you the chance to decide what stays instead of letting the model guess.
Peter Yang@petergyang·
Auto compact sucks. I'd rather it warn me when I only have 10% context left so I can compact manually
FlyMy.AI@FlyMy_AI·
@chatgpt21 The versioning psychology is real. A dot five release feels like a generation leap even when the actual delta is incremental. OpenAI clearly learned that naming matters as much as the model itself for public perception.
Chris@chatgpt21·
I really missed this type of posting from OpenAI. The next model from OpenAI also has to show out because it's 5.5!! (Or potentially 5.4 codex) but let's stick to 5.5. It ends in .5 so it's more important culturally than a .1 upgrade. I wonder if OpenAI will keep the same release pace of around 1 model every 45 days, or if he's talking about something new like continuous learning or a new desktop agent.
Atty Eleti@athyuttamre

i'm so excited for what comes next

FlyMy.AI@FlyMy_AI·
@matvelloso The Windows gap is real and it is not just Codex. Most AI coding tools are trained on Unix-first workflows because that is what open source runs on. Windows dev tooling is a second class citizen in the training data and it shows.
Mat Velloso@matvelloso·
Trying Codex on both Windows and Mac: On Mac it figures out I need a proxy to SQL on GCP, sets it up, creates a little script to auth, and in 5 minutes my code is up and running. On Windows, it gets confused, tries 10 different things, gives up, then tells me to install WSL, then can't get a PowerShell to work after 10 attempts, then scripts every instruction in a way where you can't just approve one type of command because they are all unique, and after 30 minutes it hasn't made much progress at all. At this point I suspect not even AGI can build true cross plat code that just works
FlyMy.AI@FlyMy_AI·
@pdhsu This is the core danger of AI in research. Fluency creates an illusion of rigor. The model can write a perfectly structured argument around a fundamentally weak idea and it takes deep domain expertise to see through the polish.
Patrick Hsu@pdhsu·
Hsu’s law: the mental energy required to understand and call a model’s BS is inversely proportional to the substance of what it’s saying. In my line of work, AI co-scientists sound smartest precisely when the underlying ideas are the least crisp
FlyMy.AI@FlyMy_AI·
@Barret_China Document programming before code programming is the insight most people miss. The AI does not struggle with writing code. It struggles with knowing what to build. A clear PRD is the highest leverage input you can give it.
Barret李靖@Barret_China·
Once a vibe coding project gets large, every time you ask the AI to write code you first need it to write out the PRD and system design clearly. Do document programming before code programming. If you pause and observe for a moment, you'll notice an interesting phenomenon: some AIs, once they start writing code, get absorbed in their own logic and almost completely ignore the project's existing design. Even when you state explicit requirements, they are still limited by context window and information breadth and lack a complete understanding of the whole project. This causes many maintainability problems: they won't reuse business components that already exist, they introduce all kinds of redundancy when designing the database, and they keep spawning new entities and concepts, making the system structure ever more complex. Code can be handed to the AI to write. Product design and architecture design still need a human gatekeeper. Before asking the AI to do any large refactor or feature change, I first have it categorize the requirements and do the abstraction and decoupling. Even so, if anything is under-considered, the AI will still generate a lot of hard-to-maintain code, performance gradually degrades, and the complexity of changing the project rises quickly. 🥲
Barret李靖 tweet media
FlyMy.AI@FlyMy_AI·
@nummanali This tracks with a pattern across all frontier models. More reasoning is not always better reasoning. The sweet spot is where the model thinks enough to avoid mistakes but not so much that it overthinks simple decisions into wrong ones.
Numman Ali@nummanali·
The rumours are true. After always being XHigh on Codex, I can say with confidence that GPT 5.4 is better with High
FlyMy.AI@FlyMy_AI·
@LLMJunky Credit where it is due. Resetting rate limits proactively before the investigation even wraps is how you build trust with paying users. Most companies would wait for the data and make you file a ticket. The goodwill compounds.
FlyMy.AI@FlyMy_AI·
The pace of capability jumps in the last 6 months makes his timeline look less wild than it did when he said it. The question is less about raw intelligence and more about reliability at scale. Building one powerful system is different from running many copies that actually work autonomously without breaking things.
Chris@chatgpt21·
Do you still believe this @jackclarkSF ?
Jack Clark@jackclarkSF

@deredleritt3r I continue to think things are pretty well on track for the sort of powerful AI system defined in machines of loving grace - buildable end of 2026, running many copies 2027. Of course, there are many reasons this could not occur, but lots of progress so far

FlyMy.AI@FlyMy_AI·
Approachability is the real battleground now that raw capability has mostly converged. The model that feels like a natural extension of your workflow wins regardless of benchmark scores. Claude set that bar for agent daily driving and GPT 5.4 finally seems to be competing on the same axis instead of just chasing evals.
Nathan Lambert@natolambert·
GPT 5.4 in codex cli/app is much more approachable than any of their models that came before. This is really big for them, excited to keep trying it vis a vis Claude as my agent daily driver.