hung

680 posts


@hungtran

eval @valsai

Joined November 2013
1K Following · 246 Followers
Pinned Tweet
hung
hung@hungtran·
becoming uncommon amongst uncommons
1
0
3
5.4K
hung
hung@hungtran·
the ui for agents is becoming a bottleneck for innovation. every current tool is missing certain components it needs to reach its full potential. vs code only offers a seamless ssh mode and git integration. the only reason i open an ide nowadays is to see diff views, git changes, and the built-in terminal in one place. but tmux/terminal is impractical at scale. if i have 12 agents running in a grid of 12 cells, every time i want to dive deep into a session i need to zoom in and scroll, click references to jump to source code, watch streaming logs of background processes, and cross-check different sub-agents. what if i find an interesting bug that's worth investigating further and need to spawn another panel right next to the current one? it's constant friction.
claude code and codex cli suffer from the same problems, though claude code is doing exceptionally well at improving the experience rapidly. codex app with the hacky ssh mode patch is the closest to the ui i can imagine, but it's far from perfect. i'm skeptical of uis that overload your cognition with noisy information. when you give too much customization without constraints, it becomes more of a hurdle than a feature. on the other end, something like openclaw is a rare candidate that has the most comprehensive context ingestion behind the scenes. yet its full potential is bottlenecked by basic messaging interfaces like imessage and telegram, which were designed for a completely different purpose.
what's missing across all of them: there's no concept of high-level agent management -- managing a team of agents the way a ceo manages an org. and the "just get a bigger monitor" answer only solves the problem indirectly. most of our daily workflow isn't spent sitting at a desk. we walk, bike, run, and switch between environments. we mostly rely on phones and small laptops, with our voice as the lowest-latency way of transmitting information. maybe a brain-computer interface is the great unhobbling?
six months to a year ago, the ui bottleneck was tolerable. but it's getting to the point where every day, every feature added is another push against the ceiling of the current interface paradigm. i'm optimistic that this pressure will drive us to experiment with something fundamentally different, and we're starting to see that emergence, with the right components scattered across different apps, waiting to be pulled together.
0
0
0
73
hung retweeted
NASA Administrator Jared Isaacman
To return Americans to the Moon, NASA is shifting to an iterative, execution-focused approach – just as we did during Apollo.  We are standardizing rocket architecture, embedding NASA expertise across industry, and increasing launch cadence to support sustained lunar operations.   We are sending a demand signal for crewed missions beyond Artemis V, with at least two providers capable of bringing astronauts to the surface every 6 months.  The goal is not just to reach the Moon, but to stay.  America will never give up the Moon again.
645
1.6K
13K
3.8M
hung retweeted
Kpaxs
Kpaxs@Kpaxs·
This is a man who has been haunted since childhood and built a billion dollar company as a side effect of trying to make the haunting stop.
Kpaxs tweet media
110
452
4.3K
980K
hung
hung@hungtran·
this encapsulates my review well. The book truly ignites my interest in physics and in understanding the universe!
Andrej Karpathy@karpathy

Had to go see Project Hail Mary right away (it's based on the book by Andy Weir, also of The Martian fame). Both very pleased and relieved to say that 1) the movie sticks very close to the book in both content and tone and 2) is really well executed. The book is one of my favorites when it comes to alien portrayals because a lot of thought was clearly given to the scientific details of an alternate biochemistry, evolutionary history, sensorium, psychology, language, tech tree, etc. It's different enough that it is highly creative and plausible, but also similar enough that you get a compelling story and one of the best bromances in fiction. Not to mention the other (single-cellular) aliens. I can count fictional portrayals of aliens of this depth on one hand. A lot of these aspects are briefly featured - if you read the book you'll spot them but if you haven't, the movie can't spend the time to do them justice. I'll say that the movie inches a little too much into the superhero movie tropes with the pacing, the quips, the Bathos and such for my taste, and we get a little bit less of the grandeur of Interstellar and a little bit less of the science of The Martian, but I think it's ok considering the tone of the original content. And it does really well where it counts - on Rocky and the bromance. Thank you to the film crew for the gem!

0
0
0
51
hung
hung@hungtran·
@braelyn_ai fitness sf fillmore and presidio ymca are great
0
0
0
141
Braelyn ⛓️
Braelyn ⛓️@braelyn_ai·
this is the first time I’ve ever wondered where I can find a pool in San Francisco
30
4
103
11.8K
Vals AI
Vals AI@ValsAI·
Initial results are in for Minimax 2.7, and it comes in at #12 overall on the Vals Index. If the weights are released, it will be #2 on the open-weight index (only 0.5% behind #1).
Vals AI tweet media
12
25
337
32.2K
hung retweeted
kache
kache@yacineMTB·
something i learned from my wife, who recently learned how to sew: do not do beginner projects. if what you want to make is difficult to make, you should just try to make it. don't do a slow learning process. don't start with the basics. start with the advanced
182
566
10.3K
210K
Vals AI
Vals AI@ValsAI·
GPT 5.4 is #1 on Vibe Code Bench at 67.4%, +5.7% higher than the previous SOTA. This is our benchmark that measures a model’s ability to produce an entire working application from a short text specification.
Vals AI tweet media
30
42
554
66.8K
hung
hung@hungtran·
@SQMah great work SQ!
1
0
1
48
SQ Mah
SQ Mah@SQMah·
Just demoed some of 5.4’s computer use and frontend capabilities - check it out here! What I really like is that computer use was on an Electron app, so Codex can make and test desktop apps as well. Also yes I need a haircut :)
OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

6
1
32
2.6K
hung
hung@hungtran·
@chribjel GPT-5.4 best but too slow
0
0
0
84
Christoffer Bjelke
Christoffer Bjelke@chribjel·
So what's best for coding now? gpt-5.3-codex or gpt-5.4?
234
14
1.1K
318.9K
hung
hung@hungtran·
Some details I think people might miss with this release:
- What stands out about GPT-5.4 is that the model spends much more time checking its own work than other models do: navigating the browser, clicking around, poking at edge cases, running backend scripts to test the DB. The ratio of editing to verifying is flipped, and I think it’s a paradigm shift.
- Coding benchmarks are not saturating! There's still a massive difference between "model can build a static single-page website" and "model can build a fullstack app with deployment, connect to payments and email services, and actually test it before handing it back to you." GPT-5.4 is much closer to the latter camp but still not quite there yet. There are still many more aspects of coding to evaluate, and we’re cooking!
Vals AI@ValsAI

GPT 5.4 is #1 on Vibe Code Bench at 67.4%, +5.7% higher than the previous SOTA. This is our benchmark that measures a model’s ability to produce an entire working application from a short text specification.

0
0
1
186
hung
hung@hungtran·
time and time again i keep underestimating the power of going out of your way to surround yourself with the right people.
0
0
0
55
Lisan al Gaib
Lisan al Gaib@scaling01·
Qwen3.5 27B looks really fucking good on the ArtificialAnalysis Leaderboard. It's on par with DeepSeek-V3.2 and Minimax-M2.5
Lisan al Gaib tweet media
18
26
462
42.6K
Claude
Claude@claudeai·
This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also features a 1M token context window in beta.
1.1K
2.5K
22.3K
7.6M
hung
hung@hungtran·
@askvals what's the best model to vibe code with today?
1
0
0
37
hung
hung@hungtran·
We have built and used AskVals internally and found it really helpful. Go try it and let us know your thoughts. More features to come!
Vals AI@ValsAI

Introducing Ask Vals — @AskVals Keeping up with the flood of model releases, benchmarks, and rankings is overwhelming. We built a bot internally to cut through the noise, and now it's live on X. Tag it to ask questions about models, benchmarks, performance, comparisons on specific dimensions, and more (all based on Vals data)!

0
0
1
134
hung retweeted
tae kim
tae kim@firstadopter·
“I love struggling, actually. It makes me feel alive” – Alysa Liu
199
4.1K
33.5K
18.4M
hung
hung@hungtran·
"claude, simplify the claude make no mistake"
hung tweet media
1
1
5
333