Joe Hsu

9.6K posts

Joe Hsu banner
Joe Hsu

Joe Hsu

@jhsu

UI 🤝 AI 🦋@getwhys_ (current) let's chat: https://t.co/UTol0SLz5V prev: Waybridge, AppNexus, EngineYard

new york, ny Katılım Aralık 2006
2.6K Takip Edilen801 Takipçiler
Joe Hsu
Joe Hsu@jhsu·
Created a zo-proxy for using Zo agents inside of opencode. Built using a simple proxy to the Zo api and using opencode's openai-compatible provider settintgs. Zo inside of @opencode! (or any other ai tools) not super fast, but it works! @zocomputer #BuildWithZo
Joe Hsu tweet media
Zo Computer@zocomputer

We wanna see how you've been using Zo 👀 We're giving away MERCH and $500 in credits for 3 winners! All you need to do to submit: 1. Quote this post with a pic/vid of something you've been building with Zo. Can be anything, from a fun site to a cool automation. 2. Tag us in the post @zocomputer and #BuildWithZo 3. Winners will be tagged so make sure to follow us! Happy Zo-ing :D

English
2
1
4
704
Yoav
Yoav@YoavCodes·
Electrobun lets you ship 16MB app bundles. But what if I told you you will soon be able to ship a native system tray app, written entirely in Typescript that is less than 5KB unpacked, 2KB zipped. - 3,943 bytes of bun Typescript - 625 bytes metadata electrobunny.ai
Yoav tweet mediaYoav tweet media
English
11
11
298
23.6K
Joe Hsu
Joe Hsu@jhsu·
depending on how long the agent task is (implementation), the review phase can grow pretty large. this is probably be because of lack of trust or unclear what was used to validate. I agree that supercharging the "plan" phase would help shorten both review and implement, though it might just be shifting the time/effort from implement and review (maybe not a bad thing). some of the visuals also makes me think maybe there's a more collaborative/async way to work
English
1
0
1
30
Amelia Wattenberger 🪷
Amelia Wattenberger 🪷@Wattenberger·
here's my rough logic around why devs need a new tool focused on planning what do you think? going to write it up as a blog post soon, would love any generative reactions 👀
Amelia Wattenberger 🪷 tweet media
English
13
3
102
5.6K
spencer chang
spencer chang@spencerc99·
made a device to write messages & poems using Wi-Fi networks
English
3
3
50
2.5K
Joe Hsu
Joe Hsu@jhsu·
still kind of WIP, but 2.0 of `ai-rlm`, RLM for @aisdk, uses quickjs for repl, or plugin your own sandbox provider. pretty customizable with plenty of hooks. i've been trying to build some sort inspect/monitor app like what @neural_avb built x.com/neural_avb/sta…
Joe Hsu tweet media
AVB@neural_avb

Just open sourced my RLM repo on github! 💙 A minimalist sandbox with a python REPL, executes LLM generated code, maintains context, supports early stopping. Also an OpenTUI app to view logs in the terminal. Star it, fork it, go crazy with it. github.com/avbiswas/fast-…

English
2
0
2
80
Joe Hsu
Joe Hsu@jhsu·
Gossiper - slack bot that sends other users an only visible to them commentary on other users
Joe Hsu tweet media
English
0
0
0
49
Joe Hsu
Joe Hsu@jhsu·
mini OS that lets you chat to build apps within the OS that other uses can use. here's a calendar to keep track of tasks. each app has per-user state and also a shared state you can toggle between.
Joe Hsu tweet media
English
0
0
0
51
Joe Hsu
Joe Hsu@jhsu·
been building some random apps, here's one where I crawl some inspo sites and generate a sort of newsletter collection of some designs along with a summary and short descriptions.
Joe Hsu tweet media
English
0
0
1
28
Joe Hsu
Joe Hsu@jhsu·
@Replit it's also really fast. branching on tasks, merging back to main, working on the canvas
Joe Hsu tweet media
English
0
0
1
15
Joe Hsu
Joe Hsu@jhsu·
@neural_avb Really cool, I need to setup an eval like this for a rag application.
English
0
0
0
126
AVB
AVB@neural_avb·
I am SOOOO glad I ran this experiment! I have so many actionable insight it is crazy. Highly recommend yall to set up similar evals for your projects/SaaS. Context: I have been evaluating different models on the current Paper Breakdown retrieval subagents. Goal is to find cheaper models that get the job done quicker. Dataset: huggingface.co/datasets/paper… I have been comparing smaller model outputs against Sonnet-4.6 (results shown below) and gpt-5-mini (current subagent model running in prod). Some insights: - gemini-3-flash thinks a lot, it returns too many chunks, and explores the paper way too much. - gemini-3-flash-lite is actually better than 3-flash at this, it even caches additional queries for fast "future retrieval". Very cool! - grok-fast-non-reasoning outperforms grok-fast-reasoning. And is the CLOSEST to sonnet-4.6 <- this was my biggest surprise. - gpt-5-mini is very fast, it thinks less, fetches quickly. I have empirically felt it's pretty good and reliable - gpt-5-nano pretty bad at this - minimax-m2.5 has high precision (it returns more info than needed) but the problem is the vercel ai gateway provider has been slow :( - for some reason glm-5 and glm-4.7 has a high failure rate on my task, I am yet to understand why. Next steps: - My goal now is to pick some of the best models here, and run either a larger expt with more test cases, or use a LLM-as-a-judge. - In the near future, I may go into harness optimizations (i.e. better prompts, better tool descriptions) I am seeing a ton of free users using the website lately, if I am able to switch to grok-fast-non-reasoning and minimax-m2.5 it will save me actual money.
AVB tweet mediaAVB tweet mediaAVB tweet mediaAVB tweet media
AVB@neural_avb

I really like the Prime RL school of thinking - "environments & evals are two sides of the same coin" So today I'll convert Paper Breakdown into an RL env. I'll run evals with smaller models to check if I can cut my inference bill without sacrificing rewards.

English
3
4
91
7.7K
Joe Hsu
Joe Hsu@jhsu·
little OS to build apps inside
Joe Hsu tweet media
English
0
0
0
23
Joe Hsu
Joe Hsu@jhsu·
oo t3code launching
Joe Hsu tweet media
English
1
0
0
35