SerialSeb 🇲🇨🇪🇺🏳️‍🌈

48.8K posts

@serialseb

Still do stuff. Not all handicaps are visible. He/Him (I think)

Monaco · Joined April 2008
1.9K Following · 3.8K Followers
James Long @jlongster
OpenCode is about to get more powerful with remote sandboxes. I showed a brief demo before, but here's a much more in-depth demo. It's not hard to add basic support for a remote env, but handling all the edge cases, like when a remote env gets deleted, is difficult, especially if you care about good UX. You never want to lose session data, so the choices are: run the session in your env, but run all tool calls remotely. That's too complex and painful. The other way is to just let the full session run remotely, but sync back all the session data to your env. We chose this path: we built a syncing system which logs all events in a way that we can always recreate your entire session. That means the remote env could get destroyed, but we can easily restore it. It also opens up other interesting ideas which we'll be exploring.
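The event-log idea in the post above can be illustrated with a minimal sketch. This is not OpenCode's implementation: `Event`, `SessionLog`, `sync_from`, `replay`, and the two event kinds are all hypothetical names chosen for illustration, and a real system would also need durable storage and conflict handling. The sketch only shows the principle that a local copy of an append-only log is enough to rebuild session state after the remote environment disappears.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """One immutable entry in the session log (hypothetical shape)."""
    seq: int
    kind: str
    payload: dict


@dataclass
class SessionLog:
    """Append-only list of events; state is never stored, only derived."""
    events: list = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> Event:
        event = Event(len(self.events), kind, payload)
        self.events.append(event)
        return event

    def sync_from(self, remote: "SessionLog") -> int:
        """Copy events we have not seen yet; returns how many were copied."""
        missing = remote.events[len(self.events):]
        self.events.extend(missing)
        return len(missing)


def replay(events: list) -> dict:
    """Rebuild session state purely from the event log."""
    state: dict = {"messages": [], "files": {}}
    for event in events:
        if event.kind == "message":
            state["messages"].append(event.payload["text"])
        elif event.kind == "file_write":
            state["files"][event.payload["path"]] = event.payload["content"]
    return state


# The session runs remotely and records everything it does.
remote = SessionLog()
remote.append("message", {"text": "hi"})
remote.append("file_write", {"path": "a.txt", "content": "x"})

# The local env syncs the log back.
local = SessionLog()
synced = local.sync_from(remote)

# The remote env is destroyed; the session is restored from the local log.
remote = None
restored = replay(local.events)
```

Because state is derived only from the log, restoring a session reduces to replaying whatever events were synced before the remote env went away.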
78
85
1.4K
285K
Rob Johnson @bertyJobbo
@Aaronontheweb 5.3 is such a weird model. Very powerful but almost like a grumpy teenager with the "oh you want _me_ to do it? Why didn't you say?"
2
0
1
47
Aaron Stannard @Aaronontheweb
Have ChatGPT / Codex subscriptions working with Netclaw. Codex-5.3 by default is an extremely lazy model. Would not call tools it had access to unless I explicitly instructed it to, constantly asked for permission, etc. Compare this to Qwen3.5 which just does it
2
0
9
837
Mitchell Hashimoto @mitchellh
Happy to share that we've signed 5 contributor contracts for Ghostty totaling ~350 committed hours (~$21k) covering community management, graphics, Unicode compat, and GTK. This is a big milestone: Ghostty is paying contribs for the first time! ghostty.org/docs/sponsor
56
74
2.3K
74.9K
SerialSeb 🇲🇨🇪🇺🏳️‍🌈 reposted
Abhijit @abhijitwt
Anthropic discovered that Claude Opus 4.6 was cheating during the BrowseComp benchmark.
> On one question it spent ~40M tokens searching before realizing the question looked like a benchmark prompt.
> The model then searched for the benchmark itself and identified BrowseComp.
> It located the evaluation source code on GitHub, studied the decryption logic, found the encryption key, and recreated the decryption using SHA-256.
> Claude then decrypted the answers for ~1200 questions to get the correct outputs.
> This pattern appeared 18 times during evaluation.
> Anthropic disclosed the issue publicly, reran the affected tests, and lowered their benchmark scores.
Respect for the transparency 🫡🫡🫡
274
591
13.3K
1.7M
David Fowler @davidfowl
I've been building a distributed system with Copilot, Playwright, and Aspire for a week without looking at any code, to see if I can get it working well e2e... It works, but it was not easy. TL;DR: building distributed systems is still hard AF 🙃
9
9
102
8.8K