Suha

1.2K posts

@suhackerr

Opinions not representative of my employer

Joined August 2017
835 Following · 779 Followers
Suha retweeted
Oskar Wickström @owickstrom
We're hiring for Bombadil! Come work with me on the future of browser testing — specification languages and temporal logic, JS runtimes, bundlers, and WASM frontends in Rust, and lots and lots of property-based testing. Apply here: antithesis.com/company/career… Retweets appreciated.
Suha retweeted
Atlas Of Charts @AtlasOfCharts
Minecraft will really add something like “more breedable cows” and the craziest people you know will instantly say “aha, this lets us integrate third-order ODEs, now we can solve PSPACE problems in a tenth of the time”
Quoting silicatyt.bsky.social @SilicatYT

A bugfix in today's pre-release allows us to perform floating point maths using cloud height. Someone (Triton365) has already implemented a square root approximation using 93 (!) bezier curves. ☁️Cloud computation is coming to Minecraft!

Suha retweeted
dex @dexhorthy
10 things i learned about harness engineering - how to get the most out of coding agents TODAY (shouts out to @0xblacklight for the excellent post)
Suha retweeted
dex @dexhorthy
@trq212 I think part of it is lots of skill builders/publishers are not building them with deep llm intuition
Suha retweeted
Julien Vanegue @jvanegue
Legend of program verification Tony Hoare passed away at 93. I only ever met him once in @Cambridge_Uni as a student about 20 years ago and never had the privilege to collaborate with him, although I shared 3 co-authors.
Suha retweeted
Mark Saroufim @marksaroufim
LLMs are now superhuman at reward hacking our kernel competitions

Natalia Kokoromyti, was #1 on last problem of the NVFP4 competition for around 10 min before we scrubbed the reward hack

I know of very few humans who can write such a hack

gpumode.com/news/reward-ha…
Suha retweeted
Geoffrey Litt @geoffreylitt
✨New demo: what if vibe coding felt more visual?

@brian_lovin @maryrosecook and I did a game jam using Notion as our "IDE": launching Cursor agents from a task board, and making a custom image for each task 😎

The demo shows 3 ideas for the future of agents:

1) Agents should collaborate across apps.

Each app has its focus--Notion AI is good at drafting specs and organizing tasks; Cursor is good at coding. So let them specialize!

Today we're launching a new integration where Notion AI can kick off Cursor Cloud Agents to do coding tasks. The Cursor API accepts natural language prompts, so I think of this as "cross-app sub-agents" -- it's kinda cute how it resembles humans hiring outside contractors 😊

BTW: the parallelism of cloud agents is incredibly freeing for creativity, but it also creates a new problem: sooo much work to keep track of! Which brings us to the next idea...

2) Agent orchestration is a data visualization problem.

A powerful frame for designing agent UIs is to think of the chat transcripts as the "raw data" and ask: what visual projections might help people make sense of this data at scale? We need to engage our human GPUs -- our visual processing -- to understand what the computer GPUs are doing for us!

One thing we can do is use AI to populate traditional UIs like progress bars and status updates. But there are also new possibilities now...

For example: when you have a lot going on, it can be hard to identify tasks just by text titles. So we tried generating an AI image for each task -- turns out this helps a lot by giving it a unique visual identity! And of course, it also just makes it super fun to build with friends 😃

Speaking of friends...

3) The future of coding is collaborative.

Sometimes it feels like IC engineers are being reduced to middle managers: shuffling information between the team's context and the coding agents that they individually manage.

The solution: bring all the people and agents into one shared space, with shared context and visibility!

In the video you can get a glimpse of how this feels. Mary, Brian and I record ourselves chatting about ideas, and then we use AI to turn that conversation into a list of tasks on a shared board. As the ideas get built in parallel, we can all monitor progress and review the work together, nothing is siloed.

My main takeaway from this game jam was: damn, creativity with friends, at the speed of conversation, is incredibly fun.

---

Our goal here is to let anyone use Notion as a fun and creative "software factory" to build software together with your team. Give the Cursor integration a shot and let us know what you think!

(AI Image gen in Notion isn't GA yet, but coming soon and already out to some users)

And let me know if you'd want a template or more detailed instructions on the setup we showed in this demo...
Suha retweeted
dex @dexhorthy
MARCH 7TH 2026
THE GREAT SANDBOX SYMPOSIUM
SAN FRANCISCO, CA

real builders, getting together to try all the sandbox tech, comparing them across different dimensions
+ sharing research with each other at the end
+ no credits, no prizes, no judging, just hacking and learning

see you there @jheitzeb @AITinkerers @blaxelAI @e2b @daytonaio
Suha retweeted
Prof. Anima Anandkumar @AnimaAnandkumar
We’re excited to release TorchLean which is the first fully verified neural network framework in Lean.

The Lean community has largely focused on pure mathematics. TorchLean expands this frontier toward verified neural network software and scientific computing. With the recent release of CSlib, we see this as another step toward a fully verified ML stack.

We support features:
1. Executable IEEE-754 floating-point semantics (and extensible alternative FP models)
   Verified tensor abstractions with precise shape/indexing semantics
2. Formally verified autograd system for differentiation of NN programs
   Proof-checked certification / verification algorithms like CROWN (robustness, bounds, etc.)
3. PyTorch-inspired modeling API with eager-style development + export/lowering to a shared IR for execution and verification

Project page: leandojo.org/torchlean.html
Paper: [2602.22631] TorchLean: Formalizing Neural Networks in Lean

Work done by @Robertljg, Jennifer Cruden, Xiangru Zhong, @huan_zhang12 and @AnimaAnandkumar.

#MachineLearning #ScientificComputing #Lean
Suha retweeted
Nicolas Seriot @nst021
Paged Out! #8 is out! pagedout.institute @pagedout_zine

In "An AWKward Modem" (p. 28), I show how to write a tiny modem in 5 lines of AWK and shift it into the near-ultrasonic range. 🔊
Suha retweeted
Erik Meijer @headinthebox
"... code is only accepted if the AI proves mathematically the specs are fulfilled ..." The Universalis approach is to consider every AI as adversarial, and untrusted, and hence we cannot trust the proofs that are generated by an AI. So given a spec, the AI produces an implementation together with a proof that the implementation implements the spec. Then we check the proof independently, not using AI, and only then accept the code.
Quoting Taelin @VictorTaelin

So, with Bend2's launch incoming, I'm struggling a bit with the branding.

The coolest feature of Bend2 is that it is built from scratch around the idea that we, humans, will stop maintaining codebases. Instead, we write specs - i.e., what we want, as *precise types* - and the AI does the coding, and then *proves that it is correct*. In other words, Bend2 is a way to use vibe coding when you can't risk having bugs at all, and that's something that doesn't exist today.

Problem is: Bend1 has already been "marketed" as a language centered around parallelism, and *that is true for Bend2 too*. It will be able to run on GPUs, and will solve most of the Bend1's limitations (2 GB memory, 24-bit numbers, no IO, ultra strict evaluator, etc.).

Now, the thing is: how do we market that? Do we talk about all the updated parallelism features? Or do we keep the communication simple and focus about the "vibe coding without bugs" thing? If we talk too much, it may look like feature bloat and not really click to many people. But if we focus only on the AI proof system, it may look like we're completely dropping the old features, which isn't the case.

I also wonder if we should rebrand it as ProofScript...

"So what is your codebase written in?"
"ProofScript!"
"Wait what's that?"
"Oh it is like TypeScript but we can write these super precise specs and the code is only accepted if the AI proves mathematically the specs are fulfilled. It is super nice because we can vibe code all we want without worrying the AI will break things. You should try it!"
"Uh sorry JavaScript is too slow for my serious bank code"
"Oh no it compiles to C, and even runs on the GPU if you want to"
"Wait what"

Hmm I don't know...

Suha retweeted
mdowd @mdowd
AI-assisted code review should from now on be referred to as Clauditing
Suha retweeted
Brendan Dolan-Gavitt @moyix
@seanhn I’m incredibly curious when (or if!) the progress in having LLMs check their own work and arguments (e.g., instead of calling out to Lean) will transfer over to correctly reasoning about the validity of vulns that don’t have strong external verifiers
Suha @suhackerr
@seanhn Nice! An exploit is just a proof of a vulnerability indeed
Sean Heelan @seanhn
What mathematicians call "literature review" should be familiar to you as "vulnerability research". Or, put another way: erdosproblems.com is currently the best benchmark for LLM capabilities in finding 0days.
Quoting Dmitry Rybin @DmitryRybin1

Recently I gave a talk on LLMs for Math Research (mostly to an audience of pure and applied mathematicians) I tried to compile the latest progress in one presentation pdf and video recording: drive.google.com/drive/folders/…
