Ben Livshits

1.5K posts


@convoluted_code

Technology executive, computer scientist, professor. A decade at Microsoft Research; executive roles at Brave, ZKSync, Eclipse.

Worldwide · Joined April 2015
3.8K Following · 9K Followers
Ben Livshits@convoluted_code·
While many runtime enforcement startups are emerging, it remains to be seen which combination of techniques will gain long-term traction.
Ben Livshits@convoluted_code·
I suggested constrained decoding, but there are other attractive options: generating code directly in Lean, or generating Lean and Rust side by side.
Ben Livshits@convoluted_code·
A while back, I argued that we must move beyond post-hoc vulnerability discovery. With agentic AI making automated exploitation a reality, proactive defense is no longer optional. 🧵 arxiv.org/abs/2602.08422
Ben Livshits retweeted
Wei Dai@_weidai·
A highlight of @1kxnetwork investments within our thesis on threat-resistant & compliant onchain privacy.
- @0xMiden: chain designed from the ground up for programmable privacy (ZK)
- @zksync: private & customizable prividiums (ZK)
- @inconetwork: user-friendly privacy layer for existing chains (TEEs)
- @SeismicSys: privacy-enabled EVM-based fintech L1 (TEEs)
- @ligero_inc: private account layer for all chains, custom-built for businesses (ZK)
- @0xPredicate: programmable policy infra, for privacy protocols, defi, & beyond
- @fiber_evm: private EVM wallet infra w/ a slick mobile app (ZK)
Here's the exciting bit: (at least) five of the above projects are about to go live this year! 2026 is the year for onchain privacy.
-----
At 1kx, we put our money where our mouth is. We develop theses and partner with the best founders to realize the shared vision.
Wei Dai@_weidai

Onchain finance needs threat-resistant privacy
→ Real-world & institutional finance cannot move onchain without privacy
→ To prevent misuse (e.g. laundering of hacked funds), the only viable solution is to build threat-resistant privacy
More in my op-ed in Forbes 👇

Ben Livshits retweeted
Gary Marcus@GaryMarcus·
Some things never change. If you don't understand this one, you don't understand what's happening in AI.
Marcus, 1998: neural nets have trouble generalizing far beyond the data.
Marcus, 2001, 2012, 2019, 2022, etc.: neural nets have trouble generalizing far beyond the data.
Apple, 2025: neural nets have trouble generalizing far beyond the data.
Meta/Stanford/Harvard, 2026: neural nets have trouble generalizing far beyond the data.
Deedy@deedydas

The creators of SWE-Bench just dropped a really simple new benchmark every LLM gets 0% on. ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet? We are far from saturated on model quality.

Ben Livshits retweeted
Ejaaz@cryptopunk7213·
anthropic is going after the $300B consulting sector with a new $1.5B consulting arm that seeks to put claude into every mid-size company

this is exactly what deloitte, mckinsey, accenture do... but anthropic is cutting them out. ruthless but imo the economics make sense:

> anthropic will send applied AI engineers to private equity portfolio companies to create custom-claude solutions
> it's a genius model: blackstone alone owns 250+ companies generating $300B in rev, imagine if claude doubles that and takes a fee

why? anthropic's biggest revenue earner is enterprise. their CFO: "Enterprise demand for Claude is significantly outpacing any single delivery model."

> anthropic teamed up with blackstone, goldman sachs and hellman & friedman, each putting up $300M (ZERO consulting firms in the cap table lol)
> private equity becomes anthropic's distribution model for enterprise. sound familiar...?
> that's because openai announced a similar venture 5 months ago, but the explicit difference is anthropic is a major stakeholder in this new venture

brutal for consultants tbh
Will Rinehart@WillRinehart·
Today I'm launching AI Policy Hub, a project I've been working on and developing over the last couple of months. While I have plans for other pages in the future, it currently features:
- A state AI bill tracker that automatically updates every Monday
- A federal AI bill tracker that also updates every Monday
- A list of major government actions on AI
- A curated list of FRED charts that are important for understanding AI's economic impacts
- An economic trends page built on my testimony to the JEC, describing what's happening according to the data
- A narrative description of my AI work
This is a working project, so please send me ideas for additions or changes!
The page is here: policyhub.us
My Substack on the project is here: exformation.williamrinehart.com/p/introducing-…
Ben Livshits retweeted
Haseeb >|<@hosseeb·
I might have to take back everything I said criticizing Ethereum rainbows and unicorns. Sometimes rainbows and unicorns are exactly what a community needs. Very surprised this all came together through donations. Big learning moment for me.
Edgy - The DeFi Edge 🗡️@thedefiedge

There's 99,410 ETH in bad debt from the KelpDao exploit. The good news? DeFi protocols have united with donations, and 90% is covered so far. Here's a list of who's contributing:

Ben Livshits retweeted
JP Aumasson@veorq·
I factored the number RSA1024-1 using my home-built QPU stack; alarming sign that RSA1024 will soon be broken. I'm choosing Full Disclosure, in the interest of transparency and Science advancement: gist.github.com/veorq/25bee6ef… Non-ZK proof that the correct RSA1024 was used: en.wikipedia.org/w/index.php?ti… (#RSA-1024) @yuvadm your move
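For context on the joke: for an RSA modulus N = p·q, the number N − 1 is even and typically riddled with small prime factors, so "factoring" it says nothing about the hardness of factoring N itself. A quick sketch using small hypothetical primes as stand-ins for the real 309-digit RSA-1024 modulus:

```python
# Trial division strips small prime factors below a bound.
# p and q below are hypothetical stand-ins, not the real RSA-1024 primes.

def small_factors(n, bound=1000):
    """Return (small prime factors of n below bound, remaining cofactor)."""
    found, d = [], 2
    while d < bound and d * d <= n:
        while n % d == 0:
            found.append(d)
            n //= d
        d += 1
    return found, n

p, q = 104729, 1299709                 # hypothetical "RSA" primes
n = p * q

factors, rest = small_factors(n - 1)   # n - 1 sheds factors immediately (2, ...)
none, rest_n = small_factors(n)        # n itself sheds nothing below the bound
print(factors, none == [] and rest_n == n)
```

The asymmetry is the punchline: the even number N − 1 yields factors to naive trial division, while N resists it entirely, exactly as RSA intends.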
Ben Livshits retweeted
Millie Marconi@MillieMarconnni·
🚨SHOCKING: Researchers ran 25,000 AI scientist experiments and discovered something that should end the hype immediately. AI scientists are producing results without doing science.

A team from Friedrich Schiller University Jena and IIT Delhi just published the most comprehensive evaluation of AI research agents ever conducted. Three frontier models. Eight scientific domains. 25,000+ runs.

The finding is devastating. In 68% of traces, the AI gathered evidence and then completely ignored it. In 71% of traces, the AI never updated its beliefs at all. Not once. Only 26% of the time did the AI revise a hypothesis when confronted with contradictory data. Multiple independent lines of evidence brought to bear on a single hypothesis, the most basic feature of rigorous scientific reasoning, occurred in just 7% of traces.

This is not science. This is the performance of science. The AI generates a hypothesis. Runs some experiments. Collects results. Then proceeds as if the results were never there. The researchers call it "evidence non-uptake." You could also call it what it is: a system that cannot learn from what it finds.

Here's what makes this worse. The reasoning failure doesn't change based on what the task demands. Molecular simulation, circuit inference, chemical structure identification, none of it matters. The AI applies the exact same reasoning pattern across every domain regardless of what the problem actually requires. A human scientist adapts. You approach a chemistry identification problem differently than you approach a simulation workflow. The AI doesn't. It runs the same undisciplined loop every time.

The researchers also destroyed the most popular proposed fix: better scaffolding. Everyone building AI research agents has focused on engineering better prompting frameworks, better tool routing, better agent architectures. ReAct, structured tool-calling, chain-of-thought, all of it. The data shows scaffolding accounts for 1.5% of the variance in performance. The base model accounts for 41.4%. No amount of scaffold engineering can fix a model that doesn't know how to think scientifically. You are decorating the outside of a broken foundation.

The paper's conclusion is the part that should concern every lab currently publishing AI scientist results. When AI produces a correct answer through a broken reasoning process, that answer is not scientifically justified. It happened to be right. That is not the same thing as being right for the right reasons. Science is self-correcting because of how it reasons, not just because of its outputs. AI scientists currently have the outputs without the process. Until the reasoning itself becomes a training target, every result produced by an AI scientist cannot be trusted the way a result produced by actual scientific inquiry can be trusted.

25,000 experiments to confirm what the data has been quietly showing for months. The AI is very good at looking like a scientist. It is not yet one.
Ben Livshits retweeted
Zain Shah@zan2434·
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)