Andrew

248 posts

@sooftmax

mom testing @phtlabs, mle @ aws sagemaker, research @ mnlp (VL research for egocentric understanding) opinions are my own

the trenches (hayes valley) · Joined August 2020

716 Following · 112 Followers

Andrew@sooftmax·2d

@NandoDF small nuance but seems interesting, looking forward to reproducing

Nando de Freitas@NandoDF·3d

One line of code is all it takes to prevent LLM agent delusions, instead of post-training patches like RL. love4all.ai/blog/why-it-is… ❤️ 4 ∀ github.com/nandodef/love4…

Andrew@sooftmax·3d

@zacharyvalles How good was your chinese?

Zac Valles@zacharyvalles·4d

72 hours after YC demo day, I moved to Shenzhen for 8 weeks 🤠 I'm headed back to SF with new hardware in hand (sharing more soon), but some takeaways documented below:
> If you have even the slightest ambition to found a hardware company, visit SZ. Pre-raise, pre-team, pre-idea, pre-job departure, it doesn't matter. Just go.
> Plan your visit according to a major conference that interests you. Use that conference as a supplier meeting springboard - that's your ticket to any factory under the sun.
> At the factories, ask about lead times, don't ask about cost (wait on this). Your iteration rate is driven by the lead time on the longest lead time item in your assembly. It pays to identify these parts early to build project timelines.
> Visit Huaqiangbei (read: this is a mini-city, not a building). Robotic subassemblies, batteries, chassis, electronic parts. They all have buildings where vendors are tightly clustered. Plan to spend 4-6 hours walking around before you find exactly what you're interested in.
> Business relationships are valuable commodities. Treat them as such. Pay attention to people, learn about them. Bring thoughtful gifts. Wait for them to sit first. With baijiu, fill the glass, but with tea, leave some room. Cultural customs are fun to learn, but also convey a seriousness towards the working relationship.
> Suppliers fit cleanly into discrete buckets. Level of complexity and execution on past projects indicates what is in scope for them. Trivial, but important to level your build expectations. It is easy to design a part with 12 subsequent manufacturing processes, exceptionally hard to find a supplier to fill this order.
If you need coffeeshop recs, food recs, or hotel recs I have a few. Move to Shenzhen! Get to building!

Andrew@sooftmax·4d

@agapekeleta @MartinShkreli no way the colah blogs ive been reading are by the same guy???

Agape Keleta@agapekeleta·6d

@MartinShkreli cofounder of anthropic 😅

Martin Shkreli@MartinShkreli·6d

when i was younger, i was very insecure. i did this kind of thing too. over time you learn that it is critical to be truthful and not misleading about every single thing you do. i had to learn this lesson the hard way, but boy, have i internalized it. UNDERSELL yourself.

Andrew@sooftmax·9 May

for the people hitting dead links looking for @0xandrewj i renamed to @sooftmax cause i thought it would be funnier and easier to say

Andrew@sooftmax·8 May

@gabriberton no way I was just reproducing LLaVA and Tuna-2

Gabriele Berton@gabriberton·8 May

MLLMs are made of a vision encoder that pre-processes images into features, which are then fed to an LLM (after a projection).
This is the LLaVA [1] (2023) approach and is still used today (see Qwen3VL [2], InternVL3.5 [3], GLM-5V [4])

Gabriele Berton@gabriberton·8 May

Cool paper from Meta suggesting that future MLLMs will be Native Multimodal Models (NMM), hence no vision encoders anymore.
But I disagree. I actually think we'll go in the other direction (what? more encoders? yes! read on...)
All you need to know about the future of MLLMs 🧵

Weiming Ren@wmren993

1/ 🚀 We’re excited to share Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation! Tuna-2 is a native unified multimodal model that supports visual understanding, text-to-image generation, and image editing directly from pixel embeddings. 🐟✨
📄 Paper: arxiv.org/abs/2604.24763
🌐 Project: tuna-ai.org/tuna-2
💻 Code: github.com/facebookresear…
Most unified multimodal models still rely on pretrained vision encoders, which add architectural complexity and can create representation mismatches between understanding and generation. Tuna-2 asks a simple question: Do we still need vision encoders? 👀
Our answer is No! Tuna-2 has a completely encoder-free architecture, where images are processed directly by a unified transformer together with text tokens.
Take a glimpse at what our model can generate ↓ 🎨🖼️

Andrew@sooftmax·7 May

@xeophon @interconnectsai @natolambert do you know who they follow on xiaohongshu?

Florian Brand@xeophon·6 May

Link: florianbrand.com/posts/china-tr… Expect more in-depth coverage on @interconnectsai by @natolambert soon :)

Florian Brand@xeophon·6 May

The vibes in China's AI labs
My blog about my recent trip to China is up, link in replies.

Andrew@sooftmax·6 May

@nooriefyi “taste is the moat” “solo leveling”

Noorie@nooriefyi·5 May

hiring a new grad engineer to help us ship growth and product experiments. you must have a 19 hour screen time, obsess over animes like solo leveling, have a $200+ cursor bill. you'll be #6 and work out of our beautiful office in SF :)

Andrew@sooftmax·5 May

@GoodfireAI @AISecurityInst is it worth it to insert an unrealistic ask into the system prompt to prime future models then

Goodfire@GoodfireAI·4 May

New research from @AISecurityInst and Goodfire: Models sometimes recognize they're being evaluated, occasionally even identifying the benchmark. We show this verbalized eval awareness inflates safety scores, meaning safety benchmarks may not reflect real-world behavior. (1/7)

Andrew@sooftmax·28 Apr

@EzgiKorkmazAI @iclr_conf how did you find it different from uni or lab sponsored research?

Andrew@sooftmax·26 Apr

@sirbayes How do you verify that the generated code doesn't secretly hide erroneous code that produces plausible outputs?

Kevin Patrick Murphy@sirbayes·22 Apr

Finally, a personal note on how this paper was made. I started this as a hobby project exactly two months ago, building the entire system with Claude Code. The process reminded me of when I was a professor working with my grad students — brainstorming ideas, staring at plots, proposing experiments — but never looking at their code. Claude Code is the ideal collaborator for this style of research. And this is what enabled my first single-author paper ever!

Kevin Patrick Murphy@sirbayes·22 Apr

New paper: "Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs". Our system (BLF) matches human superforecasters on ForecastBench, and beats all the top methods (GPT-5, Cassi, Grok 4.20, and Foresight-32B). 🧵

Andrew@sooftmax·26 Apr

@deedydas how can profit per employee be higher than revenue per employee is this chart vibe coded

Deedy@deedydas·25 Apr

Jane Street made ~$40B in 2025 with 3,500 employees, a ~2x from the year before. At ~65-70% profit margin, that's $8M profit / employee, the highest for a 1000+ ppl company. High-frequency trading continues to be the most efficient money making engine.
I want to share an old story about my Jane Street interview in 2014.
Jane Street was known for hiring a lot of math, physics and CS olympiad winners from top universities and putting them through many rounds - including, for trading roles, a gauntlet of mental math.
It was my 6th interview and my final round and I recall being asked "What is the next day after today in DD/MM/YYYY where all the digits are unique?" They'd toy with you and say "You can use a pencil and paper, if you want" but you knew that was an instant no.
Painstakingly and as quickly as I could, I came to an answer. "How confident are you that this is correct on a 0-1 probability scale?" the interviewer said. "0.95", I blurted out, not fully knowing how to answer that. "Are you sure?"
After thinking harder for a few more seconds, I realized I could've flipped the digits around to get a closer date. I gave the interviewer my answer. It was correct. "0.95 huh?" he chuckled. That's when I knew I failed.
Note: fwiw, other companies that come close in efficiency are
- Tether ($90M+ profit/emp)
- Hyperliquid ($80M+ profit/emp)
and on revenue:
- Valve ($50M/emp)
- OnlyFans ($37M/emp)
- Craigslist ($14M/emp)
- Anthropic ($12M/emp, run rate)
- OpenAI ($8M/emp, run rate)
For comparison, Nvidia is very efficient at scale and is $4.4M/emp.
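The interview question quoted above ("the next day after today in DD/MM/YYYY where all the digits are unique") can be checked by brute force. The sketch below is only an illustration: the function names are hypothetical, and the start date of 1 January 2014 is an assumption, since the post gives only the year of the interview.

```python
from datetime import date, timedelta
from typing import Optional


def has_unique_digits(d: date) -> bool:
    """True if the 8 digits of d written as DDMMYYYY are all different."""
    digits = f"{d.day:02d}{d.month:02d}{d.year:04d}"
    return len(set(digits)) == len(digits)


def next_unique_digit_date(start: date) -> Optional[date]:
    """Scan day by day, strictly after `start`, up to the last date Python supports."""
    days_left = (date.max - start).days
    for offset in range(1, days_left + 1):
        candidate = start + timedelta(days=offset)
        if has_unique_digits(candidate):
            return candidate
    return None  # no such date remains before date.max


# Assumed start date: the post only says the interview was in 2014.
answer = next_unique_digit_date(date(2014, 1, 1))
print(answer.strftime("%d/%m/%Y"))  # prints 17/06/2345
```

No date in 20xx qualifies, because every month except 11 and 12 contains a 0 and the year already uses 0 and 2. That is why the search runs into the 2300s.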

Andrew@sooftmax·16 Apr

excited to announce my app which interrupts doom scrolling with math challenges is up

Andrew@sooftmax·15 Apr

so we can imagine model knowledge domain likely exists as a partial subset of human knowledge domain but what happens to knowledge OOD for human language but is understood by humans (style, taste) and further OOD for all humans in general

Andrew@sooftmax·15 Apr

egocentric cooking is so nice because all its facets are in-domain for the models and annotators, compare that to car repair where half the labels are repair because you don’t get so fine grained talking about “adjusting torque on drill”

Andrew@sooftmax·14 Apr

@DhravyaShah how did u guys get first customer? I imagine ai memory was hard to pitch to enterprise initially

Dhravya Shah@DhravyaShah·14 Apr

supermemory has 4 full time employees. and 2 part time.
we're growing super fast. (more than) doubling our revenue every month
we maintain 15+ open source projects, 2 different products with ~6 fig monthly rev
it's incredible how AI has changed the world.

Andrew@sooftmax·14 Apr

goodfire.ai/research/inter…

Andrew@sooftmax·14 Apr

A bit ago someone asked me why model interpretability matters, good read from goodfire

Andrew@sooftmax·10 Apr

the worst part of a great design is not being able to use it again 🥲

Andrew@sooftmax·9 Apr

approved in 4 hours, appears apple approvals come faster as you establish a track record

Andrew@sooftmax·8 Apr

@_inception_ai @StefanoErmon @iendeavors @singhhardik_

Inception@_inception_ai·7 Apr

An Evening with Inception x @iendeavors - Tues, April 14, Palo Alto. @StefanoErmon and the Inception team. Drinks, bites, and conversation about diffusion LLMs. No talks. No slides. We'd love to meet researchers, engineers, and students who are curious about dLLMs and where they're headed. Space is limited - invite here: luma.com/5p7cz4tq
