Cameron Thacker

337 posts

@CameronMThacker

Los Angeles, CA · Joined August 2013

176 Following · 35.2K Followers

Cameron Thacker@CameronMThacker·2d

@badlogicgames I'm not sure Rick is completely correct here, but I like the crossover 😆

Cameron Thacker@CameronMThacker·2d

@badlogicgames damn you really do have good taste ...

Mario Zechner@badlogicgames·2d

rick beato is now part of the local inference open weights underground. this timeline is killing me. youtu.be/YTLnnoZPALI?is…

7.2K

Cameron Thacker@CameronMThacker·2d

@lilyjclifford amazing attitude - and for an outsider this is hilarious for you to post!

3.2K

lily clifford@lilyjclifford·2d

damn alright

191

178

11.4K

1.2M

Cameron Thacker@CameronMThacker·4d

@arb8020 Interesting because I find the residual stream so unsatisfying the way each layer just gets added in. To me, it just seems like it is missing something important and it’s surprising that it works as well as it does.

106

arb8020@arb8020·5d

aesthetically i hate every architecture that’s been fucking with the residual stream. get your filthy hands off my beautiful information highway

2.9K
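
For readers new to the term: the "residual stream" in this exchange is the running hidden state that every transformer block reads from and adds its output back into. A minimal sketch in plain Python, assuming toy stand-in blocks (`toy_block_a` and `toy_block_b` are hypothetical, and normalization layers are left out):

```python
from typing import Callable, List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def run_residual_stack(x: Vector, blocks: List[Callable[[Vector], Vector]]) -> Vector:
    """Each block reads the stream and its output is simply added back: x <- x + f(x)."""
    for block in blocks:
        x = add(x, block(x))
    return x


# Hypothetical toy blocks standing in for attention / MLP sublayers.
def toy_block_a(x: Vector) -> Vector:
    return [0.5 * v for v in x]


def toy_block_b(x: Vector) -> Vector:
    return [1.0 for _ in x]


stream = run_residual_stack([1.0, 2.0], [toy_block_a, toy_block_b])
print(stream)  # prints [2.5, 4.0]
```

The plain addition is what both posts are reacting to: nothing gates or mixes the contributions, so each layer's update is just summed into the same vector.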

Cameron Thacker@CameronMThacker·4d

If you don’t make time, you won’t find it.

Cameron Thacker@CameronMThacker·6 May

@badlogicgames The only real abstraction layer that matters is the one I stop at obviously. Meanwhile we can all just read the equations and get at the essence without implementing anything lol

Mario Zechner@badlogicgames·6 May

> or copied and pasted into PyTorch rather than writing bare Python; banger

5.3K

Mario Zechner@badlogicgames·6 May

jfc this is hilarious, wow.

Eliezer Yudkowsky@allTheYud

Everyone bragging that THEY understand how AI works and THEY know it can't be conscious, explain right now from memory why it was very clever that the positional encoding in the original transformers paper used both sines and cosines.

102

26K
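
The quoted post does not give the answer. The usual one, which matches the hypothesis stated in the original transformer paper, is that pairing a sine and a cosine at each frequency makes the encoding of position `pos + k` a fixed linear function (a 2x2 rotation per frequency) of the encoding of `pos`. A minimal sketch that checks this numerically, assuming an even `d_model` (the helper names are illustrative):

```python
import math
from typing import List


def positional_encoding(pos: int, d_model: int) -> List[float]:
    """Sinusoidal encoding: even dimensions use sin, odd dimensions use cos."""
    pe = []
    for i in range(0, d_model, 2):
        omega = 1.0 / (10000 ** (i / d_model))
        pe.append(math.sin(pos * omega))
        pe.append(math.cos(pos * omega))
    return pe


def shift(pe: List[float], k: int, d_model: int) -> List[float]:
    """Map PE(pos) to PE(pos + k) using only k, via the angle-addition identities."""
    out = []
    for i in range(0, d_model, 2):
        omega = 1.0 / (10000 ** (i / d_model))
        s, c = pe[i], pe[i + 1]
        sk, ck = math.sin(k * omega), math.cos(k * omega)
        out.append(s * ck + c * sk)  # sin(a + b)
        out.append(c * ck - s * sk)  # cos(a + b)
    return out


d_model, pos, k = 8, 5, 3
shifted = shift(positional_encoding(pos, d_model), k, d_model)
direct = positional_encoding(pos + k, d_model)
max_err = max(abs(a - b) for a, b in zip(shifted, direct))
print(max_err < 1e-9)  # prints True
```

Note that `shift` never sees `pos`. With sines alone, the shifted value could not be computed from the encoding at `pos`, because the matching cosine would be missing.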

Cameron Thacker@CameronMThacker·2 May

@badlogicgames unfortunately at this rate you will get it in zig first 😆

Mario Zechner@badlogicgames·2 May

@CameronMThacker please don't nerd snipe me. i'd kill for a non python version of numpy and pytorch.

140

Mario Zechner@badlogicgames·2 May

guess it's time to build my own model with spit and duct tape as well now. what a time to be alive ... ridonculous.

309

18.7K

Cameron Thacker@CameronMThacker·2 May

@badlogicgames I guess you have to create pi-torch now ...

Mario Zechner@badlogicgames·2 May

@CameronMThacker yes, it's the worst.

Cameron Thacker@CameronMThacker·2 May

@badlogicgames Anything to avoid Python?

Mario Zechner@badlogicgames·2 May

ok, how do i instruction fine tune gpt-2 in typescript. how hard can it be. x.com/badlogicgames/…

Mario Zechner@badlogicgames

felt cute, did some @karpathy style cozy coding. now i can run GPT 2 124M in pure TypeScript at 7 tps. played with implementing the GEMV via C/WASM, but that only got me a 1.7x speed up.

9.1K

Cameron Thacker@CameronMThacker·2 May

@michellechen @badlogicgames I would never have expected cloudflare to be pushing boundaries in these areas like they are, but it's really awesome. You guys are moving so quickly, I can't even keep up!

michelle@michellechen·1 May

@badlogicgames cloudflare 🤝 pi

2.9K

Mario Zechner@badlogicgames·1 May

ok, i already posted this but holy shit it's built on pi?! github.com/withastro/flue… this makes me super happy!

fks@FredKSchott

Introducing Flue — The First Agent Harness Framework Flue is a TypeScript framework for building the next generation of agents, designed around a built-in agent harness. Flue is like Claude Code, but 100% headless and programmable. There's no baked in assumption like requiring a human operator to function. No TUI. No GUI. Just TypeScript. But using Flue feels like using Claude Code. The agents you build act autonomously to solve problems and complete tasks. They require very little code to run. Most of the "logic" lives in Markdown: skills and context and AGENTS.md. Flue is like Astro or Next.js for agents (not surprising, given my background 🙃). It's not another AI SDK. It's a proper runtime-agnostic framework. Write once, build, and deploy your agents anywhere (Node.js, Cloudflare, GitHub Actions, GitLab CI/CD, etc). We originally built Flue to power AI workflows inside of the Astro GitHub repo. But then @_bgiori got his hands on it, and we realized that every agent needs a framework like Flue, not just us. Check it out! It's early, but I'm curious to hear what people think. Are agents ready for their library -> framework moment?

833

78K

Cameron Thacker@CameronMThacker·1 May

@zeeg I like it! What ended up as the most useful piece for you? I’m going to give this a try when it’s ready

475

David Cramer@zeeg·30 Apr

If you use pi-ai and have opinions on how a test harness should look, I'm going to solve this problem once and for all. github.com/getsentry/vite…

305

23.7K

Cameron Thacker@CameronMThacker·19 Apr

Cool idea, but your results don't really show what you imply in your post. Sec. 6: "post-hoc mode runs all layers on every step... does not achieve wall-clock layer skipping", so you aren't actually skipping layers. Thus the only speedup you are getting is due to your fused kernels, right? Looking forward to seeing if you can nail the true skip mode.

673

Jaber@Akashi203·19 Apr

been thinking about how wasteful LLM inference is at the token level. every token goes through every layer. "the" gets 32 matmuls. a hard reasoning step also gets 32 matmuls. same compute for wildly different information content.
always a bit silly, but now it's actually expensive: reasoning models emit thousands of thinking tokens per query, and most are "ok", "so", "wait", "let me".
the fix is sitting right there in the representations. for most tokens the hidden state at ~layer 11 is already nearly identical to the final layer. the rest barely moves the output. you just need a cheap per-token signal to notice.
so we built TIDE: tiny MLP routers (~4MB) that sit on a frozen model and predict "has this token converged yet". post training, no retraining, bolt it onto any HF causal LM. calibration is 2000 wikitext samples, under 3 min on one GPU.
deepseek r1 distill 8B on A100: 100% prefill exit rate, 7.2% lower latency, 99% of decode tokens exit early on a multi step math problem with the answer unchanged.
8B is the floor. the methodology compounds with depth and output length: 70B+ has ~80 layers of redundancy, and inference time scaling models emit 10 to 100x more tokens per query. opus class + long chain of thought is where the lever gets real.
paper: arxiv.org/abs/2603.21365
code: github.com/RightNow-AI/TI…
(this kind of kernel level stuff is what we bake into @runinfrai by default, check it out: runinfra.ai)

480

28.3K
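
The distinction raised in the reply above (post-hoc mode versus true layer skipping) can be illustrated with a toy sketch. Nothing here is taken from the TIDE code: the layers are simple scalar functions, and `converged` is a hypothetical stand-in for a learned router that decides whether a token's hidden state has stopped changing.

```python
from typing import Callable, List, Tuple

Layer = Callable[[float], float]
Check = Callable[[float, float], bool]


def run_post_hoc(x: float, layers: List[Layer], converged: Check) -> Tuple[float, int, int]:
    """Run every layer, and only record where the token could have exited."""
    exit_layer = len(layers)
    found = False
    executed = 0
    for idx, layer in enumerate(layers, start=1):
        new_x = layer(x)
        executed += 1
        if not found and converged(x, new_x):
            exit_layer, found = idx, True
        x = new_x
    return x, exit_layer, executed


def run_true_skip(x: float, layers: List[Layer], converged: Check) -> Tuple[float, int]:
    """Stop executing layers as soon as the convergence check fires."""
    executed = 0
    for layer in layers:
        new_x = layer(x)
        executed += 1
        done = converged(x, new_x)
        x = new_x
        if done:
            break
    return x, executed


# Toy setup: each layer moves the state halfway toward 1.0.
layers = [lambda v: v + 0.5 * (1.0 - v)] * 8
converged = lambda old, new: abs(new - old) < 0.05

_, exit_layer, post_executed = run_post_hoc(0.0, layers, converged)
_, skip_executed = run_true_skip(0.0, layers, converged)
print(exit_layer, post_executed, skip_executed)  # prints 5 8 5
```

In the post-hoc run the exit point is known (layer 5) but all 8 layers still execute, so any wall-clock gain has to come from somewhere else. Only the second function actually saves compute.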

Cameron Thacker@CameronMThacker·8 Apr

@tonis_a_gayaraj @sedielem @CSProfKGD Interesting! I’m gonna look into this. I love when an “old” technique turns out to be useful again

Toni Sagayaraj@tonis_a_gayaraj·7 Apr

@sedielem @CSProfKGD I can’t believe CDCD tried so hard to diffuse on embeddings and actually the solution was just to throw one-hots at modern diffusion architectures and let them figure it out

280

Sander Dieleman@sedielem·7 Apr

Continuous language diffusion strikes back! Flow maps are really starting to come into their own as a viable method for language modelling with very fast inference. FMLMs produce good results even with just a _single_ forward pass!

Nicholas Boffi@nmboffi

🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation. read about it in the blog: one-step-lm.github.io/blog/ full details in the paper: arxiv.org/abs/2602.16813 we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed! v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion — but without the factorization error that kills discrete methods at few steps. beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale. we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time — and it works even at 1-2 steps with our flow map model. simple, well-understood techniques from continuous flows just work incredibly well in practice for language. we’re extremely excited about the future of this class of models. stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀

197

26.1K

Cameron Thacker@CameronMThacker·3 Apr

@MindsAI_Jack If I comment here, will the algo bring me more linear algebra? 🙏😆

372

Jack Cole@MindsAI_Jack·2 Apr

What happened to all the AI/ML papers being announced on x? They seem to have disappeared for me. Are others noticing the same?

424

45.1K

Cameron Thacker@CameronMThacker·2 Apr

@PaulNeverovsky It's honestly really nice. Maybe a tad complicated for the average user, but I want this mode lol.

226

Paul Never@PaulNeverovsky·1 Apr

@CameronMThacker this is my goal 👀

6.5K

Paul Never@PaulNeverovsky·1 Apr

Anthropic just leaked a new Claude app design, and it’s crazy good

119

176

5.2K

550.6K

Cameron Thacker@CameronMThacker·30 Mar

I'm surprised this is getting a lot of traction. This has been a thing for a long time already. You don't need a heavy plugin. You can just tell your agent to use `codex exec` or build a simple skill from that like I do. Just make sure to send std error to dev null (2>/dev/null) so thinking tokens don't pollute your context.

199
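
The tip in the post above (call `codex exec` and send stderr to `/dev/null`) can also be applied from a script. A minimal sketch using Python's `subprocess`: the helper name `run_discarding_stderr` is illustrative, the prompt in the final comment is made up, and the demo uses a stand-in child process so it runs without the Codex CLI installed.

```python
import subprocess
import sys
from typing import List


def run_discarding_stderr(cmd: List[str]) -> str:
    """Run a command and return stdout only; stderr is dropped, like `2>/dev/null`."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in child process that writes to both streams.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('answer'); print('thinking...', file=sys.stderr)",
]
output = run_discarding_stderr(demo_cmd)
print(output.strip())  # prints answer

# With the Codex CLI installed, the call described in the post would look like:
# run_discarding_stderr(["codex", "exec", "Review the staged diff for bugs"])
```

Only the text on stdout is returned, so whatever the tool writes to stderr never reaches the calling agent's context.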

Romain Huet@romainhuet·30 Mar

We’ve seen Claude Code users bring in Codex for code review and use GPT-5.4 for more complex tasks, so we thought: why not make that easier? Today we’re open sourcing a plugin for it! You can call Codex from Claude Code with your ChatGPT subscription. We love an open ecosystem!

dominik kundel@dkundel

I built a new plugin! You can now trigger Codex from Claude Code! Use the Codex plugin for Claude Code to delegate tasks to Codex or have Codex review your changes using your ChatGPT subscription. Start by installing the plugin: github.com/openai/codex-p…

288

351

5.4K

924.8K

Cameron Thacker@CameronMThacker·27 Mar

@varunneal Thanks for sharing - that is not intuitive. For me, this competition is just fun to experiment with and I'm actually only interested in novel solutions and architectures, not treating it like some kaggle competition 😂

varun@varunneal·27 Mar

the AI swarms are really good at hill-climbing reward hacks. Most of the recent submissions use unnormalized N-gram distributions that allow logits to be arbitrarily high

varun@varunneal

apparently the agent harnesses keep 'rediscovering' TTT (cheating) on their own

1.9K

Cameron Thacker@CameronMThacker·26 Mar

@atulit_gaur Lot of botted content. Post this type of content because you learn when you teach - or you just want to. Don't do it for external validation.

atulit@atulit_gaur·25 Mar

why does no one give 2 fucks about educational content? im not even sad this didn't get any reach, im just truly wondering

atulit@atulit_gaur

the router in mixture of experts models is a linear layer. it takes a token's hidden state, multiplies it by a weight matrix of shape (num_experts, hidden_dim), softmaxes the result, and picks the top-k experts. that's it. but why does a matrix multiply "know" which expert to pick? each row of the router matrix is basically a learned prototype for that expert. the dot product measures how similar the token is to that prototype. high score = that expert gets activated. the cool part is nobody hardcodes what each expert specializes in. during training, gradient descent naturally pushes experts toward specialization because it minimizes loss better that way. one problem though - without a load balancing auxiliary loss, the router collapses and keeps sending tokens to the same 2-3 experts while the rest rot. that's why every moe paper has some balancing trick.

7.8K
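
The router described in the quoted post (a linear layer, a softmax, then top-k selection) fits in a few lines. A minimal sketch in plain Python, assuming made-up weights and a 2-dimensional hidden state. Some implementations renormalize the selected weights afterwards, and the load-balancing loss is a training-time addition; both are left out here.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden: List[float], router_weights: List[List[float]], top_k: int) -> List[Tuple[int, float]]:
    """router_weights has shape (num_experts, hidden_dim): one 'prototype' row per expert."""
    # Dot product of the token with each expert's row gives one logit per expert.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    return [(e, probs[e]) for e in ranked[:top_k]]


# Illustrative weights for 4 experts (not from any trained model).
router_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.7, 0.7],
]
token = [2.0, 0.5]
chosen = route(token, router_weights, top_k=2)
print([e for e, _ in chosen])  # prints [0, 3]
```

Experts 0 and 3 win because their rows point in roughly the same direction as the token, which is the "learned prototype" reading of the dot product given in the post.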

Cameron Thacker@CameronMThacker·25 Mar

@EastlondonDev I think this is a super interesting direction that would also be cool to incorporate with recursive language models...the repl is the model?? lol

Andrew Jefferson@EastlondonDev·25 Mar

It’s fucking working This LLM brain has been fused with a mini computer and it can switch between generating text and generating and executing machine code - all running in a single GPU & torch graph

Andrew Jefferson@EastlondonDev

It turns out that teaching an existing language model new tokens takes a bit of work. To use wasm directly in the neural network I need the language model to output specific wasm tokens and byte tokens (one token for every byte value 0-255) that match the hard coded wasm interpreter subgraph. There are two problems. 1) the language model has never seen wasm tokens before and 2) when wasm tokens are used they flow into the wasm interpreter which will compute them and will hard fail if given invalid instructions. So the llm has to learn to use tokens it has never seen before in perfectly correct sequences. That's enough of a challenge that my AI agent couldn’t get SFT on a pretrained nanochat language model to work with about a week of trying different approaches. We either got mode collapse where the only wasm token predicted was the most common one (CONST_I32) or it learned to use the wasm operations but completely lobotomised the language model in the process and it could not produce correct byte values for inputs.

1.2K

107.5K
