Pinned Tweet
elana
195 posts

elana retweeted

1/4
New paper with @weijie444!
We introduce a symmetry-compatible principle for LLM optimizer design and, as a byproduct, get an end-to-end layerwise optimizer stack where every major matrix-valued parameter (embeddings, LM heads, SwiGLU MLPs, MoE routers) has its own principled update!
📝 arxiv.org/abs/2605.18106
💻 github.com/timlautk/equiv…
elana retweeted

NLAs are claimed to verbalize model activations. But can they faithfully interpret steered activations?
In our latest paper, we show that steering moves activations into non-invertible regions; and almost surely, no prompt maps to steered activations!
NLAs fail to interpret steered activation states faithfully, supporting our results! ↓
@anqi_liu33 @DanielKhashabi
x.com/AnthropicAI/st…


@indefeasible_ also personal bias & not really “intro to proofs” at all but these lecture notes ~follow Apostol’s Calculus Volume II (more rigorous than typical Calc 3 & LinAlg), both that and the notes here could be useful depending on where you are mathematically math.columbia.edu/~mtwang/teachi…

@indefeasible_ longformmath.com/proofs-book/
The Book of Proof and Velleman’s How to Prove It are both also commonly recommended, but based on my friends’ experience this one seems pretty fun & unique :)
elana retweeted

I'm in the process of DMing every mutual that I know plays Minecraft rn, but if I don't DM you and you want to join a Supersymmetry (GregTech) server hmu!! curseforge.com/minecraft/modp…

@usr_bin_roygbiv @t3nsor “shape rotator” “wordcel” is a dumb dichotomy, the latest Wechsler tests do
verbal comprehension
fluid reasoning
quantitative reasoning
visual spatial
working memory
processing speed
which is way more effective, lots of fri+qri+vcimaxxed ppl slaying in CS&math

Hot take: you don't need much shape rotation ability at all to do most coding tasks, and writing "good code" is a wordcel ability.
Roy @usr_bin_roygbiv
how does one politely tell their friend he's probably ngmi as a SWE and should just do sales instead because he lacks the shape rotation capacity and is extremely extraverted

This is devastating for skids everywhere, I got so much social mileage out of being able to set up a decent server
silentwosperer👻🎃 @silentwisperer_
MINECRAFT JAVA EDITION IS FINALLY GETTING EASY MULTIPLAYER SUPPORT!! And a friends list!! Easy access to play with your friends is easily the biggest thing that bedrock edition does better than java!

@TW1NKD3STR0YER To get the opposite of this experience, go to Jersey for superior bagels
elana retweeted

I got a real transformer language model running locally on a stock Game Boy Color (thanks Codex)!
No phone, PC, Wi-Fi, link cable, or cloud inference.
• The cartridge boots a ROM, and the GBC runs the model itself.
• The model is @karpathy’s TinyStories-260K, converted to INT8 weights with fixed-point math so it can run without floating point.
• Built with GBDK-2020 as an MBC5 Game Boy ROM.
• The model weights live in bank-switched cartridge ROM. Prompt entry happens on-device with the D-pad/buttons and an on-screen keyboard.
• The prompt is tokenized on the Game Boy, then the ROM runs transformer prefill + autoregressive generation. The KV cache is stored in cartridge SRAM, because the GBC’s work RAM is tiny.
It is extremely slow, and the output is gibberish because the math is heavily quantized/approximated, but the core thing works!
Hardware: stock Game Boy Color + EZ Flash Junior + microSD. No soldering, no internal mods.
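
For readers curious what "INT8 weights with fixed-point math" can look like, here is a minimal sketch in Python. It is not the author's code (the ROM is built with GBDK-2020, which targets C). Every name here (`quantize_int8`, `to_fixed`, `int_dot`) and both bit-width constants are assumptions made up for illustration. The idea: floats are used only in an offline conversion step, and the "on-device" dot product uses integer multiply, add, and shift only.

```python
SCALE_BITS = 24  # assumed: combined scale stored with 24 fractional bits
OUT_BITS = 8     # assumed: results kept in Q8 fixed point (value * 256)


def quantize_int8(values):
    """Offline step (floats allowed): symmetric per-tensor INT8 quantization."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def to_fixed(x, bits):
    """Offline step: convert a float to an integer with `bits` fractional bits."""
    return int(round(x * (1 << bits)))


def int_dot(q_w, q_x, scale_fixed):
    """Integer-only step: multiply-accumulate, then rescale with a shift."""
    acc = 0
    for w, x in zip(q_w, q_x):
        acc += w * x  # product of two int8 values
    # scale_fixed has SCALE_BITS fractional bits; keep OUT_BITS of them.
    return (acc * scale_fixed) >> (SCALE_BITS - OUT_BITS)


# Offline conversion of a toy weight row and input vector.
q_w, s_w = quantize_int8([0.5, -0.25, 1.0])
q_x, s_x = quantize_int8([1.0, 2.0, -1.0])
scale_fixed = to_fixed(s_w * s_x, SCALE_BITS)

# Integer-only inference; divide by 2**OUT_BITS only to display the result.
result_q8 = int_dot(q_w, q_x, scale_fixed)
print(result_q8 / (1 << OUT_BITS))
```

The exact float dot product for this toy input is -1.0; the integer path lands close to it but not on it, which is the kind of rounding error that compounds across layers. Python integers never overflow, so a real 8-bit target would also need to manage accumulator width explicitly.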


@multimodali I was live watching on call and said “I feel like he’s gonna win” before it even started, was like “Man I suck at predictions” after round 1, then was vindicated

@1owroller twewy (not a 3ds game necessarily but the combat is fun imo)












