brandon wang

287 posts

@fluorane

... | prev undergrad @miteecs and @mitbiology, @cartesia @janestreetgroup

california · Joined April 2021
293 Following · 960 Followers
Pinned Tweet
brandon wang@fluorane·
happy to announce that we've gotten rid of tokenizers! especially excited about what we've replaced them with: end-to-end trainable modules that not only learn to group characters into (sub)words, but can iterate to group words into phrases and further higher-order concepts. see @sukjun_hwang's thread for more details 👇
Sukjun (June) Hwang@sukjun_hwang

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

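The "dynamic chunking" idea in the quoted thread can be illustrated with a toy sketch. This is my own simplification, not the H-Net's actual learned module: place a chunk boundary wherever adjacent character representations are sufficiently dissimilar, so similar neighbors get grouped into one unit.

```python
import numpy as np

def dynamic_chunk(embeddings, threshold=0.5):
    """Toy dynamic chunking: put a boundary where adjacent vectors are
    dissimilar. A hypothetical stand-in for a learned boundary predictor,
    not the actual H-Net mechanism."""
    a, b = embeddings[:-1], embeddings[1:]
    # cosine similarity between each vector and its successor
    sim = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    )
    # low similarity -> high boundary probability
    p_boundary = (1.0 - sim) / 2.0
    starts = [0] + [i + 1 for i, p in enumerate(p_boundary) if p > threshold]
    ends = starts[1:] + [len(embeddings)]
    # return chunks as lists of position indices
    return [list(range(s, e)) for s, e in zip(starts, ends)]
```

In the real architecture the boundary decision is trained end-to-end and the chunking can be applied recursively (characters to words, words to phrases); this sketch only shows the grouping step.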
brandon wang@fluorane·
thanks for the response and the explanation. makes a lot of sense, though it's quite sad that model makers have to eat the maintenance burden too. really hope this does not encourage more conservative design decisions; i feel like the architecture innovation in oss models has been pretty exciting
Lucas Atkins@latkins·
In some cases, that’s the right move. DeepSeek will usually do something like that: release a reference repo and leave it to the community to figure out. But it’s risky. And while Z.ai is obviously a respected name now, the pressure on open-weight models to keep up has probably never been higher.

When you end up with 10 different modeling files and the goal is numerical consistency, meaning logits and predicted tokens stay within a reasonable margin of error across implementations, a community implementation that isn’t quite right can become a real problem. If it’s off, and as a result underperforms, you can lose a lot of users and generate bad sentiment simply because the model wasn’t being implemented or served properly.

I’m making a lot of assumptions here based on my initial thesis, so this could be pretty irrelevant to their actual reasons for keeping it closed. But I’m glad you brought it up, because this point gets cited a lot. There are plenty of cases where community implementations have subtle bugs that quietly nerf performance and don’t get identified or patched until months later.

This isn’t to say community involvement and development aren’t important. I genuinely think they’re the biggest accelerator and flywheel for open weights. My only point is that the main distribution channels, HF, vLLM, and the others mentioned above, should ideally be handled by the model developer directly. Especially in this RL paradigm, where the vLLM and HF implementations need to be as close to 1:1 as possible.
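The parity criterion described above (logits and predicted tokens staying within a margin of error across implementations) can be sketched as a simple check. The helper name and tolerances here are illustrative assumptions, not any project's actual acceptance test.

```python
import numpy as np

def check_parity(logits_ref, logits_test, atol=1e-2, top1_rate=0.999):
    """Toy parity check between a reference implementation's logits and a
    community port's, over the same batch of positions. Thresholds are
    made up for illustration."""
    logits_ref = np.asarray(logits_ref, dtype=np.float64)
    logits_test = np.asarray(logits_test, dtype=np.float64)
    # worst-case elementwise logit deviation
    max_abs_diff = float(np.max(np.abs(logits_ref - logits_test)))
    # fraction of positions where the greedy (argmax) token agrees
    top1_agreement = float(
        np.mean(logits_ref.argmax(-1) == logits_test.argmax(-1))
    )
    return {
        "max_abs_diff": max_abs_diff,
        "top1_agreement": top1_agreement,
        "pass": max_abs_diff <= atol and top1_agreement >= top1_rate,
    }
```

In practice labs compare far more than greedy tokens (full sampling distributions, long-context behavior, RL rollouts), but even a check this crude catches the "subtle bug that quietly nerfs performance" failure mode.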
Lucas Atkins@latkins·
If it touches anything remotely new in arch or infra, it makes perfect sense not to roll it out across every downstream provider and library until there’s something more substantive than just speed. The engineering overhead to support HF, llama.cpp, MLX, vLLM, SGLang, etc is definitely non-trivial.
Z.ai@Zai_org

Note: As an experimental version, GLM-5-Turbo is currently closed-source. All capabilities and findings will be incorporated into our next open-source model release.

Carina Hong@CarinaLHong·
Excited to announce Axiom’s Series A. We raised $200 million in fresh capital at a $1.6 billion+ valuation in a round led by Menlo Ventures to accelerate our strong execution momentum, extending our lead in formal math into Verified AI.

Mathematicians and theoretical scientists dream up theories and formulate hypotheses. They then come up with proofs: a two-step process of discovery. We created Axiom to turn the sparks of curiosity into known truths, and to compress the timeline of breakthroughs.

The Verified AI dream is a generalization of this dream. It is more than providing safeguards for mission-critical systems. The same gap between expert intuition and the machinery needed to ground it exists today in any domain where the generation-verification loop can be made tighter. And yes, software eats the world, and recursive self-improvement is in near sight. Verified AI is not about hallucinations, the lousy part; it’s about superintelligence, the brilliant part. We work on Verified AI not out of distrust of technology, but because we think the rapid advance of AI compels it.

I’m grateful to work with and learn from the best team in the world. It’s not an easy journey, but climbing with you is what makes it worth it. And I can’t wait to build at a more accelerated pace. A nod to @shubho for grounding an ambitious vision in relentless execution every day.

This round was led by @mkraning with @CCgong. Thanks also to our existing investors, who doubled down with conviction since the start (@jturow, @mattmcilwain of @MadronaVentures; @marcievu of @greycroftvc; @yanda, @IdaGirma, @nickgiometti of @BCapitalGroup; @ChrisAbshire_ of @Toyota_Ventures; @xtzhou, @jhuber of @TriatomicCap), and to the new firms we got to meet through the process.
Axiom@axiommathai

Axiom launched six months ago with one conviction: mathematics is the right foundation for building systems that reason. Today we announce Axiom's Series A. We raised $200M at a $1.6B+ valuation, led by @MenloVentures, to extend our lead in formal mathematics into Verified AI.

brandon wang@fluorane·
this incident aside, seems important to understand the role of ai-in-the-loop on strategic decision making. seems plausible that a ruthless reward-maxer would be more willing to choose objectives that lead to escalating conflict
Phil Stewart@phildstewart

(Reuters) - Iran's parliament speaker said on Saturday that the attack on a freshwater desalination plant on Qeshm island was carried out with support from one of the airbases in a southern neighboring country. He did not name the country. "The crime will receive a proportionate response," he said.

Andrew Carr 🤸@andrew_n_carr·
Ranking every neuron in Qwen 3.5 0.8B
Andrew Carr 🤸 tweet media
brandon wang@fluorane·
@_thomasip i would be surprised if ant directly rls on vending, though agreed that it mostly reflects something about which capabilities are prioritized during posttraining
Thomas Ip@_thomasip·
@fluorane Most likely some labs directly RL their models on the vending bench environment to increase their score. Doesn't really mean anything.
brandon wang retweeted
Gideon Futerman@GFuterman·
It is my view that no one, on the left or right, is seriously grappling with the extent to which anything can be left of a republic post-powerful AI. Even the very best visions seem to suggest a small oligarchy rather than a republic.
Dean W. Ball@deanwball

I don’t want to comment on the DoW-Anthropic issue because I don’t know enough specifics, but stepping back a bit: If near-medium future AI systems can be used by the executive branch to arbitrary ends with zero restrictions, the U.S. will functionally cease to be a republic.

brandon wang@fluorane·
@Teknium the entire ds section is so funny, claude as grader is distillation now? also 150k requests is so tiny it makes it feel like ds concluded that claude wasnt even worth distilling from...
brandon wang@fluorane·
agi is when llm labs figure out how to release models without borking their serving code
brandon wang tweet media
Max Spero@max_spero_·
We fetched 871 articles published in the Guardian by Bryan Armen Graham over the last six years. It's clear that he is increasingly relying on AI. In two weeks in February he churned out nine articles classified by Pangram as fully AI-generated. Receipts below:
Max Spero tweet media
Max Tani@maxwelltani

A spokesperson for the Guardian says this is false: "Bryan is an exemplary journalist, and this is the same style he’s used for 11 years writing for the Guardian, long before LLMs existed. The allegation is preposterous."

brandon wang@fluorane·
kimi allegretto lowkey the highest value ai subscription
brandon wang retweeted
elie@eliebakouch·
i think we don't realize the impact that deepseek had on the open ecosystem, there is so much from them that you can find in almost every frontier open llm today
> most of the open frontier models follow the "finegrain + sparse + shared expert" deepseek moe recipe
> a lot of them use MLA
> first (with minicpm) to use sparse attention in prod (DSA)
> first to do reasoning in the open with R1
> GRPO, which is the foundation for most of the newer RL algorithms
> they also innovated on the training recipe at scale: first to do fp8? MTP? load balancing schemes that other labs are now using
> advanced training/inference infra with oss releases like DeepEP that pretraining libs like megatron use
i'm so grateful deepseek exists
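For reference, the GRPO idea mentioned above can be sketched in a few lines. This is a simplified illustration, not DeepSeek's implementation: the advantage of each completion is its reward standardized against the other completions sampled for the same prompt, which removes the need for a learned critic/value network.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantage sketch: for completions sampled from one
    prompt, standardize each reward against the group's mean and std.
    `eps` guards against a zero std when all rewards in the group tie."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

These per-completion advantages then weight a clipped policy-gradient update, as in PPO, but with the group baseline playing the role of the value function.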
brandon wang@fluorane·
@FangYi11101 @teortaxesTex this is actually true though, i've heard stories of recent (post 2020) tcs papers whose main idea is similar to chinese/polish oi problems (the same is not true for pure math vis a vis math olympiads)
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
There is something in common between solving IMO-like problems and exploring math frontier. Skills and abilities required, mainly. Thus, about half of Fields medalists were IMO contestants. Not necessarily Gold medalists though. But it's the strongest known under-20 predictor.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media ×3
JFPuget 🇺🇦🇨🇦🇬🇱@JFPuget

Because there is nothing in common between solving problems that are known to be solvable (IMO), and exploring math frontier.
