AT
90 posts
@waterloo_intern

making models go fast @baseten studying eng @uwaterloo https://t.co/lCL6q1MBPY

San Fran · Joined October 2024
105 Following · 1.5K Followers
Pinned Tweet
AT @waterloo_intern ·
- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model
Every paper said it couldn't be done. Quantization-Aware Distillation made it possible.
AT @waterloo_intern
x.com/i/article/2029…

19 replies · 107 reposts · 1.2K likes · 146.7K views
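The pinned thread names the technique but not its mechanics, so here is a minimal sketch of what quantization-aware distillation generally looks like, assuming a PyTorch setup: the student's weights are fake-quantized on the forward pass (with a straight-through estimator for gradients) while a frozen full-precision teacher supplies soft targets. Every class and function name below is illustrative, not taken from the linked article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits=4):
    # Symmetric per-tensor fake quantization with a straight-through
    # estimator: forward sees the quantized weight, backward sees identity.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()

class QuantLinear(nn.Linear):
    # Linear layer that trains against its own quantization error.
    def forward(self, x):
        return F.linear(x, fake_quantize(self.weight), self.bias)

def qad_step(student, teacher, x, optimizer, tau=2.0):
    # One distillation step: frozen full-precision teacher provides soft
    # targets; the fake-quantized student matches them via temperature-scaled
    # KL divergence (scaled by tau^2 to keep gradient magnitudes consistent).
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point is that quantization noise is present during training, so the student learns weights that survive the low-bit rounding instead of being rounded after the fact (the failure mode of plain PTQ).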
AT @waterloo_intern ·
@amiruci ``` dominate them so thoroughly that the comparison looks embarrassing ``` should be our new logo
0 replies · 0 reposts · 0 likes · 22 views
Amir Haghighat @amiruci ·
We now have a product specifically created for AI labs and their closed-weight models: we'll take care of not just inference, but auth, rate limits, metering, and billing integrations. We'll take care of providing both shared and dedicated inference, compliance needs, and matching end customers' geo requirements (US, CA, EU, UK, AUS, JP, etc.). It's called Baseten Frontier Gateway and is already battle-tested by multiple AI labs, like Poolside and their impressive Laguna M.1 agentic coding model.
Amir Haghighat tweet media
8 replies · 6 reposts · 42 likes · 4.9K views
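The tweet doesn't document Frontier Gateway's actual API, so the following is purely hypothetical: every URL, header, and field is a made-up placeholder, sketched only to show the kind of concerns the product claims to absorb (auth, rate limiting, metering, geo pinning) from the caller's point of view.

```python
# Hypothetical client call; nothing here is Baseten's real interface.
import requests

resp = requests.post(
    "https://gateway.example.com/v1/chat/completions",  # placeholder URL
    headers={
        # Per-customer key: the gateway handles auth, rate limits, metering.
        "Authorization": "Bearer <end-customer-api-key>",
        # Illustrative header for pinning inference to a required region.
        "X-Geo-Requirement": "eu",
    },
    json={
        "model": "laguna-m.1",
        "messages": [{"role": "user", "content": "hi"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # usage fields would feed metering and billing
```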
AT @waterloo_intern ·
@modal this is a sick read...hats off to you guys
1 reply · 0 reposts · 3 likes · 221 views
Philip Kiely @philipkiely ·
Developing empathy for LLMs by doing benchmark problems by hand.
Philip Kiely tweet media
5 replies · 0 reposts · 55 likes · 2.8K views
AT @waterloo_intern ·
@edenchan solid to the power of solid squared
0 replies · 0 reposts · 1 like · 313 views
Kanjun 🐙 @kanjun ·
Twitter’s algorithm is optimized for addiction, not for us. We deserve better. We’re releasing Bouncer today so you can take back control of your feed. Describe what you don't want, and Bouncer removes it. It’s free, doesn’t collect your data, and will be open source soon.
213 replies · 295 reposts · 3.2K likes · 585.8K views
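Kanjun doesn't describe Bouncer's internals, but "describe what you don't want, and it removes it" suggests an LLM classifier run over each feed item. A minimal sketch of that shape, assuming the OpenAI Python client; the prompt, model choice, and all names are assumptions, not Bouncer's implementation.

```python
# Hypothetical "describe what you don't want" feed filter, NOT Bouncer's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def keep(post_text: str, dont_want: str) -> bool:
    # Ask the model to judge one post against the user's natural-language rule.
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Rule: remove posts that are {dont_want}.\n"
                       f"Post: {post_text}\n"
                       f"Answer KEEP or REMOVE only.",
        }],
    )
    return out.choices[0].message.content.strip().upper() == "KEEP"

feed = ["genuinely useful systems thread", "YOU WON'T BELIEVE what happened"]
filtered = [p for p in feed if keep(p, "engagement-bait or rage-bait")]
```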
AT @waterloo_intern ·
we dug into 1-bit Bonsai with @part_harry_. the grand canyon of a gap they showed... is just THREE (3) points away from normal PTQ. and they already knew that. here's the graph (fixed)
AT tweet media
PrismML @PrismML

This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large to be capable.

7 replies · 4 reposts · 100 likes · 17.1K views
AT @waterloo_intern ·
@nisten @part_harry_ we used their axes to plot on their chart: their benchmarks give the intelligence scores, and the x-axis is the weight file size. this is what PrismML used
0 replies · 0 reposts · 0 likes · 333 views
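For concreteness, here is a sketch of the disputed chart type: benchmark-derived intelligence score on the y-axis, weight file size on the x-axis. The data points below are placeholders, not the real Bonsai, Qwen, or Ministral numbers; the point is how axis choice shapes the visual gap.

```python
# Illustrative recreation of an intelligence-vs-size scatter; values are fake.
import matplotlib.pyplot as plt

baseline = {"Qwen3 0.6B": (0.6, 40), "Qwen3 1.7B": (1.7, 52), "Qwen3 4B": (4.0, 62)}
bonsai = {"Bonsai (1-bit)": (0.3, 50)}  # smaller file, similar score

for name, (size_gb, score) in baseline.items():
    plt.scatter(size_gb, score, c="gray")
    plt.annotate(name, (size_gb, score))
for name, (size_gb, score) in bonsai.items():
    plt.scatter(size_gb, score, c="red")
    plt.annotate(name, (size_gb, score))

# A plain linear x-axis (the "fixed" graph) vs. a log or truncated axis
# changes how dramatic the same few-point gap looks.
plt.xlabel("weight file size (GB)")
plt.ylabel("intelligence score (benchmark avg)")
plt.show()
```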
AT @waterloo_intern ·
@HenkPoley @part_harry_ fair, the point is more that the graph was designed to make 3 points look like a generational leap
0 replies · 0 reposts · 2 likes · 266 views
Henk Poley @HenkPoley ·
@AliesTaha @part_harry_ 3 percentage points better is still quite a bit better. 🤷‍♂️ 73.8 to 76.8 is about 11% fewer errors on these tests. And given that most of these tests themselves contain errors, so a perfect score cannot be achieved, it's probably even a bit better than that.
2 replies · 0 reposts · 2 likes · 426 views
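Henk's relative-error figure checks out; a quick verification of the arithmetic:

```python
# A 3-percentage-point score gain is an ~11% relative reduction in errors.
before, after = 73.8, 76.8        # benchmark scores, in percent
err_before = 100 - before         # 26.2% errors
err_after = 100 - after           # 23.2% errors
rel_reduction = (err_before - err_after) / err_before
print(f"{rel_reduction:.1%}")     # -> 11.5% fewer errors
```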
Josh @JoshPurtell ·
@AliesTaha @part_harry_ Taking this as permission to publicly sanity test forthcoming Baseten results/research
1 reply · 0 reposts · 6 likes · 928 views
AT @waterloo_intern ·
@oneill_c whoaaaaa
0 replies · 0 reposts · 2 likes · 468 views
AT @waterloo_intern ·
@philipkiely what is inference? how does it work? @philipkiely can i come to learn (and also maybe get ice-cream)?
2 replies · 0 reposts · 5 likes · 369 views
Philip Kiely @philipkiely ·
Ice cream and books were a hit yesterday. ICYMI we're doing another, this time at the Ferry Building. Thursday 4/2 from 2-4 PM: luma.com/khxc93ju
Philip Kiely tweet media (3 images)
2 replies · 0 reposts · 28 likes · 3.2K views
AT @waterloo_intern ·
@gaoj0017 only #3, "Their experiments used single-core CPU for RaBitQ vs A100 GPU for TurboQuant," has merit as a complaint. the other 2 just don't hold
5 replies · 2 reposts · 23 likes · 8.5K views
Jianyang Gao @gaoj0017 ·
We need to publicly clarify serious issues in Google's ICLR 2026 paper TurboQuant. TurboQuant misrepresents RaBitQ in three ways:
1. Avoids acknowledging a key methodological similarity (JL transform)
2. Calls our theory "suboptimal" with no evidence
3. Reports results under unfair experimental settings
We expressed our concerns to the authors before their submission, but they chose not to fix them in their paper. The paper was accepted at ICLR 2026 and heavily promoted by Google (tens of millions of views). At that scale, uncorrected claims quickly become "consensus."
Facts:
1. RaBitQ already proves asymptotic optimality (FOCS'17 bound)
2. TurboQuant uses the same random rotation step but never states the connection
3. Their experiments used single-core CPU for RaBitQ vs A100 GPU for TurboQuant
None of this is properly disclosed. We've filed a formal complaint and posted on OpenReview (openreview.net/forum?id=tO3AS…). We'll release a detailed technical report on arXiv. Our goal is simple: keep the academic record accurate. Would appreciate people taking a look and sharing.
19 replies · 97 reposts · 1.3K likes · 99.4K views
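The "same random rotation step" Gao refers to is a standard trick in vector quantization: a JL-style random rotation spreads a vector's energy evenly across coordinates so that a crude low-bit quantizer loses far less information. A minimal sketch of the concept (an illustration only, not either paper's actual algorithm):

```python
# Rotate-then-binarize, the shared idea behind JL-transform quantizers.
import numpy as np

rng = np.random.default_rng(0)
d = 128
# Random orthogonal rotation: Q from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def rotate_and_binarize(x: np.ndarray) -> np.ndarray:
    # After rotation, coordinates look roughly i.i.d. Gaussian, so keeping
    # only the sign (1 bit per dimension) preserves angular information well.
    return np.sign(Q @ x).astype(np.int8)

def hamming_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Fraction of matching signs tracks the angle between the original vectors.
    return float((a == b).mean())

x = rng.standard_normal(d)
y = x + 0.1 * rng.standard_normal(d)  # a nearby vector
print(hamming_sim(rotate_and_binarize(x), rotate_and_binarize(y)))  # close to 1
```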
Jianyang Gao @gaoj0017 ·
The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.
We're speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We've written a public comment on OpenReview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.
Google Research @GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

98 replies · 975 reposts · 6.5K likes · 1M views
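The "at least 6x" figure isn't derived in Google's tweet, but a back-of-envelope pass shows what it implies, assuming an fp16 baseline: 16 bits / 6 is roughly 2.7 bits per cached entry, i.e. around 2-3-bit quantization plus metadata. The model shape below is illustrative, not from the blog post.

```python
# Back-of-envelope KV-cache memory for a hypothetical model configuration.
layers, kv_heads, head_dim, seq, batch = 32, 8, 128, 8192, 1
entries = 2 * layers * kv_heads * head_dim * seq * batch  # 2 = keys + values
fp16_gib = entries * 2 / 2**30                            # 2 bytes per entry
print(f"fp16 cache: {fp16_gib:.2f} GiB, after 6x: {fp16_gib / 6:.2f} GiB")
# 16 bits / 6 ~= 2.7 bits per entry: ~2-3-bit quantization plus scales/offsets.
```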
AT @waterloo_intern ·
@Phenomenon_One well, both really. accessible via simplifying it; confirming their claims with the gpu kernels. and my verdict was: not useful at current perf for latest gpus
0 replies · 0 reposts · 0 likes · 10 views
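Checking perf claims "with the gpu kernels" usually means a CUDA-event timing harness of the following shape, assuming PyTorch; the matmul here is a stand-in for whatever kernel is actually under test.

```python
# Standard CUDA-event timing harness for kernel perf claims (PyTorch assumed).
import torch

def time_kernel(fn, warmup=10, iters=100):
    for _ in range(warmup):          # warm up: JIT, caches, clocks
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print(f"{time_kernel(lambda: a @ b):.3f} ms")  # stand-in workload
```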
Omen @Phenomenon_One ·
@AliesTaha Update after perusing: I see where the time went, deep-diving algos and testing claims on arbitrary datasets... are you trying to make it accessible or are you trying to confirm their claims?
1 reply · 0 reposts · 1 like · 19 views
AT @waterloo_intern ·
@Phenomenon_One i think it's partly due to my background being rooted in swe and not maths, but this was total time, including the time it took to write the article and make the graphs
0 replies · 0 reposts · 1 like · 196 views
Omen @Phenomenon_One ·
@AliesTaha Lmao 31 hours 😂 How long did the original researchers spend on it … Will critique after I read this.
2 replies · 0 reposts · 0 likes · 358 views