William Fleshman

337 posts


@willcfleshman

US Army & PhD student at Johns Hopkins University

Joined August 2017
153 Following · 424 Followers
Pinned Tweet
William Fleshman@willcfleshman·
Did you know that LoRA A matrices can be frozen at init w/o degrading performance? 🤯 We leverage this trick to construct an unsupervised routing procedure that achieves identical performance to the previous best with orders of magnitude fewer FLOPs and ~50% less GPU memory. 🧵
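For anyone curious what "freezing A at init" looks like in practice, here is a minimal PyTorch sketch (my own illustration, not the paper's code): the LoRA down-projection A is left at its random initialization with gradients disabled, and only B is trained. The rank, scaling, and init scheme below are placeholder choices.

```python
import torch
import torch.nn as nn

class FrozenALoRALinear(nn.Module):
    """Base linear layer plus a LoRA update (alpha / r) * x @ A.T @ B.T,
    where A stays at its random init (frozen) and only B is trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pretrained weights
        self.scaling = alpha / r
        # A: random down-projection, frozen at initialization (never updated).
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r ** 0.5,
                              requires_grad=False)
        # B: zero-initialized up-projection, the only trainable parameters.
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T
```

With A frozen, its gradients and optimizer state are never allocated, which is presumably where a chunk of the memory savings comes from.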
William Fleshman retweeted
Benjamin Van Durme@ben_vandurme·
JHU mmBERT extended from 8k to 32k token length by the vLLM Semantic Router Team. Cutting-edge results on 1,800+ languages, now with longer context! huggingface.co/llm-semantic-r…
William Fleshman@willcfleshman·
@ChromeHODLs @hillery_dan That's definitely easier, just might not be optimal depending on your tax situation. Assuming you can almost capture both rates, swapping back and forth with T-bills would compound faster up to a 25% tax. Tax-free accounts, if available, are the real way to go.
William Fleshman@willcfleshman·
@ChromeHODLs @hillery_dan Opportunity cost. If you can capture the dividend by only tying up your capital for a couple of days then the rest of the month that capital can be making money elsewhere.
Dan Hillery@hillery_dan·
To be eligible for January's STRC dividend, you had to be recorded holding STRC at market open today, January 15th. Therefore, you can sell STRC today and still get the dividend. STRC sold off less than the monthly dividend payment. There is insatiable demand, and I don't know why.
Edward Raff@EdwardRaffML·
I just did a literature review on a specific topic to add more related work to a paper. Asked GPT5 to do the same to see how well it would compare. GPT5 had 0% recall and 0% precision on its returned list. At least they were real papers and not hallucinated though 🤷
Jack Jingyu Zhang@jackjingyuzhang·
I’m super thrilled and honored to be named an Amazon AI PhD Fellow 💫 Huge thanks to @AmazonScience for generously supporting our research at JHU! We’ll be advancing AI alignment in collaboration with folks at Amazon.
Rohit Prasad@RohitPrasadAI

Excited to announce @amazon's new AI PhD Fellowship Program supporting 100+ students across 9 universities like Carnegie Mellon, MIT & Stanford. Fellows will be paired with senior scientists working in related fields, plus receive financial support and AWS credits for research. Learn more: amazon.science/news/amazon-la…

William Fleshman@willcfleshman·
SEQR provably routes to the same adapters as SpectR, yielding the same high level of task performance at a fraction of the cost 🤑. Like previous unsupervised approaches, SEQR is secure, with no risk of data leakage if LoRA B matrices are kept private!🔐
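I don't know SEQR's exact scoring rule from the tweet alone, but the security claim implies routing can be computed from the A matrices only. As a purely illustrative sketch (my assumption, not the paper's algorithm), one could score each candidate adapter by the norm of its A-projection of a query activation and never touch the private B matrices:

```python
import torch

def route_by_A_projection(x: torch.Tensor, A_list: list[torch.Tensor]) -> int:
    """Score each candidate LoRA adapter using only its A (down-projection)
    matrix, so the private B matrices never leave their owners. The norm-based
    score is a stand-in for illustration, not SEQR's actual rule.

    x:      (d_in,) query activation, e.g. a pooled hidden state
    A_list: one (r, d_in) LoRA A matrix per candidate adapter
    """
    scores = torch.stack([torch.linalg.vector_norm(A @ x) for A in A_list])
    return int(torch.argmax(scores))  # index of the selected adapter
```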
William Fleshman retweeted
Orion Weller@orionweller·
XLM-R has been SOTA for 6 years for multilingual encoders. That's an eternity in AI 🤯 Time for an upgrade. Introducing mmBERT: 2-4x faster than previous models ⚡ while even beating o3 and Gemini 2.5 Pro 🔥 + open models & training data - try it now! How did we do it? 🧵
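A quick, hedged sketch of "try it now" with Hugging Face transformers; the model id is my guess at the released checkpoint name and mean pooling is just a placeholder, so adjust both to whatever the official release specifies (a recent transformers version may be needed for the modern encoder architecture):

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "jhu-clsp/mmBERT-base"  # assumed Hub id; check the release for the real name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

texts = ["Multilingual encoders are back.",
         "Los codificadores multilingües han vuelto."]
batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq, dim) token embeddings
embeddings = hidden.mean(dim=1)                 # crude mean pooling, just for a smoke test
print(embeddings.shape)
```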
William Fleshman retweeted
Marc Marone@ruyimarone·
3T tokens, ~1800 languages, 2 models - we’re releasing mmBERT, a modern multilingual encoder model!
William Fleshman@willcfleshman·
@jxmnop Cool stuff, when we did RE-Adapt (arxiv.org/abs/2405.15007) with Llama we saw that many of the base->instruct weight updates were approximately low-rank, but some layers were not. You could repeat your experiment with the Llama instruct models to see how close to base you actually get.
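The "approximately low-rank" observation is easy to test on any checkpoint pair; here is a small sketch (not RE-Adapt itself) that measures how much of a base->instruct weight delta's energy is captured by its top-r singular values:

```python
import torch

def lowrank_energy(w_base: torch.Tensor, w_instruct: torch.Tensor, r: int = 64) -> float:
    """Fraction of the squared Frobenius norm of (W_instruct - W_base)
    captured by its top-r singular values; values near 1.0 mean the
    update for this layer is approximately rank-r."""
    delta = (w_instruct - w_base).float()
    s = torch.linalg.svdvals(delta)            # singular values, descending
    return float((s[:r] ** 2).sum() / (s ** 2).sum())

# Run this per layer over paired base/instruct checkpoints to spot which
# layers' updates are (and are not) well approximated at low rank.
```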
dr. jack morris@jxmnop·
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
William Fleshman@willcfleshman·
Obviously have to attack you as my main AAAI contact 🤣
William Fleshman@willcfleshman·
@investingidiocy Thanks for answering my question on TTU. I'm looking forward to the new series of blog posts. Cheers!