Fu-Ming Guo


@FumingGuo

Chief Architect AI/ML @ Visa. Opinions are my own.

Massachusetts, USA · Joined February 2018

1.5K Following · 269 Followers

Fu-Ming Guo@FumingGuo·24 Feb

@pmddomingos Exactly! The human brain and the whole of human civilization as recorded in language.


Pedro Domingos@pmddomingos·24 Feb

AI is a distillation attack on the human brain.


Fu-Ming Guo@FumingGuo·3 Sep

@hugo_larochelle @Mila_Quebec Congratulations!🎉


Hugo Larochelle@hugo_larochelle·2 Sep

Happy to announce today my new role as Scientific Director at @Mila_Quebec! It is a great honor to have this opportunity to serve this community of AI leaders and innovators, which I have always cherished and from which I have myself benefited. mila.quebec/fr/nouvelle/hu…


Fu-Ming Guo reposted

Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·29 Jul

❓Can I collect some feedbacks: Is fully open-source research necessary? Earlier, I released a family of 1-8B models (open data, code, weights): beating Llama3-8B with <10% pretrain time, beating most (all?) open-data models of this scale. 🔓No shortcuts: 10+ legal debates for using open data, 10+ more for weights, months of blockers, endless nights scavenging GPUs. 📈All to provide a strong, reproducible baseline — one I believed critical for understanding the physics of LLMs. Next planned was GLA + Canon, outperforming all modern linear models in tests. Yet attention was low. ❓Should I close-source to save time and focus on pure research? Honest feedback appreciated.

Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu

Phase 1 of Physics of Language Models code release ✅our Part 3.1 + 4.1 = all you need to pretrain strong 8B base model in 42k GPU-hours ✅Canon layers = strong, scalable gains ✅Real open-source (data/train/weights) ✅Apache 2.0 license (commercial ok!) 🔗github.com/facebookresear…


Fu-Ming Guo@FumingGuo·15 Apr

Congratulations! Well deserved!

ICLR@iclr_conf

Test of Time Winner Adam: A Method for Stochastic Optimization Diederik P. Kingma, Jimmy Ba Adam revolutionized neural network training, enabling significantly faster convergence and more stable training across a wide variety of architectures and tasks.


Fu-Ming Guo@FumingGuo·15 Apr

Congratulations! Well deserved!

🇺🇦 Dzmitry Bahdanau@DBahdanau

@iclr_conf many many many thanks to @kchonyc and @Yoshua_Bengio for enabling the wildest ever start of my research career 2014 was a very special time to do deep learning, a commit that changes 50 lines of code could give you a ToT award 10 years later 😲


Fu-Ming Guo@FumingGuo·19 Oct

@srush_nlp Same here. Email is also slow. Download the Kindle app on iPhone, open the PDF version of the arXiv link on the iPhone, and share it to the Kindle app through the Safari share button; it appears on the Kindle Scribe almost immediately. The only thing to improve is that a larger Scribe would be better.


Sasha Rush@srush_nlp·19 Oct

rand q: I've been using a Kindle Scribe to read papers and it's pretty good. Except that the arxiv -> kindle process is some weak. Does anyone have a good system?


Fu-Ming Guo@FumingGuo·17 Sep

@Wenxuan_Zhou I think your suggestion is a better approach based on the conclusions in recent amazing research from @ZeyuanAllenZhu Physics of Language Models


Wenxuan Zhou@Wenxuan_Zhou·15 Sep

What's the reason for not distilling test-time compute into the model itself so that it can skip the thoughts/comparison during test-time? Is there any necessity for "thinking out loud" or is it just a transitional approach?

Noam Brown@polynoamial

@OpenAI o1 is trained with RL to “think” before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We’re no longer bottlenecked by pretraining. We can now scale inference compute too.


Fu-Ming Guo reposted

James Zou@james_y_zou·14 Jun

🏆Exciting that our Mixture of Agents (MoA) tops the AlpacaEval leaderboard! We introduce the MoA architecture: layers of diverse LLM agents fuse + improve prev LLMs' outputs. MoA of only open-source LLMs outperforms GPT-4o by 7% on AlpacaEval2 while more cost-efficient🚀1/5


Fu-Ming Guo reposted

Chelsea Finn@chelseabfinn·14 Jun

How can we train full-size humanoid robots? New paper introducing: - learned controller for shadowing humans - imitation learning of demos collected via shadowing Website with code & videos: humanoid-ai.github.io

Zipeng Fu@zipengfu

Introduce HumanPlus - Autonomous Skills part Humanoids are born for using human data. Imitating humans, our humanoid learns: - fold sweatshirts - unload objects from warehouse racks - diverse locomotion skills (squatting, jumping, standing) - greet another robot Open-sourced!


Fu-Ming Guo reposted

OpenAI@OpenAI·6 Jun

We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse autoencoders at scale, disentangling GPT-4’s internal representations into 16 million features—which often appear to correspond to understandable concepts. openai.com/index/extracti…


Fu-Ming Guo reposted

Furong Huang@furongh·4 Jun

9/ 🙏 Shoutout to our team and collaborators who made this research possible! 💐 @SOURADIPCHAKR18 @ghosal_suvra @MingYin_0312 @dmanocha @MengdiWang10 @amritsinghbedi3 @furongh #TeamWork 🔗 to paper: arxiv.org/abs/2405.20495


Fu-Ming Guo reposted

Hui Guan@guanh01·7 May

MLSys is around the corner. Catch @Azaliamirh talk at the #MLSys2024 Young Professionals Symposium! May 13th in Santa Clara, California. Full agenda here: mlsys.org/virtual/2024/c…


Fu-Ming Guo reposted

Tianqi Chen@tqchenml·6 May

Catch @KurtKeutzer keynote talk at the #MLSys2024 Young Professionals Symposium! May 13th in Santa Clara, California. Full agenda here: mlsys.org/virtual/2024/c…


Fu-Ming Guo@FumingGuo·2 May

@agihippo FlashAttention?


yi@agihippo·1 May

What are some of the AI papers from academia in the LLM era that are set to get a few thousand citations? The only ones I can think of are maybe DPO or benchmark papers. Anything else?


Fu-Ming Guo@FumingGuo·2 May

@agihippo Interesting guessing game 😄 learning from your reply ~


Fu-Ming Guo reposted

Yi Tay@YiTayML·15 Apr

It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a personal goal for many of us in the team. Doing it from scratch, on top of starting a company, makes it even more challenging but rewarding. 😁 Core is still improving (not done training!) but we’re happy to ship an early version 🚢. I’ve been vibe-checking it for a bit and it’s a really cool model (especially at multimodal) 😎. Check out the blogpost, technical report and very non-cherry picked, “in the wild” showcase/demo in the thread below! Core is competitive with true frontier models. It beats Claude3 Opus on multimodal chat and matches GPT4-V on MMMU. Text metrics are competitive too (~83+ MMLU). In my mind, this is our arrival at the frontier. 😎👌🔥 More fun stuff to come in the following weeks! 😋

Reka@RekaAILabs

Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body trailer.👇


Fu-Ming Guo@FumingGuo·3 Apr

@lupantech @UCLAComSci @kaiwei_chang @uclanlp Congratulations!

English

Pan Lu@lupantech·3 Apr

I am thrilled to defend my PhD and finally earn the title of Doctor🧑‍🎓. It's been a truly rewarding journey at @UCLAComSci. I'm so fortunate and grateful for the invaluable mentorship from Prof. @kaiwei_chang @uclanlp. He has always been incredibly encouraging, helpful, and supportive! I am deeply grateful to Prof. Song-Chun Zhu and other committee members, @guyvdb, @VioletNPeng, and Ying Nian Wu, for their excellent supervision.

Kai-Wei Chang@kaiwei_chang

Congrats 🎉 to the newly titled Dr. Lu @lupantech on defending his thesis about mathematical reasoning with language models! 🧮 Pan has published a series of works on quantifying and improving math and scientific reasoning ability in LLMs. Some highlights:


Fu-Ming Guo reposted

Sasha Rush@srush_nlp·1 Apr

Graham Neubig - Can we make building with open-source AI as simple as prompting ChatGPT? (@gneubig ) youtube.com/watch?v=BiklOj…


Fu-Ming Guo reposted

The Nobel Prize@NobelPrize·8 Mar

Happy International Women's Day! We're celebrating women who have changed the world. Here's all of the amazing women who have received the #NobelPrize and their remarkable achievements at the time of the award. Who are the women who inspire you the most? #IWD2024


Fu-Ming Guo@FumingGuo·31 Dec

Thanks so much for sharing! Wonderful new year holiday reading!

John Schulman@johnschulman2

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works so well. There are a couple recent papers studying how NNs solve algorithmic tasks, which seem like exciting progress in this direction.
- arxiv.org/abs/2309.02390 - develops a theory around when NN training learns a "memorizing" vs "generalizing" solution, which depends on each solution's "efficiency" -- how much param norm is needed to get correct & confident outputs. This theory predicts grokking phenomena.
- arxiv.org/abs/2310.16028 - transformers can't represent Turing machines, but they can represent a smaller class of computations, described by RASP programs. This paper finds that indeed, if data is generated by a RASP-L program, the transformer will learn exactly the right function.
