Fu-Ming Guo

169 posts

@FumingGuo

Chief Architect AI/ML @ Visa. Opinions are my own.

Massachusetts, USA Joined February 2018
1.5K Following 269 Followers
Fu-Ming Guo@FumingGuo·
@pmddomingos Exactly! The human brain and the whole of human civilization as recorded in language.
English
0
0
0
10
Pedro Domingos@pmddomingos·
AI is a distillation attack on the human brain.
English
173
216
2.6K
298.4K
Hugo Larochelle@hugo_larochelle·
Happy to announce today my new role as Scientific Director at @Mila_Quebec! It is a great honor to have this opportunity to serve this community of AI leaders and innovators, which I have always cherished and from which I have benefited myself. mila.quebec/fr/nouvelle/hu…
Français
11
6
144
16.3K
Fu-Ming Guo retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
❓Can I collect some feedback: Is fully open-source research necessary? Earlier, I released a family of 1-8B models (open data, code, weights): beating Llama3-8B with <10% pretrain time, beating most (all?) open-data models of this scale. 🔓No shortcuts: 10+ legal debates for using open data, 10+ more for weights, months of blockers, endless nights scavenging GPUs. 📈All to provide a strong, reproducible baseline, one I believed critical for understanding the physics of LLMs. Next planned was GLA + Canon, outperforming all modern linear models in tests. Yet attention was low. ❓Should I close-source to save time and focus on pure research? Honest feedback appreciated.
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu

Phase 1 of Physics of Language Models code release ✅our Part 3.1 + 4.1 = all you need to pretrain strong 8B base model in 42k GPU-hours ✅Canon layers = strong, scalable gains ✅Real open-source (data/train/weights) ✅Apache 2.0 license (commercial ok!) 🔗github.com/facebookresear…

English
59
38
448
115K
Fu-Ming Guo@FumingGuo·
@srush_nlp Same here. Email is also slow. Install the Kindle app on an iPhone, open the PDF version of the arxiv link in Safari, and share it to the Kindle app through Safari's share button; it appears on the Kindle Scribe almost immediately. The only thing I would improve is the Scribe's size: a larger screen would be better.
English
0
0
1
337
Sasha Rush@srush_nlp·
rand q: I've been using a Kindle Scribe to read papers and it's pretty good. Except that the arxiv -> kindle process is somewhat weak. Does anyone have a good system?
English
14
1
23
9.5K
Fu-Ming Guo@FumingGuo·
@Wenxuan_Zhou I think your suggestion is the better approach, based on the conclusions of @ZeyuanAllenZhu's amazing recent research, Physics of Language Models.
English
0
0
0
78
Wenxuan Zhou@Wenxuan_Zhou·
What's the reason for not distilling test-time compute into the model itself so that it can skip the thoughts/comparison during test-time? Is there any necessity for "thinking out loud" or is it just a transitional approach?
Noam Brown@polynoamial

@OpenAI o1 is trained with RL to “think” before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We’re no longer bottlenecked by pretraining. We can now scale inference compute too.

English
96
32
649
245.4K
Fu-Ming Guo retweeted
James Zou@james_y_zou·
🏆Exciting that our Mixture of Agents (MoA) tops the AlpacaEval leaderboard! We introduce the MoA architecture: layers of diverse LLM agents fuse + improve prev LLMs' outputs. MoA of only open-source LLMs outperforms GPT-4o by 7% on AlpacaEval2 while being more cost-efficient🚀1/5
English
5
53
308
51K
Fu-Ming Guo retweeted
Chelsea Finn@chelseabfinn·
How can we train full-size humanoid robots? New paper introducing: - learned controller for shadowing humans - imitation learning of demos collected via shadowing Website with code & videos: humanoid-ai.github.io
Zipeng Fu@zipengfu

Introduce HumanPlus - Autonomous Skills part Humanoids are born for using human data. Imitating humans, our humanoid learns: - fold sweatshirts - unload objects from warehouse racks - diverse locomotion skills (squatting, jumping, standing) - greet another robot Open-sourced!

English
4
30
171
32.9K
Fu-Ming Guo retweeted
OpenAI@OpenAI·
We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse autoencoders at scale, disentangling GPT-4’s internal representations into 16 million features—which often appear to correspond to understandable concepts. openai.com/index/extracti…
English
331
800
4.8K
1.5M
yi@agihippo·
What are some of the AI papers from academia in the LLM era that are set to get a few thousand citations? The only ones I can think of are maybe DPO or benchmark papers. Anything else?
English
19
2
50
23.2K
Fu-Ming Guo@FumingGuo·
@agihippo Interesting guessing game 😄 learning from your replies ~
English
0
0
0
76
Fu-Ming Guo retweeted
Yi Tay@YiTayML·
It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a personal goal for many of us in the team. Doing it from scratch, on top of starting a company, makes it even more challenging but rewarding. 😁 Core is still improving (not done training!) but we’re happy to ship an early version 🚢. I’ve been vibe-checking it for a bit and it’s a really cool model (especially at multimodal) 😎. Check out the blogpost, technical report and very non-cherry picked, “in the wild” showcase/demo in the thread below! Core is competitive with true frontier models. It beats Claude3 Opus on multimodal chat and matches GPT4-V on MMMU. Text metrics are competitive too (~83+ MMLU). In my mind, this is our arrival at the frontier. 😎👌🔥 More fun stuff to come in the following weeks! 😋
Reka@RekaAILabs

Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body trailer.👇

English
63
85
927
216.5K
Pan Lu@lupantech·
I am thrilled to defend my PhD and finally earn the title of Doctor🧑‍🎓. It's been a truly rewarding journey at @UCLAComSci. I'm so fortunate and grateful for the invaluable mentorship from Prof. @kaiwei_chang @uclanlp. He has always been incredibly encouraging, helpful, and supportive! I am deeply grateful to Prof. Song-Chun Zhu and other committee members, @guyvdb, @VioletNPeng, and Ying Nian Wu, for their excellent supervision.
Kai-Wei Chang@kaiwei_chang

Congrats 🎉 to the newly titled Dr. Lu @lupantech on defending his thesis about mathematical reasoning with language models! 🧮 Pan has published a series of works on quantifying and improving math and scientific reasoning ability in LLMs. Some highlights:

English
42
2
225
23.9K
Fu-Ming Guo retweeted
The Nobel Prize@NobelPrize·
Happy International Women's Day! We're celebrating women who have changed the world. Here are all of the amazing women who have received the #NobelPrize and their remarkable achievements at the time of the award. Who are the women who inspire you the most? #IWD2024
English
147
4K
7.9K
596.9K
Fu-Ming Guo@FumingGuo·
Thanks so much for sharing! Wonderful new year holiday reading!
John Schulman@johnschulman2

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works so well. There are a couple of recent papers studying how NNs solve algorithmic tasks, which seem like exciting progress in this direction.
- arxiv.org/abs/2309.02390 - develops a theory around when NN training learns a "memorizing" vs "generalizing" solution, which depends on each solution's "efficiency" -- how much param norm is needed to get correct & confident outputs. This theory predicts grokking phenomena.
- arxiv.org/abs/2310.16028 - transformers can't represent Turing machines, but they can represent a smaller class of computations, described by RASP programs. This paper finds that indeed, if data is generated by a RASP-L program, the transformer will learn exactly the right function.

English
0
0
1
258