Haoxuan You (@XyouH) - Twitter Profile

Haoxuan You retweeted

Jiahui Yu @jhyuxm · Apr 8

Happy to share Muse Spark, a natively multimodal reasoning model w/ tool-use, visual chain of thought, and multi-agent orchestration! It’s been a fulfilling journey not just building the model, but the team and culture behind it. Now live in product. ai.meta.com/blog/introduci…

Haoxuan You retweeted

Aran Komatsuzaki @arankomatsuzaki · Sep 22

Apple presents Manzano: Simple & scalable unified multimodal LLM
• Hybrid vision tokenizer (continuous ↔ discrete) cuts task conflict
• SOTA on text-rich benchmarks, competitive in gen vs GPT-4o/Nano Banana
• One model for both understanding & generation
• Joint recipe: pretrain + refine + SFT
• Scales cleanly (300M → 30B) with consistent gains

Haoxuan You @XyouH · Oct 25

@cihangxie Cihang, our team (Foundation Models at Apple) is hiring 2025 interns. Please see my post if they are interested!

Cihang Xie @cihangxie · Oct 25

Two of my 4th-year PhD students, Xianhang Li (xhl-video.github.io/xianhangli/) and Zeyu Wang (zw615.github.io), are seeking (possibly their last) internship opportunities. They are talented, hardworking, and accomplished researchers who have been the leading contributors to several important works from our lab over the past few years, such as CLIPA, Recap-DataComp, and AdvXL. Their interests include multimodal LLMs, image/video generation, and AI safety. If you have related positions, please consider them 😉 You'll be amazed!

Haoxuan You @XyouH · Oct 15

@JoelEsler @zhegan4 ooops should be phd students/candidates lol

Joel Esler @JoelEsler · Oct 15

@XyouH @zhegan4 “PhD” + “intern”?

Haoxuan You @XyouH · Oct 15

Looking for a 2025 summer research intern on the Foundation Model Team at Apple AI/ML, with a focus on Multimodal LLM / Vision-Language. PhD preferred. Apply through jobs.apple.com/en-us/details/… Also email your resume to haoxuanyou@gmail.com! 😊

Haoxuan You retweeted

Zhe Gan @zhegan4 · Oct 1

🚀🚀 Thrilled to share MM1.5! MM1.5 is a significant upgrade of MM1. With one single set of weights, MM1.5 excels at (1) reading your charts, tables, and any text-rich images, (2) understanding visual prompts like points and boxes and providing grounded outputs, and (3) multi-image reasoning. 🔥🔥 We also introduce two variants: (1) MM1.5-UI to understand your iPhone screen 📱, and (2) MM1.5-Video for video inputs 🎥. As a research study, we also share the detailed ablations that guided our research process (🧵)

Haoxuan You @XyouH · Jul 30

So proud of the team! Looking forward to contributing to the next step!

Ruoming Pang @ruomingpang

As Apple Intelligence is rolling out to our beta users today, we are proud to present a technical report on our Foundation Language Models that power these features on devices and cloud: machinelearning.apple.com/research/apple…. 🧵

Haoxuan You retweeted

Ruoming Pang @ruomingpang · Jul 29

As Apple Intelligence is rolling out to our beta users today, we are proud to present a technical report on our Foundation Language Models that power these features on devices and cloud: machinelearning.apple.com/research/apple…. 🧵

Haoxuan You @XyouH · Jul 19

@jhyuxm 🎉👏👏

Jiahui Yu @jhyuxm · Jul 18

priced at 15 cents per million input tokens and 60 cents per million output tokens, with multimodal support openai.com/index/gpt-4o-m…

Haoxuan You @XyouH · Oct 14

@LiLiunian @GoogleAI Big Congrats! 🥳

Harold @LiLiunian · Oct 14

Thank you, I am honored to receive the fellowship! Many thanks to Kai-Wei and my amazing collaborators and lab mates! Thanks to @GoogleAI for the generous support!

Kai-Wei Chang @kaiwei_chang

Congrats to @LiLiunian for winning the Google PhD Fellowship! 🎉🥳🎊 Harold led pioneering efforts in vision-language research, including developing notable models such as VisualBERT, GLIP, and the recently introduced DesCo. He will be on the market this year! @uclanlp @UCLAengineering

Haoxuan You @XyouH · Oct 13

Excited to present my summer internship work 🍎! Ferret is a new multimodal LLM that can accurately understand any region in an image no matter how you refer to it, and precisely localize open-vocabulary descriptions in its output! It often beats GPT-4V on the above tasks!

Zhe Gan @zhegan4

🚀🚀Introducing Ferret, a new MLLM that can refer and ground anything anywhere at any granularity. 📰arxiv.org/abs/2310.07704 1⃣ Ferret enables referring to an image region of any shape 2⃣ It often shows a more precise understanding of small image regions than GPT-4V (sec 5.6)

Haoxuan You retweeted

Zhe Gan @zhegan4 · Oct 12

🚀🚀Introducing Ferret, a new MLLM that can refer and ground anything anywhere at any granularity. 📰arxiv.org/abs/2310.07704 1⃣ Ferret enables referring to an image region of any shape 2⃣ It often shows a more precise understanding of small image regions than GPT-4V (sec 5.6)
