Peter Schafhalter
@pschafhalter
118 posts
AI-Sys PhD student @ucbrise @ucberkeley focusing on systems for self-driving cars
Joined May 2019
248 Following · 368 Followers
Peter Schafhalter @pschafhalter
I’m incredibly grateful for my collaborators at @GoogleDeepMind: Shun Liao, @zhouyanqi30, Chih-Kuan Yeh, Arun Kandoor, and James Laudon. Also, thank you to @ICGog and the Pathways Team for their support and feedback.
Peter Schafhalter @pschafhalter
We exploit parallelism in MoDE’s architecture to apply flexible sharding configurations that place expert weights on devices separate from the pre-trained model’s weights. Such configurations can reduce communication overheads and improve training speeds by up to 38%.
Peter Schafhalter @pschafhalter
Introducing Modular Domain Experts (MoDE), a new multi-domain adaptation technique for LLMs. MoDE independently trains experts on different domains and composes them to boost LLM performance on complex, multi-domain tasks. Paper: arxiv.org/abs/2410.10181
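The composition idea can be sketched in a few lines. This is a hypothetical toy, not the paper's actual architecture: each "expert" stands in for a module trained independently on one domain, and a fixed gate mixes their outputs.

```python
# Toy sketch of composing independently trained per-domain experts
# (hypothetical illustration; see the MoDE paper for the real design).

def make_expert(scale):
    # Stand-in for a module fine-tuned on a single domain.
    return lambda x: [scale * xi for xi in x]

# Two experts, each trained on its own domain in isolation.
experts = {"code": make_expert(2.0), "legal": make_expert(0.5)}

def compose(x, weights):
    # Weighted mix of expert outputs, as in a mixture-of-experts layer.
    out = [0.0] * len(x)
    for name, w in weights.items():
        y = experts[name](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

mixed = compose([1.0, 2.0], {"code": 0.75, "legal": 0.25})
```

Because each expert is trained in isolation, new domains can be added or dropped without retraining the others; only the gate weights decide how much each domain contributes.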
Vikram Sreekanti @vsreekanti
Feels like the discussion around long context has disappeared recently — are there any big applications of it that are popular right now?
Sam Kumar @samkumar_cs
It’s an honor to have been recognized as a Runner-Up for the @TheOfficialACM SIGSAC Doctoral Dissertation Award at CCS 2024 (@acm_ccs)! I’d like to thank my nominator and letter writers for this award, and my advisors, mentors, and collaborators during my PhD.
Sam Kumar @samkumar_cs
I've wrapped up my postdoc and just moved down to Los Angeles. I'm super excited to start at UCLA CS (@CS_UCLA)! Today is my first day working in person at the @UCLA campus.
Peter Schafhalter @pschafhalter
@profjoeyg Usage-based pricing makes sense for AI infra (fine-tuning, generating tokens). AI infra is like AWS: it’s used to build applications (ChatGPT, Copilot). Fundamentally, infra and apps target different use cases and could benefit from different pricing models.
Peter Schafhalter @pschafhalter
@profjoeyg Interesting post. I think monthly fees for AI products make a lot of sense because, as you mentioned, prices are predictable. Usage-based pricing increases complexity and could harm adoption despite being more cost-effective.
Joey Gonzalez @profjoeyg
What is the right pricing model for AI? Should it be a monthly fee or a flat rate per token? Do you pay extra for more knowledge? Three years ago, I was focused on serverless computing for AI and how to allocate inference engines. At the time, consumption-based pricing was the future. Maybe it still is? @vsreekanti and I have been thinking about pricing models and we just published our latest thoughts: frontierai.substack.com/p/the-future-o…
Tianjun Zhang @tianjun_zhang
It has really been a rewarding journey since I joined the #LLaMA3 team @AIatMeta a little more than 2 months ago, and today we are releasing one of the world's best models! 🔥 With the new license, we allow synthetic data generation from Llama to enhance your own model! Check out our research paper on how we built this: ai.meta.com/research/publi…. Excited to see what we can build on top! 🫡
Peter Schafhalter @pschafhalter
@robertnishihara @mrry Hi Robert, considering all the lessons learned building Ray, are there any changes you would have made back when you first started the project? Personally, I always wondered whether the flexibility of dynamic task graphs would eventually lead to performance bottlenecks.
Robert Nishihara @robertnishihara
Ray originally started with just the "task" API for executing Python functions asynchronously (with some resemblance to systems like Dask, Celery, PySpark, etc.). Actually, the system most closely resembling Ray's task API is CIEL (built by @mrry). usenix.org/legacy/events/…

That said, a lot of AI is stateful, and the task API was just too limited, so pretty early on we ended up needing to add the actor API (essentially the ability to spin up a Python class as a little actor / microservice).

The actor API was what really opened the floodgates and enabled Ray to support training workloads, online serving workloads, reinforcement learning workloads, and so on. Even Ray Data is built on actors, despite data processing workloads being traditionally stateless.
ray @raydistributed

Ray operates at two levels: Ray Core, which scales Python functions and classes with tasks and actors, and its libraries, offering easy-to-use abstractions tailored for ML workloads. #Ray #ML #DistributedComputing
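The task-vs-actor distinction described above can be illustrated with only the standard library. This is a conceptual sketch, not Ray itself: a thread pool plays the role of the cluster, a plain function plays the role of a stateless task, and a class instance whose state survives across calls plays the role of an actor.

```python
# Illustrative sketch (not Ray's API): stateless "tasks" vs. a stateful
# "actor", using only the Python standard library.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

# "Task": a stateless function run asynchronously; every call is independent,
# so results depend only on the arguments.
def square(x):
    return x * x

futures = [pool.submit(square, i) for i in range(4)]
results = [f.result() for f in futures]  # [0, 1, 4, 9]

# "Actor": a class instance that owns state persisting across method calls --
# the pattern needed for training loops, serving replicas, RL workers, etc.
class Counter:
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total

counter = Counter()
pool.submit(counter.add, 5).result()
total = pool.submit(counter.add, 3).result()  # 8: state survived between calls
```

In Ray, both patterns get a `@ray.remote` decorator and run on a cluster rather than a local thread pool, but the stateless/stateful split is the same.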

Zhanghao Wu @Michaelvll1
I am honored to share that our recent paper won the Outstanding Paper Award at NSDI ’24! The paper explores the policy design of our SkyPilot managed spot for @skypilot_org: “Can’t Be Late: Optimizing Spot Instance Savings under Deadlines.” It would not have been possible without the fantastic folks and advisors @infwinston @ziming_mao @zongheng_yang, Eric Friedman, Scott Shenker, and Ion Stoica, and the whole SkyPilot team @skypilot_org.
Peter Schafhalter retweeted
Simon Guo @simonguozirui
Spent the last few weeks working on this blog! When I first read the Ring Attention paper, I kind of got the concept, but not really. Diving into the details, from the math to the compute, was incredibly rewarding for our understanding, and I hope it’s a fun read for you too!
Kilian Haefeli @khshind

How do state-of-the-art LLMs like Gemini 1.5 and Claude 3 scale to context windows beyond 1M tokens? Well, Ring Attention by @haoliuhl presents a way to split the attention computation across GPUs while hiding the communication overhead in a ring, enabling zero-overhead scaling
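The core trick can be sketched without any GPU: each device holds one shard of keys and values, the shards rotate around a ring, and an online-softmax accumulator lets each query finish its attention output without ever materializing the full score vector. A minimal pure-Python sketch with toy vectors and no actual communication:

```python
# Toy sketch of ring/streaming attention via online softmax
# (illustration only; the real system overlaps compute with ring transfers).
import math

def full_attention(q, keys, values):
    # Reference: softmax-weighted sum computed over all scores at once.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    ws = [math.exp(s - m) for s in scores]
    z = sum(ws)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(ws, values)) / z for j in range(dim)]

def ring_attention(q, key_shards, value_shards):
    # Each "device" holds one (K, V) shard; shards stream past the query
    # one ring step at a time. Running online-softmax statistics keep the
    # result exact without ever holding the full score vector.
    m = -math.inf          # running max score
    z = 0.0                # running softmax denominator
    acc = [0.0] * len(q)   # running weighted sum of values
    for keys, values in zip(key_shards, value_shards):  # one ring step each
        for k, v in zip(keys, values):
            s = sum(qi * ki for qi, ki in zip(q, k))
            new_m = max(m, s)
            scale = math.exp(m - new_m)  # rescale old stats to the new max
            w = math.exp(s - new_m)
            z = z * scale + w
            acc = [a * scale + w * vj for a, vj in zip(acc, v)]
            m = new_m
    return [a / z for a in acc]
```

Because the accumulators are rescaled whenever a larger score appears, the streamed result matches the all-at-once softmax, which is what lets context length scale with the number of devices in the ring.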
