David Foster

419 posts

@davidADSP

Author of Generative Deep Learning: Teaching Machines how to Paint, Write, Compose and Play (O'Reilly), #generativeAI, Founding Partner of ADSP.

London, UK · Joined July 2019

577 Following · 777 Followers

David Foster@davidADSP·6 Apr

@mattshumer_ Yeah looks awesome - any idea how they calculated the $0.19-$0.49 PPM tokens? They say it's based on $2/hour H100 cost and serve rate of 0.03 ms / token I think?
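A quick back-of-the-envelope check of the figures in the question, as a minimal sketch. It assumes a GPU billed at $2/hour that emits one token every 0.03 ms with no batching effects or idle time; both inputs come from the tweet, which itself hedges with "I think". The function name and the `num_gpus` parameter are made up for illustration, and this is not the method behind the published $0.19-$0.49 range.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, ms_per_token: float, num_gpus: int = 1) -> float:
    """Dollar cost to serve one million tokens at a fixed serve rate."""
    tokens_per_hour = 3_600_000 / ms_per_token  # there are 3.6e6 ms in an hour
    return num_gpus * gpu_cost_per_hour / tokens_per_hour * 1_000_000


single = cost_per_million_tokens(2.0, 0.03)
print(f"${single:.4f} per million tokens")  # $0.0167 per million tokens
```

Under these assumptions a single GPU comes out well below the quoted range, so the published figure presumably includes factors not modelled here. For example, `num_gpus=8` gives about $0.13.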

Matt Shumer@mattshumer_·5 Apr

Llama 4's price/perf looks absolutely incredible. And a 10M token context window? Insane. Assuming the vibes check out, we'll be switching over many of our systems to Maverick.

David Foster@davidADSP·21 Dec

@Thom_Wolf It's a reference to the fact that an ensemble of all submissions would have scored 81% on the private test set (i.e. 19% of tasks were not solved by any submission) x.com/fchollet/statu…

François Chollet@fchollet

Does this mean the ARC-AGI benchmark has saturated? Yes -- the v1 version of the benchmark is starting to saturate. There were already signs of this in the Kaggle competition this year -- an ensemble of all submissions would score 81%. The competition next year will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1, but features fewer tasks that can be easily brute-forced. Early indications are that ARC-AGI-v2 will represent a complete reset of the state-of-the-art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.

Thomas Wolf@Thom_Wolf·21 Dec

what was this thing btw? "Moreover, ARC-AGI-1 is now saturating – besides o3's new score, the fact is that a large ensemble of low-compute Kaggle solutions can now score 81% on the private eval" big ensemble of heuristics?

David Foster@davidADSP·20 Dec

@arcprize @OpenAI o3 solves ARC-AGI? Huuuuge news if that's it...

ARC Prize@arcprize·20 Dec

Watch the finale of "12 Days of @OpenAI" livestream for a big announcement, starting in 3 minutes... openai.com/12-days/

David Foster@davidADSP·18 Nov

@fchollet Out of interest @fchollet, what % of arc test set puzzles remain unsolved by any submitted solution? And what would the top 2 entries score if ensembled (I know this means they'd have 4 attempts). Just curious how much they overlap.
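The two quantities asked about here can be sketched as set operations. The data below is invented purely for illustration (the submission names, task ids, and who solved what are all hypothetical), and the sketch ignores the attempts-per-task limit that a real ensemble would have to respect.

```python
def unsolved_fraction(solved_by: dict[str, set[int]], num_tasks: int) -> float:
    """Fraction of tasks that no submission solved."""
    solved_by_any = set().union(*solved_by.values())
    return 1 - len(solved_by_any) / num_tasks


def ensemble_score(solved_by: dict[str, set[int]], names: list[str], num_tasks: int) -> float:
    """Oracle ensemble score: a task counts if any named submission solved it."""
    union = set().union(*(solved_by[name] for name in names))
    return len(union) / num_tasks


# Hypothetical results on a 10-task test set.
solved_by = {
    "first": {0, 1, 2, 3, 4, 5},
    "second": {0, 1, 2, 6, 7},
    "third": {2, 3, 8},
}
num_tasks = 10

unsolved = unsolved_fraction(solved_by, num_tasks)
top_two = ensemble_score(solved_by, ["first", "second"], num_tasks)
overlap = len(solved_by["first"] & solved_by["second"])  # tasks both top entries solved
print(unsolved, top_two, overlap)
```

A small overlap relative to each entry's own score would suggest the two approaches solve different kinds of puzzles.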

François Chollet@fchollet·14 Nov

Consulting my heart... Ok, looks like you haven't. But whenever you have a SotA (or close) solution built on top of the OpenAI API we're more than happy to verify it and add it to the public ARC Prize leaderboard. Anything using less than $10k worth of API calls is eligible.

Sam Altman@sama

@DavidSHolz @willdepue in your heart do you believe we’ve solved that one or no?

David Foster@davidADSP·16 Nov

@OfficialLoganK What's the rate limit?

Logan Kilpatrick@OfficialLoganK·16 Nov

Gemini-exp-1114 is now available via the Gemini API, happy building / testing! Will follow up Monday with more 🚢 ai.google.dev/gemini-api/docs

David Foster@davidADSP·9 Nov

@jsuarez @hirschibar Awesome write up! What about action masking - i.e. how do you handle cases where certain actions aren't possible (and the env returns you the mask at each timestep). Is this something PufferLib supports?
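For readers unfamiliar with the idea in the question, one common way to handle invalid actions is to give them zero probability, which is equivalent to setting their logits to negative infinity before the softmax. This is a generic sketch in plain Python, not a description of PufferLib's API; the function name and example values are assumptions.

```python
import math


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax over the allowed actions; masked-out actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical timestep: the environment says action 1 is not available.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
print(probs)
```

Sampling from `probs` can then never pick a masked action, and the same mask has to be applied when the policy's log-probabilities are recomputed during training.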

Joseph Suarez 🐡@jsuarez·8 Nov

@hirschibar It's just a list of discrete actions. Instead of 1 linear layer to output action, you have n layers. And then you just sum the losses for each
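A minimal sketch of the idea described here: with n independent discrete heads, the log-probability of a joint action is the sum of the per-head log-probabilities, so the per-head losses simply add. Plain Python stands in for a real network; the logits, head sizes, and function names are made up, and a negative log-likelihood is used as a stand-in because the thread does not specify the actual training loss.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax for one head."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_z for l in logits]


def multi_discrete_nll(head_logits: list[list[float]], actions: list[int]) -> float:
    """Sum of per-head negative log-likelihoods for one joint action."""
    return sum(-log_softmax(logits)[a] for logits, a in zip(head_logits, actions))


# Two hypothetical heads with uniform logits: one with 3 choices, one with 2.
head_logits = [[0.0, 0.0, 0.0], [0.0, 0.0]]
loss = multi_discrete_nll(head_logits, actions=[1, 0])
print(loss)  # about 1.79, i.e. log(3) + log(2)
```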

Joseph Suarez 🐡@jsuarez·8 Nov

x.com/i/article/1851…

David Foster@davidADSP·15 Oct

@lmarena_ai @01AI_Yi Will the multimodal Llama 3.2 models be added to the overall leaderboard?

Arena.ai@arena·15 Oct

Big News from Chatbot Arena! @01AI_YI's latest model Yi-Lightning has been extensively tested in Arena, collecting over 13K community votes! Yi-Lightning has climbed to #6 in the Overall rankings (#9 in Style Control), matching top models like Grok-2. It delivers robust performance in technical areas like Math, Hard Prompts, and Coding. Huge congrats to @01AI_YI! Meanwhile, GLM-4-Plus by Zhipu AI (@ChatGLM) has also entered the top 10, marking a strong surge for Chinese LLMs. They're quickly becoming highly competitive. Stay tuned for more! More analysis below👇

Arena.ai@arena

Yi-Lightning is now in Chatbot Arena! The latest and most capable model from @01AI_Yi. Come chat and vote at lmarena.ai. The leaderboard will be updated soon.

David Foster@davidADSP·7 Oct

@lmarena_ai @lmsysorg Will Llama 3.2 (the multimodal models) and Gemini 1.5-002 be added to the main leaderboard?

Arena.ai@arena·6 Oct

As part of Chatbot Arena's graduation🎓, we're excited to announce that we changed our X handle to @lmarena_ai! For open-source systems & research at LMSys, please follow @lmsysorg. This account, @lmarena_ai, will be dedicated to sharing Arena projects & leaderboard updates. See you tomorrow for another one 👀

Arena.ai@arena

We are happy to announce a new site for Chatbot Arena! Over the past year, with the incredible support of our community, Chatbot Arena has evolved into a mature ecosystem and platform. We believe it's time for it to graduate and stand on its own. By giving Chatbot Arena its own platform, we aim to provide it with more independence and ensure its long-term growth. With a strong partnership with LMSys, we're expanding the platform to evaluate frontier models, not only for chatbots but also in areas like coding, complex tasks, and red-teaming. LMSys has been a research collective dedicated to a variety of projects, such as Vicuna, Chatbot Arena, SGLang, S-LoRA, RouteLLM, and more — beyond just one initiative. Moving forward, LMSys will continue to serve as an incubator for new projects and as a platform for open research and development. Come join us! Chatbot Arena: lmarena.ai New blog site: blog.lmarena.ai Blog: lmsys.org/blog/2024-09-2…

David Foster@davidADSP·20 May

@SullyOmarr Would you be willing to share the leaderboard from your evals?

Sully@SullyOmarr·20 May

underrated: gemini 1.5 flash
overrated: gpt-4o
We really need better ways to benchmark these models cause lmsys aint it
stuff like cost, speed, tool use, writing, etc. arent considered
Most ppl just use the top model based on leaderboards, but it's way more nuanced than that

David Foster@davidADSP·3 May

Spot the data viz fail 🤦‍♂️@BBC @BBCPolitics @BBCNews

David Foster@davidADSP·22 Feb

@giffmana @pastaraspberry What is it about Gemma that makes it open, but not open source? Thanks!

Lucas Beyer (bl16)@giffmana·22 Feb

@davidADSP @pastaraspberry Open=directly accessible, not behind API Open-source: opensource.org/osd

Lucas Beyer (bl16)@giffmana·21 Feb

You know what's my favourite part with our Gemma release? That we do not misuse the term "open source" like other labs have. It was explicit in the comms briefing that we should call them "open models" and not "open source models". Much respect to the team.

David Foster@davidADSP·22 Feb

@giffmana @pastaraspberry How are you defining open vs open source? Thanks!

Lucas Beyer (bl16)@giffmana·22 Feb

@pastaraspberry Yeah bloody hell I'm annoyed by them...

David Foster@davidADSP·16 Feb

@NPCollapse Funny story - William Peebles co-authored the Mar 2023 Diffusion Transformer paper on which Sora is based, whilst at Meta as an intern. But then joined OpenAI last year to co-lead Sora. So I guess they did know how to do it, but let him leave 😂

Connor Leahy@NPCollapse·16 Feb

lol, lmao

David Foster@davidADSP·26 Sep

@Thom_Wolf Theory of Everything: youtube.com/watch?v=q7i_DY…

Thomas Wolf@Thom_Wolf·25 Sep

almost 10 years in and I'm still listening to the soundtrack for Interstellar when I need to code some epic stuff. will it be ever topped

David Foster@davidADSP·14 Aug

@realGeorgeHotz Given the current breakthroughs, "linguistics" is a left-field candidate 🤔

David Foster@davidADSP·27 Jun

@nickfloats Does the --iw parameter affect remixes? In the docs it says it doesn't, but I'm never sure how much to trust the docs :)

Nick St. Pierre@nickfloats·22 Jun

Remixing with images can give you even more control in Midjourney
You maintain more of the details and can do really fun things like turn group photos into animal balloon parties.
A quick series of images, w/ a tutorial on how to do it at the end. It's actually super easy.

David Foster@davidADSP·7 Jun

@nickfloats Related question / challenge - how do you get Midjourney to output the usual meaning of 'fork in the road', rather than this? Changing the prompt to use different words isn't allowed 😃

Nick St. Pierre@nickfloats·4 Jun

Duck you and your stupid ducking AI

David Foster@davidADSP·6 May

@SullyOmarr Nice idea! Can you create a short example as a demo?

Sully@SullyOmarr·6 May

Someone should just use GPT4 to create an unbiased news agency. Feed it all the data and let it create news articles. Bonus point: you can let users chat with it as well, so they can ask questions. Now that I think of it, why hasn't anyone done this yet?

David Foster@davidADSP·20 Apr

@StabilityAI When your stochastic parrot drinks too much coffee. 🚀 Awesome work @StabilityAI !

Stability AI@StabilityAI·19 Apr

Announcing StableLM❗ We’re releasing the first of our large language models, starting with 3B and 7B param models, with 15-65B to follow. Our LLMs are released under CC BY-SA license. We’re also releasing RLHF-tuned models for research use. Read more→ stability.ai/blog/stability…