Will Harvey

13 posts

@willarvey

machine learning phd student @ UBC 🏞️ - currently doing a research internship @ google deepmind

Joined March 2013

119 Following · 94 Followers

Will Harvey retweeted

Christian Weilbach@wh1lo·12 Dec

It is @NeurIPS time again! I am excited to present our trans-dimensional jump diffusion work with @AndrewC_ML @willarvey @ValentinDeBort1 @tom_rainforth and @ArnaudDoucet1 ! Come over on Thursday 2nd poster session, neurips.cc/virtual/2023/p…. arxiv.org/abs/2305.16261 #NeurIPS2023

Will Harvey@willarvey·28 May

This was a lot of fun to work on! And works well with test-time guidance: we can train on varying-length RoboDesk videos and then, at test-time, fix the first and last frames and automatically figure out how far apart they are - i.e. how long the robot needs to move between them!

Andrew Campbell@AndrewC_ML

How can we apply diffusion models to data with varying dimensionality? We use jump diffusions to simultaneously generate the size and state values for varying size data e.g. molecules arxiv.org/abs/2305.16261 w/ @willarvey @wh1lo @ValentinDeBort1 @tom_rainforth @ArnaudDoucet1

Will Harvey retweeted

Sander Dieleman@sedielem·2 Jun

This paper is a goldmine for anyone training diffusion models, carefully picking apart theory and practice and showing which choices really matter. I was quite excited to see the authors of the StyleGAN series of papers tackle this topic, and boy do they deliver!

AK@_akhaliq

Elucidating the Design Space of Diffusion-Based Generative Models abs: arxiv.org/abs/2206.00364 improve efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of an existing ImageNet-64 model from 2.07 to near-SOTA 1.55

Will Harvey@willarvey·29 May

@sirbayes @frankdonaldwood @sama @demishassabis @ylecun We know :) We cite Video Diffusion Models heavily in the paper (arxiv.org/abs/2205.11495) but focus on long-term coherence, jointly generating frames up to 1000 timesteps apart (instead of 64 like the Google work). Anyone at google looking into scaling that model to longer videos?

Kevin Patrick Murphy@sirbayes·29 May

@frankdonaldwood @sama @demishassabis @ylecun You asked leaders at OpenAI, Meta, Deepmind who will scale this kind of work - but Google already has :) See video-diffusion.github.io

Frank Wood@frankdonaldwood·27 May

I think, much more than large language models, this work might be the first glimpse of what the foundation model for vision-based planning for embodied real-world AGI might look like. @sama, @demishassabis, @ylecun who is going to scale this first? cs-plai-2019.sites.olt.ubc.ca/2022/05/20/fle…

Will Harvey@willarvey·29 May

@jekbradbury @frankdonaldwood Definitely sounds interesting, will be in touch!

James Bradbury@jekbradbury·28 May

@willarvey @frankdonaldwood we can probably get you a lot more compute through the TPU Research Cloud; lmk if that sounds interesting!

Will Harvey@willarvey·27 May

Thanks for the shout out @frankdonaldwood - the videos still have occasional glitches but are much better after scaling from training on 1 GPU to 4 GPUs. Simply scaling further might be the right direction to take

Frank Wood@frankdonaldwood

Will Harvey@willarvey·27 May

@tejasdkulkarni @frankdonaldwood @sama @demishassabis @ylecun Maybe we can improve object/landmark permanence by conditioning frames on e.g. the corresponding camera position similar to GQN. But I sense that pixel-level models with lots of compute are likely to win out over anything much more structured than that

Tejas Kulkarni@tejasdkulkarni·27 May

@frankdonaldwood @sama @demishassabis @ylecun yeah indeed. what is your projection on the role of structured representations in sensory domains after working on this? the nerf point is interesting - do you think these two directions get integrated or it will be "geometry free"?

Will Harvey@willarvey·27 May

@b11tz @frankdonaldwood @sama @demishassabis @ylecun in the order of 1 GPU-week - almost nothing compared to most of the recent video models I've seen

poy@agi_coming·27 May

@frankdonaldwood @sama @demishassabis @ylecun Speaking of scale, how much compute was used for training this model?

Will Harvey@willarvey·27 May

@frankdonaldwood @saeidnaderip @VadenMasrani @NandoDF @sirbayes @sama @ylecun Haha well at the very least let's see if we can get some vision-based planning working before my "wasted summer" begins 😅

Frank Wood@frankdonaldwood·27 May

@willarvey @saeidnaderip @VadenMasrani and Christian Weilbach deserve _all_ the props for this by the way. @willarvey in particular is wasting his summer in finance in London. Surely @NandoDF, @sirbayes, @sama, @ylecun there are better ways for @willarvey to spend his time.

Will Harvey@willarvey·25 May

@adam_golinski Thanks @adam_golinski !

Adam Golinski @adam_golinski·25 May

@willarvey, great work!

AK@_akhaliq

Flexible Diffusion Modeling of Long Videos abs: arxiv.org/abs/2205.11495 demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length

Will Harvey retweeted

AK@_akhaliq·24 May

Will Harvey@willarvey·27 Feb

Our results suggest a possible future application of such high-fidelity image completion tools: they could be used to select maximally informative sequences of small field of view x-ray scans.

Will Harvey@willarvey·27 Feb

Excited to announce our work (arxiv.org/abs/2102.12037…) with hierarchical variational autoencoders - we found that they're ideal for making into realistic image completion models (with @saeidnaderip and @frankdonaldwood)
