AaltoMediaAI

1.4K posts

@aaltomediaai

@AaltoUniversity’s AI&ML for media, art & design course. This is a public backlog of material for updating the course.

Joined December 2019
168 Following · 250 Followers
AaltoMediaAI retweeted
Nam Hee Gordon Kim @NamHeeGordonKim
Can we build AI players that are not just great at the game, but play *like people*? New #Eurographics 2026 paper: Robo-Saber: Generating and Simulating Virtual Reality Players (Links in the reply!)
AaltoMediaAI retweeted
Paul Couvert @itsPaulAi
Microsoft has revolutionized the automation game. You can automate any task just by recording your screen and explaining it to the AI. Copilot will then analyze the mouse movements, audio... and build the automation flow all by itself! (Way easier than n8n or Make.)
00:00 - Requirements
01:02 - Record with Copilot
01:54 - Recording Demo
05:03 - Flow adjustments
06:01 - Automation test
07:30 - Results
AaltoMediaAI retweeted
Rohan Paul @rohanpaul_ai
It's a hefty 206-page research paper, and the findings are concerning. "LLM users consistently underperformed at neural, linguistic, and behavioral levels." This study finds LLM dependence weakens the writer's own neural and linguistic fingerprints. 🤔 Using EEG, text mining, and a cross-over session, the authors show that keeping some AI-free practice time protects memory circuits and encourages richer language even when a tool is later reintroduced.
AaltoMediaAI retweeted
Sundar Pichai @sundarpichai
At #GoogleIO, we shared how decades of AI research have now become reality. From a total reimagining of Search to Agent Mode, Veo 3 and more, Gemini season will be the most exciting era of AI yet. Some highlights 🧵
AaltoMediaAI @aaltomediaai
New open (Apache 2.0-licensed) music generation model. Model weights and LoRA finetuning code available. Based on Reddit comments, this is on par with Suno 3.5, although not the very latest Suno. ace-step.github.io
AaltoMediaAI retweeted
Andrej Karpathy @karpathy
Noticing myself adopting a certain rhythm in AI-assisted coding (i.e. code I actually and professionally care about, in contrast to vibe code).
1. Stuff everything relevant into context (this can take a while in big projects; if the project is small enough, just stuff everything, e.g. `files-to-prompt . -e ts -e tsx -e css -e md --cxml --ignore node_modules -o prompt.xml`).
2. Describe the next single, concrete incremental change we're trying to implement. Don't ask for code; ask for a few high-level approaches, with pros/cons. There are almost always a few ways to do a thing, and the LLM's judgement is not always great. Optionally make it concrete.
3. Pick one approach, ask for first-draft code.
4. Review / learning phase: (manually...) pull up all the API docs in a side browser for functions I haven't called before or am less familiar with; ask for explanations, clarifications, changes; wind back and try a different approach.
5. Test.
6. Git commit. Ask for suggestions on what we could implement next. Repeat.
Something like this feels more like the inner loop of AI-assisted development. The emphasis is on keeping a very tight leash on this new over-eager junior intern savant with encyclopedic knowledge of software, but who also bullshits you all the time, has an over-abundance of courage, and shows little to no taste for good code. And emphasis on being slow, defensive, careful, paranoid, and on always taking the inline learning opportunity, not delegating. Many of these stages are clunky and manual and aren't made explicit or well supported yet in existing tools. We're still very early, and so much can still be done on the UI/UX of AI-assisted coding.
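The context-stuffing step in this workflow uses the `files-to-prompt` CLI. As a rough illustration of what that step produces, here is a hypothetical Python stand-in; the function name and the XML-ish document format are my own approximation, not the tool's exact output:

```python
from pathlib import Path

def stuff_context(root: str, exts=(".ts", ".tsx", ".css", ".md"),
                  ignore=("node_modules",)) -> str:
    """Concatenate every matching file under `root` into a single
    XML-ish prompt document, roughly what a --cxml dump looks like."""
    parts = ["<documents>"]
    for path in sorted(Path(root).rglob("*")):
        # Skip non-matching extensions and ignored directories.
        if path.suffix not in exts or any(p in ignore for p in path.parts):
            continue
        parts.append(f'<document path="{path.relative_to(root)}">')
        parts.append(path.read_text(errors="replace"))
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)
```

The resulting string is what gets pasted (or piped) into the model as the whole-project context in step 1.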
AaltoMediaAI @aaltomediaai
Cascadeur animation editor adds AI-driven inbetweening. This looks promising, as with their physics-based editing features, it's easy to finetune the result, e.g., adjust how heavy or light the movement feels. Link in the comment below.
AaltoMediaAI retweeted
AK @_akhaliq
EasyControl just dropped on Hugging Face: "Adding Efficient and Flexible Control for Diffusion Transformer". Free ChatGPT-style Ghibli image generation with easy control.
AaltoMediaAI @aaltomediaai
DeepMind's DreamerV3 published in Nature: the first system to mine diamonds in Minecraft without human demonstrations. I've seen plenty of buzz about it before, but now it's become mandatory reading, I guess. nature.com/articles/s4158…
AaltoMediaAI retweeted
Anthropic @AnthropicAI
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
AaltoMediaAI @aaltomediaai
A comprehensive repo of RL algorithms (plus MCTS) as notebooks with both theory and code. Great for learning as every algorithm is a single notebook with no architectural obfuscation. github.com/FareedKhan-dev…
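For flavor, the simplest entry in any such RL collection is tabular Q-learning. A self-contained toy version on a chain MDP (environment and names are my own illustration, not taken from the repo):

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=200,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: action 1 moves right, action 0
    moves left; reward 1 for reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: (Q[s][i], rng.random()))
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard Q-learning update with a max-bootstrap target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy (argmax over each row of Q) moves right from every non-terminal state.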
AaltoMediaAI @aaltomediaai
A single-line modification to any momentum-based optimizer such as AdamW, with nice empirical results backed by theory: arxiv.org/pdf/2411.16085
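The linked paper appears to be the "cautious optimizer" trick: drop the coordinates of the momentum update whose sign disagrees with the current gradient. Under that assumption, a rough NumPy sketch of one cautious AdamW-style step (function name, rescaling constant, and hyperparameters are mine; bias correction and weight decay omitted):

```python
import numpy as np

def cautious_adamw_step(param, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style step with the 'cautious' sign mask applied."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment
    update = m / (np.sqrt(v) + eps)
    # The single-line modification: zero coordinates where the momentum
    # update points against the current gradient, rescaling the rest
    # (the rescaling constant is my guess at the paper's normalization).
    mask = (update * grad > 0).astype(grad.dtype)
    mask *= mask.size / (mask.sum() + 1)
    return param - lr * update * mask, m, v
```

Masked coordinates simply don't move on that step, which is why a single sign-comparison line suffices on top of a stock optimizer implementation.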
AaltoMediaAI retweeted
Andrej Karpathy @karpathy
This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes: they're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different; it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream.
Most of the image/video generation AI tools actually work this way, using diffusion rather than autoregression. It's only text (and sometimes audio!) that has resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers autoregression but images/videos prefer diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise, and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well.
All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!
Inception @_inception_ai

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
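The autoregression-vs-diffusion contrast above can be caricatured in a few lines: instead of appending tokens left to right, start fully masked and commit a few positions per step. This toy sketch (vocabulary, names, and the RNG standing in for the model are all mine, not Mercury's actual sampler) only illustrates the coarse-to-fine schedule:

```python
import random

MASK = "_"
VOCAB = list("abcde")

def toy_denoise(length=8, steps=4, seed=0):
    """Diffusion-style text generation schedule: begin with an
    all-masked sequence and unmask a fraction of positions per step,
    rather than writing tokens strictly left to right."""
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # A real dLLM would score every masked position with a
        # transformer and commit the most confident ones; here an
        # RNG stands in for the model.
        for i in rng.sample(masked, min(per_step, len(masked))):
            seq[i] = rng.choice(VOCAB)
    return "".join(seq)
```

Each pass refines the whole sequence in parallel, which is where the speed claims for dLLMs come from.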

AaltoMediaAI retweeted
Pika @pika_labs
Today we’re launching Pikaswaps: replace anything in your videos using photos you upload, or scenes you describe. The results are unbelievably believable, and the possibilities are as unlimited as your imagination. Try it at Pika dot art
AaltoMediaAI retweeted
Freddy Chávez Olmos @FreddyChavezO
Testing Pika’s new Modify Region tool “Pikaswaps”, which allows you to specify what you want to change in video footage and what you want to replace it with, using prompts, a paint brush and image references. This tool clearly shows how rapidly this tech is advancing. I’m grateful to be collaborating with Pika’s research scientist team as an early tester, helping to refine the tool and explore new use cases. There are sure to be plenty of exciting video-to-video releases this year, and this is definitely something that will keep improving. Stock footage by Action VFX.
AaltoMediaAI retweeted
Simo Ryu @cloneofsimo
This is really insane. They went all in and scaled up a discrete diffusion model to llama-7B scale. IIRC nobody dared to do this at this scale, but these madlads did it. They even fine-tuned it to be a dialogue model. This is really frontier-level shit that is genuinely new and novel, that Americans should be worried about, but I bet my ass the media won't talk about it because it's not click-fomo-baity material. Btw this also fixes the reversal curse, and probably unlocks a lot more capabilities out of the box, like typical diffusion models: prefix-suffix conditioning, UL2 objectives (obviously), sigma-gpt-like sampling, differentiable guidance-based sampling, etc.