AaltoMediaAI

1.4K posts

@aaltomediaai

@AaltoUniversity’s AI&ML for media, art & design course. This is a public backlog of material for updating the course.

Joined December 2019
168 Following · 250 Followers
AaltoMediaAI retweeted
Nam Hee Gordon Kim @NamHeeGordonKim
Can we build AI players that are not just great at the game, but play *like people*? New #Eurographics 2026 paper: Robo-Saber: Generating and Simulating Virtual Reality Players (Links in the reply!)
AaltoMediaAI retweeted
Paul Couvert @itsPaulAi
Microsoft has revolutionized the automation game. You can automate any task just by recording your screen and explaining it to the AI. Copilot will then analyze the mouse movements, audio... and build the automation flow all by itself! (Way easier than n8n or Make.)
00:00 - Requirements
01:02 - Record with Copilot
01:54 - Recording Demo
05:03 - Flow adjustments
06:01 - Automation test
07:30 - Results
AaltoMediaAI retweeted
Rohan Paul @rohanpaul_ai
It's a hefty 206-page research paper, and the findings are concerning. "LLM users consistently underperformed at neural, linguistic, and behavioral levels." This study finds LLM dependence weakens the writer's own neural and linguistic fingerprints. 🤔 Using EEG, text mining, and a cross-over session, the authors show that keeping some AI-free practice time protects memory circuits and encourages richer language even when a tool is later reintroduced.
AaltoMediaAI retweeted
Sundar Pichai @sundarpichai
At #GoogleIO, we shared how decades of AI research have now become reality. From a total reimagining of Search to Agent Mode, Veo 3 and more, Gemini season will be the most exciting era of AI yet. Some highlights 🧵
AaltoMediaAI @aaltomediaai
New open (Apache 2.0-licensed) music generation model. Model weights and LoRA finetuning code available. Based on Reddit comments, this is on par with Suno 3.5, although not the very latest Suno. ace-step.github.io
AaltoMediaAI retweeted
Andrej Karpathy @karpathy
Noticing myself adopting a certain rhythm in AI-assisted coding (i.e. code I actually and professionally care about, in contrast to vibe code).
1. Stuff everything relevant into context (this can take a while in big projects; if the project is small enough, just stuff everything, e.g. `files-to-prompt . -e ts -e tsx -e css -e md --cxml --ignore node_modules -o prompt.xml`).
2. Describe the next single, concrete incremental change we're trying to implement. Don't ask for code; ask for a few high-level approaches, with pros/cons. There are almost always a few ways to do a thing, and the LLM's judgement is not always great. Optionally make it concrete.
3. Pick one approach, ask for first-draft code.
4. Review / learning phase: (manually...) pull up all the API docs in a side browser for functions I haven't called before or am less familiar with; ask for explanations, clarifications, changes; wind back and try a different approach.
5. Test.
6. Git commit. Ask for suggestions on what we could implement next. Repeat.
Something like this feels more like the inner loop of AI-assisted development. The emphasis is on keeping a very tight leash on this new over-eager junior intern savant with encyclopedic knowledge of software, but who also bullshits you all the time, has an over-abundance of courage, and shows little to no taste for good code. And emphasis on being slow, defensive, careful, paranoid, and on always taking the inline learning opportunity, not delegating. Many of these stages are clunky and manual and aren't made explicit or well supported yet in existing tools. We're still very early, and so much can still be done on the UI/UX of AI-assisted coding.
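The context-stuffing step in this workflow uses the `files-to-prompt` CLI. As a rough illustration of what that step produces, here is a hypothetical Python stand-in; the function name and the XML-ish document format are my own approximation, not the tool's exact output:

```python
from pathlib import Path

def stuff_context(root: str, exts=(".ts", ".tsx", ".css", ".md"),
                  ignore=("node_modules",)) -> str:
    """Concatenate every matching file under `root` into a single
    XML-ish prompt document, roughly what a --cxml dump looks like."""
    parts = ["<documents>"]
    for path in sorted(Path(root).rglob("*")):
        # Skip non-matching extensions and ignored directories.
        if path.suffix not in exts or any(p in ignore for p in path.parts):
            continue
        parts.append(f'<document path="{path.relative_to(root)}">')
        parts.append(path.read_text(errors="replace"))
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)
```

The resulting string is what gets pasted (or piped) into the model as the whole-project context in step 1.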
AaltoMediaAI @aaltomediaai
Cascadeur animation editor adds AI-driven inbetweening. This looks promising, as with their physics-based editing features, it's easy to finetune the result, e.g., adjust how heavy or light the movement feels. Link in the comment below.
AaltoMediaAI retweeted
AK @_akhaliq
EasyControl just dropped on Hugging Face: "Adding Efficient and Flexible Control for Diffusion Transformer". Free ChatGPT-style Ghibli image generation with easy control.
AaltoMediaAI @aaltomediaai
DeepMind's DreamerV3 published in Nature: the first system to mine diamonds in Minecraft without human demonstrations. I've seen plenty of buzz about it before, but now it's become mandatory reading, I guess. nature.com/articles/s4158…
AaltoMediaAI retweeted
Anthropic @AnthropicAI
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
AaltoMediaAI @aaltomediaai
A comprehensive repo of RL algorithms (plus MCTS) as notebooks with both theory and code. Great for learning as every algorithm is a single notebook with no architectural obfuscation. github.com/FareedKhan-dev…
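For flavor, the simplest entry in any such RL collection is tabular Q-learning. A self-contained toy version on a chain MDP (environment and names are my own illustration, not taken from the repo):

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=200,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: action 1 moves right, action 0
    moves left; reward 1 for reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: (Q[s][i], rng.random()))
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard Q-learning update with a max-bootstrap target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy (argmax over each row of Q) moves right from every non-terminal state.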
AaltoMediaAI @aaltomediaai
A single-line modification to any momentum-based optimizer such as AdamW, with nice empirical results backed by theory: arxiv.org/pdf/2411.16085
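The linked paper appears to be the "cautious optimizer" trick: drop the coordinates of the momentum update whose sign disagrees with the current gradient. Under that assumption, a rough NumPy sketch of one cautious AdamW-style step (function name, rescaling constant, and hyperparameters are mine; bias correction and weight decay omitted):

```python
import numpy as np

def cautious_adamw_step(param, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style step with the 'cautious' sign mask applied."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment
    update = m / (np.sqrt(v) + eps)
    # The single-line modification: zero coordinates where the momentum
    # update points against the current gradient, rescaling the rest
    # (the rescaling constant is my guess at the paper's normalization).
    mask = (update * grad > 0).astype(grad.dtype)
    mask *= mask.size / (mask.sum() + 1)
    return param - lr * update * mask, m, v
```

Masked coordinates simply don't move on that step, which is why a single sign-comparison line suffices on top of a stock optimizer implementation.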
AaltoMediaAI retweeted
Andrej Karpathy @karpathy
This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes: they're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different; it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream.
Most of the image/video generation AI tools actually work this way, using diffusion rather than autoregression. It's only text (and sometimes audio!) that has resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers autoregression but images/videos prefer diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise, and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well.
All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!
Inception @_inception_ai

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
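The autoregression-vs-diffusion contrast above can be caricatured in a few lines: instead of appending tokens left to right, start fully masked and commit a few positions per step. This toy sketch (vocabulary, names, and the RNG standing in for the model are all mine, not Mercury's actual sampler) only illustrates the coarse-to-fine schedule:

```python
import random

MASK = "_"
VOCAB = list("abcde")

def toy_denoise(length=8, steps=4, seed=0):
    """Diffusion-style text generation schedule: begin with an
    all-masked sequence and unmask a fraction of positions per step,
    rather than writing tokens strictly left to right."""
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # A real dLLM would score every masked position with a
        # transformer and commit the most confident ones; here an
        # RNG stands in for the model.
        for i in rng.sample(masked, min(per_step, len(masked))):
            seq[i] = rng.choice(VOCAB)
    return "".join(seq)
```

Each pass refines the whole sequence in parallel, which is where the speed claims for dLLMs come from.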

AaltoMediaAI retweeted
Pika @pika_labs
Today we’re launching Pikaswaps: replace anything in your videos using photos you upload, or scenes you describe. The results are unbelievably believable, and the possibilities are as unlimited as your imagination. Try it at Pika dot art
AaltoMediaAI retweeted
Freddy Chávez Olmos @FreddyChavezO
Testing Pika’s new Modify Region tool “Pikaswaps”, which allows you to specify what you want to change in video footage and what you want to replace it with, using prompts, a paint brush and image references. This tool clearly shows how rapidly this tech is advancing. I’m grateful to be collaborating with Pika’s research scientist team as an early tester, helping to refine the tool and explore new use cases. There are sure to be plenty of exciting video-to-video releases this year, and this is definitely something that will keep improving. Stock footage by Action VFX.
AaltoMediaAI retweeted
Simo Ryu @cloneofsimo
This is really insane. They went all in and scaled up a discrete diffusion model to llama-7B scale. IIRC nobody dared to do this at this scale, but these madlads did it. They even fine-tuned it to be a dialogue model. This is really frontier-level shit that is genuinely new and novel, that Americans should be worried about, but I bet my ass the media won't talk about it because it's not click-fomo-baity material. Btw this also fixes the reversal curse, and probably unlocks a lot more capabilities out of the box, like typical diffusion models: prefix-suffix conditioning, UL2 objectives (obviously), sigma-gpt-like sampling, differentiable guidance-based sampling, etc.