Mikkel Frimer-Rasmussen

839 posts

@FriMikAnik

Joined February 2025
190 Following, 27 Followers
Mikkel Frimer-Rasmussen
@pgasawa The benchmark should/could include changes in fact quality over time (information gets outdated) and sources that do not concur and require inference
English
0
0
0
14
Parth Asawa @pgasawa
Most evals ask the question: “how capable is a static trained model at this distribution of tasks?” We think benchmarks like ARC-AGI take steps in the right direction to force models to learn new patterns online. With Continual Learning Bench, we ask a different question: can we measure how well learning systems adapt and improve by leveraging what they learn in real world, stateful environments? (2/n)
Parth Asawa tweet media
English
3
4
37
3.4K
Parth Asawa @pgasawa
Today, we’re releasing Continual Learning Bench 1.0: the first, realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and find there’s still plenty of headroom for learning. (1/n)
Parth Asawa tweet media
English
21
90
660
145.1K
Ammaar Reshi @ammaar
Had to bring MS Office 2000's Einstein, he also yawns if you don't give him a task 😂 Codex Pets are fun :)
Ammaar Reshi tweet media
English
8
2
79
9.9K
👩‍💻 Paige Bailey @DynamicWebPaige
🙏 grateful to models for reminding me to be nice and to be patient
👩‍💻 Paige Bailey tweet media
English
5
0
18
1.6K
Kath Korevec @simpsoka
Pets are cool, but have you read the docs?
GIF
English
13
7
149
9.2K
Miles Dyson @menrva33
True. But the old problems are the same: distribution / marketing / runway / VC. I’ve built this with Google AI Studio (prototyping) and Lovable alone. 👇 Normally the beta would have cost approx. 750k € if using an external IT agency. I’ve paid around 840 €. x.com/menrva33/statu…
English
1
0
0
217
Logan Kilpatrick @OfficialLoganK
AI is going to radically reduce the cost to run a marketplace
English
165
62
1.5K
94.4K
SHAHNAB AHMED @AhmedShahnab
Bored of boring flat heatmaps? Introducing Topographical Explorer, which turns any country into a real-time tactile landscape.
Search any country → watch real elevation data come alive as thousands of satisfying 3D blocks.
Built entirely client-side with:
- React Three Fiber + Three.js
- AWS Terrain Tiles + OpenStreetMap
- Real-time boundary
@threejs @reactthreefiber #DataVisualization #Geospatial #CreativeCoding #BuildInPublic
English
11
85
626
519.7K
Carmelyne Thompson @carmelyne
OpenAI saw a behavior to fix. I see a behavior worth understanding. A word can start as a training artifact and still become useful because the meaning transfers. If you say 'gobliny' to another dev and they immediately know what you mean, that's not pareidolia. That's working compression. Full post + all 9 Compression Creature Cards: carmelyne.com/what-if-the-go…
English
1
0
0
11
Carmelyne Thompson @carmelyne
OpenAI just explained where the goblins came from. Reward artifact. Style tic. A word that learned it was welcome. I buy the mechanism. But I think they missed something. What if some of these creature words aren't just tics... what if they're doing compression? One word carrying a whole cluster of system behaviors. Like how a crooked wand = the entire Harry Potter universe. I mapped 9 of them. - 🟢 GOBLIN — Chaotic system energy Small cause, oversized effects. Janky but functional. Mischievous, not malicious. OpenAI called this a reward artifact. But look at everything this one word carries. "The loop goblin found the snacks."
Carmelyne Thompson tweet media
English
1
0
0
26
Google AI Developers @googleaidevs
Gemma 3 understands images, text, and video - all at once. In this deep dive, learn how the model integrates multiple sources and performs a range of tasks from answering questions about documents to describing visual scenes in detail. Explore why multimodality matters.
English
28
100
843
41.2K
Mikkel Frimer-Rasmussen
@patloeber Next step: give users access to relevant sources and let them choose their own output format. Making slides for other people feels so irrelevant and manipulative
English
0
0
0
9
OpenAI Developers @OpenAIDevs
Add Codex seats with a $0 seat fee for a limited time. Through the end of June, eligible ChatGPT Business and Enterprise customers can add Codex-only seats, making it easier to give more developers access to Codex in their day-to-day workflows.
English
75
79
2.6K
387.1K
Mikkel Frimer-Rasmussen
@fchollet So a lot of people will need to have a lot of jobs to keep society running? Something must be missing
English
0
0
0
2
François Chollet @fchollet
AI automates tasks, not jobs, and when a task gets cheaper, demand for the job grows. AI cannot automate jobs end-to-end because it lacks autonomy and cannot operate without supervision. There is still zero job from 2022 that can be performed end-to-end by AI, not even translator or customer support associate.
James Pethokoukis ⏩️⤴️@JimPethokoukis

"A decade ago, AI was supposed to replace radiologists. Today, radiologists make more than $500,000 per year, and their employment continues to grow, see chart below. Reading scans is a task, not a job, and when the task gets cheaper, demand for the job grows."

English
141
232
1.5K
138.2K
Firebase @Firebase
Google AI Studio now proactively detects when your natural language prompt requires a backend and offers to set up Firestore and Firebase Authentication. Unlock these capabilities with zero complex coding: ☁️ Save data to the cloud 🔄 Sync across devices in real-time 📶 Support offline capabilities 🔐 Manage user identities All with a free tier. Get started → goo.gle/4eSixez
Firebase tweet media
English
7
19
159
7.9K
Mikkel Frimer-Rasmussen
@ivangdavila I built a new Skill today for turning vibe-coded systems into safe architectures. It's part of the migration from vibe to spec to production
English
0
0
1
11
Ivan Davila @ivangdavila
One of the easiest ways to use Plan Mode in Codex: After explaining what you want, just say: “Ask me anything you need to make sure we’re fully aligned before you proceed” It forces clarity upfront and avoids bad assumptions later
English
2
0
10
958
Logan Kilpatrick @OfficialLoganK
Google is the best company in the world
Logan Kilpatrick tweet media
English
152
133
2.8K
237.5K