Mikel Bober-Irizar

1.4K posts

@mikb0b

24 // Kaggle Competitions Grandmaster & ML/AI Researcher. Building video games @iconicgamesio, machine reasoning @Cambridge_CL, bioscience @ForecomAI.

London · Joined August 2011
1.2K Following · 8K Followers
Pinned Tweet
Mikel Bober-Irizar @mikb0b
Why do pre-o3 LLMs struggle with generalization tasks like @arcprize? It's not what you might think. OpenAI o3 shattered the ARC-AGI benchmark. But the hardest puzzles didn’t stump it because of reasoning, and this has implications for the benchmark as a whole. Analysis below🧵
[image]
18 replies · 69 reposts · 653 likes · 206.5K views
Alexander Doria @Dorialexander
@mikb0b Not fully but correctly inferring it’s medical domain and trying to connect dots from there. In this extreme size range I’m surprised.
1 reply · 0 reposts · 17 likes · 771 views
maxgmcg @maxgmcg
twitter 👍
1 reply · 0 reposts · 4 likes · 126 views
bilal @bilaltwovec
an underrated feature of a city's subway system is having the nice 2700K LED lighting instead of the standard harsh white fluorescent tubes, it's such an upgrade
1 reply · 0 reposts · 7 likes · 678 views
will brown @willccbb
who are the best up-and-coming technical bloggers on here nowadays? if this is maybe you, feel free to reply w your fav recent post :)
85 replies · 30 reposts · 561 likes · 125.9K views
Piotr Żelasko @PiotrZelasko
Which model is best for what?
nvidia/parakeet-tdt-0.6b-v3: blazing fast and accurate ASR inference with PnC and timestamps
nvidia/canary-1b-v2: top accuracy with fast inference, ASR and translation, with PnC and timestamps
Both are commercially-friendly licensed: CC-BY-4.0
2 replies · 0 reposts · 13 likes · 917 views
Piotr Żelasko @PiotrZelasko
You asked for it, and we listened. MULTILINGUAL Canary v2 and Parakeet v3!!
🌏 25 European languages
🏆 SotA on Multilingual Open ASR Leaderboard
🔥 600x and 2000x faster than real-time
🕰️ Timestamps!
🗣️ Speech translation (Canary)
🃏 Granary: all data is open, train it yourself!
[image]
13 replies · 41 reposts · 330 likes · 25.4K views
east coast anna in exile @love_soze_
growing up i was always bothered that there was no idiomatic expression in english for “on the pareto frontier of most-A and most-B”
2 replies · 0 reposts · 11 likes · 461 views
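The phrase in the tweet has a precise meaning: a point is on the Pareto frontier of "most-A and most-B" when no other option beats it on both axes at once. A minimal sketch in Python (the function name and the quadratic scan are illustrative choices, not a canonical implementation):

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point (a, b) is dominated if some other point is >= in both
    coordinates and strictly better in at least one — i.e. you could
    get more A without giving up any B (or vice versa).
    """
    frontier = []
    for a, b in points:
        dominated = any(
            (x >= a and y >= b) and (x, y) != (a, b)
            for x, y in points
        )
        if not dominated:
            frontier.append((a, b))
    return frontier
```

For example, among `[(1, 5), (2, 4), (3, 3), (2, 2), (4, 1)]` only `(2, 2)` is off the frontier, since `(2, 4)` matches its A and beats its B.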
Jake Arkinstall, PhD 🏴󠁧󠁢󠁷󠁬󠁳󠁿
@mikb0b @kalomaze The audiobook is also fantastic. I don't even like fiction all that much, just listened to it because I was looking for my next listen and the movie trailer looked good. Never thought I'd get so attached to a fictional 5-legged alien spider thing.
1 reply · 0 reposts · 1 like · 31 views
kalomaze @kalomaze
for some reason i want to read something on the plane. what should i go for
19 replies · 0 reposts · 53 likes · 6.8K views
Mikel Bober-Irizar @mikb0b
@DavidSHolz completely agree with this - not to mention cost of each call going way up (even if the list price goes down)
0 replies · 0 reposts · 0 likes · 119 views
David @DavidSHolz
crazy how wait times increased with thinking LLMs and how the experience itself feels net neutral. I desire the response more, but all the *flow-of-the-thing* is gone. the extension of the mind has become an excellent assistant. feeling profoundly not-me. a missed opportunity.
23 replies · 45 reposts · 451 likes · 59.4K views
Mikel Bober-Irizar @mikb0b
@shitpost9000 @twofifteenam @billyhumblebrag single vent means that it's only blowing hot air out of the apartment, which means negative pressure => sucks an equal amount of outside air back into the apartment. way less efficient but weirdly every portable unit seems to have this flaw
0 replies · 0 reposts · 1 like · 43 views
billy @billyhumblebrag
Americucks coping and seething as i seamlessly install 12 000 BTU of cooling into my European apartment
[image]
1.8K replies · 73 reposts · 5.7K likes · 1.9M views
east coast anna in exile @love_soze_
growing up and realizing most bespoke file formats/datastores are just zip files or sqlite tables and not highly engineered custom binary formats was my own personal "there is no santa claus"
40 replies · 122 reposts · 2.6K likes · 62.2K views
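The observation above is easy to verify with nothing but the standard library: formats like .docx, .jar, and .apk are ordinary zip archives, and plenty of app files are plain SQLite databases. A small sketch (the member names and table are made up for the demo):

```python
import io
import sqlite3
import zipfile

# Many "custom" formats are just zip archives: write one in memory
# and read it back with the stdlib zipfile module.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/x-demo")
    zf.writestr("content.xml", "<doc>hello</doc>")

names = zipfile.ZipFile(buf).namelist()

# Likewise, many "datastores" are just SQLite tables underneath.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (body TEXT)")
db.execute("INSERT INTO notes VALUES ('hello')")
row = db.execute("SELECT body FROM notes").fetchone()
```

Renaming a real .docx to .zip and opening it with `zipfile` shows the same thing on an actual document.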
Keir Bradwell @keirbradwell
a good thing about making highly amateur photography a hobby is that it forces you to go on more walks
[4 images]
4 replies · 0 reposts · 24 likes · 1.3K views
Theo - t3.gg @theo
@GenePark DM me a screenshot when you hit the credits and I’ll throw $500 at a charity of your choice 🫡
2 replies · 0 reposts · 18 likes · 1.7K views
Gene Park @GenePark
My 5 favorite video games of all time. my fight against recency bias is keeping Donkey Kong Bananza off the top 5, we'll see.
[image]
323 replies · 73 reposts · 4.1K likes · 355.1K views
LaurieWired @lauriewired
Fading out audio is one of the most CPU-intensive tasks you can possibly do! Values that get close to (but never quite reach) zero hit an underflow gap known as the "subnormal" range. It's a mathematical conundrum so tricky, both x86 and ARM made special CPU instructions just to handle it!
[2 images]
157 replies · 804 reposts · 13.4K likes · 733.8K views
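The mechanism is easy to demonstrate: IEEE-754 doubles have a subnormal band below the smallest normal value, where hardware often falls back to slow assist paths unless flush-to-zero is enabled (the special handling the tweet alludes to). A quick sketch of where that band sits:

```python
import sys

smallest_normal = sys.float_info.min   # ~2.2e-308: smallest *normal* double
subnormal = smallest_normal / 2        # halving it lands in the subnormal range

# Subnormals are still nonzero, just represented with reduced precision;
# an audio fade that decays toward zero walks straight through this band.
print(subnormal > 0.0)                 # still representable
print(subnormal < smallest_normal)     # but below the normal range

smallest_subnormal = 5e-324            # the very last nonzero double
print(smallest_subnormal / 2)          # underflows to 0.0
```

A common fix in audio code is to add a tiny inaudible DC offset, or enable the CPU's flush-to-zero / denormals-are-zero modes, so the signal never lingers in this range.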
Mikel Bober-Irizar @mikb0b
@secemp9 this is kinda neat - hard part might be generating the right question surface (avoiding lots of low-quality or similar questions, getting all the weird quirks that happen in programming) or you could just train it on stackoverflow :D
1 reply · 0 reposts · 1 like · 62 views
secemp @secemp9
another idea I had: train a model on just books/docs, make it solid at explaining, analogies, reasoning on code (but it would be bad at generating code of course). then use that as a teacher for a new model that would learn programming, stackoverflow-style. basically:
- generate a question, get the teacher to answer
- turn the whole thing into a feedback loop
- distill that into a dataset or use it for RL
- or merge the book-thinker and the code-learner, or train a third model on the distilled output.
feels like a decent pipeline but I haven't seen much done like this afaik
secemp@secemp9

hypothetical: say you have raw code and want to make an LLM better at it. how do you even turn that into a dataset? no QA pairs, just code, barely any comments. how to best do this? I always wondered if there were solid papers on this. one obvious path: prompt an LLM with stuff like
- “what does this function do”
- “how to implement X”
etc and generate question:answer pairs from code chunks

4 replies · 0 reposts · 14 likes · 1.5K views
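The "one obvious path" from the quoted tweet can be sketched mechanically: chunk the raw code, then wrap each chunk in question templates destined for a teacher model. Everything here is illustrative (the templates, the naive line-based chunker), and the actual model call is deliberately left out:

```python
# Hypothetical sketch of generating QA prompts from raw code chunks.
# The teacher-model call itself is omitted; this only builds the prompts.

TEMPLATES = [
    "What does this function do?\n\n{chunk}",
    "How would you re-implement the behavior of this code?\n\n{chunk}",
]

def chunk_source(source, max_lines=20):
    """Naive chunker: split a file into fixed-size blocks of lines."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]

def make_qa_prompts(source):
    """Turn raw code into (prompt, chunk) pairs for a teacher model."""
    return [(template.format(chunk=chunk), chunk)
            for chunk in chunk_source(source)
            for template in TEMPLATES]
```

A real pipeline would chunk on syntactic boundaries (functions, classes) rather than line counts, and filter near-duplicate questions before distilling the answers into a dataset.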
Mikel Bober-Irizar @mikb0b
@tenderizzation that’s when you write a shell script to ssh into all of them for you (again, rather than reading the docs)
0 replies · 0 reposts · 8 likes · 471 views
secemp @secemp9
no support for ctranslate2 so I need to use either my gpu or add support for it myself...can't believe this is my life
[GIF]
3 replies · 0 reposts · 40 likes · 8.3K views
secemp @secemp9
I couldn't believe whisper was SOTA and then found out there is actually a better model from nvidia (WER around 6 vs 9-10 for whisper)
secemp tweet media
36 replies · 36 reposts · 1.2K likes · 164.8K views
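For context on the numbers quoted above, WER (word error rate) is word-level edit distance divided by reference length, usually reported as a percentage, so "WER around 6 vs 9-10" means roughly 6 vs 9-10 errors per 100 reference words. A minimal sketch of the standard dynamic-programming computation:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count.

    (reference must be non-empty)
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Classic Levenshtein DP table over words instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)
```

So `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. 1/3 (~33%); published figures also depend heavily on the text normalization applied before scoring.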