Sam Gijsen

69 posts

@SamCJG

ML @ Tübingen AI Center, Hertie AI

Joined August 2024

207 Following · 33 Followers

Sam Gijsen@SamCJG·15h

@_ueaj Actually pretty relevant for one of my projects; by dropout you mean good old attn+ffn dropout?

ueaj@_ueaj·1d

It would be very fun to apply all the data augmentation/data efficiency techniques we've learned over the years but weren't applicable to LLMs due to the abundance of data + all the new ones. dropout, layer looping, massive overparameterization, muon, optimal hparams, etc.

David Duvenaud@DavidDuvenaud

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

4.1K

Sam Gijsen@SamCJG·2d

@drmichaellevin @davidasinclair @ScienceMagazine I thought it was predominantly a telomere story?

Michael Levin@drmichaellevin·2d

The other part of the heart/cancer story is control of resting potential. Cardiac cells have a very strong control of their bioelectric state, and are extremely resistant to the kind of persistent depolarization necessary for the cancer phenotype. pmc.ncbi.nlm.nih.gov/articles/PMC42… thoughtforms.life/cancer-and-cel… Neural cells too, but they can be forced into mitosis etc. by persistent depolarization: sciencemag.org/cgi/reprint/19…

235

8.3K

David Sinclair@davidasinclair·2d

Wondered why we don’t hear about heart cancer? Contraction-sensing Nesprin-2 protein is discovered to prevent heart cancer. By causing cells to pulse (or adding in Nesprin) we might be able to treat cancer in other organs 👏 @ScienceMagazine

174

1.4K

63.3K

Sam Gijsen@SamCJG·2d

@vikhyatk @BelieveOnJesus I’m in neither of these categories but wouldn’t both like language models?

vik@vikhyatk·2d

@BelieveOnJesus there's a big divide between people who became programmers because they wanted to make video games, and people who joined the industry because someone told them it would make them money

2.2K

vik@vikhyatk·2d

"AI is going to wipe out programming jobs." Then why is every programmer I know working 20 hours a day ever since they started using AI? I thought this technology was going to free us from the toil of labor.

164

107

2.7K

197.6K

Sam Gijsen@SamCJG·3d

@teortaxesTex There’s a whole cohort of Elon-bucks farming accounts which aim to get these kinds of quote tweets via sycophancy

1.1K

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·3d

Not bad, my ass. Elon is somewhat in denial of xAI's current position, I think

Elon Musk@elonmusk

Not bad

200

24.2K

Sam Gijsen@SamCJG·5d

@FEijkelboom

994

Floor Eijkelboom (@ICLR 🇧🇷)@FEijkelboom·5d

Happening now :D #iclr2026 - Hall 3, poster 722 Come talk about variational flow matching over manifolds for proteins and materials

3.1K

Sam Gijsen@SamCJG·5d

@vikhyatk They need to introduce daily plans so we can keep up with these recommendations

192

vik@vikhyatk·5d

if you’re still using opus after 5.5…

vik@vikhyatk

if you’re still using codex after opus 4.7 release. ngmi

6.1K

Sam Gijsen@SamCJG·21 Apr

Come check out our poster on training brain foundation models using self-distillation at @iclr_conf Friday 10:30AM!

Sam Gijsen@SamCJG·19 Apr

@yacineMTB Nice val loss bro

122

kache@yacineMTB·19 Apr

4.7K

Sam Gijsen@SamCJG·17 Apr

@kalomaze @celestepoasts Why is the bs so critical here?

kalomaze@kalomaze·17 Apr

@celestepoasts doing this at a bs approaching pretraining CLT sounds absolutely brutal at small scale. i have low faith both in the concept of "doing it on a model smaller than ~32b" and "doing it without per step batch sizes in the thousands" and ofc you have to be VERY rigorous wrt goodhart

250

Celeste@celestepoasts·17 Apr

wonder if pretraining scaling laws are holding at all, seems like something only a frontier lab would find out if they don't, and they obviously wouldn't publicize this

6.2K

Sam Gijsen@SamCJG·16 Apr

@kuberdenis Go to the lakes in the south-west: walk in the forests there, take a ferry tour over the lakes, or visit Potsdam

Denislav Gavrilov@kuberdenis·16 Apr

what should i do in Berlin? i am stuck here for two more days

2.6K

Sam Gijsen@SamCJG·16 Apr

@sedielem Never got a semanticist setup to work with neural data, wondering if there's something about natural image statistics that's particularly favourable to this kind of approach.

136

Sander Dieleman@sedielem·16 Apr

FlexTok/Semanticist provided an elegant recipe to learn semantically coarse-to-fine sequence representations of images. This works for video as well: preserve the temporal axis, replace the spatial axes with a semantic coarse-to-fine axis. Promising for long video generation!

Andrei Atanov@andrew_atanov

Are all videos worth the same number of tokens? Whether rich in motion or visually minimal, standard 3D-grid tokenizers treat them equally. We present VideoFlexTok, which represents videos using a flexible-length, coarse-to-fine sequence of tokens. Page: videoflextok.epfl.ch Demo: huggingface.co/spaces/EPFL-VI… Paper: arxiv.org/abs/2604.12887 1/n

9.9K

Sam Gijsen@SamCJG·11 Apr

@vikhyatk @yacineMTB Distilling the errors

180

vik@vikhyatk·11 Apr

distilling from 1B to 400M is going well

112

16.5K

Sam Gijsen@SamCJG·10 Apr

@vikhyatk You’re missing out

vik@vikhyatk·10 Apr

haven't watched youtube shorts ever since the training cluster came online

1.9K

Sam Gijsen@SamCJG·10 Apr

@robustus 50 cents to the miners, 100 bucks to nvidia. Jensen wins again

Dan@robustus·9 Apr

This is pretty cool (tho way beyond my knowledge base). I would not have imagined qsafe txns possible without a new key type, but here we go apparently. Obv this isn't practical today (~$100 in GPU time needed to construct the tx), but it at least opens up the possibility space.

Avihu Levy ✨🐺@avihu28

Quantum-Safe Bitcoin Transactions Without Softforks github.com/avihu28/Quantu…

8.6K

Sam Gijsen@SamCJG·9 Apr

@francoisfleuret bottom-to-top, right? ...right?

791

François Fleuret@francoisfleuret·9 Apr

340

62.8K

Sam Gijsen@SamCJG·9 Apr

@dearmadisonblue @sporadica @ClementDelangue but anthropic was for-looping over individual files as well

125

madison@dearmadisonblue·9 Apr

@sporadica @ClementDelangue yeah, once you have a spotlight on where to look you've solved most of the problem

2.2K

clem 🤗@ClementDelangue·8 Apr

"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." aisle.com/blog/ai-cybers…

111

344

2.4K

723K

Sam Gijsen@SamCJG·8 Apr

@_ueaj Palantir spoke of this

1.9K

ueaj@_ueaj·7 Apr

Worryingly, ontologically, all the zero-days were already in the training data

1.1K

72.5K

Sam Gijsen@SamCJG·7 Apr

@thetrocro @LynAldenContact They only talk about exploits that have already been fixed, but currently bitcoin isn't mentioned in the 244-page model card

432

Troy Cross@thetrocro·7 Apr

@LynAldenContact Did they find a bug in bitcoin?

3.5K

Lyn Alden@LynAldenContact·7 Apr

Someone should write a book about this.

Haseeb ＞|＜@hosseeb

This is terrifying. @AnthropicAI 's new unreleased Mythos model is so good at hacking, it found bugs in "every major operating system and web browser." 83.1% were exploited on first attempt. This thing is like COVID but for software. Actually apocalyptic in the wrong hands.

1.1K

205.2K

Sam Gijsen@SamCJG·7 Apr

Reading the Claude Mythos model card and slotted in this activity for tomorrow morning

Sam Gijsen@SamCJG·4 Apr

@badlogicgames @thsottiaux Can’t wait for Berlin's AlexNet in about 6 years then

Mario Zechner@badlogicgames·3 Apr

@thsottiaux oh no :( but also: it's definitely not europe! we're like 20 AI years behind.

2.5K

Mario Zechner@badlogicgames·3 Apr

i'm a CEST european. you can't hurt me.

Tibo@thsottiaux

With Codex there is quite the gulf in load between peak and off-peak times, and we would like to achieve a smoother traffic pattern, as that would be a more optimal use of our compute. We have ideas, but curious what you all think we should do? Would more usage during off-peak and a surge multiplier during peak times make sense?

189

35.3K
