Luke Salamone
@LukeASalamone
Machine learning engineer. In the words of a wise man, "I'm nice at ping pong"

I'm not saying ViTs are not practical (we use them). I'm saying they are way too slow and inefficient to be practical for real-time processing of high-resolution images and video. [Also, @sainingxie's work on ConvNeXt has shown that ConvNets are just as good as ViTs if you do it right. But whatever.]

You need at least a few conv layers with pooling and stride before you stick in self-attention circuits. Self-attention is equivariant to permutations, which is completely nonsensical for low-level image/video processing (having a single strided conv at the front end to 'patchify' also doesn't make sense). Global attention is also nonsensical (and not scalable), since correlations are highly local in images and video.

At a high level, once features represent objects, it makes sense to use self-attention circuits: what matters is the relationships and interactions between objects, not their positions. This type of hybrid architecture was inaugurated by the DETR system by @alcinos26 and collaborators. As I've said since the DETR work, my favorite family of architectures is conv/stride/pooling at the lower levels and self-attention circuits at the higher levels.
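
The hybrid recipe described in the post (conv/stride/pooling at the bottom, self-attention over the resulting coarse feature map) is straightforward to sketch. Below is a minimal PyTorch sketch under stated assumptions: `HybridConvAttention`, the layer sizes, and the fixed 256x256 input are hypothetical illustration choices, not the actual DETR implementation.

```python
import torch
import torch.nn as nn

class HybridConvAttention(nn.Module):
    """Hypothetical sketch: convs with stride/pooling low, attention high."""

    def __init__(self, embed_dim=256, num_heads=8, num_layers=2):
        super().__init__()
        # Low level: strided convs + pooling exploit the locality of image
        # statistics and shrink the grid before any attention is applied.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, embed_dim, kernel_size=3, stride=2, padding=1),
        )  # total stride 16: a 256x256 image becomes a 16x16 feature map
        # Positional embeddings reinject the spatial information that
        # permutation-equivariant self-attention would otherwise discard.
        self.pos_embed = nn.Parameter(torch.zeros(1, 1024, embed_dim))
        # High level: global self-attention over the coarse tokens, where
        # each token covers an object-scale region of the input.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):
        feats = self.backbone(x)                    # (B, C, H/16, W/16)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, HW/256, C)
        tokens = tokens + self.pos_embed[:, : tokens.shape[1]]
        return self.encoder(tokens)

model = HybridConvAttention()
out = model(torch.randn(2, 3, 256, 256))
print(out.shape)  # torch.Size([2, 256, 256]): 256 tokens of dim 256
```

The ordering is what makes the scalability argument concrete: attention over the raw 256x256 pixel grid would mean 65,536 tokens and roughly 4 billion pairwise attention scores per head, while attention over the 16x16 conv output is 256 tokens and about 65k pairs. The quadratic cost of attention only becomes tolerable after the conv front end has downsampled.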