Luke Salamone

518 posts

Luke Salamone
@LukeASalamone

Machine learning engineer. In the words of a wise man, "I'm nice at ping pong"

Bay Area · Joined May 2016
469 Following · 281 Followers
Pinned Tweet
Luke Salamone @LukeASalamone
I have discovered a truly remarkable proof of P=NP which this tweet is too small to contain.
2 replies · 0 reposts · 3 likes · 0 views
Luke Salamone @LukeASalamone
@y0b1byte In section 2.3.2 they said that cold-started RL still had language-mixing problems. They had to specifically introduce a language-matching reward to mitigate this.
0 replies · 0 reposts · 0 likes · 25 views
yobibyte @y0b1byte
This doesn't make too much sense to me given that R1-Zero had formatting rewards. Why does pure RL lead to language mixing/poor readability, but cold-started R1 does not? I don't think this is convincingly explained in the paper itself, and the actual reason for this is still unknown. Technical contributions aside, I found the speculations in the paper to be quite weak, and the 'aha' moment passage is just a joke.
3 replies · 1 repost · 11 likes · 2.3K views
Luke Salamone retweeted
Noam Brown @polynoamial
Frontier models like GPT-4o (and now Claude 3.5 Sonnet) may be at the level of a "Smart High Schooler" in some respects, but they still struggle on basic tasks like tic-tac-toe. There was hope that native multimodal training would help but that hasn't been the case.
40 replies · 49 reposts · 489 likes · 101.9K views
Luke Salamone retweeted
Yann LeCun @ylecun
A short post on the best architectures for real-time image and video processing. TL;DR: use convolutions with stride or pooling at the low levels, and stick self-attention circuits at higher levels, where feature vectors represent objects.

PS: ready to bet that Tesla FSD uses convolutions (or perhaps more complex *local* operators) at the low levels, combined with more global circuits at higher levels (perhaps using self-attention). Transformers on low-level patch embeddings are a complete waste of electrons.
Yann LeCun @ylecun (quoted tweet):

I'm not saying ViTs are not practical (we use them). I'm saying they are way too slow and inefficient to be practical for real-time processing of high-resolution images and video. [Also, @sainingxie's work on ConvNext has shown that they are just as good as ViTs if you do it right. But whatever.]

You need at least a few conv layers with pooling and stride before you stick self-attention circuits. Self-attention is equivariant to permutations, which is completely nonsensical for low-level image/video processing (having a single strided conv at the front end to 'patchify' also doesn't make sense). Global attention is also nonsensical (and not scalable), since correlations are highly local in images and video.

At the higher levels, once features represent objects, it makes sense to use self-attention circuits: what matters is the relationships and interactions between objects, not their positions. This type of hybrid architecture was inaugurated by the DETR system by @alcinos26 and collaborators. As I've said since the DETR work, my favorite family of architectures is conv/stride/pooling at the lower levels, and self-attention circuits at the higher levels.

61 replies · 110 reposts · 1.4K likes · 748.4K views
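An illustrative PyTorch sketch of the family he describes (my toy, not LeCun's or Tesla's actual stack): strided convolutions handle the local low-level stages, and self-attention only ever sees the coarse feature grid where each vector already summarizes a large region.

```python
import torch
import torch.nn as nn

class ConvThenAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Low levels: local, translation-equivariant, cheap; 16x downsampling.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.GELU(),
            nn.MaxPool2d(2),
        )
        # Higher levels: global self-attention over object-scale features.
        # (No positional encoding here for brevity; a real model would add one.)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.stem(x)                       # (B, dim, H/16, W/16)
        tokens = f.flatten(2).transpose(1, 2)  # (B, HW/256, dim)
        return self.attn(tokens)

# A 512x512 frame reaches attention as 32*32 = 1024 tokens, each already
# summarizing at least a 16x16 neighborhood, instead of raw patch embeddings.
print(ConvThenAttention()(torch.randn(1, 3, 512, 512)).shape)  # (1, 1024, 256)
```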
Luke Salamone retweeted
Aran Komatsuzaki @arankomatsuzaki
Octopus v2: On-device language model for super agent

Presents a new method that empowers an on-device 2B model to outperform GPT-4 in both accuracy and latency, and decreases the context length by 95%. arxiv.org/abs/2404.01744
11 replies · 57 reposts · 244 likes · 66.8K views
Luke Salamone retweeted
Robert Komaniecki @Komaniecki_R
The Trautonium. Invented in the early 1930s. Just listen to this thing.
498 replies · 6.4K reposts · 34.3K likes · 2.1M views
Luke Salamone retweeted
AK @_akhaliq
Google presents Genie: Generative Interactive Environments

Introduces Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.

Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
78 replies · 500 reposts · 2.3K likes · 684.1K views
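A heavily hypothetical skeleton of how the three named components could compose at inference time. Every interface below is an assumption (the announcement describes the design only at a high level), and the modules are placeholder stubs; the point is the frame-by-frame control loop over discrete latent actions.

```python
import torch
import torch.nn as nn

class VideoTokenizer(nn.Module):
    # Stand-in spatiotemporal tokenizer: frame -> grid of discrete tokens.
    def forward(self, frame: torch.Tensor) -> torch.Tensor:  # (B, 3, H, W)
        return torch.randint(0, 1024, (frame.shape[0], 16 * 16))  # placeholder

class DynamicsModel(nn.Module):
    # Stand-in autoregressive dynamics: past tokens + latent action -> next tokens.
    def forward(self, tokens: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return torch.randint(0, 1024, tokens.shape)  # placeholder

# Frame-by-frame control: the user picks one of a small set of discrete latent
# actions each step; no ground-truth action labels exist anywhere in training.
tokenizer, dynamics = VideoTokenizer(), DynamicsModel()
tokens = tokenizer(torch.rand(1, 3, 64, 64))    # tokenize a prompt image
for step in range(8):
    action = torch.randint(0, 8, (1,))           # user-chosen latent action
    tokens = dynamics(tokens, action)            # predict the next frame's tokens
    # a decoder (omitted) would render `tokens` back into pixels
```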
Luke Salamone retweeted
AK @_akhaliq
Google DeepMind presents Grandmaster-Level Chess Without Search

paper page: huggingface.co/papers/2402.04…

The largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
35 replies · 259 reposts · 1.4K likes · 266.4K views
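"Without search" here means move selection is one network evaluation per legal move rather than a game tree. A sketch of that selection loop with python-chess; the material-count evaluator below is my crude stand-in for the paper's learned value network, not its actual model.

```python
import chess  # pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def value_fn(board: chess.Board) -> float:
    # Stand-in for the learned model: material balance for the side to move.
    return sum((1 if p.color == board.turn else -1) * PIECE_VALUES[p.piece_type]
               for p in board.piece_map().values())

def pick_move(board: chess.Board) -> chess.Move:
    best_move, best_value = None, float("-inf")
    for move in board.legal_moves:       # one evaluation per move, no tree
        board.push(move)
        v = -value_fn(board)             # value is for the side to move: negate
        board.pop()
        if v > best_value:
            best_move, best_value = move, v
    return best_move

board = chess.Board()
print(pick_move(board))  # plays whatever the evaluator likes best in one ply
```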
Luke Salamone retweeted
MAA @maanow
Three logicians walk into a bar. The bartender asks: 'Does everyone want a drink?' The first logician says: 'I don't know.' The second logician says: 'I don't know.' The third logician says: 'Yes.'
40 replies · 650 reposts · 7.1K likes · 508.8K views
Luke Salamone retweeted
LaurieWired @lauriewired
The SHA256 for this sentence begins with: one, eight, two, a, seven, c and nine.
83 replies · 298 reposts · 2.3K likes · 419.4K views
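The sentence is self-referential: it names the leading hex digits of its own SHA-256 hash. One way to construct such a sentence is brute force over the digits named. The template and the 4-digit prefix below are my assumptions, not LaurieWired's method; the tweet names 7 digits, which is the same search with 16^7 candidates instead of 16^4.

```python
import hashlib
from itertools import product

WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
         "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
         "a": "a", "b": "b", "c": "c", "d": "d", "e": "e", "f": "f"}

def find_self_referential(n: int = 4) -> str | None:
    for digits in product(WORDS, repeat=n):  # 16**n candidate sentences
        named = ", ".join(WORDS[d] for d in digits[:-1]) + " and " + WORDS[digits[-1]]
        s = f"The SHA256 for this sentence begins with: {named}."
        if hashlib.sha256(s.encode()).hexdigest().startswith("".join(digits)):
            return s
    return None  # each candidate matches with prob 16**-n, so ~1 hit expected

print(find_self_referential())
```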
Luke Salamone retweeted
Luke Gessler @LukeGessler
this paper's nuts. for sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by.... gzip aclanthology.org/2023.findings-…
122 replies · 818 reposts · 4.7K likes · 3.4M views
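The paper's whole method (Jiang et al., Findings of ACL 2023) fits in a few lines: gzip's compressed lengths define a normalized compression distance (NCD), and a k-nearest-neighbor vote over the training texts does the classification. A runnable sketch; the toy training data is mine.

```python
import gzip
from collections import Counter

def clen(s: str) -> int:
    return len(gzip.compress(s.encode()))

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: small when x and y share structure,
    # because gzip compresses their concatenation well.
    cx, cy = clen(x), clen(y)
    return (clen(x + " " + y) - min(cx, cy)) / max(cx, cy)

def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [("the team won the match in extra time", "sports"),
         ("stocks fell sharply after the report", "finance"),
         ("the striker scored twice in the derby", "sports"),
         ("the central bank raised interest rates", "finance")]
print(knn_classify("the keeper saved a penalty in the match", train))
```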
Luke Salamone @LukeASalamone
@VictorButoi The biggest red flag is claiming to measure perplexity without access to the model logits. It’s borderline fraudulent.
0 replies · 0 reposts · 0 likes · 63 views
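Why logits are non-negotiable: perplexity is the exponential of the mean per-token negative log-likelihood, and those log-likelihoods come from the model's own output distribution. A runnable sketch with an open model (GPT-2 picked arbitrarily as an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
# A closed model whose API exposes no logprobs gives you nothing to put here,
# so any "perplexity" such a tool reports is some other proxy.
```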
Victor Butoi @ion_barrel
Things like GPTZero are scary. I'm sure the creator didn't have bad intentions, but the fact that it's marketed as "a solution to detecting AI-written responses" even though there's no evidence it works consistently, and is nevertheless being EMPLOYED by schools, is crazy.
26 replies · 41 reposts · 594 likes · 89.7K views
Ana Marasović @anmarasovic
Looking for examples [for teaching] where converting text to a set of n-grams or tf-idf features is not worse than using embeddings, or it is the only thing you could do given the scale of corpora and compute you have 🙏
15 replies · 4 reposts · 42 likes · 29.3K views
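One standard example for that request, sketched below with scikit-learn (toy data mine): a tf-idf bag-of-n-grams plus a linear model is cheap, streams over corpora far too large to embed, and on topical classification is frequently no worse than embedding-based pipelines.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the match went to penalties", "rates were hiked again",
         "a hat-trick sealed the derby", "bond yields keep climbing"]
labels = ["sports", "finance", "sports", "finance"]

# Unigrams + bigrams; the whole pipeline fits in memory proportional to the
# vocabulary, not to a dense embedding of every document.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["another derby match today"]))
```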
Luke Salamone retweeted
Mosquito Capital @MosquitoCapital
I've seen a lot of people asking "why does everyone think Twitter is doomed?" As an SRE and sysadmin with 10+ years of industry experience, I wanted to write up a few scenarios that are real threats to the integrity of the bird site over the coming weeks.
1.1K replies · 14.4K reposts · 56.6K likes · 0 views
Luke Salamone retweeted
NASA @NASA
It's here–the deepest, sharpest infrared view of the universe to date: Webb's First Deep Field. Previewed by @POTUS on July 11, it shows galaxies once invisible to us. The full set of @NASAWebb's first full-color images & data will be revealed July 12: nasa.gov/webbfirstimages
7.6K replies · 124.5K reposts · 536.7K likes · 0 views
Luke Salamone retweeted
Armin Ronacher ⇌ @mitsuhiko
I don't want to say anything but that's not the right license Mr Copilot.
63 replies · 1K reposts · 4.3K likes · 0 views
Luke Salamone retweeted
Giannis Daras @giannis_daras
DALLE-2 has a secret language. "Apoploe vesrreaitais" means birds. "Contarra ccetnxniams luryca tanniounons" means bugs or pests. The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs. A thread (1/n)🧵
185 replies · 2.2K reposts · 8.4K likes · 0 views
Luke Salamone retweeted
Alex Tabarrok @ATabarrok
Freakishly good. Better than many humans.
35 replies · 148 reposts · 1.2K likes · 0 views