jonclement @jonclement
325 posts · Toronto · Joined March 2008
1K Following · 231 Followers
leo @leojrr
hey guys, last call: the vibe coders group is getting pretty big, so we're closing it at 400 so we can actually get to know each other. if you are interested in joining, say 👋
leo tweet media
1.6K replies · 16 reposts · 823 likes · 139.5K views
jonclement @jonclement
@svpino Imagine every time you remove your fingers from the home row -- a bell goes off. Every keyboard without a trackpoint 'nub' is 25% less efficient.
0 replies · 0 reposts · 0 likes · 15 views
Santiago @svpino
Many years later, the Logitech MX Keys is the best keyboard ever. No contest.
Santiago tweet media
487 replies · 163 reposts · 5.1K likes · 779.6K views
Andrej Karpathy @karpathy
@rileybrown_ai Would be interesting if you could organize them into groups, turn them on and off as groups, share them, vote on them, etc. Would then basically be a lite version of controlling the algorithm in a marketplace, which @jack has been thinking about.
120 replies · 43 reposts · 2.2K likes · 98.6K views
Riley Brown @rileybrown
812 muted words and counting.
Riley Brown tweet media
125 replies · 17 reposts · 987 likes · 124.5K views
Sid Bharath @Siddharth87
Not as exciting a story as the Arrival rumor, but an important one nevertheless. One question though. You build a transformer that predicts at the character level in Makemore. LLMs predict at the token level. Has a sentence-level predictor been attempted? Or is there a way to combine sentence- and token-level prediction to capture more meaning? The reason I ask is, a character alone has no meaning, but meaning emerges when it becomes a word. So predicting at the token level is better than predicting at the character level. Similarly, a token by itself has some meaning, but more meaning emerges at the sentence level. Hence, sentence-level predictions could be interesting?
3 replies · 0 reposts · 3 likes · 3.6K views
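As a side note on the granularity question above, here is a toy illustration of how the same sentence decomposes at each level. The subword split is invented for illustration, not the output of a real tokenizer; the point is only that units get more meaningful, and the effective vocabulary larger, as granularity coarsens:

```python
# Toy illustration of prediction granularity: the same sentence as
# character-level, token-level, and sentence-level units.
sentence = "attention is all you need"

# Character level: units carry almost no meaning on their own.
chars = list(sentence)
print(chars[:8])  # ['a', 't', 't', 'e', 'n', 't', 'i', 'o']

# Token level (hypothetical subword split): units carry word-ish meaning.
tokens = ["atten", "tion", " is", " all", " you", " need"]

# Sentence level: one unit per sentence; the "vocabulary" here is
# effectively unbounded, which is the practical obstacle to predicting
# whole sentences directly.
sentences = [sentence]

for name, units in [("chars", chars), ("tokens", tokens), ("sentences", sentences)]:
    print(f"{name}: {len(units)} units")
```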
Andrej Karpathy @karpathy
The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All You Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days.

Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just the input, and softmax becomes elementwise, deleting the normalization). TLDR: Attention is awesome and a *major* unlock in neural network architecture design.

It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more, err... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once: positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc. And the Transformer has imo stuck around basically in its 2017 form to this day, ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends).

Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly, the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau!
Andrej Karpathy tweet media
133 replies · 977 reposts · 6.6K likes · 855.5K views
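A reader's sketch of the "data-dependent weighted average" framing above (shapes, sizes, and variable names are arbitrary choices, not from the email): a few lines of numpy showing that the attention output is literally an average of the values under softmax weights, next to the MLP analogy with data-independent weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Attention as a data-dependent weighted average: each query aggregates
# the values, weighted by how well it matches each key.
T, d = 4, 8                                # toy sizes: 4 tokens, 8 dims
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

weights = softmax(Q @ K.T / np.sqrt(d))    # (T, T); each row sums to 1
out = weights @ V                          # weighted average of the values
print(weights.sum(axis=-1))                # all ~1.0: a true average

# The MLP analogy from the tweet: same aggregate-by-similarity shape, but
# the "keys/values" are weight rows (data-independent) and the softmax is
# replaced by an elementwise nonlinearity with no normalization.
W1, W2 = rng.normal(size=(16, d)), rng.normal(size=(d, 16))
x = rng.normal(size=(d,))
mlp_out = W2 @ np.maximum(W1 @ x, 0.0)     # relu instead of softmax scores
```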
jonclement @jonclement
Sad to see the destruction of Ontario Place for the benefit of an Austrian spa company. By the looks of these 8000 Therme Google reviews in Europe -- we're in for an unpleasant overpriced spacino: ontarioplace.lobbykit.com
6 replies · 58 reposts · 157 likes · 3.9K views
jonclement @jonclement
RIP Ontario Place forest. See link to browse a 3D scan of the before and after effects (July 2024 -> Oct 2024): ion.cesium.com/stories/viewer… @ONPlace4All
jonclement tweet media
60 replies · 359 reposts · 831 likes · 161.5K views
jonclement @jonclement
Distressed birds circling the forest destruction at Ontario Place @ONPlace4All
283 replies · 780 reposts · 1.6K likes · 990K views
jonclement @jonclement
@karpathy @nrehiew_ what's a simple equivalent for non-gradient methods? Random masking/search in place of backprop? Which also seems like a precursor to attention masks...
0 replies · 0 reposts · 0 likes · 145 views
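For what the question above is gesturing at, a minimal sketch (toy quadratic loss, arbitrary step sizes, both my choices) of a gradient-free method next to gradient descent: random search keeps a perturbation only when it lowers the loss, needing no derivatives at all:

```python
import random

def loss(w):
    return (w - 3.0) ** 2  # toy 1-D loss, minimized at w = 3

# Random search: propose a small random step, keep it only if it helps.
random.seed(0)
w = 0.0
for _ in range(2000):
    cand = w + random.uniform(-0.1, 0.1)
    if loss(cand) < loss(w):
        w = cand
print(f"random search:    w = {w:.4f}")

# Gradient descent on the same loss, using the analytic gradient 2(w - 3).
w = 0.0
for _ in range(200):
    w -= 0.1 * 2 * (w - 3.0)
print(f"gradient descent: w = {w:.4f}")
```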
Andrej Karpathy @karpathy
“turned out that by only defining the derivatives for scalar values, it was sufficient to generalise to any higher dimensional Tensors. Therefore, I think building backpropagation intuition from the scalar valued perspective is extremely educational” Yep exactly. I think matrix calculus scares everyone and it’s just unnecessary to go there at all. Scalar valued autograd has the main concept, everything else is just vectorization, there’s no other deeper algorithmic concept there.
14 replies · 36 reposts · 952 likes · 68.8K views
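A minimal sketch of the scalar-valued autograd idea described above, in the spirit of micrograd (an illustrative reimplementation, not Karpathy's code): each Value records its inputs and a closure that routes gradients backward via the chain rule, and backward() walks the graph in reverse topological order:

```python
# Scalar-valued autograd: the whole concept fits in one small class.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad         # d(a+b)/da = 1
            other.grad += out.grad        # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a          # c = -4.0
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = -2.0, dc/db = a = 2.0
```

Everything a Tensor library adds on top of this, per the tweet, is vectorization of the same bookkeeping.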
wh @nrehiew_
Last week, I started building @karpathy's micrograd in Rust. By the end of the week, I ended up with a Tensor library with autograd support using only the Rust standard library. I learnt a lot about PyTorch through this process, so I wrote about it here :)
wh tweet media
21 replies · 133 reposts · 2.1K likes · 175.3K views
jonclement @jonclement
@eyeonthefly This image is a bit outdated. The plan is to destroy the pebble beach entirely and put up a sea wall. The "new" beach beside the highway enters at the sewer outflow. Likely the whole area will be privately patrolled.
0 replies · 0 reposts · 2 likes · 11 views
jonclement @jonclement
@ColinDMello Value? I mean, here are 8000 terrible business reviews from people who visited a Therme Spa. I'd say the HUMAN EXPERIENCE VALUE is very low (too bad it's hard to bean-count): ontarioplace.lobbykit.com
0 replies · 1 repost · 2 likes · 5 views
jonclement @jonclement
@jerryjliu0 just one output hook missing that'll knock Google off the podium: RSS
0 replies · 0 reposts · 0 likes · 137 views
Jerry Liu @jerryjliu0
A lot of LLM apps - particularly consumer-facing ones - might need to account for user preferences. This is one of the first projects I've seen that blends recommender systems with RAG. Full architecture diagram below 🖼️ There's a full ingestion pipeline set up to extract metadata/embeddings from incoming documents. Then there's a recommendation pipeline that extracts relevant docs based on user preferences. The "chat with data" feature comes at the end, after relevant articles have already been extracted, as a "second-stage" interaction to surface more fine-grained insights. Check it out! 👇
Jerry Liu tweet media
LlamaIndex 🦙 @llama_index

We're excited to feature NewsGPT (by timho102003) 📰🧠 - a production-grade news aggregator augmented with LLM capabilities.
✅ Daily pipeline of reliable news sources
✅ Tailored news recommendations
✅ For any given article, chat with related articles
Best of all, it's fully open-source. It's an awesome reference application for anyone looking to build production-grade RAG combined with recommendation systems 🔎
There are some awesome architecture details ⚙️:
1️⃣ Data pipeline: Spark batch processing for NER/embeddings
2️⃣ Personalization: @Firebase for auth, AWS Lambda for recommendations, @qdrant_engine as vector db
3️⃣ Application: @llama_index for RAG capabilities, @streamlit for personalization
Full blog here: blog.llamaindex.ai/newsgpt-neotic…
Open-source repo: github.com/timho102003/Ne…
This has since turned into a production app (Neotice), check it out here! neotice.app
Full credits: Tim Ho (timho102003) as the author of this hackathon-project-turned-full-stack-app! Congrats 🙌

12 replies · 109 reposts · 561 likes · 150.6K views
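A rough sketch of the two-stage flow described above, using toy hand-made embeddings and plain Python in place of the real Firebase/Qdrant/LlamaIndex stack (all names and vectors here are invented for illustration): a recommendation pass ranks articles against a user-preference vector, and the winners become the retrieval context for the second-stage "chat with data" prompt:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy article store: (title, embedding). A real system would compute
# embeddings in the ingestion pipeline and keep them in a vector db.
articles = [
    ("GPU prices fall",         [0.9, 0.1, 0.0]),
    ("New pasta recipes",       [0.0, 0.9, 0.2]),
    ("Open-source LLM release", [0.8, 0.0, 0.3]),
]
user_pref = [1.0, 0.0, 0.2]  # this user leans toward ML/tech news

# Stage 1: recommendation - rank articles by similarity to preferences.
ranked = sorted(articles, key=lambda a: cosine(a[1], user_pref), reverse=True)
top_docs = [title for title, _ in ranked[:2]]

# Stage 2: RAG chat - the recommended docs become the retrieval context.
question = "What happened in AI hardware this week?"
prompt = f"Context: {top_docs}\nQuestion: {question}"
print(prompt)  # in the real app, this context-laden prompt goes to the LLM
```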
jonclement @jonclement
@theirishking @radiogirl985 The firm Billy Bishop Airport keep-out zone touches the south end of Ontario Place. Can't fly in the construction zone either, but can definitely get lawful drone videos of what's happening.
1 reply · 0 reposts · 2 likes · 37 views
TDot Resident @TDotResident
#ONpoli As a reminder, Ontario Place had 2.9M visitors in 2022 and made a record profit of $5.7M with basically zero investment from the Ford Government! That profit is almost double what Therme Bucharest made for the same year.
TDot Resident tweet media (×2)
TDot Resident @TDotResident

#ONpoli NEW: Therme Bucharest, which opened in 2016, has lost money overall as a business. Net Profits have been -$15.5M for its history (RON to CAD is about 0.3). For 2022, the net profit was $3.2M CAD. As a reminder, Therme's estimated construction costs are $350M CAD.

21 replies · 354 reposts · 701 likes · 44K views