Emre

2.6K posts

@etunch

42. The answer is 42. But what is the question? That's the ultimate question. Living to be the person my dog thinks I am. People @AppSamurai

Hammersmith, London Joined August 2009
845 Following 741 Followers
Emre
Emre@etunch·
@emrefa we love that feeling!!
English
0
0
1
59
Emre retweeted
212.vc
212.vc@212vc·
Happy to see so many of our portfolio companies listed in @FastCompanyT's Startup 100 List this year! 🎉
212.vc tweet media
English
1
3
11
670
Emre retweeted
Owain Evans
Owain Evans@OwainEvans_UK·
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Owain Evans tweet media
English
282
1.1K
8.4K
2M
Emre retweeted
Zafer
Zafer@ZaferElcik·
Hello, CrayonClub, which we have been working on for the past year, is finally live! 🎉 We look forward to your experiences, ⭐️ ratings, and reviews. Thank you very much in advance for your support! 👉 App Store: apps.apple.com/us/app/crayon-… 👉 Play Store: play.google.com/store/apps/det…
Zafer tweet media
Turkish
2
3
9
494
Emre retweeted
Grant Sanderson
Grant Sanderson@3blue1brown·
I just put up a new video, which was a collaboration with Terence Tao about the cosmic distance ladder. You can find the full video on YouTube, and here's a bit of extra footage that didn't make it into the final.
English
89
584
5.7K
305.6K
Emre retweeted
Chris Lattner
Chris Lattner@clattner_llvm·
@deedydas I’m glad I didn’t take this compiler class, I would have also gotten 0/100. No wonder people think compilers are scary, they shouldn’t be taught this way! It’s also flawed in many ways (and old) but I think this is more approachable llvm.org/docs/tutorial/
English
42
364
6.2K
963.3K
Emre retweeted
andrew chen
andrew chen@andrewchen·
this stat always surprises me
>50% of consumer in-app spend on iOS and Android is on mobile games 🤯
That's right, for iOS:
- $25.2B total spend (that's up +13.1%)
- $12.85B come from gaming
- Android is even more tilted towards gaming
the number is huge bc so much of the social media apps that take our time monetize through advertising, where you are the product, as opposed to letting you pay for the product!
andrew chen tweet media
English
22
26
298
45.1K
Emre retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
In 2019, OpenAI announced GPT-2 with this post: openai.com/index/better-l…
Today (~5 years later) you can train your own for ~$672, running on one 8XH100 GPU node for 24 hours. Our latest llm.c post gives the walkthrough in some detail: github.com/karpathy/llm.c…
Incredibly, the costs have come down dramatically over the last 5 years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g. the FineWeb-Edu dataset). For this exercise, the algorithm was kept fixed and follows the GPT-2/3 papers.
Because llm.c is a direct implementation of GPT training in C/CUDA, the requirements are minimal - there is no need for conda environments, Python interpreters, pip installs, etc. You spin up a cloud GPU node (e.g. on Lambda), optionally install NVIDIA cuDNN, NCCL/MPI, download the .bin data shards, compile and run, and you're stepping in minutes. You then wait 24 hours and enjoy samples about English-speaking Unicorns in the Andes.
For me, this is a very nice checkpoint to get to because the entire llm.c project started with me thinking about reproducing GPT-2 for an educational video, getting stuck with some PyTorch things, then rage quitting to just write the whole thing from scratch in C/CUDA. That set me on a longer journey than I anticipated, but it was quite fun, I learned more CUDA, I made friends along the way, and llm.c is really nice now. It's ~5,000 lines of code, it compiles and steps very fast so there is very little waiting around, it has constant memory footprint, it trains in mixed precision, distributed across multi-node with NCCL, it is bitwise deterministic, and hovers around ~50% MFU. So it's quite cute.
llm.c couldn't have gotten here without a great group of devs who assembled from the internet, and helped get things to this point, especially ademeure, ngc92, @gordic_aleksa, and rosslwheeler. And thank you to @LambdaAPI for the GPU cycles support.
There's still a lot of work left to do. I'm still not 100% happy with the current runs - the evals should be better, the training should be more stable especially at larger model sizes for longer runs. There's a lot of interesting new directions too: fp8 (imminent!), inference, finetuning, multimodal (VQVAE etc.), more modern architectures (Llama/Gemma). The goal of llm.c remains to have a simple, minimal, clean training stack for a full-featured LLM agent, in direct C/CUDA, and companion educational materials to bring many people up to speed in this awesome field.
Eye candy: my much longer 400B token GPT-2 run (up from 33B tokens), which went great until 330B (reaching 61% HellaSwag, way above GPT-2 and GPT-3 of this size) and then exploded shortly after this plot, which I am looking into now :)
Andrej Karpathy tweet media
English
124
749
6.3K
724K
Emre retweeted
Jeff Barr ☁️
Jeff Barr ☁️@jeffbarr·
Thank you to everyone who brought this article to our attention. We agree that customers should not have to pay for unauthorized requests that they did not initiate. We’ll have more to share on exactly how we’ll help prevent these charges shortly. #AWS #S3 How an empty S3 bucket can make your AWS bill explode - medium.com/@maciej.pocwie…
English
83
542
3.4K
1.3M
Emre retweeted
nano
nano@nanulled·
My speculation: GPT2 is an advanced multi-transformer architecture that combines two transformers (Find and Replace). The results speak for themselves. This is from a paper that was published by anonymous authors.
nano tweet media
English
12
23
197
34.6K
Emre retweeted
dr. jack morris
dr. jack morris@jxmnop·
one of the most important things I know about deep learning I learned from this paper: "Pretraining Without Attention"
this is what I found so surprising: these people developed an architecture very different from Transformers called BiGS, spent months and months optimizing it and training different configurations, only to discover that at the same parameter count, a wildly different architecture produces identical performance to transformers
this may imply that as long as there are enough parameters, and things are reasonably well-conditioned (i.e. a decent number of nonlinearities and connections between the pieces) then it really doesn't matter how you arrange them, i.e. any sufficiently good architecture works just fine
i feel there's something really deep here, and we may be already very close to the upper bound of how well we can approximate a given function given a certain amount of compute. so we should spend more time thinking about other questions, such as what that function should actually look like (what data? which objective function?) and how to make it more efficient
dr. jack morris tweet media
English
93
408
3.1K
489.2K
Emre retweeted
Ian Johnson 🔬🤖
Ian Johnson 🔬🤖@enjalot·
Where do dads keep all of their jokes? In a dad-a-base! But what does a dadabase look like when you try to retrieve a joke? Introducing Latent Scope: a new open source instrument for visualizing unstructured data
English
2
12
42
5.6K
Emre retweeted
Jason Citron
Jason Citron@jasoncitron·
Big news for developers today on Discord. We’ve opened up the developer preview for user installable apps as well as HTML5 experiences for apps. This dramatically changes what’s possible to build on Discord. I can’t wait to see what y’all come up with! discord.com/developers/doc…
English
42
72
666
265.5K
Emre retweeted
Robert Lukoszko
Robert Lukoszko@Karmedge·
If you look deeper, @GroqInc and 500 tokens / sec mixtral tech was founded by the same person who created TPU for @GoogleAI. Tensor Processing Unit – the core thing Google AI servers most likely rely on. Whatever Jonathan Ross is about to do is about to change the AI chip industry. It's already been 9 years since Groq was founded. Beast is about to be unleashed.
English
15
50
377
104.1K