Kris
@AllAbtAI

Generative AI 🤖 ⌨️ YouTube: https://t.co/NCI4K1wJye

Joined September 2021
80 Following · 3.1K Followers · 411 posts
Kris
Kris@AllAbtAI·
Very promising results from OpenAI and the o3 (!) model. 87.5% on ARC-AGI seems kinda shocking. Excited to hear more from tests and to try it out in early 2025.
1 reply · 0 reposts · 9 likes · 1.1K views
SpaceX
SpaceX@SpaceX·
Mechazilla has caught the Super Heavy booster!
11.4K replies · 61.5K reposts · 249K likes · 45.1M views
Kris retweeted
Cursor
Cursor@cursor_ai·
OpenAI’s new o1 models are available in Cursor! We’ve found o1 to be excellent at well-specified, reasoning-intense problems. We still recommend sonnet/4o for most tasks. We’re initially rolling out the models with usage-based pricing but will iterate as rate limits increase.
116 replies · 338 reposts · 3.4K likes · 479K views
Kris
Kris@AllAbtAI·
these ai vids lately 😅
0 replies · 0 reposts · 3 likes · 907 views
Kris retweeted
AI at Meta
AI at Meta@AIatMeta·
Starting today, open source is leading the way. Introducing Llama 3.1: our most capable models yet.

Today we’re releasing a collection of new Llama 3.1 models including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K-token context window and improved support for 8 languages, among other improvements. Llama 3.1 405B rivals leading closed-source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.

The models are available to download now directly from Meta or @huggingface. With today’s release the ecosystem is also ready to go with 25+ partners rolling out our latest models — including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.

More details in the full announcement ➡️ go.fb.me/tpuhb6
Download Llama 3.1 models ➡️ go.fb.me/vq04tr

With these releases we’re setting the stage for unprecedented new opportunities and we can’t wait to see the innovation our newest models will unlock across all levels of the AI community.
261 replies · 1.4K reposts · 5.6K likes · 1.3M views
Kris
Kris@AllAbtAI·
🔥🔥
Andrej Karpathy@karpathy

📽️ New 4-hour (lol) video lecture on YouTube: "Let’s reproduce GPT-2 (124M)" youtu.be/l8pRSuU81PU

The video ended up so long because it is... comprehensive: we start with an empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing the GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.

In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

GitHub. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step. github.com/karpathy/build…

Chapters. On a high level, Section 1 is building up the network (a lot of this might be review), Section 2 is making the training fast, Section 3 is setting up the run, and Section 4 is the results. In more detail:

00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo
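The "warmup + cosine decay" scheduler named in the 02:21:06 chapter can be sketched in a few lines. This is a minimal illustration of the technique, not code from the lecture; all the constants below are placeholder values, not the run's actual hyperparameters.

```python
import math

# Illustrative placeholder hyperparameters (not the lecture's values).
max_lr = 6e-4
min_lr = max_lr * 0.1
warmup_steps = 10
max_steps = 50

def get_lr(step: int) -> float:
    # 1) Linear warmup: ramp from 0 up to max_lr over warmup_steps.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # 2) Past the schedule's end, hold at the floor learning rate.
    if step > max_steps:
        return min_lr
    # 3) Cosine decay from max_lr down to min_lr in between.
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 → 0
    return min_lr + coeff * (max_lr - min_lr)
```

Each optimizer step would then set this value on the param groups before calling `optimizer.step()`.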

0 replies · 0 reposts · 1 like · 516 views
Matthew Berman
Matthew Berman@MatthewBerman·
The BEAST has arrived.

Intel Xeon W7-2475X (37.5 MB cache, 20 cores, 40 threads, 2.6 GHz to 4.8 GHz Turbo, 225 W)
Dual Nvidia RTX 6000 Ada Generation, 48 GB GDDR6, 4 DP 🔥🔥
128GB (8x16GB) DDR5, 4800MHz, RDIMM ECC Memory
2x 4TB M.2 PCIe NVMe SSD, Class 40

We are going to push this monster to the limit. Thanks to @nvidia and @Dell for my dream machine.
58 replies · 6 reposts · 238 likes · 25.6K views
Kris
Kris@AllAbtAI·
I created a tutorial on how to set up and use Ollama as a code assistant in VS Code using the Continue extension✨

✅ Full Tutorial
✅ Free Local AI Code Assistant
✅ Mistral AI Codestral 22b
✅ Full Video on YT
4 replies · 9 reposts · 75 likes · 9.4K views
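A minimal sketch of the setup the tutorial describes, assuming Continue's `config.json` model schema and the Ollama model tag `codestral:22b` — both of which vary by version, so treat this fragment as illustrative rather than exact:

```json
{
  "models": [
    {
      "title": "Codestral 22B (local via Ollama)",
      "provider": "ollama",
      "model": "codestral:22b"
    }
  ]
}
```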
Kris retweeted
Andrew Ng
Andrew Ng@AndrewYNg·
New agentic short course! Multi AI Agent Systems with crewAI, built with @crewAIInc's founder and CEO @joaomdmoura.

In this course, you'll learn how to break down complex tasks into subtasks for multiple AI agents, each playing a specialized role, to execute. For example, to generate a research report, you might have researcher, writer, and quality assurance agents collaborate. You'll define the roles, expectations, and interactions between the agents—like a manager organizing a team.

You'll work with key agentic AI techniques like role-playing, tool use, memory, guardrails, and cross-agent collaboration. And you'll build your own multi-agent systems that can tackle complex tasks. I think you'll find it both productive and fun to design agents and watch them collaborate to get things done. Let me know what you think!

I believe multi-agent architectures will drive significant progress in AI systems. Please sign up here! deeplearning.ai/short-courses/…
85 replies · 268 reposts · 1.5K likes · 349.4K views
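The researcher → writer → QA decomposition described in the course announcement can be sketched with plain-Python stand-ins for the agents. In crewAI each role would be backed by an LLM; here every function body is a canned stub, and all names are illustrative, not the course's or library's API.

```python
# Toy sketch of a role-based multi-agent pipeline: each "agent" is a
# specialized step, and a coordinator sequences their handoffs.

def researcher(topic: str) -> list[str]:
    """Researcher role: gather raw notes on the topic (stubbed facts)."""
    return [f"{topic}: definition", f"{topic}: key example"]

def writer(notes: list[str]) -> str:
    """Writer role: draft a report from the researcher's notes."""
    return "Report: " + "; ".join(notes)

def qa(draft: str) -> str:
    """QA role: enforce a simple guardrail before the draft ships."""
    assert draft.startswith("Report:"), "draft missing required header"
    return draft.strip()

def crew(topic: str) -> str:
    # The 'manager' wiring: each agent's output feeds the next role.
    return qa(writer(researcher(topic)))
```

With real LLM-backed agents the wiring stays the same; only the function bodies change.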