dave

1.1K posts

@dvxdo

member of technical staff. building agents and evals in a loop

Joined August 2012

1.8K Following · 576 Followers
dave retweeted
Lou
Lou@louszbd·
interesting new work from Alibaba and WHU (Agentic Memory). most agent memory systems now are basically hardcoded infra, vector db + hand-written rules for when to store/delete/summarize. the model never gets to touch any of it. they made memory ops into actions. add, delete, update, retrieve, summarize, filter, same as calling a tool. then RL trains the whole thing end to end. the neat part is the model discovers on its own that it should proactively clean up its context when things get noisy. nobody wrote a "if tokens > 4k then summarize" rule. And it just emerged from the reward signal. makes you wonder how many other parts of the RAG pipeline are secretly just learnable actions we've been hand-coding for no good reason. arxiv.org/abs/2601.01885
Lou tweet media
35
96
724
46.7K
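The memory-ops-as-tools idea above can be sketched in a few lines. This is a hypothetical stand-in, not the paper's actual API: keyword matching stands in for vector search, string joining stands in for an LLM summarization call, and the point is only that each operation is an ordinary tool the policy can learn to invoke.

```python
class AgentMemory:
    """Toy memory whose operations are actions a policy can choose."""

    def __init__(self):
        self.entries = {}   # id -> text
        self._next_id = 0

    def add(self, text):
        self._next_id += 1
        self.entries[self._next_id] = text
        return self._next_id

    def update(self, entry_id, text):
        self.entries[entry_id] = text

    def delete(self, entry_id):
        self.entries.pop(entry_id, None)

    def retrieve(self, query):
        # Keyword match standing in for vector search.
        return [t for t in self.entries.values() if query.lower() in t.lower()]

    def summarize(self):
        # Stand-in for an LLM call that compresses noisy context.
        merged = " | ".join(self.entries.values())
        self.entries = {0: merged}
        self._next_id = 0


# Exposed to the model as ordinary tools; RL decides when to call each,
# rather than an "if tokens > 4k then summarize" rule deciding for it.
TOOLS = ["add", "update", "delete", "retrieve", "summarize"]
```

Under that framing, "proactively clean up context" is just the policy learning to emit `summarize` or `delete` calls when they improve reward.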
Ebuka l Socrates🦙🐍🦜🦀
"Imagine a 17-year-old boy built an LLM @OkeyMeta from scratch" Meanwhile it is a llama3 model from @GroqInc inference via API Y'all are giving a platform and publicity to a FRAUDULENT PERSON and using age sentiment as a waiver @real_okechukwu you cannot deceive everyone
Tosin Olugbenga@TosinOlugbenga

I honestly love how creative this generation is. Imagine a 17-year-old boy built an LLM @OkeyMeta from scratch. At 17! This is the kind of story that makes you stop and think about how much potential young people carry. Join me this Thursday with @real_okechukwu as we hear his journey firsthand. 🔗 x.com/i/spaces/1lpkq…

41
36
295
73.8K
dave
dave@dvxdo·
Happy to know you took the time to read the paper :) Yes, the model is part of the GPT-J family; we never stated we invented a new architecture. Similarly, almost every SoTA model out today is based on an MoE architecture, and the likes of Mistral, Gemma, and Qwen are all based on Llama's architecture
1
0
0
31
dave retweeted
nanda
nanda@nandafyi·
New post 🎉 Going back to my roots on writing about the inner workings of things, a breakdown of key-value databases and how you might make one from scratch: nan.fyi/database
nanda tweet media
70
250
3K
328.7K
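A minimal sketch of the core trick behind many from-scratch key-value stores: an append-only log of records plus an in-memory index mapping each key to the offset of its latest record, with compaction to drop superseded entries. This is an illustrative toy, not code from the linked post.

```python
class TinyKV:
    """Log-structured key-value store, in-memory for illustration."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> offset of the latest record for that key

    def set(self, key, value):
        # Writes never mutate old records; they append and repoint the index.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        if key not in self.index:
            raise KeyError(key)
        return self.log[self.index[key]][1]

    def compact(self):
        # Rewrite the log keeping only the latest record per key.
        live = [(k, self.log[off][1]) for k, off in self.index.items()]
        self.log, self.index = [], {}
        for k, v in live:
            self.set(k, v)
```

On disk, `log` becomes an append-only file and `index` maps keys to byte offsets; the append-only write path is what makes this design fast.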
dave retweeted
Andrew Ng
Andrew Ng@AndrewYNg·
Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system's performance) and error analysis (identifying the causes of errors). It's tempting to shortcut these processes and to quickly attempt fixes to mistakes rather than slowing down to identify the root causes. But evals and error analysis can lead to much faster progress.

In this first of a two-part letter, I'll share some best practices for finding and addressing issues in agentic systems. Even though error analysis has long been an important part of building supervised learning systems, it is still underappreciated compared to, say, using the latest and buzziest tools. Identifying the root causes of particular kinds of errors might seem "boring," but it pays off!

If you are not yet persuaded that error analysis is important, permit me to point out:

- To master a composition on a musical instrument, you don't only play the same piece from start to end. Instead, you identify where you're stumbling and practice those parts more.
- To be healthy, you don't just build your diet around the latest nutrition fads. You also ask your doctor about your bloodwork to see if anything is amiss. (I did this last month and am happy to report I'm in good health! 😃)
- To improve your sports team's performance, you don't just practice trick shots. Instead, you review game films to spot gaps and then address them.

To improve your agentic AI system, don't just stack up the latest buzzy techniques that just went viral on social media (though I find it fun to experiment with buzzy AI techniques as much as the next person!). Instead, use error analysis to figure out where it's falling short, and focus on that.

Before analyzing errors, we first have to decide what is an error. So the first step is to put in evals. I'll focus on that for the remainder of this letter and discuss error analysis next week.

If you are using supervised learning to train a binary classifier, the number of ways the algorithm could make a mistake is limited. It could output 0 instead of 1, or vice versa. There is also a handful of standard metrics, like accuracy, precision, recall, F1, and ROC, that apply to many problems. So as long as you know the test distribution, evals are relatively straightforward, and much of the work of error analysis lies in identifying what types of input an algorithm fails on, which also leads to data-centric AI techniques for acquiring more data to augment the algorithm in areas where it's weak.

With generative AI, a lot of intuitions from evals and error analysis of supervised learning carry over (history doesn't repeat itself, but it rhymes), and developers who are already familiar with machine learning and deep learning often adapt to generative AI faster than people who are starting from scratch. But one new challenge is that the space of outputs is much richer, so there are many more ways an algorithm's output might be wrong.

Take the example of automated processing of financial invoices, where we use an agentic workflow to populate a financial database with information from received invoices. Will the algorithm incorrectly extract the invoice due date? Or the final amount? Or mistake the payer address for the biller address? Or get the currency wrong? Or make the wrong API call so the verification process fails? Because the output space is much larger, the number of failure modes is also much larger. Rather than defining an error metric ahead of time, it is therefore typically more effective to first quickly build a prototype, then manually examine a handful of agent outputs to see where it performs well and where it stumbles.

This allows you to focus on building datasets and error metrics (sometimes objective metrics implemented in code, and sometimes subjective metrics using LLM-as-judge) to check the system's performance in the dimensions you are most concerned about. In supervised learning, we sometimes tune the error metric to better reflect what humans care about. With agentic workflows, I find tuning evals to be even more iterative, with more frequent tweaks to the evals to capture the wider range of things that can go wrong. I discuss this and other best practices in detail in Module 4 of the Agentic AI course on deeplearning.ai that we announced last week.

After building evals, you now have a measurement of your system's performance, which provides a foundation for trying different modifications to your agent, as you can now measure what makes a difference. The next step is then to perform error analysis to pinpoint what changes to focus your development efforts on. I'll discuss this further next week.

[Original text: deeplearning.ai/the-batch/issu… ]
85
286
1.7K
311.9K
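The "put in evals first" step can be as small as a per-field accuracy check. A hedged sketch for the invoice-extraction example: the field names, labels, and predictions below are hypothetical stand-ins for a real labeled eval set and real agent outputs.

```python
def field_accuracy(predictions, labels, fields):
    """Objective, code-implemented metric: per-field exact-match accuracy."""
    scores = {}
    for f in fields:
        correct = sum(p.get(f) == l.get(f) for p, l in zip(predictions, labels))
        scores[f] = correct / len(labels)
    return scores


# Hand-labeled ground truth for two invoices (illustrative data).
labels = [
    {"due_date": "2025-01-15", "amount": "120.00", "currency": "USD"},
    {"due_date": "2025-02-01", "amount": "80.50", "currency": "EUR"},
]
# Imagine these came from the agent under test.
predictions = [
    {"due_date": "2025-01-15", "amount": "120.00", "currency": "USD"},
    {"due_date": "2025-02-01", "amount": "80.50", "currency": "USD"},
]

scores = field_accuracy(predictions, labels, ["due_date", "amount", "currency"])
# Low-scoring fields (here, currency) are where error analysis starts.
```

Fields that resist exact matching (free-text addresses, say) are where an LLM-as-judge metric would replace the `==` comparison.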
dave retweeted
Tom Yeh
Tom Yeh@ProfTomYeh·
Copy-pasting PyTorch code is fast — using an AI coding model is even faster — but both skip the learning. That's why I asked my students to write by hand ✍️. 🔽 Download: byhand.ai/pytorch After the exercise, my students can understand what every line really does and connect it to the math. You can too!
9
75
521
34K
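In the spirit of the exercise: what a line like `y = torch.relu(linear(x))` "really does", written out by hand so each operation maps one-to-one onto the math y = max(0, Wx + b). Pure Python with illustrative numbers; this is a sketch of the idea, not material from the linked worksheets.

```python
def linear_relu(W, b, x):
    # Wx + b: each output is a dot product of a weight row with x, plus a bias.
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi
           for row, bi in zip(W, b)]
    # ReLU: clamp negatives to zero, elementwise.
    return [max(0.0, p) for p in pre]


W = [[1.0, -1.0],   # 2x2 weight matrix (would be nn.Linear(2, 2).weight)
     [0.5,  0.5]]
b = [0.0, -2.0]     # bias vector
x = [3.0, 1.0]      # input
y = linear_relu(W, b, x)
```

Writing it this way makes the shapes obvious: one output per weight row, one multiply-add per weight entry.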
dave retweeted
Unsloth AI
Unsloth AI@UnslothAI·
LoRA in reinforcement learning (RL) can match full-finetuning performance when done right! 💡 A new @thinkymachines post shows how using 10x larger learning rates, applying LoRA on all layers & more, LoRA at rank=1 even works. We're excited to have collaborated on this blog!
Unsloth AI tweet media
Thinking Machines@thinkymachines

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/

20
146
953
72.7K
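The LoRA update under discussion is a small piece of math: the frozen weight W gains a low-rank additive term (alpha/r)·B·A, which at rank r=1 reduces to an outer product of a column vector and a row vector. A dependency-free sketch with illustrative numbers; note that real LoRA initializes B to zero so training starts exactly at the base model, while B is nonzero here only so the delta is visible.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


d, r, alpha = 3, 1, 2.0            # hidden size, LoRA rank, scaling numerator
W = [[1.0, 0.0, 0.0],              # frozen d x d base weight (identity here)
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A = [[0.0, 1.0, 0.0]]              # r x d, trainable "down" projection
B = [[0.5], [0.0], [0.0]]          # d x r, trainable "up" projection


def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    base = matvec(W, x)            # frozen path
    low = matvec(A, x)             # project to r dimensions
    delta = matvec(B, low)         # project back to d dimensions
    return [b + (alpha / r) * dl for b, dl in zip(base, delta)]
```

At r=1 the adapter adds only 2d trainable parameters per matrix, which is why the "LoRA on all layers, larger learning rate" recipe is so cheap to try.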
dave retweeted
dave
dave@dvxdo·
quality slop
Ahmad@TheAhmadOsman

> be you
> want to actually learn how LLMs work
> sick of “just start with linear algebra and come back in 5 years”
> decide to build my own roadmap
> no fluff. no detours. no 200-hour generic ML playlists
> just the stuff that actually gets you from “what’s a token?” to “I trained a mini-GPT with LoRA adapters and FlashAttention”
> goal: build, fine-tune, and ship LLMs
> not vibe with them. not "learn the theory" forever
> build them
> you will:
> > build an autograd engine from scratch
> > write a mini-GPT from scratch
> > implement LoRA and fine-tune a model on real data
> > hate CUDA at least once
> > cry
> > keep going
> 5 phases
> if you already know something? skip
> if you're lost? rewatch
> if you’re stuck? use DeepResearch
> this is a roadmap, not a leash
> by the end: you either built the thing or you didn’t
> phase 0: foundations
> > if matrix multiplication is scary, you’re not ready yet
> > watch 3Blue1Brown’s linear algebra series
> > MIT 18.06 with Strang, yes, he’s still the GOAT
> > code Micrograd from scratch (Karpathy)
> > train a mini-MLP on MNIST
> > no frameworks, no shortcuts, no mercy
> phase 1: transformers
> > the name is scary
> > it’s just stacked matrix multiplies and attention blocks
> > Jay Alammar + 3Blue1Brown for the “aha”
> > Stanford CS224N for the theory
> > read "Attention Is All You Need" only AFTER building mental models
> > Karpathy's "Let's Build GPT" will break your brain in a good way
> > project: build a decoder-only GPT from scratch
> > bonus: swap tokenizers, try BPE/SentencePiece
> phase 2: scaling
> > LLMs got good by scaling, not magic
> > Kaplan paper -> Chinchilla paper
> > learn Data, Tensor, Pipeline parallelism
> > spin up multi-GPU jobs using HuggingFace Accelerate
> > run into VRAM issues
> > fix them
> > welcome to real training hell
> phase 3: alignment & fine-tuning
> > RLHF: OpenAI blog -> Ouyang paper
> > SFT -> reward model -> PPO (don’t get lost here)
> > Anthropic's Constitutional AI = smart constraints
> > LoRA/QLoRA: read, implement, inject into HuggingFace models
> > fine-tune on real data
> > project: fine-tune gpt2 or distilbert with your own adapters
> > not toy examples. real use cases or bust
> phase 4: production
> this is the part people skip to, but you earned it
> inference optimization: FlashAttention, quantization, sub-second latency
> read the paper, test with quantized models
> resources:
> math/coding:
> > 3Blue1Brown, MIT 18.06, Goodfellow’s book
> PyTorch:
> > Karpathy, Zero to Mastery
> transformers:
> > Alammar, Karpathy, CS224N, Vaswani et al
> scaling:
> > Kaplan, Chinchilla, HuggingFace Accelerate
> alignment:
> > OpenAI, Anthropic, LoRA, QLoRA
> inference:
> > FlashAttention
> the endgame:
> > understand how these models actually work
> > see through hype
> > ignore LinkedIn noise
> > build tooling
> > train real stuff
> > ship your own stack
> > look at a paper and think “yeah I get it”
> > build your own AI assistant, infra, whatever
> make it all the way through?
> ship something real?
> DM me.
> I wanna see what you built.
> happy hacking.

0
0
3
80
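The "build an autograd engine from scratch" step in that roadmap's phase 0 fits in a few dozen lines. A micrograd-style sketch for illustration (scalars, add and mul only): each value records its parents and the local gradient of the op that produced it, and backprop walks the graph in reverse topological order applying the chain rule.

```python
class Value:
    """Scalar with reverse-mode autodiff, micrograd-style."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # tuple of (parent, local_gradient) pairs

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Topological sort, then accumulate gradients via the chain rule.
        topo, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for p, _ in v._parents:
                    build(p)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for p, local_grad in v._parents:
                p.grad += local_grad * v.grad


a, b = Value(2.0), Value(3.0)
y = a * b + a        # dy/da = b + 1 = 4, dy/db = a = 2
y.backward()
```

Swap scalars for tensors and add a few more ops, and this is the skeleton PyTorch's autograd builds on.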
dave
dave@dvxdo·
I just realized that I have been building out a “demo project” with Kubernetes and load balancers even after squeezing 10k requests/sec out of it
0
0
0
53
dave
dave@dvxdo·
@aigeek__ @jobergum yeah, that was the initial pushback but then people started suggesting skipping evals altogether, instead of pushing for DIY evals
0
0
1
22
ai geek (wishesh) ⚡️
ai geek (wishesh) ⚡️@aigeek__·
@jobergum @DavidOkpare i think it is more of a revolt against the tools than against the fact that people need to start looking at their data more closely to get better results.
1
0
1
118
Jo Kristian Bergum
Jo Kristian Bergum@jobergum·
The upside of the eval wars is that it triggered creation of a lot of high alpha content
9
6
60
6K