Matt Jones
@drewfustin
218 posts
Joined April 2018
11 Following · 2 Followers
Matt Jones retweeted
Philipp Schmid
Philipp Schmid@_philschmid·
Mini-R1: Reproduce the @deepseek_ai R1 "aha moment", an RL tutorial! Recreate an RL "aha moment" using Group Relative Policy Optimization (GRPO) and train an open model with reinforcement learning to teach it self-verification and search abilities, all on its own, to solve the Countdown Game.
TL;DR:
🤯 DeepSeek R1's "aha moment" demonstrates RL's potential for self-improvement in LLMs.
2️⃣ Uses 2 reward functions: 1x for format (<think>/<answer> tags) and 1x for correctness.
🤖 A Qwen2.5-3B-Instruct model learns self-verification and search abilities.
⚙️ Uses @MSFTDeepSpeed and @vllm_project for efficient, distributed online RL training with @huggingface TRL.
🤟 Includes training observations and hyperparameter improvements.
🧮 Uses the Countdown Game (arithmetic puzzles) to teach models self-correction via <think> and <answer> tags.
📊 Achieves a 50% success rate after 450 training steps.
⚡ Training takes ~6 hours on 4x H100 GPUs for 450 steps.
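The two reward signals described above can be sketched as plain Python functions. This is a minimal illustration, not the tutorial's actual code; the function names and the exact format pattern are assumptions, and TRL's real reward interface operates on batches of completions.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>...</think><answer>...</answer> format."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def correctness_reward(completion: str, target: int, numbers: list) -> float:
    """1.0 if the <answer> equation uses exactly the given numbers and hits the target."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not m:
        return 0.0
    equation = m.group(1).strip()
    # Allow only digits, arithmetic operators, parentheses, and whitespace.
    if not re.fullmatch(r"[\d+\-*/().\s]+", equation):
        return 0.0
    # Each provided number must be used exactly once.
    used = sorted(int(n) for n in re.findall(r"\d+", equation))
    if used != sorted(numbers):
        return 0.0
    try:
        return 1.0 if abs(eval(equation) - target) < 1e-6 else 0.0
    except Exception:
        return 0.0

demo = "<think>Try (100 - 10) / 2 = 45.</think><answer>(100 - 10) / 2</answer>"
print(format_reward(demo), correctness_reward(demo, 45, [100, 10, 2]))
```

In GRPO, rewards like these are computed per completion within a sampled group, and each completion's advantage is its reward relative to the group mean, so no separate value model is needed.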
[image]
30 replies · 150 reposts · 811 likes · 77.2K views
Matt Jones retweeted
Rohan Paul
Rohan Paul@rohanpaul_ai·
NetworkX is one of THE most popular Python graph analytics libraries, with ~15K GitHub stars and 80M monthly downloads. The library is for working with networks and graphs: it helps analyze connections between things, like social networks, computer networks, or any system where objects are connected to each other. And NetworkX just got massively accelerated through a backend integration with NVIDIA's cuGraph.
✨ Up to 500x speedups on large graph workloads in NetworkX, with zero code changes.
📌 cuGraph is NVIDIA's GPU-accelerated graph analytics library within the RAPIDS ecosystem. It provides fast graph algorithms on GPUs, supporting property graphs, remote operations, and graph neural networks (GNNs). It works with GPU DataFrames (cuDF) and integrates smoothly with the NetworkX API.
📌 The traditional bottleneck of NetworkX's pure Python implementation becomes apparent when processing graphs larger than 100K nodes and 1M edges.
📌 cuGraph solves this by offloading supported algorithms to the GPU. PageRank, Louvain community detection, betweenness centrality, and about 60 other algorithms get instant acceleration.
📌 This acceleration enables previously impractical use cases. Fraud detection systems can now process massive transaction networks in real time. Recommendation engines handle millions of user-item interactions efficiently. Social network analysis scales to entire platforms' worth of data on a single machine.
@NVIDIAAIDev
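"Zero code changes" works through NetworkX's backend dispatch mechanism. A minimal sketch, run here on the default CPU backend (the GPU routing shown in comments assumes the `nx-cugraph` package and a CUDA GPU, which this snippet does not require):

```python
import networkx as nx

# Small demo graph; real speedups appear at millions of edges.
G = nx.karate_club_graph()

# Unchanged user code: ordinary NetworkX call on the default backend.
pr = nx.pagerank(G, alpha=0.85)

# With nx-cugraph installed and a GPU available, the same call can be
# routed to cuGraph, e.g. via an environment variable that enables
# automatic dispatch, or an explicit backend keyword:
#   pr = nx.pagerank(G, alpha=0.85, backend="cugraph")

print(len(pr), round(sum(pr.values()), 4))
```

The design choice is that acceleration lives behind the familiar API: the algorithm call site stays identical, and only the dispatch configuration decides whether it runs in pure Python or on the GPU.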
[image]
10 replies · 154 reposts · 916 likes · 63K views
Matt Jones retweeted
Heng Li
Heng Li@lh3lh3·
Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with the index, do affine-gap alignment, and retrieve similar local haplotypes. 7.3 Tb of commonly sequenced bacterial genomes ⇒ 30 GB. arxiv.org/abs/2409.00613
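For readers unfamiliar with the data structure at the core of this: a toy Burrows–Wheeler transform, shown here in the naive rotation-sorting form. This is purely illustrative; the whole point of the preprint is that terabase-scale construction must avoid materializing rotations like this.

```python
def bwt(s: str) -> str:
    """Naive BWT: sort all rotations of s + sentinel, take the last column.

    Quadratic-memory toy version. Production tools build the BWT without
    ever holding all rotations, which is the hard part at terabase scale.
    """
    s += "$"  # sentinel, lexicographically smallest character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # -> annb$aa
```

The transformed string groups identical characters into runs, which is why BWT-based indexes compress repetitive collections (like 100 human genomes) so well while still supporting exact-match queries such as SMEM finding.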
[image]
9 replies · 217 reposts · 714 likes · 192.8K views
Matt Jones retweeted
elvis
elvis@omarsar0·
o3-mini-high (left) vs. deepseek-r1 (right), results from the first try. deepseek-r1 is cracked... wtf!
103 replies · 172 reposts · 2.4K likes · 719.7K views
Matt Jones retweeted
Xiang Yue
Xiang Yue@xiangyue96·
Introducing Critique Fine-Tuning (CFT): a more effective SFT method for enhancing LLMs' reasoning abilities.
📄 Paper: arxiv.org/pdf/2501.17703
CFT is simple: instead of training models to directly answer questions, we train them to critique noisy answers.
What's fascinating is that while most approaches focus on using generative critique or reward models to provide feedback for policy models, these critique models can themselves serve as policy models: directly answering questions with stronger reasoning.
Interestingly, we also found that CFT saturates quickly: overtraining on critiques can even degrade problem-solving performance.
Work led by @YuboWang726 in collaboration with @WenhuChen.
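The difference between standard SFT and CFT is essentially how training examples are assembled. A minimal sketch of that data construction; the field names and prompt template are illustrative assumptions, not the paper's actual schema:

```python
def make_sft_example(question: str, answer: str) -> dict:
    # Standard SFT: the model learns to produce the answer directly.
    return {"input": question, "target": answer}

def make_cft_example(question: str, noisy_answer: str, critique: str) -> dict:
    # CFT: the model sees the question plus a candidate (possibly wrong)
    # answer, and learns to produce a critique of it instead.
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {noisy_answer}\n"
        f"Critique this answer."
    )
    return {"input": prompt, "target": critique}

ex = make_cft_example("What is 12 * 13?", "144",
                      "Incorrect: 12 * 13 = 156, not 144.")
print(ex["input"])
print(ex["target"])
```

The surprising result the tweet highlights is that a model trained only on the second kind of example can then be used like the first kind, answering questions directly with stronger reasoning.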
[image]
11 replies · 67 reposts · 305 likes · 23.2K views
Matt Jones retweeted
Jürgen Schmidhuber
Jürgen Schmidhuber@SchmidhuberAI·
DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3], which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.
REFERENCES (easy to find on the web):
[1] #DeepSeekR1 (2025): Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2501.12948
[2] J. Schmidhuber (JS, 2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv 1210.0118. Sec. 5.3 describes the reinforcement learning (RL) prompt engineer, which learns to actively and iteratively query its model for abstract reasoning, planning, and decision making.
[3] JS (2018). One Big Net For Everything. arXiv 1802.08864. See also US11853886B2. This paper collapses the reinforcement learner and the world model of [2] (e.g., a foundation model) into a single network, using the neural network distillation procedure of 1991 [4]. Essentially what's now called an RL "Chain of Thought" system, where subsequent improvements are continually distilled into a single net. See also [5].
[4] JS (1991). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. Based on TR FKI-148-91, TUM, 1991. First working deep learner based on a deep recurrent neural net hierarchy (with different self-organising time scales), overcoming the vanishing gradient problem through unsupervised pre-training (the P in ChatGPT) and predictive coding. Also: compressing or distilling a teacher net (the chunker) into a student net (the automatizer) that does not forget its old skills; such approaches are now widely used. See also [6].
[5] JS (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990, introducing high-dimensional reward signals and the GAN principle). Contains summaries of [2][3] above.
[6] JS (AI Blog, 2021). 30-year anniversary: First very deep learning with unsupervised pre-training (1991) [4]. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled [4] into a single deep neural network. 1993: solving problems of depth >1000.
[image]
280 replies · 891 reposts · 4.7K likes · 847K views
Matt Jones retweeted
ILIAS ISM
ILIAS ISM@illyism·
You don't need a reasoning model like R1 or o3: just use this .cursorrules with Claude Sonnet to add a thinking step. It works 100x better.
[image]
80 replies · 273 reposts · 4.9K likes · 557.6K views
Matt Jones retweeted
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
🔥 o3-mini-high beats deepseek r1 and o1-pro in a p5.js challenge! The o3-mini result is so good that it deserves a video of its own. deepseek r1 (bad result) and o1-pro (better) in comments below. Prompt in the last comment. 1/4
70 replies · 131 reposts · 1.2K likes · 463.2K views
Matt Jones retweeted
Flavio Adamo
Flavio Adamo@flavioAd·
🚨 o3-mini crushed DeepSeek R1 🚨 "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"
675 replies · 1.6K reposts · 18.5K likes · 5M views
Matt Jones retweeted
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arxiv coming on Monday. Link to a talk I gave on this below 👇 Super excited about this work!
[images]
19 replies · 142 reposts · 1K likes · 166.7K views
Matt Jones retweeted
Sam Altman
Sam Altman@sama·
o3-mini is out! smart, fast model. available in ChatGPT and API. it can search the web, and it shows its thinking. available to free-tier users! click the "reason" button. with ChatGPT plus, you can select "o3-mini-high", which thinks harder and gives better answers.
1.6K replies · 2K reposts · 26.1K likes · 3.2M views
Matt Jones retweeted
Seunghyun Seo
Seunghyun Seo@SeunghyunSEO7·
what up guys, I made a one-page comparison of MHA and MLA from @deepseek_ai for those who skipped the DS-V2 paper. pls correct me if I'm wrong.
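The practical difference such a comparison highlights is per-token KV-cache size: MHA caches full keys and values for every head, while MLA caches one shared low-rank latent (plus a small decoupled RoPE key). A back-of-the-envelope sketch with illustrative dimensions, not DeepSeek-V2's exact config:

```python
def mha_kv_cache_per_token(n_heads: int, d_head: int) -> int:
    # MHA caches a full K and a full V vector for every head.
    return 2 * n_heads * d_head

def mla_kv_cache_per_token(d_latent: int, d_rope: int = 0) -> int:
    # MLA caches one compressed latent shared by all heads, from which
    # per-head K and V are reconstructed, plus a small decoupled RoPE key.
    return d_latent + d_rope

mha = mha_kv_cache_per_token(n_heads=32, d_head=128)   # 8192 values/token
mla = mla_kv_cache_per_token(d_latent=512, d_rope=64)  # 576 values/token
print(mha, mla, round(mha / mla, 1))
```

Under these example dimensions the cache shrinks by roughly 14x, which is why MLA enables much longer contexts and larger batch sizes at inference time despite the extra up-projection work per step.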
[image]
4 replies · 48 reposts · 363 likes · 39.3K views
Matt Jones retweeted
Breeze
Breeze@BreezeChai·
Ascending to the Divine
[image]
1.5K replies · 27.5K reposts · 404.6K likes · 41.8M views
Matt Jones retweeted
LangChain
LangChain@LangChain·
📚🤖 Advanced RAG + Agents Cookbook A comprehensive open-source guide delivering production-ready implementations of cutting-edge RAG techniques with AI agents. Built with LangChain and LangGraph, it features advanced implementations like Hybrid, Self, and ReAct RAG. Learn more: github.com/athina-ai/rag-…
[image]
5 replies · 159 reposts · 703 likes · 61.1K views
Matt Jones retweeted
Andi Marafioti
Andi Marafioti@andimarafioti·
Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s🔥 Inspired by our team's effort to open-source DeepSeek's R1 training, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any of our SmolVLMs—or create your own custom VLMs!
[image]
34 replies · 213 reposts · 1.3K likes · 98.6K views
Matt Jones retweeted
AK
AK@_akhaliq·
OpenAI o3-mini System Card
[image]
11 replies · 69 reposts · 361 likes · 46.7K views
Matt Jones retweeted
Han Xiao
Han Xiao@hxiao·
Letter-dropping physics comparison: o3-mini vs. deepseek-r1 vs. claude-3.5 in one shot. Which is the best?
Prompt: Create a JavaScript animation of falling letters with realistic physics. The letters should:
* Appear randomly at the top of the screen with varying sizes
* Fall under Earth's gravity (9.8 m/s²)
* Have collision detection based on their actual letter shapes
* Interact with other letters, ground, and screen boundaries
* Have density properties similar to water
* Dynamically adapt to screen size changes
* Display on a dark background
154 replies · 254 reposts · 2.6K likes · 603.6K views
Matt Jones retweeted
elvis
elvis@omarsar0·
AI Agents for Computer Use
This report provides a comprehensive overview of the emerging field of instruction-based computer control, examining available agents: their taxonomy, development, and resources.
[image]
15 replies · 141 reposts · 658 likes · 65.5K views
Matt Jones retweeted
Gabriel Massadas
Gabriel Massadas@G4brym·
Gemini 2.0 doesn’t get nearly enough credit. I just dumped all my workers-qb source code into it, hit it with a simple, humble prompt, and boom => it one-shotted the docs. Not just good docs, way better than what I had before, packed with examples. Kinda insane.
30 replies · 60 reposts · 719 likes · 115.4K views