Matt Jones

218 posts

@drewfustin

Joined April 2018

11 Following, 2 Followers

Matt Jones retweeted

Philipp Schmid@_philschmid·Jan 30

Mini-R1: Reproduce the @deepseek_ai R1 "aha moment", an RL tutorial! Recreate an RL "aha moment" using Group Relative Policy Optimization (GRPO) and train an open model using reinforcement learning to teach it self-verification and search abilities all on its own to solve the Countdown Game.
TL;DR:
🤯 DeepSeek R1's "aha moment" demonstrates RL's potential for self-improvement in LLMs.
2️⃣ Uses 2 reward functions, 1x for format (<think>, <answer>) and 1x for correctness.
🤖 Qwen2.5-3B-Instruct model learns self-verification and search abilities.
⚙️ Uses @MSFTDeepSpeed and @vllm_project for efficient and distributed online RL training with @huggingface TRL.
🤟 Includes training observations and hyperparameter improvements.
🧮 Uses the Countdown Game (arithmetic puzzles) to teach models self-correction via <think> and <answer> tags.
📊 Achieves a 50% success rate after 450 training steps on 4x H100 GPUs.
⚡ Training takes ~6 hours on 4x H100 GPUs for 450 steps.
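To make the two-reward idea concrete, here is a minimal sketch of what a format reward and a correctness reward for the Countdown Game might look like. It is not the tutorial's code: the function names, the `<think>`/`<answer>` tag layout, and the 0/1 scoring are all assumptions chosen for illustration.

```python
import re

# Assumed layout: <think>reasoning</think> followed by <answer>equation</answer>.
FORMAT_RE = re.compile(
    r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def correctness_reward(completion: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if the answer uses each given number once and hits the target."""
    match = FORMAT_RE.match(completion.strip())
    if not match:
        return 0.0
    equation = match.group(1).strip()
    # Allow only digits, + - * /, parentheses, and spaces.
    if not re.fullmatch(r"[\d+\-*/() ]+", equation):
        return 0.0
    # Reject power and floor division, which are not Countdown operations.
    if "**" in equation or "//" in equation:
        return 0.0
    # Every provided number must appear exactly once.
    used = sorted(int(n) for n in re.findall(r"\d+", equation))
    if used != sorted(numbers):
        return 0.0
    try:
        # The character whitelist above keeps this eval to plain arithmetic.
        value = eval(equation, {"__builtins__": {}}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0


sample = "<think>25 + 50 = 75, 75 - 3 = 72</think>\n<answer>(25 + 50) - 3</answer>"
print(format_reward(sample))
print(correctness_reward(sample, [25, 50, 3], 72))
```

Keeping the two rewards separate means a completion can earn credit for following the format before it ever produces a correct equation, which is one plausible reason for splitting them.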

150

811

77.2K

Matt Jones retweeted

Rohan Paul@rohanpaul_ai·Oct 25

NetworkX is one of THE most popular Python graph analytics libraries, with ~15K GitHub stars and 80M downloads monthly. This library is for working with networks and graphs. It helps analyze connections between things, like social networks, computer networks, or any system where objects are connected to each other.
And now NetworkX just got massively accelerated after its backend integration with NVIDIA's cuGraph.
✨ Up to 500x speedups on large graph workloads in NetworkX with zero code changes.
📌 cuGraph is NVIDIA's GPU-accelerated graph analytics library within the RAPIDS ecosystem. The library provides fast graph algorithms on GPUs, supporting property graphs, remote operations, and graph neural networks (GNNs). It works with GPU DataFrames (cuDF) and integrates smoothly with a NetworkX-like API.
📌 The traditional bottleneck of NetworkX's pure Python implementation becomes apparent when processing graphs larger than 100K nodes and 1M edges.
📌 cuGraph solves this by offloading supported algorithms to the GPU. PageRank, Louvain community detection, betweenness centrality, and about 60 other algorithms get instant acceleration.
📌 This acceleration enables previously impractical use cases. Fraud detection systems can now process massive transaction networks in real time. Recommendation engines handle millions of user-item interactions efficiently. Social network analysis scales to an entire platform's worth of data on a single machine.
@NVIDIAAIDev
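For a sense of the kind of work being offloaded, here is a minimal pure-Python sketch of PageRank by power iteration. It is an illustration only, not NetworkX's or cuGraph's implementation; the function name, parameters, and edge-list input are assumptions. In NetworkX itself the equivalent call would be `nx.pagerank(G)`.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes its damped rank evenly along its outgoing edges.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        converged = sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
print(max(ranks, key=ranks.get))
```

The nested Python loops over nodes and edges are the part that becomes slow on graphs with millions of edges, which is the bottleneck the post describes.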

154

916

63K

Matt Jones retweeted

Heng Li@lh3lh3·Sep 4

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613
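As background on what "BWT construction and search" means, here is a textbook toy sketch of the Burrows-Wheeler transform and backward search for counting pattern occurrences. It is not the preprint's method, which targets terabase-scale inputs; the function names are hypothetical and the naive rotation sort only works for tiny strings.

```python
def bwt(text: str) -> str:
    """Build the BWT by sorting all rotations of text plus a '$' sentinel."""
    text += "$"  # '$' sorts before letters and marks the end of the text
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text via backward search."""
    # first_index[c]: number of characters in the text smaller than c.
    first_index = {}
    for i, ch in enumerate(sorted(bwt_text)):
        first_index.setdefault(ch, i)

    lo, hi = 0, len(bwt_text)  # current half-open range of matching rotations
    for ch in reversed(pattern):
        if ch not in first_index:
            return 0
        # Narrow the range using the number of ch in the BWT prefix.
        lo = first_index[ch] + bwt_text.count(ch, 0, lo)
        hi = first_index[ch] + bwt_text.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)
print(count_occurrences(transformed, "ana"))
```

Real indexes replace the `str.count` calls with precomputed rank structures so each step takes constant time, and they compress the BWT itself, which is where the storage savings quoted in the post come from.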

217

714

192.8K

Matt Jones retweeted

elvis@omarsar0·Feb 1

o3-mini-high (left) vs. deepseek-r1 (right) results from the first try. deepseek-r1 is cracked... wtf!

103

172

2.4K

719.7K

Matt Jones retweeted

Xiang Yue@xiangyue96·Jan 31

Introducing Critique Fine-Tuning (CFT): a more effective SFT method for enhancing LLMs' reasoning abilities. 📄 Paper: arxiv.org/pdf/2501.17703 CFT is simple: instead of training models to directly answer questions, we train them to critique noisy answers. What's fascinating is that while most approaches focus on using generative critique or reward models to provide feedback for policy models, these critique models can themselves serve as policy models: directly answering questions with stronger reasoning. Interestingly, we also found that CFT saturates quickly: overtraining on critiques can even degrade problem-solving performance. Work led by @YuboWang726 in collaboration with @WenhuChen

305

23.2K

Matt Jones retweeted

Unsloth AI@UnslothAI·Jan 31

Run DeepSeek-R1 (671B) locally on @OpenWebUI - Full Guide. No GPU required. Using our 1.58-bit Dynamic GGUF and llama.cpp. Tutorial: docs.openwebui.com/tutorials/inte…

176

838

67.6K

Matt Jones retweeted

Jürgen Schmidhuber@SchmidhuberAI·Jan 31

DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3] which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.
REFERENCES (easy to find on the web):
[1] #DeepSeekR1 (2025): Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2501.12948
[2] J. Schmidhuber (JS, 2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv 1210.0118. Sec. 5.3 describes the reinforcement learning (RL) prompt engineer which learns to actively and iteratively query its model for abstract reasoning and planning and decision making.
[3] JS (2018). One Big Net For Everything. arXiv 1802.08864. See also US11853886B2. This paper collapses the reinforcement learner and the world model of [2] (e.g., a foundation model) into a single network, using the neural network distillation procedure of 1991 [4]. Essentially what's now called an RL "Chain of Thought" system, where subsequent improvements are continually distilled into a single net. See also [5].
[4] JS (1991). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. Based on TR FKI-148-91, TUM, 1991. First working deep learner based on a deep recurrent neural net hierarchy (with different self-organising time scales), overcoming the vanishing gradient problem through unsupervised pre-training (the P in ChatGPT) and predictive coding. Also: compressing or distilling a teacher net (the chunker) into a student net (the automatizer) that does not forget its old skills - such approaches are now widely used. See also [6].
[5] JS (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990, introducing high-dimensional reward signals and the GAN principle). Contains summaries of [2][3] above.
[6] JS (AI Blog, 2021). 30-year anniversary: First very deep learning with unsupervised pre-training (1991) [4]. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled [4] into a single deep neural network. 1993: solving problems of depth >1000.

280

891

4.7K

847K

Matt Jones retweeted

ILIAS ISM@illyism·Feb 1

You don't need a reasoning model like R1 or o3, just use this .cursorrules with Claude Sonnet to add a thinking step, works 100x better.

273

4.9K

557.6K

Matt Jones retweeted

Ivan Fioravanti ᯅ@ivanfioravanti·Feb 1

🔥 o3-mini-high beats deepseek r1 and o1-pro in a p5.js challenge! The o3-mini result is so good that it deserves a video on its own. deepseek r1 (bad result) and o1-pro (better) in comments below. Prompt in last comment. 1/4

131

1.2K

463.2K

Matt Jones retweeted

Flavio Adamo@flavioAd·Feb 1

🚨 o3-mini crushed DeepSeek R1 🚨 "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"

675

1.6K

18.5K

Matt Jones retweeted

Dimitris Papailiopoulos@DimitrisPapail·Feb 2

Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arxiv coming on Monday. Link to a talk I gave on this below 👇 Super excited about this work!

142

166.7K

Matt Jones retweeted

Sam Altman@sama·Feb 1

o3-mini is out! smart, fast model. available in ChatGPT and API. it can search the web, and it shows its thinking. available to free-tier users! click the "reason" button. with ChatGPT plus, you can select "o3-mini-high", which thinks harder and gives better answers.

1.6K

26.1K

3.2M

Matt Jones retweeted

Seunghyun Seo@SeunghyunSEO7·Feb 1

what up guys, I made a one-page comparison of MHA and MLA from @deepseek_ai for those who skipped the DS-V2 paper. pls correct me if I'm wrong.

363

39.3K

Matt Jones retweeted

Breeze@BreezeChai·Feb 1

Ascending to the Divine

1.5K

27.5K

404.6K

41.8M

Matt Jones retweeted

LangChain@LangChain·Jan 31

📚🤖 Advanced RAG + Agents Cookbook
A comprehensive open-source guide delivering production-ready implementations of cutting-edge RAG techniques with AI agents. Built with LangChain and LangGraph, it features advanced implementations like Hybrid, Self, and ReAct RAG. Learn more: github.com/athina-ai/rag-…

159

703

61.1K

Matt Jones retweeted

Andi Marafioti@andimarafioti·Jan 31

Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s🔥 Inspired by our team's effort to open-source DeepSeek's R1 training, we are releasing the training and evaluation code on top of the weights 🫡 Now you can train any of our SmolVLMs—or create your own custom VLMs!

213

1.3K

98.6K

Matt Jones retweeted

AK@_akhaliq·Jan 31

OpenAI o3-mini System Card

361

46.7K

Matt Jones retweeted

Han Xiao@hxiao·Feb 1

Letter-dropping physics comparison: o3-mini vs. deepseek-r1 vs. claude-3.5 in one-shot - which is the best?
Prompt: Create a JavaScript animation of falling letters with realistic physics. The letters should:
* Appear randomly at the top of the screen with varying sizes
* Fall under Earth's gravity (9.8 m/s²)
* Have collision detection based on their actual letter shapes
* Interact with other letters, ground, and screen boundaries
* Have density properties similar to water
* Dynamically adapt to screen size changes
* Display on a dark background

154

254

2.6K

603.6K

Matt Jones retweeted

elvis@omarsar0·Feb 1

AI Agents for Computer Use
This report provides a comprehensive overview of the emerging field of instruction-based computer control, examining available agents: their taxonomy, development, and resources.

141

658

65.5K

Matt Jones retweeted

Gabriel Massadas@G4brym·Feb 2

Gemini 2.0 doesn’t get nearly enough credit. I just dumped all my workers-qb source code into it, hit it with a simple, humble prompt, and boom => it one-shotted the docs. Not just good docs, way better than what I had before, packed with examples. Kinda insane.

719

115.4K
