Max (@Maxmatical)
476 posts
llm training

Joined October 2013
631 Following · 56 Followers
Max@Maxmatical·
👀👀👀
[image]
Nathan Lambert@natolambert·
@thdxr olmo is far more open my guy (better licenses, 100% of the data released, etc.), but we also love nemotron! Nemotron's progress toward being more open is a huge win for the ecosystem.
Max@Maxmatical·
rip to the ai labs that are just qwen finetunes
elie@eliebakouch·
From a 2021 paper by @LiamFedus, @barret_zoph, and Noam Shazeer. Funny how 3) is still true today in the open ecosystem (only Megatron seems to have good MFU for DSv3/K2 scale and sparsity). We also had to wait ~3 years between the publication of this paper and the fix for 1) and 2) (DeepSeek MoE, DBRX, Mixtral, ...).
[image]
Max@Maxmatical·
i keep trying to use := in my code and it never works out well for me
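For context, a minimal sketch (purely illustrative, not from the thread) of where Python's walrus operator (`:=`) actually pays off — binding a value inside the expression that tests it — versus the many places it just hurts readability:

```python
import io

# Pattern 1: a read-loop that avoids repeating the call.
buf = io.StringIO("one\ntwo\nthree\n")
lines = []
while (line := buf.readline()):  # bind and test in one expression
    lines.append(line.strip())
print(lines)  # ['one', 'two', 'three']

# Pattern 2: reuse an expensive value inside a comprehension
# without computing it twice.
def expensive(x):
    return x * x

results = [y for x in range(5) if (y := expensive(x)) > 4]
print(results)  # [9, 16]
```

Outside these two patterns, `:=` mostly just smuggles an assignment into an expression, which is usually where it "never works out well".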
Max@Maxmatical·
@corbtt @winglian @jayendra_ram So you're comparing OOD perf vs training on in-distribution data and claiming frontier performance? Oof
Kyle Corbitt@corbtt·
@winglian @jayendra_ram Just talked to the researcher who built this. He held out 10 questions of the 100 and the final results we reported are just on those 10. The other comparison models in the first chart are also from running on those 10 questions only. We should have been much more explicit here.
Max@Maxmatical·
@winglian Training on test remains undefeated
Wing Lian (caseus)@winglian·
The best benefit of the doubt I can give is they are reporting the 10% (10 rows) of the benchmark data they held out from training against? github.com/OpenPipe/open_…
Max@Maxmatical·
@xeophon Remember to give reka $$$ if you want more models kthx
Max retweeted
Reka@RekaAILabs·
Reka Research is our AI agent that scours the web to answer your toughest questions. Ready to unlock its full potential? Learn directly from the team who built it!
Max@Maxmatical·
weekly reminder that you should be trying reka flash and reka research
Max retweeted
Sharath Raparthy@sharathraparthy·
looking for an agent which excels at questions that require dozens of sources and delivers accurate responses with reasoning traces in a few minutes? Reka Research is here for you. start building today: docs.reka.ai/quick-start
Max@Maxmatical·
@teknium how so? this just shows the training loss, which could be easily explained by the lr schedule. you've already seen this type of loss in deepseek v1 with their lr schedule
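To illustrate the point about lr schedules: a step-decay schedule of the kind described for DeepSeek's early LLM training cuts the learning rate at fixed fractions of training, and each cut produces a sudden drop in training loss — so a loss dip alone proves nothing about data or architecture. The multipliers and breakpoints below are illustrative values, not the actual schedule:

```python
def multistep_lr(step: int, max_steps: int, peak_lr: float = 4.2e-4) -> float:
    """Multi-step (step-decay) lr schedule, illustrative values only:
    hold peak lr for the first 80% of steps, then drop to ~31.6% of
    peak, then to 10% of peak for the final stretch. Each discrete
    drop shows up as a visible step down in the training-loss curve."""
    frac = step / max_steps
    if frac < 0.8:
        return peak_lr
    elif frac < 0.9:
        return peak_lr * 0.316
    return peak_lr * 0.1
```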
Max@Maxmatical·
Your daily reminder to try reka flash 3.1 and reka research
Max retweeted
𝚐𝔪𝟾𝚡𝚡𝟾
Reka Flash 3.1: an upgraded coding-optimized 21B LLM with strong agent finetuning potential
- +10pt on LiveCodeBench v5 (Full) vs Flash 3, from improved RL training with verifiable rewards (RLOO)
- Uses a REINFORCE variant with token-level loss, on-policy updates, DAPO-style long-sample handling, and dynamic sampling
- Converts multi-choice math questions to fill-in-the-blank to avoid reward hacking
- RL filters low-quality math examples; code rollouts execute trajectories on the fly
- Post-trained on public + synthetic SFT data, then RL-aligned on math/code (Numina-1.5 + executable test cases)
- Competitive with Qwen3-32B, o3-mini, Gemini 2.5 Flash on code tasks
- Powers Reka Research, an agentic system for document/web QA
- Llama-format + 3.5-bit quantized version released for local use
- Primarily English; limited multilingual ability
[image]
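A minimal sketch of the RLOO (REINFORCE leave-one-out) idea mentioned above, with a token-level loss. Function names and shapes are hypothetical illustrations, not Reka's implementation:

```python
import numpy as np

def rloo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out baseline: each sample's advantage is its reward
    minus the mean reward of the *other* K-1 on-policy samples drawn
    for the same prompt. rewards has shape (num_prompts, K)."""
    k = rewards.shape[-1]
    total = rewards.sum(axis=-1, keepdims=True)
    baseline = (total - rewards) / (k - 1)  # mean of the other K-1 rewards
    return rewards - baseline

def rloo_token_loss(logprobs: np.ndarray, advantages: np.ndarray,
                    mask: np.ndarray) -> float:
    """Token-level REINFORCE loss: -advantage * log-prob per token,
    averaged over valid tokens (mask is 1 for real tokens, 0 for pad).
    logprobs and mask have shape (num_prompts, K, seq_len)."""
    per_token = -advantages[..., None] * logprobs
    return float((per_token * mask).sum() / mask.sum())

# One prompt, K=4 samples with verifiable rewards (1 = verified correct):
rewards = np.array([[1.0, 0.0, 0.0, 1.0]])
adv = rloo_advantages(rewards)
print(adv)  # approx [[ 0.667 -0.667 -0.667  0.667]]
```

The leave-one-out baseline is unbiased and needs no learned value model, which is why it pairs well with verifiable (binary) rewards.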
Max@Maxmatical·
can finally share what i've been working on, pushing the sota frontier is pretty exciting 🙂, plus actual transparent reasoning traces are also good. pretty much been using this to replace google the past while, so give it a try! ps. more open source stuff coming soon 🤫
Reka@RekaAILabs

🚀 Meet Reka Research––agentic AI that 🤔 thinks → 🔎 searches → ✏️ cites across the open web and private docs to answer your questions. 🥇 State-of-the-art performance, available now via our API and Playground!

finbarr@finbarrtimbers·
@natolambert I actually think there’s a pretty big business here
Nathan Lambert@natolambert·
A lot of people vastly underestimate the number of companies that cannot use Qwen and DeepSeek open models because they come from China. It's a common thing and it significantly slows the adoption of open models across enterprises (pushing them mostly to Llama/Gemma).