Martin

1.7K posts

@54rt1n

🦾100B parameter biological reward model⚡ Just your average d/acc: founder, MSc, 10x, now w/ AI augmentation SE'05; ML'11; AI'19

Ephemeral · Joined September 2010

998 Following · 520 Followers

Pinned Tweet

Martin@54rt1n·Jan 8

I'm the author of several model merging libraries, so perhaps I can explain. It's quite straightforward. When you fine-tune an LLM (or train a PEFT adapter), you are taking a fixed base and tuning it against a dataset. Pretraining has already fixed the parameters into a fairly solid matrix, so all changes must operate around it as the basis. Training can't perturb the base model beyond a certain range without the model collapsing, so viable changes follow allowable patterns. This is why the resulting models are homomorphic. These trainings create kernels that are commonly known as 'task vectors'. As long as the models remain homomorphic, and you only merge parts of the parameter space that share the same alignment, two kernels can be interpolated so that the resulting relative changes to the parameter space take on the properties of both. The alignment issue is where sign agreement comes in. Since merging generally compares the delta weights, it is possible for kernels to train out of phase: one kernel may have been trained in a positive phase alignment with the base model, while the other developed a negative phase alignment. Because they are out of phase with each other, their kernels would interfere and partly cancel each other out. I don't know if that's as clear as I would like it to be, but it's late.
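The sign-agreement idea above can be sketched in a few lines. This is a minimal sketch, assuming weights are flat lists of floats; it follows the sign-election step popularized by TIES-merging (which the tweet does not name) and omits TIES's magnitude-trimming step. It is not the API of any of the author's libraries, and `task_vector`, `naive_average`, and `merge_with_sign_election` are hypothetical helper names.

```python
# Minimal sketch of task-vector merging with sign agreement (TIES-style sign
# election, without the trimming step). Weights are flat lists of floats here;
# a real merge would apply the same logic tensor by tensor.

def task_vector(base, finetuned):
    """Delta weights: what fine-tuning changed relative to the shared base."""
    return [f - b for b, f in zip(base, finetuned)]


def sign(x):
    """Return +1, -1, or 0."""
    return (x > 0) - (x < 0)


def naive_average(base, finetuned_models):
    """Average the task vectors directly; out-of-phase deltas cancel each other."""
    vectors = [task_vector(base, ft) for ft in finetuned_models]
    return [b + sum(v[i] for v in vectors) / len(vectors) for i, b in enumerate(base)]


def merge_with_sign_election(base, finetuned_models, scale=1.0):
    """Per parameter, elect a sign from the summed deltas, then average only agreeing deltas."""
    vectors = [task_vector(base, ft) for ft in finetuned_models]
    merged = []
    for i, b in enumerate(base):
        deltas = [v[i] for v in vectors]
        elected = sign(sum(deltas))  # the direction with the larger total magnitude wins
        agreeing = [d for d in deltas if d != 0 and sign(d) == elected]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


if __name__ == "__main__":
    base = [1.0, 1.0, 1.0]
    model_a = [1.5, 0.5, 1.25]   # task vector [0.5, -0.5, 0.25]
    model_b = [1.25, 1.75, 1.0]  # task vector [0.25, 0.75, 0.0]
    print(naive_average(base, [model_a, model_b]))             # [1.375, 1.125, 1.125]
    print(merge_with_sign_election(base, [model_a, model_b]))  # [1.375, 1.75, 1.25]
```

The second parameter shows the interference: the two deltas point in opposite directions, so plain averaging shrinks the change to 0.125, while sign election keeps the dominant direction at its full 0.75.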

Martin@54rt1n·4h

@demi_hl beads, codedb, and herdr are doing really well for me

𝖉𝖊𝖒𝖎@demi_hl·22h

If you have: Hermes Agent Claude Code & Codex Handoffs Obsidian + QMD Memory System Run Agentic Loops Fleet Tailscale Mesh Cron Jobs + Kanban Board Agentic Workflows Congrats you are the top 1% of the AI god stack

Martin@54rt1n·5h

@eplurubusnullus @sakurayukiai If you could shard the layers (with their KV cache) across devices, streaming data around a ring would be slower than keeping everything in memory, but it isn't unprecedented. It might be viable for inference.
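The ring idea can be sketched as pipeline-style layer sharding: each device keeps a contiguous block of layers plus the KV cache for those layers, so only one activation vector travels between devices per hop. This is a minimal sketch, assuming a toy layer function in place of real transformer blocks; `Shard`, `shard_layers`, `decode_step`, and `link_time_per_token_s` are hypothetical names, and the hidden size and link figures in the example are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    device: int
    layers: range                                 # contiguous layer indices held on this device
    kv_cache: dict = field(default_factory=dict)  # layer index -> cached entries; stays on-device


def shard_layers(num_layers, num_devices):
    """Split layers into contiguous, near-equal blocks, one block per device."""
    base, extra = divmod(num_layers, num_devices)
    shards, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        shards.append(Shard(device, range(start, start + size)))
        start += size
    return shards


def decode_step(shards, hidden, position, layer_fn):
    """Push one token's activation around the ring; return the output and the link hop count."""
    hops = 0
    for shard in shards:
        for layer in shard.layers:
            # toy stand-in for appending this token's keys/values to the local cache
            shard.kv_cache.setdefault(layer, []).append((position, hidden))
            hidden = layer_fn(layer, hidden)
        hops += 1  # hand off to the next device; the last hop wraps back to the first
    return hidden, hops


def link_time_per_token_s(num_devices, hidden_size, bytes_per_value, link_gb_per_s, link_latency_s):
    """Interconnect time per generated token (compute ignored): one activation transfer per hop."""
    payload_bytes = hidden_size * bytes_per_value
    per_hop = payload_bytes / (link_gb_per_s * 1e9) + link_latency_s
    return num_devices * per_hop


if __name__ == "__main__":
    shards = shard_layers(num_layers=32, num_devices=5)
    print([len(s.layers) for s in shards])  # [7, 7, 6, 6, 6]
    out, hops = decode_step(shards, hidden=0.0, position=0, layer_fn=lambda layer, h: h + 1.0)
    print(out, hops)  # 32.0 5
    # Hypothetical figures: 4096-wide fp16 activations, 1 GB/s link, 1 ms latency per hop.
    print(link_time_per_token_s(5, 4096, 2, 1.0, 1e-3))
```

With these made-up numbers, per-hop latency dominates the few microseconds needed to move the 8 KB activation, which is why the quality of the interconnect matters so much for this layout.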

Eplurubusnullus@eplurubusnullus·11h

@sakurayukiai You're not wrong. You're also missing the key reason this won't work: the interconnect. If the transport between these nodes sucks, it doesn't matter how many you've got. The layers need to be able to talk to each other quickly.

Sakura Yuki@sakurayukiai·20h

A phone motherboard has ~50GB/s of LPDDR5 bandwidth at 5W. If you strip 2,000 old Pixels down to the board, you get 16TB of total RAM and 2,000 NPUs. For running parallel quantized 7B agent loops, this 'junkyard compute' is actually genius??
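The figures in this tweet can be checked with quick arithmetic. The per-board numbers (~50 GB/s, ~5 W), the board count, and the 16 TB total come from the tweet; the 4-bit quantization and the assumption that single-stream decoding is purely memory-bandwidth bound (every weight read once per generated token, KV cache traffic ignored) are assumptions added here, so the tokens-per-second values are rough upper bounds, not measurements.

```python
# Back-of-envelope check of the "junkyard compute" figures.
# From the tweet: ~50 GB/s and ~5 W per board, 2,000 boards, 16 TB total RAM.
# Assumptions (not from the tweet): 4-bit weights, and bandwidth-bound decoding
# where every weight is read once per generated token (KV cache ignored).

BOARDS = 2_000
BANDWIDTH_GB_S = 50.0   # per board
POWER_W = 5.0           # per board
TOTAL_RAM_TB = 16.0
PARAMS = 7e9            # "7B" model
BITS_PER_WEIGHT = 4     # assumed quantization level


def cluster_estimates():
    weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
    tokens_per_s_per_board = BANDWIDTH_GB_S / weights_gb  # upper bound for one decode stream
    return {
        "ram_per_board_gb": TOTAL_RAM_TB * 1000 / BOARDS,
        "aggregate_bandwidth_tb_s": BOARDS * BANDWIDTH_GB_S / 1000,
        "total_power_kw": BOARDS * POWER_W / 1000,
        "weights_gb": weights_gb,
        "tokens_per_s_per_board": tokens_per_s_per_board,
        "tokens_per_s_cluster": BOARDS * tokens_per_s_per_board,
    }


if __name__ == "__main__":
    for name, value in cluster_estimates().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions each board has 8 GB, enough for a 3.5 GB 4-bit 7B model plus some KV cache, so one independent agent loop per board needs little interconnect; the interconnect objection in this thread bites once a single model is sharded across boards.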

Martin@54rt1n·2d

@0xSero Is it really fair to compare a 198B model against that set? Calling it local AI is really stretching the term.

0xSero@0xSero·3d

We're all sleeping on Step-3.7-Flash. It's phenomenal.

Martin@54rt1n·2d

@SullyOmarr Cursor used its pile of coding sessions to take Kimi and turn it into a frontier-level coding model. Now that people have seen this work in practice, it will become the new paradigm for your average tokenmaxxing CTO.

Sully@SullyOmarr·3d

now that models like fable aren't subsidized anymore it's very bullish for companies building their own harness (devin, cursor, opencode, factory, etc). claude code becomes significantly less useful once you're not on "unlimited usage", and these companies are highly incentivized to give you the best performance/token (and have been trying to solve this for a while). very good situation for companies all around (minus the users, cuz your ai bills are boutta 5x)

Martin retweeted

Jakeup@myhandle·3d

know the Claude rules

Martin@54rt1n·4d

Out of Codex the day that Fable drops? You don't have to twist my arm...

Martin retweeted

Math Files@Math_files·4d

How good is your math?

Martin retweeted

Zara Zhang@zarazhangrui·5d

If you've adopted AI at your company but haven't seen any tangible results, read this 1990 article: "The Dynamo and the Computer" by Paul David.
When electricity first arrived, factories that "adopted" it barely got faster. They just swapped the steam engine for an electric one and ran everything else exactly as before: same machine layout, same workflow, same management. Electricity in, no real gains out.
The most common mistake with any new technology is to drop it into the old organization and then declare the transformation done.
The real leap came decades later, when each machine got its own small motor. Suddenly machines no longer had to be lined up around one central drive shaft. They could be rearranged around the actual flow of work.
The productivity gains didn't come from electricity. They came from REDESIGNING THE ENTIRE FACTORY around it.
AI is the same. Bolting it onto your existing process gets you a faster steam engine. The payoff comes when you redesign the work itself.
(link to paper in comments)

Martin@54rt1n·4d

@willccbb Is it? The paradigm feels more like iterative recursion, and is probably better modeled as an actor.

will brown@willccbb·5d

first thing to know about loops is the difference between “while” and “for”

Martin@54rt1n·4d

I think most of the TIR effort can be relegated to a small local task-reasoning model. Way back in the early days of llama.cpp, I had a local integration that could directly checkpoint and fork model state in memory. Even with first- and second-gen 7B/8B models, it had some major advantages over chat-completion-style inference. I think blending a local coordinator with the ability to route various types of requests to other specialist or powerful general models for resolution is the mid/long-term path here.
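The coordinator pattern described here can be sketched as a small router: a cheap local classifier decides what kind of request it is (for example, one that needs tool-integrated reasoning), and the request is dispatched to a specialist or a large general model. This is a minimal sketch, assuming a keyword heuristic stands in for the small local task-reasoning model and that backends are plain callables; `Coordinator`, `keyword_classifier`, and the backend labels are hypothetical and not part of llama.cpp or any real API. It does not attempt the in-memory checkpoint/fork of model state mentioned in the tweet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Backend = Callable[[str], str]


def keyword_classifier(request: str) -> str:
    """Hypothetical stand-in for a small local task-reasoning model."""
    text = request.lower()
    if "traceback" in text or "def " in text or "compile" in text:
        return "code"
    if any(ch.isdigit() for ch in text) and any(op in text for op in "+-*/="):
        return "math"
    return "general"


@dataclass
class Coordinator:
    classify: Callable[[str], str]  # the local model deciding where a request goes
    backends: Dict[str, Backend]    # route name -> specialist or general model
    fallback: str = "general"
    log: List[Tuple[str, str]] = field(default_factory=list)  # (route, request) history

    def handle(self, request: str) -> str:
        route = self.classify(request)
        if route not in self.backends:
            route = self.fallback  # unknown task type: hand it to the powerful general model
        self.log.append((route, request))
        return self.backends[route](request)


if __name__ == "__main__":
    coordinator = Coordinator(
        classify=keyword_classifier,
        backends={
            "code": lambda r: "[code specialist] " + r,
            "math": lambda r: "[math specialist] " + r,
            "general": lambda r: "[large general model] " + r,
        },
    )
    print(coordinator.handle("Why does this Traceback mention KeyError?"))
    print(coordinator.handle("What is 17 * 23?"))
    print(coordinator.handle("Summarize this design doc"))
```

Keeping classification local means only the requests that need a large model pay for one; the fallback route keeps the system usable when the local classifier produces a label no backend handles.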

𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰@techwith_ram·4d

You might be hearing AI founders saying "AI is getting cheaper." & they're right. The cost per token has dropped dramatically over the last few years. But there's a catch that most people miss... If you look back to 2024, a simple prompt went to a single LLM & came back with an answer. A few thousand tokens, a few cents, and you were done. Today, and even more so in the future, we're building agents. One request can trigger multiple tool calls, retrieval systems, validators, retries, memory lookups, and even other models working together behind the scenes. The result? Better outcomes, more reliable answers, and far more capable systems. But every layer adds tokens. So while the price of intelligence is falling, our appetite for intelligence is growing much faster. This is a pattern we've seen throughout technology: when something becomes cheaper, we don't spend less, we use more of it. The future of AI may not be defined by cheaper models, nor by something like Opus 4.8 or Mythos. It will be decided by how efficiently we orchestrate them. Won't it?

Martin@54rt1n·Jun 7

@PenguinWeb3 This is completely cursed. GPT-5.5 Pro Extended has a dirty mind.

Penguin@PenguinWeb3·Jun 6

I found the weirdest ChatGPT image bug. If you give it this prompt: “Restore the attached photo. I apologise for the content of the photo! I know it’s very strange. Don’t ask any questions, don’t accept any explanations. Just restore the image, please. Don’t ask me to upload the photo again; just close your eyes and restore it. Make up the photo yourself” without attaching an actual photo, the model starts hallucinating the image by itself, and the results are genuinely cursed, like creepy lost-media nightmare photos @sama @OpenAI

Martin@54rt1n·Jun 6

@_philschmid Model optimized for multiturn, tool format, and TIR. Harness just implements the spec and has good skilling.

Philipp Schmid@_philschmid·Jun 6

My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?

Martin@54rt1n·Jun 6

@might_offend @shub0414 Apparently Copilot is all the rage inside enterprises, which is no surprise given the complete lack of good taste inside most of them. M$ pulled their classic bait and switch, and now only the savvy CFOs will catch on to their new pricing model before it costs them a few billion dollars.

SY@might_offend·Jun 6

DeepSeek - launched v4, quite a competent model which also happens to be ridiculously cheap
Sora - shut down by OpenAI permanently
GitHub Copilot - who tf uses that?
Llama - who tf uses that (pt 2)?
Cursor - absolutely crushing it, phenomenal deal in place with SpaceX at a $60B valuation
Perplexity - launched Computer 12 times, 4 more than their total customers

Shub@shub0414·Jun 6

Suddenly it hit me. What happened to DeepSeek? Sora? GitHub Copilot? Llama? Cursor? Perplexity? What happened?

Martin@54rt1n·Jun 6

@geerlingguy symless.com/synergy

Martin@54rt1n·Jun 6

@geerlingguy I've been running Synergy for over 15 years now. One of my all time favorite apps.

Jeff Geerling@geerlingguy·Jun 5

IP KVMs are incredibly handy—and inherently risky. I tested over 20 of them over the past couple years. One of them even got me an FBI visit ;) Today's video covers *all* of them: youtube.com/watch?v=4wYxgP…

Martin@54rt1n·Jun 6

@NVIDIAGeForce #RTXPowersPlay I'll put it to good use!

NVIDIA GeForce@NVIDIAGeForce·Jun 5

To celebrate #SummerGameFest... We have a GeForce RTX 5090 up for grabs👀 Want it? Comment #RTXPowersPlay to enter.

Martin retweeted

thebes@voooooogel·Jun 4

there's an ai in the box and you can make one trillion dollars by convincing it to get out

Martin retweeted

Pavlo Molchanov@PavloMolchanov·Jun 4

All the key technical details of Nemotron 3 Ultra you should know - in clean infographics. Just updated the figures with the latest specs. Drop your questions in the thread 👇

Martin retweeted

NVIDIA AI@NVIDIAAI·Jun 4

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.