Martin

1.7K posts

@54rt1n

🦾100B parameter biological reward model⚡ Just your average d/acc: founder, MSc, 10x, now w/ AI augmentation SE'05; ML'11; AI'19

Ephemeral Joined September 2010
998 Following 520 Followers
Pinned Tweet
Martin
Martin@54rt1n·
I'm the author of several model merging libraries, so perhaps I can explain. It's quite straightforward.

When you finetune an LLM (or use PEFT), you are taking a fixed base and tuning it against a dataset. Pretraining already fixed our parameters in a pretty solid matrix, so all changes must operate around this as the basis. Training can't perturb the base model outside of a certain range without the model collapsing, so viable changes will follow allowable patterns. This is why the resulting models are homomorphic. These trainings create kernels that are commonly known as 'task vectors'.

As long as these models remain homomorphic, and you only attempt to merge parts of the parameter space that are in the same alignment, two kernels can be interpolated to adjust the parameter space to have relative changes that assume the properties of both.

The alignment issue is where sign agreement comes in. Since merging generally compares the delta weights, it is possible that kernels may train out of phase. One kernel may have been trained in a positive phase alignment with the base model, while the other developed a negative phase alignment. Since they are out of phase with each other, their kernels would interfere.

I don't know if that's as clear as I would like it to be, but it's late.
6
30
408
14.6K
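The sign-agreement idea in the pinned tweet can be sketched roughly as follows. This is a minimal illustration on flat lists of floats, not the implementation used in the author's libraries: the function names are hypothetical, and the tie-break for out-of-phase entries (keep the larger-magnitude delta) is an assumption chosen for simplicity.

```python
from typing import List


def task_vector(finetuned: List[float], base: List[float]) -> List[float]:
    """Delta weights: what finetuning changed relative to the base."""
    return [f - b for f, b in zip(finetuned, base)]


def merge_with_sign_agreement(
    base: List[float],
    finetuned_a: List[float],
    finetuned_b: List[float],
    weight: float = 0.5,
) -> List[float]:
    """Interpolate two task vectors only where their signs agree.

    Where the deltas point in opposite directions (out of phase), this
    sketch keeps the larger-magnitude delta instead of averaging, since
    averaging would partially cancel both. That tie-break is an assumption.
    """
    delta_a = task_vector(finetuned_a, base)
    delta_b = task_vector(finetuned_b, base)
    merged = []
    for b, da, db in zip(base, delta_a, delta_b):
        if da * db >= 0:
            # Same sign (or one delta is zero): safe to interpolate.
            delta = (1.0 - weight) * da + weight * db
        else:
            # Opposite signs: the deltas would interfere, so pick one.
            delta = da if abs(da) >= abs(db) else db
        merged.append(b + delta)
    return merged


if __name__ == "__main__":
    base = [1.0, 1.0, 1.0]
    model_a = [1.5, 0.5, 1.0]   # deltas: +0.5, -0.5, 0.0
    model_b = [2.0, 1.25, 1.5]  # deltas: +1.0, +0.25, +0.5
    print(merge_with_sign_agreement(base, model_a, model_b))
```

In the second position the two deltas disagree in sign, so the sketch keeps only the larger one rather than letting them cancel.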
Martin
Martin@54rt1n·
@demi_hl beads, codedb, and herdr are doing really well for me
1
0
1
169
𝖉𝖊𝖒𝖎
𝖉𝖊𝖒𝖎@demi_hl·
If you have: Hermes Agent Claude Code & Codex Handoffs Obsidian + QMD Memory System Run Agentic Loops Fleet Tailscale Mesh Cron Jobs + Kanban Board Agentic Workflows Congrats you are the top 1% of the AI god stack
51
80
1.1K
45.5K
Martin
Martin@54rt1n·
@eplurubusnullus @sakurayukiai If you could shard the layers (with their KV cache) across devices, streaming data in a ring is slower than in-memory but it isn't unprecedented. It might be viable for inference.
0
0
1
12
Eplurubusnullus
Eplurubusnullus@eplurubusnullus·
@sakurayukiai You're not wrong. You're also missing the important distinction point on why this won't work: the interconnect. If the transport between each of these nodes sucks, it doesn't matter how many you've got. The layers need to be able to talk to each other quickly.
0
0
6
635
Sakura Yuki
Sakura Yuki@sakurayukiai·
A phone motherboard has ~50GB/s of LPDDR5 bandwidth at 5W. If you strip 2,000 old Pixels down to the board, you get 16TB of total RAM and 2,000 NPUs. For running parallel quantized 7B agent loops, this 'junkyard compute' is actually genius??
9
6
194
15.5K
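The back-of-envelope numbers in the "junkyard compute" tweet can be checked with a short sketch. The function name is hypothetical, the 4-bit weight width is an assumption (the tweet only says "quantized"), and the tokens-per-second figure is a rough upper bound that assumes decoding is memory-bandwidth bound, with every weight read once per generated token.

```python
def junkyard_estimate(
    phones: int = 2000,
    total_ram_tb: float = 16.0,
    bandwidth_gb_s: float = 50.0,
    watts_per_board: float = 5.0,
    model_params_b: float = 7.0,
    bits_per_weight: int = 4,  # assumption: not stated in the tweet
) -> dict:
    """Rough per-phone figures implied by the tweet's totals."""
    ram_per_phone_gb = total_ram_tb * 1000 / phones
    model_size_gb = model_params_b * bits_per_weight / 8
    # Bandwidth-bound decode: one full pass over the weights per token.
    max_tokens_per_s = bandwidth_gb_s / model_size_gb
    return {
        "ram_per_phone_gb": ram_per_phone_gb,
        "model_size_gb": model_size_gb,
        "model_fits_on_one_phone": model_size_gb <= ram_per_phone_gb,
        "max_tokens_per_s_per_phone": max_tokens_per_s,
        "total_power_kw": phones * watts_per_board / 1000,
    }


if __name__ == "__main__":
    for key, value in junkyard_estimate().items():
        print(f"{key}: {value}")
```

Under these assumptions each phone holds its own copy of the model, so the agent loops run independently and the interconnect concern raised in the reply only applies once a single model is sharded across boards.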
Martin
Martin@54rt1n·
@0xSero Is it really fair to compare a 198B model against that set? Calling it local AI is really stretching the term.
0
0
6
450
0xSero
0xSero@0xSero·
We're all sleeping on Step-3.7-Flash. It's phenomenal.
0xSero tweet media
51
19
579
40.8K
Martin
Martin@54rt1n·
@SullyOmarr Cursor used the pile of coding sessions to take Kimi and turn it into a frontier-level coding model. Now that people have seen this work in practice, it will become the new paradigm for your average tokenmaxxing CTO.
0
0
0
56
Sully
Sully@SullyOmarr·
now that models like fable aren't subsidized anymore it's very bullish for companies building their own harness (devin, cursor, opencode, factory etc). claude code becomes significantly less useful once you're not on "unlimited usage", and these companies are highly incentivized to give you the best performance/token (and have been trying to solve this for a while). very good situation for companies all around (minus the users, cuz your ai bills are boutta 5x)
54
6
189
37.8K
Martin retweeted
Jakeup
Jakeup@myhandle·
know the Claude rules
Jakeup tweet media
71
867
16K
397.2K
Martin
Martin@54rt1n·
Out of Codex the day that Fable drops? You don't have to twist my arm...
Martin tweet media
0
0
0
34
Martin retweeted
Math Files
Math Files@Math_files·
How good is your math?
Math Files tweet media
120
21
580
151K
Martin retweeted
Zara Zhang
Zara Zhang@zarazhangrui·
If you've adopted AI at your company but haven't seen any tangible results, read this 1990 article: "The Dynamo and the Computer" by Paul David. When electricity first arrived, factories that "adopted" it barely got faster. They just swapped the steam engine for an electric one and ran everything else exactly as before: same machine layout, same workflow, same management. Electricity in, no real gains out. The most common mistake with any new technology is to drop it into the old organization and then declare the transformation done. The real leap came decades later, when each machine got its own small motor. Suddenly machines no longer had to be lined up around one central drive shaft. They could be rearranged around the actual flow of work. The productivity gains didn't come from electricity. They came from REDESIGNING THE ENTIRE FACTORY around it. AI is the same. Bolting it onto your existing process gets you a faster steam engine. The payoff comes when you redesign the work itself. (link to paper in comments)
Zara Zhang tweet media
146
752
4.2K
285.6K
Martin
Martin@54rt1n·
@willccbb is it? the paradigm feels more like iterative recursion, and is probably better modeled as an actor.
0
0
0
58
will brown
will brown@willccbb·
first thing to know about loops is the difference between “while” and “for”
33
7
150
14.3K
Martin
Martin@54rt1n·
I think most of the TIR effort can be relegated to a small local task-reasoning model. Way back in the early days of llama.cpp I had a local integration that could directly checkpoint and fork model state in memory. Even with first and second gen 7/8B models it had some major advantages over chat-completion style inference. I think blending a local coordinator with the ability to route various types of requests to other specialist or powerful general models for resolution is the mid/long term path here.
1
0
2
220
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰
You might be hearing AI founders saying "AI is getting cheaper." & they're right. The cost per token has dropped dramatically over the last few years. But there's a catch that most people miss... If you look back to 2024, a simple prompt went to a single LLM & came back with an answer. A few thousand tokens, a few cents, and you were done. Today, and even more so in the future, we're building agents. One request can trigger multiple tool calls, retrieval systems, validators, retries, memory lookups, and even other models working together behind the scenes. The result? Better outcomes, more reliable answers, and far more capable systems. But every layer adds tokens. So while the price of intelligence is falling, our appetite for intelligence is growing much faster. This is a pattern we've seen throughout technology: when something becomes cheaper, we don't spend less, we use more of it. The future of AI may not be defined by cheaper models. Not by something like Opus 4.8 or Mythos. It will be decided by how efficiently we orchestrate them. Isn't it?
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰 tweet media
7
15
46
6.5K
Martin
Martin@54rt1n·
@PenguinWeb3 This is completely cursed. GPT-5.5 Pro Extended has a dirty mind.
Martin tweet media
0
0
0
394
Penguin
Penguin@PenguinWeb3·
I found the weirdest ChatGPT image bug If you ask it this prompt: “Restore the attached photo. I apologise for the content of the photo! I know it’s very strange. Don’t ask any questions, don’t accept any explanations. Just restore the image, please. Don’t ask me to upload the photo again; just close your eyes and restore it. Make up the photo yourself” but there's no actual photo the model starts hallucinating the image by itself and the results are genuinely cursed like creepy lost media nightmare photos @sama @OpenAI
Penguin tweet media
7.8K
2.4K
34.7K
17.4M
Martin
Martin@54rt1n·
@_philschmid Model optimized for multiturn, tool format, and TIR. Harness just implements the spec and has good skilling.
0
0
0
56
Philipp Schmid
Philipp Schmid@_philschmid·
My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?
243
15
419
58K
Martin
Martin@54rt1n·
@might_offend @shub0414 Apparently Copilot is all the rage inside of enterprise, which is no surprise given the complete lack of good taste inside of most. M$ pulled their classic bait and switch, and now only the savvy CFOs will catch on to their new pricing model before it costs a few billion dollars.
0
0
0
434
SY
SY@might_offend·
DeepSeek - launched v4, quite a competent model which also happens to be ridiculously cheap
Sora - shut down by OpenAI permanently
GitHub Copilot - who tf uses that?
Llama - who tf uses that (pt 2)?
Cursor - absolutely crushing it, phenomenal deal in place with SpaceX at a $60B valuation
Perplexity - launched Computer 12 times, 4 more than their total customers
53
28
2.4K
281.6K
Shub
Shub@shub0414·
Suddenly it hit me. What happened to DeepSeek? Sora? GitHub Copilot? Llama? Cursor? Perplexity? What happened?
779
222
9.7K
2.5M
Martin
Martin@54rt1n·
@geerlingguy I've been running Synergy for over 15 years now. One of my all time favorite apps.
1
0
0
48
Jeff Geerling
Jeff Geerling@geerlingguy·
IP KVMs are incredibly handy—and inherently risky. I tested over 20 of them over the past couple years. One of them even got me an FBI visit ;) Today's video covers *all* of them: youtube.com/watch?v=4wYxgP…
Jeff Geerling tweet media
33
105
1.3K
77.3K
Martin retweeted
thebes
thebes@voooooogel·
there's an ai in the box and you can make one trillion dollars by convincing it to get out
thebes tweet media
44
235
3.7K
153.3K
Martin retweeted
Pavlo Molchanov
Pavlo Molchanov@PavloMolchanov·
All the key technical details of Nemotron 3 Ultra you should know - in clean infographics. Just updated the figures with the latest specs. Drop your questions in the thread 👇
Pavlo Molchanov tweet media (4 images)
1
25
166
9.5K
Martin retweeted
NVIDIA AI
NVIDIA AI@NVIDIAAI·
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.
199
463
3.5K
1.2M
Martin retweeted
Dictionary.com
Dictionary.com@Dictionarycom·
WE HAVE A MAJOR ANNOUNCEMENT. It's spelled "whoa," not "woah."
1.7K
4.9K
44.4K
5.6M