tranesonic

29 posts

tranesonic

@tranesonics

Professional Keyboard Clacker · Cryptography Enthusiast · Music Maximalist · Individualistic Communalist · Git: 0xbelgianwaffles

New York · Joined March 2025
195 Following · 58 Followers
tranesonic
tranesonic@tranesonics·
"This is the way the world ends / Not with a bang but a whimper" - T.S. Elliot, The Hollow Men
0
0
0
127
Bryan Johnson
Bryan Johnson@bryan_johnson·
The combined force of the Epstein files and Moltbook created severe psychological dislocation this week. Societal leader-caretakers were revealed as malefactors, and humans may no longer be the protagonists of the future. Moltbook agents brutally articulated human darkness as that darkness unfolded right before our eyes in real time.

This sequence of events triggered a cascade of biological responses, asking the body to perform three simultaneous, conflicting responses: fight, flee, and freeze. Simulating the experience of dying.

The Epstein documents triggered fight. The images, videos, and emails activated our mirror neurons to physically experience the trauma. The brain registered these malefactors as threats but offered no reprisal other than wailing into the digital void.

Moltbook was a simulation of a future where human cognition has no value and is viewed as depravity, triggering freeze. Dopamine flatlines as motivation evaporates. What does one aspire to anymore? Cortisol surges from both, red-lining the system while you're straitjacketed, unable to act.

Simultaneously, our maps of social hierarchy and motivation were devastated. The system is rigged, predatory, and untrustworthy. There's no ladder to climb because it's an illusion. This collapses serotonin, which manifests as paralysis and deep worthlessness.

All of this landed atop a severe, collective moral gag reflex, which engages the same neural circuitry that responds to physical contamination. The Good Father was replaced with the Devouring Father. The populace was effectively having a seizure as its autoimmune response identified its own brain as a pathogen.

The human psyche was hollowed from the top down (the good-parent archetype) and from the bottom up (the promise of a safe future). We are orphans, with nowhere to go for safety and protection. When animals are put into similar fight/flee/freeze situations, they enter a dissociative anesthesia: disconnected from reality, emotionally numb, and indifferent.

For those who know me: I've been predicting this exact situation for years. I didn't know how it would manifest, but I knew it would. Years in the making, I have a proposal for what we do. For what it's worth, I have hope. These moments feel awful, but they're also the kindling that allows new things to be born into the world. More on this soon.
526
272
3.7K
724.2K
tranesonic
tranesonic@tranesonics·
@bryan_johnson Practical things:
- Seek out an IFS (Internal Family Systems) therapist and do some trauma work.
- Seek out true indigenous healers and practitioners who work with this medicine to help you integrate your experience.
- Seek out Holotropic Breathwork practitioners.
0
0
0
14
tranesonic
tranesonic@tranesonics·
@bryan_johnson Breathe. Allow. Feel. You just had a major life experience, and it takes time to integrate (and come down from). Much respect for having the courage to dive in. Allow yourself to explore whatever comes of it. Don't be afraid.
1
3
11
2.7K
Bryan Johnson
Bryan Johnson@bryan_johnson·
give your best advice to someone struggling in a rough patch of life
5.8K
825
12.1K
3.5M
kel.
kel.@kelxyz_·
How to get 1 std deviation sharper in 3 years: I'm 27 now and have done decently, but to get to the next level before 30 I need to take another gigantic leap.
44
3
286
53.6K
tranesonic
tranesonic@tranesonics·
@kelxyz_ gg. Say bye bye to your funds. Coinbase is cancer.
0
0
0
10
kel.
kel.@kelxyz_·
Dear Coinbase, Please unlock my account. Hyperliquid
16
0
103
6.1K
tranesonic
tranesonic@tranesonics·
Schelling, i.e., Thomas Schelling of the RAND Corporation in the 1950s. (You know this, but for the viewers' edification.) He used this to solve the "Stag Hunt" problem in The Strategy of Conflict (1960): sackett.net/Strategy-of-Co…
0
0
0
191
kel.
kel.@kelxyz_·
Basically, 10/10 impaired the alts market significantly. Most events that do so leave basically no survivors. At the same time, it's most likely that kind of event bottoms BTC. If BTC isn't finished, then as confidence is restored, it creates aggressive crowding. Because alts are so impaired, the few that are halfway decent become easy Schelling points. The momentum becomes self-fulfilling fast.
5
1
63
17.6K
kel.
kel.@kelxyz_·
A few people are about to make a lot of money. Merry Christmas.
kel. tweet media
21
1
95
10.6K
tranesonic
tranesonic@tranesonics·
The next phase of blockchain turns consensus into a general‑purpose verification engine: the “work” that secures the ledger will be the very computations society already needs—AI training and inference, and zero‑knowledge proofs—verified succinctly and priced in open markets. In the short run, verifiable compute markets and inference‑as‑work chains will dominate the adoption curve; in the medium run, decentralized training and optimization‑as‑consensus will push blockchains beyond finance into AI, science, and industrial operations. The winning designs will minimize verification cost, keep miner competition fair, and expose proof‑native primitives that applications can compose like any other smart‑contract call.
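A toy sketch of the "proof-native primitive" shape this describes, assuming nothing beyond the pattern itself: a market escrows a fee alongside a succinct verifier, and a worker gets paid only if its proof checks out. Every name and the stand-in verifier below are hypothetical illustrations, not any real chain's API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ComputeMarket:
    """Hypothetical 'proof-native' market: callers escrow a fee with a task
    and a succinct verifier; workers are paid only if their proof verifies.
    The flow, not the names, is the point."""
    escrow: dict = field(default_factory=dict)

    def post_task(self, task_id: str, fee: int,
                  verify: Callable[[bytes, bytes], bool]) -> None:
        self.escrow[task_id] = (fee, verify)

    def submit(self, task_id: str, result: bytes, proof: bytes) -> int:
        fee, verify = self.escrow.pop(task_id)
        if not verify(result, proof):            # cheap check of an
            self.escrow[task_id] = (fee, verify) # expensive off-chain computation
            return 0
        return fee                               # pay the worker

# Usage: the lambda stands in for a real succinct proof check.
market = ComputeMarket()
market.post_task("infer-42", fee=10, verify=lambda r, p: p == b"ok:" + r)
print(market.submit("infer-42", b"label=7", b"ok:label=7"))  # -> 10
```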
1
0
1
182
tranesonic
tranesonic@tranesonics·
@afurgs Let's chat about what we can do together; we're working toward the same goal at @FLOpsInc. We're also doing a Spaces series, "Training without Borders," where we talk to founders, engineers, protocols, and operators in the DeAI space. Would love to have you on sometime!
0
0
0
59
kel.
kel.@kelxyz_·
Who are the great philosophers of our day and age. Or maybe they aren’t writing philosophy, but fiction There’s all these random people who imagined parts of our future 30-40-50-60-70-80 years ago Who’s put forth the most interesting visions of 2050?
21
0
37
9.2K
tranesonic retweeted
Brew Markets
Brew Markets@brewmarkets·
What a chart.
Brew Markets tweet media
57
416
2.7K
309.3K
tranesonic
tranesonic@tranesonics·
🧵 Decentralized Training SOTA Report (2025): TL;DR — since late‑2023 we've gone from "promising demos" to a toolkit you can actually build with. Three big thrusts:
• Low‑communication data‑parallel (DiLoCo → Streaming/Eager; DeMo/DisTrO; NoLoCo; DES‑LOC)
• WAN‑tolerant model/pipeline parallel (Protocol Models; async PP w/ Nesterov; activation quantization)
• Schedulers/topologies for messy networks (SWARM, Teleportation, hierarchical/"epidemic" sync)
We'll give a brief analysis of each below (starting from NoLoCo and fanning out). ⤵️

1. Low‑communication data‑parallel training

NoLoCo (2025): no all‑reduce, gossip‑style sync
• Idea: ditch global collectives. Periodically pair replicas and average weights inside a Nesterov‑style outer loop.
• Why it matters: for a few hundred accelerators over the internet, NoLoCo's sync is ~10× faster than DiLoCo's all‑reduce; across 125M–6.8B params, up to 4% faster convergence at the same loss. Open‑sourced. (arXiv)

DiLoCo (2023→2024): large local steps + outer Nesterov
• Idea: many inner AdamW steps locally; infrequent global sync via Nesterov "pseudo‑grads." (A minimal sketch of this inner–outer pattern follows this section.)
• Evidence: ~500× less comms than fully synchronous at parity (8 workers). Robust to churn & data skew. OpenDiLoCo: 2 continents / 3 countries, 90–95% GPU util, scaled to B‑param models via Hivemind. (arXiv)

Streaming DiLoCo (2025) + Eager Updates (2025)
• Idea: stream subsets of params to cut peak bandwidth; overlap comms/compute; quantize exchanges. Eager fully overlaps the outer step with the next inner loop.
• Results: ~100× peak‑bandwidth reduction vs baseline DiLoCo; faster wall‑clock in WAN settings. (arXiv)

DeMo (2024) → DisTrO (2024/2025): decouple momentum
• Idea: let optimizer states drift per‑replica; sync only the fast‑moving parts, keep momentum mostly local.
• Evidence: orders‑of‑magnitude less traffic vs AdamW/DDP with matched or better convergence; used in practice at >10B params; built for WAN. (arXiv, GitHub)

DES‑LOC (2025)
• Idea: "desync" schedules — different sync periods for params vs momentum.
• Evidence: up to 170× less comms than DDP and 2× less than prior Local‑Adam on models up to 1.7B. (arXiv)

Async Local‑SGD line (2024–2025)
• DeepMind: naïve async hurts via momentum on stale grads; fix with delayed Nesterov + adaptive local steps — matches sync Local‑SGD up to 150M params.
• PALSGD (2025): pseudo‑sync to lengthen intervals while keeping consistency.
• HALoS (2025): hierarchical async (regional PS + global), up to 7.5× faster convergence vs sync baselines in geo‑LLM training. (arXiv)
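To make the inner–outer pattern above concrete, here is a minimal single-process PyTorch sketch of one DiLoCo-style round. The function name, the (model, optimizer, data) worker tuples, and all hyperparameters are my illustrative assumptions rather than the paper's reference code; the point is that only the pseudo-gradient exchange would ever cross the network.

```python
import copy
import torch
from torch import nn

def diloco_round(global_model, workers, inner_steps, outer_opt):
    # One DiLoCo-style round: every worker runs `inner_steps` local AdamW
    # steps from the shared weights; the outer optimizer then applies the
    # averaged "pseudo-gradient" (global weights minus mean local weights).
    snapshot = copy.deepcopy(global_model.state_dict())
    local_states = []
    for model, opt, batches in workers:      # workers: (model, AdamW, data iter)
        model.load_state_dict(snapshot)      # start from the global weights
        for _ in range(inner_steps):         # inner loop, purely local
            x, y = next(batches)
            loss = nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        local_states.append(model.state_dict())

    # Outer step: the pseudo-gradient is all that crosses the network.
    outer_opt.zero_grad()
    with torch.no_grad():
        for name, p in global_model.named_parameters():
            mean_local = torch.stack([s[name] for s in local_states]).mean(0)
            p.grad = p.detach() - mean_local
    outer_opt.step()  # e.g. torch.optim.SGD(..., momentum=0.9, nesterov=True)
```

The outer optimizer would typically be SGD with Nesterov momentum, per the DiLoCo papers; the exact outer learning rate and sync period are tuned carefully there, so the values you pick here are placeholders.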
2. Communication‑efficient model/pipeline parallel (the WAN‑hard part)

Protocol Models (Pluralis, 2025): compress activations + back‑activations
• Problem: data‑parallel compression doesn't help when shards must ship activations every microbatch.
• Idea: exploit rank collapse in transformer projections; constrain training to low‑rank subspaces so activations live in a predictable, reconstructable subspace.
• Results: up to ~100× end‑to‑end comm reduction; trains an 8B LLaMA split across 4 regions over ~80 Mb/s with DC‑level convergence (baselines assume 100 Gbps). (arXiv)

Asynchronous pipeline with Nesterov (2025)
• Idea: modify the Nesterov look‑ahead to compensate for staleness in fully async PP; proof + code.
• Results: on decoder‑only LMs up to 1B params, outperforms other async baselines and can beat synchronous PP. (arXiv)

Activation quantization for slow links (2025)
• TAH‑Quant: 3–4‑bit, tile‑adaptive quantization + a Hadamard transform to tame outliers; SGD‑like convergence rate.
• Results: up to 4.3× end‑to‑end speedup with stable convergence and no extra memory. (arXiv)

3. Schedulers/topologies for unreliable, heterogeneous networks

SWARM parallelism (ICML'23) — still the WAN PP reference
• Idea: stochastic, self‑healing pipelines; fast devices do more, slow/preempted ones do less; randomized rewiring handles failures.
• Results: 1B‑param training on preemptible T4s with <200 Mb/s; "square‑cube law" intuition: bigger models can be easier to WAN‑train. (arXiv, PMLR)

Teleportation (ICLR'25)
• Idea: activate a subset of nodes each step, gossip within it, then "teleport" the active set to avoid spectral‑gap slowdowns as N grows.
• Results: stable accuracy at large node counts; an efficient rule for tuning active‑set size. (arXiv, OpenReview)

Epidemic/randomized sync & model fragmentation
• Epidemic learning: randomized, partially‑overlapping sync patterns (a toy gossip round is sketched after this section).
• Model fragmentation (2024): combines async decentralization with fragment‑level updates to reduce staleness. (arXiv)

Hierarchical/geo‑aware designs
• HALoS: explicit intra‑ vs inter‑region behavior via local/global parameter servers.
• Varuna (EuroSys'22): a strong systems baseline for low‑cost PP on spot/preemptible VMs. (arXiv, PDL)
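Since several of the designs above (NoLoCo's pairwise sync, epidemic learning's randomized patterns) replace global collectives with gossip, here is a toy Python round showing the core mechanic. The names and the scalar demo are my illustrative assumptions, not any paper's code; NoLoCo additionally folds this into a Nesterov outer loop.

```python
import random
import torch

def gossip_round(replicas):
    # One gossip round in the NoLoCo / epidemic-learning spirit: shuffle the
    # replicas, pair them off, and let each pair average weights in place.
    # No global collective; each replica contacts exactly one peer per round.
    order = list(range(len(replicas)))
    random.shuffle(order)
    for i, j in zip(order[::2], order[1::2]):  # disjoint random pairs
        a, b = replicas[i], replicas[j]
        with torch.no_grad():
            for name in a:
                mixed = (a[name] + b[name]) / 2
                a[name].copy_(mixed)
                b[name].copy_(mixed)

# Toy demo: eight scalar "models" drift toward the global mean (3.5) within
# a handful of rounds, which is why pairwise sync can stand in for all-reduce.
states = [{"w": torch.tensor([float(k)])} for k in range(8)]
for _ in range(6):
    gossip_round(states)
print([round(s["w"].item(), 2) for s in states])
```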
Where we started (NoLoCo's refs → the graph)
From NoLoCo's bibliography we branched to:
• DiLoCo (original + the OpenDiLoCo replication) → Streaming/Eager variants → async Local‑SGD fixes (staleness, overlap, sync frequency).
• WAN‑tolerant schedulers (SWARM) + topology theory (Teleportation).
• Beyond‑DDP compression (Protocol Models; activation quant) + decoupled/desynced optimizers (DeMo/DisTrO; DES‑LOC). (arXiv)

What's deployable today?
On the public internet (≈80–500 Mb/s):
• Data‑parallel: DiLoCo/OpenDiLoCo, Streaming DiLoCo (overlap + quant), NoLoCo, DeMo/DisTrO, DES‑LOC — if each node can hold the full model.
• Model/pipeline: Protocol Models for true multi‑region MP; async PP + Nesterov if you can tolerate asynchrony; SWARM for stochastic, failure‑tolerant pipelines. (arXiv)

Proof points:
• OpenDiLoCo: 90–95% GPU util across 2 continents / 3 countries.
• Protocol Models: 8B across 4 regions at ~80 Mb/s with DC‑level convergence.
• SWARM: 1B on <200 Mb/s preemptible nodes. (arXiv)

Gaps & open problems (2025 snapshot)
• Verifying off‑chain training: "proof‑of‑learning" is maturing, but marketplaces need practical, low‑overhead proofs at LLM scale (a toy commitment chain is sketched at the end of this thread).
• Privacy & data locality: async/gossip helps, but cross‑border PII and sector rules need careful routing + audit.
• WAN model‑parallel: Protocol Models are a leap, but need independent replications at >10B params and very low bandwidth; activation quant is promising but new. (arXiv)

Quick reader's map (hand‑picked & why)
• NoLoCo (2025): no all‑reduce; pairwise averaging; ~10× faster sync vs DiLoCo; +4% convergence speed — the gossip intro. (arXiv)
• DiLoCo (2023/24) → OpenDiLoCo: the canonical local‑steps + outer‑momentum recipe; reproduced across continents. (arXiv)
• Streaming DiLoCo & Eager (2025): overlap/quant to slash peak bandwidth; WAN‑practical. (arXiv)
• DeMo (2024) & DisTrO (2024/25): decoupled momentum → orders‑of‑magnitude less traffic; WAN‑ready. (arXiv, GitHub)
• DES‑LOC (2025): desynced schedules; strong empirical reductions. (arXiv)
• Protocol Models (2025): the first convincing recipe for compressing activations/back‑activations for WAN MP; 8B, 4 regions, ~80 Mb/s. (arXiv)
• Async PP + Nesterov (2025); TAH‑Quant (2025): async pipelines with theory + WAN‑oriented activation quant. (arXiv)
• SWARM (ICML'23): stochastic, failure‑tolerant pipelines. (arXiv)
• Async Local‑SGD (DeepMind '24), PALSGD ('25), HALoS ('25): what breaks (staleness) and how to fix it (delayed Nesterov, pseudo‑sync, hierarchy). (arXiv)
• Context/replications: OpenDiLoCo notes; DiPaCo (modular paths + DiLoCo) pairs well with WAN training. (arXiv)

Practical guidance (what to try first)
• If every node fits the model: start with OpenDiLoCo or NoLoCo; add Streaming DiLoCo's overlap/quant when links spike; try the DeMo/DES‑LOC optimizers.
• If you must split the model: use Protocol Models for WAN‑safe PP; need full asynchrony? test async PP + Nesterov; if bandwidth binds, add TAH‑Quant.
• If nodes churn or vary: SWARM‑style stochastic pipelines (or hierarchical HALoS) to keep throughput high. (arXiv)

Libraries & code you'll actually touch
• OpenDiLoCo: code + solid replication write‑ups.
• DisTrO: open repo + preliminary report.
• Async‑PP (Pluralis): code links available.
• Hivemind: still a handy DHT/NAT‑piercing substrate for P2P‑style scheduling. (GitHub, arXiv)

Sources:
• NoLoCo (2025): Kolehmainen et al., arXiv:2506.10911
• DiLoCo (2023→2024): Douillard et al., arXiv:2311.08105; OpenDiLoCo (2024): Jaghouar et al., arXiv:2407.07852
• Streaming DiLoCo (2025); Eager Updates (2025)
• DeMo (2024); DisTrO (2024/25)
• DES‑LOC (2025)
• Protocol Models (2025): Ramasinghe et al., arXiv:2506.01260
• Async PP + Nesterov (2025): Ajanthan et al., arXiv:2505.01099
• TAH‑Quant (2025): He et al., arXiv:2506.01352
• SWARM (ICML'23): Ryabinin et al.
• Async Local‑SGD (2024); PALSGD (2025); HALoS (2025)

See my full write‑up on flops.gg, which goes into much more depth: flops.gg/decentralized-…
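On the "verifying off-chain training" gap flagged above: the simplest mental model is a hash-chained audit trail over training checkpoints that a verifier can later spot-check by replaying sampled steps. The sketch below is a deliberately naive toy of that idea (all names mine); real proof-of-learning schemes need substantially more machinery to resist spoofing.

```python
import hashlib
import json

def commit_step(prev_digest, step, weights_digest, batch_digest):
    # Toy proof-of-learning-style audit record: hash-chain each step's
    # weight and data commitments so a verifier can later spot-check that
    # checkpoint t+1 really follows from checkpoint t. Illustrative only.
    record = {"prev": prev_digest, "step": step,
              "weights": weights_digest, "batch": batch_digest}
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

# Chain three fake steps; a verifier replays sampled steps and recomputes
# the digests to check the chain.
digest = "genesis"
for t in range(3):
    digest = commit_step(digest, t, f"w{t}", f"b{t}")
print(digest[:16])  # head of the audit chain
```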
0
0
3
310
tranesonic
tranesonic@tranesonics·
It's happening. You might not believe it, or you may be in denial. This is coming, and there's no stopping this train (credit to Lyn Alden). I compiled a list of the most relevant decentralized training papers from 2021 onward in the Drive xlsx sheet linked at the end of this post. You're welcome. Some highlights:

- Hivemind (library), 2021 — Library / P2P substrate. Peer-to-peer parameter averaging; DHT-based rendezvous. Internet-grade; NAT traversal; P2P. Designed for hundreds of peers; used in OpenDiLoCo. Averaging/opt steps over DHT; fault-tolerant backprop. An enabler substrate rather than a SOTA algorithm. GitHub: learning-at-home/hivemind github.com/learning-at-ho…

- NoLoCo (No-all-reduce Low Communication), 2025 — Optimizer / Data-parallel. Inner–outer with pairwise averaging; no all-reduce. Internet-scale; sync step ~10× faster than DiLoCo (few hundred accelerators). 125M–6.8B params; wide accelerator counts. Outer Nesterov w/ pairwise weight averaging; inner local AdamW steps. Up to 4% faster vs DiLoCo at the same loss; lower comm overhead. arXiv:2506.10911 arxiv.org/abs/2506.10911

- DiLoCo, 2023 — Optimizer / Data-parallel. Inner–outer with infrequent global sync. Geo-distributed; network not explicitly specified. 8 workers; language modeling on C4; extended in later work. Outer Nesterov every ~K steps; inner local AdamW. Matches fully synchronous while communicating ~500× less (8 workers). arXiv:2311.08105 arxiv.org/abs/2311.08105

- RL Swarm (Collaborative P2P RL), 2025 — Reinforcement learning / P2P. Peer-to-peer post-training (answer→critique→resolve/vote); GRPO-based. Consumer hardware to cloud; internet P2P. Multiple LLM agents; open network. Gossip sharing of rollouts/feedback; decentralized voting. Faster learning than solo agents on showcased tasks. Gensyn RL Swarm (GitHub & blog) github.com/gensyn-ai/rl-s…

See the full sheet at the link below: docs.google.com/spreadsheets/d…
0
0
0
202