Parallax

92 posts

@tryParallax

build your own ai cluster. run open models across your machines.

Joined December 2025

39 Following, 1.3K Followers

Parallax@tryParallax·1 May

mine looks better

Gradient@Gradient_HQ

English

2.4K

Parallax reposted

Yuan ./@yuangao·22 Apr

Thrilled to see @tryParallax live in production on @Theta_Network. This is exactly why @Gradient_HQ built Parallax: turning the world’s GPU mesh into a sovereign, distributed token factory. Congrats on the milestone! 🫡

Theta Network@Theta_Network

To make this work, we adapted Parallax, @Gradient_HQ's distributed inference framework, to run across EdgeCloud's global node network. One API endpoint, model split across many machines, no centralized cluster required.

English

353

35K

Parallax@tryParallax·22 Apr

glad we could help! with agentic adoption soaring, privacy and token cost are already the top concerns for both agent and human users. that's what parallax is built for.

Theta Network@Theta_Network

English

204

20.4K

Parallax@tryParallax·22 Apr

@Theta_Network @Gradient_HQ 🫡

192

Theta Network@Theta_Network·20 Apr

English

267

77.8K

Theta Network@Theta_Network·20 Apr

Qwen3 32B by Alibaba is now live on Theta EdgeCloud as a decentralized on-demand inference API, a large-scale LLM served across community GPU nodes using pipeline parallelism over the internet. 🧵

English

115

506

40.5K

Parallax@tryParallax·2 Apr

@VitalikButerin buy a GPU, get together a group of friends. don’t carry the world on your own shoulders. we’ve been building this for a while. try parallax for local ai.

English

1.6K

vitalik.eth@VitalikButerin·2 Apr

My self-sovereign / local / private / secure LLM setup, April 2026 vitalik.eth.limo/general/2026/0…

English

691

664

1.1M

Parallax@tryParallax·1 Apr

@RoundtableSpace 35b model on a macbook with compressed cache is a solid result. local inference keeps getting more accessible and it's fun to watch people push the limits of what consumer hardware can do!

English

229

0xMarioNawfal@RoundtableSpace·31 Mar

A solo dev rebuilt Google’s new algorithm with Claude in 7 days, made it 3.7x faster, and got a 35B model running on a MacBook with 4.6x compressed cache. Google published the paper. He shipped the code.

English

196

2.5K

251.9K

Parallax@tryParallax·1 Apr

@adrgrondin @PrismML 1-bit model running at 40 tok/s on an iphone. mlx is making on-device inference surprisingly usable now.

English

552

Adrien Grondin@adrgrondin·31 Mar

Demo of 1-bit Bonsai 8B from @PrismML running on-device on iPhone 17 Pro. More than 40tk/s for a dense 8B model on iPhone, that’s a first. Powered by Apple MLX and available now in Locally AI.

English

123

1.8K

159.6K

Parallax@tryParallax·1 Apr

@ollama local llm + mlx is a great combo! apple silicon keeps getting better for local inference and it's nice to see more players in the ecosystem lean into it properly.

English

189

ollama@ollama·31 Mar

Ollama is now updated to run the fastest on Apple silicon, powered by MLX, Apple's machine learning framework. This change unlocks much faster performance to accelerate demanding work on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex

English

293

732

5.8K

778K

Parallax@tryParallax·27 Mar

@tom_doerr single binary, self-hosted, no dependencies. this is the way local ai should ship. less config, more building.

English

Tom Dörr@tom_doerr·26 Mar

Single-binary self-hosted AI agent github.com/shachiku-ai/sh…

English

3.8K

Parallax@tryParallax·27 Mar

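@karaage0703 the jump in local performance from 9b to 27b is huge. qwen3.5 is one of the best models to self-host right now, especially if you're sharding across different devices.

Japanese (original)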

137

からあげ@karaage0703·26 Mar

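For my use case, running on a DGX Spark, Qwen3.5 27B is overwhelmingly better than 9B. I guess the experience differs depending on use case and environment. > "Qwen3.5 27B lost to 9B: the RTX 4060 paradox" | ぷらずもん zenn.dev/plasmon/articl… #zenn

Japanese (original)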

150

16K

Parallax@tryParallax·26 Mar

TurboQuant tackles one bottleneck: KV cache memory. there's another one that matters just as much in distributed setups: communication latency between nodes. we built Decentralized Speculative Decoding (DSD) to turn that idle network wait time into useful computation, 2.56x speedup on HumanEval, no retraining needed. combine cache compression with latency compression and local inference starts looking very different. arxiv.org/abs/2511.11733

English

561

Shay Boloor@StockSavvyShay·25 Mar

$GOOGL just released TurboQuant which is a new compression method that can cut LLM cache memory by at least 6x & deliver ~8x speedups without sacrificing quality This could make local AI inference far more capable with larger context windows & less memory strain across devices

English

104

715

758.6K

Parallax@tryParallax·26 Mar

hf-mount solves the storage side: any model, mounted locally like a drive. the next piece is actually running those models across whatever hardware you have. that's what parallax does: schedule inference across a pool of heterogeneous GPUs so the model doesn't just live on your machine, it runs there too. mount + serve, fully local.

English

132

clem 🤗@ClementDelangue·24 Mar

Local AI is free, fast & secure! So today we're introducing hf-mount: attach any storage bucket, model or dataset from @huggingface as a local filesystem. This is a game changer, as it allows you to attach remote storage that is 100x bigger than your local machine's disk. This is also perfect for Agentic storage!! Let's go!

English

226

1.3K

252.8K

Parallax@tryParallax·25 Mar

@oprydai you don't need to go into debt though. a couple of mac minis or an nvidia card can already run serious models locally. parallax lets you connect whatever hardware you have into one cluster. start small, add devices as you go. the whole point is using what's already on your desk.

English

Mustafa@oprydai·23 Mar

get into debt if you must, but build a hardware home lab.

English

1.2K

32.1K

Parallax@tryParallax·24 Mar

@openclaw solid release. deepseek provider plugin + qwen pay-as-you-go opens up a lot of new local setups. parallax users running openclaw stacks should have a smoother time with this one.

English

1.5K

OpenClaw🦞@openclaw·24 Mar

OpenClaw 2026.3.23 🦞
🧪 DeepSeek provider plugin
☁️ Qwen pay-as-you-go
♻️ OpenRouter auto pricing + Anthropic thinking order
🖥️ Chrome MCP waits for tabs
🔧 Discord/Slack/Matrix + Web UI fixes
Upgrade before your agent does it for you. github.com/openclaw/openc…

English

217

216

2.4K

374.3K

Parallax@tryParallax·24 Mar

@wolfejosh the ceiling for on-device keeps moving. a year ago people argued you couldn't run anything useful locally. now it's 400B on a phone. parallax already supports mixed hardware clusters — apple silicon, nvidia, whatever you've got. the trend is clear.

English

182

Josh Wolfe@wolfejosh·24 Mar

It is happening (on-device inference). This is the worst that on-device inference will ever be. 400B model in your pocket.

Anemll@anemll

Running 400B model on iPhone! 0.6 t/s. Credit: @danveloper @alexintosh @danpacary @anemll

English

158

35.1K

Parallax@tryParallax·24 Mar

the $3,469 single-night burn is a good reminder of what you're actually signing up for with cloud inference. when the meter's always running, one stuck agent is a bill. parallax runs models on your own machines. no token meter, no overnight surprises.

Ziwen@ziwenxu_

x.com/i/article/2034…

English

1.4K

Parallax@tryParallax·22 Mar

@JustinLin610 🙋‍♂️and we are here to help

English

485

Junyang Lin@JustinLin610·22 Mar

local agents will definitely become more important in the coming days and months while agents become part of our life and work. privacy always matters

Zach Mueller@TheZachMueller

PinchBench results for Qwen3.5 27B using @UnslothAI K_XL quants, best of 3, thinking enabled. TL;DR: Q3 KXL (14.5GB) or Q4 KXL (18GB). While overall the "best" results showed little degradation, if you dig into mean/std, Q4_K_XL overall was the best at ~84% on average. Q3 seems viable, while Q2 is the lowest performing, of course.

English

570

65.7K

Parallax@tryParallax·21 Mar

local ai has picked up fast since openclaw dropped. with the latest wave of small capable models, more people are running serious workloads on their own hardware. if you missed this good local ai tutorial from @yacinelearning or want a refresher on how distributed scheduling actually works under the hood, it's worth the rewatch over the weekend!

Yacine Mahdid@yacinelearning

I am continuing my adventure into distributed AI systems with the parallax scheduling strat from @Gradient_HQ. In this 37min tutorial I go through:
- heuristic used to make scheduling tractable
- dynamic programming formulation
- filling GPU with water
- shoving them into shelves

English

121

15.2K

Parallax@tryParallax·17 Mar

@tomosman @openclaw @NousResearch mac minis are underrated for this. we've been running multi-node setups on apple silicon with parallax and the performance-per-dollar is hard to beat. nice to see more people building this way.

English

Tom Osman 🐦‍⬛@tomosman·11 Mar

Infinitely bullish on a stack of MacMinis or Studios at home running @openclaw or @NousResearch Hermes. Run local models and soon you will have AGI at home. Lots of other epic stuff too but feels downstream of being able to do this.

English

1.3K
