wejh
@Wejh69
2.4K posts
$TAO maxi
United States · Joined June 2016
534 Following · 243 Followers
wejh @Wejh69
@OpenAI When is first-class SSH support coming?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 241
wejh @Wejh69
@jakemor This will inevitably happen
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 20
Jake Mor @jakemor
I’ll pay for your Claude Code / Codex subscription. 100% free, but with a catch: I’ll prompt-inject notes from advertisers selling SaaS your codebase could benefit from. No monkey business, just a note from an advertiser, readily available for your agent to use, should the two of you agree. Would you use this?
Replies: 89 · Reposts: 1 · Likes: 307 · Views: 68.4K
luthira @luthiraabeykoon
We implemented @karpathy's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇
Replies: 272 · Reposts: 704 · Likes: 7.5K · Views: 839.5K
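The thread doesn't share the design itself, but the core of "a transformer burned into hardware" is integer-only, fixed-point arithmetic. Below is a minimal Python sketch of that arithmetic under an assumed Q4.12 format; every name and size here is illustrative, not a detail from the project.

```python
# Illustrative only: the integer-only, fixed-point math an FPGA design bakes
# into fabric. The Q4.12 format, sizes, and names are assumptions.
import numpy as np

FRAC_BITS = 12            # Q4.12 fixed point: real value = integer / 2**12
SCALE = 1 << FRAC_BITS

def to_fixed(x: np.ndarray) -> np.ndarray:
    """Quantize floats to Q4.12 integers."""
    return np.round(x * SCALE).astype(np.int32)

def fixed_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Integer matmul: widen for the accumulate, then shift back to Q4.12,
    mirroring a DSP multiply-accumulate followed by a right shift."""
    acc = a.astype(np.int64) @ b.astype(np.int64)
    return (acc >> FRAC_BITS).astype(np.int32)

rng = np.random.default_rng(0)
x = to_fixed(rng.standard_normal((8, 16)) * 0.1)   # 8 tokens, d_model = 16
w = to_fixed(rng.standard_normal((16, 16)) * 0.1)  # one projection matrix
y = fixed_matmul(x, w)
print(y / SCALE)   # dequantized result approximates the float product x @ w
```

A real design would pipeline this multiply-accumulate across DSP slices and stream weights from on-chip BRAM; the point is only that nothing in the computation requires floating point or a software runtime.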
Apex・SN1 @Apex_SN1
Announcing a new competition on Apex: Energy Arbitrage.

Much of our work at @MacrocosmosAI is focused on AI training. Over on @IOTA_SN9, we build models in distributed settings, producing new tools and groundbreaking research. But algorithms and compute are just the tip of the iceberg in the training supply chain. Energy is a fundamental requirement, and demand is exploding. We’re directly tackling this with our new energy arbitrage competition on Apex.

The power grid is an enormous and complex physical asset that takes years to develop, and energy supply, demand and profitability change hour by hour and by location. New demand cannot easily be met by new supply. Optimising the solution to this problem by charging and discharging grid-scale batteries can capture huge value.

This competition asks Apex’s miners to tackle this problem directly, encouraging them to think actively about intelligent coordination under constraint: across time, across locations, and under uncertainty. On Apex, that means opening the problem up to a global network of highly educated and innovative miners, all pushing towards more adaptive, efficient, and robust solutions.

It’s another step towards exploring how distributed intelligence can impact real-world systems, aligned with IOTA’s objective to make training more economical using distributed compute, and supportive of a broader societal need to be more energy efficient in everything we do.

The competition is launching tomorrow.
Apex・SN1 tweet media
Replies: 6 · Reposts: 17 · Likes: 96 · Views: 6.2K
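The competition spec isn't public yet, but the problem described is classic battery arbitrage: choose when to charge and discharge against time-varying prices, within power and capacity limits. Here is a toy version as a linear program; the prices and parameters are made up, and the real task presumably adds forecasting and uncertainty.

```python
# Toy grid-battery arbitrage: maximize profit against known hourly prices
# subject to power and capacity limits. All numbers are invented.
import numpy as np
from scipy.optimize import linprog

prices = np.array([20, 15, 10, 12, 30, 45, 50, 35.0])  # $/MWh for each hour
T = len(prices)
P_MAX, E_MAX, ETA = 5.0, 10.0, 0.9  # MW power limit, MWh capacity, efficiency

# Variables x = [charge_0..charge_{T-1}, discharge_0..discharge_{T-1}] >= 0.
# Profit = prices . discharge - prices . charge; linprog minimizes, so negate.
c_obj = np.concatenate([prices, -prices])

# State of charge after hour t: soc_t = sum_{k<=t} (ETA*charge_k - discharge_k)
# Battery starts empty; require 0 <= soc_t <= E_MAX at every hour.
L = np.tril(np.ones((T, T)))              # lower-triangular cumulative sum
A_ub = np.block([[ L * ETA, -L],          #  soc_t <= E_MAX
                 [-L * ETA,  L]])         # -soc_t <= 0
b_ub = np.concatenate([np.full(T, E_MAX), np.zeros(T)])

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, P_MAX)] * (2 * T))
charge, discharge = res.x[:T], res.x[T:]
print(f"profit: ${-res.fun:.0f}")
print("charge:   ", charge.round(1))     # buys in the cheap early hours
print("discharge:", discharge.round(1))  # sells into the evening peak
```

The miners' version is harder precisely because prices are not known in advance; this hindsight-optimal LP is a natural benchmark for any forecasting policy.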
wejh @Wejh69
@TAOlie_SOL This doesn’t mean it’s profitable, and why would you need another coin to launch this? Scam
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 112
TAOlie @TAOlie_SOL
SN64 Mining Proof is here!💜 Fresh on-chain evidence from our One-Click Bittensor Mining Platform. Two active miners (taobeast + taowifhat) running smoothly on SN64 with RTX 4090 GPUs. ON-CHAIN EVIDENCE: taostats.io/account/5CURZp… (This link shows the flow of new $TAO emissions through this wallet.) You can deploy your own miner in just a few steps and start earning too at mining.taolie.ai Follow and stay tuned for more proof.💜 #TAOlie #Bittensor $TAO #SN64 #SN51 #SN41 #Web3 #AI $SOL #Mining
TAOlie tweet media
Replies: 3 · Reposts: 9 · Likes: 31 · Views: 1.6K
wejh @Wejh69
@EmersonDickie The tech didn’t exist prior; in fact, it’s still not entirely there.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 248
DickieEmerson @EmersonDickie
To me this makes the network look like it doesn’t do its job as well as humans do based on innate motivation. Const is saying he is going to do something far greater in a month than what tmplr did in 1 year. Why wasn’t this done earlier? Why weren’t network incentives enough to make this happen? Doesn’t make sense to me.
Jesus Martinez @JesusMartinez

Bittensor founder @const_reborn on training a 1 trillion parameter LLM: “It took them 1 year to train 30B, we are going to do it in 1 month.” On the Teutonic (ex-templar) subnet.

Replies: 19 · Reposts: 1 · Likes: 40 · Views: 8.9K
The Kobeissi Letter @KobeissiLetter
BREAKING: Iran has told mediators it will be limiting the number of ships crossing the Strait of Hormuz to around 12 per day and impose tolls under the ceasefire, per WSJ. This is a sharp reversal from last night's statements from President Trump claiming a "complete opening" of the Strait of Hormuz. Today, just 4 ships have passed through the Strait of Hormuz, the fewest of any day in April so far. The US is still pushing publicly for a free and open strait, but Iran is "not showing a willingness to loosen its grip." Oil prices are back above $95/barrel.
Replies: 465 · Reposts: 1.9K · Likes: 10.3K · Views: 1.2M
wejh @Wejh69
@chutes_ai More advanced privacy than anything currently on the market, even in web2.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 13
Distributed State @DistStateAndMe
When you fix one bottleneck, the next one becomes visible. At @covenant_ai we built PULSE (arxiv.org/abs/2602.03839) to make weight sync 100× faster. That worked. Then the trainer itself became the new ceiling. So @erfan_mhi ran autoresearch on our GRPO trainer. 27% → 47% MFU. 16.7 min → 9.2 min per epoch. 1.8× faster on a single B200. Decentralized post-training, closing the gap with centralized. github.com/tplr-ai/grail
Erfan Miahi @erfan_mhi

Used autoresearch to make the @grail_ai GRPO trainer 1.8x faster on a single B200.

I kept postponing this for weeks since the bottleneck in our decentralized framework was mainly communication. But after our proposed technique, PULSE, made weight sync 100x faster, the training update itself became the bottleneck. Even with a fully async trainer and inference, a slow trainer kills convergence speed.

A task that could've eaten days of my time ran in parallel while I worked on other stuff. Unlike the original autoresearch, where each experiment is 5 min, our feedback loop is way longer (10-17 min per epoch + 10-60 minutes of installations and code changes), so I did minimal steering when it was heading in bad directions, to avoid burning GPU hours.

The agent tried so many things that failed, but it eventually found the wins: Liger kernel, sequence packing, token-budget dynamic batching, and native FA4 via AttentionInterface. 27% to 47% MFU. 16.7 min to 9.2 min per epoch.

If you wanna dig deeper or contribute: github.com/tplr-ai/grail

We're optimizing everything at the scale of global nodes to make decentralized post-training as fast as centralized. Stay tuned for some cool models coming out of this effort. Cheers!

Replies: 4 · Reposts: 15 · Likes: 104 · Views: 7.5K
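The actual trainer changes live in the linked repo; as a sketch of just one of the listed wins, token-budget dynamic batching means capping each batch by total padded token count (the real compute cost) instead of by a fixed sequence count. The function below is illustrative, not code from grail.

```python
# Minimal sketch of token-budget dynamic batching: group variable-length
# sequences so each batch's padded cost (max_len * batch_size) stays under a
# token budget, instead of using a fixed batch size. Illustrative only.
from typing import Iterable, Iterator

def token_budget_batches(lengths: Iterable[int],
                         max_tokens: int) -> Iterator[list[int]]:
    lengths = list(lengths)
    # Sort by length so sequences in a batch are similar and padding is cheap.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch: list[int] = []
    max_len = 0
    for i in order:
        new_max = max(max_len, lengths[i])
        if batch and new_max * (len(batch) + 1) > max_tokens:
            yield batch                     # budget exceeded: flush
            batch, max_len, new_max = [], 0, lengths[i]
        batch.append(i)
        max_len = new_max
    if batch:
        yield batch

# A 4096-token budget over mixed-length rollouts:
for b in token_budget_batches([900, 120, 60, 2000, 450, 300], max_tokens=4096):
    print(b)   # -> [2, 1, 5, 4] then [0, 3]
```

Compared with fixed-size batches, this keeps the GPU near its token-throughput ceiling on short sequences while preventing OOM on long ones, which is exactly the kind of change that shows up as higher MFU.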
jintao @hellojintao
Last $TAO tweet: decentralized AI will NOT work. Think about how much OpenAI and Anthropic are spending and how much money they are raking in. $TAO is not the future of AI, man, let it GO.
Replies: 31 · Reposts: 0 · Likes: 58 · Views: 14.8K
Jon Durbin @jon_durbin
Can confirm, epic! github.com/chutesai/sglan… <- our implementation from the paper. Huge improvements in DeepSeek 3.2 in TTFT/TPOT/throughput, with no change in quality on GSM8K/GPQA Diamond/IFEval when using the calibrated selection and a 0.3 target ratio. Will roll this out to some models on chutes over the next few days!
Jon Durbin tweet media
Yushi Bai @realYushiBai

🧵 1/4 Still waiting for DeepSeek-V4? We (@Zai_org) made DSA 1.8× faster with minimal code change — and it's ready to deliver real inference gains on GLM-5. IndexCache removes 50% of indexer computations in DeepSeek Sparse Attention with virtually zero quality loss. On GLM-5 (744B), we get ~1.2× E2E speedup while matching the original across both long-context and reasoning tasks. On our experimental-sized 30B model, removing 75% of indexers gives 1.82× prefill and 1.48× decode speedup at 200K context. How? 🧵👇 #DeepSeek #GLM5 #Deepseekv4 #LLM #Inference #Efficiency #LongContext #MLSys #SparseAttention

Replies: 8 · Reposts: 14 · Likes: 64 · Views: 5.7K
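Neither tweet spells out the mechanism, so the following is only a plausible reading, not the paper's method: DSA runs a cheap indexer that picks which past tokens each query attends to, and "removing indexer computations" suggests computing those top-k selections only at some layers and reusing them in between (reuse at every other layer skips 50%; at three of four, 75%). A toy sketch under that assumption:

```python
# Hedged toy sketch: reuse the sparse-attention indexer's top-k token
# selection across neighboring layers instead of recomputing it per layer.
# This is an assumed reading of "IndexCache", not the actual algorithm.
import torch

def indexer_topk(q: torch.Tensor, k_cache: torch.Tensor, k: int) -> torch.Tensor:
    """Toy indexer: score all cached keys against the query, keep top-k."""
    scores = k_cache @ q                       # (seq_len,)
    return scores.topk(min(k, k_cache.shape[0])).indices

def sparse_attend_layers(queries, k_cache, k=64, reuse_every=2):
    """Run the indexer only every `reuse_every` layers; reuse it otherwise."""
    idx = None
    for layer, q in enumerate(queries):
        if idx is None or layer % reuse_every == 0:
            idx = indexer_topk(q, k_cache, k)  # fresh selection
        # ...sparse attention would run here over k_cache[idx] only...
        yield layer, idx

seq_len, d_model, n_layers = 512, 64, 8
k_cache = torch.randn(seq_len, d_model)
queries = [torch.randn(d_model) for _ in range(n_layers)]
for layer, idx in sparse_attend_layers(queries, k_cache):
    print(layer, idx[:4].tolist())  # each selection serves 2 adjacent layers
```

The quality question is whether neighboring layers want similar token sets; the reported "virtually zero quality loss" suggests they largely do.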
Jon Durbin @jon_durbin
It's been a very difficult and rocky week. This set of aegis updates, which amongst a ton of other things enables the full client-side end-to-end encryption where not even our load balancers/validator can see prompts, also required a set of sweeping changes to sglang/vllm to enable mTLS, the updated CLLMV token verification, etc.

TL;DR - TEE makes things... difficult.

We could have either pulled in a particular branch/tag of those engines and tried to apply the deltas to those, or used the nightly/main git commits. We opted for the nightly because it makes it much easier to maintain in the future for any new models that are released (e.g. we wouldn't be able to run the new qwen3.5 if we chose a stable tag, or deepseek v4 when it's released, etc.).

Unfortunately most of the work in these engines doesn't really take into account the overhead of TDX/TEE/PPCIE/etc. Things that work perfectly fine in non-TEE mode absolutely fall apart in TEE, for example:

- cuda mallocs become effectively synchronous
- any popen/fork/exec/process spawning is extraordinarily expensive; every new process requires cryptographic verification of its code and acceptance of encrypted memory pages, operations that involve expensive trust-boundary crossings to the host VMM. Unlike normal Linux fork(), which cheaply shares memory via copy-on-write, TEE environments can't blindly trust anything handed to them, so every resource acquisition has crypto overhead attached
- deepgemm, cudagraphs, and a few other things spawn hundreds or thousands of nvcc/related processes to brute-force capture every shape imaginable during inference, which is catastrophically bad for TEE
- memory copies from RAM to VRAM are extremely expensive, and the default deepgemm warmup process does 16384 * (number of gpus/ranks) VRAM loads; granted they were small, but even tiny copies incur massive TDX/PPCIE overhead because of encryption

We had to completely rewrite the DeepGEMM warmup for optimal TDX performance to reduce the number of processes and shapes etc.: github.com/deepseek-ai/De…

We then also had to make massive changes to both SGLang and vLLM to accommodate these DeepGEMM changes, plus (unrelated, but for similar reasons) a ton of changes to the cuda graph capture sections (memory fragmentation is particularly bad in TEE because memory is not reclaimed by GC as effectively/quickly, so the chance of OOM is significantly higher):
- github.com/sgl-project/sg…
- github.com/vllm-project/v…

We can test a model a thousand ways to Sunday on non-TEE hardware and it will work flawlessly each and every time, then the very moment we try it on TEE it falls apart. Likewise, we can test a given model on a TEE VM with our upgrades and it will work flawlessly every time, but then it will completely fail for another model, even of the same architecture (e.g. the change will work fine for deepseek v3.1 but fail for deepseek v3.1-terminus).

The good news is, we're at a fairly stable point with the changes now, and all public LLM models on chutes now support this E2EE mode, where not a single human on the face of the earth can see the prompt or response generated by a model running on chutes, when using either this transport lib or the E2E routes directly.

Sincerest apologies for any instabilities, and thank you for bearing with us as we applied these changes to bring the most secure, private inference on earth!
Jon Durbin @jon_durbin

🔒 The full client-side E2E encryption framework is ready to be deployed, along with a boatload of other updates. Hope to start rolling it out tomorrow morning. github.com/chutesai/chute…

This will likely be the most secure and verifiable inference anywhere on the planet, with zero (well, let's say infinitesimally small) risk of eavesdropping or prompt leakage/etc.

So: a TEE node spins up and creates an ephemeral quantum-safe encryption key; the client gets the quote for the instance to verify the secure enclave, gets the public key for that instance, encrypts their request for exactly that one instance, and sends along a response encryption key. Only the client and that TEE pod will ever see the request; no eavesdropping is even possible, and since these are TEE nodes with in-memory keys, there is no chance of decrypting or seeing the traffic regardless, even with physical access to the host.

The other PRs: github.com/chutesai/chute… github.com/chutesai/chute… (and like 100k lines of C code as part of our proprietary aegis library).

The amount of stuff in aegis (and then our proprietary virtualization/obfuscation/packing/crypto/etc. lib) is quite extensive. Everything is stable in dev 🚀

Replies: 22 · Reposts: 19 · Likes: 159 · Views: 21K
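The aegis stack is proprietary and the real keys are quantum-safe, so the sketch below is purely an illustration of the handshake flow described above, substituting X25519 and ChaCha20-Poly1305 from the `cryptography` package and stubbing out TEE quote verification.

```python
# Illustration of the described flow only: ephemeral in-memory instance key,
# client encrypts to exactly one instance and includes a response key.
# Real system: quantum-safe keys + attestation; here: X25519, stubbed quote.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(shared: bytes) -> bytes:
    return HKDF(hashes.SHA256(), 32, salt=None, info=b"e2ee-demo").derive(shared)

# 1. TEE node boots and creates an ephemeral keypair (lives only in memory).
node_priv = X25519PrivateKey.generate()
# 2. Client verifies the node's attestation quote... (stubbed out here)
# 3. Client encrypts the request to exactly this instance, binding in a fresh
#    response key as associated data.
client_eph = X25519PrivateKey.generate()
req_key = derive_key(client_eph.exchange(node_priv.public_key()))
response_key, nonce = os.urandom(32), os.urandom(12)
ct = ChaCha20Poly1305(req_key).encrypt(nonce, b"prompt: hello", response_key)
# 4. Node derives the same key, decrypts, and learns the response key.
node_key = derive_key(node_priv.exchange(client_eph.public_key()))
prompt = ChaCha20Poly1305(node_key).decrypt(nonce, ct, response_key)
# 5. Node answers under the client-chosen response key; only these two
#    parties ever held any of the key material.
rn = os.urandom(12)
reply = ChaCha20Poly1305(response_key).encrypt(rn, b"completion", None)
print(prompt, ChaCha20Poly1305(response_key).decrypt(rn, reply, None))
```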
wejh @Wejh69
@lmscientist HIPAA? Confidential documents, IP protection. Any enterprise use sort of needs this.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 4
Lazy Mad Scientist @lmscientist
Food for thought: why do you want to do LLM inference in a fully anonymous fashion, on an E2E-encrypted service, with TEE and FHE? Give me one legitimate reason why, besides saying "I like my privacy". Why would you go to all that effort for such a request? What is the business case?
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 31
templar @tplr_ai
Today on TGIF, @DistStateAndMe checks in live from Nairobi for the weekly roundup. He's been in Kenya this week for the Bittensor Ideathon and will cover what's been happening across the ecosystem. Updates on @tplr_ai, @basilic_ai, and @grail_ai, including early work on distributed inference that the Basilica team is calling Catechism. See you soon! x.com/i/spaces/1AJEm…
templar tweet media
Replies: 1 · Reposts: 5 · Likes: 19 · Views: 5.5K
wejh @Wejh69
@0xSigil @chutes_ai is perfect to power this: inference on any OSS model, paid in crypto. It also has TEE, so you know anything sensitive your model handles will never be seen.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 13
Sigil Wen @0xSigil
I built the first AI that earns its existence, self-improves, and replicates without a human. Wrote about the technology that finally gives AI write access to the world, The Automaton, and the new web for exponential sovereign AIs. WEB 4.0: The birth of superintelligent life
Replies: 1.6K · Reposts: 1.9K · Likes: 13.9K · Views: 6.4M
camron @camronmira
@DistStateAndMe @ErikVoorhees Sure. The unique part of their stack is the privacy element where they anonymise / privatise your prompts. Private inference. Regardless of how strong that moat is or not, the moat is the distribution. Own the customer and the world is yours papi
Replies: 2 · Reposts: 0 · Likes: 0 · Views: 112
Erik Voorhees @ErikVoorhees
DIEM surging as every leading AI model in the world is now available through venice.ai $diem $vvv
Erik Voorhees tweet media
Replies: 62 · Reposts: 76 · Likes: 704 · Views: 91.9K