Chamalka Muwangala

90 posts

@ChamalkaAI

ML Engineer and AI enthusiast. Based out of 🇸🇪

Malmo, Sweden · Joined October 2024

100 Following · 14 Followers

Chamalka Muwangala reposted

Bryan Johnson @bryan_johnson · 15 May

If Hantavirus mutated into a global threat, it would unleash AI + biotech unlike anything we've ever seen.
> genome sequenced and public in 4 hours
> AlphaFold maps every protein target
> AI screens 10,000 drugs in 24 hrs
> 50 vaccine candidates designed simultaneously
> AI-designed antibodies in days
> risk of death computed instantly
> decentralized trials launch globally
> enroll from home
> 20 countries manufacturing at once
> first doses in three weeks
> real-time dose characterization
> your genome + biomarkers determine your protocol
> variant map updates every hour
No one would wait for governments.

Chamalka Muwangala reposted

Daniel A. Saedi (DataManDan) @TheRealDanSaedi · 13 May

The singular red dot is causing a massive compute shortage and you think memory stocks have topped?

[Image attached]

Chamalka Muwangala reposted

Fredrik Hjelm 🇸🇪 @FredrikHjelm4 · 24 Apr

Ambition since childhood: become the world's best hacker. Met Wilma Emanuelsson yesterday. Wilma is such an impressive human being. Started hacking at 7, finding exploits while playing MovieStarPlanet. Turned hacktivist as a teenager. Went after child exploitation and trafficking rings. Founded iTrack Reading before that. Software for people with dyslexia. Look at a word, hear it in your headphones. The idea came from a friend who kept getting held back in school. Now running her next company, Ezteric. She's 22.

Chamalka Muwangala reposted

Rohan Paul @rohanpaul_ai · 19 Apr

Big claim in this paper: "Prefill-as-a-Service".

Prefill, the heaviest part of inference, may finally be portable. Long-context AI is no longer trapped inside a single datacenter. The paper shows how to run LLM prefill on remote clusters and send back only the much smaller saved prompt state needed to answer. The breakthrough is not sending everything farther, but sending the right requests farther.

---

When you ask a model a long question, it first has to read and digest the whole prompt before it starts answering. That first step is called prefill, and it is brutally compute-heavy. The second step is decode, where the model generates tokens one by one, and that part is bound more by memory bandwidth than by raw compute.

Until now, those two steps usually had to stay close together inside the same fast network, because prefill creates a huge blob of temporary memory called the KV cache that has to be moved quickly to the decode machine. Moving it is so data-heavy that both phases must stay in the same tightly connected cluster. That is the bottleneck.

What changed is model design. Newer hybrid-attention models produce a much smaller KV cache than older dense-attention models, so shipping that state across ordinary datacenter links starts to become practical instead of absurd.

The paper’s idea is a Prefill-as-a-Service setup that sends only long, uncached prompts to a remote prefill cluster, then ships the KV cache back over normal Ethernet, while short requests stay local. The smaller KV cache is the main enabler, and the system adds smart routing, bandwidth-aware scheduling, and cache-aware placement so the network does not clog up.
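The size gap between dense and hybrid models can be estimated with a back-of-the-envelope calculation. This is a minimal sketch under stated assumptions: the model shapes (60 layers, 8 KV heads, head dimension 128, bf16, one full-attention layer in four, and a fixed head_dim x head_dim state per head in the linear-attention layers) are hypothetical and not taken from the paper. The point it illustrates is that the per-token cache scales with the number of full-attention layers, so replacing most of them shrinks what has to cross the network.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical model shape; numbers are illustrative, not from the paper."""
    num_layers: int
    full_attention_layers: int   # layers that keep a per-token KV cache
    num_kv_heads: int
    head_dim: int
    linear_state_heads: int = 0  # heads in linear-attention layers (assumed fixed-size state)
    bytes_per_elem: int = 2      # bf16 / fp16


def kv_cache_bytes(cfg: ModelConfig, num_tokens: int) -> int:
    """Bytes of saved prompt state after prefilling num_tokens tokens."""
    # Full-attention layers store one K and one V vector per token per KV head.
    per_token = 2 * cfg.full_attention_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    # Assumed: each linear-attention layer keeps a head_dim x head_dim state per head,
    # independent of prompt length.
    linear_layers = cfg.num_layers - cfg.full_attention_layers
    fixed_state = linear_layers * cfg.linear_state_heads * cfg.head_dim * cfg.head_dim * cfg.bytes_per_elem
    return per_token * num_tokens + fixed_state


def transfer_seconds(num_bytes: int, link_gbps: float) -> float:
    """Ideal wire time, ignoring protocol overhead and congestion."""
    return num_bytes * 8 / (link_gbps * 1e9)


dense = ModelConfig(num_layers=60, full_attention_layers=60, num_kv_heads=8, head_dim=128)
hybrid = ModelConfig(num_layers=60, full_attention_layers=15, num_kv_heads=8,
                     head_dim=128, linear_state_heads=32)

tokens = 32_768
for name, cfg in [("dense", dense), ("hybrid", hybrid)]:
    size = kv_cache_bytes(cfg, tokens)
    print(f"{name}: {size / 1e9:.2f} GB, {transfer_seconds(size, 100):.3f} s over 100 Gbps")
# dense: 8.05 GB, 0.644 s over 100 Gbps
# hybrid: 2.06 GB, 0.165 s over 100 Gbps
```

Under these assumed shapes the hybrid model's cache is roughly a quarter of the dense one, which is the kind of reduction that makes ordinary Ethernet links plausible for shipping prefill results.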
The authors test this with an internal 1T-parameter hybrid model on a mixed setup that uses H200 GPUs for remote prefill and H20 GPUs for local decode. With a routing threshold near 19.4K tokens, about 50% of requests go remote, average cross-cluster traffic is only 13 Gbps on a 100 Gbps link, and throughput rises 54% over a local-only baseline and 32% over a naive heterogeneous setup.

The real point is that a smaller KV cache alone was not enough, but paired with selective offloading and scheduling it makes cross-datacenter LLM serving workable, more flexible, and easier to scale across different hardware.

----

Paper link: arxiv.org/abs/2604.15039v1
Paper title: "Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter"
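The selective-offloading rule described in this post can be sketched as a small router. This is a minimal sketch assuming a simple policy: a request goes remote only if its uncached prompt length reaches the 19.4K-token threshold quoted above and the cross-datacenter link still has spare capacity. The `Request` and `LinkBudget` types, the per-request bandwidth estimate, and the 80% utilization cap are hypothetical illustrations, not the paper's actual scheduler.

```python
from dataclasses import dataclass

ROUTING_THRESHOLD_TOKENS = 19_400  # figure quoted in the post; the real policy may differ


@dataclass
class Request:
    prompt_tokens: int
    cached_prefix_tokens: int = 0  # hypothetical: tokens already held in a local prefix cache


@dataclass
class LinkBudget:
    """Hypothetical tracker for bandwidth committed on the cross-datacenter link."""
    capacity_gbps: float
    in_flight_gbps: float = 0.0

    def has_headroom(self, extra_gbps: float, max_utilization: float = 0.8) -> bool:
        return self.in_flight_gbps + extra_gbps <= self.capacity_gbps * max_utilization

    def release(self, gbps: float) -> None:
        # Call when a remote transfer finishes so its bandwidth can be reused.
        self.in_flight_gbps = max(0.0, self.in_flight_gbps - gbps)


def route(req: Request, link: LinkBudget, kv_gbps_estimate: float,
          threshold: int = ROUTING_THRESHOLD_TOKENS) -> str:
    """Return 'remote' or 'local' for a single request."""
    uncached = req.prompt_tokens - req.cached_prefix_tokens
    # Cache-aware, length-based rule: short or mostly cached prompts stay local.
    if uncached < threshold:
        return "local"
    # Bandwidth-aware rule: only offload if shipping the KV cache back won't saturate the link.
    if not link.has_headroom(kv_gbps_estimate):
        return "local"
    link.in_flight_gbps += kv_gbps_estimate
    return "remote"


link = LinkBudget(capacity_gbps=100)
print(route(Request(prompt_tokens=4_000), link, kv_gbps_estimate=5.0))   # local
print(route(Request(prompt_tokens=60_000), link, kv_gbps_estimate=5.0))  # remote
print(route(Request(prompt_tokens=60_000, cached_prefix_tokens=50_000),
            link, kv_gbps_estimate=5.0))                                  # local
```

Checking the uncached length rather than the raw prompt length keeps requests with a warm local prefix cache from being shipped out needlessly, which is one plausible reading of "cache-aware placement".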

Chamalka Muwangala @ChamalkaAI · 11 Apr

@vitaliidodonov Damn it worked

Vitalii Dodonov @vitaliidodonov · 11 Apr

Dear algo, Please show this tweet to founders, builders and creators in Europe.

Chamalka Muwangala @ChamalkaAI · 10 Apr

Features I am working on:
> Drag and drop components from different papers and build a novel model.
> Generate the code for that in real time.
> Ask questions about the reasoning behind your idea.

Chamalka Muwangala @ChamalkaAI · 10 Apr

Currently building a new tool that I would personally find quite useful. It's an interactive machine learning tool. Its main feature would be helping you visualize a model from a research paper, manipulate it, and generate code to test it yourself.

Chamalka Muwangala @ChamalkaAI · 10 Apr

@kiriyo9302 @0xSero And make it local, let's gooo

0xSero @0xSero · 10 Apr

It’s time to test these claims out: github.com/NationalSecuri… GLM-5.1 + Ghidra + Hermes

Chamalka Muwangala reposted

Claude @claudeai · 9 Apr

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

Chamalka Muwangala @ChamalkaAI · 10 Apr

Tried to use ForgeCode's Forge Harness last night, ran into a bunch of errors about the database being full and crashed out hard. Anyone know how to fix this? @forgecodehq

Chamalka Muwangala @ChamalkaAI · 10 Apr

@0xSero Are you coding out of discord tf?!

0xSero @0xSero · 8 Apr

Finally perma switched my Mac Mini to Hermes, it's better. GLM-5.1 going to shill the fuck out of opensource from now on.

Chamalka Muwangala reposted

Wildminder @wildmindai · 9 Apr

RotorQuant: upgraded TurboQuant.
> 10x KV cache compression
> 28% faster decoding
> 5x faster prefill
> 44x fewer parameters
Same quality as full attention. 1/10th the memory.
Ok, another massive VRAM discount for local LLMs.
github.com/scrya-com/roto…

Chamalka Muwangala @ChamalkaAI · 8 Apr

Be Anthropic
> give your AI researchers a crap ton of caffeine and ask them to build a SOTA model
> Use this model to accelerate product development
> Keep it hidden from the public since Feb 2024
> Leak it so everyone knows what's coming
> Drop the model and generate crazy hype

Chamalka Muwangala @ChamalkaAI · 7 Apr

Mythos is about to make a monumental shift in AI on the same day Trump goes on to bomb Iran. What the hell is 2026 about 😭

Chamalka Muwangala reposted

Deedy @deedydas · 7 Apr

Claude Mythos just obliterated every single benchmark in AI. I can't believe what I'm reading.

Chamalka Muwangala reposted

NIK @ns123abc · 7 Apr

🚨 Anthropic just revealed their unreleased frontier model called Claude Mythos Preview
The model is INSANE
It found thousands of zero-day vulnerabilities in EVERY major operating system and browser:
> 27-year-old bug in OpenBSD
> 16-year-old bug in FFmpeg that automated tools hit 5M times without catching
Completely autonomous. No human steering.
They assembled an entire industry coalition called Project Glasswing around it: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan, Cisco, Palo Alto, Linux Foundation
Goal: patch the world’s software BEFORE releasing it
> SWE-bench: 93.9% (Opus 4.6: 80.8%)
> Anthropic is committing $100M in usage credits
> Thousands of vulnerabilities in 40+ organizations are being fixed right now
Yesterday OpenAI published a 13-page essay warning about cyber threats and asking the government to help…
Today Anthropic actually fixed them.