Mohit

542 posts

@0xhashqueu

Building bitty-ai || 0.5x engineer, I use Neovim x Arch BTW. Tweeting till andreessen horowitz follows me https://t.co/vZsQiBKyju

Joined September 2024
344 Following · 31 Followers
Mohit@0xhashqueu·
conv model: so we get the semantics out of the input. transformer model: so that attention is spread and relevance is maintained in a continuous latent space.
[image]
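A minimal sketch of the contrast above (plain NumPy, all names illustrative): self-attention lets every token weight every other token, so relevance is spread across the whole sequence in one step, in a continuous latent space.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: every token attends to every other token,
    so relevance is spread over the whole sequence in one step."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (seq, seq) pairwise relevance
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # softmax over the sequence
    return w @ v, w                            # weighted mix in latent space

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
```

A convolution, by contrast, would mix only a fixed local window per position; here every attention row is a full distribution over all positions.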
Mohit@0xhashqueu·
best analogy for build time vs run time is thinking docker build vs docker run
Mohit@0xhashqueu·
@nengjiali this looks amazing at first glance. can you create a blog post / video for it? would be amazing to have everything in one place
Nengjia Li@nengjiali·
I spent 4 brutal months building a full Stereo Visual SLAM system from scratch in C++17 + CUDA. NO pre-made libraries. NO black-box magic. I just wanted to understand the underlying mechanics behind SLAM. Here’s the intuitive breakdown (the full SLAM pipeline, the math that almost broke me, and the real KITTI footage)👇 cc @aelluswamy your work in this space is a massive inspiration for tackling this from absolute scratch
buun@spiritbuun·
Had a huge breakthrough in quantizing weights today. Holy fug. You have no idea how small we're going and how high quality we're going. Soon.
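The tweet doesn't say which scheme is meant, but a baseline sketch of weight quantization (symmetric per-tensor int8, illustrative names only) shows the basic size/quality trade: 4x smaller than fp32, with round-trip error bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
# round-to-nearest error is at most half a step of size `scale`
err = np.abs(w - dequantize(q, scale)).max()
```

Going smaller than int8 (int4 and below) usually needs per-channel or per-group scales to keep quality, which is where the real engineering starts.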
Mohit@0xhashqueu·
@alexandr_wang hm, gpt 5.4 xhigh still looks good .. not sure on pricing though
Alexandr Wang@alexandr_wang·
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
[image]
Mohit@0xhashqueu·
so apparently unsloth wrote better trt kernels for LLMs than the ones the official nvidia team generalised .. guess we should all shift to using unsloth rather than nvidia, and we need the same for diffusion models too
Mohit@0xhashqueu·
@pupposandro so tensorrt also doesn't have a fused kernel for these deltanet + full-attention layers?
Mohit@0xhashqueu·
hm, qwen3tts model
[image]
Mohit@0xhashqueu·
one thing top models still struggle with is quantization .. seems simple at first but gets complex once the added complexity of trt comes in
Mohit@0xhashqueu·
it's interesting to see how even top-tier models are not that effective in fields where data is sparse ..
Mohit@0xhashqueu·
so claude is still figuring out with me how to do PTQ reliably, how to bypass onnx protobuf limits, and how to quantize all possible layers with minimal conversions back to the original precision ..
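For context: the onnx protobuf limit here is the 2 GB cap on a single protobuf file, usually bypassed by saving tensors as external data. The PTQ part itself can be sketched independently of any toolchain (plain NumPy, illustrative names only): calibrate a clipping range from sample activations, then fake-quantize to estimate the accuracy cost.

```python
import numpy as np

def calibrate_scale(batches, percentile=99.9):
    """PTQ calibration: observe activations on sample data, clip at a high
    percentile, and derive one int8 scale. Percentile clipping trades a
    little saturation error for much finer resolution inside the range."""
    vals = np.abs(np.concatenate([b.ravel() for b in batches]))
    clip = np.percentile(vals, percentile)
    return clip / 127.0

def fake_quant(x, scale):
    """Simulate the int8 round trip, as PTQ tools do to estimate quality loss."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
batches = [rng.normal(size=(32, 64)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(batches)
x = batches[0]
# signal-to-quantization-noise ratio in dB; higher means less damage
sqnr = 10 * np.log10((x**2).mean() / ((x - fake_quant(x, scale))**2 + 1e-12).mean())
```

Real PTQ pipelines repeat this per tensor, which is why "all possible layers" gets hard: some ops only stay accurate if left in higher precision.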
Mohit@0xhashqueu·
@MarioNawfal the real delta now is in making cheap interceptors, one main interceptor guiding ultra-cheap small ones
Mario Nawfal@MarioNawfal·
🇮🇳 India just unveiled an AI-powered kamikaze drone with a 2,000km range and 12-hour endurance. Every major power is now racing to mass-produce cheap one-way attack drones, the weapon that's rewriting the rules of modern war. Expensive interceptor missiles vs. cheap drones. The math isn't pretty. Source: Zero Hedge
[image]
Mario Nawfal@MarioNawfal

🇮🇷🇺🇸 ⁠Iran's Minister of Defense just told the IRGC to monitor “enemy movements with utmost accuracy” to counter their plans, as they prepare to defend against a potentially imminent U.S ground attack. Source: Walter Bloomberg

anirudh bv@anirudhbv_ce·
I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory. 5 custom cuTile CUDA kernels ft: - fused attention (with QJL corrections) - online softmax - on-chip cache decompression - pipelined TMA loads Try it out: devtechjr.github.io/turboquant_cut… s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry
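Not TurboQuant itself, but a baseline sketch (plain NumPy, illustrative names) of the kind of KV-cache compression being described: give each cached token's K/V row its own int8 scale, cutting memory 4x versus fp32 while keeping per-row reconstruction error small.

```python
import numpy as np

def quantize_kv(kv):
    """Per-token int8 KV-cache quantization: one scale per cached row,
    so every token keeps its own dynamic range."""
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0 + 1e-12
    q = np.round(kv / scale).astype(np.int8)
    return q, scale

def dequantize_kv(q, scale):
    """Decompress on the fly before computing attention scores."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64)).astype(np.float32)  # (cached tokens, head dim)
qk, s = quantize_kv(k)
rel_err = np.abs(k - dequantize_kv(qk, s)).max() / np.abs(k).max()
```

Kernels like the ones in the tweet fuse the decompression into the attention computation itself, so the cache never exists in full precision in memory.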
Mohit@0xhashqueu·
Omni 3.5 is closed source .. really @Alibaba_Qwen? maybe that's why the founder left ..
Mohit@0xhashqueu·
shortcut learning in CNNs: give a class the same pixel every time and the model will learn to hack it in its favour .. classic fuckup
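The failure mode above is easy to reproduce (all names illustrative): plant the label in a single pixel of otherwise pure noise, and even a linear probe reaches perfect training accuracy without ever looking at real content.

```python
import numpy as np

# Shortcut learning demo: the "images" are pure noise, but pixel 0 leaks
# the label. The model learns the shortcut, not the task.
rng = np.random.default_rng(0)
n, d = 200, 28 * 28
X = rng.normal(size=(n, d)).astype(np.float32)   # noise "images"
y = rng.integers(0, 2, size=n)
X[:, 0] = y * 10.0                               # leaked label in pixel 0

# least-squares linear probe against +/-1 targets
t = y.astype(np.float32) * 2 - 1
w, *_ = np.linalg.lstsq(X, t, rcond=None)
pred = (X @ w > 0).astype(int)
acc = (pred == y).mean()
```

Remove the leaked pixel at test time and the same model collapses to chance, which is exactly how these shortcuts get caught.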
Mohit@0xhashqueu·
so today I realised the gap between attention and convolution is not that big, and anyone deep enough in the field could have come up with it
Mohit@0xhashqueu·
@ViralOps_ When you read this article and realize it's a shit article
[GIF]