Mohit

542 posts

@0xhashqueu

Building bitty-ai || 0.5x engineer, I use Neovim x Arch BTW. Tweeting till andreessen horowitz follows me https://t.co/vZsQiBKyju

Joined September 2024
344 Following · 31 Followers
Mohit@0xhashqueu·
conv model: so we get the semantics out of the input. transformer model: so that attention is spread and relevance is maintained in a continuous latent space.
[image]
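A minimal sketch of the contrast above (plain NumPy, all names illustrative): self-attention lets every token weight every other token, so relevance is spread across the whole sequence in one step, in a continuous latent space.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: every token attends to every other token,
    so relevance is spread over the whole sequence in one step."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (seq, seq) pairwise relevance
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # softmax over the sequence
    return w @ v, w                            # weighted mix in latent space

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
```

A convolution, by contrast, would mix only a fixed local window per position; here every attention row is a full distribution over all positions.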
Mohit@0xhashqueu·
best analogy for build time vs run time is thinking docker build vs docker run
Mohit@0xhashqueu·
@nengjiali this looks amazing at first glance. can you create a blog post / video for it? would be amazing to have everything in one place
Nengjia Li@nengjiali·
I spent 4 brutal months building a full Stereo Visual SLAM system from scratch in C++17 + CUDA. NO pre-made libraries. NO black-box magic. I just wanted to understand the underlying mechanics behind SLAM. Here’s the intuitive breakdown (the full SLAM pipeline, the math that almost broke me, and the real KITTI footage)👇 cc @aelluswamy your work in this space is a massive inspiration for tackling this from absolute scratch
buun@spiritbuun·
Had a huge breakthrough in quantizing weights today. Holy fug. You have no idea how small we're going and how high quality we're going. Soon.
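The tweet doesn't say which scheme is meant, but a baseline sketch of weight quantization (symmetric per-tensor int8, illustrative names only) shows the basic size/quality trade: 4x smaller than fp32, with round-trip error bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
# round-to-nearest error is at most half a step of size `scale`
err = np.abs(w - dequantize(q, scale)).max()
```

Going smaller than int8 (int4 and below) usually needs per-channel or per-group scales to keep quality, which is where the real engineering starts.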
Mohit@0xhashqueu·
@alexandr_wang hm, gpt 5.4 xhigh still looks good .. not sure on pricing though
Alexandr Wang@alexandr_wang·
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
[image]
Mohit@0xhashqueu·
so apparently unsloth wrote better trt kernels for LLMs than the ones the official nvidia team generalised .. guess we should all shift to using unsloth rather than nvidia, and we need the same for diffusion models too
Mohit@0xhashqueu·
@pupposandro so tensorrt also doesn't have a fused kernel for these deltanet + full-attention layers?
Mohit@0xhashqueu·
hm, qwen3tts model
[image]
Mohit@0xhashqueu·
one thing top models still struggle with is quantization .. seems simple at first but gets complex once the added complexity of trt comes in
Mohit@0xhashqueu·
it's interesting to see how even top-tier models are not that effective in fields where data is sparse ..
Mohit@0xhashqueu·
so claude is still figuring out with me how to do PTQ reliably, how to bypass onnx protobuf limits, and how to quantize all possible layers with minimal conversions back to the original precision ..
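For context: the onnx protobuf limit here is the 2 GB cap on a single protobuf file, usually bypassed by saving tensors as external data. The PTQ part itself can be sketched independently of any toolchain (plain NumPy, illustrative names only): calibrate a clipping range from sample activations, then fake-quantize to estimate the accuracy cost.

```python
import numpy as np

def calibrate_scale(batches, percentile=99.9):
    """PTQ calibration: observe activations on sample data, clip at a high
    percentile, and derive one int8 scale. Percentile clipping trades a
    little saturation error for much finer resolution inside the range."""
    vals = np.abs(np.concatenate([b.ravel() for b in batches]))
    clip = np.percentile(vals, percentile)
    return clip / 127.0

def fake_quant(x, scale):
    """Simulate the int8 round trip, as PTQ tools do to estimate quality loss."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
batches = [rng.normal(size=(32, 64)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(batches)
x = batches[0]
# signal-to-quantization-noise ratio in dB; higher means less damage
sqnr = 10 * np.log10((x**2).mean() / ((x - fake_quant(x, scale))**2 + 1e-12).mean())
```

Real PTQ pipelines repeat this per tensor, which is why "all possible layers" gets hard: some ops only stay accurate if left in higher precision.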
Mohit@0xhashqueu·
@MarioNawfal the real delta now is in making cheap interceptors, one main interceptor guiding ultra-cheap small ones
Mario Nawfal@MarioNawfal·
🇮🇳 India just unveiled an AI-powered kamikaze drone with a 2,000km range and 12-hour endurance. Every major power is now racing to mass-produce cheap one-way attack drones, the weapon that's rewriting the rules of modern war. Expensive interceptor missiles vs. cheap drones. The math isn't pretty. Source: Zero Hedge
[image]
Mario Nawfal@MarioNawfal

🇮🇷🇺🇸 ⁠Iran's Minister of Defense just told the IRGC to monitor “enemy movements with utmost accuracy” to counter their plans, as they prepare to defend against a potentially imminent U.S ground attack. Source: Walter Bloomberg

anirudh bv@anirudhbv_ce·
I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory. 5 custom cuTile CUDA kernels ft: - fused attention (with QJL corrections) - online softmax - on-chip cache decompression - pipelined TMA loads Try it out: devtechjr.github.io/turboquant_cut… s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry
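Not TurboQuant itself, but a baseline sketch (plain NumPy, illustrative names) of the kind of KV-cache compression being described: give each cached token's K/V row its own int8 scale, cutting memory 4x versus fp32 while keeping per-row reconstruction error small.

```python
import numpy as np

def quantize_kv(kv):
    """Per-token int8 KV-cache quantization: one scale per cached row,
    so every token keeps its own dynamic range."""
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0 + 1e-12
    q = np.round(kv / scale).astype(np.int8)
    return q, scale

def dequantize_kv(q, scale):
    """Decompress on the fly before computing attention scores."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64)).astype(np.float32)  # (cached tokens, head dim)
qk, s = quantize_kv(k)
rel_err = np.abs(k - dequantize_kv(qk, s)).max() / np.abs(k).max()
```

Kernels like the ones in the tweet fuse the decompression into the attention computation itself, so the cache never exists in full precision in memory.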
Mohit@0xhashqueu·
Omni 3.5 is closed source .. really @Alibaba_Qwen? maybe that's why the founder left ..
Mohit@0xhashqueu·
shortcut learning in CNNs: give a class the same pixel every time and the model will learn to hack it in its favour .. classic fuckup
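The failure mode above is easy to reproduce (all names illustrative): plant the label in a single pixel of otherwise pure noise, and even a linear probe reaches perfect training accuracy without ever looking at real content.

```python
import numpy as np

# Shortcut learning demo: the "images" are pure noise, but pixel 0 leaks
# the label. The model learns the shortcut, not the task.
rng = np.random.default_rng(0)
n, d = 200, 28 * 28
X = rng.normal(size=(n, d)).astype(np.float32)   # noise "images"
y = rng.integers(0, 2, size=n)
X[:, 0] = y * 10.0                               # leaked label in pixel 0

# least-squares linear probe against +/-1 targets
t = y.astype(np.float32) * 2 - 1
w, *_ = np.linalg.lstsq(X, t, rcond=None)
pred = (X @ w > 0).astype(int)
acc = (pred == y).mean()
```

Remove the leaked pixel at test time and the same model collapses to chance, which is exactly how these shortcuts get caught.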
Mohit@0xhashqueu·
so today I realised the gap between attention and convolution is not that big, and anyone deep enough in the field could have come up with it
Mohit@0xhashqueu·
@ViralOps_ When you read this article and realize it's a shit article
[GIF]