Shantanu Wani

44 posts

@ShaantanuWani

Building reliable AI over flashy demos. AI researcher. Shipped ML systems end-to-end. Open to impactful ML roles.

Chennai / Mumbai · Joined April 2023
64 Following · 2 Followers

Shantanu Wani (@ShaantanuWani)
If a distance d is Euclidean (it embeds isometrically in a Hilbert space), exp(−λ d²) is a valid positive-definite kernel for every λ > 0. Since dK isn’t Euclidean globally, kernel tricks don’t automatically apply to all strings. But for finite subsets? They do. Kolmogorov complexity isn’t just information theory, it has geometry.
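A minimal sketch of the kernel claim, assuming plain 2-D points with ordinary Euclidean distance (not dK, which is uncomputable). The point set, `lam`, and both helper names are illustrative assumptions. It builds the Gram matrix exp(−λ d²) and tests positive definiteness by attempting a Cholesky factorization.

```python
import math

def gaussian_gram(points, lam):
    """Gram matrix K[i][j] = exp(-lam * d(i, j)^2) for Euclidean d."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [[math.exp(-lam * sq_dist(p, q)) for q in points] for p in points]

def is_positive_definite(matrix, tol=1e-12):
    """Try a Cholesky factorization; every pivot must exceed tol."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

# Hypothetical point set, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
gram = gaussian_gram(points, lam=0.5)
print(is_positive_definite(gram))  # True
```

The same check can be run on any finite distance matrix. A failed factorization means exp(−λ d²) is not positive definite for that λ, so the distances are not Euclidean.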

Shantanu Wani (@ShaantanuWani)
1. dK is not Euclidean globally. You cannot embed all of (B*, dK) into ℓ2.
2. But any finite Euclidean point set can be scale-embedded into dK.
3. Many infinite-dimensional spaces cannot embed into it.
So dK is constrained, but expressive.

Shantanu Wani (@ShaantanuWani)
What if “distance” between objects wasn’t geometric, but based on how hard it is to transform one into the other? Marcus Hutter studies the geometry of Algorithmic Information Distance, and the result is surprising. Is it Euclidean?

Shantanu Wani (@ShaantanuWani)
This paper proves something important: Bigger embedding models aren’t inevitable. With:
• Better initialization
• Direct embedding-space distillation
• Quantization-aware training
You can win on efficiency and quality. That’s production-focused ML.

Shantanu Wani (@ShaantanuWani)
Results (MTEB Multilingual v2):
• #1 under 500M parameters
• Competitive with models ~2× larger
• Strong even at 128-dim embeddings
• Maintains performance at 4-bit quantization
That’s rare for embedding models.
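To make the last two bullets concrete, here is a rough sketch of the two operations they refer to. It assumes the 128-dim vector is obtained by keeping the leading coordinates and re-normalizing, and it uses a generic symmetric 4-bit rounding scheme. Neither is confirmed as the paper's exact method, and the input vector is synthetic.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def quantize_4bit(values):
    """Symmetric 4-bit quantization: map floats to integer codes in [-7, 7]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

# Synthetic 768-dim vector standing in for a real embedding (assumption).
vec = [math.sin(i + 1) for i in range(768)]
small = truncate_and_normalize(vec, 128)
codes, scale = quantize_4bit(small)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(small, restored))
print(len(small), max_error <= scale / 2 + 1e-12)  # 128 True
```

Rounding bounds the per-value error by half the scale. Quantization-aware training aims to make a model tolerate exactly this kind of rounding in its weights.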

Shantanu Wani (@ShaantanuWani)
Everyone keeps scaling embedding models to billions of parameters. Google DeepMind did the opposite. They built a 308M-parameter embedding model that:
• Tops MTEB Multilingual among models under 500M parameters
• Competes with models 2× its size
• Survives 4-bit quantization
• Runs efficiently on-device

Shantanu Wani (@ShaantanuWani)
Full dataset accuracy:
GraphSAGE → 98.59%
With 24 missing nodes:
GraphSAGE drop → 8.8%
CNN drop → 29%
Bayesian drop → 39%
Key takeaway: Inductive GNNs + local graph reconstruction > global deep models for infrastructure systems.
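One way to see why an inductive model degrades gracefully when nodes go missing: a GraphSAGE-style mean aggregator only averages over the neighbors that are present. The sketch below is a toy single layer with hypothetical identity weights and made-up features. It is not the paper's architecture or data.

```python
def sage_mean_layer(features, adjacency, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(w_self @ h_v + w_neigh @ mean(h_u))."""
    def matvec(matrix, vec):
        return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

    output = {}
    for node, h_v in features.items():
        # Neighbors without a feature vector (missing measurements) are skipped.
        neigh = [features[u] for u in adjacency.get(node, ()) if u in features]
        if neigh:
            mean = [sum(col) / len(neigh) for col in zip(*neigh)]
        else:
            mean = [0.0] * len(h_v)
        combined = [a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, mean))]
        output[node] = [max(0.0, x) for x in combined]
    return output

# Toy graph: node 4 is a neighbor of node 3 but has no measurements.
features = {1: [1.0, 0.0], 2: [0.0, 2.0], 3: [2.0, 2.0]}
adjacency = {1: [2, 3], 2: [1], 3: [1, 4]}
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = sage_mean_layer(features, adjacency, identity, identity)
print(hidden[3])  # [3.0, 2.0]
```

Node 3 still gets a sensible representation from its remaining neighbor, because the same learned weights apply to whatever neighborhood is available.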

Shantanu Wani (@ShaantanuWani)
Using the entire graph for precise localization doesn’t work well. So they rebuild a small subgraph using k-order adjacent nodes near the fault. This:
· Reduces noise
· Forces local feature learning
· Improves regression accuracy
It’s structured inductive bias.
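The subgraph step described above can be sketched as a breadth-first search that stops after k hops. This assumes "k-order adjacent" means within k hops of a chosen center node. The 6-bus feeder and the function name are hypothetical.

```python
from collections import deque

def k_hop_subgraph(adjacency, center, k):
    """Return the nodes within k hops of `center` and the edges among them."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    edges = {
        (min(u, v), max(u, v))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges

# Hypothetical 6-bus feeder as an undirected adjacency list.
grid = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
nodes, edges = k_hop_subgraph(grid, center=3, k=1)
print(sorted(nodes))  # [2, 3, 4, 6]
print(sorted(edges))  # [(2, 3), (3, 4), (3, 6)]
```

A model trained on such subgraphs only ever sees the local neighborhood, which is where the inductive bias comes from.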

Shantanu Wani (@ShaantanuWani)
Most AI papers on power system faults stop at interval localization. This one goes further: It diagnoses the fault and pinpoints the exact fault location using GraphSAGE. Here’s the core idea.

Shantanu Wani (@ShaantanuWani)
Most RAG systems don’t need re-rankers. Before adding cross-encoders:
• Fix chunking
• Use hybrid search (BM25 + semantic)
• Improve embeddings
• Let your LLM filter
Add re-rankers only when precision actually matters (legal, medical, citations).
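For the hybrid search bullet, one common way to merge a BM25 ranking with a semantic ranking is reciprocal rank fusion (RRF). The post does not name a fusion method, so RRF is an assumption here, as are the document ids and the conventional constant k=60.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; earlier ranks contribute more."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top-3 results from each retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF needs only ranks, not comparable scores, so it adds no model calls on top of the two retrievers.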

Shantanu Wani (@ShaantanuWani)
Stage 2: Cross-encoder
• Query + document fed together
• Full transformer pass per document
• Better relevance scoring
Precision jumps (~60% → ~85%). But here’s the tradeoff: you now run a full model pass for every candidate, 100 times per query when 100 documents are retrieved. That’s where your infra bill explodes.
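The cost structure is easy to see in a sketch: the reranker scores every (query, document) pair, so the number of model calls equals the number of candidates. Here `token_overlap` is a hypothetical stand-in for a real cross-encoder forward pass, and the 100 candidates are synthetic.

```python
def rerank(query, candidates, score_pair, top_n):
    """Score every (query, document) pair, then keep the best `top_n`."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def token_overlap(query, doc):
    """Hypothetical stand-in for one cross-encoder forward pass."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

calls = {"n": 0}

def counted_score(query, doc):
    """Wrap the scorer to count how many model passes a query costs."""
    calls["n"] += 1
    return token_overlap(query, doc)

# 100 synthetic stage-1 candidates (assumption: stage 1 returns 100 documents).
candidates = [f"document {i} about transformers" for i in range(99)]
candidates.append("fault localization in power grids")
top = rerank("power grids fault", candidates, counted_score, top_n=3)
print(calls["n"])  # 100
print(top[0])      # fault localization in power grids
```

Halving the candidate count halves the calls, which is why improving stage 1 is usually the cheaper lever.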

Shantanu Wani (@ShaantanuWani)
I read about a startup that was spending $15K/month on RAG rerankers. They didn’t need them. Here’s what most ML engineers misunderstand about cross-encoders, and why it’s costing real money.