Shantanu Wani

@ShaantanuWani

Building reliable AI over flashy demos | AI researcher | Shipped ML systems end-to-end | Open to impactful ML roles

Chennai / Mumbai · Joined April 2023

64 Following · 2 Followers

Shantanu Wani @ShaantanuWani · Feb 28

If a distance is Euclidean, exp(−λ d²) becomes a valid positive-definite kernel. Since dK isn’t Euclidean globally, kernel tricks don’t automatically apply to all strings. But for finite subsets? They do. Kolmogorov complexity isn’t just information theory, it has geometry.
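A minimal sketch of the kernel claim, in plain Python: build the Gram matrix exp(−λ d²) for a handful of Euclidean points and check that it is positive definite by attempting a Cholesky factorization. The sample points, the value of λ, and the helper names `gaussian_gram` and `is_positive_definite` are illustrative assumptions. Euclidean distance stands in for dK here, since Kolmogorov complexity itself is not computable.

```python
import math

def gaussian_gram(points, lam):
    """Gram matrix K[i][j] = exp(-lam * ||x_i - x_j||^2) for Euclidean points."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-lam * sq_dist)
    return gram

def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; every pivot is positive only for PD matrices."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

# Hypothetical sample: five distinct points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (-1.0, 1.5)]
gram = gaussian_gram(points, lam=0.5)
print(is_positive_definite(gram))
```

The same check applied to a symmetric matrix that is not positive definite, such as `[[1, 2], [2, 1]]`, fails at the second pivot.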

Shantanu Wani @ShaantanuWani · Feb 28

1. dK is not Euclidean globally. You cannot embed all of (B*, dK) into ℓ2.
2. But any finite Euclidean point set can be scale-embedded into dK.
3. Many infinite-dimensional spaces cannot embed into it.
So dK is constrained, but expressive.

Shantanu Wani @ShaantanuWani · Feb 28

What if “distance” between objects wasn’t geometric... but based on how hard it is to transform one into the other?
Marcus Hutter studies the geometry of Algorithmic Information Distance, and the result is surprising.
Is it Euclidean?
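To make "how hard it is to transform one into the other" concrete, here is a rough sketch using the normalized compression distance (NCD), a well-known computable proxy in which a real compressor stands in for Kolmogorov complexity. This is an assumption for illustration only: it is not dK itself, and the choice of `zlib` and the function names are mine, not the paper's.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a crude stand-in for K(data)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical inputs: two near-identical texts and one unrelated text.
base = b"the quick brown fox jumps over the lazy dog " * 20
near = base.replace(b"lazy", b"idle", 1)
far = b"lorem ipsum dolor sit amet consectetur " * 20

print(ncd(base, near), ncd(base, far))
```

The near-identical pair should score well below the unrelated pair, because the compressor can describe the second string cheaply once it has seen the first.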

Shantanu Wani @ShaantanuWani · Feb 27

This paper proves something important: Bigger embedding models aren’t inevitable.
With:
• Better initialization
• Direct embedding-space distillation
• Quantization-aware training
You can win on efficiency and quality. That’s production-focused ML.

Shantanu Wani @ShaantanuWani · Feb 27

Results (MTEB Multilingual v2):
• #1 under 500M parameters
• Competitive with models ~2× larger
• Strong even at 128-dim embeddings
• Maintains performance at 4-bit quantization
That’s rare for embedding models.
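For intuition on the last two bullets, a generic sketch of what "128-dim" and "4-bit" can mean in practice: keep the first 128 coordinates of an embedding and renormalize, and round a list of values to 15 signed levels with a single scale. Both are assumptions for illustration. The post does not say how the model produces its smaller embeddings or which quantization scheme it uses, and the function names are hypothetical.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    if norm == 0.0:
        return head
    return [v / norm for v in head]

def quantize_int4(values):
    """Symmetric quantization to integer codes in [-7, 7] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

# Hypothetical 768-dim vector standing in for a real embedding.
embedding = [math.sin(i) for i in range(768)]
small = truncate_and_normalize(embedding, 128)
codes, scale = quantize_int4(small)
restored = dequantize(codes, scale)
worst_error = max(abs(a - b) for a, b in zip(small, restored))
print(len(small), worst_error <= scale / 2 + 1e-12)
```

The per-value rounding error is bounded by half the scale, which is why a well-chosen scale matters so much at 4 bits.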

Shantanu Wani @ShaantanuWani · Feb 27

Everyone keeps scaling embedding models to billions of parameters. Google DeepMind did the opposite.
They built a 308M parameter embedding model that:
• Tops MTEB multilingual
• Competes with models 2× its size
• Survives 4-bit quantization
• Runs efficiently on-device

Shantanu Wani @ShaantanuWani · Feb 26

Full dataset accuracy: GraphSAGE → 98.59%
With 24 missing nodes:
GraphSAGE drop → 8.8%
CNN drop → 29%
Bayesian drop → 39%
Key takeaway: Inductive GNNs + local graph reconstruction > global deep models for infrastructure systems.

Shantanu Wani @ShaantanuWani · Feb 26

Using the entire graph for precise localization doesn’t work well. So they rebuild a small subgraph using k-order adjacent nodes near the fault.
This:
• Reduces noise
• Forces local feature learning
• Improves regression accuracy
It’s structured inductive bias.
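The "k-order adjacent nodes" step can be sketched as a breadth-first search that stops after k hops. This is a generic illustration on an adjacency-list graph, not the paper's implementation. The function name, the toy feeder graph, and the choice of center node are all hypothetical.

```python
from collections import deque

def k_hop_subgraph(adjacency, center, k):
    """Return (nodes, edges) of the subgraph within k hops of `center`.

    `adjacency` maps each node to an iterable of neighbors (undirected graph).
    Node labels must be sortable so each undirected edge is stored once.
    """
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges

# Hypothetical 5-bus line: 0 - 1 - 2 - 3 - 4, suspected fault near bus 2.
feeder = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes, edges = k_hop_subgraph(feeder, center=2, k=1)
print(sorted(nodes), sorted(edges))
```

A downstream model would then see only the features of the returned nodes, which is where the noise reduction comes from.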

Shantanu Wani @ShaantanuWani · Feb 26

Most AI papers on power system faults stop at interval localization. This one goes further: It diagnoses the fault and pinpoints the exact fault location using GraphSAGE. Here’s the core idea.

Shantanu Wani @ShaantanuWani · Feb 24

Most RAG systems don’t need re-rankers.
Before adding cross-encoders:
• Fix chunking
• Use hybrid search (BM25 + semantic)
• Improve embeddings
• Let your LLM filter
Add re-rankers only when precision actually matters (legal, medical, citations).
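One common way to combine BM25 and semantic results is reciprocal rank fusion (RRF), sketched below. The post does not say which fusion method to use, so treat RRF, the constant `k=60`, and the document IDs as assumptions. The two input lists stand in for rankings produced by whatever BM25 and vector search you already run.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical top-3 results from each retriever.
bm25_hits = ["a", "b", "c"]
semantic_hits = ["b", "c", "d"]
print(reciprocal_rank_fusion([bm25_hits, semantic_hits]))
```

RRF uses only ranks, so it avoids having to calibrate BM25 scores against cosine similarities.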

Shantanu Wani @ShaantanuWani · Feb 24

Stage 2: Cross-encoder
• Query + document fed together
• Full transformer pass per document
• Better relevance scoring
Precision jumps (~60% → ~85%).
But here’s the tradeoff: You now run a full model 100 times per query. That’s where your infra bill explodes.
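A back-of-the-envelope sketch of why the bill grows: the number of cross-encoder forward passes scales with queries × candidates. Every number below (traffic, latency per pass, GPU price) is a made-up placeholder, and the function name is hypothetical. Substitute your own measurements.

```python
def monthly_rerank_cost(queries_per_day, candidates_per_query,
                        seconds_per_pass, gpu_dollars_per_hour, days=30):
    """Estimate cross-encoder forward passes and GPU spend for one month."""
    passes = queries_per_day * candidates_per_query * days
    gpu_hours = passes * seconds_per_pass / 3600
    return passes, gpu_hours * gpu_dollars_per_hour

# Placeholder inputs: 10k queries/day, 100 candidates each,
# 10 ms per pass, $2 per GPU-hour.
passes, dollars = monthly_rerank_cost(10_000, 100, 0.01, 2.0)
print(f"{passes:,} forward passes, about ${dollars:,.0f} per month")
```

Halving the candidate count halves the spend, which is why tightening first-stage retrieval is usually the cheaper fix.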

Shantanu Wani @ShaantanuWani · Feb 24

I read about a startup that was spending $15K/month on RAG re-rankers. They didn’t need them. Here’s what most ML engineers misunderstand about cross-encoders and why it’s costing real money.
