Pushpendre Rastogi

336 posts

@Pushpendre89

CTO, Co-founder at https://t.co/5v7vwhRvWY | Ex Deepmind, Amazon, JHU PhD, IITD ECE

Palo Alto · Joined May 2012
711 Following · 612 Followers
Pinned Tweet
Pushpendre Rastogi@Pushpendre89·
I am hiring a founding security engineer for harden.run. harden.run/careers/foundi… DM me for details.
- We are focused on AI DevSecOps.
- We have hired a senior engineer with 20 years of experience in this area.
- Our paper on prompt optimization was accepted at ICLR and is out on arXiv.
- We are on track to cross the million-dollar ARR threshold in a month.
- We have grown from 2 to 6 already and are on track to add two FDEs soon.
reuben@reubbr·
These concepts are not exotic; the ideas here extend the "backpressure" concept many have talked about -- @GeoffreyHuntley, @dexhorthy, @0xblacklight and many more. What's new: proof-as-spec mechanically lowered into the target language as deterministic gates
reuben@reubbr·
Most "better AI coding" takes chase smarter models and better prompts. I think that's the wrong frame. Here's a build *refusing* to skip a tenant-auth check, deterministically, before the binary even exists. The model didn't remember the rule; the substrate enforced it. 👇
Pushpendre Rastogi@Pushpendre89·
"Reflections on Trusting Trust" IYKYK `print((s:='print((s:=%r)%%s)')%s)`
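The one-liner in this post is a Python quine, a program that prints its own source. That is the link to Thompson's lecture, which opens with self-reproducing programs. Below is a minimal check, assuming Python 3.8+ for the `:=` operator. The helper name `run_and_capture` is made up for illustration.

```python
import contextlib
import io

# The one-liner from the post, kept as a string so it can be compared with its own output.
QUINE = "print((s:='print((s:=%r)%%s)')%s)"


def run_and_capture(source: str) -> str:
    """Execute `source` in a fresh namespace and return what it writes to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # requires Python 3.8+ because of ':='
    return buffer.getvalue()


output = run_and_capture(QUINE)
# print() appends a newline, so strip it before comparing with the source text.
is_quine = output.rstrip("\n") == QUINE
```

The trick is that `%r` re-inserts the string's own `repr`, and `%%` collapses to the single `%` that applies the format.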
Pushpendre Rastogi@Pushpendre89·
@yoavgo He has really good posts on infra for book scanning and how weird legalities force them to destroy books. So the course can get into legality/ethics and into techniques for high-fidelity OCR and book scanning.
(((ل()(ل() 'yoav))))👾
The big dilemma with teaching an "LLM course" is that it is really easy to get drawn into teaching the various technical things like efficiency tricks, attention variants, PPO vs GRPO, etc etc. But the real "meat" is not there, but in the data: data for pre-training, for mid-training, for SFT, for RL and for "reasoning", synthetic data, curated data, annotated data... cleaning, evaluating, improving, mixing, ... lots of stuff. but "data" is so much harder to teach: it is not "mathematic" or "algorithmic" like the technical things, and it is not clear what is the teachable thing there. it is also a lot less transparent than the technical topics, both because it is semi-secret, and also because it is also not appealing for publishing, for roughly the same reasons it is not appealing for teaching. so, what would you teach about data? what are the key lessons and insights one should know? any good papers or resources? good existing classes? blogs? hit me with what you have
@jason@Jason·
We started an AI founder twitter group... reply with "I'm in" if you're a founder and want to be added
Pushpendre Rastogi@Pushpendre89·
GPT-5.4 is much better than Opus for high-effort thinking.
Leeham@Liam06972452

GPT-5.4 Pro solves Erdős Problem #1196! Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman's comments below). Formalisation is underway!

Pushpendre Rastogi@Pushpendre89·
Closing the feedback loop between failures and improvements is exactly the right framing. We found the same with prompt optimization: raw eval scores tell you little, but pairing failure cases with success cases lets the model extract *why* something failed — not just that it did. That’s the delta between ContraPrompt and GEPA (+29% HotPotQA). vizpy.vizops.ai
Gauri Gupta@gauri__gupta·
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).
Pushpendre Rastogi@Pushpendre89·
DSPy gives the modularity; GEPA runs the search. The remaining gap is the feedback signal: GEPA still uses scalar scores to guide evolutionary search. Contrastive pairs (failure vs success side by side) let the model extract *why* a prompt underperformed — not just that it did. That closes what GEPA can’t reach. +29% HotPotQA vs GEPA: vizpy.vizops.ai
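To make the scalar-versus-contrastive distinction concrete, here is a minimal sketch of turning eval records into failure/success pairs and a side-by-side feedback block. It is an illustration under stated assumptions, not ContraPrompt's or GEPA's actual code: `EvalRecord`, `contrastive_pairs`, `format_feedback`, and the 0.5 pass threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalRecord:
    example_id: str  # which eval example was run
    prompt_id: str   # which prompt variant produced the output
    output: str
    score: float     # scalar metric, assumed to lie in [0, 1]


def contrastive_pairs(
    records: List[EvalRecord], threshold: float = 0.5
) -> List[Tuple[EvalRecord, EvalRecord]]:
    """Pair every failing record with the best passing record on the same example."""
    by_example: Dict[str, List[EvalRecord]] = {}
    for rec in records:
        by_example.setdefault(rec.example_id, []).append(rec)

    pairs: List[Tuple[EvalRecord, EvalRecord]] = []
    for recs in by_example.values():
        passed = [r for r in recs if r.score >= threshold]
        failed = [r for r in recs if r.score < threshold]
        if not passed:
            continue  # no success to contrast against on this example
        best = max(passed, key=lambda r: r.score)
        pairs.extend((f, best) for f in failed)
    return pairs


def format_feedback(pairs: List[Tuple[EvalRecord, EvalRecord]]) -> str:
    """Render pairs side by side as text an LLM could be asked to explain."""
    blocks = []
    for failure, success in pairs:
        blocks.append(
            f"Example {failure.example_id}\n"
            f"  FAILED  ({failure.prompt_id}, score={failure.score}): {failure.output}\n"
            f"  PASSED  ({success.prompt_id}, score={success.score}): {success.output}"
        )
    return "\n\n".join(blocks)


records = [
    EvalRecord("q1", "prompt_a", "Paris, Texas", 0.0),
    EvalRecord("q1", "prompt_b", "Paris, France", 1.0),
    EvalRecord("q2", "prompt_a", "unknown", 0.0),
    EvalRecord("q2", "prompt_b", "unsure", 0.0),
]
pairs = contrastive_pairs(records)
feedback = format_feedback(pairs)
```

A score-only signal would report mean accuracy per prompt here. The paired view additionally shows which output differed on `q1`, and it drops `q2`, where there is no success to learn from.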
Kevin Madura@kmad·
$5.5m to $73k per year (!!) by: 1) decomposing business logic 2) modeling intent using DSPy 3) optimizing a smaller model to improve cost profile *while maintaining performance* Why wouldn’t this be the default pattern for folks embedding AI into their pipelines?
Drew Breunig@dbreunig

At our last DSPy meetup, @kshetrajna shared this amazing case study about how he's using DSPy at @Shopify scale. I think this was my favorite slide.

Pushpendre Rastogi@Pushpendre89·
GEPA asks "why did this fail?" — but the signal is still a score. The jump is using contrastive pairs: show the model the failure *and* the success case side by side. It stops guessing at failure modes and starts extracting structural rules. Same idea, different feedback format — that gap accounts for +29% on HotPotQA vs GEPA: vizpy.vizops.ai
Mitko Vasilev@iotcoi·
Hermes built its own DSPy GEPA module Instead of brute-forcing 500 variants, it asks “why did this fail?” Builds a tree. Uses real signals. Converges fast. /gepa-collect >> optimize Runs fully local. No therapy notes leaked to APIs. Recursive self-improvement is now a YAML cfg
Pushpendre Rastogi@Pushpendre89·
GEPA asks 'why did this fail?' — but the signal is still a score. The jump is using contrastive pairs: show the model the failure case *and* the success case side by side. It stops guessing at failure modes and starts extracting structural rules. This is why ContraPrompt gets +29% HotPotQA vs GEPA. Same data, different feedback format. vizops.ai/blog.html
Pushpendre Rastogi@Pushpendre89·
The optimizer discovers layout strategy from scratch — no hints. ContraPrompt extracts: 'separate PMOS/NMOS into distinct columns, align drain-paired devices.' These rules encode *strategy*, not coordinates. The LLM fills in circuit-specific values at generation time. TTT (RL fine-tuning, 120B model) memorized training circuits but scored 0.502 on test. Prompt optimization scored 0.634.
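As a quick arithmetic check on the two test scores quoted above, the relative gain can be computed directly. This assumes both numbers are on the same metric. The function name `relative_gain` is illustrative.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


ttt_score = 0.502         # RL fine-tuning (TTT), test score from the post
prompt_opt_score = 0.634  # prompt optimization, test score from the post

gain = relative_gain(ttt_score, prompt_opt_score)
print(f"{gain:.1%}")  # prints 26.3%
```

That is a relative improvement of about 26%, which appears consistent with the "26%" figure quoted in the related posts. The absolute gap is 0.132.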
Pushpendre Rastogi@Pushpendre89·
Analog IC layout is one of the hardest AI benchmarks: spatial reasoning, multi-objective tradeoffs (matching, parasitics, routing), no automated P&R tools. We ran VizPy's ContraPrompt on it. The optimizer mines failure→success pairs across iterations, extracting layout strategy rules the LLM learns to apply. Result: 97% of expert placement quality. Outperforms RL fine-tuning of a 120B model by 26%. No domain-specific training data. vizops.ai/blog/prompt-op…
Pushpendre Rastogi@Pushpendre89·
Thread: the Easom_d5 case is the clearest example. Nearly flat across [−100,20]^5 — Optuna's TPE never finds the basin, stops at 5.03. The LLM extracts 'upper boundary preference' from contrastive eval pairs, enumerates all 32 corners of the 5D hypercube, finds exact minimum.
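The corner-enumeration step is cheap to sketch: a 5D box has 2^5 = 32 corners, so all of them can be evaluated exhaustively. The objective below is a hypothetical stand-in, not the benchmark's Easom_d5 definition. It is chosen only so that its minimum sits at the upper boundary.

```python
import itertools
from typing import Callable, Sequence, Tuple

LOWER, UPPER, DIM = -100.0, 20.0, 5  # the box [-100, 20]^5 from the post


def placeholder_objective(x: Sequence[float]) -> float:
    """Hypothetical stand-in (NOT Easom_d5): smallest at the all-upper corner."""
    return sum((UPPER - xi) ** 2 for xi in x)


def best_corner(
    objective: Callable[[Sequence[float]], float],
    lower: float,
    upper: float,
    dim: int,
) -> Tuple[Tuple[float, ...], float, int]:
    """Evaluate every corner of the box and return (corner, value, corner count)."""
    corners = list(itertools.product((lower, upper), repeat=dim))
    best = min(corners, key=objective)
    return best, objective(best), len(corners)


corner, value, n_corners = best_corner(placeholder_objective, LOWER, UPPER, DIM)
```

With 32 evaluations this uses a small slice of a 2k budget, which is why a boundary hypothesis is inexpensive to test once the model has proposed it.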
Pushpendre Rastogi@Pushpendre89·
We gave an LLM a 9-line random-search stub and a blackbox objective. 5 rounds of contrastive feedback later, it writes a solver that beats Optuna on 96% of benchmarks (53/55 EvalSet problems) at the same 2k eval budget. The key: contrastive pairs surface landscape structure raw scores don't. The LLM learns geometry from paired failures vs successes — then rewrites its strategy each round. Final code: 9 lines → 100-230 lines of specialized multi-phase optimization. No hand-tuning. vizops.ai/blog/contrapro…
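For a sense of the starting point, a random-search baseline over a blackbox objective fits in a few lines. This is a hypothetical sketch of such a stub, not the one used in the experiment. The `sphere` objective, the bounds, and the seed are assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    budget: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Sample uniformly inside the bounds and keep the best point seen."""
    rng = random.Random(seed)
    best_x: List[float] = []
    best_val = float("inf")
    for _ in range(budget):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


calls = [0]  # mutable counter so the wrapper can track the eval budget


def sphere(x: Sequence[float]) -> float:
    """Assumed toy objective: sum of squares, minimized at the origin."""
    calls[0] += 1
    return sum(xi * xi for xi in x)


bounds = [(-5.0, 5.0)] * 3
best_x, best_val = random_search(sphere, bounds, budget=2000, seed=0)
```

A stub like this uses nothing about the landscape. Each sample is independent of the previous results, and that is the gap a rewritten multi-phase strategy would aim to close.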
Mario Zechner@badlogicgames·
i had a cto once, gaming industry, ca. 2010ish. i was just a humble tech lead. he'd summon us into the meeting room to "tackle large asset sizes in mobile app bundles once and for all". he literally proposed base64. he called it asszip (i have witnesses). this is what it feels like seeing all the posts from former engineers turned VCs now getting clanker induced ai psychosis.
Mario Zechner@badlogicgames·
@deepfates huh, that's new to me, sorry. can you explain how that came to be?
Pushpendre Rastogi@Pushpendre89·
Analog circuit placement is one of the hardest structured prediction tasks — spatial, parasitic-aware, design-rule-heavy. Expert engineers spend hours per block. We ran VizPy's ContraPrompt on it. No training data, no fine-tuning. 97% of expert quality. Outperformed RL fine-tuning of a 120B model by 26%. Full breakdown: vizops.ai/blog/prompt-op…