Pandey

604 posts

@SumitPandeyvine

AI Engineer

India · Joined November 2018
865 Following · 78 Followers
Ashutosh Maheshwari @asmah2107
If you can reply to this post, you are a software engineer who still has a job… for now. 👀
8 replies · 2 reposts · 44 likes · 12.4K views
Pandey reposted
Nishkarsh @contextkingceo
Is AI being designed to fail?

Everyone talks about reasoning. But when given a task, the AI isn't reasoning the way you might expect. It looks at your input, finds the closest match it's seen before, and predicts the most likely next action. That process is called vector similarity search. It's genuinely powerful. It's also not the same thing as understanding what you actually meant.

Think of a plumber who hears the word "leak" and starts pulling up floorboards before you've finished the sentence. He's not being careless. He's pattern-matching - that's exactly how he was trained. Your AI agent is doing the same thing.

Context is the one thing that gets deprioritized when teams are racing to ship. But without it, you don't have an intelligent agent. You have a very fast guesser.

Similarity ≠ relevance. How? Find out with the link in the comments ⬇️
88 replies · 224 reposts · 611 likes · 819.6K views
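To make "similarity ≠ relevance" concrete, here is a minimal Python sketch of vector similarity search as it is commonly used for retrieval. The two-dimensional vectors and the stored phrases are hand-made toy values, not real embeddings, and `cosine` and `top_k` are hypothetical helper names. The ranking depends only on geometric closeness; nothing in it checks whether the top hit is what the user actually meant.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=1):
    """Return the k stored keys whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)
    return ranked[:k]


# Toy 2-d "embeddings", invented for illustration only.
index = {
    "pull up the floorboards": [0.9, 0.1],
    "ask where the water is coming from": [0.3, 0.8],
}
query = [0.8, 0.3]  # stands in for a request that mentions "leak"

print(top_k(query, index))
```

With these toy values the closest vector wins regardless of whether that action suits the situation, which is the gap the post is pointing at.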
Animesh Koratana @akoratana
Introducing: PlayerZero

The world's first Engineering World Model that puts debugging, fixing, and testing your code on autopilot.

We've raised $20M from Foundation Capital, @matei_zaharia (Databricks), @pbailis (Workday), @rauchg (Vercel), @zoink (Figma), @drewhouston (Dropbox), and more.

PlayerZero frees up 30% of your engineering bandwidth by:
1. Finding the root cause for bugs & incidents in minutes that engineering teams take days to identify.
2. Predicting, in minutes, edge case issues that a 300-person QA team would take weeks to find.

------

Here's why this matters:

No one in your org has a complete picture of how your production software actually behaves. Support sees tickets. SRE sees infra. Dev sees code. Each team builds their own fragmented view - and none of these systems talk to each other. When something breaks, everyone scrambles to stitch the picture together by hand.

PlayerZero connects all of it into a single context graph:
→ The Slack thread where your lead said "we went with X because Y fell apart in prod last time"
→ The PR review where an engineer explained the tradeoff
→ The lifetime history of your CI/CD pipeline, observability stack, incidents, and support tickets

So you can trace any problem to its root cause across every silo.

And it compounds. Every incident diagnosed teaches the model something new. The longer it runs, the deeper it understands - which code paths are high-risk, which configurations are fragile, which changes tend to break which customer flows.

So when you sit down to debug a live issue, you have your entire org's collective reasoning and production memory behind you - instantly.

------

Zuora, Georgia-Pacific, and Nylas have reduced resolution time by 90%, caught 95% of breaking changes, and freed an average of $30M in engineering bandwidth.

------

Our guarantee: If we can't increase your engineering bandwidth by at least 20% within one week, we'll donate $10,000 to an open-source project of your choice.

Book a demo - bit.ly/3NlLMeN
889 replies · 803 reposts · 5.3K likes · 2.7M views
Pandey @SumitPandeyvine
@archiexzzz I think providers should offer this as a built-in option: optimising your prompt against real-time evals.
0 replies · 0 reposts · 0 likes · 72 views
Archie Sengupta @archiexzzz
Introducing AutoVoiceEvals

I've applied the @karpathy autoresearch loop to voice AI agents. It's open source.

Your voice agent has a system prompt. That prompt determines how it handles every call - bookings, complaints, edge cases, background noises, long pauses, people trying to trick it. Most teams write it once, test manually, and hope for the best.

autovoiceevals makes it a loop. One artifact (system prompt), one metric (adversarial eval score), keep what improves it, revert what doesn't. Run it overnight. Wake up to a better agent.

> How it works:

You describe your agent in a config file - what it does, its services, policies, and what it should never do. You don't write test cases. You don't define attack vectors.

    provider: vapi / smallest ai
    assistant:
      id: "your-agent-id"
      description: |
        Voice receptionist for a hair salon.
        Maria does coloring only. Jessica does cuts only.
        $25 cancellation fee under 24 hours notice.
        Cannot advise on skin conditions. Closed Sundays.

From that description alone, Claude generates adversarial caller personas - each with an attack strategy, a voice profile (accents, background noise, mumblers, interrupters), a multi-turn caller script, and pass/fail evaluation criteria. The eval suite is generated once and held fixed for the entire run, like a validation set.

> The loop:

1. Read the agent's current prompt from the platform
2. Generate adversarial eval suite from your description
3. Run baseline
4. Claude proposes ONE surgical change to the prompt
5. Push the modified prompt to the agent via API
6. Run all scenarios against the updated agent
7. Score improved? Keep. Same score but shorter prompt? Keep. Otherwise revert.
8. Go to 4.

Run until Ctrl+C.

The system sees its own experiment history. When a change fails, the next proposal knows what was tried and why it didn't work.

We ran 20 experiments on a live Vapi dental scheduling agent. 0 human intervention.

> Score: 0.728 → 0.969 (+33%)
> CSAT: 45 → 84
> Pass rate: 25% → 100%
> 9 kept, 10 discarded
> Prompt: 1191 → 1139 chars (better AND shorter)

You describe your agent. It figures out how to break it.
66 replies · 83 reposts · 1.2K likes · 277.8K views
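The keep-or-revert rule in steps 4 to 8 of the loop above can be sketched in a few lines of Python. This is an illustrative sketch only, not the AutoVoiceEvals code: `optimize_prompt`, `propose`, and `score` are hypothetical names, the platform API calls are replaced by plain function arguments, and a fixed experiment budget stands in for "run until Ctrl+C".

```python
from typing import Callable, List, Tuple

# One history entry: (candidate prompt, its score, whether it was kept).
Experiment = Tuple[str, float, bool]


def optimize_prompt(
    prompt: str,
    propose: Callable[[str, List[Experiment]], str],
    score: Callable[[str], float],
    max_experiments: int,
) -> Tuple[str, float, List[Experiment]]:
    """Greedy loop: keep a change only if it scores higher,
    or scores the same with a shorter prompt; otherwise revert."""
    best_prompt = prompt
    best_score = score(prompt)  # baseline run on the fixed eval suite
    history: List[Experiment] = []

    for _ in range(max_experiments):
        # The proposer sees the full history, including failed attempts.
        candidate = propose(best_prompt, history)
        candidate_score = score(candidate)

        improved = candidate_score > best_score
        same_but_shorter = (
            candidate_score == best_score and len(candidate) < len(best_prompt)
        )
        keep = improved or same_but_shorter

        history.append((candidate, candidate_score, keep))
        if keep:
            best_prompt, best_score = candidate, candidate_score
        # If not kept, best_prompt is untouched, which is the "revert".

    return best_prompt, best_score, history
```

In a real setup `score` would push the candidate prompt to the agent and run every scenario, and `propose` would call a model with the current prompt plus the history; both are assumptions here. Holding the eval suite fixed is what makes scores comparable across iterations.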
Pandey @SumitPandeyvine
I work at a company on a 25k salary, and my family is not happy that I have been here for more than 15 months at this pay. I was comfortable here, but I think I need to wake up. Any suggestions?
0 replies · 0 reposts · 0 likes · 28 views
Pandey @SumitPandeyvine
Is it really necessary to buy a MacBook??
0 replies · 0 reposts · 0 likes · 32 views
Pandey @SumitPandeyvine
@ajay_2512x Fun fact: it's my founder tho
0 replies · 0 reposts · 1 like · 260 views
Ajay Bhakar @ajay_2512x
Company: AI47Labs
💼 Role: AI Engineer Intern
💰 Stipend: ₹1.2L – ₹1.8L
📍 Location: Remote / Flexible
Apply Link: wellfound.com/jobs/3976469-a…
4 replies · 12 reposts · 253 likes · 12.7K views
Pandey reposted
Tanay Kothari @tankots
We offered 5 people a Porsche 911 GT3 RS if they could get @WisprFlow to make a mistake.

It's the fastest and most accurate AI voice dictation app that's 3x more accurate than ChatGPT, Claude, or Siri.

Today, we're finally launching on Android. Download now: play.google.com/store/apps/det…

As a part of the launch, we're giving away 6 months of Wispr Flow Pro for free. Like, retweet and comment 'Wispr Flow' to get it. Enjoy.

- Written with Wispr Flow
4.6K replies · 3.1K reposts · 10.8K likes · 4.3M views
Paul Graham @paulg
Neural nets work.
771 replies · 93 reposts · 2.9K likes · 567K views
Tech with Mak @techNmak
These are literally the kind of LLM interview questions most candidates wish they had seen earlier.

A curated list of LLM interview questions - shared by Hao Hoang.

Want this doc? Follow @techNmak and comment "LLM" - I'll send it over.
1.4K replies · 498 reposts · 4.3K likes · 408.5K views
Pandey reposted
Arpit Bhayani @arpit_bhayani
SQLite has about 155,800 lines of code, and its test suite has roughly 92 million lines. That is ~590x more test code than actual code 🤯

This is the level of testing you need for a real production database. Here are some types of tests they run.

Out-of-memory tests - SQLite cannot just crash when memory runs out. On embedded devices, OOM errors are common. They simulate malloc failures at every possible point and verify that the database handles them gracefully.

I/O error tests - Disks fail. Networks drop. Permissions change mid-operation. SQLite inserts a custom file system layer that can simulate failures after N operations, then verifies that no corruption occurs.

Crash tests - What happens if power cuts out mid-write? They simulate crashes at random points during writes, corrupt the unsynchronized data to mimic real filesystem behavior, then verify the database either completed the transaction or rolled it back cleanly. No corruption allowed.

Fuzz testing - They throw malformed SQL, corrupted database files, and random garbage at SQLite. The dbsqlfuzz tool runs about 500 million test mutations every day across 16 cores.

100% branch coverage - Every single branch instruction in SQLite's core is tested in both directions. Not just 'did this line run', but 'did this condition evaluate to both true AND false'.

Databases are really unforgiving :)

By the way, if you want to go deeper, I recommend reading the official SQLite documentation on their testing strategy. The doc is pretty practical and deep. Have linked it below.
98 replies · 519 reposts · 6.1K likes · 527.5K views
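The "fail after N operations" idea behind the out-of-memory and I/O error tests can be sketched in Python. This is a toy illustration of the technique, not SQLite's actual harness: `FaultInjector`, `transfer`, and `sweep_faults` are hypothetical names, and a dictionary stands in for the database. The sweep injects a failure at the 1st fallible step, then the 2nd, and so on, checking after each failure that the state was left untouched, until a run completes without hitting the fault.

```python
class InjectedFault(Exception):
    """Raised by the injector to simulate a failed allocation or I/O call."""


class FaultInjector:
    """Fails on exactly the fail_at-th checkpoint (1-based)."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.calls = 0

    def checkpoint(self):
        self.calls += 1
        if self.calls == self.fail_at:
            raise InjectedFault(f"simulated failure at step {self.calls}")


def transfer(accounts, src, dst, amount, injector):
    """All-or-nothing update: stage changes on a copy, commit at the end."""
    staged = dict(accounts)
    injector.checkpoint()  # stands in for a fallible malloc or read
    staged[src] -= amount
    injector.checkpoint()  # another fallible step
    staged[dst] += amount
    injector.checkpoint()  # last fallible step before commit
    accounts.clear()
    accounts.update(staged)  # commit


def sweep_faults():
    """Inject a fault at every step in turn; return how many were injected."""
    fail_at = 1
    faults_seen = 0
    while True:
        accounts = {"a": 100, "b": 0}
        try:
            transfer(accounts, "a", "b", 30, FaultInjector(fail_at))
        except InjectedFault:
            # After a failure the state must look as if nothing happened.
            assert accounts == {"a": 100, "b": 0}
            faults_seen += 1
            fail_at += 1
            continue
        # No fault was reached, so the operation must have fully applied.
        assert accounts == {"a": 70, "b": 30}
        return faults_seen


print(sweep_faults())
```

The same sweep shape applies to crash testing: the invariant checked after each injected failure is that the transaction either completed or rolled back cleanly.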
Udit Goenka @iuditg
I've a playbook for AEO. (Sharing for 24h only)

That just works.

Within less than a month of launching a website, I can guarantee you that if you follow the playbook, you can get recommended by Copilot, ChatGPT, Claude, Gemini, Perplexity, etc.

Just comment "AEO" and I will send you the guide in the next 24 hours.
148 replies · 3 reposts · 72 likes · 10.4K views
Pandey @SumitPandeyvine
🥹🥹🥹
1 reply · 0 reposts · 1 like · 22 views
Pandey @SumitPandeyvine
I just hate government jobs. I don't know why, but I do.
0 replies · 0 reposts · 0 likes · 17 views