Aidan

82 posts

Aidan

@AidanGior

Always building. Usually somewhere between accounting tech, automation, AI agents, market intel, and business software.

Joined November 2016
1.1K Following 99 Followers
Aidan @AidanGior
@adxtyahq Honestly useless answer without mentioning chunking/indexing methods
0
0
2
593
aditya @adxtyahq
“design a RAG pipeline for 10M docs with zero hallucination”

apparently this was asked in a Google L5 interview round. came across it somewhere on the internet and honestly it’s a way more interesting system design problem than most classic distributed systems questions

1. ingest + normalize docs
- remove duplicates, standardize formats, extract metadata, maintain version history
2. hybrid retrieval (BM25 + embeddings)
- BM25 handles exact keyword matching while embeddings capture semantic meaning
- semantic search alone usually struggles with precision at massive scale
3. ANN retrieval + reranking
- ANN (approximate nearest neighbor) quickly pulls top candidate chunks from millions of docs
- then a reranker rescoring step improves relevance by deeply comparing query vs retrieved chunks
4. source confidence scoring
- every retrieved chunk gets scored based on freshness, trust level, overlap and retrieval consistency
- low-confidence context should never heavily influence generation
5. constrained generation
- the model is only allowed to answer using retrieved context (nothing new to be invented outside of the retrieved context)
6. citation-backed responses
- every major claim links back to exact chunks, documents or timestamps
7. hallucination fallback layer
- if retrieval confidence drops below a threshold: “insufficient evidence found”
8. continuous evals
- run adversarial queries, retrieval recall benchmarks and hallucination tests continuously
9. caching + memory layer
- cache high-frequency enterprise queries and retrieval paths (improves latency and output)
10. observability everywhere
- trace retrieval paths, chunk rankings, token attribution and failure points

Also at 10M docs, retrieval quality matters more than the frontier model itself.
83
320
2.7K
189.1K
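Steps 2 and 7 of aditya's list can be sketched in a few lines. The thread does not say how the BM25 and embedding results get merged, so this uses reciprocal rank fusion as one common option (an assumption, not the thread's method). The chunk IDs, the 0.5 threshold, and the `generate` callable are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

FALLBACK = "insufficient evidence found"  # wording taken from step 7


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Assumed fusion method: each list contributes 1 / (k + rank) per chunk.
    k = 60 is a conventional default, not a value from the thread.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


def answer_or_fallback(
    chunks: Sequence[Tuple[str, float]],
    generate: Callable[[List[str]], str],
    threshold: float = 0.5,  # hypothetical confidence cut-off
) -> str:
    """Only pass chunks at or above the threshold to the generator."""
    trusted = [text for text, confidence in chunks if confidence >= threshold]
    if not trusted:
        return FALLBACK
    return generate(trusted)


if __name__ == "__main__":
    bm25_hits = ["d1", "d2", "d3"]   # hypothetical keyword ranking
    dense_hits = ["d3", "d1", "d4"]  # hypothetical embedding ranking
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['d1', 'd3', 'd2', 'd4']
    print(answer_or_fallback([("stale memo", 0.2)], generate=" ".join))  # insufficient evidence found
```

Chunks that appear in both rankings rise to the top, and the gate refuses to call the generator at all when nothing clears the threshold. That is roughly what "low-confidence context should never heavily influence generation" asks for.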
Aidan reposted
Blake Burge @blakeaburge
Underrated life advice: Make yourself easy to root for. Be kind. Be reliable. Celebrate other people’s wins. Work hard without complaining. Carry good energy into rooms. You'll be shocked by how many doors open for you by making life better for others.
172
3.6K
23.6K
491.8K
Aidan @AidanGior
@_vmlops Pretty sure these are distilling models. Claude costs more because they actively fight distillation
0
0
1
2K
Vaishnavi @_vmlops
CHINESE DEVS ARE BURNING 100M+ GPT-5.4 TOKENS FOR ~$1/DAY
▫️ they buy api access from resellers who exploit cheap regional subscriptions at massive scale
▫️ gpt costs them 3% of official price. claude costs more because anthropic made it harder to crack
▫️ when pirates can undercut you by 97%, your pricing model is the real problem
19
29
456
214K
Jonas @jonaasw1
I created a list of my favorite cafes to work at in NYC.

Coffee shops > offices. I started my company inside a coffee shop. Beautiful spaces. Good energy. Surrounded by people locking in. And you randomly meet the most interesting people.

My list includes the cafe name, neighborhood, and a proprietary, confidential scoring system based on:
- Work space (tables, outlets, WiFi, design)
- Food (quality, options, price)
- Music (playlist, can you take calls)
- People (do interesting people go here)
- Coffee (does it hit)
- Vibes (overall energy)

I'd love to share this list with you + add new spots. Comment "CAFE" and I'll DM you the list.
381
17
752
207.9K
Aidan @AidanGior
@swang_co moshava is one of the best. not open on weekends though
0
0
2
1.3K
Serena Wang @swang_co
best spots to lock in on a drizzly Saturday in NY 🌧️
- conwell coffee hall
- Georgie’s cafe
- haraz coffee house
- stone street
- cafe jalu
- mori coffee
- plantshed
- the lost draft
- toby’s estate in tribeca
- kings street coffee
- the blue bottle on broadway
- moshava

only including the few that are both laptop friendly over the weekend and with ample seating

for founders building/raising, come co-work with me next Sat!
12
46
1.2K
111.2K
Aidan @AidanGior
@hiarun02 I often use the Claude/Codex CLIs in Cursor and only use Cursor's AI for basic stuff and quick edits. The problem is that Cursor is starting to feel very slow and bloated
0
0
0
419
Arun @hiarun02
Is anyone still using VSCode instead of switching fully to Claude Code, Cursor IDE, or OpenAI Codex? Or am I the only one?
352
6
419
84.3K
Victor Cardenas Codriansky @victorcardenas
We built this internally at Slash and it has genuinely changed the trajectory of our company. Insane what perfect, real-time context for anyone at your org does for productivity.
Y Combinator @ycombinator

Company Brain @t_blom Every company has critical know-how scattered across people's heads, old Slack threads, support tickets, and databases, and AI agents can't operate like that. We think every company in the world is going to need a new primitive: a living map of how the company works that turns its own artifacts into an executable skills file for AI.
35
10
556
168.6K
Aidan @AidanGior
@garrytan Wow zero LLM entity extraction? Hybrid GraphRAG is nice if done well but will decay quickly at scale compared to the 2MB eval harness
2
0
1
647
Garry Tan @garrytan
For GBrain I built a proper eval harness. 145 queries, Opus-generated corpus.

The retrieval stack uses graph based, vector based and Grep based strategies in combination. The graph layer is worth +31 points on precision. Vector-only misses 170/261 correct answers that the full system finds. Keyword + vector + graph are three separable wins, each load-bearing.

Standard information retrieval metrics: the same ones Google uses to measure search quality.

Precision at 5: You ask a question, the system returns 5 results. How many of those 5 are actually useful? If 3 out of 5 are relevant, P@5 = 60%. It measures: am I wasting your time with junk results?

Recall at 5: For a given question, there might be 3 pages in the entire brain that are genuinely relevant. If the system finds all 3 in its top 5, R@5 = 100%. If it only finds 1, R@5 = 33%. It measures: am I missing things you need?

High precision = low noise. High recall = nothing slips through.

GBrain's 97.9% R@5 means it almost never misses the right answer. The 49.1% P@5 means about half the results are relevant, which is good when you realize that for most queries there are only 1-2 right answers out of 17,888 pages, so 2.5 hits out of 5 is strong signal.

Entity resolution is zero-LLM-call: regex extracts typed links (works_at, invested_in, founded) on every write. Re-embed on write not on a timer, so decay = stale pages, and stale pages get rewritten when new info lands.

Scorecards: github.com/garrytan/gbrai…
56
28
466
211.7K
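The two metrics defined in Garry Tan's post above are easy to compute by hand. This is a minimal sketch with hypothetical page IDs, not GBrain's actual harness. It assumes precision divides by k even when fewer than k results come back, which is one common convention.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of the top-k results that are relevant (divides by k, by assumption)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for page in ranked[:k] if page in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of all relevant pages that show up in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to find; returning 0.0 here is a choice, not a standard
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    results = ["p1", "p2", "p3", "p4", "p5"]  # hypothetical top-5 for one query

    # 3 of the 5 results are relevant, and those 3 are all that exist.
    relevant = {"p1", "p3", "p5"}
    print(f"P@5 = {precision_at_k(results, relevant):.0%}")  # P@5 = 60%
    print(f"R@5 = {recall_at_k(results, relevant):.0%}")     # R@5 = 100%

    # 3 relevant pages exist, but only 1 made it into the top 5.
    relevant = {"p1", "p8", "p9"}
    print(f"R@5 = {recall_at_k(results, relevant):.0%}")     # R@5 = 33%
```

The printed values match the worked examples in the post: 3 relevant out of 5 gives P@5 = 60%, and finding 1 of 3 relevant pages gives R@5 = 33%.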
Aidan @AidanGior
@tom_doerr Feels like the next level of Karpathy’s knowledge bases
0
0
1
59
Aidan @AidanGior
@mytechceoo @albertadevs @getaxal Very interested to see where this goes. Token maxxing makes sense while driving adoption, but token efficiency will soon become the priority as costs grow with adoption. Even personally, I’m starting to want to see where I’m spending and what value I’m getting
0
0
1
150
Jason @mytechceoo
CEO obsessed with token maxxing
282
1K
13K
1.9M
Aidan reposted
Gideon Shalwick @GideonShalwick
Hot take: Vibe coding doesn’t (fully) replace thinking. It replaces hard core, coalface coding.

If you want it to actually work, you still need:
- Deep understanding of what the market wants
- A clear user experience (not just “it works”)
- Real UI design with proper affordances
- Backend + infrastructure that can scale
- Security that won’t bite you later
- A plan for distribution

Vibe coding replaces the old dev bottleneck. It doesn’t replace product, strategy, or execution.

What do you think @garrytan?
53
50
440
113.8K
Aidan @AidanGior
Wow I had a veryyy similar idea just last week. Started building a variation of it that I believe is more powerful. I agree though, the only thing stopping agents from being infinitely more helpful is them not having the full context on our work, goals, problems, procedures, etc. Luckily that data is readily available
0
0
0
1.1K
Tibo @thsottiaux
We are releasing a *research preview* of Chronicle in Codex. It allows codex to build up memories based on your day to day work on your computer and then refer to these memories to be a lot more helpful. Available for PRO subscriptions and on Mac to start. This is early and consumes quite a bit of tokens, but it has changed how I and many folks at OpenAI use Codex.
OpenAI Developers @OpenAIDevs

Last week, we released a preview of memories in Codex. Today, we’re expanding the experiment with Chronicle, which improves memories using recent screen context. Now, Codex can help with what you’ve been working on without you restating context.
239
150
2.6K
938.9K
Aidan @AidanGior
@TalkinYanks This is the same team as always. Just run and put the ball in play and we’ll beat ourselves
0
0
27
1.6K
Talkin' Yanks @TalkinYanks
Yankees have lost four straight
229
72
1.1K
64.1K
Aidan @AidanGior
@RyanFieldABC They can’t say they didn’t expect this. Tickets are only a few $ for a lot of games anyway, $0 is not a game changer. Sad reality of a sport with 81 home games, especially on a cold weeknight
1
0
5
10.3K
Ryan Field @RyanFieldABC
The Mets owner has a legitimate gripe.
149
220
20K
1.9M
Aidan @AidanGior
@a16z One of my favorite speeches ever. The spirit we need, now growing again with AI. Choose the hard things. Bring out the best of our energy and skill.
0
0
0
681
a16z @a16z
We're going back to the moon.
76
438
4K
215.5K
Aidan @AidanGior
@Vtrivedy10 @hwchase17 Frontier models shine for general apps. Open models with specialized harnesses crush targeted tasks. Running MiniMax 2.7 in DeepAgents for much lower costs
1
0
1
113
Viv @Vtrivedy10
we’re leaning incredibly hard into Open Models + Open Harnesses

evals show that current open models get near frontier (or better) intelligence on many tasks, they’re way cheaper, and usually faster

real world tasks need to take perf, cost, latency into account

many tasks don’t need the bleeding edge frontier intelligence, they need specialized intelligence funneled into a problem

with open models, production traces become even more of a moat with harness engineering and finetuning

we’re so early, a big part of the future will be open everything 🚀
Mason Daugherty @masondrxy

x.com/i/article/2039…
13
16
92
14K
Aidan @AidanGior
Probably user error, but the first issue I faced was the main agent continuously checking on the async subagent, so it’s not really async. I don’t fully see a reason to let the main agent check at all for my use case, so it would be nice to have more granular control over tools like that by default
1
0
0
20
Harrison Chase @hwchase17
⛳️async subagents in deepagents==0.5.0a1

i think in future we will "chat" with a single agent, and it will manage multiple longer running agents in the background

you can now do this with deepagents

we released an alpha release (0.5.0a1) with this functionality - try it out and let us know what you think!

docs.langchain.com/oss/python/rel…
23
21
155
21.8K
Aidan @AidanGior
@shiri_shh Is it still dogfooding if it’s SOTA?
0
0
0
221
shirish @shiri_shh
The Anthropic team is dogfooding Claude Code at insane levels.

In the last 52 days, the Claude team dropped 50+ major UPDATES.

One employee alone hit $150,000 in a single month on Claude Code.

80% of employees use it daily, with power users racking up six-figure bills.
Claude @claudeai

Your work tools in Claude are now available on mobile. Explore Figma designs, create Canva slides, check Amplitude dashboards, all from your phone. Give it a try: claude.com/download
92
52
938
431.9K