Nathan

421 posts

@NathanJLeRoy

Research @ Qdrant, PhD @ UVA

New York, NY Joined February 2012
196 Following 246 Followers
Nathan
Nathan@NathanJLeRoy·
@GoogleResearch Inb4 they contrast it with natural language to do multimodal search: > give me the location with the most Tim Hortons within walking distance
0
0
0
461
Google Research
Google Research@GoogleResearch·
Mapping the modern world: We introduce S2Vec, a self-supervised framework that transforms complex geospatial data into general-purpose embeddings for predicting population density, carbon emissions, and urban development at scale. Check out the blog: goo.gle/3PlswyF
11
129
957
53.9K
Nathan
Nathan@NathanJLeRoy·
@marvindiazjr @contextkingceo Exactly. And it really just shows me they don’t know ball, considering context-dependent embeddings have been a thing since the early 2010s
0
0
2
100
Marvin Diaz Jr. 👀
Marvin Diaz Jr. 👀@marvindiazjr·
@NathanJLeRoy @contextkingceo Yeah, the problem statement is based around the most infantile version of RAG, which is nowhere near representative of what Hybrid Search RAG with reranking can already do...
1
0
2
147
Nishkarsh
Nishkarsh@contextkingceo·
We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️
620
641
6K
3.8M
Zengineering
Zengineering@Samhanknr·
@jxmnop Great article. It’s always about the data
1
0
1
1.4K
Nathan
Nathan@NathanJLeRoy·
I agree. My hypothesis is that ATSs are being inundated with AI slop resumes, such that they simply can’t find the right candidates and miss real talent. Lots of people are coming into interviews who can hardly form sentences or who drastically exaggerate their qualifications. I predict a trend toward in-person networking and (ironically) tactics like: “just go in with a firm handshake and a smile”
2
0
3
502
Enzo the Baker
Enzo the Baker@BakerThe5603·
@sanpellyenjoyer What I don’t understand is that I apply for jobs which exactly match my current role and hear nothing. However, when I am the hiring manager, I get resumes with only a tenuous link to my JD. “Well, they worked on a project 5 years ago which is sort of what you asked for.”
1
1
34
6.9K
Alex
Alex@sanpellyenjoyer·
Every 3-6 months I vigorously apply for new jobs (reach out to people on LinkedIn, tailor resume if needed, the whole shebang), hear nothing back, and then I remember it’s over for white collar work until something pisses me off at my job and I repeat the process
46
161
5.8K
312.3K
Nathan
Nathan@NathanJLeRoy·
@JEFworks Better data is better than better models!
0
0
0
24
Dr. Jean Fan
Dr. Jean Fan@JEFworks·
Changing training data alone can improve deep learning prediction of spatial transcriptomics gene expression from histology images by 38% (without any changes to model architecture). We've updated our preprint showing this with expanded results: biorxiv.org/content/10.110… 🧵👇 1/4
2
23
95
6.5K
Nathan
Nathan@NathanJLeRoy·
@ReplyGrinder @qdrant_engine Hey! I agree with what's mentioned at the end of the article: start small and local if possible, then migrate things when you are fully ready: > test Qdrant with a single non-critical collection. First, run it for a few weeks, evaluate the performance and operational overhead
0
0
1
21
Qdrant
Qdrant@qdrant_engine·
Migrating from Elasticsearch to Qdrant - Deep-Dive by Mahimai Raja J

We think this is a great technical breakdown of why many teams are moving from Elasticsearch to Qdrant for their vector search workloads.

Why did he write this? After facing challenges with scaling Elasticsearch for vector workloads (complex configs, higher infra cost, and limited vector performance), Mahimai created a practical guide to help teams transition smoothly to a vector-first stack.

It contains:

Why migrate?
- Challenging to build scaled, performant vector search with Elastic
- Need to reduce latency
- Increase resource efficiency

How Qdrant helps?
- Native vector indexing
- Strong payload filtering
- Efficient dense + sparse hybrid search
- Easier scaling and maintenance

The migration process:
- Exporting ES data & embeddings
- Re-mapping schema for Qdrant
- Rebuilding collections & payloads
- Updating query patterns for vector search
- Handling ranking and scoring differences

Practical guidance: Includes real-world examples, code snippets, and common pitfalls to avoid during migration.

Full article 👉 pub.towardsai.net/how-to-migrate… #Qdrant #VectorSearch #Migration #SearchEngineering #LLMApps
2
1
15
1.1K
Nathan
Nathan@NathanJLeRoy·
@IdoIrani @joreyn82 @JohnHolbein1 Ha! very interesting. Abstract click-bait. Also meta-clickbait in the original post claiming the abstract-only distribution is equivalent to "z-values from medical research"
0
0
0
23
John B. Holbein
John B. Holbein@JohnHolbein1·
Look at the distribution of z-values from medical research!
125
306
5.9K
1.4M
Ido Irani
Ido Irani@IdoIrani·
@NathanJLeRoy @joreyn82 @JohnHolbein1 Just to clarify, there is no rule against publishing a z=1.9 result. I would personally treat it just as statistically significant as a z=2 (that is, not very significant)
1
0
4
118
Nathan
Nathan@NathanJLeRoy·
@IdoIrani @joreyn82 @JohnHolbein1 I like the explanation of a "psychological barrier." -- thanks for that. Reminds me of the distribution of marathon times histogram. Huge spike at 4:00 hrs... There's nothing magical about it other than it's some arbitrary barrier people really want to surpass.
1
0
2
42
Ido Irani
Ido Irani@IdoIrani·
@NathanJLeRoy @joreyn82 @JohnHolbein1 You expect people to be as happy with 2.01 as with 1.99. The p-value is almost identical, right? It's the psychological barrier of z=2. It also makes people choose an analysis that complements their results. But sure, there is definitely also survivorship bias.
2
0
6
172
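The claim that z = 1.99 and z = 2.01 carry almost identical p-values can be checked directly. The following is a minimal sketch, assuming a standard normal null distribution and a two-tailed test; the helper name `two_tailed_p` is illustrative and not from the thread.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under a standard normal null."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Compare values on either side of the conventional cutoff near z = 1.96.
for z in (1.95, 1.99, 2.01, 2.05):
    print(f"z = {z:.2f}  p = {two_tailed_p(z):.4f}")
```

Under these assumptions the p-values for 1.99 and 2.01 differ by roughly 0.002, even though only one of them falls below the conventional 0.05 cutoff.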
Nathan
Nathan@NathanJLeRoy·
@IdoIrani @joreyn82 @JohnHolbein1 Maybe I am dense, but why? If 1.95 is not significant using a two-tailed test, it doesn't get reported. However, 2.05 will. Is that not exactly the survivorship bias? (Genuinely asking because I feel like I'm still not getting it)
2
0
11
202
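The reporting filter described in this reply can be sketched as a small simulation. It assumes, purely for illustration, that z-values are drawn from a normal distribution and that only results with |z| >= 1.96 get reported. The distribution, its parameters, and the function name `simulate_reported` are hypothetical and not taken from the thread or the underlying data.

```python
import random


def simulate_reported(n: int, threshold: float = 1.96, seed: int = 0) -> list[float]:
    """Draw n z-values and keep only those passing the significance filter."""
    rng = random.Random(seed)
    # Assumed distribution of z-values: mean 1.0, sd 1.5 (illustrative only).
    all_z = [rng.gauss(1.0, 1.5) for _ in range(n)]
    return [z for z in all_z if abs(z) >= threshold]


reported = simulate_reported(100_000)

# Count reported results in equal-width bins on either side of the threshold.
just_below = sum(1.86 <= z < 1.96 for z in reported)
just_above = sum(1.96 <= z < 2.06 for z in reported)
print(f"reported in [1.86, 1.96): {just_below}")
print(f"reported in [1.96, 2.06): {just_above}")
```

With a hard filter, the bin just below the threshold is empty while the bin just above is populated, so a strict reporting cutoff alone is enough to produce a sharp edge in the reported distribution. The sketch does not settle whether the real data reflect that filter, analysis choices, or both.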
Ido Irani
Ido Irani@IdoIrani·
@joreyn82 @JohnHolbein1 That could explain a difference between z of 4 and 2, not a difference between 2.05 and 1.95. The sharp jump is artificial
3
1
94
4.1K
Nathan
Nathan@NathanJLeRoy·
@mtlushan What does each point represent? Something feels off about this -- like your embedding dimensions are all highly correlated and therefore not that informative. Every time I get a UMAP like this, it usually means something's not right on my end...
1
0
1
243
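One way to check the concern about highly correlated embedding dimensions is to look at pairwise Pearson correlations between dimensions before running UMAP. Below is a minimal sketch in plain Python on toy data; the data, the function names, and the choice of mean absolute correlation as a summary are illustrative assumptions, not something from the thread.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_abs_correlation(embeddings):
    """Mean |r| over all pairs of embedding dimensions (columns)."""
    dims = list(zip(*embeddings))  # transpose rows into per-dimension columns
    pairs = list(combinations(dims, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


# Toy embeddings: every dimension is a linear function of the first one.
redundant = [[t, 2 * t + 1, -3 * t] for t in range(10)]
# Toy embeddings: two dimensions that are uncorrelated by construction.
uncorrelated = [[1, 0], [0, 1], [-1, 0], [0, -1]]

print(mean_abs_correlation(redundant))
print(mean_abs_correlation(uncorrelated))
```

A value close to 1 suggests the dimensions carry largely redundant information, while a value close to 0 suggests they vary independently. For real embeddings a vectorized library would be the practical choice; plain Python is used here only to keep the sketch self-contained.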
Jack D. Carson
Jack D. Carson@mtlushan·
by far the most beautiful UMAP I've had the privilege of making I love biology
33
15
518
28K
Nathan
Nathan@NathanJLeRoy·
@MikeTFox5 Am I seeing correctly that the left half of the system seems to get obliterated by the Appalachian Mountains?
2
0
1
274
Mike Thomas
Mike Thomas@MikeTFox5·
Morning HRRR simulations still suggesting an active evening in our region, particularly around the evening commute hours. 🌡️: Sun breaking out early with latest guidance suggesting DC peaks around 80°F with dews in the mid-to-upper 60s ⛈️: SPC upped most of the region to slight risk of severe weather. Most active window looks like 4-8pm. 🌪️: Tornado risk is isolated locally. Highest risk zones are in the Northern Neck, Eastern VA, and southern MD.
4
2
29
2.8K
Nathan
Nathan@NathanJLeRoy·
@EIFY @percyliang Interesting! If I saw this during a training run (I don't train LLMs), I'd think this was clearly me not shuffling my dataset, or some artifact of my LR scheduler. Wonder if anyone has looked into the actual batches in these steps to see what might have caused a spike.
2
0
1
258
EIFY
EIFY@EIFY·
@NathanJLeRoy @percyliang They are really hard to avoid, so much so that it's more noteworthy if there isn't any lol AFAIK so far in the open only DeepSeek-V3 and Pangu Ultra have claimed absence or near absence of them
3
0
3
401
Percy Liang
Percy Liang@percyliang·
Marin 32B training crossed 1.5 trillion tokens today...
9
15
300
316.1K