Nathan

421 posts

@NathanJLeRoy

Research @ Qdrant, PhD @ UVA

New York, NY · Joined February 2012

196 Following · 246 Followers

Nathan@NathanJLeRoy·25 Mar

@GoogleResearch Inb4 they contrast it with natural language to do multimodal search: > give me the location with the most Tim Hortons in walking distance possible

461

Google Research@GoogleResearch·24 Mar

Mapping the modern world: We introduce S2Vec, a self-supervised framework that transforms complex geospatial data into general-purpose embeddings for predicting population density, carbon emissions, and urban development at scale. Check out the blog: goo.gle/3PlswyF

129

957

53.9K

Nathan@NathanJLeRoy·13 Mar

@marvindiazjr @contextkingceo exactly. And really just shows me they don’t know ball, considering context-dependent embeddings have been a thing since early 2010s

100

Marvin Diaz Jr. 👀@marvindiazjr·13 Mar

@NathanJLeRoy @contextkingceo yeah the problem statement is based around the most infantile version of RAG that is nowhere representative of what Hybrid Search RAG with reranking already can do...
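
For readers unfamiliar with the "hybrid search" mentioned above: one common way to combine a dense (vector) ranking with a keyword ranking is reciprocal rank fusion (RRF). The thread does not name a fusion method, so RRF is an assumption here, and the document IDs and the `k=60` constant are illustrative only. The reranking step (for example, a cross-encoder over the fused list) is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
dense_results = ["doc_a", "doc_b", "doc_c"]    # vector similarity order
keyword_results = ["doc_b", "doc_d", "doc_a"]  # keyword (e.g. BM25) order

fused = reciprocal_rank_fusion([dense_results, keyword_results])
print(fused)
```

A document that ranks well in both lists rises to the top even if neither retriever placed it first, which is the point of fusing the two signals.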

147

Nishkarsh@contextkingceo·12 Mar

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️
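
For context, a similarity score like the 0.94 quoted above is often a cosine similarity between two embedding vectors. A minimal sketch, assuming made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and nothing here reflects any particular vector database's API):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for two different senses of "Apple".
apple_company = [0.9, 0.1, 0.3]
apple_fruit = [0.8, 0.3, 0.2]

score = cosine_similarity(apple_company, apple_fruit)
print(f"similarity = {score:.2f}")
```

The score only measures how closely the two vectors point in the same direction; whether a high score means "relevant" depends on how the embeddings were produced.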

620

641

3.8M

Nathan@NathanJLeRoy·9 Mar

@Samhanknr @jxmnop Better data is **always** better than a better model

Zengineering@Samhanknr·9 Mar

@jxmnop Great article. It’s always about the data

1.4K

dr. jack morris@jxmnop·9 Mar

x.com/i/article/2031…

156

1.9K

395.6K

Nathan@NathanJLeRoy·2 Mar

I agree. My hypothesis is that ATS systems are being inundated with AI-slop resumes, so they simply can’t find the right candidates and miss real talent. Lots of people are coming into interviews who can hardly form sentences or who drastically exaggerate their qualifications. I predict a trend toward in-person networking and (ironically) tactics like: “just go in with a firm handshake and a smile”

502

Enzo the Baker@BakerThe5603·2 Mar

@sanpellyenjoyer What I don’t understand is I apply for jobs which exactly match my current role and hear nothing. However, when I am the hiring manager I get resumes with only a tenuous link to my JD. “Well they worked on a project 5 years ago which is sort of what you asked for.”

6.9K

Alex@sanpellyenjoyer·2 Mar

Every 3-6 months I vigorously apply for new jobs (reach out to people on LinkedIn, tailor resume if needed, the whole shebang), hear nothing back, and then I remember it’s over for white collar work until something pisses me off at my job and I repeat the process

161

5.8K

312.3K

Nathan@NathanJLeRoy·27 Feb

@JEFworks Better data is better than better models!

Dr. Jean Fan@JEFworks·25 Feb

Changing training data alone can improve deep learning prediction of spatial transcriptomics gene expression from histology images by 38% (without any changes to model architecture). We've updated our preprint showing this with expanded results: biorxiv.org/content/10.110… 🧵👇 1/4

6.5K

Nathan@NathanJLeRoy·3 Dec

@ReplyGrinder @qdrant_engine Hey! I agree with what's mentioned at the end of the article: start small and local if possible, then migrate things when you are fully ready: > test Qdrant with a single non-critical collection. First, run it for a few weeks, evaluate the performance and operational overhead

Luca | ReplyGrinder.ai 🌊@ReplyGrinding·3 Dec

@qdrant_engine Qdrant Flexing on Elastic! Any tips to ensure a smooth transition without disrupting the developer experience? cc: @qdrantlabs

Qdrant@qdrant_engine·3 Dec

𝐌𝐢𝐠𝐫𝐚𝐭𝐢𝐧𝐠 𝐟𝐫𝐨𝐦 𝐄𝐥𝐚𝐬𝐭𝐢𝐜𝐬𝐞𝐚𝐫𝐜𝐡 𝐭𝐨 𝐐𝐝𝐫𝐚𝐧𝐭 - 𝐃𝐞𝐞𝐩-𝐃𝐢𝐯𝐞 𝐛𝐲 Mahimai Raja J
We think this is a great technical breakdown on why many teams are moving from Elasticsearch to Qdrant for their vector search workloads.
Why he wrote this? After facing challenges with scaling Elasticsearch for vector workloads - complex configs, higher infra cost, and limited vector performance - Mahimai created a practical guide to help teams transition smoothly to a vector-first stack.
𝐈𝐭 𝐜𝐨𝐧𝐭𝐚𝐢𝐧𝐬:
𝑾𝒉𝒚 𝒎𝒊𝒈𝒓𝒂𝒕𝒆?
- Challenging to build scaled, performant vector search with Elastic
- Need to reduce latency
- Increase resource efficiency
𝑯𝒐𝒘 𝑸𝒅𝒓𝒂𝒏𝒕 𝒉𝒆𝒍𝒑𝒔?
- Native vector indexing
- Strong payload filtering
- Efficient dense + sparse hybrid search
- Easier scaling and maintenance
𝑻𝒉𝒆 𝒎𝒊𝒈𝒓𝒂𝒕𝒊𝒐𝒏 𝒑𝒓𝒐𝒄𝒆𝒔𝒔:
- Exporting ES data & embeddings
- Re-mapping schema for Qdrant
- Rebuilding collections & payloads
- Updating query patterns for vector search
- Handling ranking and scoring differences
Practical guidance: Includes real-world examples, code snippets, and common pitfalls to avoid during migration.
Full article 👉 pub.towardsai.net/how-to-migrate… #Qdrant #VectorSearch #Migration #SearchEngineering #LLMApps

1.1K

Nathan@NathanJLeRoy·9 Nov

@IdoIrani @joreyn82 @JohnHolbein1 Ha! very interesting. Abstract click-bait. Also meta-clickbait in the original post claiming the abstract-only distribution is equivalent to "z-values from medical research"

Ido Irani@IdoIrani·7 Nov

@NathanJLeRoy @joreyn82 @JohnHolbein1 Turns out, there's a good explanation. These are mostly z values from abstracts. Another version of this graph looks different x.com/lakens/status/… x.com/vientsek/statu…

Witold Więcek 🇺🇦@vientsek

No, look at *this* distribution of z-values from medical research! (329,601 z-values from Cochrane database)

142

John B. Holbein@JohnHolbein1·5 Nov

Look at the distribution of z-values from medical research!

125

306

5.9K

1.4M

Nathan reposted

bioRxiv Bioinfo@biorxiv_bioinfo·4 Nov

Atacformer: A transformer-based foundation model for analysis and interpretation of ATAC-seq data biorxiv.org/content/10.110… #biorxiv_bioinfo

4.5K

Nathan@NathanJLeRoy·6 Nov

@IdoIrani @joreyn82 @JohnHolbein1 Agreed

Ido Irani@IdoIrani·6 Nov

@NathanJLeRoy @joreyn82 @JohnHolbein1 Just to clarify, there is no rule against publishing a z=1.9 result. I would personally treat it just as statistically significant as a z=2 (that is, not very significant)

118

Nathan@NathanJLeRoy·6 Nov

@IdoIrani @joreyn82 @JohnHolbein1 I like the explanation of a "psychological barrier." -- thanks for that. Reminds me of the distribution of marathon times histogram. Huge spike at 4:00 hrs... There's nothing magical about it other than it's some arbitrary barrier people really want to surpass.

Ido Irani@IdoIrani·6 Nov

@NathanJLeRoy @joreyn82 @JohnHolbein1 You would expect people to be as happy with 2.01 as with 1.99. The p-value is almost identical, right? It's the psychological barrier of z=2. It also makes people choose an analysis that complements their results. But sure - there is definitely also survivorship bias.

172

Nathan@NathanJLeRoy·5 Nov

@IdoIrani @joreyn82 @JohnHolbein1 Maybe I am dense, but why? If 1.95 is not significant using a two-tailed test, it doesn't get reported. However, 2.05 will. Is that not exactly the survivorship bias? (Genuinely asking because I feel like I'm still not getting it)
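
The z-values traded in this thread can be checked numerically. A minimal sketch, assuming a standard normal test statistic and the conventional 0.05 two-tailed threshold (neither is stated explicitly in the thread); the function name is illustrative:

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal z statistic.

    For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # assumed significance threshold

for z in (1.95, 1.99, 2.01, 2.05):
    p = two_tailed_p(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z:.2f}  p = {p:.4f}  {verdict} at alpha = {ALPHA}")
```

The p-values on either side of z = 2 differ only slightly, which is why a sharp jump in reported results at that point suggests something beyond the statistics themselves.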

202

Ido Irani@IdoIrani·5 Nov

@joreyn82 @JohnHolbein1 That could explain a difference between z of 4 and 2, not a difference between 2.05 and 1.95. The sharp jump is artificial

4.1K

Nathan@NathanJLeRoy·6 Aug

@mtlushan What does each point represent? Something feels off about this -- like your embedding dimensions are all highly correlated and therefore not that informative. Every time I get a UMAP like this it usually means something's not right on my end...

243

Jack D. Carson@mtlushan·6 Aug

by far the most beautiful UMAP I've had the privilege of making I love biology

518

28K

Nathan@NathanJLeRoy·30 May

@MikeTFox5 Am I seeing correctly that the left half of the system seems to get obliterated by the Appalachian Mountains?

274

Mike Thomas@MikeTFox5·30 May

Morning HRRR simulations still suggesting an active evening in our region, particularly around the evening commute hours. 🌡️: Sun breaking out early with latest guidance suggesting DC peaks around 80°F with dews in the mid-to-upper 60s ⛈️: SPC upped most of the region to slight risk of severe weather. Most active window looks like 4-8pm. 🌪️: Tornado risk is isolated locally. Highest risk zones are in the Northern Neck, Eastern VA, and southern MD.

2.8K

Nathan@NathanJLeRoy·23 May

@EIFY @percyliang Interesting! If I saw this during a training (I don't train LLMs), I'd think this was clearly me not shuffling my dataset, or some artifacts with my LR scheduler. Wonder if anyone has looked into the actual batches in these steps to see what might have caused a spike.

258

EIFY@EIFY·23 May

@NathanJLeRoy @percyliang They are really hard to avoid, so much so that it's more noteworthy if there isn't any lol AFAIK so far in the open only DeepSeek-V3 and Pangu Ultra have claimed absence or near absence of them

401