Neil Stoker ✨

5.6K posts

@nmstoker

Photographer 📷 Developer 👨‍💻 Pythonista 🐍 Data Wrangler 🤠 ML Enthusiast 🤖🧠🎉 Countryside Lover 🍂🍄🌳💚

London · Joined November 2014

545 Following · 227 Followers

Neil Stoker ✨ retweeted

Siddhartha Saxena@siddsax·1d

Anthropic onboarding day: Michael Scott introducing Karpathy like he just signed Wemby in free agency.

English

366

1.4K

16.1K

1.9M

Neil Stoker ✨@nmstoker·3d

@Alex_Stafford @TfL Sorry to hear that. Hope it wasn't this guy... standard.co.uk/news/crime/pol…

English

Alexander Stafford@Alex_Stafford·4d

Just been assaulted at the train station. Punched in the head and the face to the ground. @TfL staff did literally nothing, even when the assailant had left. Thank you to the fellow passengers who came to my aid. Sadiq Khan has lost control of the city. This is lawless London.

English

783

1.9K

11.4K

592.3K

Neil Stoker ✨@nmstoker·6d

@karpathy Congratulations 👏

English

Andrej Karpathy@karpathy·6d

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

English

7.9K

11.1K

148.9K

27M

Neil Stoker ✨ retweeted

Artur Nadolny@ArturNadol7566·8 May

SHE SAVED MILLIONS IN TAX. HE SET THE TAX RULES. HMRC SAW NO PROBLEM. Rishi Sunak (@RishiSunak) was running the nation's finances. Raising taxes on working people. Telling the country there was no alternative. His wife, Akshata Murty, was quietly using non-domiciled status to avoid paying UK tax on her overseas earnings, including roughly £11.6 million a year in dividends from her father's company, Infosys. The estimated saving: around £2.1 million per year. Over several years, sources told @Independent that figure could have reached £20 million. Non-dom status is legal. But when the man setting tax policy for 67 million people has a wife saving millions under that same policy, most organisations would want that conflict documented and scrutinised. There is no evidence HMRC (@HMRCgovuk) treated it with any urgency. Then someone inside Whitehall decided the public had a right to know. A source passed details to @Independent in April 2022, right in the middle of Partygate. The story blew up. Sunak was forced to ask for a ministerial interests review. Murty announced she would voluntarily start paying UK tax on worldwide income. What happened to the whistleblower? A leak inquiry was launched. @Channel4 noted it could lead to criminal prosecution, because disclosing someone's personal tax information is illegal in the UK. The source was never publicly identified. No prosecution ever came. So the person who told the truth about a potential conflict of interest at the heart of the Treasury faced a criminal investigation. The conflict of interest itself got a press release and a polite apology. Source: @Independent, @guardian, @BBCNews, @thetimes

English

416

552

18.7K

Neil Stoker ✨@nmstoker·12 May

@alexmaxham Correct, they'll call it Aluminium OS 😀

English

Alex Maxham@alexmaxham·12 May

The Googlebook is real. This is essentially Google's new laptop running 'Aluminum OS', though they are not explicitly calling it that. This is just a tease for right now, the first hardware will start shipping this Fall, possibly around the time of the Pixel 11 launch. Google is working with a number of PC makers including Acer, ASUS, Dell, HP, and Lenovo. Which is why it's not a PixelBook.

English

4.9K

Neil Stoker ✨ retweeted

Google DeepMind@GoogleDeepMind·12 May

We’re reimagining a 50-year-old interface - the mouse pointer - with AI. 🖱️ These experimental demos show how people can intuitively direct Gemini on their screens using motion, speech, and natural shorthand to get things done 🧵

English

461

1.1K

8.6K

1.6M

Neil Stoker ✨@nmstoker·10 May

@sama More sophisticated modelling/control of when to ask for my guidance/preference/give me an early summary vs when to press on (a la "I'm feeling lucky")

English

Sam Altman@sama·9 May

what would you most like to see improve in our next model?

English

8.3K

305

1.4M

Neil Stoker ✨ retweeted

Akshay 🚀@akshay_pachaar·7 May

A tricky LLM interview question: Your RAG system scores 90% retrieval accuracy on 5k company docs. But scaling to 500k docs drops the accuracy to just 50%, with the same embedding model and retriever. Why did this happen?

The simplest answer is that more documents mean more competition for the top-k retrieval slots. That is true, but it doesn't explain why accuracy drops this dramatically. The answer comes down to how enterprise docs are distributed in the embedding space.

Today, a single product decision in a company generates meeting transcripts, Slack threads, Confluence docs, Jira tickets, and email threads. They are related to the same event, so they all land in a similar region of the embedding space. As the company operates over months, this pattern repeats for every project/customer/roadmap, and the embedding space fills up with clusters of closely related documents.

But all related docs don't contain the same facts.
→ Slack thread covers the decision made
→ Jira has the implementation deadline
→ Confluence has the technical spec
→ Email thread has the customer request

When a query is about a specific fact (like a deadline), the answer lives in one of those docs. At a 5K corpus size, there might be 3-5 docs touching that topic, and the correct one easily lands in the top-k results. But at a 500K corpus size, there could be 40-60 total docs, and the one containing the actual answer can easily get pushed out of the top-k by other topically relevant docs, degrading retrieval.

A recent research paper from Onyx documented this. The researchers used their newly open-sourced EnterpriseRAG-Bench dataset. It has 500k+ synthetic enterprise documents spread across Slack, Gmail, Jira, GitHub, Confluence, Google Drive, HubSpot, Fireflies, and Linear, with realistic noise like misfiled documents, near-duplicates, and conflicting versions. They ran the same retrievers at five corpus sizes from 5K to 500K.
→ Vector search accuracy dropped from 90.7% at 5K documents to 50.6% at 500K docs.
→ BM25 degraded more gracefully, from 85.8% to 68.4%.
→ At every scale, higher neighborhood density in the embedding space monotonically correlated with lower recall.

The practical implication here is that retrieval accuracy on a 5k test set tells you almost nothing about production-scale performance. Always test at a realistic volume to measure the neighborhood density in your embedding space to estimate how much headroom the retriever actually has.

The entire EnterpriseRAG-Bench dataset (500K docs with questions, and the whole evaluation harness) is open-source. Run your retriever against it at 5K, then at 500K, and see where your own accuracy curve breaks. I have shared the GitHub repo in the replies.
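The crowding effect described in the tweet above can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the scoring model (Gaussian similarity scores with a small boost for the answer-bearing doc), the function name `recall_at_k`, and every parameter value are hypothetical and are not taken from the Onyx paper or the EnterpriseRAG-Bench harness. It only shows why one correct doc is more likely to miss the top-k when its topical cluster grows from a handful of docs to dozens.

```python
import random


def recall_at_k(cluster_size, k=5, trials=2000, answer_boost=0.5, noise=1.0, seed=0):
    """Estimate how often the single answer-bearing doc lands in the top-k
    when it competes with (cluster_size - 1) topically similar docs.

    All scores are synthetic (hypothetical Gaussian similarities)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Similarity of the query to the doc that actually holds the answer.
        answer_score = answer_boost + rng.gauss(0.0, noise)
        # Count same-topic docs that happen to score higher than the answer doc.
        better = sum(
            1
            for _ in range(cluster_size - 1)
            if rng.gauss(0.0, noise) > answer_score
        )
        # The answer doc is retrieved if fewer than k docs outrank it.
        if better < k:
            hits += 1
    return hits / trials


# Small cluster (like the 3-5 docs at 5K) vs. large cluster (like 40-60 at 500K).
for size in (4, 50):
    print(f"cluster size {size}: recall@5 = {recall_at_k(size):.2f}")
```

With a cluster no larger than k the answer doc cannot be pushed out, so recall is perfect; with dozens of near neighbours it depends on how strongly the answer doc outscores them. The absolute numbers printed here are artefacts of the made-up score distribution and should not be compared with the benchmark figures quoted in the tweet.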

English

440

62K

Neil Stoker ✨ retweeted

okazakitomohiro@oo_kk_aa·6 May

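At the invitation of Yuichi Ito (伊藤有壱), creator of ニャッキ, I am taking part in a stop-motion exhibition as one of the artists. I started out from outside the stop-motion field and have approached stop-motion from a design perspective, so I am delighted to be alongside people from the very heart of the stop-motion world for the first time. I am exhibiting, among other things, material from my matchstick shoots, which are now in their sixth year.

Japanese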

527

27.4K

124.3K

5.1M

Neil Stoker ✨ retweeted

Alice Mills@millsalice144·3 May

I'm a big advocate for the Oxford comma. I'm, also an advocate for, the, Shatner comma. You should, try it sometime. It really, makes your, sentences more, exciting!

English

239

659

5.9K

163.2K

Neil Stoker ✨@nmstoker·4 May

@paulg Oh my! Anything for other rooms in the house?! 🫨

English

268

Paul Graham@paulg·4 May

Apparently in Germany you can be fined for "Zweckentfremdung" if you use your garage for anything other than parking. Even for storage! themunicheye.com/bavaria-garage…

English

1.3K

63.7K

Paul Graham@paulg·4 May

It could actually be a significant problem that Europe doesn't have enough garages. This sounds like a joke, but I'm serious. Garages let you work on stuff that doesn't matter yet, which is how big things often start. The outliers of ideas need the outliers of space.

Jon Erlichman@JonErlichman

First offices of 6 companies worth a combined $21 trillion.

English

757

1.5K

15.1K

1.5M

Neil Stoker ✨@nmstoker·2 May

@Birdyword And now it fits in with the data centre they already built on the hill behind it! 🙂

English

Mike Bird@Birdyword·30 Apr

Many people do not seem to want data centres built near them, despite the fact that they don't cause that much traffic and often generate a lot of local tax revenue. I suspect it's partly because they're ugly! My proposal:

English

1.6K

1.5K

17.5K

3.6M

Neil Stoker ✨@nmstoker·1 May

@sainsburys Unless there is a written apology and retraction from that member of staff, I will not be back

English

Neil Stoker ✨@nmstoker·1 May

@sainsburys I've been shopping in that store for well over ten years. I have returned literally one other thing that whole time and this clearly was not my fault

English

Neil Stoker ✨@nmstoker·29 Apr

@sainsburys your meal deal isn't being correctly discounted by tills in Southfields - it's not convenient to go back now, can I get a refund with the receipt later in the week?

English

116

Neil Stoker ✨@nmstoker·29 Apr

@sainsburys Great - thanks for clarifying that Ben! Have a good evening 🙂

English

Neil Stoker ✨ retweeted

Bilawal Sidhu@bilawalsidhu·29 Apr

360 drones are epic for capturing immaculate 3d gaussian splats. Wait till the virtual camera flies through the tree canopy down to ground level - bloody magnificent!

English

317

17.2K

Neil Stoker ✨@nmstoker·29 Apr

@SW_Help can you STOP sending out broken surveys from swr@bva-bdrc.com! After a good journey, why tarnish it with a 💩 survey?! Asked if you're happy to answer more questions, it ignores the answer: you have to either give up the sunk effort or plough on!

English
