에그

4.8K posts

@eggie5

Joined June 2010
606 Following · 428 Followers

Pinned Tweet
에그@eggie5·
Announcing 🕺DABstep! an Agentic Benchmark collab w/ @Adyen x @huggingface. There are many blind spots in LLM evals, especially wrt agents, namely:
* saturation
* real-world applicability
* objective evaluation
* complexity
We make concrete contributions in these directions...
에그@eggie5·
I did not wake up a loser this morning
에그 retweeted
Andreu ⛩️@dru_blackberry·
On "Why would I pay for SaaS if I can vibe-code it?". Here's my 🌶️ take:
1) Headcount
You don't need as many engineers as you had. You can do with less. But the reality is that you could already do with less even before AI. That's management honesty. 1/4
에그@eggie5·
@yaroslavvb thanks for sharing. Your comments around 39m made me think of Napoleon, who wouldn't open letters until many weeks had passed or a follow-up was sent! Apply to emails, Slack...
Yaroslav Bulatov@yaroslavvb·
The project could be understaffed because it's not that useful to the company bottom line. Spending a lot of effort on something that won't get appreciated puts you in danger of burn-out. Related, a talk on avoiding burn out -- youtube.com/watch?v=bh906h…
에그@eggie5·
Fraud model idea: separate base-rate priors from transaction evidence, then add them in logit space. Test corr(prior, evidence) = 0.038, suggesting the evidence tower learned a distinct corrective signal rather than just amplifying priors.
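A minimal numpy sketch of the idea in this tweet, under illustrative assumptions (the base rates, the evidence logits, and all names here are synthetic, not the actual model): a per-segment fraud base rate (prior) and a per-transaction evidence score are combined additively in logit space, and the correlation between the two components is checked on held-out data.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical test set: segment base rates and evidence-tower logits.
prior_rate = rng.uniform(0.001, 0.05, size=1000)   # per-segment fraud base rate
evidence_logit = rng.normal(0.0, 1.0, size=1000)   # learned transaction evidence

# Additive combination in logit space.
combined_prob = sigmoid(logit(prior_rate) + evidence_logit)

# If the evidence tower only re-learned the prior, the two components would
# correlate strongly; near-zero correlation suggests a distinct corrective signal.
r = np.corrcoef(logit(prior_rate), evidence_logit)[0, 1]
print(round(r, 3))
```

Because the two terms add in logit space, the evidence tower acts as a multiplicative odds adjustment on top of the base rate.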
에그@eggie5·
Hot take: Dog Man is just the Star Wars story but with no universe and poop jokes
에그@eggie5·
We're doing a variant of this at Adyen for shopper linking: dense retrieval for identity
에그@eggie5·
1) do BM25 hard-neg mining (DPR paper)
2) verify pos pairs w/ LLM (if possible)
3) full softmax for each pos pair (devil in the details)
At runtime you get amortized LLM inference (could we get most of the way w/ steps 1-2, sampled softmax, and just larger batches??)
dr. jack morris@jxmnop

x.com/i/article/2031…
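Step 3 of the recipe above can be sketched in numpy as an in-batch softmax (InfoNCE-style) contrastive loss, where each query scores its own positive against the other in-batch positives plus its BM25-mined hard negatives. All shapes and names are illustrative assumptions, not any particular codebase:

```python
import numpy as np

def info_nce_loss(q, p, hard_negs, temperature=0.05):
    """Softmax contrastive loss over a batch of positive pairs.

    q: (B, D) query embeddings
    p: (B, D) positive doc embeddings
    hard_negs: (B, K, D) mined hard negatives per query
    """
    B, _ = q.shape
    # Normalize so dot products are cosine similarities.
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    pn = p / np.linalg.norm(p, axis=1, keepdims=True)
    hn = hard_negs / np.linalg.norm(hard_negs, axis=2, keepdims=True)

    in_batch = qn @ pn.T                        # (B, B): diagonal = positives
    hard = np.einsum("bd,bkd->bk", qn, hn)      # (B, K): mined hard negatives
    logits = np.concatenate([in_batch, hard], axis=1) / temperature

    # Log-softmax; the target index for row i is its own positive, column i.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(B), np.arange(B)].mean()

rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=(8, 16)),
                     rng.normal(size=(8, 16)),
                     rng.normal(size=(8, 4, 16)))
print(float(loss))
```

Sampled softmax with larger batches amounts to growing the `in_batch` term while shrinking (or dropping) the mined `hard` term, which is the trade-off the tweet asks about.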

에그@eggie5·
You don't see this detail much in practice, as my first tweet alludes to, but I feel there was a comeback w/ RAG, namely around the hierarchical chunking techniques... arxiv.org/abs/1905.06566
에그@eggie5·
I'm thinking about this a lot lately as I'm working on tabular pretraining, where there's a strong natural hierarchy: fields > payments > entities (e.g. shopper/merchant). HIBERT addresses this same problem in LLMs as they attend over _sentences_ in a doc: tokens > sentences > docs...
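A toy numpy sketch of that HIBERT-style hierarchy (tokens > sentences > docs): self-attention pools tokens into sentence vectors, then a second attention level pools sentence vectors into a doc representation. Everything here is an illustrative assumption, not HIBERT's actual architecture (which uses learned transformer layers at both levels):

```python
import numpy as np

def self_attention(x):
    """Single-head, unparameterized self-attention over x: (N, D)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def encode_doc(doc):
    """doc: (S, T, D) = S sentences of T token embeddings each."""
    # Level 1: attend over tokens within each sentence, pool to one vector.
    sent_vecs = np.stack([self_attention(sent).mean(axis=0) for sent in doc])
    # Level 2: attend over sentence vectors to get a doc representation.
    return self_attention(sent_vecs).mean(axis=0)  # (D,)

rng = np.random.default_rng(0)
doc = rng.normal(size=(5, 12, 32))  # 5 sentences, 12 tokens each, dim 32
doc_vec = encode_doc(doc)
print(doc_vec.shape)
```

For the tabular case the levels would swap in fields, payments, and entities for tokens, sentences, and docs.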
에그@eggie5·
Interesting that LLM pretraining corpora originally came from documents like web pages, articles, or books, but the only structure learning sees is sentence boundaries. This gravely assumes this natural context is irrelevant _or_ that in practice it's recovered at scale (emergent)...
에그@eggie5·
Didn't get the ICLR accept, but pretty proud of the reviews, given the rush job -- I wrote most of it during the sessions at NeurIPS in December :)
에그@eggie5·
the proverbial last (skewed) task!
에그@eggie5·
I love the window into Headstone's mania and how Wrayburn goads him into insanity
에그@eggie5·
The Eugene Wrayburn character is basically Skimpole from Bleak House, but with a redemption arc...
에그@eggie5·
Beyond the romance and satire, Our Mutual Friend is a story about the inheritance of a 19th-century recycling startup. Dickens literally started the circular economy...
에그@eggie5·
In theory nice, but in practice, the well-proven mode-collapse phenomenon makes this (increasingly) irrelevant, right?
Jawwwn@jawwwn_

Palantir CTO @ssankar on K-LLMs: Never use 1 LLM when you can use K-LLMs

에그@eggie5·
@deadalnix When you use that voice mode, isn't it super neutered as far as capabilities go? It is in my experience...
deadalnix@deadalnix·
PhD grade intelligence.