Edward Raff

4.2K posts

@EdwardRaffML

Sr. Director @CrowdStrike. Chair @CamlisOrg. Author of #InsideDeepLearning @ManningBooks & of JSAT Machine Learning library. PhD from & Visiting Prof @UMBC

Joined April 2014

670 Following · 1.9K Followers

Pinned Tweet

Edward Raff@EdwardRaffML·21 May

I'm now officially a published book author @ManningBooks! Inside Deep Learning mng.bz/8M2g ! Filling the need for a combination of practical "get something running" and understanding why things work and how the math relates to the code. @KirkDBorne for the foreword!

Edward Raff@EdwardRaffML·3h

@meathead @RBHS208 Their cakes look better than mine do. I don’t need a bunch of high schoolers making better BBQ too!

Meathead "BBQ Hall of Famer, Hedonism Evangelist”@meathead·5h

@RBHS208 I am a Barbecue Hall of Famer and I live just a few blocks from RBHS. One of my books was a New York Times best seller. If you’d like, I’d be happy to come down and teach Barbecue science and art.

Riverside Brookfield High School@RBHS208·8h

Baking & Pastry students have been learning about the fundamentals of cake baking and decorating. Today, they created their own cakes from scratch and had teachers judge each one for best presentation and best flavor. It was a fun and delicious way to showcase their creativity and skills in the kitchen! 🍰🐾 #RB208Pride

Edward Raff@EdwardRaffML·6h

@thegautamkamath @usmananwar391 ICDM goes hard with “triple blind”, no arxiv and no one knows who anyone is until published basically.

Gautam Kamath@thegautamkamath·6h

@usmananwar391 The usual defense: it may lead to less bias in evaluation based on knowledge of the authors. Note that they use lightweight double blind, meaning arxiv posting is allowed (similar to NeurICMLR)

Gautam Kamath@thegautamkamath·6h

CS theory conferences used to be single blind, though recently (in the last 5-10 years) they moved to lightweight double blind. 1 benefit of single blind: authors can't submit trash without taking a reputational hit. It's increasingly clear that paper submission can't be "free."

Edward Raff@EdwardRaffML·6h

Might actually be 2.7 complaints at a time ssp.impulsetrain.com/celebrities.ht…

Josh Trebach, MD@jtrebach

we joke in the ER that “chief complaints come in threes” so after two people came in after slipping/falling on ice i was like “okay who’s next” but a third never came in anyways i got off shift, walked to my car, and i

Edward Raff@EdwardRaffML·9h

@danpacary @moskstraum21745 The raw SSD on my M2 can hit at most 7 GB/s on sequential reads, which is near the theoretical maximum of the interface. Are you sure you’re not counting Gb/s, or did Apple change the SSD interface?

Daniel Isaac@danpacary·10h

But I can ..jk

The 69 GB/s is 8-thread pread from warm page cache. Should have been clearer. Measured properly:

Raw SSD (F_NOCACHE): 19.6 GB/s
Page cache (1 thread): 20.0 GB/s
Page cache (8 threads): 69 GB/s

Cold vs warm nearly identical single-threaded. The drive itself is ~20 GB/s. Parallelism is where cache wins. 69 vs 6 was apples to oranges. That's on me.

Daniel Isaac@danpacary·10h

I hit 69 GB/s streaming MoE expert weights off an SSD on a MacBook. For context:

Apple's "LLM in a Flash" paper: 6 GB/s
llama.cpp mmap: 3.5 GB/s
flash-moe (M3 Max): 17.5 GB/s
rustane pread (M4 Max): 69.3 GB/s

Edward Raff@EdwardRaffML·10h

@VivekVRao1 GARCH is what?

Vivek V Rao@VivekVRao1·13h

My experience with Claude Code has been different. I asked it to write programs to fit various GARCH models (symmetric, GJR-GARCH, NAGARCH, EGARCH etc.) with various noise distributions (normal, Student t, GED, normal inverse-gaussian) and am impressed by its knowledge of optimization methods (it implemented BFGS), GARCH models, and probability distributions. It can do math such as computing the analytical gradient of a log-likelihood function, which speeds optimization. It implements Bessel functions as needed and in some code it cited the famous Abramowitz and Stegun math handbook. When I checked the references, they were correct.

Santiago@svpino

Claude is the perfect complement for me: Whenever I have a question I can't answer, I ask Claude, and it gives me the perfect answer every time. But as soon as I ask Claude something I do know, the answer is usually horseshit.

Edward Raff@EdwardRaffML·12h

Often, the real issue is “are they on payroll”. I tried to find ways to keep my R&D interns officially in the system to help with this. If they aren’t on payroll, this has implications in finance, accounting, budgets allowed to be used, and just cascades in corporate complexity

Gabriele Berton@gabriberton

PhD student doing internship in a company. The internship leads to a paper, accepted at conference. In YOUR experience, does the company pay for the registration + trip? Not asking if the company SHOULD pay, I'm trying to understand what the trend is

Edward Raff retweeted

ICML Conference@icmlconf·2d

To ensure compliance w peer-review policies, ICML has removed 795 reviews (1% of total) by reviewers who used LLMs when they explicitly agreed to not. Consequently, 497 papers (2% of all submissions) of these (reciprocal) reviewers have been desk rejected Details in blog post 👇

Edward Raff@EdwardRaffML·3d

@kchonyc @RolandMemisevic Someone is re-inventing it or just appreciating it?

Kyunghyun Cho@kchonyc·3d

folks figuring out online k-means clustering. i miss this lecture by @RolandMemisevic from 2013.

Edward Raff retweeted

CrowdStrike@CrowdStrike·4d

📣 JUST ANNOUNCED: CrowdStrike is expanding its collaboration with @NVIDIA to advance Agentic MDR. Early testing with NVIDIA Nemotron models shows: 🚀 Up to 5x faster investigations 🎯 >3x higher triage accuracy Charlotte AI AgentWorks now supports Nemotron 3 Super for custom security agent development. Read more: crwdstr.ke/6011B6uak5

Edward Raff@EdwardRaffML·4d

@2oovy @Silible59 You also don’t have a complete grasp of dyslexia, as it is not considered a perceptual condition. Dyslexia is a processing issue, and as such also has artifacts in how dyslexics process auditory signals.

bourbaki@2oovy·4d

@Silible59 No dyslexia makes more sense because it’s more perceptual. Lack of numeracy can be accounted for by mapping contexts to other frameworks

bourbaki@2oovy·4d

I fundamentally don’t understand dyscalculia. It doesn’t make sense to me. I have no intuition for algebraic geometry despite years of practice but that doesn’t mean I have a disability. Having poor grasp on arithmetic should also not count as a disability

Edward Raff@EdwardRaffML·5d

@finn_hulse @yetanothadj Mostly selecting for “learned HLL or related topics”. Even if they are exceptional, it’s not an easy thing to work out in an interview context from first principles. IMO start with HLL and explore to vhll where there are more paths that don’t require a special insight

Finn Hulse@finn_hulse·5d

@yetanothadj it doesn’t require as much guidance as you think. i ask a few leading questions at most, plus some encouragement this would not be something i use for rank and file candidates anyway, only to see if someone is truly truly exceptional

Finn Hulse@finn_hulse·6d

i had to retire my favorite technical interview problem so it is time to ask it to my loyal followers (solution in replies)

given a stream of N not necessarily distinct integers from an O(N) sized universe, for some massive N, find a way to estimate how many distinct integers appear, only using O(log(log(N))) persistent storage

use 5 lines of pseudocode

there was a time in my life where i wouldn't work with someone who couldn't answer this
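
One way to approach the problem above can be sketched in Python. This is an assumption, not the poster's solution (the replies with the answer are not part of this page): a single-register estimator in the Flajolet-Martin style, the idea that HLL builds on. It keeps only the largest hash "rank" seen so far, a number of size about log N, which therefore fits in about log log N bits. The function names and the choice of SHA-256 as the hash are illustrative.

```python
import hashlib

def rank(value: int, bits: int = 64) -> int:
    """1-based position of the lowest set bit in a 64-bit hash of value."""
    digest = hashlib.sha256(str(value).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    # h & -h isolates the lowest set bit; an all-zero hash gets bits + 1.
    return (h & -h).bit_length() if h else bits + 1

def estimate_distinct(stream) -> int:
    """Crude distinct-count estimate using one small integer of state."""
    max_rank = 0  # the only persistent state: a value of at most bits + 1
    for value in stream:
        max_rank = max(max_rank, rank(value))
    # A rank of r shows up with probability 2**-r, so seeing rank r
    # suggests on the order of 2**r distinct values.
    return 0 if max_rank == 0 else 2 ** max_rank

# Duplicates hash to the same rank, so repeats never change the estimate.
print(estimate_distinct([1, 2, 3, 4, 5] * 100))
```

A single register like this has very high variance. HyperLogLog reduces it by keeping many such registers and combining them.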

Edward Raff@EdwardRaffML·5d

@pontus_rendahl I have on occasion done this when I’m in a “numerical methods” mindset and just thinking about floating point issues. It’s not good in this context, but a possible benign explanation. I would generally default to adding 1e-6/1e-7 since floats have ~7.2 sig figs of precision
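
The "~7.2 sig figs" figure can be checked with a short Python sketch, assuming "floats" here means IEEE 754 single precision (float32). The helper `to_float32` is a hypothetical name; it rounds through `struct` so that no third-party library is needed.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# float32 has a 24-bit significand (23 stored bits plus an implicit leading 1).
SIGNIFICAND_BITS = 24
sig_digits = SIGNIFICAND_BITS * math.log10(2)  # decimal digits of precision
eps32 = 2.0 ** -23  # spacing between float32 values just above 1.0

print(f"{sig_digits:.2f} significant digits, eps = {eps32:.3g}")
print(to_float32(1.0 + 1e-6) > 1.0)  # an offset of 1e-6 survives rounding
print(to_float32(1.0 + 1e-8) > 1.0)  # an offset of 1e-8 is rounded away
```

An offset of 1e-6 or 1e-7 is therefore about the smallest value that still registers next to numbers of magnitude 1, which is a numerical reason to pick it, not a statistical one.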

Pontus Rendahl@pontus_rendahl·6d

When people do ln(x+0.001) or whatever, what type of DGP do they have in mind? What does the true DGP look like to justify that as a good approximation?

Edward Raff@EdwardRaffML·6d

@akoustov @bradleveck Absolutely it makes these mistakes. When I check my output it regularly makes mistakes like this and others that require care to catch because they can easily fly right by. Twice I’ve had to remind Opus FFTs exist when I tried to do dumb signal processing

Alexander Kustov@akoustov·6d

@bradleveck Yeah, it's still bad at counting words, for instance. But the question is whether these coding/math mistakes are more or less common than in published AER papers now or, let's say one year from now :)

Alexander Kustov@akoustov·6d

Sorry, but I do have to wonder now: would Opus 4.6 or GPT 5.4 ever make this mistake?

Michael Wiebe@michael_wiebe

Issue 4: Table 6 studies the effect of cluster size on patent quality, measured using citations. M21 claims to use log citations, but the code actually does log(y+0.00001). When I use log(y+1) or Poisson, the effect switches from positive to negative. 8/
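
Why the choice of offset can matter so much is a matter of arithmetic, sketched below in Python. This only illustrates the transform itself and does not reproduce the analysis discussed in the thread; `log_offset_gap` is a hypothetical helper name.

```python
import math

def log_offset_gap(c: float) -> float:
    """Distance between y = 0 and y = 1 after the transform log(y + c)."""
    return math.log(1 + c) - math.log(0 + c)

for c in (1e-5, 1.0):
    print(f"c = {c:g}: gap between 0 and 1 citations = {log_offset_gap(c):.3f}")
```

With a tiny offset, observations at zero sit far from everything else on the transformed scale, so they can dominate a regression. With an offset of 1 the same step is only log(2).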

Edward Raff@EdwardRaffML·6d

@krismicinski @moyix @0xTib3rius @vxunderground 🫡

Tib3rius@0xTib3rius·13 Mar

I tried to get Claude Code to write some custom malware for me, and it kept refusing even when I told it I was testing a new AV I was writing. It did suggest I contact @vxunderground though. Smelly please write my malware. 🥹

Tib3rius@0xTib3rius

Claude Code seemingly has little to no guardrails right now compared to Codex. From getting it to run offensive security engagements on arbitrary endpoints, to asking it to code purposefully vulnerable web apps for training, it will often just go do it without a fuss. 🤯

Edward Raff@EdwardRaffML·6d

@jmwooldridge @jonahrexer @DonMacKenzie9 Mean = variance* typo.

Edward Raff@EdwardRaffML·6d

@jmwooldridge @jonahrexer @DonMacKenzie9 Poisson assumes that mean > variance, and in fact Poisson is a special case of NB. I would argue the reverse, use NB by default and if Poisson makes sense your inference process should be picking near-zero alpha.
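
The "special case" relationship can be shown in a few lines of Python, assuming "alpha" refers to the dispersion parameter of the common NB2 parameterization, where the variance is mu + alpha * mu^2. The function name is illustrative.

```python
def nb2_variance(mu: float, alpha: float) -> float:
    """Variance of a negative binomial in the NB2 parameterization."""
    return mu + alpha * mu ** 2

mu = 4.0
for alpha in (1.0, 0.1, 0.0):
    print(f"alpha = {alpha}: mean = {mu}, variance = {nb2_variance(mu, alpha)}")
```

As alpha goes to zero the variance collapses to the mean, which is the Poisson case. A fitted alpha near zero is therefore the model's way of saying Poisson would have been adequate.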

Jonah Rexer@jonahrexer·14 Mar

My most frequent comment as a referee: USE POISSON REGRESSION

Michael Wiebe@michael_wiebe

Edward Raff@EdwardRaffML·13 Mar

@krismicinski @banteg Oh yea his stuff is beautiful. Just not sure I've ever actually understood something from one of his figures. Which confuses me because I think he's ~100% on point with his complaints about visualizations, and then, does an art?

banteg@banteg·13 Mar

anthropic is trying to disrupt work and then repeat the same error as microsoft excel, imposing poor taste onto millions of people as a default. someone ship a bunch of edward tufte books to their headquarters till they understand pie charts should not be a thing.

Michael Livs@micLivs

Anthropic shipped generative UI for Claude. I reverse-engineered how it works and rebuilt it for PI. Extracted the full design system from a conversation export. Live streaming HTML into native macOS windows via morphdom DOM diffing. Article: michaellivs.com/blog/reverse-e… Repo: github.com/Michaelliv/pi-… Built on @badlogicgames's pi and @DanielGri's Glimpse.

Edward Raff@EdwardRaffML·13 Mar

Since the M1 I’ve been saying that @Apple is the biggest threat to @awscloud and @nvidia , and maybe people are going to start seeing it.

PC Gamer@pcgamer

New benchmarks show the iPhone chip in the cut-price Apple MacBook Neo beating every single x86 PC processor for single-core performance pcgamer.com/hardware/gamin…

Edward Raff@EdwardRaffML·13 Mar

In many nations, becoming a physician takes a 4-year degree straight out of high school, and they have no worse outcomes than we do. No reason to delay and accumulate debt beforehand.

John Arnold@johnarnold

AI should allow med schools to rethink whether 4 years is still necessary for med school. If students can focus more on clinical practice and less on memorizing the Krebs cycle and molecular bio, many programs could eliminate a year, reducing both costs and physician shortages.
