Ryan Marcus

669 posts

@RyanMarcus

Assistant prof @CIS_Penn. Machine learning for systems, databases.

Philadelphia, PA · Joined March 2009

1.1K Following · 1.7K Followers

Ryan Marcus@RyanMarcus·6d

@nomad421 Here's an example of the "almost": a linked-list (LL) chained hash map is one of the most efficient concurrent multi-map implementations! You can insert a new node into a bucket's chain with a single atomic swap. This is useful for hash joins in database systems. db.in.tum.de/~leis/papers/m…
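
A minimal C++ sketch of the insert path described above, assuming a build-then-probe workflow like the build phase of a hash join: many threads insert concurrently, and probing starts only after every inserter has finished (here, after `join()`). The names `Node`, `ChainedMultiMap`, and `demo_count_after_parallel_build` are hypothetical; this illustrates the technique and is not code from the linked paper.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// One (key, value) entry in a bucket's singly linked chain (hypothetical layout).
struct Node {
    uint64_t key;
    uint64_t value;
    Node* next;
};

// Concurrent multi-map: duplicate keys are allowed, each bucket is a linked list.
class ChainedMultiMap {
public:
    explicit ChainedMultiMap(std::size_t num_buckets) : buckets_(num_buckets) {
        for (auto& head : buckets_) head.store(nullptr, std::memory_order_relaxed);
    }

    // Build phase: one atomic swap publishes the node as the new chain head and
    // hands back the old head, which becomes the node's successor.
    // Safe with many concurrent inserters; not safe with concurrent probes.
    void insert(Node* node) {
        assert(node != nullptr);
        std::atomic<Node*>& head = buckets_[bucket_of(node->key)];
        node->next = head.exchange(node, std::memory_order_acq_rel);
    }

    // Probe phase: count entries with this key (call after all inserts finished).
    std::size_t count(uint64_t key) const {
        std::size_t n = 0;
        for (Node* cur = buckets_[bucket_of(key)].load(std::memory_order_acquire);
             cur != nullptr; cur = cur->next) {
            if (cur->key == key) ++n;
        }
        return n;
    }

private:
    std::size_t bucket_of(uint64_t key) const {
        return std::hash<uint64_t>{}(key) % buckets_.size();
    }

    std::vector<std::atomic<Node*>> buckets_;
};

// Usage sketch: `threads` workers insert duplicate keys concurrently; join()
// acts as the barrier between the build and probe phases.
std::size_t demo_count_after_parallel_build(int threads, int per_thread, uint64_t probe_key) {
    std::vector<Node> nodes(static_cast<std::size_t>(threads) * per_thread);
    ChainedMultiMap map(64);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&nodes, &map, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                Node& n = nodes[static_cast<std::size_t>(t) * per_thread + i];
                n.key = static_cast<uint64_t>(i % 10);  // only 10 distinct keys
                n.value = static_cast<uint64_t>(t);
                map.insert(&n);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return map.count(probe_key);
}
```

Because the new node's `next` pointer is written after the swap, a probe running concurrently with inserts could briefly see a chain cut off at that node; the sketch relies on the barrier instead. If inserts and probes must overlap, a compare-and-swap loop that sets `next` before publishing the node is the usual alternative.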

𝕐@nomad421·13 Mar

Why would one be using a separate chaining hash table in the first place? I mean, I know that without certain techniques, open addressing can also exhibit O(n) worst-case performance, but practically it's almost always preferable to separate chaining.

Dave W Plummer@davepl1968

When multiple keys point to the same hash slot, what you've got there is an O(n) linked list in that slot. Not to be snarky, but please tell me this stuff is still in Comp Sci 200. Or are CS graduates just loading numpy from Python these days?

Ryan Marcus@RyanMarcus·1 Jan

@BenSManning @metrics52 Having fun is a (big!) competitive advantage. Those who succeed are likely to have several competitive advantages. So there's an "over-representation" of fun-having at the top. Of course, not everyone at the top has fun, and not everyone who has fun makes it to the top...

Benjamin Manning@BenSManning·31 Dec

One thing I’ve noticed: truly top-tier academics almost never talk about academia the way this article does. They’re super intense, sure—but they also truly love the work. I once went to a public writing seminar from @metrics52, and he said something like “Academia is a competitive endeavor, but it’s the most wonderful endeavor I’ve ever pursued.” I think pieces like this might reflect some authorial selection into rationalizing personal dissatisfaction with outcomes/career decisions...

Science News@SciencNews

Academia isn't a calling—it's a job. Stop glorifying burnout. Clock in, do great work, clock out. Your worth isn't measured in unpaid overtime

Ryan Marcus@RyanMarcus·26 Dec

We conclude with a discussion about how database researchers should use industrial traces, and how we might begin to build systems that optimize for "the query the user never sends." 📄Paper: rm.cab/survivorshipbi…

Ryan Marcus@RyanMarcus·26 Dec

For researchers, database traces are a MAJOR upgrade compared to synthetic benchmarks (or simply making something up, which is shockingly common). We argue we need more of these workload traces to build a complete picture, and, perhaps more importantly, see what is missing.

Ryan Marcus@RyanMarcus·26 Dec

Most database teams optimize what they see in workload logs. But those very optimizations change what users choose to run! In our CIDR paper, we argue that industrial workloads exhibit 𝐬𝐮𝐫𝐯𝐢𝐯𝐨𝐫𝐬𝐡𝐢𝐩 𝐛𝐢𝐚𝐬: logs reflect a negotiation between users and the platform.

Ryan Marcus@RyanMarcus·3 Jun

For that one query that must go 𝑟𝑒𝑎𝑙𝑙𝑦 𝑓𝑎𝑠𝑡, BayesQO (by Jeff Tao) finds superoptimized plans using Bayesian optimization in a learned plan space. It’s costly, but the results can train an LLM to speed things up next time. 📄rm.cab/bayesqo

Ryan Marcus@RyanMarcus·3 Jun

LimeQO (by @yi_zixuan), a 𝑤𝑜𝑟𝑘𝑙𝑜𝑎𝑑-𝑙𝑒𝑣𝑒𝑙 approach to query optimization, can use neural networks or simple linear methods to find good query hints significantly faster than a random or brute force search. 📄rm.cab/limeqo
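
For context on what "faster than a random or brute force search" is measured against, here is a hedged C++ sketch of the brute-force baseline only (not LimeQO): run every query under every candidate hint set and keep the fastest. Its exploration cost is the sum of all entries in the query-by-hint latency matrix, which is the cost a workload-level method tries to avoid paying. The `latency` matrix, `SearchResult`, and `brute_force_hints` are hypothetical; in a real system each entry would come from actually executing the query.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Result of an exhaustive (query x hint set) search.
struct SearchResult {
    std::vector<std::size_t> best_hint;  // index of the fastest hint set per query
    double exploration_time = 0.0;       // total time spent executing candidate plans
};

// latency[q][h]: hypothetical time to run query q under hint set h.
// Brute force "executes" every cell, so it pays the sum of the whole matrix.
SearchResult brute_force_hints(const std::vector<std::vector<double>>& latency) {
    SearchResult result;
    for (const std::vector<double>& row : latency) {
        assert(!row.empty());
        std::size_t best = 0;
        double best_time = std::numeric_limits<double>::infinity();
        for (std::size_t h = 0; h < row.size(); ++h) {
            result.exploration_time += row[h];
            if (row[h] < best_time) {
                best_time = row[h];
                best = h;
            }
        }
        result.best_hint.push_back(best);
    }
    return result;
}
```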

Ryan Marcus@RyanMarcus·3 Jun

OLAP workloads are dominated by repetitive queries -- how can we optimize them? A promising direction is to do 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 query optimization, allowing for a much more thorough plan search. Two new SIGMOD papers! 🧵

Ryan Marcus@RyanMarcus·2 Jun

@DPearsonPHL @coryfromphilly Yeah, college-aged folks in college-adjacent stations wearing college-branded clothing seems like good evidence to make this inference. I'll report back if/when I get a response from the higher-ups.

Philadelphia, PA 🇺🇸

Daniel Pearson@DPearsonPHL·2 Jun

@RyanMarcus @coryfromphilly I appreciate your willingness to raise the issue. It hasn't always been like this.

Daniel Pearson@DPearsonPHL·2 Jun

Please go ticket the Jefferson employees, Penn students, and other wealthy scofflaws. Not warnings, tickets.

ISEPTAPHILLY@SEPTAPHILLY

When people don’t pay—everyone loses. We're stepping up enforcement to ensure every rider does their part. #ISEPTAPHILLY #HowWeRoll

Ryan Marcus@RyanMarcus·2 Jun

@DPearsonPHL @coryfromphilly Is there really a disproportionate trend of Penn students evading the fare? Not saying there isn't, I'm uneducated here. If so, I'll raise the issue with the university. I imagine I'll at least get a response. Fare evasion is clearly against the student code of conduct.

Philadelphia, PA 🇺🇸

Daniel Pearson@DPearsonPHL·2 Jun

@coryfromphilly I will scold them when I see it. It is pathetic behavior. The universities should be ashamed.

Ryan Marcus@RyanMarcus·30 Mar

@alpha_convert Use RDTSCP, with an extra mfence if you want to ensure writes are flushed. This also solves the problem of different NUMA regions having different clocks. I'm not sure anyone uses RDTSC for timing on modern CPUs, but admittedly I haven't looked into it in a while.
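
A minimal sketch of that timing recipe, assuming x86-64 with GCC or Clang, where `<x86intrin.h>` provides `__rdtscp`, `_mm_mfence`, and `_mm_lfence`; other targets fall back to `std::chrono::steady_clock` so the code still compiles. RDTSCP also returns the IA32_TSC_AUX register, which Linux typically programs with the CPU and NUMA node number, so comparing the two aux values tells you whether both readings came from the same core. The trailing LFENCE is an addition of this sketch, and `Stamp`, `read_stamp`, `Timing`, and `time_sum` are hypothetical helpers. Results are raw TSC ticks, not nanoseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__x86_64__)
#include <x86intrin.h>  // __rdtscp, _mm_mfence, _mm_lfence (GCC/Clang)
#endif

// A timestamp plus the IA32_TSC_AUX value RDTSCP returned with it.
struct Stamp {
    uint64_t ticks;
    uint32_t aux;
};

inline Stamp read_stamp() {
#if defined(__x86_64__)
    _mm_mfence();                     // earlier loads/stores complete (stores globally visible)
    unsigned int aux = 0;
    uint64_t ticks = __rdtscp(&aux);  // waits for earlier instructions; also returns TSC_AUX
    _mm_lfence();                     // later instructions do not start before the read
    return {ticks, aux};
#else
    // Portable fallback: nanoseconds from a monotonic clock, no CPU id.
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch());
    return {static_cast<uint64_t>(ns.count()), 0};
#endif
}

struct Timing {
    uint64_t elapsed;  // TSC ticks on x86-64, nanoseconds in the fallback
    bool same_cpu;     // true if both reads reported the same TSC_AUX value
};

// Time a simple summation loop; the result goes to `sink` so it is not optimized away.
inline Timing time_sum(uint64_t n, volatile uint64_t* sink) {
    Stamp start = read_stamp();
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; ++i) acc += i;
    *sink = acc;
    Stamp end = read_stamp();
    return {end.ticks - start.ticks, start.aux == end.aux};
}
```

If `same_cpu` is false, the thread migrated between the two reads, and the difference may mix two cores' counters; discarding or repeating such samples is one simple policy.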

Ryan Marcus@RyanMarcus·21 Mar

@justinjaffray I think the main reason it's called "JIT" is because it uses the LLVM/GCC APIs that are used for implementing JITs. Obviously if I use a screwdriver to hammer in a nail, that doesn't make the nail a screw, but calling it a "screwed in nail" isn't too far from the truth :D

Ryan Marcus@RyanMarcus·15 Feb

Pair(akeet) programming.

Ryan Marcus@RyanMarcus·23 Dec

@fluxtheorist @fizziksBoris @atheorist Oral exams, formal or informal, are a staple of any PhD program and, in my experience, work very well. But I don't know how to scale it up to a class of 300-400.

flux@fluxtheorist·23 Dec

@fizziksBoris @atheorist Oral exams and board work tend to separate the wheat from the chaff

sarah@atheorist·22 Dec

Professors of upper level STEM courses: What practices are you moving toward to ensure students learn the material themselves? Are you pivoting to in person exams to try to combat student reliance on AI assistance tools? What should the near future of education look like?

Ryan Marcus@RyanMarcus·26 Nov

@alpha_convert I can do way better -- the list type in Haskell is a "k-depth tree." (this is wrong at at least k-1 more levels).

Ryan Marcus@RyanMarcus·29 Oct

@samokhvalov We'll have some writeups on some fully fleshed out ideas soon!

Ryan Marcus@RyanMarcus·29 Oct

@samokhvalov We've been thinking about this question in our lab. Can think of this as an "offline query optimization" problem, where we want to fix N slow queries using minimal time. We have some preliminary work for when the reason for the slowness is a poor plan: rm.cab/limeqo

Nik Samokhvalov@samokhvalov·29 Oct

What would you do if you need to review and optimize 500 slow queries from auto_explain log?

Ryan Marcus@RyanMarcus·1 Aug

@tobycmurray As a question of political philosophy, I'd have to go with "there is no such threshold." As a question of math, I feel this question is ill-posed. The threshold chosen for Y will clearly impact the value of Y, so Y cannot be measured in this scheme (a classic RL problem).

Toby Murray@tobycmurray·1 Aug

Random thought: Suppose crime statistics implied that known offenders of crime X had a probability Y of reoffending over time period Z. What value of Y would constitute reasonable suspicion enabling police to re-arrest or search every X offender after every period Z?
