Ryan Marcus

669 posts

@RyanMarcus

Assistant prof @CIS_Penn. Machine learning for systems, databases.

Philadelphia, PA Joined March 2009
1.1K Following 1.7K Followers
Ryan Marcus@RyanMarcus·
@nomad421 Here's an example of the "almost": a linked-list (LL) chained hashmap is one of the most efficient concurrent multi-map implementations! You can insert a new node into a bucket's chain with a single atomic swap. This is useful for hash joins in database systems. db.in.tum.de/~leis/papers/m…
𝕐@nomad421·
Why would one be using a separate chaining hash table in the first place? I mean, I know that without certain techniques, open addressing can also exhibit O(n) worst case performance, but practically it's almost always preferable to separate chaining.
Dave W Plummer@davepl1968

When multiple keys point to the same hash slot, what you've got there is an O(n) linked list in that slot. Not to be snarky, but please tell me this stuff is still in Comp Sci 200. Or are CS graduates just loading numpy from Python these days?
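
The "single atomic swap" insertion from the reply at the top of this thread can be sketched roughly as follows. This is a minimal illustration, not the implementation from the linked paper: `ChainedMultiMap`, `Node`, `build_concurrently`, and the toy multiplicative hash are all hypothetical names and choices. It assumes the hash-join setting mentioned in the tweet, where every insert finishes (build phase) before any lookup starts (probe phase). The new node is published with the swap and its `next` link is filled in just after, so a reader running concurrently with inserts could briefly see a truncated chain; a compare-and-swap loop would be one way around that.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::thread;

// One chain entry. Duplicate keys are allowed, which makes this a multi-map.
struct Node {
    key: u64,
    value: u64,
    next: AtomicPtr<Node>,
}

pub struct ChainedMultiMap {
    buckets: Vec<AtomicPtr<Node>>,
}

impl ChainedMultiMap {
    pub fn new(n_buckets: usize) -> Self {
        assert!(n_buckets > 0);
        let buckets = (0..n_buckets)
            .map(|_| AtomicPtr::new(ptr::null_mut()))
            .collect();
        ChainedMultiMap { buckets }
    }

    fn bucket(&self, key: u64) -> &AtomicPtr<Node> {
        // Toy multiplicative hash (an assumption, not from the source).
        let h = key.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        &self.buckets[(h as usize) % self.buckets.len()]
    }

    /// Lock-free insert: one atomic swap makes the new node the chain head.
    pub fn insert(&self, key: u64, value: u64) {
        let node = Box::into_raw(Box::new(Node {
            key,
            value,
            next: AtomicPtr::new(ptr::null_mut()),
        }));
        let old_head = self.bucket(key).swap(node, Ordering::AcqRel);
        // Link the previous head behind the new node. Readers are assumed
        // to start only after all inserts are done.
        unsafe { (*node).next.store(old_head, Ordering::Release) };
    }

    /// Probe: walk the chain and collect every value stored under `key`.
    pub fn get_all(&self, key: u64) -> Vec<u64> {
        let mut out = Vec::new();
        let mut cur = self.bucket(key).load(Ordering::Acquire);
        while !cur.is_null() {
            let node = unsafe { &*cur };
            if node.key == key {
                out.push(node.value);
            }
            cur = node.next.load(Ordering::Acquire);
        }
        out
    }
}

impl Drop for ChainedMultiMap {
    fn drop(&mut self) {
        // Free every node; `&mut self` guarantees no other thread is active.
        for b in &self.buckets {
            let mut cur = b.load(Ordering::Relaxed);
            while !cur.is_null() {
                let node = unsafe { Box::from_raw(cur) };
                cur = node.next.load(Ordering::Relaxed);
            }
        }
    }
}

/// Build phase: four threads insert the same 100 keys without any locks.
pub fn build_concurrently() -> ChainedMultiMap {
    let map = ChainedMultiMap::new(64);
    thread::scope(|s| {
        for t in 0..4u64 {
            let shared = &map;
            s.spawn(move || {
                for k in 0..100u64 {
                    shared.insert(k, t);
                }
            });
        }
    });
    map
}
```

After `build_concurrently()` returns, all writer threads have been joined, so `get_all(k)` for any key in `0..100` should see four values, one per thread, in whatever order the swaps happened to land.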

Ryan Marcus@RyanMarcus·
@BenSManning @metrics52 Having fun is a (big!) competitive advantage. Those who succeed are likely to have several competitive advantages. So there's an "over-representation" of fun-having at the top. Of course, not everyone at the top has fun, and not everyone who has fun makes it to the top...
Benjamin Manning@BenSManning·
One thing I’ve noticed: truly top-tier academics almost never talk about academia the way this article does. They’re super intense, sure—but they also truly love the work. I once went to a public writing seminar from @metrics52, and he said something like “Academia is a competitive endeavor, but it’s the most wonderful endeavor I’ve ever pursued.” I think pieces like this might reflect some authorial selection into rationalizing personal dissatisfaction with outcomes/career decisions...
Science News@SciencNews

Academia isn't a calling—it's a job. Stop glorifying burnout. Clock in, do great work, clock out. Your worth isn't measured in unpaid overtime

Ryan Marcus@RyanMarcus·
We conclude with a discussion about how database researchers should use industrial traces, and how we might begin to build systems that optimize for "the query the user never sends." 📄Paper: rm.cab/survivorshipbi…
Ryan Marcus@RyanMarcus·
For researchers, database traces are a MAJOR upgrade compared to synthetic benchmarks (or simply making something up, which is shockingly common). We argue we need more of these workload traces to build a complete picture, and, perhaps more importantly, see what is missing.
Ryan Marcus@RyanMarcus·
Most database teams optimize what they see in workload logs. But those very optimizations change what users choose to run! In our CIDR paper, we argue that industrial workloads exhibit 𝐬𝐮𝐫𝐯𝐢𝐯𝐨𝐫𝐬𝐡𝐢𝐩 𝐛𝐢𝐚𝐬: logs reflect a negotiation between users and the platform.
Ryan Marcus@RyanMarcus·
For that one query that must go 𝑟𝑒𝑎𝑙𝑙𝑦 𝑓𝑎𝑠𝑡, BayesQO (by Jeff Tao) finds superoptimized plans using Bayesian optimization in a learned plan space. It’s costly, but the results can train an LLM to speed things up next time. 📄rm.cab/bayesqo
Ryan Marcus@RyanMarcus·
LimeQO (by @yi_zixuan), a 𝑤𝑜𝑟𝑘𝑙𝑜𝑎𝑑-𝑙𝑒𝑣𝑒𝑙 approach to query optimization, can use neural networks or simple linear methods to find good query hints significantly faster than a random or brute force search. 📄rm.cab/limeqo
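
For context on the baseline named in the tweet above, here is a rough sketch of what a brute-force hint search over a workload looks like. This is not LimeQO's method; it only illustrates the exhaustive baseline that such an approach aims to beat. The latency table, `SearchResult`, and `brute_force_hints` are hypothetical, and column 0 is assumed to hold the default plan.

```rust
/// Outcome of an exhaustive search over (query, hint set) pairs.
pub struct SearchResult {
    /// Index of the fastest hint set for each query.
    pub best_hint: Vec<usize>,
    /// Total workload time once every query uses its best hint set.
    pub workload_time: f64,
    /// Time spent executing every (query, hint set) pair during the search.
    pub search_cost: f64,
}

/// `latency[q][h]` is the (hypothetical) time query `q` takes under hint set `h`.
/// Every row must contain at least one entry.
pub fn brute_force_hints(latency: &[Vec<f64>]) -> SearchResult {
    let mut best_hint = Vec::with_capacity(latency.len());
    let mut workload_time = 0.0;
    let mut search_cost = 0.0;
    for row in latency {
        assert!(!row.is_empty(), "each query needs at least one hint set");
        let mut best = 0usize;
        for (h, &t) in row.iter().enumerate() {
            // Brute force has to run every pair once to learn its latency.
            search_cost += t;
            if t < row[best] {
                best = h;
            }
        }
        best_hint.push(best);
        workload_time += row[best];
    }
    SearchResult { best_hint, workload_time, search_cost }
}
```

The point of the sketch is the `search_cost` field: exhaustive search pays for every cell in the table, including the slow ones, which is the cost that a smarter exploration strategy would try to avoid.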
Ryan Marcus@RyanMarcus·
OLAP workloads are dominated by repetitive queries -- how can we optimize them? A promising direction is to do 𝗼𝗳𝗳𝗹𝗶𝗻𝗲 query optimization, allowing for a much more thorough plan search. Two new SIGMOD papers! 🧵
Ryan Marcus@RyanMarcus·
@DPearsonPHL @coryfromphilly Yeah, college-aged folks in college-adjacent stations wearing college-branded clothing seems like good evidence to make this inference. I'll report back if/when I get a response from the higher-ups.
Philadelphia, PA 🇺🇸
Ryan Marcus@RyanMarcus·
@DPearsonPHL @coryfromphilly Is there really a disproportionate trend of Penn students evading the fare? Not saying there isn't, I'm uneducated here. If so, I'll raise the issue with the university. I imagine I'll at least get a response. Fare evasion is clearly against the student code of conduct.
Philadelphia, PA 🇺🇸
Daniel Pearson@DPearsonPHL·
@coryfromphilly I will scold them when I see it. It is pathetic behavior. The universities should be ashamed.
Ryan Marcus@RyanMarcus·
@alpha_convert Use RDTSCP, with an extra mfence if you want to ensure writes are flushed. This also solves the problem of different NUMA regions having different clocks. I'm not sure anyone uses RDTSC for timing on modern CPUs, but admittedly I haven't looked into it in a while.
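
A minimal sketch of the RDTSCP-plus-mfence pattern described above, under a few assumptions. `std::arch::x86_64::__rdtscp` and `_mm_mfence` are real Rust intrinsics; `Stamp`, `stamp`, and `time_ticks` are hypothetical names. RDTSCP also returns the contents of the `IA32_TSC_AUX` register, which the operating system commonly sets to identify the current core (an assumption about the OS, not something the tweet states). Comparing that value before and after is one way to notice that the thread moved between reads. On non-x86_64 targets the sketch falls back to a monotonic clock so that it still compiles.

```rust
/// A timestamp plus the auxiliary value that RDTSCP returns with it.
#[derive(Debug, Clone, Copy)]
pub struct Stamp {
    pub ticks: u64,
    pub aux: u32,
}

#[cfg(target_arch = "x86_64")]
pub fn stamp() -> Stamp {
    use std::arch::x86_64::{__rdtscp, _mm_mfence};
    let mut aux = 0u32;
    unsafe {
        // Full memory fence so earlier stores complete before the counter is read.
        _mm_mfence();
        // RDTSCP waits for prior instructions to execute, then reads the TSC
        // and writes IA32_TSC_AUX into `aux`.
        let ticks = __rdtscp(&mut aux);
        Stamp { ticks, aux }
    }
}

#[cfg(not(target_arch = "x86_64"))]
pub fn stamp() -> Stamp {
    // Fallback (assumption): nanoseconds from a monotonic clock, no aux value.
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    let start = *START.get_or_init(Instant::now);
    Stamp { ticks: start.elapsed().as_nanos() as u64, aux: 0 }
}

/// Time `f` in raw ticks. Returns `None` when the aux value changed, which
/// suggests the two reads may have come from different cores or nodes.
pub fn time_ticks<F: FnOnce()>(f: F) -> Option<u64> {
    let before = stamp();
    f();
    let after = stamp();
    if before.aux != after.aux {
        return None;
    }
    Some(after.ticks.wrapping_sub(before.ticks))
}
```

The result is in TSC ticks, not seconds; converting requires knowing the TSC frequency, which this sketch does not attempt.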
Ryan Marcus@RyanMarcus·
@justinjaffray I think the main reason it's called "JIT" is because it uses the LLVM/GCC APIs that are used for implementing JITs. Obviously if I use a screwdriver to hammer in a nail, that doesn't make the nail a screw, but calling it a "screwed in nail" isn't too far from the truth :D
Ryan Marcus@RyanMarcus·
Pair(akeet) programming.
Ryan Marcus@RyanMarcus·
@fluxtheorist @fizziksBoris @atheorist Oral exams, formal or informal, are a staple of any PhD program and, in my experience, work very well. But I don't know how to scale it up to a class of 300-400.
sarah@atheorist·
Professors of upper level STEM courses: What practices are you moving toward to ensure students learn the material themselves? Are you pivoting to in person exams to try to combat student reliance on AI assistance tools? What should the near future of education look like?
Ryan Marcus@RyanMarcus·
@alpha_convert I can do way better -- the list type in Haskell is a "k-depth tree." (this is wrong at at least k-1 more levels).
Ryan Marcus@RyanMarcus·
@samokhvalov We'll have some writeups on some fully fleshed out ideas soon!
Ryan Marcus@RyanMarcus·
@samokhvalov We've been thinking about this question in our lab. Can think of this as an "offline query optimization" problem, where we want to fix N slow queries using minimal time. We have some preliminary work for when the reason for the slowness is a poor plan: rm.cab/limeqo
Nik Samokhvalov@samokhvalov·
What would you do if you need to review and optimize 500 slow queries from auto_explain log?
Ryan Marcus@RyanMarcus·
@tobycmurray As a question of political philosophy, I'd have to go with "there is no such threshold." As a question of math, I feel this question is ill-posed. The threshold chosen for Y will clearly impact the value of Y, so Y cannot be measured in this scheme (a classic RL problem).
Toby Murray@tobycmurray·
Random thought: Suppose crime statistics implied that known offenders of crime X had a probability Y of reoffending over time period Z. What value of Y would constitute reasonable suspicion enabling police to re-arrest or search every X offender after every period Z?