Mark Whiting

13.6K posts

@MarkWhiting

Research at @hellopareto & @CSSPenn → https://t.co/04KvBxOIQa Previously: @StanfordHCI, @CMUEngineering, @KAISTpr & @RMIT.

Orinda · Joined March 2008
2.5K Following · 2.4K Followers
Mark Whiting retweeted
Andrej Karpathy @karpathy
The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back. I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
517 replies · 711 reposts · 7.5K likes · 1.1M views
Mark Whiting retweeted
Marilyn Zhang @marilyn_zhang
Recently I've been thinking a lot about frontier models' ability to express uncertainty, especially for high-stakes medical use cases. We evaluated this capability across models. Early results below 👇 Gemini: 0% across every scenario. Claude: failed on over half GPT: best
Phoebe Yao @phoebeyao

x.com/i/article/2029…

1 reply · 3 reposts · 8 likes · 761 views
Mark Whiting retweeted
Geoffrey Litt @geoffreylitt
My favorite designers can instantly switch from loose / hazy / intuitive thinking to sharp / analytical / precise thinking on demand. Many people can do one or the other. The combination is rare!
6 replies · 3 reposts · 134 likes · 9.2K views
Mark Whiting @MarkWhiting
@BenSManning Agreed, it feels like a more holistic version of the question might reveal a different answer. Do dishwashers teach me to wash dishes better? No, but they give me more leverage on my time.
0 replies · 0 reposts · 2 likes · 55 views
Benjamin Manning @BenSManning
Or am I missing something? Is there a better way to think about skills and delegation? Is there a reason to ask this question more generally? 3/3
3 replies · 0 reposts · 7 likes · 547 views
Benjamin Manning @BenSManning
I've been thinking about this paper a bit recently. Is there any reason to expect otherwise if the goal is to complete the task? Like, OF COURSE, asking someone (thing) else to do something for you means that you won't get better at it. You'd otherwise have to spend that same time doing an even more efficient type of learning, right? 1/3
1 reply · 5 reposts · 27 likes · 3.8K views
Mark Whiting retweeted
Benjamin Manning @BenSManning
I've been trying to figure out why AI systems took a seemingly large, discrete jump in capabilities around the new year. 1/n
10 replies · 7 reposts · 111 likes · 39.6K views
Mark Whiting @MarkWhiting
Excited to see our work coming out (+ @joshnguyen99 & @duncanjwatts) After establishing a means to study common sense in humans (and finding it rather limited — common sense is not so common) in a prior paper, we wondered if the same challenge faced language models. It does!
Josh Nguyen @joshnguyen99

Benchmarks of LLM common sense overwhelmingly rely on correct labels to report an accuracy score. But what if your "ground truth" genuinely differs from mine? In a new @PNASNexus paper, @DuncanJWatts, @MarkWhiting and I explore the implications of this intriguing question. 🧵⤵️

1 reply · 2 reposts · 7 likes · 2.3K views
Mark Whiting retweeted
Alex Komoroske @komorama
What if technology didn’t feel so… hollow? Some friends and I just released a manifesto about a world where tech leaves us feeling nourished (along with an evolving list of theses about how we can build it) resonantcomputing.org
47 replies · 123 reposts · 910 likes · 260.5K views
Mark Whiting retweeted
IC2S2 @IC2S2
With record-breaking submissions and our most competitive, gender-balanced program on record, #IC2S2’25 has officially started! Please check the updated program and plan your day. #ic2s2
0 replies · 6 reposts · 23 likes · 2.3K views
Mark Whiting retweeted
Linus ✦ Ekenstam @LinusEkenstam
Bro, I can make 1 liter of Anthrax in an afternoon, Grok just wrote me a 20 page detailed report and instructions on how to do it. It also listed all websites where I can buy the materials and chemicals I need as a private person living in Europe. It also made a detailed list of the equipment I need (on a budget) It made detailed instructions on where i should deploy the anthrax for maximum death efficacy Give me one other place on the internet where I can create this in a few minutes….
98 replies · 21 reposts · 318 likes · 44.7K views
Mark Whiting retweeted
Penn Engineering @PennEngineers
Using GPS data to inform epidemic modeling, @csspenn Director @DuncanJWatts and Post-Doctoral Researcher Francisco (Paco) Barreras tackle integrating mobility data while addressing privacy concerns to build public trust. bit.ly/3Y5D1b3
0 replies · 6 reposts · 7 likes · 1.4K views
Mark Whiting retweeted
Halide + Kino aka halideapp.bsky.social
Say hello to Halide 2.16.1 — our first update for iPhone 16 and iPhone 16 Pro. - Open Halide in a hurry with a click using Camera Control - Our initial release of Process Zero for iPhone 16 and iPhone 16 Pro - Updated UI for the new screens on iPhone 16 Pro More coming soon.
8 replies · 16 reposts · 422 likes · 35.7K views
Mark Whiting retweeted
SSRC @ssrc_org
@duncanjwatts First q from @annalilharvey: When testing policies with partner orgs, they can't possibly run a huge number of studies, and partner orgs might not be representative of the field? @duncanjwatts: You can adapt to field methods, as long as your data can be modeled, and....
1 reply · 3 reposts · 2 likes · 2.6K views
Mark Whiting retweeted
Bret Victor @worrydream
Q: This will never work.
A: It already works. We've been using it for years, for everything.
Q: But I can't imagine how it would work.
A: If you could imagine it, we wouldn't have had to build it.
3 replies · 50 reposts · 361 likes · 30.3K views