Mark Whiting

13.6K posts

@MarkWhiting

Research at @hellopareto & @CSSPenn → https://t.co/04KvBxOIQa Previously: @StanfordHCI, @CMUEngineering, @KAISTpr & @RMIT.

Orinda · Joined March 2008

2.5K Following · 2.4K Followers

Mark Whiting retweeted

Andrej Karpathy@karpathy·8 Mar

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back. I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.

Mark Whiting retweeted

Marilyn Zhang@marilyn_zhang·6 Mar

Recently I've been thinking a lot about frontier models' ability to express uncertainty, especially for high-stakes medical use cases. We evaluated this capability across models. Early results below 👇 Gemini: 0% across every scenario. Claude: failed on over half. GPT: best.

Phoebe Yao@phoebeyao

x.com/i/article/2029…

Mark Whiting retweeted

Phoebe Yao@phoebeyao·6 Mar

x.com/i/article/2029…

Mark Whiting retweeted

Geoffrey Litt@geoffreylitt·27 Feb

My favorite designers can instantly switch from loose / hazy / intuitive thinking to sharp / analytical / precise thinking on demand. Many people can do one or the other. The combination is rare!

Mark Whiting@MarkWhiting·27 Feb

@BenSManning Agreed, it feels like a more holistic version of the question might reveal a different answer. Do dishwashers teach me to wash dishes better? No, but they give me more leverage on my time.

Benjamin Manning@BenSManning·27 Feb

Or am I missing something? Is there a better way to think about skills and delegation? Is there a reason to ask this question more generally? 3/3

Benjamin Manning@BenSManning·27 Feb

I've been thinking about this paper a bit recently. Is there any reason to expect otherwise if the goal is to complete the task? Like, OF COURSE, asking someone (thing) else to do something for you means that you won't get better at it. You'd otherwise have to spend that same time doing an even more efficient type of learning, right? 1/3

Mark Whiting@MarkWhiting·26 Feb

The more we can measure sophisticated concepts the more we (and systems) can leverage them. Very excited about the opportunities and capabilities this framework at @hellopareto is unlocking

Phoebe Yao@phoebeyao

x.com/i/article/2027…

Mark Whiting retweeted

Benjamin Manning@BenSManning·25 Feb

I've been trying to figure out why AI systems took a seemingly large, discrete jump in capabilities around the new year. 1/n

Mark Whiting@MarkWhiting·17 Feb

Excited to see our work coming out (+ @joshnguyen99 & @duncanjwatts) After establishing a means to study common sense in humans (and finding it rather limited — common sense is not so common) in a prior paper, we wondered if the same challenge faced language models. It does!

Josh Nguyen@joshnguyen99

Benchmarks of LLM common sense overwhelmingly rely on correct labels to report an accuracy score. But what if your "ground truth" genuinely differs from mine? In a new @PNASNexus paper, @DuncanJWatts, @MarkWhiting and I explore the implications of this intriguing question. 🧵⤵️

Mark Whiting@MarkWhiting·28 Jan

At @hellopareto we have been working on projects to train models — of course — but also to better understand how models can improve around key day-to-day risks and challenges.

Phoebe Yao@phoebeyao

x.com/i/article/2016…

Mark Whiting retweeted

Alex Komoroske@komorama·5 Dec

What if technology didn’t feel so… hollow? Some friends and I just released a manifesto about a world where tech leaves us feeling nourished (along with an evolving list of theses about how we can build it) resonantcomputing.org

Mark Whiting retweeted

IC2S2@IC2S2·22 Jul

With record-breaking submissions and our most competitive, gender-balanced program on record, #IC2S2’25 has officially started! Please check the updated program and plan your day. #ic2s2

Mark Whiting retweeted

Linus ✦ Ekenstam@LinusEkenstam·24 Feb

Bro, I can make 1 liter of Anthrax in an afternoon, Grok just wrote me a 20 page detailed report and instructions on how to do it. It also listed all websites where I can buy the materials and chemicals I need as a private person living in Europe. It also made a detailed list of the equipment I need (on a budget) It made detailed instructions on where i should deploy the anthrax for maximum death efficacy Give me one other place on the internet where I can create this in a few minutes….

Mark Whiting retweeted

PennPSC@PennPSC·8 Oct

@PennPSC's @duncanjwatts and @csspenn invite you to participate in The Commonsense Project.

CSSLab at Penn@csspenn

The common sense project is now live! @DuncanJWatts, @MarkWhiting, @amirhosnakh and @joshnguyen99 from the CSSLab invite you to take a quick survey 📝 and measure your common sense💡 commonsense.seas.upenn.edu You can read more about the project here: css.seas.upenn.edu/commonsensical…

Mark Whiting retweeted

Josh Nguyen@joshnguyen99·4 Oct

Check out @CSSPenn's recent blog post too. x.com/csspenn/status… 8/

CSSLab at Penn@csspenn

Mark Whiting retweeted

Josh Nguyen@joshnguyen99·4 Oct

Everybody trivializes common sense because we believe it is self-evident and universal. But is it really? Find out by participating in our Common Sense Project: commonsense.seas.upenn.edu. With @AmirhosNakh, @MarkWhiting, and @DuncanJWatts at @CSSPenn. 1/n

Mark Whiting retweeted

CSSLab at Penn@csspenn·4 Oct

Mark Whiting retweeted

Penn Engineering@PennEngineers·3 Oct

Using GPS data to inform epidemic modeling, @csspenn Director @DuncanJWatts and Post-Doctoral Researcher Francisco (Paco) Barreras tackle integrating mobility data while addressing privacy concerns to build public trust. bit.ly/3Y5D1b3

Mark Whiting retweeted

Halide + Kino aka halideapp.bsky.social@halidecamera·21 Sep

Say hello to Halide 2.16.1, our first update for iPhone 16 and iPhone 16 Pro.
- Open Halide in a hurry with a click using Camera Control
- Our initial release of Process Zero for iPhone 16 and iPhone 16 Pro
- Updated UI for the new screens on iPhone 16 Pro
More coming soon.

Mark Whiting retweeted

SSRC@ssrc_org·19 Sep

@duncanjwatts First q from @annalilharvey: When testing policies with partner orgs, they can't possibly run a huge number of studies, and partner orgs might not be representative of the field? @duncanjwatts: You can adapt to field methods, as long as your data can be modeled, and....

Mark Whiting retweeted

Bret Victor@worrydream·19 Sep

Q: This will never work.
A: It already works. We've been using it for years, for everything.
Q: But I can't imagine how it would work.
A: If you could imagine it, we wouldn't have had to build it.
