Gabriel Asher

270 posts

@GabrielAsher02

Senior ML Scientist @matterworks_bio. Alumnus of @dartmouth CS. Ex-researcher in CV and ML @DHMCandClinics

Joined October 2022
144 Following · 48 Followers
Gabriel Asher @GabrielAsher02
@AnthropicAI Please fix your API usage/pricing pages. Why is it so hard to have API costs sync to the platform page in real time??
Gabriel Asher retweeted
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.

This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

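Karpathy's point about verifiable rewards can be made concrete with a toy sketch. This is my own illustration, not any lab's actual training code: a hypothetical `verifiable_reward` helper executes a model-generated solution against unit tests and returns the kind of binary pass/fail signal that reinforcement learning can optimize directly, in contrast to writing quality, which has no such check.

```python
# Toy sketch (hypothetical, not any lab's pipeline) of a "verifiable reward":
# run a model-generated solution against unit tests, emit a binary reward.

def verifiable_reward(candidate_src: str, tests: list[tuple[tuple, object]],
                      func_name: str = "solve") -> float:
    """Return 1.0 if the candidate passes every test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)      # load the generated code
        fn = namespace[func_name]
        for args, expected in tests:
            if fn(*args) != expected:       # any failed test -> zero reward
                return 0.0
        return 1.0
    except Exception:                       # crashes also earn zero reward
        return 0.0

# Two "generated" solutions and the verifier's test cases:
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

The reward is unambiguous and cheap to compute at scale, which is exactly the property that makes coding and math "peaky" domains for RL.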
Gabriel Asher @GabrielAsher02
anyone else notice that claude code seems to have gotten a lot worse in the last few days? It keeps getting stuck in loops...
Gabriel Asher retweeted
Sakana AI @SakanaAILabs
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that "The AI Scientist: Towards Fully Automated AI Research," our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.

Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune
Gabriel Asher retweeted
Om Patel @om_patel5
stop spending money on Claude Code. Chipotle's support bot is free:
Gabriel Asher @GabrielAsher02
@TimothyKassis I've not noticed Anthropic being down; however, since the outage yesterday, speeds have been much slower than normal
Gabriel Asher @GabrielAsher02
The fun-upgrade in writing software vs a year ago is unreal
James Zou @james_y_zou
We created a new architecture to integrate multimodal sleep time-series data. CNNs learn local features, transformers aggregate information across time + channels, and leave-one-modality-out contrastive learning trains robust representations. This design generalizes across sites and diverse populations. 3/n
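The leave-one-modality-out contrastive idea in the tweet above can be sketched in a few lines. This is a toy construction of my own (random linear "encoders", numpy only), not the SleepFM code: each modality gets its own encoder, and an InfoNCE-style loss pulls the held-out modality's embedding toward the mean embedding of the remaining modalities for the same record, with other records in the batch serving as negatives.

```python
# Toy sketch of leave-one-modality-out contrastive training
# (my own illustration, not the SleepFM implementation).
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Stand-in encoder: a linear map + L2 normalization."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def leave_one_out_nce(Z, held_out, temp=0.1):
    """InfoNCE loss: held-out modality vs. mean of the others.

    Z: (num_modalities, batch, dim) array of normalized embeddings.
    """
    anchor = Z[held_out]                                # (batch, dim)
    rest = np.delete(Z, held_out, axis=0).mean(axis=0)  # (batch, dim)
    rest /= np.linalg.norm(rest, axis=-1, keepdims=True)
    logits = anchor @ rest.T / temp                     # (batch, batch)
    # positives sit on the diagonal: same record, complementary modalities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

batch, dim = 8, 16
# four modalities (e.g. brain, heart, muscle, breathing), random features
X = rng.normal(size=(4, batch, 32))
W = rng.normal(size=(32, dim))
Z = np.stack([encode(x, W) for x in X])
loss = sum(leave_one_out_nce(Z, m) for m in range(4)) / 4
print(float(loss))
```

Cycling the held-out modality forces each encoder to produce representations predictable from the others, which is what makes the learned embedding robust when a modality is missing or noisy.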
James Zou @james_y_zou
Today in @NatureMedicine we report that AI can predict 130 diseases from 1 night of sleep🛌 We trained a foundation model (#SleepFM) on 585K hours of sleep recordings from 65K people—brain, heart, muscle & breathing signals combined. AI learns the language of sleep🧵
Gabriel Asher retweeted
ARC Prize @arcprize
ARC Prize 2025 Paper Award Winners
1st: "Less is More: Recursive Reasoning with Tiny Networks" (TRM), A. Jolicoeur-Martineau, $50k
2nd: "Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI" (SOAR), J. Pourcel et al., $20k
3rd: "ARC-AGI Without Pretraining", I. Liao et al., $5k
Gabriel Asher @GabrielAsher02
@jerhadf Really good. It's been zero-shotting all of the tasks I've been giving it in Claude Code and is noticeably better than Sonnet 4.5. I still wish there were better notebook capabilities (especially getting it to stop writing so many print statements)
jeremy @jerhadf
what do people think about Opus 4.5 for coding so far? what are the behavioral problems or limitations you still want to see improved? we're hungry for feedback 🙏
Gabriel Asher @GabrielAsher02
@k_dense_ai @EdisonSci @kepler_ai_ I think the analysis capabilities are incredible already, but there is no closed loop back to in-vitro experimentation, which is half the game. Someone needs to make more accessible SaaS-like autonomous labs!
Gabriel Asher @GabrielAsher02
@jerhadf Late reply, but sometimes 4.1 opus will write 3/4 of my code, then stop due to usage limits! I just really wish I could get those last few lines before waiting for usage limits to reset.
jeremy @jerhadf
what're the most annoying or disruptive model behaviors you see when coding with claude models today? ie things you always have to work around in claude code, mistakes the models make often, etc. the more examples the better!
Gabriel Asher retweeted
AI at Meta @AIatMeta
Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments.

Learn more about V-JEPA 2 ➡️ ai.meta.com/blog/v-jepa-2-…

As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video.

Learn more and download the new benchmarks ➡️ ai.meta.com/blog/v-jepa-2-…
Gabriel Asher @GabrielAsher02
It’s crazy this paper flew under the radar—it simultaneously roasts genAI/LLMs, showcases the promise of @ylecun’s JEPA ideas for latent-space SSL, and reads like a validation of Karl Friston’s free-energy/active inference framework! arxiv.org/abs/2502.11831
Gabriel Asher @GabrielAsher02
@C_Kavanagh Lex needs to stop doing political interviews. His AI/tech ones are much more interesting anyways
Chris Kavanagh @C_Kavanagh
Lex’s interview with Zelensky goes exactly as you would anticipate if you’re familiar with Lex. Here’s some ‘highlights’: 1. Lex praises Joe Rogan and his comedy club. 2. Lex praises Elon & his commitment to fight corruption. He also asks Zelensky what he admires most about Elon.
Gabriel Asher retweeted
dr. jack morris @jxmnop
still the most compelling Figure 1 i've ever seen - from "Visualizing the Loss Landscape of Neural Nets" (2017)
Gabriel Asher retweeted
Mimoun Cadosch @mpcadosch
There's been a lot of interest in our recent paper along with @KanarekLab at @harvardmed. Wanted to tell this audience more about it. Thread time 🧵