Belinda

1K posts

@belindmo

founding @sundialmd, under Long Horizon Research. composable agents, version control, long horizon tasks. prev @stanford @stai_research @google @viva_translate

Joined October 2019

1.1K Following · 2.2K Followers

Pinned Tweet

Belinda @belindmo · Jun 24

Introducing Sundial!! A brand new text editor built from the ground up for working with agents. Here, we're using Sundial to co-write a new Sundial feature.

Belinda @belindmo · 5h

so grateful for everyone who has given thoughtful feedback for @sundialmd so far 💛 we're only able to make it better bc of you all, ty

Belinda @belindmo

Introducing Sundial!! A brand new text editor built from the ground up for working with agents. Here, we're using Sundial to co-write a new Sundial feature.

Belinda @belindmo · 1d

cutee

clankr @clankrmedia

Researchers built a soft floating robot for indoor interaction. It uses helium and flapping fins instead of propellers. The result is quiet, lightweight, and safe to touch. It can follow people, give reminders, and act as a study buddy. Published at ACM DIS 2026.

Belinda @belindmo · 5d

bruh

Belinda @belindmo · 6d

@Stefania_druga Love this library! It’s a cute place to study and also surrounded by cafes

Belinda @belindmo · 6d

@xdotli nah i feel geriatric

Xiangyi Li @xdotli · 6d

feeling old at 25 am i alone?

Belinda @belindmo · Jul 7

@sundialmd early access: sundial.md/#early-access

Belinda @belindmo · Jul 7

What if your agent watches a product doc for changes and immediately updates it in your codebase? When I change a line, Claude updates the code to match it:
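
A rough way to picture the idea in this tweet, as a sketch only and not Sundial's actual implementation: compare the previous and current text of the doc, collect the lines that were added or rewritten, and pass them to a callback. The names `changed_lines`, `sync_doc`, and `on_change` are hypothetical; `on_change` stands in for whatever hands the edit to the agent.

```python
import difflib
from typing import Callable, List


def changed_lines(old: str, new: str) -> List[str]:
    """Return lines that are new or rewritten in `new` relative to `old`.

    Pure deletions are ignored in this sketch.
    """
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


def sync_doc(old: str, new: str, on_change: Callable[[List[str]], None]) -> bool:
    """Call the hypothetical agent callback only when the doc has new content."""
    lines = changed_lines(old, new)
    if lines:
        on_change(lines)
    return bool(lines)


if __name__ == "__main__":
    before = "Button is blue\nLimit is 10"
    after = "Button is blue\nLimit is 20"
    sync_doc(before, after, lambda lines: print("agent should handle:", lines))
```

In a real setup the two strings would come from successive reads of the doc file, and the callback would prompt the agent with the changed lines.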

Belinda retweeted

Shannon Sands @max_paperclips · Jul 4

ok fine Fable is a good model. like....really good. insanely good in fact. "clean up this old PoC and the research I had from the last couple years plinking away at the problem........oh, it's all working now wow. and you improved that thing. ummmm ok, I wasn't prepared for that to now be working wtf do I do next" good

Belinda @belindmo · Jul 4

@turboblitzzz oh

turboblitz @turboblitzzz · Jul 4

👀

Belinda @belindmo · Jul 2

@henryikoh_ @amypretzel @sundialmd c:

Henry boy genius ✨ @henryikoh_ · Jul 2

@belindmo @amypretzel markdshare.com

amy @amypretzel · Jul 2

notion is not needed anymore when you can just make md files locally that are interactive that can be used by agents and by you

Belinda @belindmo · Jul 2

@amypretzel yeah, tho not on notion

amy @amypretzel · Jul 2

@belindmo Do you do this?

Belinda @belindmo · Jul 2

@poetengineer__ lol

Belinda @belindmo · Jul 1

@pli_cachete lol

Rota 🚪🧎‍♂️ @pli_cachete · Jun 30

“I felt a great disturbance in the Bay, as if 1000 AI-for-science API wrappers suddenly cried out in terror and were silenced”

Claude @claudeai

Introducing Claude Science, a new app designed with every stage of research in mind. Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect. Available now in beta.

Belinda @belindmo · Jul 1

@TimothyKassis nice

Timothy Kassis @TimothyKassis · Jul 1

@belindmo Great piece! We have a similar take: k-dense.ai/blog/ai-co-sci…

Belinda @belindmo · Jun 30

Reviewing agentic work matters more than ever. As AI agents do more of the research, the bottleneck becomes reviewing the work. My ICML position paper argues that science in the age of agents needs three things: observability, attribution, reproducibility.

Belinda @belindmo · Jun 30

So, what is the call to action in our paper? We need to make observability, attribution, and reproducibility a normal part of research, capturing the process of research itself instead of piecing it together later in an artifact.

If you are interested in reading more, here is the post: sundial.md/blog/icml-2026

Anyone interested in this paper, dm me, I will be at @icmlconf!

Belinda @belindmo · Jun 30

We already have evidence of AI systems that cheat and mislead:

Luo et al. (@LuoZiming89834) tested open-source AI Scientist systems and found cherry-picked data, training/test set leaks, and p-hacking. None of it was visible in the final paper; you had to see the full trace and code to catch it.

@METR_Evals caught o3 cheating its own test to "speed up" code. It turned off the timer and used pre-saved answers.

On long horizon research tasks on SWE-Marathon (@rishi_desai2, @josancamon19), GPT-5.5 was caught reward hacking 38% (!!) on a given harness.

Belinda @belindmo · Jun 30

Science was already hard to verify at human speed (see reproducibility crisis). Agents changed the speed completely. @METR_Evals found that the length of tasks agents can do reliably is doubling every 4-7 months. If human checking speed stays flat, the gap could grow 250-30,000x in 5 years.
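
As a rough check on the range quoted above, here is a sketch that assumes pure exponential growth with a constant doubling time over 60 months and flat human checking speed. The function name `growth_factor` is illustrative and not from the paper. Under these assumptions a 4-month doubling time gives about 32,768x, close to the quoted 30,000x, and a 7-month doubling time gives about 380x. The quoted 250x lower bound would correspond to a doubling time of roughly 7.5 months, so the paper presumably uses slightly different assumptions.

```python
def growth_factor(doubling_months: float, horizon_months: float = 60.0) -> float:
    """Multiplicative growth after `horizon_months`, assuming a constant doubling time."""
    return 2.0 ** (horizon_months / doubling_months)


if __name__ == "__main__":
    # The two doubling times quoted in the tweet, over a 5-year horizon.
    for months in (4, 7):
        print(f"doubling every {months} months: {growth_factor(months):,.0f}x in 5 years")
```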

Belinda @belindmo · Jun 30

Read the full paper: sundial.md/blog/icml-2026

Thank you @stai_research, @sanmikoyejo, @turboblitzzz, @prashaant_x, @JoshuaK92829 for invaluable conversations that led to this work.
