Celeste HC

75 posts

@CeliaShu1024

UG SWE @xmumalaysia → MAIR'26 && PhD in CBHI @cuhksz

Cyber Utopia · Joined July 2016

155 Following · 5 Followers

Celeste HC@CeliaShu1024·9 Apr

seems much more lightweight than I expected... capability enhancement likely reflects a dual contribution from both harness engineering and scaling laws 🤔

Claude@claudeai

Build and deploy your agents through the Claude Console, Claude Code, or our new CLI: platform.claude.com/workspaces/def… Read more on the blog: claude.com/blog/claude-ma…

Celeste HC@CeliaShu1024·4 Apr

this relationship between emotion behaviours and domain-specific performance of LLMs might be an inspiration for our later attempts🤩

Anthropic@AnthropicAI

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Celeste HC@CeliaShu1024·17 Nov

our recent attempts on agentic frameworks reminded me that perhaps we should return agent-core LLMs and memory systems to a more naive stage. namely, less knowledge SFT, more structured memory format, and overwriting inherent memory instead of expanding it.

Celeste HC@CeliaShu1024·23 Sep

I'm addicted to emphasizing the importance of something by laying some groundwork. it might become a bad habit because recently people seem to have lost the patience and interest to complete this process of divergence and convergence 😢

Celeste HC@CeliaShu1024·24 Aug

omg glad to find we were noticed by domain-specific practitioners 🤣🙏

김준혁 Junhewk Kim@junhewk

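[HAIE 2025-33] Healthcare AI Ethics Newsletter blog.aiethics.dev/haie-2025-33/ The paper I would pick as the most interesting is the PrinciplismQA benchmark released on arXiv. The authors developed the "PrinciplismQA" benchmark dataset to test the medical-ethics judgment of LLMs.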

Celeste HC@CeliaShu1024·26 Jul

I was struggling to choose the most fitting ending for this work as well. forks on my way grew in front of me like a huge decision tree. I gradually felt it crucial that I not assume the result first and then constantly align my prejudices with it. whenever 😌

Celeste HC@CeliaShu1024·25 Jul

finally completed all experiments this round. the phases involved were completely reconstructed--some became more logical, some were aborted. better than expected but partially deviated from the initial intention. I can't tell whether it is a relief or a relinquishment. make it exist first.

Celeste HC reposted

Artificial Intelligence Papers@SciFi·16 Jun

LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic. arxiv.org/abs/2506.11221

Celeste HC reposted

Jack Morris@jxmnop·24 Jun

another incredibly underrated paper: Thinking Like Transformers (Weiss et al, 2021) presents RASP: a programming language that compiles to transformer *weights*. can implement sort(), bincount(), etc. seems important. why don't interpretability people care about this?

Celeste HC@CeliaShu1024·17 Jun

@sthuyan not just about moving on. this is more about how to become a complete individual 🫡

Yan Hu@sthuyan·16 Jun

@CeliaShu1024 Move on

Celeste HC@CeliaShu1024·14 Jun

@sthuyan 👌

Yan Hu@sthuyan·14 Jun

@CeliaShu1024 Pokemon go!

Celeste HC reposted

François Chollet@fchollet·8 Jun

Beyond the perhaps superficial semantic distinction between "reasoning" and "pattern matching", there is a fundamental gap in the practical capabilities and behavior of these systems. You don't create an invention machine by iterating on an automation machine.

Ruben Hassid@rubenhassid

BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests)

Celeste HC reposted

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·2 Jun

How much do language models memorize? "We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter. We train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point “grokking” begins, and unintended memorization decreases as models begin to generalize."

Celeste HC reposted

Dmitry Rybin@DmitryRybin1·16 May

We discovered a faster way to compute the product of a matrix by its transpose! This has profound implications for data analysis, chip design, wireless communication, and LLM training! paper: arxiv.org/abs/2505.09814

The algorithm is based on the following discovery: we can compute XX^t for a 4x4 matrix in just 34 multiplications, a huge saving compared to the naive way (40 multiplications 🤯). We can apply this algorithm to any m x n matrix X (with n, m >= 4) by dividing it into 16 blocks X_1, ..., X_16.
- Estimated energy save: 5-10% ✅
- Estimated time save: 5% ✅

The discovery was made by combining Machine Learning-based Search and Combinatorial Optimization. We used RL to sample bilinear expressions. We then used combinatorial solvers (Gurobi) to enumerate relations between these expressions and combine these expressions together into one algorithm for XX^t. One way to think of it is as a modification of the AlphaTensor approach: we reduced the action space by a factor of a million (x1000000) at the expense of relying on combinatorial solvers.

The matrix XX^t is used everywhere:
- Data Analysis: linear regression
- Finance: covariance matrix for asset returns
- LLM training: Muon, SOAP, Shampoo
- Wireless Communication: 5G, MIMO channel capacity

This operation is performed trillions of times every minute globally. Imagine if we can save 5% of energy used for these computations!

Coauthors: Yushun Zhang @ericzhang0410, Zhi-Quan Luo.
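
As a rough check on the 40-multiplication baseline quoted in the post above, here is a minimal Python sketch. It is not taken from the paper, and the function name `naive_gram_multiplications` is hypothetical. It assumes the "naive way" means computing XX^t entry by entry and evaluating only the upper triangle (including the diagonal), since the result is symmetric. It does not implement the 34-multiplication algorithm itself.

```python
def naive_gram_multiplications(rows: int, cols: int) -> int:
    """Count scalar multiplications needed for X @ X.T computed entry by entry.

    X has shape (rows, cols). Because X @ X.T is symmetric, only entries
    (i, j) with j >= i are evaluated; each one is a dot product of two
    rows of X and costs `cols` multiplications.
    """
    count = 0
    for i in range(rows):
        for j in range(i, rows):  # upper triangle, diagonal included
            count += cols         # one length-`cols` dot product
    return count


if __name__ == "__main__":
    # 10 distinct entries, 4 multiplications each
    print(naive_gram_multiplications(4, 4))
```

Under that assumption a 4x4 matrix has 10 distinct output entries at 4 multiplications each, which matches the 40 quoted in the post.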

Celeste HC reposted

Ted Werbel@tedx_ai·17 Feb

Try this prompt instead, works like magic 🪄 "Reflect on 5-7 different possible sources of the problem, distill those down to 1-2 most likely sources, and then add logs to validate your assumptions before we move onto implementing the actual code fix"

Celeste HC@CeliaShu1024·3 May

@sthuyan 🤣🤣🤣

Yan Hu@sthuyan·3 May

@CeliaShu1024 Learn uncertainty quantification and you will not fear uncertainty any more.

Celeste HC reposted

Ryu Tanno | 丹野龍太郎@RyutaroTanno·2 May

Excited to share a big update on AMIE, our research AI doctor from @GoogleDeepMind and @GoogleAI. Now, AMIE can “see” and interpret visual medical data within a diagnostic conversation. Yes, AMIE exceeded human doctors (PCPs) in many key metrics like diagnostic accuracy and multimodal reasoning in a simulated clinical exam (OSCE) study. But crucially, how we realised this upgrade was different from our previous works, which relied very much on domain-specific (pre- or post-) training. We demonstrate that, with no finetuning, the combination of (1) natively multimodal Gemini 2.0 Flash and (2) a domain-specific inference-time algorithm can result in a capable conversational diagnostic AI. Through this year-long project, we felt the power of the evolving frontier foundation models in this important domain, while lots of work still remains to be done. See the thread from @KhaledSaab11 to learn more: x.com/KhaledSaab11/s… Adding more details and pointers to deep dives from my colleagues below!

Google AI@GoogleAI

Building on Articulate Medical Intelligence Explorer — AMIE, our research diagnostic conversational AI agent — today on the blog we share a first of its kind demonstration of a multimodal conversational diagnostic AI agent, multimodal AMIE. Learn more →goo.gle/42D0QcB

Celeste HC reposted

The Game Awards@thegameawards·28 Mar

March 26, 2027. The Legend of Zelda live-action movie has a release date. It is being released by Nintendo in collaboration with Sony.

Celeste HC@CeliaShu1024·7 Mar

hoping that our efforts on the evaluation system can really help medical LLMs make real improvements for the world. but right now, I am not sure if my dreams are just a bit too socialist and romantic 😌

Celeste HC reposted

François Chollet@fchollet·5 Mar

Disagree; we're working on near-term AGI and we aren't primarily using LLMs

Find me on bsky @colin-fraser.net@colin_fraser

There are basically two positions: either LLMs, sufficiently expansively pre-trained and reinforcement-learned and test-time-scaled, become AGI, or they don't. There's no obvious third thing. All the eggs are in the LLM basket.
