Sungmin Cha

691 posts

Sungmin Cha

@_sungmin_cha

Research Scientist at Meta | Formerly Faculty Fellow @nyuniversity | PhD @SeoulNatlUni

New York, USA · Joined July 2019
325 Following · 950 Followers
Pinned Tweet
Sungmin Cha @_sungmin_cha ·
I’m happy to share that I’m starting a new position as Research Scientist at Meta!
Sungmin Cha tweet media
English
155
93
3.9K
91.1K
Jaemin Cho @jmin__cho ·
🥳 I am incredibly honored and grateful to receive the 2026 @UNC Distinguished Dissertation Award! This award recognizes four recipients across the whole university, and I’m humbled to represent the Mathematics, Physical Sciences, and Engineering category this year. Many thanks to my advisor @mohitban47, our MURGe-Lab family, and the @unccs @unc_ai_group for their constant support! 🙏 This is a great reminder of all the good memories from my PhD journey before I start my faculty career at The Johns Hopkins University 😊
Jaemin Cho tweet media
English
9
16
78
5.6K
Sungmin Cha retweeted
Yoonho @youknow04 ·
For the Monty Hall problem, instead of one goat behind one of 3 doors, I've found it works better to blow the numbers way up: say the goat is behind one of 100 doors, the contestant picks one door, then you open the other 98 empty doors and ask, "Want to switch?" Explaining it with that case is a good way to help people who are uncomfortable with math arrive intuitively at the rational decision.
@sepiroot

How about calling the problem of people refusing to believe the correct answer to the 'Monty Hall problem', even when mathematicians show it to them with explanations and simulations, the "'Monty Hall problem' problem"?

Korean
4
152
445
302.6K
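The 100-door framing above is easy to check numerically. Here is a minimal Python sketch (door and trial counts are arbitrary illustrative choices): the host's reveal never touches the prize, so switching wins exactly when the first pick was wrong.

```python
import random

def monty_hall(doors=100, trials=100_000):
    """100-door Monty Hall: the contestant picks a door, the host opens
    the other 98 empty ones, leaving one closed door to switch to."""
    stay = switch = 0
    for _ in range(trials):
        prize = random.randrange(doors)
        pick = random.randrange(doors)
        # Staying wins only if the first pick was right (1/100);
        # switching wins in every other case (99/100), because the
        # host's one remaining closed door must then hide the prize.
        stay += pick == prize
        switch += pick != prize
    print(f"stay:   {stay / trials:.3f}")    # ~0.010
    print(f"switch: {switch / trials:.3f}")  # ~0.990

monty_hall()
```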
Sungmin Cha retweeted
Meta Newsroom @MetaNewsroom ·
Our AI glasses are constantly getting more useful, helpful, and intuitive 👓 Today we’re launching our first prescription-optimized AI glasses and a range of software updates including nutrition tracking, @WhatsApp summaries and recall by Meta AI, Neural Handwriting, and more. about.fb.com/news/2026/03/m…
English
28
66
373
66.5K
Sungmin Cha retweeted
Jung-Woo Ha @JungWooHa2 ·
3rd in the world, behind only China and the US, in #AI연구역량 (AI research capability) too! We've risen to 3rd place in #NeurIPS papers counted by first author! Step by step, we're achieving AI G3 status. economist.com/interactive/sc…
Jung-Woo Ha tweet media
Seongnam-si, Republic of Korea 🇰🇷 · Korean
26
265
819
23.5K
Sungmin Cha retweeted
AI at Meta @AIatMeta ·
We’re releasing SAM 3.1: a drop-in update to SAM 3 that introduces object multiplexing to significantly improve video processing efficiency without sacrificing accuracy. We’re sharing this update with the community to help make high-performance applications feasible on smaller, more accessible hardware.
🔗 Model Checkpoint: go.meta.me/8dd321
🔗 Codebase: go.meta.me/b0a9fb
AI at Meta tweet media
English
102
276
2.2K
318.7K
Sungmin Cha retweeted
Emergence @PcIOvebbCbTdSTb ·
For a while, I thought my moat as a data scientist was fairly clear: when facing a new problem, grasp the structure and constraints of the data, quickly skim related papers and ideas to form hypotheses, and push performance up by changing models myself. I delivered decent results and had confidence to match.

But as LLMs advanced, an anxiety that had been vague became real: the very area I was most confident in could be automated faster than I expected. (Funnily enough, I confirmed the shock of autoresearch through my own work late last year, even before karpathy did.)

So before it was too late, I moved my position toward GenAI: from training and improving models directly, to building systems and products on top of already-powerful existing models... I believed it was a choice aligned with the direction of the times, and I still don't regret the decision itself.

But now that I'm actually here, there's an emptiness of a kind I didn't expect. I used to find clear satisfaction in digging deep into data and problems and pushing model performance up with my own ideas; now I sometimes feel less like a data scientist and more like a SWE who happens to be good at using AI (and even that is just point-and-click work).

In any case, my worry is not only that my old moat is gone, but what my weapon will be going forward. With the enormous wave about to crash over us, I wonder if the worry itself even means anything...
Korean
6
38
180
20K
Sungmin Cha retweeted
NeurIPS Conference @NeurIPSConf ·
NeurIPS is aware of the community's concerns regarding the list of sanctions. NeurIPS is an inclusive community focused on free scientific discourse. We deeply value the research that comes from everyone in our community. The present concerns are not about science or academic freedom. They are about legal requirements that apply to the NeurIPS Foundation, which is responsible for complying with sanctions. We are actively consulting legal counsel to fully understand the legal constraints and we will update the NeurIPS community as soon as we have reliable guidance from our lawyers.
English
128
31
238
212K
Sungmin Cha retweeted
말러팔삼 @mahler83 ·
The TurboQuant #paper that's been hot all day today, as I understand it: you quantize high-dimensional vectors to compress them and shrink the data, but the more you compress, the less accurate it gets. In particular, if, say, the 100th coordinate has all its values bunched near 0.5, that information is effectively wiped out when you quantize. So what do they do? They rotate the vector in a random direction.
말러팔삼 tweet media
말러팔삼@mahler83

Day 3 of jogging. I took the TurboQuant arXiv paper that showed up on my timeline about ten times, made an AO of it, and listened across the commute home, laundry and dishes, and a jog. It was both difficult and fascinating. The compression rate approaches the theoretical lower bound? No wonder it's causing such a stir, I thought.

Korean
4
34
127
22.6K
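The rotate-then-quantize idea in the tweet above can be demoed in a few lines of NumPy. The setup below uses one badly scaled "outlier" coordinate rather than the tweet's clustered-near-0.5 example, but it is the same failure mode: a plain per-vector min/max grid wastes its few levels on one channel, and a random rotation spreads every channel out evenly first. Bit width, dimensions, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, bits = 1000, 64, 3
levels = 2 ** bits

# Vectors with one badly scaled "outlier" coordinate: the classic case
# where a per-vector min/max grid wastes its few levels on one channel.
x = rng.normal(0, 1, (n, d))
x[:, 0] *= 20.0

# Random orthogonal rotation (QR decomposition of a Gaussian matrix).
q, _ = np.linalg.qr(rng.normal(0, 1, (d, d)))

def quantize(v, levels):
    """Uniform quantization of each row between its own min and max."""
    lo = v.min(axis=1, keepdims=True)
    hi = v.max(axis=1, keepdims=True)
    step = (hi - lo) / (levels - 1)
    return np.round((v - lo) / step) * step + lo

mse_plain = np.mean((x - quantize(x, levels)) ** 2)
# Rotate, quantize, rotate back: the rotation is orthogonal, so error
# measured in the rotated space carries back to the original unchanged.
x_hat = quantize(x @ q, levels) @ q.T
mse_rot = np.mean((x - x_hat) ** 2)
print(f"3-bit MSE without rotation: {mse_plain:.4f}")
print(f"3-bit MSE with rotation:    {mse_rot:.4f}")  # noticeably lower
```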
Sungmin Cha retweeted
Rohan Paul @rohanpaul_ai ·
Google’s TurboQuant research blog, published yesterday, is rattling memory stocks.

Shares of major memory and storage suppliers declined during early market action on Wednesday: Micron Technology (MU) was down 4%, Western Digital had slid 4.4%, Seagate Technology (STX) had declined 5.6%, and Sandisk (SNDK) had sunk 6.5%.

The KV cache is the running memory an LLM keeps so it does not recompute every past token, and that memory grows fast as context windows get longer. TurboQuant says Google can shrink that cache to 3 bits per value with no retraining, which means roughly 6x less KV memory while keeping quality close to intact. That hits the part of the AI hardware story that made MU, WDC, SNDK, and STX attractive, because fewer bits per model session can mean fewer high-end memory chips per server.

As for the mechanism, TurboQuant takes the long list of numbers that represents the model’s memory and turns that list a little, like rotating a pile of objects so they line up better in a box. That makes the numbers easier to store in a very low-precision form, so each one uses far fewer bits while still keeping most of the useful pattern. The second step is a cleanup pass that fixes part of the distortion caused by that heavy compression, so the model can still find the right past information instead of getting confused by the rougher stored version.

Google also claims up to 8x faster performance on H100 for some key operations, so this is not only about saving memory but also about moving data with less friction.

The selloff makes sense as a first reaction, but it may be too aggressive, because lab wins do not automatically become industry-wide deployment and AI demand is still hitting hard supply limits.

seekingalpha .com/news/4568538-google-reveals-algorithms-to-address-ai-memory-challenges-memory-stocks-drop
Rohan Paul tweet media
Rohan Paul@rohanpaul_ai

This is massive. Google released TurboQuant, advanced theoretically grounded quantization algorithms - massive compression for LLMs.

Tackles one of the nastiest costs in long-context LLMs: the KV cache, which stores small memory vectors for every past token and keeps growing as the prompt gets longer. The usual fix is quantization, where each number is stored with far fewer bits, but most methods quietly add bookkeeping data, so the real memory savings are smaller than they seem.

Google’s idea is a 2-stage compressor that keeps the useful geometry of those vectors while stripping out most of that hidden overhead. PolarQuant first randomly rotates the vector, then rewrites pairs of coordinates as a length and an angle, which makes the data easier to pack tightly without storing extra per-block constants. That captures most of the signal, and then QJL uses just 1-bit signs on the tiny leftover error so the final attention score stays accurate instead of drifting. A simple way to picture it is this: PolarQuant stores the main shape of the memory, and QJL stores a tiny correction note almost for free.

The other smart part is that this works without retraining or fine-tuning, so it can sit under an existing model rather than forcing the whole system to learn a new format.

In Google’s tests, TurboQuant cut KV cache memory by at least 6x, reached 3-bit storage with no accuracy drop on long-context benchmarks, and showed up to 8x faster attention scoring at 4-bit on H100 GPUs. That is a big deal because long prompts are often bottlenecked not by raw compute, but by the cost of moving huge amounts of memory around.

Overall, the real advance is not just better compression, but compression that attacks hidden overhead directly, which is why the speed gains look unusually strong for something this lightweight.

English
16
18
75
13.1K
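Taking the thread's plain-language description at face value, the two-stage structure can be sketched as a coarse polar coder on randomly rotated coordinate pairs, followed by a 1-bit sign correction on whatever error is left. To be clear, this is not PolarQuant's or QJL's actual coder or attention-score estimator (those names come from the thread); it is a toy illustration of "store the main shape, then a tiny correction note", with all bit widths and details assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                     # vector length (illustrative)
r_bits, theta_bits = 3, 3  # bits per pair radius/angle (illustrative)
q, _ = np.linalg.qr(rng.normal(0, 1, (d, d)))  # random rotation

def stage1_polar(x):
    """Toy 'PolarQuant' stage: rotate, rewrite coordinate pairs as a
    length and an angle, quantize both on uniform grids, decode back."""
    z = x @ q
    a, b = z[0::2], z[1::2]
    r, theta = np.hypot(a, b), np.arctan2(b, a)
    r_lv, t_lv = 2 ** r_bits - 1, 2 ** theta_bits - 1
    r_hat = np.round(r / r.max() * r_lv) / r_lv * r.max()
    t_hat = np.round((theta + np.pi) / (2 * np.pi) * t_lv) / t_lv * 2 * np.pi - np.pi
    z_hat = np.empty_like(z)
    z_hat[0::2], z_hat[1::2] = r_hat * np.cos(t_hat), r_hat * np.sin(t_hat)
    return z_hat @ q.T

def stage2_signs(x, x_hat):
    """Toy 'QJL' stage: 1 sign bit per coordinate of the residual plus
    one shared scale, i.e. a tiny correction note stored almost free."""
    res = x - x_hat
    return x_hat + np.mean(np.abs(res)) * np.sign(res)

x = rng.normal(0, 1, d)
s1 = stage1_polar(x)
s2 = stage2_signs(x, s1)
print("stage 1 MSE:  ", np.mean((x - s1) ** 2))
print("stage 1+2 MSE:", np.mean((x - s2) ** 2))  # strictly smaller
```

The sign step always helps: for a residual r, subtracting (mean |r|) in the direction of sign(r) reduces the mean squared error by exactly (mean |r|) squared, which is the sense in which a 1-bit correction is "almost free".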
Sungmin Cha retweeted
Prince Canuma @Prince_Canuma ·
Just implemented Google’s TurboQuant in MLX and the results are wild!

Needle-in-a-haystack using Qwen3.5-35B-A3B across 8.5K, 32.7K, and 64.2K context lengths:
→ 6/6 exact match at every quant level
→ TurboQuant 2.5-bit: 4.9x smaller KV cache
→ TurboQuant 3.5-bit: 3.8x smaller KV cache

The best part: Zero accuracy loss compared to full KV cache.
Prince Canuma tweet media
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
148
412
5.2K
723.9K
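For a sense of scale, cache-size claims like these are easy to sanity-check with the standard KV cache formula. The model shape below is a made-up example (not Qwen3.5-35B-A3B's real config), and the gap between the raw 16/2.5 = 6.4x ratio and the reported 4.9x would be explained by per-block metadata overhead:

```python
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bits):
    """KV cache size: 2 tensors (K and V) x layers x kv_heads x head_dim
    x sequence length, at `bits` per stored value."""
    return 2 * layers * kv_heads * head_dim * seq_len * bits / 8 / 2**30

shape = dict(layers=48, kv_heads=8, head_dim=128, seq_len=65_536)  # assumed
for bits in (16, 3.5, 2.5):
    print(f"{bits:>4} bits -> {kv_cache_gib(**shape, bits=bits):.2f} GiB")
# 16 bits -> 12.00 GiB, 3.5 bits -> 2.62 GiB, 2.5 bits -> 1.88 GiB
```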
Sungmin Cha retweeted
Alex Finn @AlexFinn ·
This is potentially the biggest news of the year.

Google just released TurboQuant. An algorithm that makes LLMs smaller and faster, without losing quality.

Meaning that 16GB Mac Mini now can run INCREDIBLE AI models. Completely locally, free, and secure.

This also means:
• Much larger context windows possible with way less slowdown and degradation
• You’ll be able to run high quality AI on your phone
• Speed and quality up. Prices down.

The people who made fun of you for buying a Mac Mini now have major egg on their face. This pushes all of AI forward in such a MASSIVE way.

It can’t be stated enough: props to Google for releasing this for all. They could have gatekept it for themselves like I imagine a lot of other big AI labs would have. They didn’t. They decided to advance humanity.

2026 is going to be the biggest year in human history.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

English
332
879
9.7K
1.5M
Sungmin Cha retweeted
Sukh Sroay @sukh_saroy ·
🚨 BREAKING: You asked AI to improve your writing. It changed what you were actually saying. New research just proved it.

In a controlled study, heavy AI writing assistance led to a 70% increase in essays that gave no clear answer to the question being asked. Not unclear writing. Neutral writing. The kind that sounds polished but commits to nothing.

Here's what makes this worse: Researchers took essays written in 2021, before ChatGPT existed, and asked an LLM to revise them based on real expert feedback. The instruction was simple: fix the grammar. The model changed the meaning anyway. Every time. It can't help it. The training pushes toward inoffensive, agreeable, averaged-out text. That's not a bug they can patch. It's the objective function.

And then there's the peer review finding. 21% of reviews at a recent top AI conference were AI-generated. Those reviews scored papers a full point higher on average. They also placed significantly less weight on clarity and significance, the two things peer review is supposed to evaluate.

So we're not just talking about your email sounding a little corporate. We're talking about AI quietly flattening scientific discourse. Laundering opinions into non-answers. Replacing your voice with the mean of everyone's voice.

The industry keeps asking: is AI-written content detectable? Wrong question. The right question is: what are we losing when a billion people let the same model edit their thinking?
Sukh Sroay tweet media
English
58
291
772
46.8K
Sungmin Cha retweeted
NeurIPS Conference @NeurIPSConf ·
Following the success of the EurIPS and NeurIPS-Mexico City pilots in 2025, we are thrilled to announce two official NeurIPS 2026 satellite events for this year! These will be held in Paris, France and Atlanta, USA, running alongside the main venue in Sydney, Australia.

Both satellite events will feature keynotes, oral and poster presentations of accepted NeurIPS 2026 papers, as well as workshops. We are planning tutorials, affinity events, and other elements for the satellite sites, and we'll share more information as planning advances.

Wherever you choose to join us, the entire NeurIPS organizing committee is working hard to deliver an outstanding experience for the whole community! neurips.cc
English
13
62
515
123.7K