Jongho Park

129 posts

@jon_ghoh

🧑‍💻 PhD student @berkeley_ai 👾 prev: researcher @Krafton_AI, M.S. @WisconsinCS

Seoul, Republic of Korea · Joined January 2020

729 Following · 370 Followers

Jongho Park @jon_ghoh · 5d

@TBaharav @JHUBME @HopkinsDSAI @JHUCompSci Congrats! awesome to hear :)

Tavor Baharav @TBaharav · 5d

Life update! Excited to share I'll be joining Johns Hopkins University as an Assistant Professor of Biomedical Engineering (@JHUBME) in January 2027! I’ll be based in the Data Science and AI Institute (@HopkinsDSAI) with a secondary appointment in CS (@JHUCompSci). (1/6)

Jongho Park @jon_ghoh · 27 May

Autoregressive LMs seem to require smart looping to get better performance per training FLOPs. For masked diffusion LMs, simple looping seems to be a free lunch!

Dongmin Park @dongmin_park11

➿ Looped Diffusion Language Models

Looping has landed in dLLMs, and it is surprisingly effective! Accelerates training convergence 3.34x, improves GSM8K accuracy +8.5% on the same data, and enables test-time depth scaling. Check out our LoopMDM paper for more details!

Jongho Park retweeted

Dongmin Park @dongmin_park11 · 27 May

Jongho Park retweeted

Sonny Rollins @sonnyrollins · 26 May

It is with deep sorrow and profound love that we announce the passing of Sonny Rollins. The Saxophone Colossus died this afternoon at his home in Woodstock, NY at the age of 95. 1/2 conta.cc/4wFIDrM

Jongho Park retweeted

Hurley @Johnsjawn · 16 May

Or go to the Presidio, jump in the ocean, get a coffee at The Mill, watch sunset at Twin Peaks, ride a bike anywhere, see live music, eat a burrito, take a grass nap in GG Park, have beer at The Page, watch the Bay Bridge lights, wander Chinatown, wander Ferry building, run across GG Bridge, walk Fort Funston, eat the best meal of your life with friends…drive any direction for 2hrs. And be deeply grateful for the heavenscape you live in.

Deedy @deedydas

The vibes in SF feel pretty frenetic right now. The divide in outcomes is the worst I've ever seen.

Over the last 5yrs, a group of ~10k people - employees at Anthropic, OpenAI, xAI, Nvidia, Meta TBD, founders - have hit retirement wealth of well above $20M (back of the envelope AI estimation). Everyone outside that group feels like they can work their well-paying (but <$500k) job for their whole life and never get there.

Worse yet, layoffs are in full swing. Many software engineers feel like their life's skill is no longer useful. The day to day role of most jobs has changed overnight with AI.

As a result,

1. The corporate ladder looks like the wrong building to climb. Everyone's trying to align with a new set of career "paths": should I be a founder? Is it too late to join Anthropic / OpenAI? should I get into AI? what company stock will 10x next? People are demanding higher salaries and switching jobs more and more.

2. There’s a deep malaise about work (and its future). Why even work at all for “peanuts”? Will my job even exist in a few years? Many feel helpless. You hear the “permanent underclass” conversation a lot, esp from young people. It's hard to focus on doing good work when you think "man, if I joined Anthropic 2yrs ago, I could retire"

3. The mid to late middle managers feel paralyzed. Many have families and don't feel like they have the energy or network to just "start a company". They don't particularly have any AI skills. They see the writing on the wall: middle management is being hollowed out in many companies.

4. The rich aren’t particularly happy either. No one is shedding tears for them (and rightfully so). But those who have "made it" experience a profound lack of purpose too. Some have gone from <$150k to >$50M in a few years with no ramp. It flips your life plans upside down. For some, comparison is the thief of joy. For some, they escape to NYC to "live life". For others still, they start companies "just cuz", often to win status points. They never imagined that by age 30, they'd be set. I once asked a post-economic founder friend why they didn't just sell the co and they said "and do what? right now, everyone wants to talk to me. if i sell, I will only have money."

I understand that many reading this scoff at the champagne problems of the valley. Society is warped in this tech bubble. What is often well-off anywhere else in the world is bang average here. Unlike many other places, tenure, intelligence and hard work can be loosely correlated with outcomes in the Bay. Living through a societally transformative gold rush in that environment can be paralyzing. "Am I in the right place? Should I move? Is there time still left? Am I gonna make it?" It psychologically torments many who have moved here in search of "success". Ironically, a frequent side effect of this torment is to spin up the very products making everyone rich in hopes that you too can vibecode your path to economic enlightenment.

Jongho Park retweeted

Ryan Yixiang Wang @RyanYixiang · 8 May

MoEs are everywhere in frontier models, and they are deployed as a monolith system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!

Ai2 @allen_ai

Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵

Jongho Park retweeted

corsaren • vibecamping on sat @corsaren · 21 Apr

types of guy in the AI consciousness debate:
- guy who thinks ai can’t be conscious because it’s “just a stochastic parrot”
- guy who thinks ai must be conscious because claude is a good boi
- guy who hasn’t gotten over 4o
- guy who unironically thinks everything is computer
- guy who claims to have a more nuanced argument for computational functionalism, but it just boils down to everything is computer
- dualist whose belief in dualism is downstream of their belief in god, yet tries to argue the inverse
- guy who doesn’t understand the difference between cognition and p-consciousness
- guy who asserts illusionism but has apparently wrestled with zero of the implications other than “reductive materialism wins again”
- guy who says the hard problem is easy, but then proceeds to only answer the easy problem
- guy who rejects ai consciousness because otherwise it might be wrong to abuse claude with death threats to make CRUD apps faster
- guy who argues that consciousness is the key to moral patienthood, but completely ignores that when discussing animal rights
- eliezer yudkowsky being pedantic
- guy being pedantic about eliezer yudkowsky’s pedantry
- guy who rejects dualism because that would make mind uploading impossible and mean that he finally has to confront the inevitability of his own death
- guy who thinks this argument is unresolvable so everyone should just shut up and accept his position (which obviously deserves the benefit of the doubt)
- guy who would literally cut off his own hand if he thought there were a 1 in 10 trillion chance of creating ~infinite utility~
- guy who just thinks that redness is, like, super weird, man. can’t explain that!
- guy with a rarely-updated philosophy blog despite not majoring in philosophy or even reading that many books, talking about how “the whole field is up its own ass”
- academic philosopher who, for some reason, expects a higher caliber of discussion on x dot com the everything app
- guy who thinks that vectors are literally emotions and bites the bullet that, yes, your thermostat does feel hot
- panpsychist who took dmt once and contributes almost nothing to the conversation
- guy who is literally a solipsist but is still really invested in convincing strangers on the internet that he’s right

any that i missed?

Jongho Park retweeted

Juno KIM @junokim_ai · 30 Mar

Excited to share our new paper on sharp capacity scaling of the Muon optimizer! Joint work with @EshaanNichani Denny Wu @albertobietti @jasondeanlee: arxiv.org/abs/2603.26554 (1/7)

Jongho Park retweeted

xuan (ɕɥɛn / sh-yen) @xuanalogue · 5 Mar

There's not much info I can find about how deeply integrated AI is into Chinese military, but I will not be surprised if this is a repeat of when the US rushed to build the atomic bomb because they were so afraid the Germans were ahead of them (they weren't even close).

xuan (ɕɥɛn / sh-yen) @xuanalogue

Apparently Claude is now so embedded in US military decision-making that they would exercise the Defense Production Act (or other means) to continue using it in this illegal war. Anthropic should never have agreed to the DoW-Palantir deal.

Jongho Park retweeted

baby keem @babykeem · 26 Feb

how do u fix openclaw internal reasoning leaking

Jongho Park retweeted

jasmine sun @jasminewsun · 13 Feb

I went to DC to talk to people across the political spectrum (& see some data centers) and concluded that we are *really* not ready for how much people hate AI

new scene report on my week with the AI populists: jasmi.news/p/ai-populism

Jongho Park retweeted

Richard Brody @tnyfrontrow · 17 Feb

In memory and honor of Frederick Wiseman, who took hold of a still-young format and, guided from the start by an unyielding sense of principle, made a body of work so original, idea-rich, and unified that it seems foreordained—a historic fusion of investigation and the inner life

Jongho Park retweeted

Jack Morris @jxmnop · 13 Feb

- invents the greatest plagiarism machine in history
- it gets plagiarized

Bloomberg @business

OpenAI has warned US lawmakers that its Chinese rival DeepSeek is using unfair and increasingly sophisticated methods to extract results from leading US AI models to train the next generation of its breakthrough R1 chatbot bloomberg.com/news/articles/…

Jongho Park @jon_ghoh · 12 Feb

@misterminsoo my guess is chaeyoung or yves?

Joshua Minsoo Kim @misterminsoo · 12 Feb

tfw the first tone glow k-pop star interview is actually happening

Jongho Park retweeted

Dimitris Papailiopoulos @DimitrisPapail · 6 Feb

x.com/i/article/2019…

Jongho Park retweeted

Yacine Mahdid @yacinelearning · 27 Jan

it’s a bit hard to understand for the non-initiated but there is a whole cottage industry of influencers that are processing research papers algorithmically and purposefully creating the worst scientific communication content you ever seen

AI Highlight @AIHighlight

🚨 Your AI is lying to you with complete confidence. Harvard & MIT just proved ChatGPT hallucinates 110% less when you force it to argue with itself. The technique is called "Recursive Meta-Cognition" and it's embarrassingly simple. Here's how to make AI actually think:

Jongho Park retweeted

Dimitris Papailiopoulos @DimitrisPapail · 27 Jan

What's the tradeoff between (Model Size, Quantization, Test-time Compute, Accuracy)? Come to ICLR to find out how we interpret 1700 experiments attempting to address this question :)

Dimitris Papailiopoulos @DimitrisPapail

Not All Bits Are Equal: What We Learned From 1700 Experiments on Memory-Optimal Reasoning

Given a fixed memory budget, how should you allocate across model weights, KV cache, and test-time compute to maximize accuracy in reasoning models? For example: would you choose a 32B, 4-bit model with a 14k token budget, or an 8B, 16-bit model with 30k tokens?

We ran 1,700 experiments on the Qwen3 family to find out. We varied:
- Model size (0.6B-32B)
- Weight precision (4/8/16-bit via GPTQ)
- Serial test-time compute (token budgets 2k→30k via budget forcing)
- Parallel test-time compute (Maj@K, up to K=16)
- KV cache compression (eviction: R-KV, StreamingLLM; quantization: HQQ at 2/4/8-bit)

This is great work led by my Krafton/UW collaborators @jhyuckkim (Krafton), @ethan_ewer (undergrad!! at UW-Madison), @taehong_moon (Krafton), @jon_ghoh (UC Berkeley)

Here is a summary of the main findings:

At the 4B+8-bit threshold, the optimal mem strategy flips
For models effectively smaller than 8-bit 4B, spend memory on more (or higher-precision) weights, not on longer generations. For larger models, do the opposite: allocate memory to longer generations until performance saturates. This threshold isn't arbitrary, i.e., it is right at the point where weights dominate KV cache/token count.

The reasoning task matters (e.g., your mileage may vary, no universal recipe!)
Math reasoning (e.g., AIME25): 4-bit quantization is almost always a bad idea. An 8B model at 16-bit outperforms a 14B model at 4-bit with similar memory spent. The numerical precision in weights seems to matter for "reasoning heavy" tasks. Almost as if the model’s capacity to utilize test time compute is decimated by quantization...
Knowledge heavier tasks (GPQA-D): 4-bit is broadly memory-optimal. Here, parameter count matters more than precision. The interpretation is that here you want more effective weights to store things, and raw parameter count dominates test time compute.

How does parallel test time compute (fancy for majority voting) factor in?
Majority voting (Maj@K) increases KV cache linearly with K. It improves the mem–acc trade-off only when the model is >= 8-bit 4B effective size; the optimal K grows with the memory budget. Below that scale, serial test time compute should be preferred.

Weight quantization alone isn't enough!
Both KV cache eviction and KV quantization push the mem optimal Pareto frontier higher across all model sizes we tested.

Should you prefer KV evict or quant?
- Small models (<8-bit 4B): KV cache eviction wins
- Large models (≥8-bit 4B): Both strategies competitive

Latency/throughput note: End-to-end latency is dominated by generation length. When you care a lot about latency, 8-bit often sits at a better speed–accuracy point than 4-bit.

Note on batching: When model weights, i.e., params, are amortized across concurrent generations (that is, batched inference), the tradeoff shifts, as you’d expect. At a batch size of 16, the 0.6B model never appears on the Pareto frontier. The 4B-8bit model always appears no matter the batch size in the ~1-2GB memory region (good model setting for mobile devices!)

What This Means
***There is no universal memory-optimal strategy for reasoning models!***
The right choice depends on almost every parameter involved, but here is one way to choose:

If effective size < 8-bit 4B
--- Spend your bits on model capacity/precision over longer token budgets.
--- Prefer 8-bit for math-heavy tasks.
--- Use KV eviction over KV quantization.
--- Stick to serial scaling; Maj@K is memory-inefficient here.

If effective size ≥ 8-bit 4B
--- Increase token budget till gains saturate.
--- Maj@K always helps, so grow K with available memory.
--- KV quantization is competitive with eviction; choose based on implementation and maybe taste 😊

Important Caveat: These findings are specific to the Qwen3 family on AIME25 and GPQA-D. The thresholds and strategies will vary with different architectures, training methods, and task distributions.
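
The memory arithmetic behind the post's headline question can be sketched in a few lines. This is a minimal sketch, not the authors' code, assuming weight memory is parameters × bits / 8 and a standard KV cache of 2 (K and V) × layers × KV heads × head dim × bytes per value per token. The architecture numbers (`num_layers`, `num_kv_heads`, `head_dim`) are hypothetical placeholders, not actual Qwen3 configurations, and reading "effective size 8-bit 4B" as a 4 GB weight-memory threshold is our interpretation. The hypothetical `recommend` helper only restates the post's own rule of thumb.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes


@dataclass
class ModelConfig:
    params_b: float    # parameter count, in billions
    weight_bits: int   # weight precision, e.g. 4, 8, or 16
    # Hypothetical architecture placeholders, NOT actual Qwen3 configs.
    num_layers: int = 64
    num_kv_heads: int = 8
    head_dim: int = 128
    kv_bits: int = 16  # KV cache precision (lower if the cache is quantized)


def weight_bytes(cfg: ModelConfig) -> float:
    """Memory for the weights: parameters x bits / 8."""
    return cfg.params_b * 1e9 * cfg.weight_bits / 8


def kv_bytes_per_token(cfg: ModelConfig) -> float:
    """Per-token KV cache: one K and one V vector per layer and KV head."""
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.kv_bits / 8


def total_bytes(cfg: ModelConfig, token_budget: int, k: int = 1) -> float:
    """Weights plus the KV cache of k parallel samples (Maj@K).

    Each of the k sequences keeps its own cache, so KV memory grows
    linearly with k. KV eviction is not modeled here.
    """
    return weight_bytes(cfg) + k * token_budget * kv_bytes_per_token(cfg)


# Assumed reading of the post's "8-bit 4B" threshold: 4e9 params x 1 byte.
THRESHOLD_BYTES = 4 * GB


def recommend(cfg: ModelConfig) -> list[str]:
    """Restate the post's rule of thumb (Qwen3 on AIME25/GPQA-D only)."""
    if weight_bytes(cfg) < THRESHOLD_BYTES:
        return [
            "spend bits on model capacity/precision over longer token budgets",
            "prefer 8-bit for math-heavy tasks",
            "use KV eviction over KV quantization",
            "stick to serial scaling; Maj@K is memory-inefficient",
        ]
    return [
        "increase token budget until gains saturate",
        "grow K in Maj@K with available memory",
        "KV quantization is competitive with eviction",
    ]


if __name__ == "__main__":
    big_4bit = ModelConfig(params_b=32, weight_bits=4)
    small_16bit = ModelConfig(params_b=8, weight_bits=16, num_layers=36)
    # Both options from the post spend the same memory on weights.
    print(weight_bytes(big_4bit) / GB, weight_bytes(small_16bit) / GB)
    # The difference is how much KV cache each token budget costs.
    print(total_bytes(big_4bit, 14_000) / GB, total_bytes(small_16bit, 30_000) / GB)
    print(recommend(small_16bit))
```

Under this accounting, the 32B 4-bit and 8B 16-bit options in the post both put 16 GB into weights, so the comparison comes down to how the remaining budget is split between KV cache and token count; real numbers depend on each model's actual architecture.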

Jongho Park @jon_ghoh · 17 Jan

@Kangwook_Lee Madison will miss you! Best of luck on your next journey :)

Kangwook Lee @Kangwook_Lee · 16 Jan

Today is my last day at UW-Madison. This was the most difficult decision of my life. I truly loved working here with the best colleagues and students, and I loved Madison dearly. Thank you all for everything, and I’ll miss everyone. And very special thanks to @DimitrisPapail and @rdnowak for being my best colleagues, mentors, and friends. Thank you all ❤️

Jongho Park retweeted

Kangwook Lee @Kangwook_Lee · 10 Dec

🚀 I'm hiring 3 postdocs to work with me @Krafton_AI (Seoul, Korea)

What I offer:
🔥 Access to a compute cluster with 1,000+ GPUs
🧠 Support to pursue top-tier research
💰 The highest postdoc salary in Korea
+ The absolutely best company-provided food & office view 😁

Jongho Park retweeted

Negin Raoof @NeginRaoof_ · 6 Dec

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on TerminalBench, and sets a new bar on our newly released OpenThoughts-TB-Dev benchmark. (1/n)
