
Dayoon Ko
@dayoon12161
M.S./Ph.D. integrated student in CSE @SeoulNatlUni | Research Intern @LG_AI_Research

❓Do LLMs retain the ability to acquire new knowledge throughout pretraining? If not, what is the driving force behind the decline? ❗Our findings reveal that decreasing knowledge entropy hinders knowledge acquisition and retention as pretraining progresses. 📄arxiv.org/abs/2410.01380
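For intuition only (not the paper's code): knowledge entropy roughly measures how broadly the model's FFN layers spread weight over their "memory" (value) vectors when processing a token. Below is a minimal sketch assuming a simplified normalized-absolute-coefficient definition; `knowledge_entropy` and the toy tensors are illustrative, so check the paper for the exact formulation.

```python
import torch

def knowledge_entropy(coeffs: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Entropy of how broadly an FFN spreads weight over its value
    ("memory") vectors at one token position.

    `coeffs` is assumed to be the intermediate FFN activation (one weight
    per memory vector); absolute values are normalized into a probability
    distribution -- an illustrative simplification, not the paper's exact
    definition.
    """
    p = coeffs.abs()
    p = p / (p.sum(dim=-1, keepdim=True) + eps)
    return -(p * (p + eps).log()).sum(dim=-1)

# Toy illustration: a sharp activation pattern (few active memories)
# yields lower entropy than a diffuse one.
sharp = torch.tensor([10.0, 0.1, 0.1, 0.1])
diffuse = torch.tensor([1.0, 1.0, 1.0, 1.0])
print(knowledge_entropy(sharp), knowledge_entropy(diffuse))
```

On this view, the paper's finding is that pretraining drives activations toward the "sharp" regime, so fewer memory vectors stay engaged for storing new facts.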

🚨 Announcing Generalized Correctness Models (GCMs) 🚨
Finding that LLMs have little self-knowledge about their own correctness, we train an 8B GCM to predict the correctness of many models. It is more accurate than training model-specific CMs and outperforms a larger Llama-3-70B's self-emitted confidences in downstream selective prediction tasks.

We motivate GCMs and analyze them by answering 2 questions:

❓ RQ1: Are LLMs better than other LLMs at predicting their own correctness? We find that they are not; instead, historical information (past LLM outputs and their correctness) drives performance, motivating cross-model transfer and the training of GCMs!

❓ RQ2: How can we use historical information from multiple models for correctness prediction? Within RQ2, we explore 3 further subquestions, informing the design of GCMs (see the sketch after this list):

1⃣ How does confidence prediction generalize across models? GCMs transfer strategies across models and datasets, even beating models trained directly on OOD datasets.

2⃣ What information should GCMs condition on? The exact way an LLM phrases an answer is a strong predictor of correctness, and strategies leveraging world knowledge seem to drive generalization.

3⃣ How do alternative methods for encoding history (e.g., post hoc calibration, ICL) compare? Including historical information via ICL can help larger models predict correctness, but it underperforms GCMs, and post hoc calibration can complement GCMs to reduce calibration error.

🧵👇
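For readers who want the mechanics, here is a minimal sketch of the setup under stated assumptions: `Record`, `build_prompt`, `gcm_score`, and the 0.7 threshold are hypothetical names and values, not the paper's actual interface, and the stub scorer stands in for the trained 8B GCM.

```python
"""Sketch of a Generalized Correctness Model (GCM) pipeline.

All identifiers here are illustrative assumptions, not the paper's API.
"""
from dataclasses import dataclass

@dataclass
class Record:
    question: str
    answer: str      # the exact phrasing matters (RQ2, point 2)
    model: str       # which LLM produced the answer
    correct: bool    # graded against the reference (dummy at test time)

def build_prompt(history: list[Record], query: Record) -> str:
    """Serialize past (question, answer, correct) triples from *many*
    models, then the query answer whose correctness we want predicted."""
    lines = [
        f"Q: {r.question}\nA ({r.model}): {r.answer}\nCorrect: {r.correct}"
        for r in history
    ]
    lines.append(f"Q: {query.question}\nA ({query.model}): {query.answer}\nCorrect:")
    return "\n\n".join(lines)

def gcm_score(prompt: str) -> float:
    """Placeholder for the trained GCM: would return P(correct).
    A constant stub so the sketch runs end to end."""
    return 0.5

def selective_predict(history: list[Record], query: Record, threshold: float = 0.7):
    """Downstream selective prediction: keep the answer only when the
    GCM's predicted correctness clears the threshold, else abstain."""
    p = gcm_score(build_prompt(history, query))
    return query.answer if p >= threshold else None

history = [Record("2+2?", "4", "model-A", True),
           Record("Capital of France?", "Lyon", "model-B", False)]
query = Record("3*3?", "9", "model-C", correct=False)  # label unknown at test time
print(selective_predict(history, query))
```

The key design choice this sketch reflects: the predictor conditions on cross-model history rather than on any single model's internals, which is what lets one GCM generalize across models.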

🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
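One concrete way to test such claims (a sketch, not the paper's evaluation code): parse the verbalized percentage out of each response and check calibration with expected calibration error (ECE). `parse_verbalized_confidence` and the toy data are illustrative assumptions.

```python
import re
import numpy as np

def parse_verbalized_confidence(text: str) -> float | None:
    """Pull a stated percentage like '60% likely to be correct' out of a
    model response; returns None if the model gave no number."""
    m = re.search(r"(\d{1,3})\s*%", text)
    return int(m.group(1)) / 100 if m else None

def expected_calibration_error(confs, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and compare each bin's
    mean confidence to its empirical accuracy, weighted by bin size."""
    confs = np.asarray(confs, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            ece += mask.mean() * abs(confs[mask].mean() - correct[mask].mean())
    return ece

# Toy usage with made-up numbers:
responses = ["My answer is only 60% likely to be correct.", "I am 90% sure."]
confs = [parse_verbalized_confidence(r) for r in responses]
print(expected_calibration_error(confs, correct=[0, 1]))
```

A well-calibrated reasoning model, in this sense, is one whose "60%" answers are right about 60% of the time.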

🎉Our paper "Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates" is accepted to #ACL2025 Main!🎉 We introduce a benchmark for multimodal "deception" + an LLM-based diversified attack. 🚀 Preprint coming soon!
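While the preprint is pending, here is a rough sketch of the underlying check (my illustration, not the benchmark itself): ask whether CLIP scores an LLM-rewritten, compositionally wrong caption above the true one. The checkpoint choice, `example.jpg`, and both captions are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(image: Image.Image, texts: list[str]) -> torch.Tensor:
    """Scaled cosine-similarity logits between one image and several captions."""
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return model(**inputs).logits_per_image.squeeze(0)

# Hypothetical example: does CLIP prefer a caption whose composition an
# LLM has flipped (subject and object swapped) over the true one?
image = Image.open("example.jpg")  # placeholder path
true_caption = "a dog chasing a ball in the park"
adversarial = "a ball chasing a dog in the park"  # LLM-perturbed composition
scores = clip_scores(image, [true_caption, adversarial])
print("fooled" if scores[1] > scores[0] else "robust")
```

If the perturbed caption wins, the representation has confused composition for bag-of-words overlap, which is the failure mode such a benchmark probes.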