Dayoon Ko

35 posts

@dayoon12161

M.S./Ph.D. integrated student in CSE @SeoulNatlUni | Research Intern @LG_AI_Research

Seoul, Republic of Korea · Joined October 2023
138 Following · 127 Followers
Pinned Tweet
Dayoon Ko@dayoon12161·
🎉 Our paper has been accepted to #ICLR2026! 😆💖 This work was done during my internship at LG AI Research – Superintelligence Lab. As summarized in the project: deep research requires broad evidence coverage and reliable synthesis, and HybridDeepSearcher achieves both by combining parallel retrieval for breadth with sequential reasoning for depth, supporting scalable search. 🔗 Project page: hybriddeepsearcher.github.io 📄 OpenReview: openreview.net/forum?id=rXpTZ… Huge thanks to my mentors and co-workers for their guidance and support throughout this project. We also plan to release related work soon. Stay tuned! 😊
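For readers wondering what "parallel retrieval for breadth with sequential reasoning for depth" can look like in practice, here is a minimal sketch of a hybrid search loop. It is illustrative only, not the released implementation: the `llm_plan` and `search` helpers and the step format are hypothetical.

```python
# Minimal sketch of a hybrid parallel/sequential search loop (illustrative,
# not the paper's code). `llm_plan` and `search` are hypothetical helpers.
from concurrent.futures import ThreadPoolExecutor

def hybrid_deep_search(question, llm_plan, search, max_turns=8):
    evidence = []  # accumulated (query, snippet) pairs
    for _ in range(max_turns):
        # The model either answers or emits a batch of queries. Independent
        # queries in one batch are issued in parallel (breadth); dependent
        # follow-ups arrive in later turns (depth).
        step = llm_plan(question, evidence)
        if step["action"] == "answer":
            return step["answer"]
        queries = step["queries"]
        with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
            snippets = list(pool.map(search, queries))
        evidence.extend(zip(queries, snippets))
    # Out of turns: answer with whatever evidence was gathered.
    return llm_plan(question, evidence, force_answer=True)["answer"]
```

The point the tweet emphasizes is that batching independent queries in a single turn buys coverage without adding sequential turns, while genuinely dependent queries still unfold across turns.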
Dayoon Ko retweeted
Hyunwoo Kim@hyunw_kim·
Today's a special day for me! We released Nemotron-Personas-Korea, the 1st Korean persona dataset🇰🇷💚 Built the largest persona PGM ever from 62 census data, capturing up to 10^46 states to closely simulate Korea. Already trending Top5 on 🤗 plz hit like❤️huggingface.co/datasets/nvidi…
Jiyeon Kim@jiyeonkimd·
🎉 Excited to share that Knowledge Entropy has been accepted to #ICLR2025 as an oral presentation! Check out if you are interested in why LLMs lose their ability to acquire new knowledge during pretraining. See you in Singapore!
Jiyeon Kim@jiyeonkimd

❓Do LLMs maintain the capability of knowledge acquisition throughout pretraining? If not, what is the driving force behind it? ❗Our findings reveal that decreasing knowledge entropy hinders knowledge acquisition and retention as pretraining progresses. 📄arxiv.org/abs/2410.01380
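As a rough intuition for the quantity in the title, one can read "knowledge entropy" as the entropy of how broadly the model spreads its use of parametric memory; this is a paraphrase for illustration, not necessarily the paper's exact estimator.

```python
# Illustrative paraphrase of "knowledge entropy": entropy of how broadly a
# model engages its parametric memory sources. Not the paper's exact metric.
import numpy as np

def knowledge_entropy(coeffs):
    """coeffs: non-negative memory coefficients (one per memory source)."""
    p = np.abs(coeffs) / np.abs(coeffs).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

print(knowledge_entropy(np.ones(8)))                  # ~2.08, broad usage
print(knowledge_entropy(np.array([1., 0., 0., 0.])))  # ~0.0, concentrated
```

On this reading, the tweet's finding is that the distribution becomes more concentrated (lower entropy) as pretraining progresses, which correlates with weaker acquisition and retention of new knowledge.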

Dayoon Ko retweeted
Jiyeon Kim@jiyeonkimd·
🌎Real-world knowledge evolves constantly and emerges incrementally. Can LLMs adapt to new information on the fly? 🤯Frontier models and agentic approaches all struggle, missing when to update the fact, or getting distracted by irrelevant information. We introduce ✨OAKS✨, a benchmark for evaluating models’ online adaptation to streaming, continually updating knowledge.
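A minimal sketch of what evaluating online adaptation to streaming knowledge could look like, in the spirit of the description above; the event format and the `ask` helper are hypothetical and not the benchmark's actual interface.

```python
# Hedged sketch: feed a model a stream of knowledge updates and probe whether
# it answers from the latest fact. `ask(question, context)` is hypothetical.

def online_adaptation_eval(updates, probes, ask):
    """updates: list of (time, fact) shown to the model incrementally.
    probes: list of (time, question, expected) checks asked at those times."""
    events = sorted(
        [(t, "update", fact) for t, fact in updates]
        + [(t, "probe", qa) for t, *qa in probes]
    )
    context, correct = [], 0
    for _, kind, payload in events:
        if kind == "update":
            context.append(payload)          # knowledge arrives over time
        else:
            question, expected = payload
            answer = ask(question, context)  # answer from what has been seen so far
            correct += int(expected.lower() in answer.lower())
    return correct / max(1, len(probes))
```

The failure modes named in the tweet map onto this setup directly: answering from a stale fact (missing when to update) or from an update that does not concern the question (distraction by irrelevant information).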
Dayoon Ko retweeted
𝚐𝔪𝟾𝚡𝚡𝟾@gm8xx8·
Hybrid Deep Searcher: Integrating Parallel and Sequential Search Reasoning
HybridDeepSearcher from LG is a Qwen3-8B model fine-tuned on HDS-QA (1,987 hybrid-hop questions, 2,111 correct trajectories) to distinguish parallel from sequential queries. It integrates both modes in structured reasoning–query–retrieval loops, cutting latency, broadening evidence retrieval, and scaling accuracy where sequential or naive multi-query baselines plateau.
- HDS-QA: synthetic hybrid-hop dataset built from Natural Questions, mixing independent parallel queries with dependent sequential steps
- Training / Mechanism: Qwen3-8B fine-tuned for 1 epoch (lr 3e-5, batch 4, grad accum 32) on supervised trajectories with reasoning and multi-query blocks; structured reasoning alternates with query–retrieval cycles, issuing parallel queries when possible to reduce turns and scale with budget
Results:
- FanOutQA: +15.9 F1
- BrowseComp-50: +11.5 F1 with fewer turns
- Evidence retrieval: 61% (FanOutQA), 55.8% (FRAMES), 40.7% (MuSiQue) vs. 53/49/38 for baselines
- Efficiency: highest AUC, answers in fewer turns, keeps improving with more turns/calls
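The fine-tuning recipe in the summary above maps onto a fairly standard supervised setup. Below is a hedged sketch of the training arguments; only the hyperparameters quoted in the tweet (1 epoch, lr 3e-5, batch 4, gradient accumulation 32) come from the summary, while the output name and precision choice are placeholders.

```python
# Hedged sketch of the quoted hyperparameters; everything other than the
# quoted values is a placeholder, not the authors' actual configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hybrid-deep-searcher-qwen3-8b",  # placeholder run name
    num_train_epochs=1,
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,  # effective batch of 128 per device
    bf16=True,                       # assumption; common for 8B-scale models
    logging_steps=10,
)
```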
Dayoon Ko retweeted
Sumit@_reachsumit·
When Is Enough Not Enough? Illusory Completion in Search Agents @dayoon12161 et al. introduce a framework to diagnose illusory completion in search agents, where agents falsely believe tasks are complete despite unverified constraints. 📝 arxiv.org/abs/2602.07549
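One way to read "unverified constraints": before an agent declares a task done, every constraint in the task should have at least some retrieved evidence behind it. The check below is my own illustration of that idea rather than the paper's framework; `constraints` and `evidence_for` are hypothetical inputs.

```python
# Illustrative check: completion is "illusory" when the agent is ready to
# answer while some task constraints have no supporting evidence.

def completion_is_illusory(constraints, evidence_for):
    """constraints: constraint strings extracted from the task.
    evidence_for: dict mapping each constraint to its supporting snippets."""
    unverified = [c for c in constraints if not evidence_for.get(c)]
    return len(unverified) > 0, unverified

illusory, missing = completion_is_illusory(
    ["released after 2020", "directed by a woman"],
    {"released after 2020": ["example snippet"], "directed by a woman": []},
)
print(illusory, missing)  # True ['directed by a woman']
```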
Prof. David Terence Thomas@ProfThomas_com·
Congratulations on the ICLR acceptance. Well deserved. I like how this frames depth and breadth as complementary rather than competing problems. Parallel retrieval for coverage + sequential reasoning for commitment feels like the right abstraction for “deep research,” especially when synthesis quality (not just recall) is the bottleneck. Curious how you see this behaving when synthesis has to support bounded claims rather than exploratory summaries?
Dayoon Ko@dayoon12161·
Thanks for the thoughtful feedback and for highlighting this important distinction! Deep research can be defined in many ways. From a claim discovery perspective, as in Microsoft’s LiveDRBench, the core challenge is searching for and surfacing relevant real-world information. As you note, another important framing is research as producing strict, well-justified, bounded claims. We agree this is a critical problem. However, this work is intentionally scoped to the former, and we hope other lines of research will address the latter.
Dayoon Ko retweeted
Eunkyu Eunice Park@uunicee_·
🧵Sharing our most-recent work! Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations
Dayoon Ko retweeted
Jaewoo Ahn@AHNJAEWOO2·
I had a great #EMNLP2025 experience in Suzhou 🇨🇳! ✔️ (Main) Poster Presentation ✔️ (Wordplay Workshop) Outstanding Paper Award ✔️ (Wordplay Workshop) Keynote talk Thanks to my incredible collaborators and all people I had the pleasure of meeting ✨!
Dayoon Ko retweeted
hyunji amy lee@hyunji_amy_lee·
🧐 LLMs aren’t great at judging their own correctness. ❗But history across models helps! We present Generalized Correctness Models (GCMs), which learn to predict correctness based on history, outperforming model-specific correctness models and larger models' self-confidence.
Elias Stengel-Eskin@EliasEskin

🚨 Announcing Generalized Correctness Models (GCMs) 🚨 Finding that LLMs have little self-knowledge about their own correctness, we train an 8B GCM to predict the correctness of many models, which is more accurate than training model-specific CMs and outperforms a larger Llama-3-70B’s self-emitted confidences in downstream selective prediction tasks.
We motivate GCMs and analyze them by answering 2 questions:
❓ RQ1: Are LLMs better than other LLMs at predicting their own correctness? We find that they are not; instead, historical information (past LLM outputs and their correctness) drives performance, motivating cross-model transfer and training of GCMs!
❓ RQ2: How can we use historical information from multiple models for correctness prediction? Within RQ2, we explore 3 further subquestions, informing the design of GCMs:
1⃣ How does confidence prediction generalize across models? GCMs transfer strategies across models and datasets, even beating models trained directly on OOD datasets.
2⃣ What information should GCMs condition on? The exact way an LLM phrases an answer is a strong predictor of correctness, and strategies leveraging world knowledge seem to drive generalization.
3⃣ How do alternative methods for encoding history (e.g., post hoc calibration, ICL) compare? Including historical information in ICL can help larger models predict correctness but underperforms GCMs, and post hoc calibration can complement GCMs to reduce calibration error. 🧵👇
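A hedged sketch of the core data idea as described in the thread above: condition a correctness predictor on a small history of (question, answer, correct?) records, possibly from other models, then ask it to judge a new answer. The field names and prompt wording below are my own, not the paper's format.

```python
# Illustrative construction of one training example for a generalized
# correctness model (GCM). Prompt wording is an assumption, not the paper's.

def build_gcm_example(history, question, answer, label):
    """history: list of dicts with 'question', 'answer', 'correct' (bool),
    possibly drawn from other models' past outputs."""
    lines = ["Past answers and whether they were correct:"]
    for h in history:
        lines.append(
            f"Q: {h['question']}\nA: {h['answer']}\n"
            f"Correct: {'yes' if h['correct'] else 'no'}"
        )
    lines.append(f"Now judge this answer.\nQ: {question}\nA: {answer}")
    lines.append("Correct:")
    return {"prompt": "\n".join(lines), "target": "yes" if label else "no"}
```

Training one predictor on such examples pooled across many models is what lets the history, rather than any single model's self-knowledge, drive the correctness signal.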

Dayoon Ko retweeted
Jinyoung Kim@jinyoung__kim·
📢 Life Updates 🤖 I’ll be starting my PhD in @UMichCSE this fall, focusing on AI reasoning. Truly grateful to my mentors and collaborators for their support, and looking forward to this next chapter in Ann Arbor 🚀
Dayoon Ko retweeted
Jaewoo Ahn@AHNJAEWOO2·
🎉Our "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" is accepted to #EMNLP2025 Main!🎉 We introduce a benchmark of 2D Flash adventure games (room escape, mystery/detective, visual novel, management) for full story completion. 🧵
Dayoon Ko retweeted
Eunkyu Eunice Park@uunicee_·
[1/10] 💡New Paper Alert! CoCoT: Cognitive Chain-of-Thought Prompting for Socially Grounded Vision-Language Reasoning
VLMs can see, but can they use perception to infer intent or make moral decisions? Despite recent progress, VLMs still struggle with socionormative reasoning, like judging moral appropriateness or resolving intent from ambiguous utterances that are visually grounded in socially complex scenes.
In our latest work, we introduce CoCoT (Cognitive Chain-of-Thought), a structured prompting method that guides VLMs through three cognitively grounded reasoning stages.
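A hedged sketch of what staged prompting in this spirit could look like. The stage names below (perception, situation, norm) are one reading of "cognitively grounded stages" and may not match the paper's exact wording; the prompt text is illustrative, not the released CoCoT prompt.

```python
# Illustrative three-stage prompt in the spirit of cognitive chain-of-thought.
# Stage names and wording are assumptions, not the paper's exact prompt.
COCOT_STAGES = [
    ("Perception", "Describe only what is literally visible in the image."),
    ("Situation", "Infer what is happening and what each person intends."),
    ("Norm", "Judge whether the action or utterance is socially and morally "
             "appropriate, referring back to the perception and situation steps."),
]

def build_cocot_prompt(question):
    parts = [f"Question: {question}", "Answer in three stages:"]
    parts += [f"{i + 1}. {name}: {instr}"
              for i, (name, instr) in enumerate(COCOT_STAGES)]
    return "\n".join(parts)

print(build_cocot_prompt("Is it okay for the man to take the last seat?"))
```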
Dayoon Ko retweeted
Jaewoo Ahn@AHNJAEWOO2·
🚀 Heading to 🇦🇹 for #ACL2025NLP! Catch our MAC 🥷 poster at #ACL2025 @aclmeeting! Say hi 👋, and let’s talk about LLM + Multimodality! Open for a coffee chat anytime ☕💬🗣️ 🗓️ July 29 (Day 2, Main Conference) ⏰ 16:00-17:30 📍 Hall 4/5 | #3743 📄 arxiv.org/abs/2505.22943
Jaewoo Ahn@AHNJAEWOO2

🎉Our paper "Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates" is accepted to #ACL2025 Main!🎉 We introduce a benchmark for multimodal "deception" + LLM-based diversified attack. 🚀 Preprint coming soon!

Dayoon Ko retweeted
Jiwan Chung@JiwanChung·
[ACL 2025] Any-to-any models are often expected to be more coherent across modalities—since they handle image→text and text→image in one unified model. But does this hold up? We test it with ACON. 📄 Paper: arxiv.org/abs/2505.24211 📷 data: huggingface.co/datasets/jiwan…