Hayate Iso

39 posts

@iso_map

@nvidia

Santa Clara, CA · Joined September 2023

130 Following · 95 Followers

Hayate Iso retweeted

Bryan Catanzaro@ctnzr·11 Mar

Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!

205

1.2K

203.3K

Hayate Iso retweeted

NVIDIA AI Developer@NVIDIAAIDev·8 Jul

What if you could ask a chatbot a question the size of an entire encyclopedia—and get an answer in real time? Multi-million token queries with 32x more users are now possible with Helix Parallelism, an innovation by #NVIDIAResearch that drives inference at huge scale. 🔗 nvda.ws/4eCXxqh

146

16.9K

Hayate Iso@iso_map·19 Jun

😤

787

Hayate Iso retweeted

NVIDIA AI Developer@NVIDIAAIDev·6 Jun

Inference at scale is pushing the boundaries of what today’s systems can handle. One promising direction? #DisaggregatedInference: splitting the serving pipeline into distinct stages like prefill and decode to unlock better performance across the throughput-interactivity spectrum.

While the concept has gained traction and sparked a wave of open source experimentation, real-world deployments remain rare. Why? Because the optimization space is vast, and orchestrating system-level coordination is incredibly complex.

📄 In our latest #NVIDIAResearch, we present a comprehensive systematic study of disaggregated inference at scale, evaluating hundreds of thousands of design points across varied model sizes, traffic patterns, and hardware configurations.

Key findings:
1️⃣ Disaggregation shines in prefill-heavy traffic patterns and with larger models.
2️⃣ Dynamic rate matching and elastic scaling are essential to achieving Pareto-optimal performance.
3️⃣ There's no one-size-fits-all architecture; workload-aware tuning is critical.

Whether you're building production inference infrastructure or exploring #LLM optimization techniques, this work offers actionable insights to balance system throughput with low-latency responsiveness.

🔬 Read the paper and dive into the data. Let’s shape the future of scalable #AI serving, together.
📗 research.nvidia.com/publication/20…

2.3K

Hayate Iso retweeted

NVIDIA AI Developer@NVIDIAAIDev·6 Jun

👀 New #NVIDIAResearch on boosting MoE model performance with disaggregated serving. Learn how our NVIDIA Dynamo and GB200 NVL72 work together to boost the performance of #AI data centers running MoE models like DeepSeek R1 and the new Llama 4. ⚡ Technical deep dive ➡️ nvda.ws/45eElfW

5.3K

Hayate Iso retweeted

Megagon Labs@MegagonLabs·13 Nov

🚀 Want to improve your LLM responses? Read our tutorial for implementing AmbigNLG! Addressing task ambiguity in Natural Language Generation to drive more accurate, context-aligned outputs. @iso_map megagon.ai/ambignlg-a-tut… #AmbigNLG #NLP #tutorial #LLMs #EMNLP2024 #MLEngineering

1.9K

Hayate Iso@iso_map·11 Nov

🌴Heading to #EMNLP2024! Presenting AmbigNLG with @ayaniwa1213 on Tuesday at 4pm (Riverfront Hall). Paper: aclanthology.org/2024.emnlp-mai… Data: github.com/megagonlabs/am… You can also stop by our @MegagonLabs sponsor booth or DM me to chat about full-time and internship opportunities :)

7.7K

Hayate Iso retweeted

Sajjadur Rahman@subZero_saj·21 Oct

The CFP of DAIS@ICDE2025 has been released 👇

DAIS@ICDE2025@DAIS_workshop

The first call for papers of DAIS 2025 is out.
👉 Submission: January 20, 2025, 11:59pm Pacific Time
👉 Notification: February 20, 2025
👉 Camera-ready due: March 20, 2025
Please visit the website for more details: dais-workshop-icde.github.io

774

Hayate Iso retweeted

Sajjadur Rahman@subZero_saj·17 Oct

📢Excited to bring the DAIS workshop to ICDE'25 (w/ @SainyamGalhotra @FarihaAnna @MikeCafarella @sairamgv) The focus is on the emerging idea of compound AI systems with a specific emphasis on data discovery, interactions w/ data, architectures for #AgenticAI+#LLM, and evaluation.

DAIS@ICDE2025@DAIS_workshop

We are excited to announce the First Workshop on Data-AI Systems @ICDE 2025 (DAIS). The workshop will focus on exploring innovative approaches towards building data-aware compound AI systems in the #LLM era. More details: dais-workshop-icde.github.io #DataManagement #AgenticAI

Hayate Iso@iso_map·17 Oct

Also paper link 🫠 🗒️ arxiv.org/abs/2410.11996 7/

139

Hayate Iso@iso_map·17 Oct

We hope our work sets a new standard for evaluating holistic reasoning in LLMs! Proudly co-authored with @SAYg_7 (equal first author!) and Nikita Bhutani at @MegagonLabs. 💻 Code: github.com/megagonlabs/ho… 🗂 Dataset: hf.co/datasets/megag… 6/

307

Hayate Iso@iso_map·17 Oct

Introducing 🫧 HoloBench, a new benchmark for measuring LLMs' reasoning capabilities across massive document collections!
- RAG models fetch info well but struggle with multi-doc reasoning.
- HoloBench evaluates how LLMs synthesize and aggregate info to solve complex tasks.
🧵 1/

680

Hayate Iso retweeted

Megagon Labs@MegagonLabs·15 Oct

⚡️Are you working with #NLP or #AI-driven products for Natural Language Generation (#NLG)? Task ambiguity is a common pain point and AmbigNLG changes that! AmbigNLG is designed to solve task ambiguity in instructions for NLG. What it is... 👇🧵

591

Hayate Iso retweeted

Ayana Niwa@ayaniwa1213·21 Sep

🎉 Our long paper has been accepted for the #EMNLP2024 main conference! In "AmbigNLG," co-authored with @iso_map, we tackle task ambiguity in NLG instructions to better align LLM outputs with your expectations. 📃 Read more: arxiv.org/abs/2402.17717

123

9.6K

Hayate Iso retweeted

Megagon Labs@MegagonLabs·17 May

Let’s push the boundaries of #LLMs in text editing tasks! XATU is the first-of-its-kind benchmark that addresses the nuances of text editing, revolutionizing how we edit text with large language models. Don't miss the presentation at #Coling2024! #AI #NLP megagon.ai/xatu-fine-grai…

Hayate Iso retweeted

Pouya Pezeshkpour@PPezeshkpour·2 Apr

📢New Preprint📢 LLMs excel at ranking items for retrieval/recommender systems. But, what if we reduce the number of items and instead enforce multiple conditions as ranking instructions? It turns out, LLMs have a long way to go. See our new paper: arxiv.org/abs/2404.00211. 1/n

3.9K
