mlfeed.tech

22.6K posts


@mlfeedtech

https://t.co/wIcTSovxb3 is your source for the latest artificial intelligence, machine learning, and data science content and analysis.

Joined May 2019
223 Following · 327 Followers
mlfeed.tech reposted
Sumit
Sumit@_reachsumit·
NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA introduces a family of models (Nano, Super, Ultra) using hybrid Mamba-Transformer MoE architecture with up to 1M token context and state-of-the-art reasoning performance. 📝 arxiv.org/abs/2512.20856
3 replies · 57 reposts · 384 likes · 37.9K views
mlfeed.tech reposted
Rohan Paul
Rohan Paul@rohanpaul_ai·
The paper proposes FaithLens, an 8B model that spots when a large language model (LLM) claim is unsupported, and explains why. It makes it much easier and cheaper to catch and explain hallucinated claims before they reach users. Across 12 benchmarks, it beats GPT-4.1 and o3 while running far cheaper.

In many apps, a model is given documents but still invents details; that is a faithfulness hallucination. Most checkers either call a huge judge model or output a bare Yes or No with no reasons. FaithLens takes a document and a claim, then returns both the label and a short explanation that points to the missing or conflicting evidence.

To train it without humans, the authors make synthetic examples using a stronger model, then throw away samples where the label, explanation, or topic variety looks wrong. After that cold-start training, they run reinforcement learning where an explanation only earns credit if it helps a weaker model reach the correct Yes or No.

The takeaway is a practical, low-cost verifier that flags a bad claim and spells out the evidence gap.

----
Paper Link: arxiv.org/abs/2512.20182
Paper Title: "FaithLens: Detecting and Explaining Faithfulness Hallucination"
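The interface the tweet describes (document + claim in, label + explanation out) can be sketched with a trivial lexical heuristic. This is illustrative only: a stand-in for the trained 8B model, with invented example text.

```python
def verify_claim(document: str, claim: str) -> dict:
    """Toy faithfulness check: flag claim content words absent from the document."""
    doc_words = {w.strip(".,") for w in document.lower().split()}
    missing = [w.strip(".,") for w in claim.lower().split()
               if w.strip(".,") not in doc_words and len(w.strip(".,")) > 3]
    if missing:
        return {
            "label": "No",
            "explanation": f"The document contains no support for: {', '.join(missing)}",
        }
    return {"label": "Yes", "explanation": "All claim terms are grounded in the document."}

doc = "The report says revenue grew 12 percent in 2023."
print(verify_claim(doc, "Revenue grew 12 percent in 2023"))   # label: Yes
print(verify_claim(doc, "Revenue declined sharply in 2023"))  # label: No
```

The real model replaces the word-overlap check with learned reasoning, but the input/output contract — a label plus an evidence-gap explanation — is the same.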
6 replies · 22 reposts · 146 likes · 11.1K views
mlfeed.tech reposted
DAIR.AI
DAIR.AI@dair_ai·
The first comprehensive survey on GraphRAG. There is a lot of interest in GraphRAG, so let's discuss why it matters.

RAG has transformed how LLMs access external knowledge. However, traditional RAG treats documents as isolated chunks. It misses the relationships between entities that often matter most for answering complex questions. But real-world knowledge is interconnected. This survey formalizes GraphRAG: retrieval-augmented generation that leverages graph structure instead of flat text.

Graphs capture what text-based RAG cannot. Citation networks encode influence. Knowledge graphs encode relationships. Social networks encode interactions. Semantic similarity alone misses these structural signals.

The framework operates in three stages:
- Graph-Based Indexing: construct or connect to knowledge graphs, whether open sources like Wikidata and ConceptNet or self-constructed from documents.
- Graph-Guided Retrieval: fetch relevant nodes, triplets, paths, or subgraphs based on queries.
- Graph-Enhanced Generation: convert retrieved graph elements into prompts for LLMs.

Benefits of GraphRAG:
- Retrieval granularity matters.
- Nodes provide entity information.
- Triplets capture direct relationships.
- Paths reveal multi-hop reasoning chains.
- Subgraphs offer comprehensive local context.
- Hybrid approaches combine multiple granularities based on query complexity.

The survey covers 200+ papers across downstream tasks, including knowledge base QA, commonsense reasoning, entity linking, fact verification, and dialogue systems. Application domains span e-commerce, biomedicine, academic research, and legal analysis.

As RAG adoption grows, understanding when and how to incorporate graph structure becomes critical. Not every retrieval task needs graphs, but many complex reasoning tasks benefit substantially from explicit relational knowledge.

Paper: dl.acm.org/doi/10.1145/37…

Learn to build effective AI agents and RAG systems in our academy: dair-ai.thinkific.com
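The three stages can be sketched end-to-end with a hand-built triplet store standing in for a real knowledge graph (the entities and facts below are invented for illustration, not from the survey).

```python
triplets = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "married", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

# Stage 1, graph-based indexing: index triplets by the entities they mention.
index = {}
for t in triplets:
    for entity in (t[0], t[2]):
        index.setdefault(entity.lower(), []).append(t)

# Stage 2, graph-guided retrieval: fetch triplets whose entities appear in the query.
def retrieve(query: str):
    hits = []
    for entity, ts in index.items():
        if entity in query.lower():
            hits.extend(t for t in ts if t not in hits)
    return hits

# Stage 3, graph-enhanced generation: linearize retrieved triplets into a prompt.
def build_prompt(query: str) -> str:
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve(query))
    return f"Facts:\n{facts}\n\nQuestion: {query}"

print(build_prompt("Who did Marie Curie marry?"))
```

Real systems retrieve paths and subgraphs too, and use embedding or graph-traversal matching rather than substring lookup, but the index → retrieve → linearize pipeline is the same shape.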
18 replies · 71 reposts · 380 likes · 68.1K views
mlfeed.tech reposted
Chi Wang
Chi Wang@Chi_Wang_·
Imagine if ✨multiple✨ ChatGPT agents could collaborate to solve complex tasks for you! 🧑‍🦱🤝🤖🤖🤖 📢 AutoGen: A new framework for building multi-agent LLM applications aka.ms/autogen-pdf It allows creating many agents that converse to solve complex tasks! ... 1/4
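The core pattern — agents conversing until one signals it is done — can be sketched without the library. This is not the AutoGen API, just a stdlib illustration of the conversation loop it builds around LLM calls (the reply functions below stand in for models).

```python
class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn  # stand-in for an LLM call

    def reply(self, message: str) -> str:
        return self.reply_fn(message)

def run_chat(a, b, opening: str, max_turns: int = 6):
    """Alternate turns between two agents until TERMINATE or the turn limit."""
    history = [(a.name, opening)]
    speaker, listener, msg = b, a, opening
    for _ in range(max_turns):
        msg = speaker.reply(msg)
        history.append((speaker.name, msg))
        if "TERMINATE" in msg:
            break
        speaker, listener = listener, speaker
    return history

solver = Agent("assistant", lambda m: "2+2=4. TERMINATE" if "2+2" in m else "Please clarify.")
user = Agent("user_proxy", lambda m: "Looks right.")
log = run_chat(user, solver, "What is 2+2?")
```

AutoGen generalizes this to many agents, tool execution, and human-in-the-loop proxies, but the message-exchange-until-termination loop is the heart of it.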
7 replies · 55 reposts · 240 likes · 121.3K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
AgentBench: Evaluating LLMs as Agents

Presents a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting.

repo: github.com/THUDM/AgentBen…
abs: arxiv.org/abs/2308.03688
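A multi-turn agent evaluation of this kind boils down to an agent-environment loop: the environment returns an observation, the agent acts on it, and the benchmark scores success within a turn budget. A generic sketch, with an invented toy "guess the number" environment (not one of AgentBench's 8):

```python
class GuessEnv:
    def __init__(self, target: int):
        self.target = target

    def step(self, action: int):
        """Return (observation, done)."""
        if action == self.target:
            return "correct", True
        return ("higher" if action < self.target else "lower"), False

def binary_search_agent(low=0, high=100):
    """Stand-in for an LLM policy: acts, then updates state on the observation."""
    state = {"low": low, "high": high}
    def act(obs):
        if obs == "higher":
            state["low"] = state["last"] + 1
        elif obs == "lower":
            state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return state["last"]
    return act

env, agent = GuessEnv(target=37), binary_search_agent()
obs, done, turns = "start", False, 0
while not done and turns < 10:
    obs, done = env.step(agent(obs))
    turns += 1
print(done, turns)
```

AgentBench swaps the toy environment for OS shells, databases, games, and web tasks, and the hand-coded policy for an LLM, but scores the same loop.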
10 replies · 138 reposts · 593 likes · 115K views
mlfeed.tech reposted
elvis
elvis@omarsar0·
Enabling LLMs with tool-use capabilities is where I am noticing the greatest potential for companies to go big with LLMs. Gorilla is a good popular example, but I have seen a ton of other examples, especially from people building with AI-powered agents.

I also think this is one of the use cases where open LLMs like Llama 2 are going to be extremely useful -- every company will want to tune their models for their own internal APIs.

If you are curious about this space, check out this new paper that enables LLMs to interact with 16,000 real-world APIs. It's more of a framework with all the niceties like data preparation, training, and evaluation (GitHub repo included). The authors also claim that one of their models, ToolLLaMA, has reached the performance of ChatGPT (turbo-16k) in tool use.

Another side note: not sure if it's possible that LLMs can do this natively, although the Llama 2 paper does mention a related emergent behavior. I have been tracking all the research and tools that aim to enable these types of capabilities.

Combining tools and LLMs is nothing new, and we are seeing this across products and even in domains like Robotics and Chemistry. There are significant breakthroughs to be made here, but we are not quite there yet.

(paper and tool in the replies)
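At runtime, frameworks like these wrap the LLM in a dispatch loop: the model emits a structured tool call, the runtime executes it, and the result is fed back. A hedged sketch of that loop — the tool names and call format here are invented, not any specific framework's schema:

```python
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # stand-in for a real API
    "add": lambda a, b: a + b,
}

def dispatch(model_output: str):
    """Parse a JSON tool call like {"tool": "add", "args": [2, 3]} and run it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"Error: unknown tool {call['tool']!r}"
    return fn(*call["args"])

print(dispatch('{"tool": "add", "args": [2, 3]}'))           # 5
print(dispatch('{"tool": "get_weather", "args": ["Paris"]}'))
```

Finetuning (as in ToolLLaMA) teaches the model to emit well-formed calls for thousands of APIs; the dispatch side stays this simple.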
6 replies · 140 reposts · 510 likes · 135.9K views
mlfeed.tech reposted
DAIR.AI
DAIR.AI@dair_ai·
Top ML Papers of the Week (July 24 - July 30):
- RT-2
- LoraHub
- Med-PaLM Multimodal
- Survey of Aligned LLMs
- Foundation Models in Vision
- Universal Adversarial LLM Attacks
...
3 replies · 76 reposts · 416 likes · 113.7K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback Presents PanGu-Coder2, which achieves 62.20% pass@1 on the HumanEval benchmark. arxiv.org/abs/2307.14936
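For context on the headline number: pass@1 comes from the pass@k metric, usually computed with the unbiased estimator from the Codex (HumanEval) paper — with n samples per problem and c of them passing the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This is the standard metric, not anything PanGu-Coder2-specific:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c correct."""
    if n - c < k:
        return 1.0  # fewer than k failures, so any k-subset contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 5 correct, a single draw succeeds half the time.
print(pass_at_k(n=10, c=5, k=1))  # 0.5
```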
4 replies · 44 reposts · 166 likes · 35.9K views
mlfeed.tech reposted
Moritz Laurer
Moritz Laurer@MoritzLaurer·
Microsoft and Tsinghua U. claim to have found the "Successor to Transformer for Large Language Models": RetNet. They claim better language modelling performance, with 3.4x lower memory consumption, 8.4x higher throughput, 15.6x lower latency. 1/2
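The claimed memory and latency wins come from retention's recurrent form: per the RetNet paper, the state updates as S_n = γ·S_{n-1} + k_nᵀv_n and the output is o_n = q_n·S_n, so inference carries O(1) state per step instead of a growing KV cache. A toy single-head sketch (no xPos rotation, no group norm; γ and the vectors below are made up):

```python
def retention_step(S, q, k, v, gamma):
    """S is a d_k x d_v state matrix; returns (new_S, output vector)."""
    d_k, d_v = len(k), len(v)
    # S_n = gamma * S_{n-1} + k_n^T v_n  (decayed outer-product update)
    S = [[gamma * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
    # o_n = q_n S_n
    o = [sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    return S, o

gamma = 0.9
S = [[0.0, 0.0], [0.0, 0.0]]  # d_k = d_v = 2
for q, k, v in [([1, 0], [1, 0], [2, 3]), ([0, 1], [0, 1], [5, 7])]:
    S, o = retention_step(S, q, k, v, gamma)
print(o)
```

The parallel (training-time) form computes the same outputs attention-style; the equivalence between the two forms is the paper's central trick.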
16 replies · 164 reposts · 987 likes · 221.9K views
mlfeed.tech reposted
Yam Peleg
Yam Peleg@Yampeleg·
The largest dialog dataset collection just dropped: DialogStudio from Salesforce.

TL;DR:
- Merged data from 87 datasets.
- Evaluated & filtered each sample by multiple criteria [1].
- Ended up with a HUGE high-quality conversational dataset.

---
Huggingface Dataset: huggingface.co/datasets/Sales…
Github: github.com/salesforce/Dia…
Paper: arxiv.org/abs/2307.10172
---

The conversations in the dataset are categorized into multiple categories:
- Knowledge-Grounded-Dialogues
- Natural-Language-Understanding
- Open-Domain-Dialogues
- Task-Oriented-Dialogues
- Dialogue-Summarization
- Conversational-Recommendation-Dialogs

Really cool and useful work. I just wish I had enough compute to train on all of these datasets.

---
[1] Understanding, Relevance, Correctness, Coherence, Completeness, and Overall Quality.
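The multi-criteria filtering in [1] amounts to keeping a sample only if it scores well on every axis. A sketch of that idea — the scores, threshold, and sample format below are invented, not DialogStudio's actual pipeline:

```python
CRITERIA = ["understanding", "relevance", "correctness",
            "coherence", "completeness", "overall"]

def keep(sample: dict, min_score: int = 4) -> bool:
    """Keep a sample only if every criterion scores at least min_score (1-5)."""
    return all(sample["scores"].get(c, 0) >= min_score for c in CRITERIA)

samples = [
    {"text": "good dialog",  "scores": dict.fromkeys(CRITERIA, 5)},
    {"text": "noisy dialog", "scores": {**dict.fromkeys(CRITERIA, 5), "relevance": 2}},
]
filtered = [s for s in samples if keep(s)]
print(len(filtered))  # 1
```

Requiring all criteria to pass (rather than averaging) is what lets one bad axis, like relevance, sink an otherwise fluent sample.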
11 replies · 135 reposts · 543 likes · 110.2K views
mlfeed.tech reposted
Sanyam Bhutani
Sanyam Bhutani@bhutanisanyam1·
Easily the best paper on the current state of LLMs! 🙏 A 50-page read, but it's not "just another" survey paper that only documents facts. The authors actually add very useful commentary capturing all aspects of building Large Language Models, so the result is a collection of ideas we might have missed across months of research.

It covers both building LLMs and effectively applying them to domains, with a focus on current limitations and "sharp edges". As always, I think great content makes you discover missing bits in your knowledge; for this reason it's a solid cover-to-cover read recommendation: arxiv.org/abs/2307.10169
38 replies · 305 reposts · 1.9K likes · 313.7K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
Meta-Transformer: A Unified Framework for Multimodal Learning The first framework to perform unified learning across 12 modalities with unpaired data arxiv.org/abs/2307.10802
12 replies · 127 reposts · 503 likes · 76.7K views
mlfeed.tech reposted
Arvind Narayanan
Arvind Narayanan@random_walker·
We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning. w/ @sayashk aisnakeoil.com/p/is-gpt-4-get…
28 replies · 195 reposts · 993 likes · 580.9K views
mlfeed.tech reposted
Leandro von Werra
Leandro von Werra@lvwerra·
Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4-bit quantization and PEFT!

Learn more: huggingface.co/docs/trl/main/…
Full script: github.com/lvwerra/trl/bl…
16 replies · 268 reposts · 1.3K likes · 192.3K views
mlfeed.tech reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
A good, slightly obscure tip: applications can benefit from custom supervised finetuning of embeddings returned by APIs. Collect a few examples of +ve (and optionally hard -ve) pairs, use them to train a linear projection that better discriminates your pairs.
Sergey Karayev@sergeykarayev

Broke: using OpenAI embeddings as-is. Bespoke: learning an embedding projection from human judgements. OpenAI explains that this will "better emphasize aspects of the text relevant to your use case. In binary classification use cases, we've seen error rates drop by ≤ 50%."
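The tip above can be sketched end to end: fix the API embeddings, then train only a small projection matrix so positive pairs land closer together than negative pairs. A toy pure-Python version with numerical-gradient descent — the vectors, margin, and learning rate are all made up for illustration:

```python
def project(W, x):
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loss(W, positives, negatives, margin=1.0):
    # Pull positive pairs together; push negatives at least `margin` apart.
    pull = sum(dist2(project(W, a), project(W, b)) for a, b in positives)
    push = sum(max(0.0, margin - dist2(project(W, a), project(W, b)))
               for a, b in negatives)
    return pull + push

positives = [([1.0, 0.2], [0.9, 0.3])]   # pair that should match
negatives = [([1.0, 0.2], [-0.8, 1.0])]  # pair that should not
W = [[1.0, 0.0], [0.0, 1.0]]             # start from the identity projection

lr, eps = 0.1, 1e-4
initial = loss(W, positives, negatives)
for _ in range(200):  # numerical-gradient SGD, fine at this tiny size
    for i in range(2):
        for j in range(2):
            W[i][j] += eps
            up = loss(W, positives, negatives)
            W[i][j] -= 2 * eps
            down = loss(W, positives, negatives)
            W[i][j] += eps
            W[i][j] -= lr * (up - down) / (2 * eps)
final = loss(W, positives, negatives)
```

In practice you would use real labeled pairs and an autodiff framework, but the key design choice survives: the base embeddings stay frozen, and only the cheap linear layer is tuned to your task.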

17 replies · 91 reposts · 831 likes · 264.6K views
mlfeed.tech reposted
Shunyu Yao
Shunyu Yao@ShunyuYao12·
Write a sentence with "dog, frisbee, catch, throw" 👉Too easy for 7B LM... Will (constrained) text generation (CTG) "die out" like many other NLP tasks, in face of LLM? 👉Excited to introduce 🐕COLLIE, next-gen CTG that even challenges GPT-4! collie-benchmark.github.io (1/n)
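The example constraint in the tweet ("write a sentence with these words") is trivially checkable, which is what makes it gradeable. A minimal checker for that lexical constraint — COLLIE's real constraints are richer (positions, counts, and compositions of them), so this is just the simplest case:

```python
def satisfies(sentence: str, required_words) -> bool:
    """True iff every required word appears in the sentence (punctuation-tolerant)."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return all(w.lower() in words for w in required_words)

required = ["dog", "frisbee", "catch", "throw"]
print(satisfies("I throw the frisbee and my dog runs to catch it.", required))  # True
print(satisfies("My dog loves the frisbee.", required))                         # False
```

Programmatic checkability is the point: harder COLLIE constraints stay automatically verifiable even when satisfying them challenges GPT-4.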
2 replies · 13 reposts · 95 likes · 20.7K views
mlfeed.tech reposted
Jim Fan
Jim Fan@DrJimFan·
You'll soon see lots of "Llama just dethroned ChatGPT" or "OpenAI is so done" posts on Twitter. Before your timeline gets flooded, I'll share my notes:

▸ Llama-2 likely costs $20M+ to train. Meta has done an incredible service to the community by releasing the model with a commercially-friendly license. AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower.

▸ Meta's team did a human study on 4K prompts to evaluate Llama-2's helpfulness. They use "win rate" as a metric to compare models, in similar spirit to the Vicuna benchmark. The 70B model roughly ties with GPT-3.5-0301 and performs noticeably stronger than Falcon, MPT, and Vicuna. I trust these real human ratings more than academic benchmarks, because they typically capture the "in-the-wild vibe" better.

▸ Llama-2 is NOT yet at GPT-3.5 level, mainly because of its weak coding abilities. On HumanEval (a standard coding benchmark), it isn't nearly as good as StarCoder or many other models specifically designed for coding. That being said, I have little doubt that Llama-2 will improve significantly thanks to its open weights.

▸ Meta's team goes above and beyond on AI safety issues. In fact, almost half of the paper talks about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts! In prior works, there's a thorny tradeoff between helpfulness and safety. Meta mitigates this by training 2 separate reward models. They aren't open-source yet, but would be extremely valuable to the community.

▸ I think Llama-2 will dramatically boost multimodal AI and robotics research. These fields need more than just blackbox access to an API. So far, we have to convert complex sensory signals (video, audio, 3D perception) to text descriptions and then feed them to an LLM, which is awkward and leads to huge information loss. It'd be much more effective to graft sensory modules directly onto a strong LLM backbone.

▸ The whitepaper itself is a masterpiece. Unlike GPT-4's paper, which shared very little info, Llama-2 spelled out the entire recipe, including model details, training stages, hardware, data pipeline, and annotation process. For example, there's a systematic analysis on the effect of RLHF with nice visualizations. Quote from sec 5.1: "We posit that the superior writing abilities of LLMs, as manifested in surpassing human annotators in certain tasks, are fundamentally driven by RLHF."

Congrats to the team again 🥂! Today is another delightful day in OSS AI.
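For readers unfamiliar with the "win rate" metric mentioned above: over paired human judgments of model A vs model B on the same prompts, it is the fraction of wins, with ties commonly counted as half a win (Meta's exact tie handling may differ; the judgments below are invented):

```python
def win_rate(ratings):
    """ratings: list of 'win' / 'tie' / 'loss' judgments for model A vs model B."""
    score = sum(1.0 if r == "win" else 0.5 if r == "tie" else 0.0 for r in ratings)
    return score / len(ratings)

judgments = ["win", "win", "tie", "loss"]
print(win_rate(judgments))  # 0.625
```

A win rate near 0.5 against GPT-3.5-0301 is what "roughly ties" means in the note above.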
161 replies · 1.1K reposts · 5.4K likes · 1.4M views