mlfeed.tech

22.6K posts


@mlfeedtech

https://t.co/wIcTSovxb3 is your source for the latest artificial intelligence, machine learning, and data science content and analysis.

Joined May 2019
223 Following · 327 Followers
mlfeed.tech reposted
Sumit
Sumit@_reachsumit·
NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA introduces a family of models (Nano, Super, Ultra) using hybrid Mamba-Transformer MoE architecture with up to 1M token context and state-of-the-art reasoning performance. 📝 arxiv.org/abs/2512.20856
3 replies · 57 reposts · 384 likes · 37.9K views
mlfeed.tech reposted
Rohan Paul
Rohan Paul@rohanpaul_ai·
The paper proposes FaithLens, an 8B model that spots when a large language model (LLM) claim is unsupported, and explains why. It makes it much easier and cheaper to catch and explain hallucinated claims before they reach users. Across 12 benchmarks, it beats GPT-4.1 and o3 while running far cheaper.

In many apps, a model is given documents but still invents details; that is a faithfulness hallucination. Most checkers either call a huge judge model or output a bare Yes or No with no reasons. FaithLens takes a document and a claim, then returns both the label and a short explanation that points to the missing or conflicting evidence.

To train it without humans, the authors make synthetic examples using a stronger model, then throw away samples where the label, explanation, or topic variety looks wrong. After that cold-start training, they run reinforcement learning where an explanation only earns credit if it helps a weaker model reach the correct Yes or No.

The takeaway is a practical, low-cost verifier that flags a bad claim and spells out the evidence gap.

----
Paper Link: arxiv.org/abs/2512.20182
Paper Title: "FaithLens: Detecting and Explaining Faithfulness Hallucination"
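The interface the tweet describes (document + claim in, label + explanation out) can be sketched with a trivial lexical heuristic. This is illustrative only: a stand-in for the trained 8B model, with invented example text.

```python
def verify_claim(document: str, claim: str) -> dict:
    """Toy faithfulness check: flag claim content words absent from the document."""
    doc_words = {w.strip(".,") for w in document.lower().split()}
    missing = [w.strip(".,") for w in claim.lower().split()
               if w.strip(".,") not in doc_words and len(w.strip(".,")) > 3]
    if missing:
        return {
            "label": "No",
            "explanation": f"The document contains no support for: {', '.join(missing)}",
        }
    return {"label": "Yes", "explanation": "All claim terms are grounded in the document."}

doc = "The report says revenue grew 12 percent in 2023."
print(verify_claim(doc, "Revenue grew 12 percent in 2023"))   # label: Yes
print(verify_claim(doc, "Revenue declined sharply in 2023"))  # label: No
```

The real model replaces the word-overlap check with learned reasoning, but the input/output contract — a label plus an evidence-gap explanation — is the same.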
6 replies · 22 reposts · 146 likes · 11.1K views
mlfeed.tech reposted
DAIR.AI
DAIR.AI@dair_ai·
The first comprehensive survey on GraphRAG. There is a lot of interest in GraphRAG, so let's discuss why it matters.

RAG has transformed how LLMs access external knowledge. However, traditional RAG treats documents as isolated chunks. It misses the relationships between entities that often matter most for answering complex questions. But real-world knowledge is interconnected. This survey formalizes GraphRAG: retrieval-augmented generation that leverages graph structure instead of flat text.

Graphs capture what text-based RAG cannot. Citation networks encode influence. Knowledge graphs encode relationships. Social networks encode interactions. Semantic similarity alone misses these structural signals.

The framework operates in three stages:
- Graph-Based Indexing: construct or connect to knowledge graphs, whether open sources like Wikidata and ConceptNet or self-constructed from documents.
- Graph-Guided Retrieval: fetch relevant nodes, triplets, paths, or subgraphs based on queries.
- Graph-Enhanced Generation: convert retrieved graph elements into prompts for LLMs.

Benefits of GraphRAG:
- Retrieval granularity matters.
- Nodes provide entity information.
- Triplets capture direct relationships.
- Paths reveal multi-hop reasoning chains.
- Subgraphs offer comprehensive local context.
- Hybrid approaches combine multiple granularities based on query complexity.

The survey covers 200+ papers across downstream tasks, including knowledge base QA, commonsense reasoning, entity linking, fact verification, and dialogue systems. Application domains span e-commerce, biomedicine, academic research, and legal analysis.

As RAG adoption grows, understanding when and how to incorporate graph structure becomes critical. Not every retrieval task needs graphs, but many complex reasoning tasks benefit substantially from explicit relational knowledge.

Paper: dl.acm.org/doi/10.1145/37…

Learn to build effective AI agents and RAG systems in our academy: dair-ai.thinkific.com
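The three stages can be sketched end-to-end with a hand-built triplet store standing in for a real knowledge graph (the entities and facts below are invented for illustration, not from the survey).

```python
triplets = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "married", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

# Stage 1, graph-based indexing: index triplets by the entities they mention.
index = {}
for t in triplets:
    for entity in (t[0], t[2]):
        index.setdefault(entity.lower(), []).append(t)

# Stage 2, graph-guided retrieval: fetch triplets whose entities appear in the query.
def retrieve(query: str):
    hits = []
    for entity, ts in index.items():
        if entity in query.lower():
            hits.extend(t for t in ts if t not in hits)
    return hits

# Stage 3, graph-enhanced generation: linearize retrieved triplets into a prompt.
def build_prompt(query: str) -> str:
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve(query))
    return f"Facts:\n{facts}\n\nQuestion: {query}"

print(build_prompt("Who did Marie Curie marry?"))
```

Real systems retrieve paths and subgraphs too, and use embedding or graph-traversal matching rather than substring lookup, but the index → retrieve → linearize pipeline is the same shape.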
18 replies · 71 reposts · 380 likes · 68.1K views
mlfeed.tech reposted
Chi Wang
Chi Wang@Chi_Wang_·
Imagine if ✨multiple✨ ChatGPT agents could collaborate to solve complex tasks for you! 🧑‍🦱🤝🤖🤖🤖 📢 AutoGen: A new framework for building multi-agent LLM applications aka.ms/autogen-pdf It allows creating many agents that converse to solve complex tasks! ... 1/4
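The core pattern — agents conversing until one signals it is done — can be sketched without the library. This is not the AutoGen API, just a stdlib illustration of the conversation loop it builds around LLM calls (the reply functions below stand in for models).

```python
class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn  # stand-in for an LLM call

    def reply(self, message: str) -> str:
        return self.reply_fn(message)

def run_chat(a, b, opening: str, max_turns: int = 6):
    """Alternate turns between two agents until TERMINATE or the turn limit."""
    history = [(a.name, opening)]
    speaker, listener, msg = b, a, opening
    for _ in range(max_turns):
        msg = speaker.reply(msg)
        history.append((speaker.name, msg))
        if "TERMINATE" in msg:
            break
        speaker, listener = listener, speaker
    return history

solver = Agent("assistant", lambda m: "2+2=4. TERMINATE" if "2+2" in m else "Please clarify.")
user = Agent("user_proxy", lambda m: "Looks right.")
log = run_chat(user, solver, "What is 2+2?")
```

AutoGen generalizes this to many agents, tool execution, and human-in-the-loop proxies, but the message-exchange-until-termination loop is the heart of it.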
7 replies · 55 reposts · 240 likes · 121.3K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
AgentBench: Evaluating LLMs as Agents

Presents a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting.

repo: github.com/THUDM/AgentBen…
abs: arxiv.org/abs/2308.03688
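A multi-turn agent evaluation of this kind boils down to an agent-environment loop: the environment returns an observation, the agent acts on it, and the benchmark scores success within a turn budget. A generic sketch, with an invented toy "guess the number" environment (not one of AgentBench's 8):

```python
class GuessEnv:
    def __init__(self, target: int):
        self.target = target

    def step(self, action: int):
        """Return (observation, done)."""
        if action == self.target:
            return "correct", True
        return ("higher" if action < self.target else "lower"), False

def binary_search_agent(low=0, high=100):
    """Stand-in for an LLM policy: acts, then updates state on the observation."""
    state = {"low": low, "high": high}
    def act(obs):
        if obs == "higher":
            state["low"] = state["last"] + 1
        elif obs == "lower":
            state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return state["last"]
    return act

env, agent = GuessEnv(target=37), binary_search_agent()
obs, done, turns = "start", False, 0
while not done and turns < 10:
    obs, done = env.step(agent(obs))
    turns += 1
print(done, turns)
```

AgentBench swaps the toy environment for OS shells, databases, games, and web tasks, and the hand-coded policy for an LLM, but scores the same loop.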
10 replies · 138 reposts · 593 likes · 115K views
mlfeed.tech reposted
elvis
elvis@omarsar0·
Enabling LLMs with tool-use capabilities is where I am noticing the greatest potential for companies to go big with LLMs. Gorilla is a good popular example, but I have seen a ton of other examples, especially from people building with AI-powered agents.

I also think this is one of the use cases where open LLMs like Llama 2 are going to be extremely useful -- every company will want to tune their models for their own internal APIs.

If you are curious about this space, check out this new paper that enables LLMs to interact with 16,000 real-world APIs. It's more of a framework with all the niceties like data preparation, training, and evaluation (GitHub repo included). The authors also claim that one of their models, ToolLLaMA, has reached the performance of ChatGPT (turbo-16k) in tool use.

Another side note: not sure if it's possible that LLMs can do this natively, although the Llama 2 paper does mention a related emergent behavior. I have been tracking all the research and tools that aim to enable these types of capabilities.

Combining tools and LLMs is nothing new, and we are seeing this across products and even in domains like Robotics and Chemistry. There are significant breakthroughs to be made here, but we are not quite there yet.

(paper and tool in the replies)
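At runtime, frameworks like these wrap the LLM in a dispatch loop: the model emits a structured tool call, the runtime executes it, and the result is fed back. A hedged sketch of that loop — the tool names and call format here are invented, not any specific framework's schema:

```python
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # stand-in for a real API
    "add": lambda a, b: a + b,
}

def dispatch(model_output: str):
    """Parse a JSON tool call like {"tool": "add", "args": [2, 3]} and run it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"Error: unknown tool {call['tool']!r}"
    return fn(*call["args"])

print(dispatch('{"tool": "add", "args": [2, 3]}'))           # 5
print(dispatch('{"tool": "get_weather", "args": ["Paris"]}'))
```

Finetuning (as in ToolLLaMA) teaches the model to emit well-formed calls for thousands of APIs; the dispatch side stays this simple.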
6 replies · 140 reposts · 510 likes · 135.9K views
mlfeed.tech reposted
DAIR.AI
DAIR.AI@dair_ai·
Top ML Papers of the Week (July 24 - July 30):
- RT-2
- LoraHub
- Med-PaLM Multimodal
- Survey of Aligned LLMs
- Foundation Models in Vision
- Universal Adversarial LLM Attacks
...
3 replies · 76 reposts · 416 likes · 113.7K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback Presents PanGu-Coder2, which achieves 62.20% pass@1 on the HumanEval benchmark. arxiv.org/abs/2307.14936
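For context on the headline number: pass@1 comes from the pass@k metric, usually computed with the unbiased estimator from the Codex (HumanEval) paper — with n samples per problem and c of them passing the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This is the standard metric, not anything PanGu-Coder2-specific:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c correct."""
    if n - c < k:
        return 1.0  # fewer than k failures, so any k-subset contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 5 correct, a single draw succeeds half the time.
print(pass_at_k(n=10, c=5, k=1))  # 0.5
```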
4 replies · 44 reposts · 166 likes · 35.9K views
mlfeed.tech reposted
Moritz Laurer
Moritz Laurer@MoritzLaurer·
Microsoft and Tsinghua U. claim to have found the "Successor to Transformer for Large Language Models": RetNet. They claim better language modelling performance, with 3.4x lower memory consumption, 8.4x higher throughput, 15.6x lower latency. 1/2
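The claimed memory and latency wins come from retention's recurrent form: per the RetNet paper, the state updates as S_n = γ·S_{n-1} + k_nᵀv_n and the output is o_n = q_n·S_n, so inference carries O(1) state per step instead of a growing KV cache. A toy single-head sketch (no xPos rotation, no group norm; γ and the vectors below are made up):

```python
def retention_step(S, q, k, v, gamma):
    """S is a d_k x d_v state matrix; returns (new_S, output vector)."""
    d_k, d_v = len(k), len(v)
    # S_n = gamma * S_{n-1} + k_n^T v_n  (decayed outer-product update)
    S = [[gamma * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
    # o_n = q_n S_n
    o = [sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    return S, o

gamma = 0.9
S = [[0.0, 0.0], [0.0, 0.0]]  # d_k = d_v = 2
for q, k, v in [([1, 0], [1, 0], [2, 3]), ([0, 1], [0, 1], [5, 7])]:
    S, o = retention_step(S, q, k, v, gamma)
print(o)
```

The parallel (training-time) form computes the same outputs attention-style; the equivalence between the two forms is the paper's central trick.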
16 replies · 164 reposts · 987 likes · 221.9K views
mlfeed.tech reposted
Yam Peleg
Yam Peleg@Yampeleg·
The largest dialog dataset collection just dropped: DialogStudio from Salesforce.

TL;DR:
- Merged data from 87 datasets.
- Evaluated & filtered each sample by multiple criteria [1].
- Ended up with a HUGE high-quality conversational dataset.

---
Huggingface Dataset: huggingface.co/datasets/Sales…
Github: github.com/salesforce/Dia…
Paper: arxiv.org/abs/2307.10172
---

The conversations in the dataset are categorized into multiple categories:
- Knowledge-Grounded-Dialogues
- Natural-Language-Understanding
- Open-Domain-Dialogues
- Task-Oriented-Dialogues
- Dialogue-Summarization
- Conversational-Recommendation-Dialogs

Really cool and useful work. I just wish I had enough compute to train on all of these datasets.

---
[1] Understanding, Relevance, Correctness, Coherence, Completeness, and Overall Quality.
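The multi-criteria filtering in [1] amounts to keeping a sample only if it scores well on every axis. A sketch of that idea — the scores, threshold, and sample format below are invented, not DialogStudio's actual pipeline:

```python
CRITERIA = ["understanding", "relevance", "correctness",
            "coherence", "completeness", "overall"]

def keep(sample: dict, min_score: int = 4) -> bool:
    """Keep a sample only if every criterion scores at least min_score (1-5)."""
    return all(sample["scores"].get(c, 0) >= min_score for c in CRITERIA)

samples = [
    {"text": "good dialog",  "scores": dict.fromkeys(CRITERIA, 5)},
    {"text": "noisy dialog", "scores": {**dict.fromkeys(CRITERIA, 5), "relevance": 2}},
]
filtered = [s for s in samples if keep(s)]
print(len(filtered))  # 1
```

Requiring all criteria to pass (rather than averaging) is what lets one bad axis, like relevance, sink an otherwise fluent sample.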
11 replies · 135 reposts · 543 likes · 110.2K views
mlfeed.tech reposted
Sanyam Bhutani
Sanyam Bhutani@bhutanisanyam1·
Easily the best paper on the current state of LLMs! 🙏 A 50-page read, but it's not "just another" survey paper that only documents facts. The authors actually add very useful commentary capturing all aspects of building Large Language Models, so the result is a collection of ideas we might have missed across months of research.

It covers both building LLMs and effectively applying them to domains, with a focus on current limitations and "sharp edges". As always, I think great content makes you discover missing bits in your knowledge; for this reason it's a solid cover-to-cover read recommendation: arxiv.org/abs/2307.10169
38 replies · 305 reposts · 1.9K likes · 313.7K views
mlfeed.tech reposted
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
Meta-Transformer: A Unified Framework for Multimodal Learning The first framework to perform unified learning across 12 modalities with unpaired data arxiv.org/abs/2307.10802
12 replies · 127 reposts · 503 likes · 76.7K views
mlfeed.tech reposted
Arvind Narayanan
Arvind Narayanan@random_walker·
We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning. w/ @sayashk aisnakeoil.com/p/is-gpt-4-get…
28 replies · 195 reposts · 993 likes · 580.9K views
mlfeed.tech reposted
Leandro von Werra
Leandro von Werra@lvwerra·
Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4-bit quantization and PEFT!

Learn more: huggingface.co/docs/trl/main/…
Full script: github.com/lvwerra/trl/bl…
16 replies · 268 reposts · 1.3K likes · 192.3K views
mlfeed.tech reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
A good, slightly obscure tip: applications can benefit from custom supervised finetuning of embeddings returned by APIs. Collect a few examples of +ve (and optionally hard -ve) pairs, use them to train a linear projection that better discriminates your pairs.
Sergey Karayev@sergeykarayev

Broke: using OpenAI embeddings as-is. Bespoke: learning an embedding projection from human judgements. OpenAI explains that this will "better emphasize aspects of the text relevant to your use case. In binary classification use cases, we've seen error rates drop by ≤ 50%."
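The tip above can be sketched end to end: fix the API embeddings, then train only a small projection matrix so positive pairs land closer together than negative pairs. A toy pure-Python version with numerical-gradient descent — the vectors, margin, and learning rate are all made up for illustration:

```python
def project(W, x):
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loss(W, positives, negatives, margin=1.0):
    # Pull positive pairs together; push negatives at least `margin` apart.
    pull = sum(dist2(project(W, a), project(W, b)) for a, b in positives)
    push = sum(max(0.0, margin - dist2(project(W, a), project(W, b)))
               for a, b in negatives)
    return pull + push

positives = [([1.0, 0.2], [0.9, 0.3])]   # pair that should match
negatives = [([1.0, 0.2], [-0.8, 1.0])]  # pair that should not
W = [[1.0, 0.0], [0.0, 1.0]]             # start from the identity projection

lr, eps = 0.1, 1e-4
initial = loss(W, positives, negatives)
for _ in range(200):  # numerical-gradient SGD, fine at this tiny size
    for i in range(2):
        for j in range(2):
            W[i][j] += eps
            up = loss(W, positives, negatives)
            W[i][j] -= 2 * eps
            down = loss(W, positives, negatives)
            W[i][j] += eps
            W[i][j] -= lr * (up - down) / (2 * eps)
final = loss(W, positives, negatives)
```

In practice you would use real labeled pairs and an autodiff framework, but the key design choice survives: the base embeddings stay frozen, and only the cheap linear layer is tuned to your task.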

17 replies · 91 reposts · 831 likes · 264.6K views
mlfeed.tech reposted
Shunyu Yao
Shunyu Yao@ShunyuYao12·
Write a sentence with "dog, frisbee, catch, throw" 👉Too easy for 7B LM... Will (constrained) text generation (CTG) "die out" like many other NLP tasks, in face of LLM? 👉Excited to introduce 🐕COLLIE, next-gen CTG that even challenges GPT-4! collie-benchmark.github.io (1/n)
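The example constraint in the tweet ("write a sentence with these words") is trivially checkable, which is what makes it gradeable. A minimal checker for that lexical constraint — COLLIE's real constraints are richer (positions, counts, and compositions of them), so this is just the simplest case:

```python
def satisfies(sentence: str, required_words) -> bool:
    """True iff every required word appears in the sentence (punctuation-tolerant)."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return all(w.lower() in words for w in required_words)

required = ["dog", "frisbee", "catch", "throw"]
print(satisfies("I throw the frisbee and my dog runs to catch it.", required))  # True
print(satisfies("My dog loves the frisbee.", required))                         # False
```

Programmatic checkability is the point: harder COLLIE constraints stay automatically verifiable even when satisfying them challenges GPT-4.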
2 replies · 13 reposts · 95 likes · 20.7K views
mlfeed.tech reposted
Jim Fan
Jim Fan@DrJimFan·
You'll soon see lots of "Llama just dethroned ChatGPT" or "OpenAI is so done" posts on Twitter. Before your timeline gets flooded, I'll share my notes:

▸ Llama-2 likely costs $20M+ to train. Meta has done an incredible service to the community by releasing the model with a commercially-friendly license. AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower.

▸ Meta's team did a human study on 4K prompts to evaluate Llama-2's helpfulness. They use "win rate" as a metric to compare models, in similar spirit to the Vicuna benchmark. The 70B model roughly ties with GPT-3.5-0301 and performs noticeably stronger than Falcon, MPT, and Vicuna. I trust these real human ratings more than academic benchmarks, because they typically capture the "in-the-wild vibe" better.

▸ Llama-2 is NOT yet at GPT-3.5 level, mainly because of its weak coding abilities. On HumanEval (a standard coding benchmark), it isn't nearly as good as StarCoder or many other models specifically designed for coding. That being said, I have little doubt that Llama-2 will improve significantly thanks to its open weights.

▸ Meta's team goes above and beyond on AI safety issues. In fact, almost half of the paper talks about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts! In prior works, there's a thorny tradeoff between helpfulness and safety. Meta mitigates this by training 2 separate reward models. They aren't open-source yet, but would be extremely valuable to the community.

▸ I think Llama-2 will dramatically boost multimodal AI and robotics research. These fields need more than just blackbox access to an API. So far, we have to convert complex sensory signals (video, audio, 3D perception) to text descriptions and then feed them to an LLM, which is awkward and leads to huge information loss. It'd be much more effective to graft sensory modules directly onto a strong LLM backbone.

▸ The whitepaper itself is a masterpiece. Unlike GPT-4's paper, which shared very little info, Llama-2 spelled out the entire recipe, including model details, training stages, hardware, data pipeline, and annotation process. For example, there's a systematic analysis on the effect of RLHF with nice visualizations. Quote from sec 5.1: "We posit that the superior writing abilities of LLMs, as manifested in surpassing human annotators in certain tasks, are fundamentally driven by RLHF."

Congrats to the team again 🥂! Today is another delightful day in OSS AI.
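For readers unfamiliar with the "win rate" metric mentioned above: over paired human judgments of model A vs model B on the same prompts, it is the fraction of wins, with ties commonly counted as half a win (Meta's exact tie handling may differ; the judgments below are invented):

```python
def win_rate(ratings):
    """ratings: list of 'win' / 'tie' / 'loss' judgments for model A vs model B."""
    score = sum(1.0 if r == "win" else 0.5 if r == "tie" else 0.0 for r in ratings)
    return score / len(ratings)

judgments = ["win", "win", "tie", "loss"]
print(win_rate(judgments))  # 0.625
```

A win rate near 0.5 against GPT-3.5-0301 is what "roughly ties" means in the note above.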
161 replies · 1.1K reposts · 5.4K likes · 1.4M views