Hugging Face

13.5K posts

@huggingface

The AI community building the future. https://t.co/TpiXQMQ9rZ

NYC and Paris and 🌏 · Joined September 2016
222 Following · 689.9K Followers

Hugging Face retweeted
NVIDIA AI (@NVIDIAAI)
Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion language models that take a different approach, generating multiple tokens in parallel within a single model. Rather than committing to each token permanently, these models can revise as they go, resulting in faster inference that better utilizes modern GPUs. The full model family ranges from 3B to 14B, including vision-language variants. Available now: nvda.ws/4tEnTxP
Replies: 28 · Reposts: 141 · Likes: 914 · Views: 58.7K

Hugging Face retweeted
Shubham Sharma (@HappyyPablo)
Open-sourcing Marlin-2B 🐟, a tiny VLM to extract structured information from videos.

Marlin is fine-tuned for two questions devs want to ask about their videos: what is happening, and when?

Best open model in its weight class, competitive with Gemini-2.5-flash at only 2B params. 🧵
Replies: 89 · Reposts: 345 · Likes: 2.8K · Views: 150.1K

Hugging Face retweeted
Caleb (@calebfahlgren)
As more decisions get made with a coding agent, we should probably be centralizing those traces somewhere. At @huggingface we just throw them in a bucket. Wrote a bit about why. huggingface.co/blog/huggingfa…
Replies: 3 · Reposts: 5 · Likes: 21 · Views: 7.3K

Hugging Face retweeted
Leandro von Werra (@lvwerra)
We are releasing Carbon: a crazy fast DNA model.

Carbon is 275x faster than the next best model. So fast you can process the whole human genome on a single GPU in <2 days. Here are the tricks we used:

When modelling DNA sequences, a lot of the performance comes down to tokenizing the sequences in a smart way. BPE tokenizers struggle because there is no whitespace, and character-level tokenizers (a character is called a base in DNA) waste a lot of compute on too many tokens.

Carbon is built with a unique tokenizer: we split sequences into chunks of 6 bases, but during both training and inference we can work at single-base resolution. That's similar to having word tokens but resolving them at the character level. All possible thanks to the DNA tokens' unique structure.

The architecture combined with the tokenizer makes the model 275x faster than the previous SoTA (Evo2) at this size.

We built an interactive demo so you can explore how the model can generate DNA sequences, investigate the structure of genes, predict the effect of mutations, generate and fold proteins and even reconstruct parts of the tree of life. huggingface.co/spaces/Hugging…
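
To make the chunking idea concrete, here is a minimal sketch of a deterministic 6-base tokenizer whose token ids can be mapped back to individual bases. It is not Carbon's actual tokenizer: the names `VOCAB`, `encode`, and `token_to_bases` are hypothetical, the id layout (a base-4 number over A, C, G, T) is an assumption, and sequences whose length is not a multiple of 6 are simply rejected because the post does not say how they are handled.

```python
from itertools import product

BASES = "ACGT"
K = 6

# Deterministic vocabulary: every possible 6-mer gets a fixed id (4**6 = 4096 tokens).
# The ordering is an assumption: ids follow the base-4 value of the chunk.
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}


def encode(seq):
    """Split a DNA string into non-overlapping 6-base chunks and map each to its id."""
    if len(seq) % K != 0:
        raise ValueError("this sketch only handles lengths divisible by 6")
    return [VOCAB[seq[i:i + K]] for i in range(0, len(seq), K)]


def token_to_bases(token_id):
    """Recover the six bases of a token by reading its id as a base-4 number."""
    bases = []
    for _ in range(K):
        token_id, digit = divmod(token_id, 4)
        bases.append(BASES[digit])  # least significant digit is the last base
    return "".join(reversed(bases))


ids = encode("ACGTACTTTTTT")
recovered = [token_to_bases(i) for i in ids]
```

Because every token id decodes to exactly six bases, a model working on 6-base tokens can still be read out one base at a time, which is the property the post attributes to the structure of DNA tokens.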
Replies: 49 · Reposts: 217 · Likes: 1.4K · Views: 172K

Hugging Face retweeted
Loubna Ben Allal (@LoubnaBenAllal1)
Introducing Carbon 🧬 a family of open generative DNA foundation models.

Carbon-3B matches Evo2-7B while running 250x faster at inference. It can generate new DNA sequences and score the functional impact of mutations, zero-shot.

We borrowed a lot from how modern LLMs are trained, but DNA isn't language. Genomes are noisy, redundant, and shaped by evolution rather than communication. So we adjusted the recipe:

Tokenizer. Most genomic models tokenize at the nucleotide/character level, which blows up sequence length. BPE is the obvious LLM-style fix, but it doesn't behave well on DNA. We use deterministic 6-mer tokens (one token = 6 nucleotides): 6× shorter sequences and cheaper attention.

Training loss. With 6-mer tokens, cross-entropy scores a prediction that gets 5/6 nucleotides right the same as one that's completely wrong. This gets brittle late in training and produces loss spikes. We switch mid-training to a more flexible factorized loss (FNS).

Data. Genomes are mostly sparse, repetitive background. We curate down to a staged functional DNA + mRNA mixture, with every ratio chosen by ablation, like mixing a web corpus, but for biology.

We're releasing the models, training data, training code, evaluation suite, and a demo to play with.

More details in the technical report: github.com/huggingface/ca…

Demo to play with the model, with a biology primer for our ML friends ;) huggingface.co/spaces/Hugging…
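
The point about cross-entropy on 6-mer tokens can be illustrated with a toy calculation. The sketch below is not the FNS loss from the post, whose definition is not given here. It compares ordinary token-level negative log-likelihood with a generic per-position loss computed from per-base marginals, which is one assumed way of factorizing over nucleotides. The `peaked` distribution and all function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
K = 6
KMERS = ["".join(p) for p in product(BASES, repeat=K)]


def peaked(predicted, confidence=0.97):
    """Toy model output: most mass on one 6-mer, the rest spread uniformly."""
    rest = (1.0 - confidence) / (len(KMERS) - 1)
    return {kmer: (confidence if kmer == predicted else rest) for kmer in KMERS}


def token_nll(dist, target):
    """Standard cross-entropy on the whole 6-mer token."""
    return -math.log(dist[target])


def per_base_nll(dist, target):
    """Sum of per-position losses, each using the marginal over one base position."""
    total = 0.0
    for pos, base in enumerate(target):
        marginal = sum(p for kmer, p in dist.items() if kmer[pos] == base)
        total += -math.log(marginal)
    return total


target = "ACGTAC"
near_miss = peaked("ACGTAG")  # 5 of 6 bases correct
far_miss = peaked("TGCATG")   # 0 of 6 bases correct

print(token_nll(near_miss, target), token_nll(far_miss, target))
print(per_base_nll(near_miss, target), per_base_nll(far_miss, target))
```

In this toy setup the token-level loss is identical for the near miss and the far miss, while the per-position loss is much lower for the near miss, which is the kind of partial credit a factorized loss can provide.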
Replies: 14 · Reposts: 71 · Likes: 304 · Views: 29.1K

Hugging Face retweeted
tomaarsen (@tomaarsen)
🤗 Announcing the Ettin Reranker family: six new CrossEncoder rerankers from 17M to 1B parameters, state-of-the-art at their respective sizes. Built on the Ettin ModernBERT encoders, with the full training recipe and ~143M-triple training dataset as well. 🧵
Replies: 10 · Reposts: 27 · Likes: 134 · Views: 19.9K

Hugging Face retweeted
Jeff Boudier 🤗 (@jeffboudier)
"We give you model choice, without infrastructure chaos" — @MichaelDell, live from #DellTechWorld 🎤 Kimi K2.6, DeepSeek V4 Pro, GLM 5.1, MiniMax M2.7 & DeepSeek V4 Flash are now one click away on Dell Enterprise Hub, optimized for PowerEdge XE9780 with NVIDIA B300. dell.hf.co
Replies: 13 · Reposts: 29 · Likes: 159 · Views: 91.7K

Hugging Face retweeted
Alvaro Bartolome (@alvarobartt)
Latest `hf-mem` now breaks down Mixture-of-Experts (MoE) memory estimations into base weights, routed experts, and KV cache. Useful for reasoning about residency footprint and serving trade-offs before picking a parallelism strategy for inference. More in the thread 🧵
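
As a rough illustration of what such a three-way breakdown involves, here is a back-of-the-envelope sketch. It is not `hf-mem`'s implementation or API: the function name, its parameters, and the example numbers are all hypothetical, and the KV cache formula assumes standard multi-head or grouped-query attention with one K and one V tensor per layer.

```python
def estimate_memory(base_params, routed_expert_params, num_layers, num_kv_heads,
                    head_dim, seq_len, batch_size=1,
                    bytes_per_weight=2, bytes_per_kv=2):
    """Return a rough byte count for base weights, routed experts, and KV cache."""
    base_bytes = base_params * bytes_per_weight
    expert_bytes = routed_expert_params * bytes_per_weight
    # Factor 2 covers keys and values; one entry per layer, KV head, and position.
    kv_bytes = (2 * num_layers * num_kv_heads * head_dim
                * seq_len * batch_size * bytes_per_kv)
    return {
        "base_weights": base_bytes,
        "routed_experts": expert_bytes,
        "kv_cache": kv_bytes,
        "total": base_bytes + expert_bytes + kv_bytes,
    }


# Hypothetical model: 3B shared parameters, 27B routed-expert parameters,
# 16-bit weights and cache, one 4096-token sequence.
breakdown = estimate_memory(
    base_params=3_000_000_000,
    routed_expert_params=27_000_000_000,
    num_layers=32,
    num_kv_heads=8,
    head_dim=128,
    seq_len=4096,
)
for name, size in breakdown.items():
    print(f"{name}: {size / 2**30:.2f} GiB")
```

Separating the routed experts from the base weights matters because the experts usually dominate the footprint, while the KV cache is the part that grows with context length and batch size.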
Replies: 2 · Reposts: 10 · Likes: 51 · Views: 11.5K

Hugging Face retweeted
Niels Rogge (@NielsRogge)
Introducing a revival of PapersWithCode!

As @ilyasut said, we're back to the "age of research". Hence, it's important to share research and build on each other's work.

> find SOTA per domain, not just LLMs
> leaderboards
> methods
> all parsed at scale using AI agents.
Replies: 31 · Reposts: 84 · Likes: 578 · Views: 59.1K

Hugging Face retweeted
Victor M (@victormustar)
llama.cpp with MTP support makes local models fast enough to use as daily drivers 🚀

Qwen3.6-27B dense generation (on A10G): from 25 tok/s → 45 tok/s (+78%).

Two flags on llama-server:
--spec-type draft-mtp --spec-draft-n-max 2

Quoting Georgi Gerganov (@ggerganov):
llama.cpp adds MTP for the Qwen3.6 family. This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! github.com/ggml-org/llama…

Replies: 42 · Reposts: 123 · Likes: 1.2K · Views: 161.3K

Hugging Face retweeted
Georgi Gerganov (@ggerganov)
llama.cpp adds MTP for the Qwen3.6 family. This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! github.com/ggml-org/llama…
Replies: 48 · Reposts: 181 · Likes: 1.2K · Views: 257K

Hugging Face retweeted
clem 🤗 (@ClementDelangue)
I believe on-prem and local AI - based on @huggingface open-source models - will be an important answer to the GPU shortages this year (because they are cheaper, faster, safer than cloud APIs)! Great collaboration between @huggingface & @MichaelDell @Dell to make this a reality for enterprise today. Announced at the main keynote of Dell Technologies World.
Replies: 51 · Reposts: 54 · Likes: 441 · Views: 71.5K

Hugging Face retweeted
clem 🤗 (@ClementDelangue)
Very cool to see Cursor doubling down on training great models. In my opinion, ultimately all serious companies in AI will want to train models themselves, based on open-source instead of outsourcing AI to others via APIs!

Quoting Cursor (@cursor_ai):
Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.

Replies: 18 · Reposts: 31 · Likes: 235 · Views: 25.7K

Hugging Face retweeted
clem 🤗 (@ClementDelangue)
Reachy mini rap battle at @aiDotEngineer Singapore!
Replies: 7 · Reposts: 13 · Likes: 100 · Views: 23K

Hugging Face retweeted
Xuan-Son Nguyen (@ngxson)
Qwen3.6-27B running 100% on WebGPU. Not the best speed but still 😁
Replies: 11 · Reposts: 7 · Likes: 133 · Views: 52.3K

Hugging Face retweeted
Kyle Hessling (@KyleHessling1)
Hello again, everyone! We've got another really fun 9B, this one specifically trained for tool calling and agentic coding workflows in the @NousResearch Hermes agent. Happy to report that it crushes, and as a 9B it runs on super affordable hardware.

We also hit this one with some coding domain-specific training, and it scored 53.33% on SWE-bench on a slice of 200 samples! I was really shocked to see this high a score from a 9B model on SWE-bench. Correct me if I'm wrong, but I think that's nipping at the heels of the Gemma 4 series, which are much larger models, on this particular benchmark, which is really incredible to see! It also crushes the HermesAgent-20 benchmark, scoring 85 vs. the base model's 71!

Make sure to run it hot: --temp around 1 seems to be the sweet spot for running these particular fine-tunes in harnesses. If you have trouble, you can work your way down, but it does a much better job of departing from the base model's overthinking when you run it at a high temp (~1).

Please spin it up in Hermes and let us know your thoughts! Looking forward to hearing your feedback as always!

Also, for those of you waiting for Qwopus 3.6 27B: I have put together a preliminary evaluation for you in my HF repo, go check it out; we will be releasing the full model very soon! I will put the preliminary repo in the comments! huggingface.co/Jackrong/Qwopu…
Replies: 70 · Reposts: 146 · Likes: 1.5K · Views: 117K

Hugging Face retweeted
Erik Kaunismäki (@ErikKaum)
Releasing my first kernel on @huggingface: MaxSim.

Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids it by using tiled scoring with simdgroup_matrix (Metal) and WMMA.

The result is a 3–5× speedup compared to naive PyTorch. Try it out 👇
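
For readers unfamiliar with MaxSim, the following pure-Python sketch contrasts the naive approach, which builds the full query-by-document similarity matrix, with a tiled variant that keeps only a running maximum per query token. It illustrates the scoring rule only and says nothing about how the actual Metal or WMMA kernel is written. The function names and the `tile` parameter are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_naive(query, doc):
    """Materialize the full |query| x |doc| similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query, doc, tile=2):
    """Stream over document tokens in tiles, keeping one running max per query token."""
    best = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            for d in block:
                score = dot(q, d)
                if score > best[i]:
                    best[i] = score
    return sum(best)


# Toy token embeddings: 2 query tokens, 3 document tokens, dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

print(maxsim_naive(query, doc), maxsim_tiled(query, doc))
```

Both functions return the same score. The tiled version needs memory proportional to the number of query tokens rather than to the full matrix, which is the bottleneck the post describes.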
Replies: 17 · Reposts: 42 · Likes: 351 · Views: 40.2K