Hugging Face (@huggingface) - Twitter Profile

Hugging Face retweeted

NVIDIA AI@NVIDIAAI·11h

Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion language models that take a different approach, generating multiple tokens in parallel within a single model. Rather than committing to each token permanently, these models can revise as they go, resulting in faster inference that better utilizes modern GPUs. The full model family ranges from 3B to 14B, including vision-language variants. Available now: nvda.ws/4tEnTxP

28 replies · 141 retweets · 914 likes · 58.7K views

Hugging Face retweeted

Adithya S K@adithya_s_k·18h

Wake up ppl Huggingface just open sourced Genomic Foundational Models

Leandro von Werra@lvwerra

We are releasing Carbon: a crazy fast DNA model. Carbon is 275x faster than the next best model. So fast you can process the whole human genome on a single GPU in <2 days. Here are the tricks we used:

When modelling DNA sequences, a lot of the performance comes down to tokenizing the sequences in a smart way. BPE tokenizers struggle because there are no whitespaces, and character-level (called base-level in DNA) tokenizers waste a lot of compute on too many tokens.

Carbon is built with a unique tokenizer: we split sequences into chunks of 6 bases, but during both training and inference we can work at single-base resolution. That's similar to having word tokens but resolving them at the character level. All possible thanks to the DNA tokens' unique structure.

The architecture combined with the tokenizer makes the model 275x faster than the previous SoTA (Evo2) at this size.

We built an interactive demo so you can explore how the model can generate DNA sequences, investigate the structure of genes, predict the effect of mutations, generate and fold proteins, and even reconstruct parts of the tree of life. huggingface.co/spaces/Hugging…

0 replies · 18 retweets · 225 likes · 27K views
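The chunking step described in the post above (splitting a sequence into chunks of 6 bases) can be sketched roughly as follows. This is a minimal illustration, not Carbon's actual tokenizer: the vocabulary ordering, the `encode`/`decode` names, the uppercase A/C/G/T alphabet, and the choice to drop a trailing partial chunk are all assumptions made for the example.

```python
from itertools import product

K = 6            # bases per token, as stated in the post
BASES = "ACGT"   # assumed alphabet; real genomes also contain N and other codes

# Deterministic vocabulary: every possible 6-mer gets a fixed ID (4**6 = 4096 entries).
VOCAB = {"".join(kmer): idx for idx, kmer in enumerate(product(BASES, repeat=K))}
ID_TO_KMER = {idx: kmer for kmer, idx in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Split a DNA string into non-overlapping 6-mers and map each one to its ID.

    Assumption: trailing bases that do not fill a whole 6-mer are dropped;
    the post does not say how Carbon handles them.
    """
    usable = len(sequence) - len(sequence) % K
    return [VOCAB[sequence[i:i + K]] for i in range(0, usable, K)]


def decode(token_ids: list[int]) -> str:
    """Map token IDs back to bases; each token expands to exactly 6 characters."""
    return "".join(ID_TO_KMER[t] for t in token_ids)


seq = "ACGTACGTTTGACCAAGT"   # 18 bases
tokens = encode(seq)
print(len(seq), "bases ->", len(tokens), "tokens")
print(decode(tokens) == seq)
```

Because every token expands to a fixed 6 characters, the token sequence is 6 times shorter than the base sequence while each base stays recoverable, which is one plausible reading of "single-base resolution".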

Hugging Face retweeted

Shubham Sharma@HappyyPablo·14h

open sourcing Marlin-2B 🐟 a tiny VLM to extract structured information from videos Marlin is finetuned for two questions devs want to ask in their videos: what is happening, and when? Best open model in its weight class, competitive with Gemini-2.5-flash at only 2B params 🧵

89 replies · 345 retweets · 2.8K likes · 150.1K views

Hugging Face retweeted

Caleb@calebfahlgren·13h

As more decisions get made with a coding agent, we should probably be centralizing those traces somewhere. at @huggingface we just throw them in a bucket. wrote a bit about why. huggingface.co/blog/huggingfa…

3 replies · 5 retweets · 21 likes · 7.3K views

Hugging Face retweeted

Loubna Ben Allal@LoubnaBenAllal1·18h

Introducing Carbon 🧬 a family of open generative DNA foundation models. Carbon-3B matches Evo2-7B while running 250x faster at inference. It can generate new DNA sequences and score the functional impact of mutations, zero-shot.

We borrowed a lot from how modern LLMs are trained, but DNA isn't language. Genomes are noisy, redundant, and shaped by evolution rather than communication. So we adjusted the recipe:

Tokenizer. Most genomic models tokenize at the nucleotide/character level, which blows up sequence length. BPE is the obvious LLM-style fix, but it doesn't behave well on DNA. We use deterministic 6-mer tokens (one token = 6 nucleotides): 6× shorter sequences and cheaper attention.

Training loss. With 6-mer tokens, cross-entropy scores a prediction that gets 5/6 nucleotides right the same as one that's completely wrong. This gets brittle late in training and produces loss spikes. We switch mid-training to a more flexible factorized loss (FNS).

Data. Genomes are mostly sparse, repetitive background. We curate down to a staged functional DNA + mRNA mixture, with every ratio chosen by ablation, like mixing a web corpus, but for biology.

We're releasing the models, training data, training code, evaluation suite, and a demo to play with. More details in the technical report: github.com/huggingface/ca…

Demo to play with the model, with a biology primer for our ML friends ;) huggingface.co/spaces/Hugging…

14 replies · 71 retweets · 304 likes · 29.1K views
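The training-loss point in the post above (a prediction with 5 of 6 nucleotides right is treated like one that is completely wrong) can be illustrated with a toy comparison. This is not the FNS loss, whose definition the post does not give, and it uses exact match as a simple stand-in for a token-level objective. The `token_match` and `base_match` helpers are hypothetical names introduced only for this sketch.

```python
def token_match(pred: str, target: str) -> float:
    """Token-level view: credit only if the whole 6-mer is exactly right."""
    return 1.0 if pred == target else 0.0


def base_match(pred: str, target: str) -> float:
    """Per-nucleotide view: fraction of positions where the bases agree."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


target = "ACGTAC"
near_miss = "ACGTAG"   # 5 of 6 bases correct
far_miss = "TGCATG"    # 0 of 6 bases correct

for name, pred in [("near miss", near_miss), ("far miss", far_miss)]:
    print(name, token_match(pred, target), round(base_match(pred, target), 3))
```

The token-level score cannot tell the two predictions apart, while a score that factorizes over positions can. That is the general motivation for giving partial credit, whatever form the actual loss takes.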

Hugging Face retweeted

tomaarsen@tomaarsen·21h

🤗 Announcing the Ettin Reranker family: six new CrossEncoder rerankers from 17M to 1B parameters, state-of-the-art at their respective sizes. Built on the Ettin ModernBERT encoders, with the full training recipe and ~143M-triple training dataset as well. 🧵

10 replies · 27 retweets · 134 likes · 19.9K views

Hugging Face retweeted

Jeff Boudier 🤗@jeffboudier·1d

"We give you model choice, without infrastructure chaos" — @MichaelDell, live from #DellTechWorld 🎤 Kimi K2.6, DeepSeek V4 Pro, GLM 5.1, MiniMax M2.7 & DeepSeek V4 Flash are now one click away on Dell Enterprise Hub, optimized for PowerEdge XE9780 with NVIDIA B300. dell.hf.co

13 replies · 29 retweets · 159 likes · 91.7K views

Hugging Face retweeted

Alvaro Bartolome@alvarobartt·1d

Latest `hf-mem` now breaks down Mixture-of-Experts (MoE) memory estimations into base weights, routed experts, and KV cache. Useful for reasoning about residency footprint and serving trade-offs before picking a parallelism strategy for inference. More in the thread 🧵

2 replies · 10 retweets · 51 likes · 11.5K views
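The three-way split mentioned in the post above (base weights, routed experts, KV cache) can be approximated with back-of-the-envelope arithmetic. The sketch below is not `hf-mem`'s implementation or API. It assumes a standard attention KV cache layout (one key and one value tensor per layer, no sliding window or latent compression) and 2 bytes per value, and every function name and configuration number in it is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes for a standard attention KV cache (assumed layout, not hf-mem's code)."""
    # Factor 2: one key tensor and one value tensor per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def moe_memory_breakdown(base_params, params_per_expert, experts_per_layer,
                         num_moe_layers, kv_bytes, bytes_per_param=2):
    """Split an MoE memory estimate into the three parts named in the post."""
    return {
        "base_weights": base_params * bytes_per_param,
        "routed_experts": (params_per_expert * experts_per_layer
                           * num_moe_layers * bytes_per_param),
        "kv_cache": kv_bytes,
    }


# Hypothetical configuration, chosen only to make the arithmetic easy to follow.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
breakdown = moe_memory_breakdown(
    base_params=2_000_000_000, params_per_expert=50_000_000,
    experts_per_layer=64, num_moe_layers=32, kv_bytes=kv)

for part, size in breakdown.items():
    print(f"{part:>15}: {size / 1024**3:.2f} GiB")
```

In this made-up configuration the routed experts dominate the footprint, which shows why a per-component breakdown can matter when deciding what has to stay resident on each device.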

Hugging Face retweeted

Niels Rogge@NielsRogge·1d

Introducing a revival of PapersWithCode! As @ilyasut said, we're back to the "age of research". Hence, it's important to share research and build on each other's work.

> find SOTA per domain, not just LLMs
> leaderboards
> methods
> all parsed at scale using AI agents.

31 replies · 84 retweets · 578 likes · 59.1K views

Hugging Face retweeted

Victor M@victormustar·1d

llama.cpp with MTP support makes local models fast enough to use as daily drivers 🚀 Qwen3.6-27B dense generation (on A10G): From 25 tok/s → 45 tok/s (+78%). Two flags on llama-server: --spec-type draft-mtp --spec-draft-n-max 2

Georgi Gerganov@ggerganov

llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! github.com/ggml-org/llama…

42 replies · 123 retweets · 1.2K likes · 161.3K views

Hugging Face retweeted

clem 🤗@ClementDelangue·1d

I believe on-prem and local AI - based on @huggingface open-source models - will be an important answer to the GPU shortages this year (because they are cheaper, faster, safer than cloud APIs)! Great collaboration between @huggingface & @MichaelDell @Dell to make this a reality for enterprise today. Announced at the main keynote of Dell Technologies World.

51 replies · 54 retweets · 441 likes · 71.5K views

Hugging Face retweeted

clem 🤗@ClementDelangue·1d

Very cool to see Cursor doubling down on training great models. In my opinion, ultimately all serious companies in AI will want to train models themselves, based on open-source instead of outsourcing AI to others via APIs!

Cursor@cursor_ai

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.

18 replies · 31 retweets · 235 likes · 25.7K views

Hugging Face retweeted

clem 🤗@ClementDelangue·1d

Reachy mini rap battle at @aiDotEngineer Singapore!

7 replies · 13 retweets · 100 likes · 23K views

Hugging Face retweeted

Xuan-Son Nguyen@ngxson·2d

Qwen3.6-27B running 100% on WebGPU. Not the best speed but still 😁

11 replies · 7 retweets · 133 likes · 52.3K views

Hugging Face retweeted

AVB@neural_avb·2d

I am working on porting SAM models and harness into Apple silicon. Already seeing 1.25x inference speed increase on mlx with the sam2.1-small model. Quantized versions soon. Repo: github.com/avbiswas/sam2-… Model: huggingface.co/avbiswas/sam2.…

6 replies · 40 retweets · 403 likes · 38.1K views

Hugging Face retweeted

Motoshi@waqasraza·2d

Reachy Mini is Assembled and Ready Shout out to @huggingface and @pollenrobotics #reachymini #Ai

3 replies · 9 retweets · 80 likes · 23.4K views

Hugging Face retweeted

Kyle Hessling@KyleHessling1·3d

Hello again, everyone! We've got another really fun 9B, this one specifically trained for tool calling and agentic coding workflows in @NousResearch Hermes agent. Happy to report that it crushes, and as a 9B it runs on super affordable hardware.

We also hit this one with some coding domain-specific training, and it scored 53.33% on SWE bench on a slice of 200 samples! I was really shocked to see this high a score from a 9B model on SWE. Correct me if I'm wrong, but I think that's nipping at the heels of the Gemma 4 series, much larger models, on this particular benchmark, which is really incredible to see! It also crushes the HermesAgent-20 benchmark, scoring an 85 vs the base model's 71!

Make sure to run it hot, --temp around 1; that seems to be the sweet spot for running these particular fine-tunes in harnesses. If you have trouble, you can work your way down, but it does a much better job of departing from the base model's overthinking when you run it at a high temp, ~1.

Please spin it up in Hermes and let us know your thoughts! Looking forward to hearing your feedback as always!

Also, those of you waiting for Qwopus 3.6 27B, I have put together a preliminary evaluation for you in my HF repo, go check it out; we will be releasing the full model very soon! I will put the preliminary repo in the comments! huggingface.co/Jackrong/Qwopu…

70 replies · 146 retweets · 1.5K likes · 117K views

Hugging Face retweeted

Erik Kaunismäki@ErikKaum·1d

Releasing my first kernel on @huggingface: MaxSim. Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids it by using tiled scoring with simdgroup_matrix (Metal) and WMMA. The result is a 3–5× speedup compared to naive PyTorch. Try it out 👇

17 replies · 42 retweets · 351 likes · 40.2K views
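The MaxSim score used in late-interaction retrieval sums, over the query token vectors, the maximum dot product against any document token vector. The sketch below contrasts a naive version that builds the full similarity matrix with a tiled version that only keeps a running maximum per query token. It is plain Python for illustration, not the released kernel: the function names, the tile size, and the toy vectors are assumptions, and it says nothing about how the Metal or WMMA code is organized.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_naive(query, doc):
    """Build the full |query| x |doc| similarity matrix, then reduce it."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query, doc, tile=2):
    """Scan the document in tiles, keeping only one running max per query token."""
    best = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            block_max = max(dot(q, d) for d in block)
            if block_max > best[i]:
                best[i] = block_max
    return sum(best)


# Toy 2-dimensional token embeddings, invented for the example.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

print(maxsim_naive(query, doc), maxsim_tiled(query, doc))
```

Both functions return the same score. The tiled one never holds more than one tile of similarities at a time, which is the property that lets a kernel avoid materializing the full matrix.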
