Joseph Pollack #Ï 🎗️

3.8K posts

@josephpollack

🤖AI❤️Data enjoyer, building robots to help folks learn things quicker.

Paris, France Joined April 2009
5.9K Following 2K Followers
mr-r0b0t@mr_r0b0t·
Pavlo is an account you should probably be following if you’re in the ML/Local LLM space 🤩 Insane post below on distilling 3 models into 1 👀⬇️
Pavlo Molchanov@PavloMolchanov

What if you could take three completely different model families… and distill them into one tiny model? 🤯 📜 Paper: arxiv.org/pdf/2605.21699 MOPD (Multi-Teacher On-Policy Distillation) has become a standard procedure in post-training. We already distill multiple specialized variants of the same model into a single set of weights. But what if we could go further - and distill models from entirely different families? Turns out, it is possible. Today we’re releasing a paper on cross-tokenizer distillation - our first steps in this exciting direction. 📄 We distilled Qwen3-4B, Phi-4-Mini, and Llama-3B into Llama-3.2-1B. MMLU jumped from 32.05 → 46.32 when using multiple teachers. 📈 The team is now working on Nemo-RL integration so the community can try this method in their own settings. Plus, we are scaling experiments up. 🚀

28 replies · 144 reposts · 3.3K likes · 1.2M views
Joseph Pollack #Ï 🎗️ reposted
Joseph Pollack #Ï 🎗️@josephpollack·
90 missiles and 600 drones launched on a european city in the largest attack to date by criminals in the @KremlinRussia. that's about a third of iran's response during their war, in just one day. tomorrow is monday, with rush hour. it doesn't have to be this way: u24.gov.ua
Reuters@Reuters

WARNING: GRAPHIC CONTENT Russia pounded Kyiv and surrounding areas with hundreds of drones and missiles in one of the heaviest bombardments of the city since the start of the four-year war reut.rs/4v1YG1u

0 replies · 0 reposts · 1 like · 77 views
Joseph Pollack #Ï 🎗️ reposted
Max Zhdanov@maxxxzdn·
We also release an interactive demo: Mosaic's ensemble forecasts on a rotating globe, shown alongside their spectral power ratio against ERA5, across variables, initial conditions, and lead times. maxxxzdn-mosaic.static.hf.space
1 reply · 2 reposts · 31 likes · 2.2K views
Joseph Pollack #Ï 🎗️ reposted
Loubna Ben Allal@LoubnaBenAllal1·
Introducing Carbon 🧬 a family of open generative DNA foundation models. Carbon-3B matches Evo2-7B while running 250x faster at inference. It can generate new DNA sequences and score the functional impact of mutations, zero-shot. We borrowed a lot from how modern LLMs are trained, but DNA isn't language. Genomes are noisy, redundant, and shaped by evolution rather than communication. So we adjusted the recipe: Tokenizer. Most genomic models tokenize at the nucleotide/character level, which blows up sequence length. BPE is the obvious LLM-style fix, but it doesn't behave well on DNA. We use deterministic 6-mer tokens (one token = 6 nucleotides): 6× shorter sequences and cheaper attention. Training loss. With 6-mer tokens, cross-entropy scores a prediction that gets 5/6 nucleotides right the same as one that's completely wrong. This gets brittle late in training and produces loss spikes. We switch mid-training to a more flexible factorized loss (FNS). Data. Genomes are mostly sparse, repetitive background. We curate down to a staged functional DNA + mRNA mixture, with every ratio chosen by ablation, like mixing a web corpus, but for biology. We're releasing the models, training data, training code, evaluation suite, and a demo to play with. More details in the technical report: github.com/huggingface/ca… Demo to play with the model, with a biology primer for our ML friends ;) huggingface.co/spaces/Hugging…
16 replies · 82 reposts · 361 likes · 39.4K views
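A minimal sketch of the deterministic 6-mer tokenization described in the Carbon post, not the released tokenizer. The alphabet restriction to `ACGT`, the id ordering, and the handling of a trailing remainder are assumptions made for illustration; `tokenize_6mer` and `matching_bases` are hypothetical names.

```python
from itertools import product

K = 6  # nucleotides per token, as stated in the post
ALPHABET = "ACGT"  # assumption: only the four canonical bases are handled

# Deterministic vocabulary: every possible 6-mer gets a fixed id (4**6 = 4096 entries).
VOCAB = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}


def tokenize_6mer(seq: str) -> list[int]:
    """Split a DNA string into non-overlapping 6-mers and map each to its id.

    Assumptions: a trailing remainder shorter than 6 bases is dropped, and any
    character outside ACGT raises KeyError.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % K
    return [VOCAB[seq[i:i + K]] for i in range(0, usable, K)]


def matching_bases(kmer_a: str, kmer_b: str) -> int:
    """Count the positions at which two equal-length k-mers agree."""
    return sum(a == b for a, b in zip(kmer_a, kmer_b))


if __name__ == "__main__":
    # 12 bases become 2 tokens: a 6x shorter sequence than character-level tokens.
    print(tokenize_6mer("ACGTACACGTAA"))
    # These two 6-mers agree on 5 of 6 bases yet have different token ids.
    print(matching_bases("ACGTAC", "ACGTAA"), VOCAB["ACGTAC"], VOCAB["ACGTAA"])
```

Because `ACGTAC` and `ACGTAA` are distinct classes in the vocabulary, a plain cross-entropy over token ids gives no partial credit for the five matching bases, which is the brittleness the post attributes to 6-mer training.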
Joseph Pollack #Ï 🎗️ reposted
Bo@bo_wangbo·
okay maybe it's a good time? We have a small colbert model trained at pplx, it is a continued training of pplx-embed-0.6b, so natively multilingual; we just made it open and added a section on how to use the MaxSim kernel: huggingface.co/perplexity-ai/…
Erik Kaunismäki@ErikKaum

Releasing my first kernel on @huggingface: MaxSim Late-interaction retrieval (ColBERT / PyLate) bottlenecks on materializing the full similarity matrix. This kernel avoids it by using tiled scoring with simdgroup_matrix (Metal) and WMMA. Result is 3–5× speedup compared to naive PyTorch. Try it out 👇

7 replies · 18 reposts · 100 likes · 24.1K views
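For readers new to late interaction, here is a pure-Python sketch of the MaxSim score and of the tiling idea mentioned in the quoted post. It is an illustration only and says nothing about how the Metal or WMMA kernel is written; `maxsim_naive`, `maxsim_tiled`, and the `tile` parameter are hypothetical names. MaxSim sums, over query tokens, the best dot product against any document token.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_naive(query, doc):
    """Build the full (query tokens x doc tokens) similarity matrix, then reduce."""
    sim = [[dot(q, d) for d in doc] for q in query]
    return sum(max(row) for row in sim)


def maxsim_tiled(query, doc, tile=2):
    """Same score, but only one tile of document tokens is scored at a time.

    A running maximum per query token replaces the full similarity matrix.
    """
    best = [float("-inf")] * len(query)
    for start in range(0, len(doc), tile):
        block = doc[start:start + tile]
        for i, q in enumerate(query):
            tile_best = max(dot(q, d) for d in block)
            if tile_best > best[i]:
                best[i] = tile_best
    return sum(best)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
    print(maxsim_naive(query, doc), maxsim_tiled(query, doc))
```

Both functions return the same score; the tiled version only ever holds `tile` similarities per query token plus the running maxima, which is the memory saving the post refers to.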
Joseph Pollack #Ï 🎗️ reposted
Novita AI@novita_labs·
🚀 Ring-2.6-1T is now open source (from @AntLingAGI). Now 90% off on @OpenRouter via @novita_labs — a great time to start building and experimenting with large-scale agent workflows. A trillion-scale reasoning model built for real-world agents. Designed not just to answer — but to execute: planning steps, using tools, maintaining context, and completing complex workflows. Highlights: • Strong agent execution • high / xhigh reasoning modes • Async RL + IcePop training
Ant Ling@AntLingAGI

Limited time offer: 90% off Ring-2.6-1T and Ling-flash-2.6 on @OpenRouter with @novita_labs ! Ring-2.6-1T: Extreme thinking model is here to help you with complex planning. Ling-flash-2.6: Help you save $$$ by offering extreme token efficiency. Dive into the details below 👇

1 reply · 3 reposts · 15 likes · 3.6K views
Joseph Pollack #Ï 🎗️ reposted
Ant Ling@AntLingAGI·
Limited time offer: 90% off Ring-2.6-1T and Ling-flash-2.6 on @OpenRouter with @novita_labs ! Ring-2.6-1T: Extreme thinking model is here to help you with complex planning. Ling-flash-2.6: Help you save $$$ by offering extreme token efficiency. Dive into the details below 👇
4 replies · 3 reposts · 35 likes · 4.3K views
Joseph Pollack #Ï 🎗️@josephpollack·
last month i worked on this concept inspired by @DPhiSpace and @liquidai , it really seems like innovation highways are wide open for concepts like this : huggingface.co/blog/Tonic/sav…
Yohan@yohaniddawela

A satellite image tells you what the Earth looked like at one moment. COP-GEN tells you what it could look like, and why that distinction matters more than it sounds. Most Earth observation models are deterministic. If you feed in a DEM and a land-cover map, they produce one output: the most likely optical image. It's basically one question, one answer. The problem is that the real world doesn't work that way. The same terrain on the same coordinates can look completely different depending on cloud cover, season, soil moisture, atmospheric scattering, and a dozen other variables that aren't in your input. There's no single "correct" image. There's a distribution of plausible images. Deterministic models collapse that distribution to its mean, and they call it a prediction. COP-GEN, from researchers at Edinburgh and ESA, is built around this problem. It's a multimodal latent diffusion transformer trained on Copernicus data: Sentinel-2 optical, Sentinel-1 SAR, elevation, land cover, timestamps, and geolocation. Rather than predicting the most likely output, it samples from a learned distribution of physically plausible outputs. Ask it the same question sixteen times, and you get sixteen different but coherent answers. The benchmark numbers make this quite concrete. Against TerraMind, the existing benchmark model, COP-GEN achieves a spectral recall of 0.900. TerraMind achieves 0.028. That means COP-GEN's generated samples cover 90% of the real observation manifold. TerraMind's cover just 2.8%. Its sixteen outputs are nearly identical to each other, clustered near the conditional mean, and effectively invisible to the real data distribution. It wins on precision (each individual sample is close to a plausible real image) but fails entirely on recall (it can't reproduce the range of valid observations). The authors call this diversity collapse, and it's not a minor flaw. It's a structural consequence of deterministic training objectives.
When you optimise for "produce the most accurate single output", you end up with a model that produces almost the same output every time. That's fine if you want a point estimate. It's a problem if you're trying to model uncertainty, simulate counterfactuals, or generate training data for downstream tasks. COP-GEN trades some of that per-sample precision for real coverage. Its intra-set diversity is 9.1 times higher than TerraMind's in spectral space. Its MMD (maximum mean discrepancy from the real distribution) is roughly half. It covers 63% of the real per-band reflectance range; TerraMind covers 18%. The practical implications aren't subtle though. Cloud gap-filling is the obvious one: when optical imagery is missing, you can't just impute a mean. You want a sample from the distribution of what the surface probably looked like, not a blurred average. Change detection across seasons has the same problem. Uncertainty quantification for downstream land-use models, water stress mapping, disaster monitoring. These tasks all require knowing not just what's most likely, but what range of outcomes is physically plausible. Band infilling is another demonstration of what the architecture can do. Feed COP-GEN only the four high-resolution visible bands (B2, B3, B4, B8) and it reconstructs the remaining Sentinel-2 spectral bands, the Sentinel-1 SAR, elevation, land cover, timestamp, and geolocation. It's inferring the full observational signature of a location from a narrow slice of it. The architecture treats each sensor and each spectral group as an independent modality with its own latent encoder. Resolution-aware tokenisation means Sentinel-2's 10m, 20m, and 60m bands are handled separately, preserving native sensor characteristics instead of resampling everything to a common grid. The diffusion process runs independent timesteps across modalities, which is what enables zero-shot any-to-any conditional generation without task-specific retraining. 
The paper is honest about where it falls short. Geolocation and timestamp conditioning have limited influence on outputs. Snow appears near the equator. The spatial modalities dominate the diffusion loss because they're represented by far more tokens than a latitude-longitude pair or a date. That's a training imbalance problem, and the authors flag it as a clear direction for future work. What COP-GEN establishes, beyond the model itself, is an argument about evaluation. Standard pointwise metrics like MAE and PSNR reward deterministic solutions. A model that always produces the conditional mean will score well on those metrics and will have near-zero recall. The stochastic benchmark in this paper, comparing the full distribution of outputs rather than the best single sample, is closer to the right question. The EO community will need to adopt that framing if it wants to properly evaluate generative models. The architecture is available. The Major Tom dataset it trained on is public. The gap between "what the Earth looks like" and "what the Earth could look like" has a model now. Link to the full paper: arxiv.org/pdf/2603.03239

0 replies · 1 repost · 0 likes · 154 views
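A toy illustration of the precision versus recall distinction drawn in the quoted thread. The radius-based coverage measure below is a simplified stand-in chosen for this sketch, not the spectral metric used in the COP-GEN paper, and all numbers are made up; `coverage`, `precision`, and `recall` are hypothetical helpers.

```python
def coverage(targets, candidates, radius):
    """Fraction of `targets` that have at least one candidate within `radius`."""
    hits = sum(
        1 for t in targets if any(abs(t - c) <= radius for c in candidates)
    )
    return hits / len(targets)


def precision(real, generated, radius):
    """How many generated samples land near some real observation."""
    return coverage(generated, real, radius)


def recall(real, generated, radius):
    """How many real observations are reached by some generated sample."""
    return coverage(real, generated, radius)


if __name__ == "__main__":
    real = [0.0, 0.5, 1.0, 1.5, 2.0]            # made-up 1-D "observations"
    collapsed = [1.0] * 16                       # sixteen copies of the mean
    diverse = [0.05, 0.45, 1.0, 1.55, 1.95]      # samples spread over the range

    for name, generated in (("collapsed", collapsed), ("diverse", diverse)):
        print(name, precision(real, generated, 0.1), recall(real, generated, 0.1))
```

In this toy setting the collapsed sampler scores perfect precision but reaches only one of the five real points, while the diverse sampler reaches all of them, mirroring the pattern the thread describes as diversity collapse.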
Joseph Pollack #Ï 🎗️ reposted
Quentin Gallouédec@QGallouedec·
releasing hf-sandbox 🥡
12 replies · 58 reposts · 436 likes · 119.2K views
Andrew Perpetua@AndrewPerpetua·
Ukrainian Wild Hornets drones going around shotgunning russian drones presumably using a Davis Gun type mechanism.
8 replies · 120 reposts · 1.5K likes · 46.9K views
Theofanis Karaletsos@Tkaraletsos·
10/ Huge gratitude to the incredible collaborators and team who made this possible, especially James Pearce whose scientific stamina throughout this work was extraordinary; @StephenQuake , who worked closely with us throughout; and @czi @ChanZuckerberg / @biohub for supporting this long-term research effort. TranscriptFormer has set a high bar for its successor models. The road ahead is even more exciting.
1 reply · 0 reposts · 4 likes · 756 views
Theofanis Karaletsos@Tkaraletsos·
1/ Excited to share that TranscriptFormer is now published in Science. We trained a generative foundation model on 112 million cells across 12 species spanning ~1.5 billion years of evolution. science.org/doi/10.1126/sc…
7 replies · 57 reposts · 296 likes · 23.8K views
Joseph Pollack #Ï 🎗️@josephpollack·
opensource is going to eat itself. opensource will collapse its own business models. licenses have become meaningless. source code has lost all value and meaning. people will lose their livelihoods. there is no solution currently. unrelated: cloudflare is firing 1.1K people
0 replies · 1 repost · 1 like · 130 views
Joseph Pollack #Ï 🎗️@josephpollack·
@Lon @KatieMiller if rockefellers can do it and keep all their money, we can do it too! to be honest, choosing these two charities as examples was just a bad take since they are both legit finance guys; the "fixed overhead cost" on both/either of these is where the real scam is ;-)
0 replies · 0 reposts · 1 like · 6 views
Lon()@Lon·
@josephpollack @KatieMiller Yes, I asked Grok about some of this and it assured me what Brockman (and apparently many others) have done is a great path to getting rich with nearly no criminal liability risk and a snowball's chance in hell of losing a civil suit. Entrepreneurialism or whatever...
1 reply · 0 reposts · 0 likes · 33 views
Katie Miller@KatieMiller·
After two days of testimony, it’s clear that Greg Brockman put tremendous time, energy, and effort into OpenAI. However, OpenAI was founded as a charity. By definition, a charity does not exist to enrich its founders, employees, or investors. If Greg wanted to build personal wealth, he should have launched a for-profit company, as he successfully did with Stripe. Greg testified that his stake in OpenAI is now worth $30 billion. Imagine the president of St. Jude’s or Habitat for Humanity doing the same. Greg also testified that OpenAI sold its API and products commercially to make money, not to benefit all of humanity—the stated mission of OpenAI. The fundamental issue in this case is simple: you cannot convert a nonprofit into a for-profit.
82 replies · 271 reposts · 2.8K likes · 86.6K views
Joseph Pollack #Ï 🎗️@josephpollack·
@Lon @KatieMiller if you follow his "insider trading pages" you can also 30x yourself and ride their coattails into the sunset. turns out american charities are run by insider trading billionaires, some of whom even get convicted for theft then become president or treasury secretary.
2 replies · 0 reposts · 0 likes · 10 views