Nicolò Monti

8 posts

@NMonti25537

Italy · Joined May 2024
323 Following · 63 Followers
Erfanzar @eraznafre
github.com/erfanzar/Spect…
SpecTrax 0.1.0 with `sxregion_stage`
sxstage_region means multimodal MPMD can finally look like the model:
Vision path: V0 -> V1 -> V2 -> V3
Text path: T0 -> T1 -> T2 -> T3
One function. Separate logical pipelines. True forward/backward/scheduler MPMD underneath. No fake stages, no SPMD cosplay.
3 replies · 2 reposts · 12 likes · 771 views
Nicolò Monti @NMonti25537
Looks like Qwen wants to prevent distillation on their larger, closed-source models? I somehow never looked too much at the top of Qwen3.5's CoTs, and it's mentioning instructions I never put in my prompt to prevent it from reciting its reasoning. Presumably an artifact from the RL done on the large teacher model?
0 replies · 0 reposts · 3 likes · 153 views
Nicolò Monti reposted
PrismML @PrismML
Today we’re announcing Ternary Bonsai: Top intelligence at 1.58 bits Using ternary weights {-1, 0, +1}, we built a family of models that are 9x smaller than their 16-bit counterparts while outperforming most models in their respective parameter classes on standard benchmarks. We’re open-sourcing the models under the Apache 2.0 license in three sizes: 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB).
116 replies · 307 reposts · 2.2K likes · 476K views
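The "1.58 bits" figure in the post above matches log2(3), the information content of one three-valued weight. A minimal sketch of that arithmetic follows, checking the quoted file sizes against the "9x smaller" claim. It assumes 1 GB = 1e9 bytes and that each size label (8B, 4B, 1.7B) is the exact parameter count; neither is stated in the post, and the helper names are hypothetical.

```python
import math

# Information content of one weight drawn from {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # roughly 1.585

def effective_bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Bits stored per parameter. Assumes 1 GB = 1e9 bytes (not stated in the post)."""
    return size_gb * 1e9 * 8 / (params_billion * 1e9)

def fp16_size_gb(params_billion: float) -> float:
    """Size of the same model at 16 bits (2 bytes) per parameter."""
    return params_billion * 2.0

# Sizes quoted in the announcement: parameters in billions -> file size in GB.
quoted_sizes = {8.0: 1.75, 4.0: 0.86, 1.7: 0.37}

ratios = []
for params, size in quoted_sizes.items():
    bits = effective_bits_per_weight(size, params)
    ratio = fp16_size_gb(params) / size
    ratios.append(ratio)
    print(f"{params}B: {bits:.2f} bits/weight, {ratio:.1f}x smaller than 16-bit")
```

Under these assumptions the quoted sizes work out to a little over 1.7 bits per weight, slightly above log2(3). The post does not say where the difference comes from; scale factors or layers kept at higher precision would be one possible explanation.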
Nicolò Monti reposted
Paradigma @paradigmainc
introducing Flywheel: the infrastructure for autonomous research.
27 replies · 73 reposts · 551 likes · 117.8K views