
spaceCrumbs
37 posts


@edugarmer @andrewgwils can you link your work? Would love to read

the "small" model behind this demo is a 276B total 12B active MoE (larger pretrains are cooking), sparsity ratio looks pretty standard compared to open models of the same size

Thinking Machines@thinkymachines
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
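For context on the "sparsity ratio" remark, here is a quick back-of-the-envelope check using only the two figures quoted in the post (276B total, 12B active). The function name is made up for illustration, and the calculation assumes "active" means parameters used per token.

```python
def moe_sparsity(total_b: float, active_b: float) -> tuple[float, float]:
    """Return (active fraction, total-to-active ratio) for an MoE model.

    Both arguments are parameter counts in billions.
    """
    return active_b / total_b, total_b / active_b


# Figures quoted in the post: 276B total, 12B active.
fraction, ratio = moe_sparsity(276, 12)
print(f"active fraction: {fraction:.1%}")   # about 4.3% of parameters per token
print(f"total/active ratio: {ratio:.0f}x")  # 23x
```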

@LinusMixson @aakashgupta Care to explain why it's bad for lecun? Guy was literally complimenting him

@aakashgupta This post is so bad that it's kind of insulting to LeCun. It's also a bit ironic that it was written by a default-settings LLM and that literally no effort was made to disguise that.

Yann LeCun closed $1.03B for AMI Labs on March 10. Three days later, this paper dropped from his NYU collaborators.
15M parameters. Single GPU. A few hours of training.
LeWorldModel is the first JEPA that trains end-to-end from raw pixels. Two loss terms: predict the next embedding, keep the latent space Gaussian. Previous JEPAs needed exponential moving averages or pretrained encoders to avoid representation collapse. LeWM doesn't.
Six hyperparameters down to one.
The numbers are the story. Foundation-model-based world models require hundreds of millions of parameters and serious compute to plan a control task. LeWM plans up to 48x faster while staying competitive on 2D and 3D benchmarks. The whole thing fits on a laptop GPU.
Look at the trajectory. Yann announced his Meta departure in November 2025 after 12 years and called founding FAIR his "proudest non-technical accomplishment." On March 10, 2026, AMI Labs closed the largest seed round in European history at a $3.5B pre-money valuation. Bezos, Nvidia, Samsung, and Toyota all wrote checks.
Three days later: a paper showing that JEPA-from-pixels is no longer fragile and no longer compute-heavy. The engineering scaffolding that made it look like an academic curiosity is gone.
The authors sit at Mila, NYU, Samsung SAIL, and Brown. None at Meta.
Yann's bet was that the path to machine intelligence runs through world models, not language models. He left a public company to build it. Each JEPA paper from his network resets the assumed cost structure for that bet. This one makes world modeling laptop-cheap.
Meta still has the GPUs. The architecture left.



🔥 New #ICML2026 Paper accepted 🔥 by Arjun Rao with Tessa Ooms, Ruth Castro, @kklmmr @david_rolnick
Paper: openreview.net/forum?id=eWQQ0…
Code: github.com/arjunarao619/S…
TL;DR: We propose Slepian functions as localized, spatially concentrated basis functions for regional location encoding. Building on spherical harmonics, Slepians allow location encoders to allocate higher resolution where it is most needed — for example, in regions with denser observations or where the underlying geospatial field varies at finer spatial scales, such as land compared to oceans.
This work connects geospatial AI, implicit neural representations, and functional modeling of Earth system data fields.



@Ibelick You should've checked out mikupad on GitHub. It does the same thing but better.

@sidgairo18 @PontiEdoardo @icmlconf It's likely that the conference had accepted more than enough papers already.

@PontiEdoardo @icmlconf Very disturbing. Sorry to witness.
Just curious - what justification does the AC / Meta-Review give in such cases? 🤔

Trying to win the consolation prize of the rejected paper with the highest average score at @icmlconf. Any contenders?


@FioraStarlight i thought you meant output > prompt by reverse sft...
Anyway, if you mean prompt > unwanted behavior, that's literally how preference datasets are built.
The rejected field would have near-misses, misalignment, all sorts of wrong behavior.


Stop pretending LLM-as-a-judge is ground truth.
It literally cannot see the private information driving real human preferences.
How do you autorate when drift, multi-turn, and tie thresholds break everything?
Arena just exposed it all in one whiteboard. Who else is rethinking their evals right now?

Where do autoraters break down?
Arena researchers Li Chen and I-Hung Hsu walk through how they'd build an autorater from scratch — different kinds of autoraters, training objectives, what dimensions actually matter to rate on — then get into what makes it hard in practice: preference drift, multi-turn evaluation, tie threshold variance, and the gap between LLM-as-a-judge and real human subjectivity.
Watch on YouTube to see the whiteboard details (link in 🧵 thread)
0:00 Evaluation granularity: general vs. per-category vs. per-response
2:05 Applications of autoraters as RL reward signals and test-time scaling
3:03 Output design for pairwise autorater: scores, comparison, and ties
4:03 Verbal and visual feedback autoraters
4:48 Training for pairwise autorater: Bradley-Terry loss, threshold design
9:43 Real-world challenges: preference shifts over time
10:30 Multi-turn autorating and usage simulation
11:35 Tie threshold variance across annotators
12:18 Long-context evaluation challenges
13:02 Confidence intervals and score uncertainty
14:00 Why LLM-as-a-judge fails to capture subjective human preference
15:20 The private information unobservable in human evaluation
16:14 Model evolved to be stronger makes training data harder
17:08 Signal vs. noise in human preference data
18:04 How do you autorate an autorater?
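To make the "Bradley-Terry loss, threshold design" item concrete, here is a minimal sketch of a pairwise autorater head in plain Python. It is not the formulation from the video, which is not spelled out in the post: the function names, the scalar scores, and the fixed `tie_threshold` value are all assumptions for illustration.

```python
import math


def bt_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def bt_loss(score_a: float, score_b: float, a_wins: bool) -> float:
    """Negative log-likelihood of one human preference label."""
    p = bt_prob(score_a, score_b)
    return -math.log(p if a_wins else 1.0 - p)


def predict(score_a: float, score_b: float, tie_threshold: float = 0.5) -> str:
    """Call a tie when the score gap is smaller than the threshold.

    The threshold is a hypothetical hyperparameter; annotators who differ
    in how readily they declare ties would imply different values.
    """
    diff = score_a - score_b
    if abs(diff) < tie_threshold:
        return "tie"
    return "A" if diff > 0 else "B"


print(bt_prob(0.0, 0.0))   # 0.5, equal scores mean no preference
print(predict(1.0, 0.8))   # tie, gap of 0.2 is below the 0.5 threshold
print(predict(2.0, 0.0))   # A
```

A single global threshold is the simplest choice; the "tie threshold variance across annotators" item suggests that in practice one value may not fit every rater.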

@demisama_ Most probably because enough papers had already been accepted.

Agents make ugly UIs because they've never seen good design.
We've been fixing that, 2,000 DESIGN.md files from the world's best products, structured for a model to read and learn. Colors, type, spacing, layouts and more.
Free. styles.refero.design

@iraszl But you can always fine-tune them.
And finetuned small models always beat large frontier ones.

Thinking of running Local LLM on a new MBP? Here is the level of intelligence you can get with various memory configurations on open models:
🐹 16–24GB RAM → ≈ GPT-3.5
🐕 32–48GB RAM → ≈ higher-end GPT-3.5
🐅 64GB RAM → ≈ lower-end GPT-4
🐉 96–128GB RAM → ≈ mid-tier GPT-4
All still below newer GPT or Claude models.
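As a rough sanity check on which models fit in a given amount of RAM, the weight footprint can be estimated as parameters times bits per weight divided by 8. This is a back-of-the-envelope sketch only: it ignores the KV cache, activations, and OS overhead, and the model sizes and quantization levels below are illustrative assumptions, not figures from the post.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Illustrative sizes only; real usage is higher once context is loaded.
for params, bits in [(8, 4), (32, 4), (70, 4), (70, 8)]:
    print(f"{params}B at {bits}-bit: ~{weight_gb(params, bits):.0f} GB of weights")
```

Under these assumptions a 70B model at 4-bit needs about 35 GB for weights alone, which would point to a 64GB configuration once headroom for context is included.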


@LottoLabs I have the agent for business use :) wanna be my salesman ?

@Teknium This saves so much time of manual setup. Glad it's a built-in part of Hermes
I'll run it every night like modern-day disk defrag but for something that will actually speed up my life


Introducing Hermes Curator!
The new system built in to Hermes Agent now helps you keep your skills that the self improvement loop creates in check, by consolidating and pruning automatically.
The curator does multiple things:
- keeps track of how often you use each skill, when it was last updated/created, etc
- Runs automatically once a week (configurable)
- Uses the analytics plus its own scanning of your skills and consolidates or prunes them if necessary
- Skips externally installed skills, built-in skills, and skills you "pin" that you don't want touched. It will only attempt curation over agent-created/updated skills or user-written skills.
- It will then determine whether skills can be consolidated, pruned, or otherwise made more manageable. It will convert some skills that are too specific into references, templates or scripts for larger/broader skills, or integrate them directly into a consolidation of an existing skill.
You can also disable it entirely in config.yaml and/or run it manually with `hermes curator run`
Learn more on the docs here:
hermes-agent.nousresearch.com/docs/user-guid…


@awarebrain @burny_tech Capacity as in? Make it able to store larger knowledge bases without forgetting?

@cOfDirac @eliebakouch True, but "dumb" intelligence is also useful enough to start capitalizing on AI.
Similar to how you don't need AlphaZero to beat most humans at chess - Stockfish running on a potato CPU will do.
Agency > intelligence in the real world.

@eliebakouch It's undeniable that LLMs currently produce impressive results, but there are tiny cracks on the surface that reveal that they're the furthest thing from any sort of general intelligence. I think this is a fault of how we train them and the data we use.

i might be very wrong here, but i don't think "no human data, no pre-training" is the right approach to get frontier models or scientific breakthroughs any time soon

Ineffable Intelligence@IneffableLabs
Introducing Ineffable Intelligence. Led by David Silver, we're assembling the best engineers and researchers in the world to make first contact with superintelligence. We’ll be solving the hardest problems in AI on the way. Come join us. ineffable.ai

@dbreunig Probably cavemen. But I've yet to find my favorite skill (which would be something that can approximate what ML intern does)

@CV_novel_plume @CV_novel_plume you can get lower loss (~0.2-0.3 ppl) just by setting dropout=0 with muon.
Last time I checked, "most" of the muon benchmarks used a dropout of 0.01.
Would be interested to know if you can replicate this in that benchmark.











