spaceCrumbs

37 posts

@CrumbsSpace

Joined November 2022

29 Following · 2 Followers

spaceCrumbs@CrumbsSpace·15 May

@edugarmer @andrewgwils can you link your work? Would love to read

Eduardo C. Garrido-Merchán@edugarmer·15 May

Destroyed by LLMs, we are currently researching Fisher information for Bayesian optimization and trying to improve Joint Entropy Search, but our feeling is that nobody will care. They are using transformers for BO and not comparing them with information theoretic approaches... sad.

Andrew Gordon Wilson@andrewgwils·14 May

Sometimes I miss the days when people were passionately fighting about MCMC versus variational methods, or whether posterior tempering is problematic. We should have a nostalgia ICML 2010. You can submit AI slop, but expect a 2010 era reaction. What happened to our field?

spaceCrumbs@CrumbsSpace·13 May

@ar0cket1 @eliebakouch Lmaooo

ar0cket1@ar0cket1·12 May

@eliebakouch Mistral's definition of small

elie@eliebakouch·12 May

the "small" model behind this demo is a 276B total 12B active MoE (larger pretrains are cooking), sparsity ratio looks pretty standard compared to open models of the same size

Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
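
As a rough aid for reading the "276B total 12B active" figures quoted above, the arithmetic can be sketched as follows. Only the two parameter counts come from the post; the function names are illustrative, and treating "sparsity ratio" as total-to-active parameters is an assumption about what the author meant.

```python
# Figures quoted in the post: 276B total parameters, 12B active per token.
TOTAL_PARAMS_B = 276
ACTIVE_PARAMS_B = 12


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def sparsity_ratio(total_b: float, active_b: float) -> float:
    """Total-to-active ratio (assumed reading of 'sparsity ratio')."""
    return total_b / active_b


frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
ratio = sparsity_ratio(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)

print(f"active fraction: {frac:.1%}")       # roughly 4.3%
print(f"total/active ratio: {ratio:.0f}x")  # 23x
```

Under that reading, about one parameter in 23 is used per token, which is the number to compare against other open MoE models.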

spaceCrumbs@CrumbsSpace·11 May

@LinusMixson @aakashgupta Care to explain why it's bad for lecun? Guy was literally complimenting him

Linus Mixson@LinusMixson·10 May

@aakashgupta This post is so bad that it's kind of insulting to LeCun. It's also a bit ironic that it was written by a default-settings LLM and that literally no effort was made to disguise that.

Aakash Gupta@aakashgupta·9 May

Yann LeCun closed $1.03B for AMI Labs on March 10. Three days later, this paper dropped from his NYU collaborators.

15M parameters. Single GPU. A few hours of training.

LeWorldModel is the first JEPA that trains end-to-end from raw pixels. Two loss terms: predict the next embedding, keep the latent space Gaussian. Previous JEPAs needed exponential moving averages or pretrained encoders to avoid representation collapse. LeWM doesn't. Six hyperparameters down to one.

The numbers are the story. Foundation-model-based world models require hundreds of millions of parameters and serious compute to plan a control task. LeWM plans up to 48x faster while staying competitive on 2D and 3D benchmarks. The whole thing fits on a laptop GPU.

Look at the trajectory. Yann announced his Meta departure in November 2025 after 12 years and called founding FAIR his "proudest non-technical accomplishment." On March 10, 2026, AMI Labs closed the largest seed round in European history at a $3.5B pre-money valuation. Bezos, Nvidia, Samsung, and Toyota all wrote checks.

Three days later: a paper showing that JEPA-from-pixels is no longer fragile and no longer compute-heavy. The engineering scaffolding that made it look like an academic curiosity is gone. The authors sit at Mila, NYU, Samsung SAIL, and Brown. None at Meta.

Yann's bet was that the path to machine intelligence runs through world models, not language models. He left a public company to build it. Each JEPA paper from his network resets the assumed cost structure for that bet. This one makes world modeling laptop-cheap.

Meta still has the GPUs. The architecture left.

spaceCrumbs@CrumbsSpace·2 May

@MarcCoru @kklmmr @david_rolnick The visuals r amazing. What libraries/tools did they use?

Marc Rußwurm@MarcCoru·1 May

🔥 New #ICML2026 Paper accepted 🔥 by Arjun Rao with Tessa Ooms, Ruth Castro, @kklmmr @david_rolnick

Paper: openreview.net/forum?id=eWQQ0…
Code: github.com/arjunarao619/S…

TL;DR: We propose Slepian functions as localized, spatially concentrated basis functions for regional location encoding. Building on spherical harmonics, Slepians allow location encoders to allocate higher resolution where it is most needed, for example in regions with denser observations or where the underlying geospatial field varies at finer spatial scales, such as land compared to oceans.

This work connects geospatial AI, implicit neural representations, and functional modeling of Earth system data fields.

spaceCrumbs@CrumbsSpace·2 May

@Ibelick You should've checked out mikupad on GitHub. It does the same thing but better.

Ibelick@Ibelick·1 May

select any word to explore better options and see next-token probabilities
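
For readers wondering what "next-token probabilities" refers to here, a minimal sketch: a language model emits one raw score (logit) per candidate token, and a softmax turns those scores into probabilities that can be ranked. The candidate words and logit values below are hypothetical, and this is not the code behind either tool mentioned in the thread.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_alternatives(vocab, logits, k=3):
    """Return the k most probable (token, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical candidates for one position in a sentence.
vocab = ["quick", "fast", "rapid", "slow"]
logits = [2.0, 1.0, 0.5, -1.0]

for token, prob in top_k_alternatives(vocab, logits):
    print(f"{token}: {prob:.2f}")
```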

spaceCrumbs@CrumbsSpace·1 May

@sidgairo18 @PontiEdoardo @icmlconf It's likely that the conference had accepted more than enough papers already.

Siddhartha Gairola@sidgairo18·1 May

@PontiEdoardo @icmlconf Very disturbing. Sorry to witness. Just curious - what justification does the AC / Meta-Review make in such cases ? 🤔

Edoardo Ponti@PontiEdoardo·1 May

Trying to win the consolation prize of the rejected paper with the highest average score at @icmlconf. Any contenders?

spaceCrumbs@CrumbsSpace·1 May

@FioraStarlight I thought you meant output > prompt by reverse SFT... Anyway, if you mean prompt > unwanted behavior, that's literally how preference datasets are built. The rejected field would hold near-misses, misalignment, all sorts of wrong behavior.

Fiora Starlight@FioraStarlight·30 Apr

Do people ever do, like, reverse SFT? Like, constructing an example of an unwanted behavior, and having the loss signal say "you should put zero probability on every token in this sequence"?
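
One way to make the "put zero probability on every token" idea concrete is a penalty of the form -log(1 - p) on each token of the unwanted sequence, which is sometimes called an unlikelihood-style objective. The sketch below contrasts it with the usual SFT loss; it works on per-token probabilities directly rather than on a real model, and the probability values are made up for illustration.

```python
import math


def sft_loss(token_probs):
    """Standard SFT: mean negative log-likelihood of the target tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs, eps=1e-12):
    """'Reverse SFT' sketch: mean of -log(1 - p) over the unwanted tokens.

    The loss is 0 when the model assigns zero probability to every token
    and grows as the model becomes more confident in the unwanted text.
    """
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


# Hypothetical probabilities a model assigns to each token of a bad sequence.
still_likely = [0.9, 0.8, 0.95]
mostly_unlearned = [0.05, 0.1, 0.02]

print(f"{unlikelihood_loss(still_likely):.3f}")      # large penalty
print(f"{unlikelihood_loss(mostly_unlearned):.3f}")  # small penalty
```

Note that the two losses pull in opposite directions: the same high probabilities that make `sft_loss` small make `unlikelihood_loss` large.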

spaceCrumbs@CrumbsSpace·1 May

@RyanJamesShaw @MageArez @NousResearch @Teknium @github @github fix this bro

Ryan J. Shaw@RyanJamesShaw·1 May

@CrumbsSpace @MageArez @NousResearch @Teknium @github 100% a scam x.com/RyanJamesShaw/…

Ryan J. Shaw@RyanJamesShaw

Watch out for scams like this. Git allows anybody to commit anything under anybody's name/email. That's why you should look for the "Verified" tag before trusting the commit author. Git also allows you to copy an arbitrary verified commit from a different repo by that author, and then build unverified commits on top of that, and name your repo whatever you want. GitHub should have a way to block association of one's account with unverified commits, if it doesn't already.

spaceCrumbs@CrumbsSpace·1 May

@Chahatxsharma @arena provide the solution too

Chahat Sharma@Chahatxsharma·30 Apr

Stop pretending LLM-as-a-judge is ground truth. It literally cannot see the private information driving real human preferences. How do you autorate when drift, multi-turn, and tie thresholds break everything? Arena just exposed it all in one whiteboard. Who else is rethinking their evals right now?

Arena.ai@arena·30 Apr

Where do autoraters break down? Arena researchers Li Chen and I-Hung Hsu walk through how they'd build an autorater from scratch (different kinds of autoraters, training objectives, what dimensions actually matter to rate on), then get into what makes it hard in practice: preference drift, multi-turn evaluation, tie threshold variance, and the gap between LLM-as-a-judge and real human subjectivity.

Watch on YouTube to see the whiteboard details (link in 🧵 thread)

0:00 Evaluation granularity: general vs. per-category vs. per-response
2:05 Applications of autoraters as RL reward signals and test-time scaling
3:03 Output design for pairwise autorater: scores, comparison, and ties
4:03 Verbal and visual feedback autoraters
4:48 Training for pairwise autorater: Bradley-Terry loss, threshold design
9:43 Real-world challenges: preference shifts over time
10:30 Multi-turn autorating and usage simulation
11:35 Tie threshold variance across annotators
12:18 Long-context evaluation challenges
13:02 Confidence intervals and score uncertainty
14:00 Why LLM-as-a-judge fails to capture subjective human preference
15:20 The private information unobservable in human evaluation
16:14 Model evolved to be stronger makes training data harder
17:08 Signal vs. noise in human preference data
18:04 How do you autorate an autorater?
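
The "Bradley-Terry loss, threshold design" item can be illustrated with a small sketch. Under a Bradley-Terry model, the probability that response A beats response B is sigmoid(score_A - score_B), and the training loss is the negative log of that probability for the human-preferred response. Declaring a tie when the score gap is below a threshold is one simple design, assumed here for illustration; the video's actual approach is not reproduced, and the default threshold of 0.5 is arbitrary.

```python
import math


def bradley_terry_loss(score_preferred, score_other):
    """-log(sigmoid(score_preferred - score_other)), computed stably."""
    margin = score_preferred - score_other
    # Equivalent to log(1 + exp(-margin)) without overflow for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_verdict(score_a, score_b, tie_threshold=0.5):
    """Map two scalar scores to 'A', 'B', or 'tie' (threshold is an assumption)."""
    margin = score_a - score_b
    if abs(margin) < tie_threshold:
        return "tie"
    return "A" if margin > 0 else "B"


print(bradley_terry_loss(2.0, 0.0))  # small: the model agrees with the label
print(bradley_terry_loss(0.0, 2.0))  # large: the model disagrees with the label
print(pairwise_verdict(1.0, 0.8))    # 'tie' with the default threshold
print(pairwise_verdict(1.0, 0.8, tie_threshold=0.1))  # 'A' with a stricter one
```

The last two calls show why tie threshold variance matters: the same pair of scores flips between a tie and a win depending on where an annotator, or the designer, draws the line.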

spaceCrumbs@CrumbsSpace·1 May

@demisama_ Most probably because enough papers had already been accepted.

Demi Wang@demisama_·1 May

all positive scores still got rejected by #ICML2026 😢

spaceCrumbs@CrumbsSpace·1 May

@bbssppllvv @Teknium can you add this onto hermes pls

Mike Bespalov@bbssppllvv·30 Apr

Agents make ugly UIs because they've never seen good design. We've been fixing that, 2,000 DESIGN.md files from the world's best products, structured for a model to read and learn. Colors, type, spacing, layouts and more. Free. styles.refero.design

spaceCrumbs@CrumbsSpace·30 Apr

@iraszl But you can always fine-tune them. And finetuned small models always beat large frontier ones.

Ivan Raszl@iraszl·30 Apr

Thinking of running Local LLM on a new MBP? Here is the level of intelligence you can get with various memory configurations on open models:

🐹 16–24GB RAM → ≈ GPT-3.5
🐕 32–48GB RAM → ≈ higher-end GPT-3.5
🐅 64GB RAM → ≈ lower-end GPT-4
🐉 96–128GB RAM → ≈ mid-tier GPT-4

All still below newer GPT or Claude models.

spaceCrumbs@CrumbsSpace·30 Apr

@NicW_AI @LottoLabs Use your agent as a salesman

Nic Wienandt@NicW_AI·30 Apr

@LottoLabs I have the agent for business use :) wanna be my salesman ?

Lotto@LottoLabs·30 Apr

If you optimized a suite just to finetune and serve qwen 27b with all sota inference tricks, and sold that as the brain with an OS agent to companies, you'd literally print money in the next 6-12 months.

spaceCrumbs@CrumbsSpace·30 Apr

@dannyshmueli @Teknium what is that

Danny Shmueli@dannyshmueli·30 Apr

@Teknium This saves so much time of manual setup. Glad it's a built-in part of Hermes. I'll run it every night, like a modern-day disk defrag but for something that will actually speed up my life.

Teknium 🪽@Teknium·30 Apr

Introducing Hermes Curator! The new system built into Hermes Agent now helps you keep the skills that the self-improvement loop creates in check, by consolidating and pruning automatically.

The curator does multiple things:
- Keeps track of how often you use each skill, when it was last updated/created, etc.
- Runs automatically once a week (configurable).
- Uses the analytics plus its own scanning of your skills and consolidates or prunes them if necessary.
- Skips externally installed skills, built-in skills, and skills you "pin" that you don't want touched. It will only attempt curation over agent created/updated skills or user written skills.
- It will then determine whether skills can be consolidated, pruned, or otherwise made more manageable. It will convert some skills that are too specific into references, templates or scripts for larger/broader skills, or integrate them directly into a consolidation of an existing skill.

You can also disable it entirely in the config.yaml and/or run it manually with `hermes curator run`

Learn more on the docs here: hermes-agent.nousresearch.com/docs/user-guid…

spaceCrumbs@CrumbsSpace·29 Apr

@vivoplt Bro has never heard about mistral

Vivo@vivoplt·29 Apr

USA has ChatGPT
USA has Grok
USA has Claude
USA has Gemini
USA has Copilot

China has DeepSeek
China has Qwen
China has GLM
China has Kimi
China has MiniMax

What is the rest of the world even doing??

spaceCrumbs@CrumbsSpace·29 Apr

@awarebrain @burny_tech Capacity as in? Make it able to store larger knowledge bases without forgetting?

Burny - Effective Curiosity@burny_tech·29 Apr

What's the most promising direction in AI research?

spaceCrumbs@CrumbsSpace·29 Apr

@cOfDirac @eliebakouch True, but "dumb" intelligence is also useful enough to start capitalizing on AI. Similar to how you don't need AlphaZero to beat most humans at chess: Stockfish running on a potato CPU will do. Agency > intelligence in the real world.

cOfDirac@cOfDirac·28 Apr

@eliebakouch It's undeniable that LLMs currently produce impressive results, but there are tiny cracks on the surface that reveal that they're the furthest thing from any sort of general intelligence. I think this is a fault of how we train them and the data we use.

elie@eliebakouch·27 Apr

i might be very wrong here, but i don't think "no human data, no pre-training" is the right approach to get frontier models or scientific breakthroughs any time soon

Ineffable Intelligence@IneffableLabs

Introducing Ineffable Intelligence. Led by David Silver, we're assembling the best engineers and researchers in the world to make first contact with superintelligence. We’ll be solving the hardest problems in AI on the way. Come join us. ineffable.ai

spaceCrumbs@CrumbsSpace·29 Apr

@dbreunig Probably cavemen. But I've yet to find my favorite skill (which would be something that can approximate what ML intern does)

Drew Breunig@dbreunig·29 Apr

Drop your favorite Skill below. The one you're most thankful for (could be one you wrote or one you found).

spaceCrumbs@CrumbsSpace·29 Apr

@CV_novel_plume @CV_novel_plume you can get lower loss (~0.2-0.3 ppl) just by setting dropout=0 with muon. Last time I checked, "most" of the muon benchmarks used a dropout of 0.01. Would be interested to know if you can replicate this in that benchmark.

Yuxin Fang@CV_novel_plume·29 Apr

I’ve run a lot of experiments on Muon and its variants, and I’d bet that in this setting, the Muon baseline will be very hard to beat.

Keller Jordan@kellerjordan0

Modded-NanoGPT Optimization Benchmark

Hundreds of neural network optimizers have been proposed in the literature, recently including dozens citing Muon: MARS, SWAN, REG, ADANA, Newton-Muon, TrasMuon, AdaMuon, HTMuon, COSMOS, Conda, ASGO, SAGE, and Magma, to name a few. The majority of this innovation is happening in the public research community. But the community currently lacks a widely accepted, easily accessible way to compare and make sense of the deluge of methods. As a result, promising new ideas get buried, and spurious results go unchallenged.

To help address these issues, I'm releasing a new optimization benchmark. It's designed for maximum simplicity and speed: Just a single file containing ~350 lines of plain PyTorch, which can complete a baseline LM training within 20 minutes of booting up a fresh 8xH100 machine. It also works with {1,2,4}xH100 or A100. These attributes make the new benchmark more accessible than any prior work.

The rules are simple: The optimization algorithm can be changed arbitrarily, with the goal being to minimize the number of training steps needed to reach 3.28 val loss on FineWeb (this is the same target loss as in the main speedrun). Modifying the architecture or dataloader, on the other hand, is not allowed. Wallclock time is unlimited, in order to give a fair chance to optimizers which would need kernel work or larger scale to become wallclock-efficient.

Like the main NanoGPT speedrun, submissions are open, and new results will be publicly broadcast. Beyond just improving the step count record, another goal of the benchmark is to collaboratively produce well-tuned baselines for as many optimizers as possible. For example, any improvement to the benchmark's best hyperparameters for AdamW would be considered a worthwhile new result.

This benchmark is not intended to be the final measure of optimizer quality across all domains. Convenient shared experimental infrastructure which covers the full space of possibilities -- across varying batch size, tokens per parameter, model scale, epoch count, and architecture -- is desirable, but far beyond the current status quo. This benchmark is only meant to be one step towards that goal.

To start the benchmark off, I've spent ~20 runs tuning baselines for Muon and AdamW. From time to time over the next few weeks, I'll add another optimizer from the literature, with my best effort at finding good hyperparameters. Researchers interested in neural network optimization are invited to join in by picking an optimizer and giving it a try on the benchmark. All optimizers are welcome, and even runs that don't necessarily have the best hyperparameters are desirable additions to the repo, because each new run adds to the collective knowledge.
