

Marco Ciccone
1.2K posts

@mciccone_AI
Postdoctoral Fellow @VectorInst - Collaborative, Decentralized, Modular ML - Competition chair @NeurIPSConf 2021, 2022, 2023 - PhD @polimi ex @NVIDIA @NNAISENSE








Everyone's excited about Karpathy's autoresearch that automates the experiment loop. We automated the whole damn thing. 🦞

Meet AutoResearchClaw: one message in, full conference paper out. Real experiments. Real citations. Real code. No human in the loop.

One message in → full paper out. Here's what happens in between:
📚 Raids arXiv & Semantic Scholar, digests 50+ papers in minutes
🥊 Three AI agents FIGHT over the best hypothesis (one swings big, one sanity-checks, one tries to kill every idea)
💻 Writes experiment code from scratch, adapts to your hardware
💥 Code crashes at 3am? It reads the stack trace, rewrites the fix, keeps going
🔄 Results weak? It pivots to entirely new hypotheses and starts over
📝 Drafts a full paper with citations, every single one verified against live databases

No babysitting. No Slack messages. No "hey can you re-run this."

Karpathy built the experiment loop. We built the whole lab. Chat an idea. Get a paper. 🦞

Try it 👉: github.com/aiming-lab/Aut…

Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @zhengop, @cihangxie, @dingmyu, and we are looking for more contributors.
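The post doesn't show AutoResearchClaw's internals, so this is purely a hypothetical sketch of the "three agents fight over a hypothesis" step it describes: a bold proposer scores novelty, a sanity-checker scores plausibility, and a critic tries to kill each idea before ranking. All names (`Hypothesis`, `debate`, the scores) are illustrative, not the real pipeline on GitHub.

```python
# Hypothetical sketch of a proposer / sanity-checker / critic debate loop.
# Nothing here is the actual AutoResearchClaw code; names are made up.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    novelty: float       # the "swings big" agent scores this
    plausibility: float  # the sanity-checking agent scores this
    survived: bool       # did it survive the critic's attempts to kill it?

def debate(candidates: list[Hypothesis]) -> Hypothesis:
    # Critic acts first: anything it can kill is discarded outright.
    alive = [h for h in candidates if h.survived]
    # Survivors are ranked by novelty, tie-broken by plausibility.
    return max(alive, key=lambda h: (h.novelty, h.plausibility))

pool = [
    Hypothesis("scale it up", novelty=0.2, plausibility=0.9, survived=True),
    Hypothesis("new objective", novelty=0.8, plausibility=0.7, survived=True),
    Hypothesis("perpetual motion", novelty=1.0, plausibility=0.0, survived=False),
]
print(debate(pool).text)  # → new objective
```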




PSA: never, ever write "we use the same learning rate across all methods for fair comparison." I read this as "do not trust any of our conclusions" and then I move on. If learning rate tuning is not mentioned at all, it takes me a little longer to notice, but I also move on.
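What "fair comparison" actually requires is sweeping the same LR grid for every method but keeping each method's own best. A minimal sketch, where `train_and_eval` is a toy stand-in for a real training run (here each method is simply given a different optimal LR to make the point):

```python
# Sketch: tune the learning rate *per method* over a shared grid.
# `train_and_eval` is a hypothetical stand-in for an actual training run.
import math

def train_and_eval(method: str, lr: float) -> float:
    """Toy score: each method peaks at a different LR. Higher is better."""
    best_lr = {"baseline": 3e-4, "our_method": 1e-3}[method]
    return -abs(math.log10(lr) - math.log10(best_lr))

lr_grid = [1e-4, 3e-4, 1e-3, 3e-3]

results = {}
for method in ["baseline", "our_method"]:
    # Same grid for everyone, but each method keeps its own winner.
    results[method] = max(lr_grid, key=lambda lr: train_and_eval(method, lr))

print(results)  # → {'baseline': 0.0003, 'our_method': 0.001}
```

Forcing one shared LR would hand the win to whichever method happens to sit closest to that LR's sweet spot.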

Someone should put me (down) out of the fp8 misery, it has no end; at this point I'll really end up writing my own kernels

New paper dropped by Anthropic: "Fractal Language Models." It DESTROYS the context window narrative. The LLM doesn't just respond, it splits into self-similar copies. No tokens, just models arguing and compressing, until the prompt is not read but self-reconstructed. /satire @a1zhang





so many things to do that I end up doing nothing

mHC puts a lot of effort into training stability. In some respects, stable backprop through depth is similar to stable backprop through time (BPTT) in modern RNNs. Many RNNs can be written as S_{t+1} = Gate @ S_t + f(S_t), similar to mHC's x_{t+1} = H @ x_t + f(x_t). Backprop through both involves cumulative matmuls, whose eigenvalues can explode or vanish. In RNNs, common stable parametrizations of the gate include:
1. Decay gate: a diagonal or scalar gate with values between 0 and 1. Used by RetNet and Mamba2.
2. Identity: the same as the original residual connection.
3. Householder matrix: used by DeltaNet (if beta = 2). A type of orthogonal matrix with all singular values equal to 1, so the cumulative matmul is also orthogonal.
mHC uses a doubly stochastic matrix, and the cumulative matmul then also yields a doubly stochastic matrix. Interestingly, these design spaces for residual connections and RNNs might be shared and influence each other. A trickier point: stability does not always mean effectiveness.
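The closure properties above can be checked numerically: products of Householder (orthogonal) matrices preserve norm exactly, and products of doubly stochastic matrices stay doubly stochastic, while unconstrained gates let the cumulative product's norm drift. A minimal NumPy sketch (Sinkhorn normalization here is just one convenient way to build a doubly stochastic matrix, not necessarily how mHC parametrizes H):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_doubly_stochastic(n, iters=50):
    """Sinkhorn: alternately normalize rows and columns of a positive
    matrix until it is (approximately) doubly stochastic."""
    m = rng.random((n, n)) + 1e-3
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)  # rows sum to 1
        m /= m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

def householder(n):
    """Householder reflection I - 2 v v^T: orthogonal, singular values all 1."""
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    return np.eye(n) - 2.0 * np.outer(v, v)

n, depth = 8, 200

# Unconstrained gates: the cumulative product's norm typically drifts
# (explodes or vanishes) with depth.
prod_free = np.eye(n)
for _ in range(depth):
    prod_free = (rng.standard_normal((n, n)) / np.sqrt(n)) @ prod_free

# Doubly stochastic gates: the product is still doubly stochastic.
prod_ds = np.eye(n)
for _ in range(depth):
    prod_ds = random_doubly_stochastic(n) @ prod_ds

# Householder gates: the product is still orthogonal, spectral norm 1.
prod_hh = np.eye(n)
for _ in range(depth):
    prod_hh = householder(n) @ prod_hh

print("free gate spectral norm :", np.linalg.norm(prod_free, 2))
print("DS product row sums     :", prod_ds.sum(axis=1))   # ~all ones
print("Householder spectral norm:", np.linalg.norm(prod_hh, 2))  # ~1.0
```

The same closure argument is what makes BPTT (or backprop through depth) stable: the Jacobian product inherits the constraint of each factor.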








A lunch merge.

