Marco Ciccone

1.2K posts

@mciccone_AI

Postdoctoral Fellow @VectorInst - Collaborative, Decentralized, Modular ML - Competition chair @NeurIPSConf 2021, 2022, 2023 - PhD @polimi ex @NVIDIA @NNAISENSE

Toronto, Canada · Joined April 2015
1.1K Following · 1K Followers
Pinned Tweet
Marco Ciccone@mciccone_AI·
🚨 Life update 🚨 I moved to Toronto 🇨🇦and joined @VectorInst as a Postdoctoral Fellow to work with @colinraffel and his lab on collaborative, decentralized, and modular machine learning to democratize ML model development. Exciting times ahead! 🪿
elie@eliebakouch·
update: joining @PrimeIntellect 🦋 i'm super excited to join the team. i really admire what they've been building and i love the mission of pushing the frontier in the open i'll be working on pre/mid training, there's so much left to figure out and i truly believe a small group with the right people, resources and focus can do sooo much 🚀
Luca Soldaini 🎀@soldni·
After 4yrs, today is my last day at @allen_ai It was an honor to work on Olmo, Dolma, olmOCR, Tulu, Molmo & other fully-open artifacts 🫡 Reception has been amazing & their adoption makes me SO PROUD 🥹 Team is super committed to open recipes; can't wait to see what's next!!!!
Zachary Charles@MatharyCharles·
I don't think this will lead to very good papers, but I do think it might incentivize people to start working on bigger things. If you're submitting a paper that an agent did in a weekend, you're probably submitting a bad paper!
Huaxiu Yao@HuaxiuYaoML

Everyone's excited about Karpathy's autoresearch that automates the experiment loop. We automated the whole damn thing. 🦞 Meet AutoResearchClaw: one message in, full conference paper out. Real experiments. Real citations. Real code. No human in the loop.
One message in → full paper out. Here's what happens in between:
📚 Raids arXiv & Semantic Scholar, digests 50+ papers in minutes
🥊 Three AI agents FIGHT over the best hypothesis (one swings big, one sanity-checks, one tries to kill every idea)
💻 Writes experiment code from scratch, adapts to your hardware
💥 Code crashes at 3am? It reads the stack trace, rewrites the fix, keeps going
🔄 Results weak? It pivots to entirely new hypotheses and starts over
📝 Drafts a full paper with citations, every single one verified against live databases
No babysitting. No Slack messages. No "hey can you re-run this."
Karpathy built the experiment loop. We built the whole lab.
Chat an idea. Get a paper. 🦞
Try it 👉: github.com/aiming-lab/Aut…
Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @zhengop, @cihangxie, @dingmyu, and we are looking for more contributors.

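A minimal sketch of what the propose / sanity-check / refute loop described above might look like. The function names and prompts are hypothetical illustrations, not taken from the AutoResearchClaw repo; `llm` stands in for any prompt-to-text callable.

```python
# Hypothetical sketch of the agent loop described in the tweet: three roles debate a
# hypothesis, then an experiment loop patches its own crashes. None of these names
# come from the AutoResearchClaw repo; `llm` is any prompt -> text callable.
import subprocess

def debate_hypotheses(llm, literature_summary, rounds=3):
    """One agent swings big, one sanity-checks, one tries to kill every idea."""
    candidates = llm("Propose bold hypotheses given:\n" + literature_summary)
    for _ in range(rounds):
        critique = llm("Sanity-check these hypotheses:\n" + candidates)
        attack = llm("Try to refute each hypothesis:\n" + candidates + "\n" + critique)
        candidates = llm("Revise, dropping refuted ideas:\n" + candidates + "\n" + attack)
    return candidates

def run_experiment(llm, hypothesis, max_retries=5):
    """Generate experiment code, run it, and let the model patch its own stack traces."""
    code = llm("Write a runnable Python experiment for:\n" + hypothesis)
    for _ in range(max_retries):
        result = subprocess.run(["python", "-c", code],
                                capture_output=True, text=True, timeout=3600)
        if result.returncode == 0:
            return result.stdout            # results feed the paper-drafting stage
        code = llm("Fix this code given the stack trace:\n" + code + "\n" + result.stderr)
    return None                             # still failing: pivot to a new hypothesis upstream
```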
Marco Ciccone@mciccone_AI·
@thegautamkamath One-page rebuttals are sufficient in most situations and easier to verify. If reviewers formulate their concerns precisely (unfortunately harder and harder), authors should be able to address them directly and get straight to the point. No fluff.
Gautam Kamath@thegautamkamath·
Suppose one of NeurIPS/ICML/ICLR decided to do away with all rebuttals. Acceptances/rejections would be decided by the reviewers and the ACs, without input from the authors beyond the submissions. Which would you, as an author and a reviewer jointly, prefer?
Zachary Charles@MatharyCharles·
@mciccone_AI Ahhh that's a lot! But I think there are creative and academically honest ways to reduce this even further (e.g. can you tune on smaller models, for a fraction of the data, etc. etc.). But yes, this table is a great example of how LR represents different things in different methods!
Zachary Charles@MatharyCharles·
I think this is the experimental critique I have leveled the most in conference reviews. I know that it's hard to tune optimally without a lot of compute, but then I think the onus is on the authors to figure out tuning shorthands that at least do pretty well.
Lucas Beyer (bl16)@giffmana

PSA: never, ever write "we use the same learning rate across all methods for fair comparison". I read this as "do not trust any of our conclusions" and then I move on. If learning rate tuning is not mentioned, it takes me a little more time to notice that, but I also move on.

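To make the point concrete, here is a tiny self-contained example of tuning the learning rate per method (SGD vs Adam on a toy regression task) instead of sharing one LR "for fair comparison". The grid, task, and methods are purely illustrative.

```python
# Toy illustration: tune the LR separately per method instead of fixing one
# "for fair comparison". Methods, LR grid, and task are illustrative only.
import torch

torch.manual_seed(0)
X = torch.randn(512, 16)
y = X @ torch.randn(16, 1) + 0.1 * torch.randn(512, 1)

def final_loss(opt_name, lr, steps=200):
    model = torch.nn.Linear(16, 1)
    opt = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam}[opt_name](model.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

for opt_name in ("sgd", "adam"):
    # The best LR typically differs across methods, which is exactly why
    # a single shared LR biases the comparison.
    loss, lr = min((final_loss(opt_name, lr), lr) for lr in (1e-3, 1e-2, 1e-1))
    print(f"{opt_name}: best LR {lr}, final loss {loss:.4f}")
```

Tuning on a proxy this cheap (small model, few steps, subset of data) is one of the "tuning shorthands" the tweet above is asking for.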
Matej Sirovatka@m_sirovatka·
Anthropic’s Fractal Language Models reframes the AGI path 🧠🌀 Not bigger context windows ❌📏 Models split, argue, compress, and self-reconstruct meaning 🤖🪞 AGI isn’t memory. It’s recursive self-understanding 📐✨
Mark Saroufim@marksaroufim

New paper dropped by Anthropic: "Fractal Language Models" It DESTROYS the context window narrative. The LLM doesn't just respond, it splits into self similar copies No tokens but models arguing, compressing until the prompt is not read but self reconstructed /satire @a1zhang

Manthan Gupta@manthanguptaa·
The wrong tokenization strategy could be costing you $500K+ annually Not because your model is bad, but because tokenization decides how much text your model actually sees. I wrote a deep dive on how LLM tokenization is trained, and why it matters manthanguptaa.in/posts/train_ll…
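A toy illustration of the post's point: the tokenizer decides how many tokens (and hence how much money and context) the same text costs. This sketch trains two small BPE tokenizers with different vocab sizes on the same corpus and compares token counts; the corpus and vocab sizes are made up here, not taken from the linked post.

```python
# Train two small BPE tokenizers and compare how many tokens the same sentence costs.
# Corpus and vocab sizes are illustrative only.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

corpus = ["the model reads tokens, not characters"] * 1000 + \
         ["tokenization decides how much text your model actually sees"] * 1000

def train_bpe(vocab_size):
    tok = Tokenizer(BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = Whitespace()
    tok.train_from_iterator(corpus, BpeTrainer(vocab_size=vocab_size,
                                               special_tokens=["[UNK]"]))
    return tok

text = "tokenization decides how much text your model actually sees"
for vocab_size in (50, 500):
    n_tokens = len(train_bpe(vocab_size).encode(text).ids)
    print(f"vocab={vocab_size}: {n_tokens} tokens for the same sentence")
# Fewer tokens per request means lower per-token cost and more effective context.
```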
Marco Ciccone@mciccone_AI·
@m_sirovatka I generally don't do that, but when I do, I feel a certain peace of mind in knowing other people are sleeping
Marco Ciccone@mciccone_AI·
So Claude implemented a long context compression feature! Wondering what method they use
Songlin Yang@SonglinYang4·
the residual stream should be viewed as a recurrence, and insights from the RNN literature should apply here
Tianyuan Zhang@tianyuanzhang99

mHC puts a lot of effort into training stability. In some respects, stable backprop through depth is similar to stable backprop through time (BPTT) for modern RNNs. Many RNNs can be written as S_{t+1} = Gate @ S_t + f(S_t), similar to mHC: x_{t+1} = H @ x_t + f(x_t). The backprop for both involves cumulative matmuls, whose eigenvalues might explode or vanish. In RNNs, common stable parametrizations of the gate include:
1. Decay gate: a diagonal or scalar gate with values between 0 and 1. Used by RetNet, Mamba2.
2. Identity: the same as the original residual connection.
3. Householder matrix: used by DeltaNet (if beta=2), a type of orthogonal matrix with all singular values equal to 1, so the cumulative matmul is also orthogonal.
mHC uses a doubly stochastic matrix, and the cumulative matmul also yields a doubly stochastic matrix. Interestingly, the design spaces for residual connections and RNNs might be shared, and influence each other. A trickier point is that stability might not always mean effectiveness.

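A quick NumPy check of the spectral claims in the quote: products of Householder (orthogonal) gates and of doubly stochastic gates stay well-conditioned through depth, while an unconstrained gate explodes or vanishes. The dimensions, depth, and Sinkhorn construction below are illustrative, not the mHC parametrization itself.

```python
# Toy check of how the cumulative product of per-layer "gates" behaves for
# different parametrizations. Dimensions and depth are arbitrary; this only
# illustrates the spectral argument, not mHC itself.
import numpy as np

rng = np.random.default_rng(0)
d, depth = 16, 64

def householder():
    v = rng.normal(size=(d, 1)); v /= np.linalg.norm(v)
    return np.eye(d) - 2 * v @ v.T              # orthogonal, all singular values are 1

def doubly_stochastic(iters=50):
    M = np.abs(rng.normal(size=(d, d)))
    for _ in range(iters):                      # Sinkhorn normalization
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M                                    # rows/cols sum to ~1, operator norm <= 1

def unconstrained():
    return np.eye(d) + 0.1 * rng.normal(size=(d, d))

for name, make in [("householder", householder),
                   ("doubly_stochastic", doubly_stochastic),
                   ("unconstrained", unconstrained)]:
    P = np.eye(d)
    for _ in range(depth):
        P = make() @ P                          # cumulative matmul, as in BPTT / deep residuals
    svals = np.linalg.svd(P, compute_uv=False)
    print(f"{name:18s} max singular value after {depth} layers: {svals.max():.3f}")
# Orthogonal and doubly stochastic gates keep the product well-conditioned;
# the unconstrained gate's largest singular value drifts away from 1.
```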
Marco Ciccone@mciccone_AI·
@kchonyc haha +1 for PAI, Mother's Dumpling is good, and BIWON for Korean food! Amal for fancy Lebanese food. I also like Momo Ghar (more east), and RAIJIN or Ikkousha for ramen
Kyunghyun Cho@kchonyc·
according to gemini, toronto is a sad place in terms of its culinary culture.
Marco Ciccone@mciccone_AI·
@kchonyc useful for faculty applications and collaborations!
Kyunghyun Cho@kchonyc·
The productivity gain from LLMs is so real, if only you use them. Less than an hour later, I was able to write code to enumerate faculty members and their research interests in the new Courant Institute School of Mathematics, Computing and Data Science: courant-faculty-research.netlify.app
Marco Ciccone retweeted
Riccardo Zaccone @ NeurIPS@RickZack96·
🚀 Excited to be at #NeurIPS2025 this week! I'll be presenting our work on distributed and federated optimization. You'll find me on 6th Dec:
- OPT for ML: 20A, 10-11 am
- Reliable ML: 2, 1:30-2:15 pm
If you're working on learning at scale, come find me. Happy to chat 🤝