Kamran Chitsaz

132 posts

@KChitsaz

Machine Learning Researcher @Mila_Quebec, MSc of Electrical Engineering at @polymtl

Montreal, QC · Joined March 2021

273 Following · 154 Followers

Pinned Tweet

Kamran Chitsaz@KChitsaz·9 Oct

Long reasoning without the quadratic tax: The Markovian Thinker makes LLMs reason in chunks with a bounded state → linear compute, constant memory and it keeps scaling beyond the training limit. 1/6

Milad Aghajohari@MAghajohari

Introducing linear scaling of reasoning: 𝐓𝐡𝐞 𝐌𝐚𝐫𝐤𝐨𝐯𝐢𝐚𝐧 𝐓𝐡𝐢𝐧𝐤𝐞𝐫 Reformulate RL so thinking scales 𝐎(𝐧) 𝐜𝐨𝐦𝐩𝐮𝐭𝐞, not O(n^2), with O(1) 𝐦𝐞𝐦𝐨𝐫𝐲, architecture-agnostic. Train R1-1.5B into a markovian thinker with 96K thought budget, ~2X accuracy 🧵
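The O(n) versus O(n^2) claim can be illustrated with a rough attention-cost count. This is a minimal sketch under stated assumptions, not the authors' implementation: cost is counted as the number of positions each generated token attends to, every chunk is assumed to start from a fixed-size carried state, and the names `full_context_cost`, `chunked_cost`, `peak_context`, `chunk`, and `carry` are hypothetical.

```python
def full_context_cost(n_tokens: int) -> int:
    """Attention cost when every token attends to the whole history.

    Token t attends to t positions, so the total is 1 + 2 + ... + n.
    """
    return n_tokens * (n_tokens + 1) // 2


def chunked_cost(n_tokens: int, chunk: int, carry: int) -> int:
    """Attention cost when reasoning proceeds in fixed-size chunks.

    Assumption: each chunk starts from `carry` tokens of bounded state
    and generates up to `chunk` new tokens before the context is reset.
    """
    total = 0
    remaining = n_tokens
    while remaining > 0:
        step = min(chunk, remaining)
        # The i-th token of the chunk attends to carry + i positions.
        total += step * carry + step * (step + 1) // 2
        remaining -= step
    return total


def peak_context(n_tokens: int, chunk: int, carry: int) -> int:
    """Largest context ever held in memory under the chunked scheme."""
    return carry + min(chunk, n_tokens)


for n in (8_000, 16_000, 32_000):
    print(n, full_context_cost(n), chunked_cost(n, chunk=4_000, carry=1_000))
```

Doubling `n_tokens` roughly quadruples the full-context count but only doubles the chunked count, while `peak_context` stays fixed no matter how long the reasoning runs.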

Kamran Chitsaz@KChitsaz·12 May

@ZyphraAI @AMD Very exciting to see Markovian Thinking used in ZAYA1-8B. Scaling test-time compute to millions of tokens within 32K context, with strong gains from a <1B active parameter model, is exactly what we hoped this idea would enable. Congrats to the Zyphra team! x.com/MAghajohari/st…

Milad Aghajohari@MAghajohari

Excited to see that Markovian Thinker contributed to Zyphra's strong release 🚀. Their Markovian RSA: markovian thinking (carrying forward bounded-length reasoning tails) + RSA (recursive self-aggregation) boosted test-time compute to be on-par with larger reasoning models. 1/

Zyphra@ZyphraAI·6 May

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵

Kamran Chitsaz retweeted

Milad Aghajohari@MAghajohari·12 May

Zyphra@ZyphraAI

Kamran Chitsaz retweeted

Nilaksh@nilaksh404·12 May

Diffusion world models can help test and improve robot policies before running them on real robots. But can the choice of latent space make the WM more faithful? We show that semantic spaces beat reconstruction spaces on task relevant metrics. hskalin.github.io/semantic-wm

Kamran Chitsaz retweeted

Darshan Patil@dapatil211·5 Mar

🧬 New paper Scientific datasets evolve as science evolves. With proteins, new sequences get added, annotations get corrected, and noisy entries get curated out. Introducing CoPeP, a continual-pretraining benchmark for protein LMs. Details 🧵 1/n

Kamran Chitsaz retweeted

Chandar Lab@ChandarLab·24 Feb

Streaming Reinforcement Learning (RL) is a huge challenge: transitions are used once and discarded immediately. This makes agents extremely sample-inefficient. But what if we could "squeeze" more information out of every single frame? Check out our latest paper!

Kamran Chitsaz retweeted

Chandar Lab@ChandarLab·17 Feb

‘The Markovian Thinker’, developed by our lab, has been accepted at @iclr_conf! This work achieved long reasoning without the quadratic attention tax: LLMs reason in chunks with a bounded state, achieving linear compute, constant memory, and scaling beyond the training limit!

Kamran Chitsaz retweeted

Chandar Lab@ChandarLab·10 Feb

New work from our lab, accepted @iclr_conf: "The Expressive Limits of Diagonal SSMs for State-Tracking". We give a complete characterization of what diagonal SSMs can and cannot compute on state-tracking tasks, and the answer is deeply connected to group theory. 🧵👇

Kamran Chitsaz retweeted

Mila - Institut québécois d'IA@Mila_Quebec·22 Dec

Congrats to Prashant (@prashantg_17), Davide (@DavideBald42296), Quentin (@qfournier2), and Sarath (@apsarathchandar) on CADmium, a new method that rethinks text-to-CAD to generate high-fidelity 3D models! Read their blog post: mila.quebec/en/article/imp…

Kamran Chitsaz retweeted

Chandar Lab@ChandarLab·20 Jan

Can LLMs become CAD designers? Check out “CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design”, which is now published in Transactions on Machine Learning Research (TMLR), and led by @prashantg_17, @DavideBald42296, and @qfournier2!

Kamran Chitsaz retweeted

Mila - Institut québécois d'IA@Mila_Quebec·4 Dec

Alongside @NeurIPSConf in San Diego, the satellite conference NeurIPS Mexico City is taking place, with several Mila student-researchers taking part. Two of them presented their research today. Sahar Dastani (@sonia_dt98), PhD student at ETS/Mila, presented “TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses” and Saba Ahmadi (@Saba_A96), affiliated researcher at UdeM/Mila, presented “The Promise of RL for Autoregressive Image Editing.” Congratulations!

Kamran Chitsaz retweeted

Amir Kargaran@amir_nlp·24 Nov

With all the ICLR 2026 drama, we’re sharing some insights on the review and rebuttal process from ICLR 2025 & 2024. You might find them useful for your own rebuttal! arxiv.org/abs/2511.15462 The data of scores before and after rebuttal is also available: github.com/papercopilot/i…

Kamran Chitsaz retweeted

Aarash Feizi @ ICLR 🇧🇷@aarashfeizi·12 Nov

🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread

Kamran Chitsaz retweeted

Mohammad Pezeshki@mpezeshki91·7 Nov

We show a phase transition for optimal data curation: For strong models, concentrating on difficult samples drives further improvement (LIMO). In contrast, weaker models benefit from the conventional "More is More" where broad data exposure is essential to learn core capabilities

Elvis Dohmatob@dohmatobelvis

1/n "Less is More" (s1, etc.) vs "More is More": which mantra is correct for training/fine-tuning large LLMs? In our recent preprint, we reconcile both of these. They correspond to different parts of a complex phase diagram.

Kamran Chitsaz retweeted

Amirhossein Kazemnejad@a_kazemnejad·3 Nov

After nearly 3 years since our NeurIPS paper, SOTA architectures are now adopting NoPE. Kimi Linear uses NoPE for all full-attention layers (not a RoPE hybrid).

Rohan Paul@rohanpaul_ai

The brilliant Kimi Linear paper. It's a hybrid attention that beats full attention while cutting memory by up to 75% and keeping 1M token decoding up to 6x faster. It cuts the key value cache by up to 75% and delivers up to 6x faster decoding at 1M context.

Full attention is slow because it compares every token with every other token and stores all past keys and values. Kimi Linear speeds this up by keeping a small fixed memory per head and updating it step by step like a running summary, so compute and memory stop growing with length.

Their new Kimi Delta Attention adds a per channel forget gate, which means each feature can separately decide what to keep and what to fade, so useful details remain and clutter goes away. They also add a tiny corrective update on every step, which nudges the memory toward the right mapping between keys and values instead of just piling on more data.

The model stacks 3 of these fast KDA layers then 1 full attention layer, so it still gets occasional global mixing while cutting the key value cache roughly by 75%. Full attention layers run with no positional encoding, and KDA learns order and recency itself, which simplifies the stack and helps at long ranges.

Under the hood, a chunkwise algorithm plus a constrained diagonal plus low rank design removes unstable divisions and drops several big matrix multiplies, so the kernels run much faster on GPUs.

With the same training setup, it scores higher on common tests, long context retrieval, and math reinforcement learning, while staying fast even at 1M tokens. It drops into existing systems, saves memory, scales to 1M tokens, and improves accuracy without serving changes.

----

Paper – arxiv.org/abs/2510.26692
Paper Title: "Kimi Linear: An Expressive, Efficient Attention Architecture"
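The roughly 75% key value cache saving follows directly from the 3:1 layer ratio described above. A minimal sketch of that arithmetic, assuming every full attention layer stores the same amount of cache per token and that the fixed-size memory of the KDA layers is negligible at long context (the function name `kv_cache_fraction` is hypothetical):

```python
def kv_cache_fraction(linear_layers_per_full: int) -> float:
    """Fraction of layers that still keep a length-growing KV cache.

    Assumption: only full attention layers store per-token keys and
    values; the linear layers keep a fixed-size state instead.
    """
    return 1 / (linear_layers_per_full + 1)


fraction = kv_cache_fraction(3)  # 3 KDA layers per full attention layer
print(f"KV cache kept: {fraction:.0%}, saved: {1 - fraction:.0%}")
```

With three linear layers per full attention layer, one layer in four keeps a growing cache, which matches the "up to 75%" figure quoted in the thread.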

Kamran Chitsaz retweeted

Divyat Mahajan@divyat09·29 Oct

[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning. 📌Predict a learned embedding of the future sequence, not the tokens themselves

Kamran Chitsaz retweeted

Mohammad Pezeshki@mpezeshki91·30 Oct

My prediction is that next-token prediction loss will not stand the test of time, and the next frontier models will need richer loss functions. In this paper, we take a step towards that, shifting from predicting a single token to predicting a summary of the future.

Divyat Mahajan@divyat09

Kamran Chitsaz retweeted

Sarath Chandar@apsarathchandar·24 Oct

I am recruiting several graduate students (both MSc and PhD level) for Fall 2026 @ChandarLab! The application deadline is December 01. Please apply through the @Mila_Quebec supervision request process here: mila.quebec/en/prospective…. More details about the recruitment process here: chandar-lab.github.io/join/

Kamran Chitsaz retweeted

Artem Zholus@artemZholus·21 Oct

I can't attend #ICCV 2025 in Honolulu, Hawaii but my amazing teammates will be there! Please stop by our poster tomorrow 21 Oct (#438) to learn about TAPNext, a general, ViT-like architecture with SOTA point tracking quality! Links: 🌐 website: tap-next.github.io

Kamran Chitsaz retweeted

Mohammad Pezeshki@mpezeshki91·21 Oct

Alleviating long context issues: Iterative Amortized Inference (IAI) refines solutions step-by-step over mini-batches, just like stochastic optimization. IAI merges: - Scalability of stochastic opt. (SGD). - Expressivity of forward-pass amortization (ICL in LLMs).

Sarthak Mittal@sarthmit

Meta on meta: thrilled to share our work on Meta-learning… at Meta! 🔥🧠 We make two major contributions: 1️⃣ Unified framework revealing insights into various amortizations 🧠 2️⃣ Greedy belief-state updates to handle long context-lengths 🚀

Kamran Chitsaz retweeted

Mila - Institut québécois d'IA@Mila_Quebec·15 Oct

Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit mila.quebec/en/prospective…
