Marawan Gamal

31 posts

@mrremila

PhD @ Mila. UofT and UWaterloo alum. Things I made: https://t.co/fFBYP0mofu https://t.co/57TzY6sMNn

Montreal · Joined June 2023

111 Following · 17 Followers

Pinned Tweet

Marawan Gamal@mrremila·Apr 16

New preprint! Introducing ACTMat: Model Merging via Data-Free Covariance Estimation. Work done w/ @dtredsox13, @PTikeng, Colin Raffel and Guillaume Rabusseau. TL;DR: It's RegMean, but without needing data for covariance estimation (C ≈ Δᵀ Δ) 📄 arxiv.org/pdf/2604.01329 🧵(1/6)

Marawan Gamal@mrremila·20h

@predict_addict Isn’t that roughly just 25 employees? (Assuming $100k annual cost per employee)
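The reply's back-of-the-envelope estimate, worked out under its own assumptions: the $25M from the quoted tweet is spread evenly over ten years, and all of it goes to staff at $100k per employee per year.

```latex
\frac{\$25{,}000{,}000 \,/\, 10\ \text{years}}{\$100{,}000\ \text{per employee per year}}
  = \frac{\$2{,}500{,}000\ \text{per year}}{\$100{,}000\ \text{per employee per year}}
  = 25\ \text{employees}
```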

Valeriy M., PhD, MBA, CQF@predict_addict·1d

ArXiv received over $25 million over the last decade. What are they spending money on? The army of moderators (circa ~300) is unpaid. The authors are unpaid. This must be quite a lucrative business on par with publishing cartels like Elsevier.

Marawan Gamal@mrremila·20h

@TeX64AI @Arian_Khorasani @dggoldst Oh nice! I never knew about that site!

TeX64@TeX64AI·1d

@mrremila @Arian_Khorasani @dggoldst Risk with GPT-bibtex: author names get subtly garbled with no error signal. If the paper has a DOI, doi2bib.org pulls the canonical Crossref record — deterministic and clean. For arXiv: arxiv.org/bibtex/ARXIV_ID. Older papers without DOIs are the genuinely hard case.
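A minimal standard-library sketch of the two deterministic lookups. The arXiv endpoint is the one named in the tweet; for DOIs it swaps in DOI content negotiation (requesting `application/x-bibtex` from doi.org) rather than calling doi2bib.org, whose URL scheme is not assumed here. The function names are hypothetical, and only `fetch_text` touches the network.

```python
import urllib.request


def arxiv_bibtex_request(arxiv_id: str) -> urllib.request.Request:
    # arXiv serves a ready-made BibTeX entry at arxiv.org/bibtex/<ARXIV_ID>.
    return urllib.request.Request(f"https://arxiv.org/bibtex/{arxiv_id}")


def doi_bibtex_request(doi: str) -> urllib.request.Request:
    # DOI content negotiation (swapped in for doi2bib.org): asking doi.org for
    # application/x-bibtex returns the registered metadata formatted as BibTeX.
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_text(request: urllib.request.Request, timeout: float = 10.0) -> str:
    # Performs the network call; urlopen follows doi.org's redirect and keeps
    # the Accept header on the redirected request.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (needs network access):
# print(fetch_text(arxiv_bibtex_request("2604.01329")))
```

Neither path involves a language model, so author names come back exactly as registered; older papers without a DOI or arXiv ID still need manual entry.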

Dan Goldstein@dggoldst·2d

arXiv imposing a 1-year ban on authors who submit hallucinated references is good. Authors should be able to write with AI but need to be fully accountable for catching that level of error. Doesn't matter if humans also sometimes cite non-existent things. 1-year ban them too.

Marawan Gamal@mrremila·1d

@Arian_Khorasani @dggoldst I feel referencing errors should also be treated differently from hallucinated references. For instance, I often pass the paper's title, authors, and venue and ask GPT to make the BibTeX.

Arian Khorasani 🦅@Arian_Khorasani·2d

I think arXiv's new policy is a perfect starting point, but we also have to consider how messy citation data actually is in the real world. A ton of older or more obscure papers don't have clean BibTeX files or standardized metadata, which forces authors to type them out manually. That leads to messy formatting and typos that might look fake to an automated scanner, even though the paper actually exists. It's a huge headache, and I just hope any enforcement tool can actually tell the difference between a sloppy manual typo and a completely hallucinated AI source.
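One way a tool could separate the two cases is fuzzy title matching against a catalog of real papers: a near miss suggests a sloppy manual entry, while no close match suggests the source may not exist. A minimal sketch with Python's `difflib`; the catalog, the helper names, and the 0.9 threshold are hypothetical, and this is not a description of any tool arXiv actually uses.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(title: str) -> str:
    # Replace anything that isn't an ASCII letter, digit, or space with a space,
    # then collapse whitespace, so casing and punctuation noise are ignored.
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def classify_citation(cited: str, catalog: List[str],
                      typo_threshold: float = 0.9) -> Tuple[str, Optional[str]]:
    """Label a cited title 'exact', 'likely typo', or 'not found' (hypothetical threshold)."""
    best_title, best_score = None, 0.0
    for title in catalog:
        score = SequenceMatcher(None, normalize(cited), normalize(title)).ratio()
        if score > best_score:
            best_title, best_score = title, score
    if best_score == 1.0:
        return "exact", best_title
    if best_score >= typo_threshold:
        return "likely typo", best_title
    return "not found", None


# Hypothetical catalog standing in for a real bibliographic database.
catalog = ["Attention Is All You Need", "Deep Residual Learning for Image Recognition"]
print(classify_citation("Atention Is All You Need", catalog))
```

A real checker would also compare authors and years, since a hallucinated reference can reuse a genuine title.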

Marawan Gamal@mrremila·1d

@benno_krojer Would be cool if there was a chat

Benno Krojer@benno_krojer·1d

Is other academic folks' memory just better than mine? Or is this just a regular awkward situation in reviewing? When reviewing for a conference, I have to admit I forget most things about a paper after a month, so when the rebuttal arrives I rarely remember exactly why I wrote each weakness, or whether the authors' rebuttal addresses them well. Of course I could reread the paper, but that also feels excessive.

Marawan Gamal@mrremila·Apr 23

If you want to semantically search through ICLR papers this week, check out papers.app

Marawan Gamal@mrremila·Apr 22

@leonardoalt Should we expect significant performance drops, given how out-of-domain it must be to code with caveman reasoning traces?

Leo Alt@leonardoalt·Apr 21

told Claude to use caveman dialect + first order logic instead of plain (verbose) English, great result, save token

Marawan Gamal retweeted

Divyat Mahajan@divyat09·Apr 20

[1/8] In-context learning and prior-fitted networks are gaining traction for causal effect estimation in tabular datasets. But can they also generate novel observational & interventional data from causal models? 📌Introducing Cond-FiP: In-context learning of causal mechanisms via conditional fixed-point iterations. 📜 arxiv.org/abs/2410.06128 🔹Accepted at TMLR and to be presented at ICLR 26 in the J2C track!
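Cond-FiP itself learns causal mechanisms in-context with a neural model; the sketch below is far simpler and only illustrates the fixed-point view of sampling from a structural causal model (SCM) that the method's name points to. It assumes a hypothetical linear additive-noise SCM over a DAG in place of learned mechanisms: iterating x ← xW + n reaches the SCM's solution in at most d steps, and an intervention do(x_j = v) amounts to cutting node j's incoming edges and pinning its value.

```python
import numpy as np


def sample_scm(W, noise, intervention=None):
    """Sample a linear SCM x = x @ W + noise by fixed-point iteration.

    W: (d, d) weights, W[i, j] != 0 meaning an edge x_i -> x_j (assumed acyclic).
    noise: (n, d) exogenous noise, one row per sample.
    intervention: optional {node: value} dict implementing do(x_node = value).
    """
    W = np.array(W, dtype=float)          # copies, so the caller's arrays are untouched
    noise = np.array(noise, dtype=float)
    for node, value in (intervention or {}).items():
        W[:, node] = 0.0                  # cut the intervened node's incoming edges
        noise[:, node] = value            # and pin it to the intervention value
    x = np.zeros_like(noise)
    # For a DAG over d nodes, W is nilpotent (W^d = 0), so d iterations reach the fixed point.
    for _ in range(W.shape[0]):
        x = x @ W + noise
    return x


# Hypothetical chain x0 -> x1 -> x2 with edge weights 2 and 3.
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
noise = np.array([[1.0, 0.0, 0.0]])
observational = sample_scm(W, noise)              # x0 = 1, x1 = 2, x2 = 6
interventional = sample_scm(W, noise, {1: 5.0})   # do(x1 = 5): x0 = 1, x1 = 5, x2 = 15
```

The same loop produces both observational and interventional data; only the edited mechanisms change, which is the property the thread highlights.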

Marawan Gamal@mrremila·Apr 20

@JFPuget Best reply I've seen 😁

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱@JFPuget·Apr 20

I disagree. There are great JAX users.

François Chollet@fchollet

When looking at deep learning profiles, one of the most obvious tells between a mediocre and great candidate is whether they list PyTorch or JAX.

Marawan Gamal retweeted

Michael Rizvi-Martel@frisbeemortel·Apr 9

Latent CoT is an alternative LLM reasoning scheme hypothesized to enable “superposition” allowing models to hold uncertainty over multiple concepts during reasoning 💭 We revisit superposition in 3 latent CoT approaches and find that it is largely an illusion 🔮! More in 🧵

Marawan Gamal@mrremila·Apr 17

@John3tck @dtredsox13 @PTikeng Surprisingly, we actually stumbled upon this approximation while trying to understand why TSV works

John@John3tck·Apr 17

@mrremila @dtredsox13 @PTikeng likely inspired by WUDI-MEGRE, good work

Marawan Gamal@mrremila·Apr 16

@dtredsox13 @PTikeng 🧵(6/6) We also evaluate merging RL Zero fine-tuned models (Olmo-3-7b), and find ACTMat outperforms other methods on average.

Marawan Gamal@mrremila·Apr 16

@dtredsox13 @PTikeng 🧵(5/6) On the common ViT/T5 benchmarks, ACTMat outperforms other data-free baselines (with and without LoRA).

Marawan Gamal@mrremila·Apr 16

@dtredsox13 @PTikeng 🧵(4/6) Why does C ≈ Δᵀ Δ? The angular distance between these two quantities can be upper bounded by three error terms (Theorem 3.1), each of which we find to be empirically close to zero.
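A common way to measure the angle between two matrices such as C and ΔᵀΔ is to treat them as vectors under the Frobenius inner product. Whether Theorem 3.1 uses exactly this definition is an assumption; the sketch only shows the quantity, with a random toy layer input X standing in for real data.

```python
import numpy as np


def angular_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Angle in radians between two matrices under the Frobenius inner product.

    Assumption: a generic definition, not necessarily the one used in Theorem 3.1.
    """
    cos = np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))
    # Clip to guard against rounding pushing |cos| slightly above 1.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Toy (hypothetical) covariance C = X^T X for random layer inputs X.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))
C = X.T @ X
print(angular_distance(C, 3.0 * C))   # close to 0: the angle ignores overall scale
```

Because the angle ignores overall scale, a small value means ΔᵀΔ points in nearly the same direction as C, not that the two have the same magnitude.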

Marawan Gamal@mrremila·Apr 16

@dtredsox13 @PTikeng 🧵(3/6) We show that C ≈ Δᵀ Δ, and thus propose to merge models using RegMean with each C replaced by Δᵀ Δ.
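A minimal NumPy sketch of the TL;DR idea, assuming the plain RegMean closed form W* = (Σ_t C_t)⁻¹ Σ_t C_t W_t with each C_t replaced by Δ_tᵀΔ_t. Treating Δ_t = W_t − W_0 as the task vector in PyTorch `nn.Linear` layout (d_out, d_in), the small ridge term, and the function names are assumptions for illustration, not the paper's code.

```python
import numpy as np


def regmean_merge(weights, covs, ridge=0.0):
    """Closed-form RegMean: W* = (sum_t C_t + ridge*I)^{-1} sum_t C_t W_t.

    weights: list of (d_in, d_out) matrices; covs: list of (d_in, d_in) Gram matrices.
    """
    C_sum = sum(covs) + ridge * np.eye(covs[0].shape[0])
    rhs = sum(C @ W for C, W in zip(covs, weights))
    return np.linalg.solve(C_sum, rhs)


def data_free_merge(base, finetuned, ridge=1e-6):
    """Hypothetical data-free variant: use Delta_t^T Delta_t in place of X_t^T X_t.

    base, finetuned[t]: (d_out, d_in) weights; Delta_t^T Delta_t is (d_in, d_in).
    """
    deltas = [W - base for W in finetuned]
    covs = [D.T @ D for D in deltas]
    # Merge task vectors (transposed to (d_in, d_out)) and add them back to the base.
    merged_delta = regmean_merge([D.T for D in deltas], covs, ridge=ridge)
    return base + merged_delta.T


# Two hypothetical 1x2 layers whose updates touch different input directions.
base = np.zeros((1, 2))
merged = data_free_merge(base, [np.array([[1.0, 0.0]]), np.array([[0.0, 2.0]])])
# merged ≈ [[1.0, 2.0]]; plain averaging of the two models would give [[0.5, 1.0]].
```

Merging task vectors and adding them back to W_0 matches merging full weights whenever Σ_t C_t is invertible; with the ridge, input directions that no Δ_t touches stay at the base weights instead of shrinking toward zero, which matters because each Δ_tᵀΔ_t can be low rank (e.g. with LoRA).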

Marawan Gamal@mrremila·Apr 16

@dtredsox13 @PTikeng 🧵(2/6) RegMean poses model merging as layer-wise interference minimization, leading to a closed-form merge rule for merging T linear layers.
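For reference, the unregularized RegMean solution, derived here from the least-squares objective rather than copied from the thread, for T linear layers with weights W_t in (d_in, d_out) layout and layer inputs X_t:

```latex
W^\star = \arg\min_{W} \sum_{t=1}^{T} \lVert X_t W - X_t W_t \rVert_F^2
        = \Big( \sum_{t=1}^{T} C_t \Big)^{-1} \sum_{t=1}^{T} C_t W_t,
\qquad C_t = X_t^\top X_t
```

Setting the gradient Σ_t 2 X_tᵀ(X_t W − X_t W_t) to zero gives (Σ_t C_t) W = Σ_t C_t W_t. When every C_t is the same, W* reduces to the plain average of the W_t.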

Marawan Gamal@mrremila·Nov 8

@joaocarreira Hey Joao! I emailed you but not sure if it went to the right place.

joao carreira@joaocarreira·Nov 6

If interested, drop me an email saying what you think is missing or wrong in current models -- this should be enough to see if we'll be a good match!

joao carreira@joaocarreira·Nov 6

I'm looking for a student researcher to work with me at Google DeepMind in London, preferably starting early next year -- topics will be around novel video model architectures / learning from a single video stream / representation learning.

Marawan Gamal@mrremila·Nov 5

@bose_joey Had all of us Mila students at "great coffee"

Joey Bose@bose_joey·Nov 5

Come do a PhD with me 😀! Promise of fun science and great coffee ☕

Gilad@giladturok

I like the way @joeybos lays out his vision for PhD supervision! Seems intense and rewarding.

Marawan Gamal retweeted

Adithya S K@adithya_s_k·Nov 3

Quick question to folks who do model fine-tuning:

One of the trickiest parts of fine-tuning LLMs, in my experience, is retaining the original capabilities of the post-trained model. Say you want to fine-tune a VLM for OCR, layout detection, or another specific task using LoRA. Even if you train for just 1 epoch with a rank anywhere between 16 and 64, the model's general capabilities tend to take a noticeable hit. That's fine if you want a specialized model, but if you want a model that is task-specific while keeping its original capabilities, it can get tricky.

This challenge becomes even more pronounced in multilingual fine-tuning. Adapting a model for cross-lingual tasks without degrading its performance in English is quite difficult. One possible approach, in my view, is to include a subset of the original SFT data in your training mix, but most companies rarely open-source this data, with only a few exceptions like OLMo by @allen_ai.

What I usually do is deploy the model on @vllm_project, then batch-sample responses across a variety of robust QA/VQA datasets at different temperatures. This lets me roughly reconstruct the distribution of the training dataset and use it as part (10–30%) of the fine-tuning data to help preserve general performance, though that subset needs to be robust enough.

Are there any other approaches that have worked well for you?
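The final mixing step can be sketched as below: given task examples and a pool of self-generated replay responses (produced, per the tweet, by batch-sampling a vLLM-served copy of the model; that generation step is not shown), choose enough replay examples that they make up the target fraction of the final mix. The function and parameter names are hypothetical, and the 0.2 default is just one point in the tweet's 10–30% range.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_replay(task_examples: Sequence[T], replay_pool: Sequence[T],
               replay_fraction: float = 0.2, seed: int = 0) -> List[T]:
    """Shuffle task data together with replay data making up ~replay_fraction of the mix."""
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    # Solve n_replay / (n_task + n_replay) = f for n_replay.
    n_replay = round(replay_fraction * len(task_examples) / (1.0 - replay_fraction))
    n_replay = min(n_replay, len(replay_pool))   # cannot replay more than was generated
    rng = random.Random(seed)
    mix = list(task_examples) + rng.sample(list(replay_pool), n_replay)
    rng.shuffle(mix)
    return mix


task = [f"task-{i}" for i in range(80)]
replay = [f"replay-{i}" for i in range(1000)]   # stand-in for self-generated responses
mix = mix_replay(task, replay, replay_fraction=0.2)
```

Sizing the replay set from the task set (rather than taking a fixed count) keeps the ratio stable as the task data grows.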

Marawan Gamal retweeted

Mehrdad Farajtabar@MFarajtabar·Jul 21

🧵 1/12 Your LLM Knows the Future: Revealing its Multi-token Prediction Capabilities Autoregressive (AR) models power today's LLMs by predicting one token at a time. But what if they could see into the future? In our latest work, we show how to turn AR-trained models into multi-token predictors—for 1.5×–5× faster, cheaper inference, without quality loss. ⬇️ arxiv.org/pdf/2507.11851 Led by Mohammad Samragh, Arnav Kundu, David Harrison! With Kumari Nishu, Devang Naik, and Minsik Cho.
