Marawan Gamal

31 posts

@mrremila

PhD @ Mila. UofT and UWaterloo alum. Things I made: https://t.co/fFBYP0mofu https://t.co/57TzY6sMNn

Montreal · Joined June 2023
111 Following · 17 Followers
Pinned Tweet
Marawan Gamal (@mrremila)
New preprint! Introducing ACTMat: Model Merging via Data-Free Covariance Estimation Work done w/ @dtredsox13, @PTikeng, Colin Raffel and Guillaume Rabusseau TL;DR: It's RegMean, but without needing data for covariance estimation (C ≈ Δᵀ Δ) 📄 arxiv.org/pdf/2604.01329 🧵(1/6)
Valeriy M., PhD, MBA, CQF (@predict_addict)
ArXiv received over $25 million over the last decade. What are they spending money on? The army of moderators (circa ~300) is unpaid. The authors are unpaid. This must be quite a lucrative business on par with publishing cartels like Elsevier.
Dan Goldstein (@dggoldst)
arxiv imposing a 1 year ban on authors who submit hallucinated references is good. Authors should be able to write with AI but need to be fully accountable for catching that level of error. Doesn't matter if humans also sometimes cite non-existent things. 1 year ban them too.
Marawan Gamal (@mrremila)
@Arian_Khorasani @dggoldst I feel referencing errors should also be treated differently from hallucinated references. For instance, I often pass the title of the paper, the authors, and the venue and ask GPT to make the BibTeX.
Arian Khorasani 🦅 (@Arian_Khorasani)
I think arXiv's new policy is a perfect starting one, but we also have to consider how messy citation data actually is in the real world. A ton of older or more obscure papers don't have clean BibTeX files or standardized metadata, which forces authors to type them out manually. That leads to messy formatting and typos that might look fake to an automated scanner, even though the paper actually exists. It's a huge headache, and I just hope any enforcement tool can actually tell the difference between a sloppy manual typo and a completely hallucinated AI source.
Benno Krojer (@benno_krojer)
Is other academic folks' memory just better than mine? Or is this just a regular awkward situation in reviewing: When reviewing for a conference, i have to admit i forget most things about a paper after 1 month, so when the rebuttal arrives i rarely remember why exactly i wrote all the weaknesses and whether the author's rebuttal addresses them well or not Ofc i could reread the paper but that also feels excessive
Marawan Gamal (@mrremila)
If you want to semantically search through ICLR papers this week, check out papers.app
Marawan Gamal (@mrremila)
@leonardoalt Should we expect significant performance drops given how out of domain it must be to code with caveman reasoning traces?
Leo Alt (@leonardoalt)
told Claude to use caveman dialect + first order logic instead of plain (verbose) English, great result, save token
Marawan Gamal retweeted
Divyat Mahajan (@divyat09)
[1/8] In-context learning and prior-fitted networks are gaining traction for causal effect estimation in tabular datasets. But can they also generate novel observational & interventional data from causal models? 📌Introducing Cond-FiP: In-context learning of causal mechanisms via conditional fixed-point iterations. 📜 arxiv.org/abs/2410.06128 🔹Accepted at TMLR and to be presented at ICLR 26 in the J2C track!
Marawan Gamal retweeted
Michael Rizvi-Martel (@frisbeemortel)
Latent CoT is an alternative LLM reasoning scheme hypothesized to enable “superposition” allowing models to hold uncertainty over multiple concepts during reasoning 💭 We revisit superposition in 3 latent CoT approaches and find that it is largely an illusion 🔮! More in 🧵
Marawan Gamal (@mrremila)
@dtredsox13 @PTikeng 🧵(6/6) We also evaluate merging RL Zero fine-tuned models (Olmo-3-7b), and find ACTMat outperforms other methods on average.
Marawan Gamal (@mrremila)
@dtredsox13 @PTikeng 🧵(5/6) On the common ViT/T5 benchmarks ACTMat outperforms other data-free baselines (with and without LoRA)
Marawan Gamal (@mrremila)
@dtredsox13 @PTikeng 🧵(4/6) Why does C ≈ Δᵀ Δ? The angular distance between these two quantities can be upper bounded by three error terms (Theorem 3.1), each of which we find to be empirically close to zero.
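One way to make "angular distance" between two matrices concrete is the arccosine of their Frobenius cosine similarity. Whether Theorem 3.1 uses exactly this definition is not stated in the thread, so the sketch below is an assumption, and `angular_distance` is an illustrative name.

```python
import numpy as np

def angular_distance(A, B):
    """Angle in radians between A and B, viewed as flattened vectors.

    Assumed definition: arccos of the Frobenius cosine similarity.
    """
    cos = np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Under this definition the angle ignores overall scale, so a covariance estimate that is off by a constant factor still has distance zero.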
Marawan Gamal (@mrremila)
@dtredsox13 @PTikeng 🧵(2/6) RegMean poses model merging as layer-wise interference minimization, leading to the following merge rule for merging T linear layers
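The image with the merge rule did not survive extraction. As a rough stand-in, here is a minimal NumPy sketch of a RegMean-style merge. It assumes weights stored as (d_out, d_in), per-task input Gram matrices C_i of shape (d_in, d_in), and the closed form W* = (Σ W_i C_i)(Σ C_i)⁻¹ that minimizes Σ ‖W X_i − W_i X_i‖². The data-free variant replaces each C_i with Δ_iᵀ Δ_i. Taking Δ_i = W_i − W_pre is our reading of the TL;DR, not something the thread confirms. The function names and the small ridge term are illustrative.

```python
import numpy as np

def regmean_merge(weights, grams, ridge=1e-6):
    """Merge T linear layers W_i (d_out, d_in) using Gram matrices C_i (d_in, d_in)."""
    d_in = weights[0].shape[1]
    numerator = sum(W @ C for W, C in zip(weights, grams))   # sum_i W_i C_i
    denominator = sum(grams) + ridge * np.eye(d_in)          # sum_i C_i, regularized
    # Solve W* @ denominator = numerator; denominator is symmetric,
    # so W* = solve(denominator, numerator.T).T
    return np.linalg.solve(denominator, numerator.T).T

def data_free_grams(weights, pretrained):
    """Assumed data-free proxy: C_i ~ Delta_i^T Delta_i with Delta_i = W_i - W_pre."""
    return [(W - pretrained).T @ (W - pretrained) for W in weights]
```

With identity Gram matrices the rule reduces to plain weight averaging, which is a quick sanity check on any implementation.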
joao carreira (@joaocarreira)
If interested, drop me an email saying what you think is missing or wrong in current models -- this should be enough to see if we'll be a good match!
joao carreira (@joaocarreira)
I'm looking for a student researcher to work with me at Google DeepMind in London, preferably starting early next year -- topics will be around novel video model architectures / learning from a single video stream / representation learning.
Marawan Gamal retweeted
Adithya S K (@adithya_s_k)
Quick question for folks who do model fine-tuning.
One of the trickiest parts of fine-tuning LLMs, in my experience, is retaining the original capabilities of the post-trained model.
For example, say you want to fine-tune a VLM for OCR, layout detection, or any other specific task using LoRA. Even if you train for just 1 epoch with a rank anywhere between 16 and 64, the model's general capabilities tend to take a noticeable hit on general tasks. That's fine if you want a specialised model, but if you want a model that is task-specific and also keeps its original capabilities, it can get tricky.
This challenge becomes even more pronounced in multilingual fine-tuning. Adapting a model for cross-lingual tasks without degrading its performance in English is quite difficult.
One possible approach, in my view, is to include a subset of the original SFT data in your training mix, but most companies rarely open-source this data, with only a few exceptions like OLMo by @allen_ai.
What I usually do is deploy the model on @vllm_project, then batch-sample responses across a variety of robust QA/VQA datasets at different temperatures. This lets me roughly reconstruct the distribution of the training dataset and use that as part (10~30%) of the fine-tuning data to help preserve general performance, but that subset should be robust enough.
Are there any other approaches that have worked well for you guys?
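The replay idea described above (mixing model-sampled general data into the task data at roughly 10 to 30 percent) can be sketched as follows. This is only an illustration: `mix_with_replay` and its parameters are hypothetical names, and it assumes both datasets are plain Python sequences of examples.

```python
import random

def mix_with_replay(task_examples, replay_examples, replay_fraction=0.2, seed=0):
    """Return a shuffled mix where about `replay_fraction` of items are replay data."""
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Choose n_replay so that n_replay / (n_task + n_replay) is close to replay_fraction.
    n_replay = round(len(task_examples) * replay_fraction / (1.0 - replay_fraction))
    n_replay = min(n_replay, len(replay_examples))  # cannot use more than is available
    mixed = list(task_examples) + rng.sample(list(replay_examples), n_replay)
    rng.shuffle(mixed)
    return mixed
```

The fraction is defined over the final mix rather than relative to the task set, so `replay_fraction=0.2` with 80 task examples draws 20 replay examples.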
Marawan Gamal retweeted
Mehrdad Farajtabar (@MFarajtabar)
🧵 1/12 Your LLM Knows the Future: Revealing its Multi-token Prediction Capabilities Autoregressive (AR) models power today's LLMs by predicting one token at a time. But what if they could see into the future? In our latest work, we show how to turn AR-trained models into multi-token predictors—for 1.5×–5× faster, cheaper inference, without quality loss. ⬇️ arxiv.org/pdf/2507.11851 Led by Mohammad Samragh, Arnav Kundu, David Harrison! With Kumari Nishu, Devang Naik, and Minsik Cho.