Shady

667 posts

@ShadyAlii0

Learning, and trying to make the Machine Learn | Research Assistant @MinnesotaNLP

Minneapolis, MN Joined May 2025
2.1K Following 306 Followers
Pinned Tweet
Shady
Shady@ShadyAlii0·
I'm also currently training on one Nvidia DGX Spark, which limits the batch size to only about 100 samples. That's not the best size for contrastive learning, since seeing more negatives at a time helps the learning objective, so I'll probably try distributed training on 2x DGX Sparks!
English
1
0
6
2.2K
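A minimal sketch of why batch size matters for the contrastive objective in the pinned tweet, assuming an InfoNCE-style loss with in-batch negatives (the tweet does not name the exact loss). The function name `info_nce_loss` and the temperature value are illustrative assumptions, not the author's training code.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] form a positive pair. Every positives[j]
    with j != i acts as a negative for anchors[i], so each anchor is
    compared against len(anchors) - 1 negatives.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i with every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_denom - logits[i]
    return total / len(a)
```

Under this assumption, a batch of 100 pairs gives each anchor 99 negatives, and an encoder that cannot tell candidates apart scores log(100) on the loss, so a larger batch makes the task harder and the training signal richer.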
Shady
Shady@ShadyAlii0·
It's been a while, and it's kinda scary one can't use the "I'm still an undergrad" shield from this month onwards :D
Shady tweet media
English
1
0
5
129
Shady
Shady@ShadyAlii0·
@idavidrein Their IRT-based analysis looks really interesting. I wonder what their exact setup and LLM sample size were for fitting the IRT models, as I think both can affect the latent ability estimates.
English
0
0
0
25
david rein
david rein@idavidrein·
They use item-response theory (IRT) across a bunch of models and benchmarks to aggregate scores, jointly estimating both task difficulty and model capabilities, and they also have some nice cost-aware comparisons. Their blog post: nist.gov/news-events/ne…
david rein tweet media
English
2
0
10
912
david rein
david rein@idavidrein·
The Center for AI Standards and Innovation (CAISI) is estimating that the rate of progress of Chinese frontier AI is slower than the US's. 16 months ago (January 2025) the gap was ~4 months, now it's ~8 months.
david rein tweet media
English
3
3
55
3.3K
Demi Wang
Demi Wang@demisama_·
Life update: After a long, stressful, and busy internship hunt, I'll be joining @MSFTResearch this summer as a Research Intern, working on LLM agents! Would love to connect with ppl in Seattle. I'm into bouldering, poker, food exploring, bar hopping, and occasionally raving :)
English
16
1
346
29.6K
Shady
Shady@ShadyAlii0·
@IanArawjo Thank you! This looks even more interesting now
English
0
0
0
9
Ian Arawjo
Ian Arawjo@IanArawjo·
@ShadyAlii0 The coverage rate of a CI method is the proportion of times that the interval contains the true population parameter upon repeated testing. It's essentially its performance. A 95% CI *should* cover the true mean 95% of the time; these simulations test how true that is.
English
1
0
1
51
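A minimal sketch of the coverage-rate idea described in the reply above: simulate many evaluations with a known true mean, build an interval each time, and count how often the interval contains the truth. The normal-approximation interval, the Bernoulli scores, and all parameter values are assumptions chosen for illustration; they are not the methods or data from the comparison plot.

```python
import math
import random
from statistics import NormalDist


def empirical_coverage(true_p=0.7, n=200, trials=2000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # One simulated benchmark run: n pass/fail scores.
        scores = [1.0 if rng.random() < true_p else 0.0 for _ in range(n)]
        m = sum(scores) / n
        var = sum((s - m) ** 2 for s in scores) / (n - 1)
        half_width = z * math.sqrt(var / n)
        if m - half_width <= true_p <= m + half_width:
            hits += 1
    return hits / trials
```

A result close to the nominal level suggests the interval behaves as advertised under these simulated conditions; a result well below it means the method is overconfident.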
Ian Arawjo
Ian Arawjo@IanArawjo·
Re-run the CI methods comparison using real LLM eval data across 15 benchmarks. Here's the plot. The twist—added an empirical likelihood-based method, a Bayesian method with Normal-Inverse-Gamma prior, and a log-transformed t-interval. The latter two crush it—very efficient, too:
Ian Arawjo tweet media
English
2
0
7
1.3K
Shady
Shady@ShadyAlii0·
That’s really interesting! I’ve tried using the reasoning geometry to predict step correctness alongside the final result, but it failed in the downstream apps I tried. It’s genuinely exciting to see the different layers’ representations here and the idea working for steering too! Congrats!!
English
0
0
1
130
Lihao Sun
Lihao Sun@1e0sun·
How do LLMs do CoT reasoning internally? In our new #ACL2026 paper, we show that reasoning unfolds as a structured trajectory in representation space. Correct and incorrect paths diverge, and we use this to predict correctness before the answer and correct errors mid-flight. 1/
Lihao Sun tweet media
English
12
34
288
19.5K
Shady
Shady@ShadyAlii0·
I haven’t read it yet, but I think if we want to test the metacognitive abilities of these models in that case, it’d be more grounded to give them snippets of that rare language’s documentation in context, or allow them to retrieve it if in agent mode. It’s fair to expect a smart developer to be able to transfer the business logic between languages or “setups”, but they’d still need to read or understand how to express general logic in those languages as well. And maybe for agentic settings you could test them without a documentation reference but let them plug and play with the code in an environment, so they can develop a sense of how the syntax works from feedback, which could be a fair comparison too.
English
0
0
2
274
Shady
Shady@ShadyAlii0·
I’m still having a really hard time with this myself, but something I noticed helps is replicating 1-3 papers in the specific area I started working/reading on, and doing some extra analysis on those results (not about whether they’re similar to the original, but more about what other dimensions I can look at). From that extra work, you can get a tighter, written-down vision of “what’s missing” or where you could dig deeper. I’m too new to this though, so idk
English
0
0
0
264
silicognition
silicognition@silicognition·
people who are doing research, how do you go from reading papers & ideation to getting down to something concrete which can be actually done? i have ideas, read a lot of papers but from a fuzzy cloud of insights & inspirations, i would like to get to the finish line help pls!
English
70
128
1.9K
60.6K
Shady
Shady@ShadyAlii0·
I couldn't grasp the resource gap between the US and Egypt until I realized that, for the first paper I worked on in Egypt, my colleagues and I sat creating Gemini accounts for the free API credits and rotating through the API keys whenever one of them got locked, all because the university couldn't reimburse us or pay for the experiments' expenses. It's honestly sad, especially since this is the situation in a field that is "resourceful" to begin with; other fields that need lab equipment, like the life sciences, are surely far worse off.
Arabic
1
0
61
9.6K
Ahmed
Ahmed@ahmd3ssam·
This is the amount of data I've been grinding through for version 3. Imagine a student in Egypt wanting to do this; how many years would it take? A student at Cairo University or Ain Shams, not the American University? And imagine a hobbyist who just wants to experiment? How is the "million something-or-other" initiative supposed to work when the internet can barely send messages, if it works at all?
Ahmed tweet media
Arabic
15
25
494
101.4K
Shuaichen Chang
Shuaichen Chang@ShuaichenChang·
My team at the AWS AI Lab (based in NYC) is hiring several research interns this summer. I’ll be working closely with a few interns on projects focused on LLM memory and continual learning, aimed at publication. Feel free to DM me if you’re interested, have relevant experience, or would like to refer a student.
English
28
32
779
63K
Shady
Shady@ShadyAlii0·
Going back to playing squash has been the best thing I did in 2026 so far. I missed the game and I kinda look forward to it every week
English
0
0
2
828
Shady
Shady@ShadyAlii0·
Who even still uses matrix cross product in big 2026
English
0
0
2
330
Esha
Esha@esha_hq·
so i just caused my entire building to evacuate while meal prepping. easily most embarrassing moment of college but owned it by giving an apology speech to everyone in the apartment lobby. comms right?
Esha tweet media
English
13
0
59
4.3K
Shady
Shady@ShadyAlii0·
Wouldn’t AI safety in this context include multiple possible interpretations? Like software-side safety, meaning making “safe” software with AI, and/or the general safety of generative models that can take actions and “produce” things? I feel both are important in this context, but the software part is much less explored and discussed than the general safety definition, even though the biggest use case for these agents and architectures so far seems to be coding
English
1
0
1
107
Maxime Chevalier
Maxime Chevalier@Love2Code·
Lots of people blindly believe AI safety is never going to be an issue. Meanwhile top AI labs are using LLMs to write 80% of their code, and said code relies on 5000 poorly-written Python packages that were never audited. As if humans were any good at writing bug-free code.
English
15
4
90
7.1K
Shady
Shady@ShadyAlii0·
@maharshii Created 0.5 terabyte of embeddings and struggled for a full day just to store and re-read them again for training without waiting half an hour to load a batch into memory
English
0
0
0
155
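A minimal sketch of one way to avoid loading a whole embedding dump into memory: store fixed-width float32 rows in a flat binary file and seek straight to the rows a batch needs. The file layout, the `DIM` value, and the helper names are hypothetical; the tweet does not say which storage format was eventually used.

```python
from array import array

DIM = 8  # hypothetical embedding width, kept tiny for illustration


def write_embeddings(path, rows):
    """Append-free writer: dump each row as raw float32 values."""
    with open(path, "wb") as f:
        for row in rows:
            array("f", row).tofile(f)


def read_batch(path, start, count, dim=DIM):
    """Read `count` rows beginning at row `start` without touching the rest."""
    itemsize = array("f").itemsize
    buf = array("f")
    with open(path, "rb") as f:
        # Row i starts at byte offset i * dim * itemsize.
        f.seek(start * dim * itemsize)
        buf.fromfile(f, count * dim)
    return [list(buf[i * dim:(i + 1) * dim]) for i in range(count)]
```

Because every row has the same byte length, the cost of reading a batch depends on the batch size rather than on the size of the file. Memory-mapped arrays in numerical libraries rely on the same fixed-offset idea.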
maharshi
maharshi@maharshii·
the deeper i go the more i feel that 80% of the time is spent on data movement and transformation, while only 20% involves actually computing stuff
English
16
3
220
9.9K
Shady
Shady@ShadyAlii0·
I feel that there’s no shame in that as long as it’s communicated that way. And it’s kinda amazing that we are at that level of brute-forcing things with this kind of intelligence! It feels like the challenge with deploying LLMs for uses other than chatbots, like agentic form or scientific discovery, is now more about developing software that can use these models “as they are”, without demanding what “should be” instead of what is. And maybe intrinsic intelligence will be a more long-term achievement (5-10 yrs), where we don’t need to rely heavily on external software to steer these models to be productive.
English
0
0
3
272
Shady
Shady@ShadyAlii0·
@SherifKozman @MostafaNageeb I think the problem is in the same place too: you have no companies looking at OSS, no decent budget as an individual or a group to pay for API access to the good models, and no GPUs to run local models either. And I don't think there are any grants or funding that help with things like this in any form, whether governmental or private.
Arabic
0
0
1
57
KoZman
KoZman@SherifKozman·
@MostafaNageeb Happy holidays, and glad you're back safely. You don't need GPUs; I'm talking about projects like OpenClawd and Ralph and many others.
Arabic
1
0
4
1.5K
KoZman
KoZman@SherifKozman·
Serious question: why are there no nice AI projects coming out of Egypt, the way there's a flood of open source coming out of the US and other countries? What's stopping people from building, instead of complaining that AI will leave them sitting at home?
Arabic
6
1
33
13.6K