Haz Sameen Shahgir
@sameen2080
63 posts
PhD Student @UCRiverside, intern @Amazon, undergrad @BUET. FromSoft enjoyer.
Joined July 2023
137 Following · 29 Followers
will brown @willccbb
i am no longer “that one morgan stanley guy who posts fun open-source grpo experiments”. there are more of us
[image attached]
11 replies · 9 reposts · 403 likes · 33.6K views
Omar Sanseviero @osanseviero
Which are your top 5 ML dramas? 🍿
1. Llama and Zetta llama drama
2. What did Ilya see?
3. StabilityAI take-down of Runway Stable Diffusion
4. Hugging Face removal of GPT-4chan
5. Schmidhubering
49 replies · 10 reposts · 252 likes · 30.9K views
dr. jack morris @jxmnop
another incredible thing about deepseek: all the american AI labs compete to hire the top PhD researchers - but deepseek didn’t compete. deepseek researchers aren’t top PhDs; most are not even PhDs
258 replies · 286 reposts · 6K likes · 807.9K views
Haz Sameen Shahgir @sameen2080
@teortaxesTex @ericjang11 "After repeatedly changing his degree between different subjects like natural sciences, history of art, and philosophy, he eventually graduated with a BA degree in experimental psychology in 1970" - Wikipedia Checks out.
0 replies · 0 reposts · 2 likes · 75 views
Eric Jang @ericjang11
The opening sentence goes so hard. This paper was 10 years ahead of its time.
[2 images attached]
36 replies · 365 reposts · 5.1K likes · 314.7K views
Daniel Han @danielhanchen
@andrew_n_carr Coincidentally, I literally did an entire final-year uni project on this :) Also: why not the inverse, and why QR? Or why divide-and-conquer SVD is faster. Or, if the matrix has < 2000 cols, use Cholesky via POTRF and SSYRK, or do column pivoting, etc. And LSQR, LSMR, sparse methods, etc. Fun!!
8 replies · 8 reposts · 184 likes · 29.3K views
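A minimal Python sketch of two of the solvers Daniel names: QR directly on X, and Cholesky on the normal equations (what LAPACK's POTRF does underneath). The data and sizes here are made-up illustrations; his "< 2000 cols" threshold is from the tweet, not verified here.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))  # tall, well-conditioned design matrix
y = rng.normal(size=2000)

# QR: numerically stable, works on X directly (no squaring of the condition number).
Q, R = np.linalg.qr(X)                        # X = QR, R upper triangular
beta_qr = linalg.solve_triangular(R, Q.T @ y)

# Cholesky on the normal equations X^T X beta = X^T y (POTRF under the hood):
# cheaper for tall skinny X, but squares the condition number.
c, low = linalg.cho_factor(X.T @ X)
beta_chol = linalg.cho_solve((c, low), X.T @ y)

print(np.allclose(beta_qr, beta_chol))  # True on well-conditioned data
```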
Andrew Carr 🤸 @andrew_n_carr
Another great interview question! For linear regression, we can directly compute the minimizer as β = (X^T X)^{-1} X^T y. So why do we often use gradient descent instead?
95 replies · 64 reposts · 1.5K likes · 343.7K views
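The standard answer, sketched below on toy data: the closed form needs all of X at once and a d×d solve on X^T X (which squares the condition number), while gradient descent only needs matrix-vector products, so it scales to huge, sparse, or streaming problems. Sizes and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
beta_true = rng.normal(size=10)
y = X @ beta_true + 0.1 * rng.normal(size=1000)

# Closed form: solve (X^T X) beta = X^T y directly.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean-squared-error loss: only matvecs needed.
beta_gd = np.zeros(10)
lr = 0.1
for _ in range(500):
    beta_gd -= lr * X.T @ (X @ beta_gd - y) / len(y)

print(np.allclose(beta_gd, beta_closed))  # True: both reach the same minimizer
```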
Michael Saxon @m2saxon
@sameen2080 I see, I hadn't been following closely enough to be aware that qwq (indisputably a model) is considered a reasoning model. Regarding o1 though, I think that hiding the chain-of-thought tokens is a significant enough intervention on the raw outputs of the model to make it a system
1 reply · 0 reposts · 0 likes · 82 views
Michael Saxon @m2saxon
Can someone explain why o1 and its ilk are described as "reasoning models" and not as "reasoning systems"? Isn't it an LM inside a bigger structure?
4 replies · 0 reposts · 11 likes · 1.4K views
Justine Moore @venturetwins
ChatGPT refuses to say the name “David Mayer,” and no one knows why. If you try to get it to write the name, the chat immediately ends. People have attempted all sorts of things - ciphers, riddles, tricks - and nothing works.
[2 images attached]
3.1K replies · 3.7K reposts · 52.9K likes · 10.5M views
Haz Sameen Shahgir @sameen2080
@pranjalssh Excellent work. Couple of questions tho: "...hence tensor core instructions require storing C over 128 threads in a SM" - Shouldn't it be 1024/256 = 4? "When we distribute C over a warp-group, each thread needs 1024/128 = 8 threads" - what does each thread needing 8 threads mean?
1 reply · 0 reposts · 1 like · 694 views
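For anyone puzzling over the same passage: assuming the blog means a C tile of 1024 accumulator values held by one 128-thread warp-group (the tile size is an assumption, not confirmed by the post), the intended arithmetic is presumably

$$
\frac{1024 \text{ C elements}}{128 \text{ threads per warp-group}} = 8 \text{ accumulator values per thread},
$$

i.e. "8 elements per thread" rather than "8 threads".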
Pranjal @pranjalssh
I implemented an H100 CUDA matmul kernel from scratch, taking inspiration from @Si_Boehm's blog. Our final kernel outperforms cuBLAS by 7% for N=4096. It fits in a single C++ file without any dependencies. Full-blown blog post with all details: cudaforfun.substack.com/p/outperformin…
32 replies · 30 reposts · 288 likes · 49.9K views
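The real kernel lives in the linked post; as a language-agnostic illustration of the core idea behind such kernels (tiling the output and accumulating partial products per tile), here is a toy blocked matmul in Python. Purely a sketch: tile sizes, shared memory, and tensor-core details are what the actual H100 work is about.

```python
import numpy as np

def tiled_matmul(A, B, tile=64):
    """Toy blocked matmul: compute C in tile x tile output blocks,
    accumulating partial products over K. Slow in Python; the point
    is the access pattern, not the speed."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros_like(C[i:i+tile, j:j+tile])  # per-tile accumulator
            for k in range(0, K, tile):
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
print(np.allclose(tiled_matmul(A, B), A @ B, atol=1e-2))  # True
```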
Simon Willison @simonw
After hassling Anthropic for months for a token-counting library similar to OpenAI's tiktoken, I just realized the Anthropic and Gemini approach of providing a free token-counting API is actually better... because I don't know how to use tiktoken to count tools, images, etc.
13 replies · 11 reposts · 251 likes · 30.8K views
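For concreteness, a sketch of the two approaches. The tiktoken half is plain-text only (exactly Simon's complaint: tools and images are the hard part); the Anthropic half assumes a recent SDK version that exposes messages.count_tokens, and the model name is just an example.

```python
import tiktoken
import anthropic

text = "Count the tokens in this message, please."

# Local counting with tiktoken: works for plain text, but you're on your
# own for tools, images, and message framing.
enc = tiktoken.get_encoding("o200k_base")
print("tiktoken:", len(enc.encode(text)))

# Server-side counting via Anthropic's token-counting API; the server
# applies the real chat template, tools, images, etc. for you.
# Requires ANTHROPIC_API_KEY in the environment.
client = anthropic.Anthropic()
resp = client.messages.count_tokens(
    model="claude-3-5-sonnet-latest",  # example model name
    messages=[{"role": "user", "content": text}],
)
print("anthropic:", resp.input_tokens)
```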
Michael Saxon @m2saxon
"IllusionVQA": Haz Sameen Shahgir, Khondker Salman Sayeed et al Testing VLM reasoning over optical illusion questions. For some *perceptual* illusions (same size, color) VLMs are superhuman, but for *logical* ones like "impossible shapes" they're worse. openreview.net/forum?id=7ysaJ…
[image attached]
1 reply · 0 reposts · 13 likes · 837 views
Delip Rao e/σ @deliprao
is there an llm finetuning service that will accept my data, train an open model (say llama3.2), and allow me to download the trained model?
80 replies · 28 reposts · 797 likes · 246.9K views
Teknium (e/λ) @Teknium
People probably don’t got enough questions to make it think as long as they expected (aka we’re all too dumb for it already)
27 replies · 4 reposts · 228 likes · 10.7K views
Haz Sameen Shahgir @sameen2080
@hu_yifei Yeah, current LLMs have really poor support for Bengali. NLLB was careful about this and upsampled Bengali. NLLB's chars/token for Bengali is 3.35 (higher is better); LLaMA-3, Qwen2, Mistral, and Aya are all at about ~0.8. For reference, English chars/token is around ~4.5.
0 replies · 0 reposts · 2 likes · 65 views
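A rough way to reproduce numbers like these, sketched with Hugging Face tokenizers. The checkpoints below are examples (and Llama 3 is gated), not necessarily the exact ones behind the figures in the tweet; fertility also depends heavily on the sample text.

```python
from transformers import AutoTokenizer

samples = {
    "bn": "আমি বাংলায় গান গাই, আমি বাংলার গান গাই।",
    "en": "The quick brown fox jumps over the lazy dog.",
}

for name in ["Qwen/Qwen2-7B", "meta-llama/Meta-Llama-3-8B"]:
    tok = AutoTokenizer.from_pretrained(name)
    for lang, text in samples.items():
        n_tokens = len(tok.encode(text, add_special_tokens=False))
        # chars/token: higher means the tokenizer compresses the language better
        print(f"{name} [{lang}]: {len(text) / n_tokens:.2f} chars/token")
```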
Yifei Hu @hu_yifei
Since I am working on multilingual stuff, I translated a piece of text from an academic paper into different languages. It seems like tokenizers are not friendly to certain languages, even though they are among the most spoken languages in the world. Can people who speak Hindi or Bengali confirm this?
[3 images attached]
9 replies · 2 reposts · 29 likes · 6.6K views
Haz Sameen Shahgir @sameen2080
[8/N] 🔍 Finally, we perform **extensive** ablation studies confirming that training a single model on both BPE and nucleotide tokenizations of each sequence matches training two separate models, with no performance loss.
1 reply · 0 reposts · 0 likes · 55 views
Haz Sameen Shahgir @sameen2080
[5/N] 📏 On long RNA sequences, BiRNA-BERT uses BPE to generate compressed sequence embeddings and can process RNA sequences 5 times longer than current SOTA RNA models with the same memory footprint.
1 reply · 0 reposts · 0 likes · 69 views
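To make the BPE-compression point concrete, a toy sketch with the Hugging Face tokenizers library: train a small BPE vocabulary on nucleotide strings and compare token counts against one-token-per-base. The corpus and vocab size are made up for illustration; BiRNA-BERT's actual tokenizer and training data are described in the paper.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

# Toy RNA corpus; real training data would be large and non-repetitive.
corpus = ["AUGGCUACGGAUCCGAUUAGC" * 5, "GCGCAUUAGCGGAUCGAUCGA" * 5]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(vocab_size=64, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

seq = corpus[0]
n_bpe = len(tokenizer.encode(seq).tokens)
n_nt = len(seq)  # nucleotide-level tokenization: one token per base
print(f"nucleotide tokens: {n_nt}, BPE tokens: {n_bpe}, "
      f"compression: {n_nt / n_bpe:.1f}x")
```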