Jinghan Zhang
@jinghan23
Second-year CSE PhD student @hkust, advised by @junxian_he. Machine learning, NLP. Bluesky here: https://t.co/ECxlKtKTxz

Want to get an LLM agent to succeed in an OOD environment? We tackle the hardest case with SPA (Self-Play Agent). No extra data, tools, or stronger models. Pure self-play. We first internalize a world model via self-play, then learn how to win via RL. Like a child playing with the environment simply to learn "what if I do this?" Below, we show our findings: What is wrong with OOD environments? What are the key factors that allow self-play to succeed? (1/8)
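A minimal sketch of that two-stage recipe, on a toy environment. Everything here (ToyEnv, the tabular world model, the Q-learning update) is an illustrative assumption, not the paper's actual implementation:

```python
import random
from collections import defaultdict

class ToyEnv:
    """Toy 1-D grid: positions 0..9, reward only at position 9."""
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # action is -1 or +1
        self.pos = max(0, min(9, self.pos + action))
        return self.pos, float(self.pos == 9)

# Stage 1: self-play exploration -- "what if I do this?"
env, world = ToyEnv(), {}
for _ in range(500):
    s = env.reset()
    for _ in range(20):
        a = random.choice([-1, 1])
        s2, _ = env.step(a)
        world[(s, a)] = s2  # internalized transition model
        s = s2

# Stage 2: RL inside the learned world model -- learn how to win
Q = defaultdict(float)
for _ in range(2000):
    s = 0
    for _ in range(20):
        if random.random() < 0.2:                        # explore
            a = random.choice([-1, 1])
        else:                                            # exploit
            a = max([-1, 1], key=lambda act: Q[(s, act)])
        s2 = world.get((s, a), s)
        reward = float(s2 == 9)
        best_next = max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += 0.5 * (reward + 0.9 * best_next - Q[(s, a)])
        s = s2

print(max([-1, 1], key=lambda act: Q[(0, act)]))  # expect +1: move right
```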

🚀🔥 Thrilled to announce our ICML25 paper: "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas"! We dive into the core reasons behind spatial reasoning difficulties for Vision-Language Models from an attention mechanism view. 🌍🔍 Paper: arxiv.org/pdf/2503.01773 Code: github.com/shiqichen17/Ad… Website: shiqichen17.github.io/AdaptVis/
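For intuition on the attention-focus angle, a tiny hedged illustration (the function, region logits, and temperatures are made up; the paper's actual intervention is more involved): temperature-scaling the attention logits over image regions controls how concentrated the model's visual focus is.

```python
import numpy as np

def image_attention(logits, temperature=1.0):
    """Softmax over image-region logits; T < 1 sharpens focus, T > 1 spreads it."""
    z = logits / temperature
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])  # query-key scores for 4 image regions
print(image_attention(logits, 1.0))      # default distribution of focus
print(image_attention(logits, 0.5))      # sharpened: concentrates on region 0
```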

Sharing another #ICML25 paper: "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging"! (1/5) We use model merging to enhance VLMs' reasoning by integrating math-focused LLMs, bringing textual reasoning into multi-modal models. Surprisingly, this avoids catastrophic forgetting and yields strong performance gains as a free lunch! 🍱 We further leverage merging as an interpretability tool and uncover a key insight: perception and chain-of-thought reasoning are naturally decomposed in the parameter space! 🌍🔍 Paper: arxiv.org/pdf/2505.05464 Code: github.com/shiqichen17/VL…
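A hedged sketch of the basic weight-space operation behind model merging, linear interpolation of shared parameters (the paper's exact recipe and checkpoints may differ; the demo state dicts are dummies):

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate parameters the two models share: alpha*A + (1-alpha)*B."""
    merged = {}
    for name, wa in sd_a.items():
        if name in sd_b and sd_b[name].shape == wa.shape:
            merged[name] = alpha * wa + (1 - alpha) * sd_b[name]
        else:
            merged[name] = wa  # fall back to model A for non-shared tensors
    return merged

# Tiny demo on dummy "state dicts" (real use: the VLM's LLM backbone + a math LLM)
sd_a = {"w": torch.ones(2, 2)}
sd_b = {"w": torch.zeros(2, 2)}
print(merge_state_dicts(sd_a, sd_b, alpha=0.5)["w"])  # all entries 0.5
```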

[1/4] RSA is accepted to the #EMNLP2024 main track 🥳 - Enhance any protein understanding model with lightning-fast retrieval. - 373x faster than MSA, computed on the fly, with comparable performance. Preprint link: biorxiv.org/content/10.110… Code: github.com/HKUNLP/RSA
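A rough sketch of the retrieval idea: replace slow MSA search with dense nearest-neighbour lookup over precomputed protein embeddings. The encoder, index, and k are assumptions here, not RSA's actual code:

```python
import numpy as np

def retrieve(query_emb, db_embs, k=16):
    """Return indices of the k most cosine-similar database proteins."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    scores = db @ q
    return np.argsort(-scores)[:k]

# db_embs: (N, d) embeddings computed offline with a protein LM;
# retrieved sequences are then fed to the model as extra context.
rng = np.random.default_rng(0)
print(retrieve(rng.normal(size=64), rng.normal(size=(1000, 64))))
```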

🎉Adapters 1.0 is here!🚀 Our open-source library for modular and parameter-efficient fine-tuning got a major upgrade! v1.0 is packed with new features (ReFT, Adapter Merging, QLoRA, ...), new models & improvements! Blog: adapterhub.ml/blog/2024/08/a… Highlights in the thread! 🧵👇
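A minimal usage sketch, based on my reading of the Adapters docs (check the blog post for the exact v1.0 interface and the new features it adds):

```python
import adapters
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
adapters.init(model)                           # retrofit adapter support onto the HF model
model.add_adapter("my_task", config="seq_bn")  # add a bottleneck adapter
model.train_adapter("my_task")                 # freeze the backbone; train only the adapter
```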

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. In the age of large-scale language models, benchmarks like Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve.
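A quick hedged look at the data; the Hugging Face dataset ID, split, and field names below are my assumptions about the public release, so verify them against the dataset card:

```python
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")  # assumed dataset ID / split
ex = ds[0]
print(ex["question"])
print(ex["options"])  # MMLU-Pro expands to up to ten answer options per question
print(ex["answer"])
```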

🚨 Model Merging competition @NeurIPSConf! 🚀 Can you revolutionize model selection and merging? Let's create the best LLMs! 🧠✨ 💻 Come for science 💰 Stay for $8K 💬 Discord: discord.gg/dPBHEVnV 🔗 Sign up: llm-merging.github.io Sponsors: @huggingface @SakanaAILabs @arcee_ai

📢 Excited to finally release my NeurIPS 2024 submission! Is Chinchilla universal? No! We find that:
1. language model scaling laws depend on data complexity
2. gzip effectively predicts scaling properties from training data
As compressibility 📉, data preference 📈. 🧵⬇️
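The measurement behind point 2 is cheap enough to sketch. A rough proxy, not necessarily the paper's exact estimator: gzip-compressed size over raw size as a data-complexity score.

```python
import gzip, random

def gzip_ratio(texts):
    """Compressed size / raw size: lower = more redundant, higher = more complex."""
    raw = "\n".join(texts).encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)

random.seed(0)
repetitive = ["the cat sat on the mat"] * 100
noisy = ["".join(random.choices("abcdefghij0123456789", k=22)) for _ in range(100)]
print(gzip_ratio(repetitive))  # highly compressible -> low ratio
print(gzip_ratio(noisy))       # barely compressible -> higher ratio
```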

