Raphael Tang

128 posts

@ralph_tang

AI researcher working on foundation models for language, vision, speech, and everything in between @ucl_nlp

The Internet · Joined April 2012
92 Following · 337 Followers
Raphael Tang@ralph_tang·
Ironic that this derivative came from Schmidhuber's lab
Yuntian Deng@yuntiandeng

Glad to see followups to neural-os.com, but disappointed that neither the blog (with 34 refs) nor the code repo acknowledged NeuralOS, even though the released data and code appear to build directly on top of ours. That omission is hard to understand given our shared vision.

Raphael Tang retweeted
Simone Foti@simo_foti·
It's time to bring 3D meshes into modern machine learning properly! 🛸 Our work solves the non-differentiability of the Exp map on meshes, enabling gradients to flow directly through geodesics. It’s differentiable, GPU-fast, and fully parallelised. circle-group.github.io/research/DSG
Raphael Tang retweeted
return of the research era ꙮ@byebyescaling·
This seems like a pretty big deal to me. OpenAI's circuit-sparsity release potentially entails that MoEs are a dead end. We've been isolating weights into "experts" as a crude approximation of sparsity just to appease dense matrix kernels. It fragments the manifold. The real target is inherent sparsity: projecting into massive nominal dimensions (d ≫ d_model) with strict k-sparse activation. This forces features to be monosemantic and orthogonal by design, solving superposition natively rather than relying on router hacks to disentangle interference. It appears that we aren't just scaling parameters anymore, we're scaling the basis!
AK@_akhaliq

OpenAI just released circuit-sparsity huggingface.co/openai/circuit…

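The "strict k-sparse activation" the tweet describes can be illustrated as a top-k projection: map into a very wide space, keep only the k largest activations per example, and zero the rest. A minimal NumPy sketch of that idea, illustrative only and not OpenAI's actual circuit-sparsity implementation:

```python
import numpy as np

def k_sparse_activation(x, W, k):
    """Project x into a wide space via W, then keep only the k
    largest activations in each row, zeroing everything else."""
    h = x @ W                                   # (batch, d_wide), d_wide >> d_model
    topk = np.argpartition(h, -k, axis=-1)[:, -k:]  # indices of the k largest per row
    mask = np.zeros_like(h, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=-1)
    return np.where(mask, h, 0.0)               # at most k nonzeros per row

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))     # d_model = 16
W = rng.normal(size=(16, 256))   # d_wide = 256, the "massive nominal dimension"
out = k_sparse_activation(x, W, k=8)
```

With activations forced to at most k nonzero coordinates, features compete for a small budget of wide-basis directions, which is the sparsity pressure the tweet contrasts with MoE routing.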
Raphael Tang retweeted
Yao Lu@yaolu_nlp·
What if you could build a competitive multilingual LLM with just a translator, no matter the level of data resource available? 🤔TL;DR: We translate FineWeb to TransWeb, showing multilingual modelling can be addressed by the simple "translate everything" idea. w/@togethercompute
Raphael Tang retweeted
Wenyan Li@Wenyan62·
I will be presenting our Lost in Embeddings poster at EMNLP! Hope to see many old and new friends in Suzhou!🤗🤗 📍Time/Date: Fri. Nov 7, 12:30-13:30 Location: Hall C Also happy to chat about anything on VLMs, RAG, and fintech, a domain I've recently gotten into. #EMNLP2025
Wenyan Li@Wenyan62

Happy to share (with a bit of delay tho) our paper on quantifying visual information loss in VLMs --- "Lost in Embeddings: Information Loss in Vision-Language Models" is accepted to EMNLP 2025 findings: arxiv.org/pdf/2509.11986 💃code is also released: github.com/lyan62/vlm-inf…

Raphael Tang retweeted
Avi Chawla@_avichawla·
Finally, Python 3.14 lets you disable the GIL! It's a big deal because earlier, even if you wrote multi-threaded code, only one thread could execute Python bytecode at a time, so CPU-bound threads gave no performance benefit. But now, Python can run your multi-threaded code in parallel. And uv fully supports it!
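A quick way to see what changes is a CPU-bound threading microbenchmark: on a standard (GIL) build, four threads take roughly as long as four sequential runs, while on a free-threaded build (e.g. a `python3.14t` install) they genuinely overlap. A minimal sketch that runs on any recent Python:

```python
import threading
import time

def busy(n):
    # Pure-Python CPU-bound work; serialized by the GIL on standard builds
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(num_threads, n=2_000_000):
    """Wall-clock time to run `busy` in num_threads concurrent threads."""
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"1 thread: {timed(1):.2f}s   4 threads: {timed(4):.2f}s")
```

On a GIL build the two timings scale roughly linearly with thread count; on a free-threaded build the four-thread run should approach the single-thread time, assuming enough cores.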
Raphael Tang@ralph_tang·
📢Our new paper critically examines arena-style LLM evaluation, e.g., LMArena, questioning whether draws actually mean equal model ability. TL;DR: simply ignoring draws improves rating systems by 1-3%, and query difficulty/subjectivity relate more strongly to draws than model ratings do.
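The "simply ignoring draws" intervention can be illustrated with a plain Elo updater that skips drawn battles instead of scoring them as 0.5. This is a sketch of the idea only, not the paper's exact rating system:

```python
def elo_update(r_a, r_b, outcome, k=32):
    """One Elo update. outcome: 1.0 if A wins, 0.0 if B wins, None for a draw."""
    if outcome is None:
        return r_a, r_b  # draw: skip the update entirely
    e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))       # expected score for A
    return (r_a + k * (outcome - e_a),
            r_b + k * ((1 - outcome) - (1 - e_a)))

ra, rb = elo_update(1500.0, 1500.0, 1.0)   # A beats B → (1516.0, 1484.0)
ra, rb = elo_update(ra, rb, None)          # draw → ratings unchanged
```

The contrast is with conventional Elo, which would treat a draw as `outcome = 0.5` and pull both ratings toward each other even when the draw reflects query difficulty rather than equal ability.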
Gregory Wieber@dreamwieber·
Alright, friends – my NEW Apple Vision Pro app Metaballs: Spatial has launched! 🤯 This was SO much work, and would mean the world to me if you share, and go download it today!
DailyPapers@HuggingPapers·
Microsoft research reveals information loss in Vision-Language Models VLMs lose 40-60% semantic context after the projection step, distorting visual representations & impacting downstream tasks. See k-NNs diverge from fruit to mushrooms!
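The k-NN divergence mentioned here can be quantified by comparing each embedding's nearest neighbors before and after the projection step: low overlap means the projection distorted the local geometry (e.g. a fruit's neighbors becoming mushrooms). A toy sketch where a random linear map stands in for the VLM projector; this is not the paper's code:

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbors (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def knn_overlap(X_before, X_after, k=5):
    """Mean fraction of k-nearest neighbors preserved per point."""
    a, b = knn_indices(X_before, k), knn_indices(X_after, k)
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)]))

rng = np.random.default_rng(0)
vis = rng.normal(size=(100, 64))        # "visual" embeddings
proj = vis @ rng.normal(size=(64, 64))  # stand-in for the projection step
print(f"k-NN overlap after projection: {knn_overlap(vis, proj):.2f}")
```

An overlap of 1.0 would mean the projection preserved every point's neighborhood; values well below that are the kind of local-geometry distortion the tweet describes.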
Raphael Tang retweeted
Wenyan Li@Wenyan62·
Happy to share (with a bit of delay tho) our paper on quantifying visual information loss in VLMs --- "Lost in Embeddings: Information Loss in Vision-Language Models" is accepted to EMNLP 2025 findings: arxiv.org/pdf/2509.11986 💃code is also released: github.com/lyan62/vlm-inf…
Raphael Tang@ralph_tang·
📈WhisTLE + TTS adaptation ⇒ 50%+ relative reduction in word error rate vs. no adaptation, winning in 27/32 scenarios. Outperforms TTS adaptation alone by a relative 12.3% in WER.
Raphael Tang@ralph_tang·
We introduce WhisTLE (arxiv.org/abs/2509.10452): the first deeply supervised, text-only domain adaptation method for pretrained ASR models like Whisper. TL;DR: fast, no extra runtime cost, 50%+ relative word error rate reduction.
Raphael Tang retweeted
David McAllister@davidrmcall·
Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.
Raphael Tang retweeted
Shashwat Goel@ShashwatGoel7·
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations.

❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer and get high accuracies. This affects popular benchmarks like MMLU-Pro, SuperGPQA, etc., and even "multimodal" benchmarks like MMMU-Pro, which can be solved without even looking at the image ⁉️. Such choice-only shortcuts are hard to fix: prior attempts at fixing them, GoldenSwag (for HellaSwag) and TruthfulQA v2, ended up worsening the problem.

MCQs are inherently a discriminative task, only requiring picking the correct choice among a few given options. Instead, we should evaluate language models for the generative capabilities they are used for. We show discrimination is easier than even verification, let alone generation.

🤔 But how do we grade generative responses outside "verifiable domains" like code and math? So many paraphrases are valid answers... We show a scalable alternative, Answer Matching, works surprisingly well. It's simple: get generative responses to existing benchmark questions that are specific enough to have a semantically unique answer, without showing choices. Then use an LM to match the response against the ground-truth answer.

👨‍🔬We conduct a meta-evaluation by comparing to ground-truth verification on MATH, and human grading on MMLU-Pro and GPQA-Diamond questions. Answer Matching outcomes give near-perfect alignment, even with small (recent) models like Qwen3-4B. In contrast, LLM-as-a-judge, even with frontier reasoning models like o4-mini, fares much worse. This is because without the reference answer, the model is tasked with verification, which is harder than what answer matching requires: paraphrase detection, a skill modern language models have aced💡

Let's shift the benchmarking ecosystem from MCQs to Answer Matching.

Impacts:
Leaderboards: model rankings can change and accuracies go down, making benchmarks seem less saturated.
Benchmark creation: instead of creating harder MCQs, we should focus our efforts on creating questions suited for answer matching, much like SimpleQA, GAIA, etc.
🤑 Cost: to our great surprise, answer matching evals are cheaper to run than MCQs!

See our paper for more, it's packed with insights. 🧵 has paper and more result figures.
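The answer-matching step described above can be sketched as: show a grader LM the question, the reference answer, and the model's free-form response, and ask for a binary match verdict. A minimal illustrative harness, where the `match_prompt` wording and the string-normalization fallback are my own inventions, not the paper's:

```python
import re

def match_prompt(question, gold, response):
    """Prompt for a grader LM: does `response` convey the same answer as `gold`?"""
    return (
        f"Question: {question}\n"
        f"Reference answer: {gold}\n"
        f"Model response: {response}\n"
        "Does the model response express the same answer as the reference? "
        "Reply with exactly MATCH or NO_MATCH."
    )

def normalized_match(gold, response):
    """Cheap fallback: check the reference answer appears in the response
    after lowercasing and stripping punctuation."""
    norm = lambda s: re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return norm(gold) in norm(response)

# normalized_match("Paris", "The capital is Paris.") handles trivial paraphrase;
# the LM grader handles genuine rewordings the string check would miss.
```

The key asymmetry the tweet leans on: the grader only needs paraphrase detection against a known reference, not open-ended verification of whether an arbitrary answer is correct.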
Raphael Tang retweeted
Xueguang Ma@xueguang_ma·
Sharing our recent efforts on applying OmniEmbed to large-scale video retrieval MultiVENT2.0! tl;dr, we achieve SoTA on the MAGMAR shared task leaderboard. More importantly, we provide in-depth analysis on the effectiveness of different input modalities for video retrieval.
Raphael Tang retweeted
WIRED@WIRED·
On Yupp, chatbot users earn cash by saying which of two prompts they prefer—info that has great value to the AI companies running the models wired.com/story/yupp-cha…
Raphael Tang retweeted
Jimmy Lin@lintool·
💥 My awesome @UWaterloo ugrad student @sisi_xili - with the help of @rpradeep42 - slapped an MCP server in front of Pyserini to create MCPyserini and connected it to Claude to create DeepResearcherini! 🤪 Here, an example of RAG using the MS MARCO v1 passage collection.