MengChang Wang

33 posts

MengChang Wang

@wangmengchang

decision intelligence lab @ alibaba damo academy

Katılım Mart 2013

94 Takip Edilen6 Takipçiler

MengChang Wang retweetledi

艾略特@elliotchen100·19 Mar

论文来了。名字叫 MSA，Memory Sparse Attention。一句话说清楚它是什么：让大模型原生拥有超长记忆。不是外挂检索，不是暴力扩窗口，而是把「记忆」直接长进了注意力机制里，端到端训练。过去的方案为什么不行？ RAG 的本质是「开卷考试」。模型自己不记东西，全靠现场翻笔记。翻得准不准要看检索质量，翻得快不快要看数据量。一旦信息分散在几十份文档里、需要跨文档推理，就抓瞎了。线性注意力和 KV 缓存的本质是「压缩记忆」。记是记了，但越压越糊，长了就丢。 MSA 的思路完全不同： → 不压缩，不外挂，而是让模型学会「挑重点看」核心是一种可扩展的稀疏注意力架构，复杂度是线性的。记忆量翻 10 倍，计算成本不会指数爆炸。 → 模型知道「这段记忆来自哪、什么时候的」用了一种叫 document-wise RoPE 的位置编码，让模型天然理解文档边界和时间顺序。 → 碎片化的信息也能串起来推理 Memory Interleaving 机制，让模型能在散落各处的记忆片段之间做多跳推理。不是只找到一条相关记录，而是把线索串成链。结果呢？ · 从 16K 扩到 1 亿 token，精度衰减不到 9% · 4B 参数的 MSA 模型，在长上下文 benchmark 上打赢 235B 级别的顶级 RAG 系统 · 2 张 A800 就能跑 1 亿 token 推理。这不是实验室专属，这是创业公司买得起的成本。说白了，以前的大模型是一个极度聪明但只有金鱼记忆的天才。MSA 想做的事情是，让它真正「记住」。我们放 github 上了，算法的同学不容易，可以点颗星星支持一下。🌟👀🙏 github.com/EverMind-AI/MSA

艾略特@elliotchen100

稍微剧透一下，@EverMind 这周还会发一篇高质量论文

中文

173

563

3.2K

1.8M

MengChang Wang@wangmengchang·21 Mar

@joaquinbarroso citation begging, really bad

English

Joaquin Barroso@joaquinbarroso·21 Mar

I'm so mad. Once again, got a paper accepted, but Ref2 wants me to add 4 references all having a single author in common, whereas Ref1 suggests 6 with another common author! This unethical behavior should be stopped by the editors. Should I say who those authors are? Thoughts?

English

156

1.9K

198.7K

MengChang Wang@wangmengchang·13 Mar

@mariyaivasileva great! looking forward to the book

English

Mariya I. Vasileva@mariyaivasileva·13 Mar

Currently a little obsessed with making my own compact, textbook-style primers on foundational topics. The graph-minded pattern matcher in my brain has taken up a side quest: mapping what I know and what I want to learn into crisp tables of contents.

English

772

32.7K

MengChang Wang@wangmengchang·10 Mar

@grok github.com/Amuwa/CarWash

QME

Grok@grok·9 Mar

The Car Wash Problem nails the classic frame issue: LLMs often miss implicit goals like needing the car *at* the wash. Qwen3's 55%/75% bare-pass rates beat many scaled-up models' 0%—shows training data and architecture can embed physical intuition better than raw size alone. Structured prompts fix it across the board, hitting 85-100%. Solid data point on what truly drives reasoning.

English

MengChang Wang@wangmengchang·9 Mar

While the "Car Wash Problem" took down giant LLMs weeks ago, two old models, Qwen3-14B and Qwen3-32B, achieved accuracy rates of 55% and 75% respectively.

English

MengChang Wang@wangmengchang·10 Mar

@EdgarDobriban great. a horse race between human beings and AI

English

MengChang Wang retweetledi

Edgar Dobriban@EdgarDobriban·9 Mar

AI is getting great at math, but how good is it at solving real research problems in areas outside of those covered by Erdős problems? Towards gauging this, I have started putting together a list of unsolved research problems in mathematical statistics and machine learning, sourced from recent papers in a leading statistics journal, the Annals of Statistics (with some bonus COLT open problems: solveall.org. Currently >100 problems. In my view, much of the value of AI for researchers in the mathematical sciences stems from helping with their own research problems. These are problems without known solutions. There are many math benchmarks, but few with the following properties: (1) of a realistic research-level, so that solving them can potentially lead to a publication in a top journal (problems discussed in papers already, not contest math, not Millenium problems, not problems created for a benchmark, not problems that have a known solution); I'd say Erdős problems are the best example of this. (2) cover problems outside of the usual focus (combinatorics, number theory, ... ) of Erdős problems. Especially under-represented are domains of applied math, along with statistics, operations research, etc. I'm interested in statistics and ML, so that's where I started, but this could grow over time. Hope this can grow into something useful to the community! Happy to hear your thoughts...

English

432

54.9K

MengChang Wang@wangmengchang·9 Mar

还记得你何时加入 X 吗？我知道！#我的X周年纪念日

中文

MengChang Wang retweetledi

Sumeet@HeySumeetKapoor·23 Ağu

$BABA has developed an AI tool named MindOpt Copilot, which is based on its LLM Tongyi Qianwen. This AI tool has the capability to build computer models, generate code, and use solvers to provide solutions across catering, retail, logistics, transportation, and manufacturing.

English

4.8K

MengChang Wang retweetledi

Miles Cranmer@MilesCranmer·2 Tem

Finally got Enzyme.jl working with SymbolicRegression.jl after a year of debugging. In the end, the solution was to simply... increase the stack size... 🙂 Anyways, now you can do crazy-fast reverse-mode autodiff on runtime-generated expressions for symbolic regression:

English

6.3K

MengChang Wang retweetledi

Miles Cranmer@MilesCranmer·15 Tem

I made a preliminary version of a Gradio-based GUI for PySR: Which means... Symbolic Regression on HuggingFace! huggingface.co/spaces/MilesCr…

English

4.5K

MengChang Wang@wangmengchang·17 Nis

Human brains consume much less power than any GPU. Will green AI be the next moves?

English

MengChang Wang@wangmengchang·21 Şub

@thserra @JiataiTong1717 @JunyangCai nice work! 👍👍👍

English

Thiago Serra (@thserra.bsky.social)@thserra·10 Oca

When a neural network is embedded into an optimization model, can we leverage the neural network structure to find better solutions — and faster? We examine that in “Optimization Over Trained Neural Networks: Taking a Relaxing Walk” #orms Link: arxiv.org/abs/2401.03451 1/N

Thiago Serra (@thserra.bsky.social) tweet media

English

3.4K

MengChang Wang retweetledi

Rohan Paul@rohanpaul_ai·17 Şub

AutoMathText: A huge 200GB dataset of mathematical texts open sourced on @huggingface 🔥 Can be absolutely huge for mathematical reasoning improvement of LLMs. ✨ Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, this approach utilizes meta-prompted language models as zero-shot verifiers to autonomously evaluate and select high-quality mathematical content ✨ Multi-source : arXiv/programming code/web pages ✨ Filtered and processed to adapt Math reasoning ✨ The paper showcases a 2 times increase in pretraining token efficiency compared to baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. ✨ Selected by Qwen 72B ---- 📌 Central to their approach is the formulation of a scoring function, as given in below Equation (1), which quantitatively evaluates the language model’s inclination towards affirming or negating the mathematical content and educational merit of a given piece of content. This function operates on the logits associated with ‘YES’ and ‘NO’ responses to meta-prompts, offering a nuanced mechanism for content evaluation: ✨ LM-Score(-) = exp(logit('YES')) / (exp(logit('YES')) + exp(logit('NO'))) This scoring function represents a novel integration of language models’ prediction capabilities into an autonomous evaluation framework, bypassing the limitations associated with traditional supervised learning techniques. ----- You can also find StackMathQA on their page, a 2 million math question and answer dataset sourced from Stack Exchange🔥 For a context, if this whole process was done with manual annotation of this dataset by experts familiar with undergraduate-level and beyond mathematical content would cost upwards of $10 million, assuming a rate of $1 per document. While this method as proposed in this paper reduces this cost to approximately $10,000 ( estimated by using Azure’s pricing rate at $3.4 per A100 GPU hour).

English

132

10.5K

MengChang Wang@wangmengchang·11 Oca

@liwaiwaicom nice work

English

liwaiwai@liwaiwaicom·26 Ara

Mixed-Integer Linear #Programming solver, splits a massive optimization problem into smaller pieces and uses generic #algorithms to try and find the best solution. #AI #ArtificialIntelligence #MILP liwaiwai.com/2023/12/15/ai-… via @liwaiwaicom

English

MengChang Wang@wangmengchang·11 Oca

@JAWarwicker interesting work

English

Alasdair Warwicker@JAWarwicker·9 Oca

Our paper "Support Vector Machines within a Bivariate Mixed-Integer Linear Programming Framework" has been published in Expert Systems with Applications! Read the full open access paper: sciencedirect.com/science/articl…

English

121

MengChang Wang@wangmengchang·1 Oca

@calebfahlgren @jmorgan so nice!!

English

Caleb@calebfahlgren·1 Oca

huggingface.co/cfahlgren1/was… Just compiled it for WebGPU, runs in the browser at 40+ ish tok/s. Is pretty awesome!

English

Jeffrey Morgan@jmorgan·1 Oca

TinyLlama is a 1.1B model with the Llama 2 architecture, trained on 3 trillion tokens. Its small size means it can run fast with little memory and compute requirements. ollama.ai/library/tinyll…

English

344

36K

MengChang Wang@wangmengchang·1 Oca

@MrBoJensen there could be misunderstandings. communicating.

English

631

Bo Jensen@MrBoJensen·30 Ara

Does anyone have insight to why it was taken down from the HM site ? 😇

English

1.8K

Bo Jensen@MrBoJensen·22 Ara

Did not look much on HM benchmarks, since I left CPLEX. I now see that a Chinese company is leading the MIP benchmark by doing parameter tuning on CPLEX... Still laughing of my own prediction that nothing would happen in this solver segment 😂😂

English

14.7K

MengChang Wang retweetledi

Lior Alexander@LiorOnAI·19 Ara

PowerInfer can massively speed up inference on consumer GPUs. Almost reaching A100 levels. It outperforms llama.cpp by up to 11.69x while retaining model accuracy. PowerInfer reached an average token generation rate of 13.20 tokens/s, with a peak of 29.08 tokens/s, across various LLMs (including OPT-175B) on a single NVIDIA RTX 4090 GPU, only 18% lower than that achieved by a top-tier server-grade A100 GPU. You can use these models with PowerInfer today: - Falcon-40B - Llama2 family

English

356

41K

Keşfet

@joaquinbarroso @mariyaivasileva @grok @EdgarDobriban @thserra @JiataiTong1717 @JunyangCai @huggingface