Yingtong Dou

188 posts

@dozee_sim

Research Scientist @Visa. CS Ph.D. @UICCS. Foundation Model & Anomaly/Fraud Detection. Opinions are my own.

Palo Alto, CA · Joined April 2016
258 Following · 496 Followers
Yingtong Dou retweeted
Drew Breunig
Drew Breunig@dbreunig·
If you're doing applied AI research (especially system design, benchmarks, evals, efficiency, or ops), you should be submitting to the Conference on AI and Agentic Systems... caisconf.org/pages/cfp/
Drew Breunig tweet media
7
33
265
26.2K
Yingtong Dou
Yingtong Dou@dozee_sim·
🧵(4/6) 𝐑𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐈𝐦𝐩𝐚𝐜𝐭. TGPT achieves a significant improvement over a production model on transaction classification. TGPT excels at generating realistic future transaction trajectories, opening up new avenues for forecasting and personalization.
0
0
0
36
Yingtong Dou
Yingtong Dou@dozee_sim·
🧵(3/6) 3𝐃-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫 𝐀𝐫𝐜𝐡𝐢𝐭𝐞𝐜𝐭𝐮𝐫𝐞. We design a new architecture specifically to model and fuse the complex, multi-modal nature of transaction data. It hierarchically encodes transaction features, individual transactions, and their sequences.
1
0
0
42
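A minimal sketch of the hierarchical encoding described in the thread above, purely for illustration: field-level attention fuses the attributes of one transaction, the result is pooled into a per-transaction embedding, and a second Transformer models the sequence of transactions. All module names, dimensions, and the pooling choice are assumptions, not the TGPT implementation.

import torch
import torch.nn as nn

class HierarchicalTransactionEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Level 1: attend over the fields (amount, merchant, time, ...) of one transaction.
        field_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.field_encoder = nn.TransformerEncoder(field_layer, n_layers)
        # Level 2: attend over the sequence of per-transaction embeddings.
        seq_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.sequence_encoder = nn.TransformerEncoder(seq_layer, n_layers)

    def forward(self, x):
        # x: (batch, seq_len, n_fields, d_model) -- already-embedded transaction fields.
        b, t, f, d = x.shape
        fields = self.field_encoder(x.view(b * t, f, d))    # fuse fields within each transaction
        txn_emb = fields.mean(dim=1).view(b, t, d)          # pool to one vector per transaction
        return self.sequence_encoder(txn_emb)               # model the transaction sequence

# Example: a batch of 8 users, 20 transactions each, 16 embedded fields per transaction.
out = HierarchicalTransactionEncoder()(torch.randn(8, 20, 16, 64))
print(out.shape)  # torch.Size([8, 20, 64])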
Yingtong Dou
Yingtong Dou@dozee_sim·
@tydsh I've been tuning MobileLLM on domain-specific data and its performance is excellent. I'll definitely try R1 later.
0
0
1
99
Yuandong Tian
Yuandong Tian@tydsh·
We have released small-scale reasoning models MobileLLM-R1 (0.14B, 0.35B, 0.95B) that are trained from scratch with just 4.2T pre-training tokens (10% of Qwen3), while their reasoning performance is on par with Qwen3-0.6B. Thanks to the three core contributors for their great work: @zechunliu, @erniecyc and Changsheng Zhao! Our research opens a brand-new possibility: instead of distilling from large models, maybe we can train a small one directly from scratch with proper recipes 😀
Zechun Liu@zechunliu

Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training, MobileLLM-R1 demonstrates strong reasoning ability. Despite using only 4.2T tokens, 11.7% of the 36T pre-training tokens Qwen used, it delivers remarkable performance. Collaborating with @erniecyc, Changsheng, et al.

16
56
432
99.8K
Yingtong Dou retweeted
Guohao Li 🐫
Guohao Li 🐫@guohao_li·
Introducing Eigent — the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fully customizable workers and MCPs. Public beta available to download for macOS and Windows. 100% open-source on GitHub. Comment for 500 extra credits.
143
136
684
220.6K
Yingtong Dou retweeted
Michael Galkin
Michael Galkin@michael_galkin·
📣 Our spicy ICML 2025 position paper: “Graph Learning Will Lose Relevance Due To Poor Benchmarks”. Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it! 🧵1/10
Michael Galkin tweet media
5
50
293
83.7K
Yingtong Dou retweeted
FORTUNE
FORTUNE@FortuneMagazine·
Visa President of Technology Rajat Taneja rebuilt the company’s data platform from scratch, helping position it for the generative AI boom. trib.al/6XClvqz
1
1
7
6.2K
Wenhan Gao
Wenhan Gao@Wenhanacademia·
I'm glad to share that I will be joining Visa as a Staff Research Scientist Intern this summer. #lifeatvisa
GIF
1
0
3
606
Yingtong Dou retweeted
Yangqing Jia
Yangqing Jia@jiayq·
There is a lot of unconscious emphasis on the DeepSeek model being "Chinese" and an implicit connection to the Sino-US relationship or GPU power. In my eyes, the success of DeepSeek has little to do with that. It is simply intelligence and pragmatism at work: given the computation and manpower available, produce the best outcome with smart research. Same with the AlexNet model, when Alex Krizhevsky needed to make magic with 2 GPUs, not a supercluster.

There are a lot of super smart AI people and companies in the world. Among the Chinese ethnic group, people I have had the privilege of working with include (but are not limited to):
- Kaiming He, who is the OG of modern computer vision.
- Song Han, who founded DeePhi and OmniML and is now a professor at MIT.
- the DMLC folks who created early frameworks like MXNet and TVM.
- Bing Xu, who worked on MXNet, was a co-author of the GAN paper, founded HippoML, and is now at NVIDIA.
- Orbeus, a startup on early CV applications and now the foundation of AWS Rekognition.
And many more. They excel at the frontier of AI, whether it's research, products, small startups, or big companies.

AI should bring us closer together rather than further apart. I was saddened by the discriminatory comments made by Professor Rosalind Picard at NeurIPS, but was too busy to put my thoughts together and say something. Looking back at 2024, I think what really stood out is the fundamental pursuit of AI breakthroughs: collect what we have, use our brains, and achieve our best. It's like the Olympics: faster, higher, stronger, together.
21
62
530
67.8K
Yingtong Dou retweeted
Zhuang Liu
Zhuang Liu@liuzhuang1234·
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Only a moderate amount of generation data is needed to elicit visual generation from an LLM when it is trained jointly with visual understanding.
Zhuang Liu tweet media
25
133
719
253.3K
Yingtong Dou retweeted
Aurimas Griciūnas
Aurimas Griciūnas@Aurimas_Gr·
What is 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗔𝗚?

In real-world applications, simple naive RAG systems are rarely used nowadays. To provide correct answers to a user query, we are always adding some agency to the RAG system. However, it is important to 𝗻𝗼𝘁 𝗴𝗲𝘁 𝗹𝗼𝘀𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗯𝘂𝘇𝘇 𝗮𝗻𝗱 𝘁𝗲𝗿𝗺𝗶𝗻𝗼𝗹𝗼𝗴𝘆 and to understand that there is 𝗻𝗼 𝘀𝗶𝗻𝗴𝗹𝗲 𝗯𝗹𝘂𝗲𝗽𝗿𝗶𝗻𝘁 for adding the mentioned agency to your RAG system; you should adapt to your use case. My advice is to not get stuck on terminology and to think about engineering flows.

Let's explore some of the moving pieces in Agentic RAG:

𝟭. Analysis of the user query: we pass the original user query to an LLM-based Agent for analysis. This is where:
➡️ The original query can be rewritten, sometimes multiple times, to create either a single query or multiple queries to be passed down the pipeline.
➡️ The agent decides if additional data sources are required to answer the query.

𝟮. If additional data is required, the Retrieval step is triggered. In the Agentic RAG case, we could have a single agent or multiple agents responsible for figuring out which data sources should be tapped into. A few examples:
➡️ Real-time user data. This is a pretty cool concept, as we might have some real-time information, like the user's current location, available.
➡️ Internal documents that the user might be interested in.
➡️ Data available on the web.
➡️ …

𝟯. If there is no need for additional data, we try to compose the answer (or multiple answers) directly via an LLM.

𝟰. The answer (or answers) gets analyzed, summarized, and evaluated for correctness and relevance:
➡️ If the Agent decides that the answer is good enough, it gets returned to the user.
➡️ If the Agent decides that the answer needs improvement, we rewrite the user query and repeat the generation loop.

The real power of Agentic RAG lies in its ability to perform additional routing before and after generation, handle multiple distinct data sources for retrieval when needed, and recover from failures in generating correct answers.

What are your thoughts on Agentic RAG? Let me know in the comments! 👇

#RAG #LLM #AI
Aurimas Griciūnas tweet media
15
222
1.1K
196.7K
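The flow above can be condensed into a short sketch, assuming hypothetical llm_json(), llm_text(), and retrieve() helpers (a model call returning parsed JSON or plain text, and a retriever over whatever sources the system taps: user data, internal documents, the web). This is one possible shape of the loop, not any particular framework's API.

def agentic_rag(user_query, llm_json, llm_text, retrieve, max_rounds=3):
    query = user_query
    answer = ""
    for _ in range(max_rounds):
        # 1. Query analysis: rewrite the query and decide whether external data is needed.
        plan = llm_json(f"Rewrite this query and decide if external data is needed: {query}")
        # 2. Retrieval (only if the agent asked for it), possibly across several sources.
        context = retrieve(plan["rewritten_query"], plan["sources"]) if plan["needs_data"] else ""
        # 3. Compose an answer, with or without retrieved context.
        answer = llm_text(f"Context:\n{context}\n\nQuestion: {plan['rewritten_query']}")
        # 4. Evaluate the answer; return it if good enough, otherwise rewrite and loop.
        review = llm_json(f"Judge the correctness and relevance of this answer: {answer}")
        if review["good_enough"]:
            return answer
        query = review["suggested_rewrite"]
    return answer  # best effort after max_rounds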
Yingtong Dou retweeted
Jianfeng Chi
Jianfeng Chi@jianfengchi·
I am hiring a research intern working on LLM (Llama 3+) safety. The internship is expected to start in Spring/Summer 2025 and is based in New York City. Please drop me an email at jianfengchi@meta.com (subject starting with "[2025 Intern]"). Learn more here: metacareers.com/jobs/171953723…
3
30
263
32.2K
Yingtong Dou retweeted
William Wang
William Wang@WilliamWangNLP·
Respectfully disagree. It's the structure of language and words that makes LLMs effective. Pure speech, time series, or video without linguistic co-supervision doesn't yield the same results. Language provides the minimal conceptual units that enable these models to work.
Andrej Karpathy@karpathy

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general-purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".

Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of "next token prediction" with an LLM; it's just the usage/meaning of the tokens that changes per domain.

If that is the case, it's also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What's up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM? I don't think this is true but I think it's half true.

9
15
155
31.4K
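A tiny illustration of the point quoted above, using a made-up four-symbol vocabulary as an assumption: once a domain is discretized into token ids, the training objective is the same next-token prediction regardless of what the tokens stand for.

vocab = {sym: i for i, sym in enumerate(["A", "C", "G", "T"])}   # e.g. DNA bases instead of text
sequence = "ACGTGCA"
token_ids = [vocab[s] for s in sequence]                         # [0, 1, 2, 3, 2, 1, 0]

# Training pairs for next-token prediction: predict token t+1 from tokens up to t.
pairs = [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]
print(pairs[0])  # ([0], 1): given "A", predict "C"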
Yingtong Dou retweeted
Simran Arora
Simran Arora@simran_s_arora·
Excited to share Just Read Twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!! There's so much recent progress on recurrent architectures, which are dramatically more memory-efficient and asymptotically faster than attention 💨 But there's no free lunch 🥪: these models can't fit all the information from long contexts into their limited memory, degrading in-context learning quality. Is all lost?
Simran Arora tweet media
7
58
299
93K
Yingtong Dou retweeted
Valeriy M., PhD, MBA, CQF
Valeriy M., PhD, MBA, CQF@predict_addict·
The paper we have been waiting for essentially shows that #timeseries #llms do not work in forecasting.

Back in 2022, the paper "Are Transformers Effective for Time Series Forecasting?" challenged the emerging narrative that transformers are useful for forecasting: by removing transformer elements, the authors showed that performance went up ⬆️

And now people have done the same with time series LLMs. The paper demonstrates:
- removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results; in most cases the results even improved!
- in fact, even removing the language model entirely yields comparable or better performance!
- these simpler methods, with the LLM component removed, reduce training and inference time by up to three orders of magnitude while maintaining comparable performance!
- the sequence modeling capabilities of LLMs do not transfer to time series: shuffling the input time series produces no appreciable change in performance.

What this says is that LLMs can't deal with the critical feature of time series: temporal order is key, and if an LLM's performance doesn't change when the data is shuffled, it basically means it isn't modeling the time series at all.

These findings are as damning for time series LLMs as "Are Transformers Effective for Time Series Forecasting?" was for transformers.

#timeseries #forecasting
Valeriy M., PhD, MBA, CQF tweet media
11
181
864
120K
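A rough sketch of the shuffling ablation described above, with a placeholder forecaster and toy data rather than the paper's code: if a forecaster's error barely moves when the input history is randomly permuted, it is not exploiting temporal order.

import numpy as np

def shuffle_ablation(forecast, history, target, n_trials=10, seed=0):
    rng = np.random.default_rng(seed)
    base_err = float(np.mean(np.abs(forecast(history) - target)))
    shuf_errs = []
    for _ in range(n_trials):
        permuted = history[rng.permutation(len(history))]   # destroy temporal order
        shuf_errs.append(np.mean(np.abs(forecast(permuted) - target)))
    return base_err, float(np.mean(shuf_errs))

# Toy usage with a naive "repeat the last value" forecaster over a sine wave.
t = np.arange(120, dtype=float)
series, horizon = np.sin(0.3 * t[:96]), np.sin(0.3 * t[96:])
naive = lambda h: np.full(len(horizon), h[-1])
print(shuffle_ablation(naive, series, horizon))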