Hong Liu

66 posts

@HongLiu9903

Pretraining @AnthropicAI Prev: Co-founder @VoyageAI

Joined October 2021
184 Following · 331 Followers
Hong Liu reposted
Anthropic @AnthropicAI
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
1.9K replies · 6.6K reposts · 43.5K likes · 30.3M views
Hong Liu reposted
Voyage AI by MongoDB @VoyageAI
The era of embedding models is evolving. 🚀 With voyage-4-large, we’ve moved to Mixture of Experts (MoE) to shatter the scaling ceiling. The results:
✅ Massive drop in inference cost and latency
✅ New frontier for retrieval accuracy
Curious about how we implement MoE embeddings? Read the full technical breakdown of how we optimized design choices to push the Pareto frontier: mongodb.social/6017h7EUh
0 replies · 1 repost · 14 likes · 1.2K views
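For readers unfamiliar with the architecture family the tweet names: a mixture-of-experts layer replaces one big feed-forward block with several smaller "expert" blocks plus a router that sends each token to only a few of them, so capacity can grow without all weights running on every token. Below is a minimal illustrative sketch; the sizes, top-2 routing, and layer structure are generic MoE assumptions, not Voyage's actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEFeedForward(nn.Module):
        """Toy top-2 routed MoE FFN: each token activates only top_k of n experts."""
        def __init__(self, d_model=256, d_ff=512, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.top_k = top_k

        def forward(self, x):                               # x: (tokens, d_model)
            gates = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
            weights, idx = gates.topk(self.top_k, dim=-1)   # keep top-2 experts/token
            weights = weights / weights.sum(-1, keepdim=True)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = (idx == e)                 # which tokens routed to expert e
                token_mask = mask.any(-1)
                if token_mask.any():
                    w = (weights * mask).sum(-1, keepdim=True)[token_mask]
                    out[token_mask] += w * expert(x[token_mask])
            return out  # only top_k / n_experts of the FFN compute runs per token

    x = torch.randn(10, 256)
    print(MoEFeedForward()(x).shape)  # torch.Size([10, 256])

This is where the cost/latency drop comes from: per token, only two of the eight expert FFNs execute.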
Hong Liu reposted
tomaarsen @tomaarsen
A few days back, following a long line of excellent proprietary models, @VoyageAI released their 🚨 first ever open-weights embedding model for retrieval 🚨! It's called voyage-4-nano, it's multilingual, and very efficient. Details in 🧵
1 reply · 3 reposts · 19 likes · 607 views
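Assuming the open-weights release ships with standard sentence-transformers support (tomaarsen maintains that library, so this is plausible but not confirmed here), usage would look roughly like the sketch below. The repo id "voyageai/voyage-4-nano" is inferred from the model name and may differ; check the actual model card.

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("voyageai/voyage-4-nano")  # assumed repo id
    docs = ["Paris is the capital of France.", "The Louvre is a museum in Paris."]
    query = "Where is the Louvre?"

    # Normalized embeddings, so dot product equals cosine similarity.
    doc_emb = model.encode(docs, normalize_embeddings=True)
    q_emb = model.encode([query], normalize_embeddings=True)
    print((q_emb @ doc_emb.T).argmax())  # index of the best-matching document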
Hong Liu reposted
Kaichao You @KaichaoYou
Excited to share that I'm co-founding Inferact with an incredible team! Our mission: grow vLLM as the world's leading AI inference engine💪🏻 We've got many amazing models in our day-0 support pipeline — guess what's coming next?😉
Woosuk Kwon @woosuk_k

Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

The Challenge

Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked, with the full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities.

And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data.

We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building.

Why Us

vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We've deployed it at frontier scale, in research and in production.

Open Source

vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls.

Join Us

Through the open source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us.

We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp, who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks.

- @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team
28 replies · 19 reposts · 406 likes · 39.1K views
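For context, this is what vLLM's basic offline API, the engine the announcement is about, looks like today; the model id below is just an example.

    from vllm import LLM, SamplingParams

    # Load any Hugging Face model vLLM supports and generate a completion.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The key bottleneck in LLM inference is"], params)
    for o in outputs:
        print(o.outputs[0].text)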
Hong Liu reposted
Zitong Yang @ZitongYang0
We believe AI research is a special research area where AI itself can deliver significant progress. Ronald Fisher laid the foundation for the scientific method: generate hypotheses (research ideas) and see if you can falsify them (execution). Execution and experiments therefore form the basis of scientific progress.

For mathematics, execution is somewhat special: the chain of thought of AI models implicitly carries it through, which explains much of the progress in math we are seeing. For AI research, execution is purely in code, which AI is extremely good at. Idea generation is in natural language, which AI can also do, although it is not well calibrated right now. So the natural design is to hook up an AI idea generator and an AI experiment executor end-to-end. This is the future paradigm we propose.

We studied two realistic environments: the nanoGPT speedrun initiated by @kellerjordan0 @karpathy, and the GRPO math reasoning homework built by Stanford CS336 @stanfordnlp. These two "research environments" cover salient and realistic research topics: LLM pretraining and LLM posttraining. For GRPO post-training, the algorithm discovered by AI (see details in the paper!) outperforms the highest-scoring student solution (github.com/stanford-cs336…). For nanoGPT, human experts are still too good to compete with, but AI does reduce the time to reach loss 3.28 from 35 min to 19 min (for reference, human experts are at 2 min).

The advantage of AI over humans is very simple: AI is tireless. Over the past few months, our wandb logs tell us that our AI researcher tried more than 50K research ideas, iteratively learning from the logs of previous experiments. Humans are, of course, more insightful, but simply can't try so many research ideas.

Unfortunately, we didn't close the loop of applying the posttraining algorithm discovered by AI to obtain a stronger model and deploying that model to improve itself even further. This should be just the beginning of execution-grounded automated AI research. Self-improvement (AI inventing better algorithms to train itself) can be an objective for AI in its own right; it is an intrinsically motivated goal.

This is my last project from Stanford. Deeply grateful to @ChengleiSi for pushing it together and to @saurabhsgupta for funding us.
CLS @ChengleiSi

Can LLMs automate frontier LLM research, like pre-training and post-training? In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%), and pre-training recipes faster than nanoGPT (19.7 minutes vs 35.9 minutes). 1/

2 replies · 11 reposts · 91 likes · 12.6K views
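The end-to-end paradigm the thread proposes is essentially a propose-execute loop. Here is a schematic sketch, with placeholder functions standing in for the LLM idea generator and the experiment executor; none of this is the paper's actual code.

    import random

    def propose_idea(history):
        # Placeholder for the "AI idea generator": an LLM prompted with past
        # experiment logs, returning a candidate change to the training recipe.
        return {"lr": random.choice([1e-4, 3e-4, 1e-3]),
                "notes": f"variant #{len(history)}"}

    def run_experiment(idea):
        # Placeholder for the "AI experiment executor": writes the code,
        # launches the run, and returns a metric (e.g., GRPO reward).
        return random.random()

    history, best = [], None
    for step in range(50):               # the thread reports >50K ideas tried
        idea = propose_idea(history)
        score = run_experiment(idea)
        history.append((idea, score))    # the generator conditions on these logs
        if best is None or score > best[1]:
            best = (idea, score)
    print("best idea so far:", best)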
Hong Liu @HongLiu9903
Embedding models are entering a golden era🔥 We’re thrilled to announce voyage-4 series, featuring the first-ever compatible embedding spaces and MoE architecture to deliver an unprecedented cost-quality balance. blog.voyageai.com/2026/01/15/voy… Plus, we’re giving back to the community with a new open-weight nano model huggingface.co/voyageai/voyag… None of this would be possible without the relentless innovation of my team @Yujie_Qian @AkshayGoindani1 and @luo_yuping. Check it out! 🚀
4 replies · 1 repost · 23 likes · 2.5K views
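If "compatible embedding spaces" means what it says, a corpus embedded with one model in the series should remain searchable by queries embedded with another. A sketch of that workflow with the voyageai Python client follows; the specific model names and the cross-model mixing are assumptions read off this announcement, not documented behavior.

    import numpy as np
    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    docs = ["MoE routes each token to a few experts.",
            "ColBERT scores queries against documents per token."]
    # Assumption: the voyage-4 series shares one space (and one dimensionality),
    # so documents embedded by the large model can be queried via the nano model.
    doc_emb = np.array(vo.embed(docs, model="voyage-4-large",
                                input_type="document").embeddings)
    q_emb = np.array(vo.embed(["what is mixture of experts?"],
                              model="voyage-4-nano", input_type="query").embeddings)

    scores = q_emb @ doc_emb.T  # dot product; cosine if embeddings are normalized
    print(scores.argmax())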
Hong Liu reposted
Zengyi Qin @qinzytech
Introducing Lux, the most powerful and fastest Computer Use model, built by OpenAGI Foundation @agiopen_org. Lux outperforms Google Gemini CUA, OpenAI Operator, and Anthropic Claude on a benchmark with 300 real-world tasks. Try our developer-friendly SDK to build powerful, real-world applications. 🧵
45 replies · 79 reposts · 531 likes · 98.2K views
Hong Liu reposted
Zitong Yang @ZitongYang0
📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch.🧵
10 replies · 49 reposts · 254 likes · 41.4K views
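As a rough schematic of the recipe the abstract describes: mine correlated document pairs from the corpus, train the model to synthesize a related document from a seed, and add the synthesized documents to the pretraining mix. The toy code below uses word overlap as a stand-in for the real similarity signal and omits the training step entirely; it is a reader's sketch, not the paper's method.

    from itertools import combinations

    def similarity(a, b):
        # Toy proxy for embedding similarity: word-overlap (Jaccard) score.
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb)

    corpus = ["the moon orbits the earth", "the earth orbits the sun",
              "stocks fell sharply on friday"]

    # Step 1: mine correlated document pairs, the inter-document signal that
    # next-token training within a single document never sees.
    pairs = [(a, b) for a, b in combinations(corpus, 2) if similarity(a, b) > 0.3]

    # Step 2 (omitted): train the model to map doc_a -> doc_b, then sample new
    # "related" documents from seeds and add them to the pretraining data.
    synthetic = [f"<synthesized doc conditioned on: {a}>" for a, _ in pairs]
    print(pairs, synthetic, sep="\n")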
search founder @n0riskn0r3ward
The Databricks reranker launch blog post is awkward. They might as well have titled it: "We made a reranker and it sucks! Please don't ask us what our baseline is in this comparison or what the cost of using our reranker is!" Just use Voyage rerank-2.5.
3 replies · 1 repost · 15 likes · 1.4K views
Hong Liu @HongLiu9903
@n0riskn0r3ward @spacemanidol @cohere To be honest, I think it's more or less standard practice to tell people at least how you evaluate and which models you compare with; for example, take a look at LLM release blog posts. That said, transparency is what I've been insisting on from day one.
0 replies · 0 reposts · 1 like · 37 views
search founder @n0riskn0r3ward
It’s just an incredibly stark comparison to Voyage, which is linking to a Google sheet with 35 metrics to 3 decimal places, basically saying "we tried really hard to eval our model vs. theirs and it looks like ours is better, but please feel free to check our work." That's like 100x more compelling, to me at least.
2 replies · 0 reposts · 1 like · 165 views
search founder @n0riskn0r3ward
Outperforms Qwen 3's reranker for me, which was previously the best reranker in my testing. Also outperforms GPT-5-mini on most metrics. Doesn't quite best Gemini 2.5 Flash Lite, though this is $0.05 per M tokens. It's a Qwen 2.5 train.
Voyage AI by MongoDB @VoyageAI

📣 Announcing rerank-2.5 and rerank-2.5-lite: our latest generation of rerankers!
• First reranker with instruction-following capabilities
• rerank-2.5 and 2.5-lite are 7.94% and 7.16% more accurate than Cohere Reranker v3.5, respectively
• An additional 8.13% and 7.55% performance gain with instructions

1 reply · 1 repost · 8 likes · 982 views
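For reference, the kind of head-to-head testing described above, scoring a handful of documents against a query, looks roughly like this with the voyageai Python client. The model name is taken from the announcement; check the docs for the exact parameters and for how instructions are passed.

    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    query = "How do I rotate an API key?"
    docs = [
        "To rotate a key, create a new key, migrate traffic, then revoke the old one.",
        "API keys should be stored in a secrets manager.",
        "Our pricing page lists the cost per million tokens.",
    ]

    # Rerank the candidates and keep the top 2 by relevance score.
    reranking = vo.rerank(query, docs, model="rerank-2.5", top_k=2)
    for r in reranking.results:
        print(round(r.relevance_score, 3), docs[r.index])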
Hong Liu @HongLiu9903
Greater things to come!
0 replies · 2 reposts · 10 likes · 554 views
Hong Liu @HongLiu9903
Thanks for the question, but the figure is perfectly reasonable to me, and I cannot see why "If you 16x the chunk size, this means you have 16x more chance to retrieve the gold passage in your top 10..." is true🙂

In this figure, voyage-context-3 is evaluated in a parent-document-retrieval setting, which means that if you retrieve any chunk inside the ground-truth document then you are correct. The model has access to all consecutive chunks in its context window, so naturally, as you increase the chunk size you converge to a standard single embedding that packs the whole document into one vector, and you lose the benefit of chunking because you compress too much information into very few vectors.

An analogy is a ColBERT model (which you are more familiar with than I am 🙂). If you change the granularity from per token to per document, you will essentially lose the gain and converge to a standard single embedding. In comparison, voyage-3-large in this figure cannot see the other chunks in the context; that's why retrieval starts to suffer when you split the documents into very small chunks.
1 reply · 0 reposts · 4 likes · 172 views
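To make the parent-document-retrieval setting concrete: a query counts as correct when any top-k retrieved chunk belongs to the gold document. A tiny sketch with illustrative data structures:

    # Each chunk remembers its parent document; retrieval returns chunk ids,
    # and a query is a hit if any top-k chunk comes from the gold document.
    chunk_to_doc = {"c1": "docA", "c2": "docA", "c3": "docB", "c4": "docC"}

    def parent_doc_hit(retrieved_chunks, gold_doc, k=10):
        return any(chunk_to_doc[c] == gold_doc for c in retrieved_chunks[:k])

    print(parent_doc_hit(["c3", "c2"], "docA"))  # True: c2 belongs to docA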
Manuel Faysse @ManuelFaysse
Finally, this graphic makes little sense to me. If you 16x the chunk size, this means you have 16x more chance to retrieve the gold passage in your top 10... The fact that this worsens the model is very weird.
3 replies · 1 repost · 8 likes · 661 views
Manuel Faysse @ManuelFaysse
This seems very similar to our work on contextual embeddings via training LC models! Love to see work in this space, but results without reported chunk sizes are meaningless: if you feed 256-token chunks to a baseline bi-encoder and eval on in-house datasets, you'll be ahead!
Voyage AI by MongoDB @VoyageAI

📢 voyage-context-3: contextualized chunk embeddings
- Automatically captures chunk-level detail & global document context, without metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs

2 replies · 3 reposts · 33 likes · 3.8K views
Hong Liu @HongLiu9903
Great question! We did of course compare this with late chunking without any training, and voyage-context-3 improved a lot. However, as you said, this is a blog post, not a paper, and we didn't include those results in it. If you are interested, feel free to reach out to contact@voyageai.com and we might be able to discuss them with you.
0 replies · 0 reposts · 2 likes · 94 views
Manuel Faysse @ManuelFaysse
Last thing - the late chunking baseline (0-shot) should be done with voyage-3 for a fair comparison to @JinaAI_'s technique! Otherwise, we don't know if the boost is due to the training method or just in-domain training... (In fairness, this is a commercial blog post, not a paper.)
1 reply · 0 reposts · 5 likes · 408 views
Hong Liu @HongLiu9903
@tomaarsen @VoyageAI We used an objective that considers both doc-level relevance and chunk-level relevance
0 replies · 0 reposts · 3 likes · 63 views
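One plausible reading of "both doc-level and chunk-level relevance" in a standard contrastive setup is a weighted sum of two InfoNCE-style terms: one where the exact gold chunk is the positive, and one where any chunk of the gold document counts. This is purely a reader's sketch, not Voyage's actual objective.

    import torch
    import torch.nn.functional as F

    def combined_contrastive_loss(q, chunk_emb, chunk_doc_ids, gold_doc,
                                  gold_chunk, alpha=0.5, tau=0.05):
        """q: (d,) query embedding; chunk_emb: (n, d) candidate chunk embeddings."""
        sims = (chunk_emb @ q) / tau                     # (n,) scaled similarities
        # Chunk-level term: only the exact gold chunk is the positive.
        chunk_loss = F.cross_entropy(sims.unsqueeze(0), torch.tensor([gold_chunk]))
        # Doc-level term: ANY chunk of the gold document counts as a positive.
        pos = torch.tensor([d == gold_doc for d in chunk_doc_ids])
        doc_loss = torch.logsumexp(sims, 0) - torch.logsumexp(sims[pos], 0)
        return alpha * chunk_loss + (1 - alpha) * doc_loss

    q = F.normalize(torch.randn(8), dim=0)
    chunks = F.normalize(torch.randn(5, 8), dim=1)
    print(combined_contrastive_loss(q, chunks, ["A", "A", "B", "B", "C"], "A", 0))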
tomaarsen @tomaarsen
@VoyageAI And on the training side, do you train contrastively between queries and selected chunks per relevant document, or between queries and all chunks per relevant document? Or perhaps something in between (e.g., masking away non-relevant chunks from relevant documents) to avoid marking them as negatives?
1 reply · 0 reposts · 0 likes · 167 views
Voyage AI by MongoDB @VoyageAI
📢 voyage-context-3: contextualized chunk embeddings
- Automatically captures chunk-level detail & global document context, without metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs
4 replies · 24 reposts · 89 likes · 21.1K views
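The 192x storage figure checks out as straightforward arithmetic on bits per vector:

    float_bits = 3072 * 32   # float32, 3072 dims: 98,304 bits per vector
    binary_bits = 512 * 1    # binary, 512 dims: 512 bits per vector
    print(float_bits // binary_bits)  # 192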
Hong Liu @HongLiu9903
@tomaarsen @VoyageAI The model backbone processes the whole doc with attention across all chunks. Pooling is done in each chunk
0 replies · 0 reposts · 1 like · 50 views
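That two-step design, full-document attention followed by per-chunk pooling, can be sketched with toy tensors; the real model's backbone and pooling details are not public, so this only illustrates the shape of the computation.

    import torch

    # Token embeddings after a transformer that attended across the WHOLE
    # document, so every token has seen every chunk. Shape: (doc_tokens, d).
    hidden = torch.randn(12, 4)
    chunk_spans = [(0, 5), (5, 9), (9, 12)]  # token ranges of the 3 chunks

    # One embedding per chunk: mean-pool only that chunk's own tokens.
    chunk_embs = torch.stack([hidden[s:e].mean(0) for s, e in chunk_spans])
    print(chunk_embs.shape)  # torch.Size([3, 4]): one contextualized vector/chunk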
tomaarsen @tomaarsen
@VoyageAI Would love to hear more details - is it prechunking and then mean pooling per chunk after processing as a full document?
2 replies · 0 reposts · 0 likes · 241 views
Hong Liu reposted
Dev Ittycheria @dittycheria
We just launched voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, offering better retrieval performance than leading alternatives. When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it's critical to break those documents into smaller pieces, or "chunks," while still maintaining an understanding of the big picture. Most systems today lose important context or require complicated workarounds to stitch it back together. blog.voyageai.com/2025/07/23/voy…
2 replies · 13 reposts · 25 likes · 2.7K views