Hong Liu

66 posts

@HongLiu9903

Pretraining @AnthropicAI Prev: Co-founder @VoyageAI

Joined October 2021
184 Following · 331 Followers
Hong Liu reposted
Anthropic @AnthropicAI
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
1.9K replies · 6.6K reposts · 43.5K likes · 30.3M views
Hong Liu reposted
Voyage AI by MongoDB @VoyageAI
The era of embedding models is evolving. 🚀 With voyage-4-large, we’ve moved to Mixture of Experts (MoE) to shatter the scaling ceiling. The results:
✅ Massive drop in inference cost and latency
✅ New frontier for retrieval accuracy
Curious about how we implement MoE embeddings? Read the full technical breakdown of how we optimized design choices to push the Pareto frontier: mongodb.social/6017h7EUh
0 replies · 1 repost · 14 likes · 1.2K views
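For readers unfamiliar with the architecture family the tweet names: a mixture-of-experts layer replaces one big feed-forward block with several smaller "expert" blocks plus a router that sends each token to only a few of them, so capacity can grow without all weights running on every token. Below is a minimal illustrative sketch; the sizes, top-2 routing, and layer structure are generic MoE assumptions, not Voyage's actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEFeedForward(nn.Module):
        """Toy top-2 routed MoE FFN: each token activates only top_k of n experts."""
        def __init__(self, d_model=256, d_ff=512, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.top_k = top_k

        def forward(self, x):                               # x: (tokens, d_model)
            gates = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
            weights, idx = gates.topk(self.top_k, dim=-1)   # keep top-2 experts/token
            weights = weights / weights.sum(-1, keepdim=True)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = (idx == e)                 # which tokens routed to expert e
                token_mask = mask.any(-1)
                if token_mask.any():
                    w = (weights * mask).sum(-1, keepdim=True)[token_mask]
                    out[token_mask] += w * expert(x[token_mask])
            return out  # only top_k / n_experts of the FFN compute runs per token

    x = torch.randn(10, 256)
    print(MoEFeedForward()(x).shape)  # torch.Size([10, 256])

This is where the cost/latency drop comes from: per token, only two of the eight expert FFNs execute.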
Hong Liu reposted
tomaarsen @tomaarsen
A few days back, following a long line of excellent proprietary models, @VoyageAI released their 🚨 first ever open-weights embedding model for retrieval 🚨! It's called voyage-4-nano, it's multilingual, and very efficient. Details in 🧵
1 reply · 3 reposts · 19 likes · 607 views
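Assuming the open-weights release ships with standard sentence-transformers support (tomaarsen maintains that library, so this is plausible but not confirmed here), usage would look roughly like the sketch below. The repo id "voyageai/voyage-4-nano" is inferred from the model name and may differ; check the actual model card.

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("voyageai/voyage-4-nano")  # assumed repo id
    docs = ["Paris is the capital of France.", "The Louvre is a museum in Paris."]
    query = "Where is the Louvre?"

    # Normalized embeddings, so dot product equals cosine similarity.
    doc_emb = model.encode(docs, normalize_embeddings=True)
    q_emb = model.encode([query], normalize_embeddings=True)
    print((q_emb @ doc_emb.T).argmax())  # index of the best-matching document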
Hong Liu reposted
Kaichao You @KaichaoYou
Excited to share that I'm co-founding Inferact with an incredible team! Our mission: grow vLLM as the world's leading AI inference engine💪🏻 We've got many amazing models in our day-0 support pipeline — guess what's coming next?😉
Woosuk Kwon @woosuk_k

Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

The Challenge

Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked, with the full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities.

And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data.

We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building.

Why Us

vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We've deployed it at frontier scale, in research and in production.

Open Source

vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls.

Join Us

Through the open source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us.

We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp, who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks.

- @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team
28 replies · 19 reposts · 406 likes · 39.1K views
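For context, this is what vLLM's basic offline API, the engine the announcement is about, looks like today; the model id below is just an example.

    from vllm import LLM, SamplingParams

    # Load any Hugging Face model vLLM supports and generate a completion.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The key bottleneck in LLM inference is"], params)
    for o in outputs:
        print(o.outputs[0].text)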
Hong Liu reposted
Zitong Yang @ZitongYang0
We believe AI research is a special research area where AI itself can deliver significant progress. Ronald Fisher laid the foundation for the scientific method: generate hypotheses (research ideas) and see if you can falsify them (execution). Execution and experiments therefore form the basis of scientific progress.

For mathematics, execution is somewhat special: the chain of thought of AI models implicitly carries it through, which explains much of the progress in math we are seeing. For AI research, execution is purely in code, which AI is extremely good at. Idea generation is in natural language, which AI can also do, although it is not well calibrated right now. So the natural design is to hook up an AI idea generator and an AI experiment executor end-to-end. This is the future paradigm we propose.

We studied two realistic environments: the nanoGPT speedrun initiated by @kellerjordan0 @karpathy, and the GRPO math reasoning homework built by Stanford CS336 @stanfordnlp. These two "research environments" cover salient and realistic research topics: LLM pretraining and LLM posttraining. For GRPO post-training, the algorithm discovered by AI (see details in the paper!) outperforms the highest-scoring student solution (github.com/stanford-cs336…). For nanoGPT, human experts are still too good to compete with, but AI does reduce the time to reach loss 3.28 from 35 min to 19 min (for reference, human experts are at 2 min).

The advantage of AI over humans is very simple: AI is tireless. Over the past few months, our wandb logs tell us that our AI researcher tried more than 50K research ideas, iteratively learning from the logs of previous experiments. Humans are, of course, more insightful, but simply can't try so many research ideas.

Unfortunately, we didn't close the loop of applying the posttraining algorithm discovered by AI to obtain a stronger model and deploying that model to improve itself even further. This should be just the beginning of execution-grounded automated AI research. Self-improvement (AI inventing better algorithms to train itself) can be an objective for AI in its own right; it is an intrinsically motivated goal.

This is my last project from Stanford. Deeply grateful to @ChengleiSi for pushing it together and to @saurabhsgupta for funding us.
CLS @ChengleiSi

Can LLMs automate frontier LLM research, like pre-training and post-training? In our new paper, LLMs found post-training methods that beat GRPO (69.4% vs 48.0%), and pre-training recipes faster than nanoGPT (19.7 minutes vs 35.9 minutes). 1/

2 replies · 11 reposts · 91 likes · 12.6K views
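The end-to-end paradigm the thread proposes is essentially a propose-execute loop. Here is a schematic sketch, with placeholder functions standing in for the LLM idea generator and the experiment executor; none of this is the paper's actual code.

    import random

    def propose_idea(history):
        # Placeholder for the "AI idea generator": an LLM prompted with past
        # experiment logs, returning a candidate change to the training recipe.
        return {"lr": random.choice([1e-4, 3e-4, 1e-3]),
                "notes": f"variant #{len(history)}"}

    def run_experiment(idea):
        # Placeholder for the "AI experiment executor": writes the code,
        # launches the run, and returns a metric (e.g., GRPO reward).
        return random.random()

    history, best = [], None
    for step in range(50):               # the thread reports >50K ideas tried
        idea = propose_idea(history)
        score = run_experiment(idea)
        history.append((idea, score))    # the generator conditions on these logs
        if best is None or score > best[1]:
            best = (idea, score)
    print("best idea so far:", best)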
Hong Liu @HongLiu9903
Embedding models are entering a golden era🔥 We’re thrilled to announce voyage-4 series, featuring the first-ever compatible embedding spaces and MoE architecture to deliver an unprecedented cost-quality balance. blog.voyageai.com/2026/01/15/voy… Plus, we’re giving back to the community with a new open-weight nano model huggingface.co/voyageai/voyag… None of this would be possible without the relentless innovation of my team @Yujie_Qian @AkshayGoindani1 and @luo_yuping. Check it out! 🚀
4 replies · 1 repost · 23 likes · 2.5K views
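If "compatible embedding spaces" means what it says, a corpus embedded with one model in the series should remain searchable by queries embedded with another. A sketch of that workflow with the voyageai Python client follows; the specific model names and the cross-model mixing are assumptions read off this announcement, not documented behavior.

    import numpy as np
    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    docs = ["MoE routes each token to a few experts.",
            "ColBERT scores queries against documents per token."]
    # Assumption: the voyage-4 series shares one space (and one dimensionality),
    # so documents embedded by the large model can be queried via the nano model.
    doc_emb = np.array(vo.embed(docs, model="voyage-4-large",
                                input_type="document").embeddings)
    q_emb = np.array(vo.embed(["what is mixture of experts?"],
                              model="voyage-4-nano", input_type="query").embeddings)

    scores = q_emb @ doc_emb.T  # dot product; cosine if embeddings are normalized
    print(scores.argmax())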
Hong Liu reposted
Zengyi Qin @qinzytech
Introducing Lux, the most powerful and fastest Computer Use model, built by OpenAGI Foundation @agiopen_org. Lux outperforms Google Gemini CUA, OpenAI Operator, and Anthropic Claude on a benchmark with 300 real-world tasks. Try our developer-friendly SDK to build powerful, real-world applications. 🧵
45 replies · 79 reposts · 531 likes · 98.2K views
Hong Liu reposted
Zitong Yang @ZitongYang0
📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch.🧵
10 replies · 49 reposts · 254 likes · 41.4K views
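As a rough schematic of the recipe the abstract describes: mine correlated document pairs from the corpus, train the model to synthesize a related document from a seed, and add the synthesized documents to the pretraining mix. The toy code below uses word overlap as a stand-in for the real similarity signal and omits the training step entirely; it is a reader's sketch, not the paper's method.

    from itertools import combinations

    def similarity(a, b):
        # Toy proxy for embedding similarity: word-overlap (Jaccard) score.
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb)

    corpus = ["the moon orbits the earth", "the earth orbits the sun",
              "stocks fell sharply on friday"]

    # Step 1: mine correlated document pairs, the inter-document signal that
    # next-token training within a single document never sees.
    pairs = [(a, b) for a, b in combinations(corpus, 2) if similarity(a, b) > 0.3]

    # Step 2 (omitted): train the model to map doc_a -> doc_b, then sample new
    # "related" documents from seeds and add them to the pretraining data.
    synthetic = [f"<synthesized doc conditioned on: {a}>" for a, _ in pairs]
    print(pairs, synthetic, sep="\n")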
search founder @n0riskn0r3ward
The Databricks reranker launch blog post is awkward. They might as well have titled it: "We made a reranker and it sucks! Please don't ask us what our baseline is in this comparison or what the cost of using our reranker is!" Just use Voyage rerank-2.5.
3 replies · 1 repost · 15 likes · 1.4K views
Hong Liu @HongLiu9903
@n0riskn0r3ward @spacemanidol @cohere To be honest, I think it's more or less standard practice to tell people at least how you evaluate and which models you compare with; for example, take a look at LLM release blog posts. That said, transparency is what I've been insisting on from day one.
0 replies · 0 reposts · 1 like · 37 views
search founder @n0riskn0r3ward
It’s just an incredibly stark comparison to Voyage, which is linking to a Google sheet with 35 metrics to 3 decimal places, basically saying "we tried really hard to eval our model vs. theirs and it looks like ours is better, but please feel free to check our work." That's like 100x more compelling, to me at least.
2 replies · 0 reposts · 1 like · 165 views
search founder @n0riskn0r3ward
Outperforms Qwen 3's reranker for me, which was previously the best reranker in my testing. Also outperforms GPT-5-mini on most metrics. Doesn't quite best Gemini 2.5 Flash Lite, though this is $0.05 per M tokens. It's a Qwen 2.5 train.
Voyage AI by MongoDB @VoyageAI

📣 Announcing rerank-2.5 and rerank-2.5-lite: our latest generation of rerankers!
• First reranker with instruction-following capabilities
• rerank-2.5 and 2.5-lite are 7.94% and 7.16% more accurate than Cohere Reranker v3.5, respectively
• An additional 8.13% and 7.55% performance gain with instructions

1 reply · 1 repost · 8 likes · 982 views
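For reference, the kind of head-to-head testing described above, scoring a handful of documents against a query, looks roughly like this with the voyageai Python client. The model name is taken from the announcement; check the docs for the exact parameters and for how instructions are passed.

    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    query = "How do I rotate an API key?"
    docs = [
        "To rotate a key, create a new key, migrate traffic, then revoke the old one.",
        "API keys should be stored in a secrets manager.",
        "Our pricing page lists the cost per million tokens.",
    ]

    # Rerank the candidates and keep the top 2 by relevance score.
    reranking = vo.rerank(query, docs, model="rerank-2.5", top_k=2)
    for r in reranking.results:
        print(round(r.relevance_score, 3), docs[r.index])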
Hong Liu @HongLiu9903
Greater things to come!
0 replies · 2 reposts · 10 likes · 554 views
Hong Liu @HongLiu9903
Thanks for the question, but the figure is perfectly reasonable to me, and I cannot see why "If you 16x the chunk size, this means you have 16x more chance to retrieve the gold passage in your top 10..." is true🙂

In this figure, voyage-context-3 is evaluated in a parent-document-retrieval setting, which means that if you retrieve any chunk inside the ground-truth document then you are correct. The model has access to all consecutive chunks in its context window, so naturally, as you increase the chunk size you converge to a standard single embedding that packs the whole document into one vector, and you lose the benefit of chunking because you compress too much information into very few vectors.

An analogy is a ColBERT model (which you are more familiar with than I am 🙂). If you change the granularity from per token to per document, you will essentially lose the gain and converge to a standard single embedding. In comparison, voyage-3-large in this figure cannot see the other chunks in the context; that's why retrieval starts to suffer when you split the documents into very small chunks.
1 reply · 0 reposts · 4 likes · 172 views
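To make the parent-document-retrieval setting concrete: a query counts as correct when any top-k retrieved chunk belongs to the gold document. A tiny sketch with illustrative data structures:

    # Each chunk remembers its parent document; retrieval returns chunk ids,
    # and a query is a hit if any top-k chunk comes from the gold document.
    chunk_to_doc = {"c1": "docA", "c2": "docA", "c3": "docB", "c4": "docC"}

    def parent_doc_hit(retrieved_chunks, gold_doc, k=10):
        return any(chunk_to_doc[c] == gold_doc for c in retrieved_chunks[:k])

    print(parent_doc_hit(["c3", "c2"], "docA"))  # True: c2 belongs to docA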
Manuel Faysse @ManuelFaysse
Finally, this graphic makes little sense to me. If you 16x the chunk size, this means you have 16x more chance to retrieve the gold passage in your top 10... The fact that this worsens the model is very weird.
3 replies · 1 repost · 8 likes · 661 views
Manuel Faysse @ManuelFaysse
This seems very similar to our work on contextual embeddings via training LC models! Love to see work in this space, but results without reported chunk sizes are meaningless: if you feed 256-token chunks to a baseline bi-encoder and eval on in-house datasets, you'll be ahead!
Voyage AI by MongoDB @VoyageAI

📢 voyage-context-3: contextualized chunk embeddings
- Automatically captures chunk-level detail & global document context, without metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs

2 replies · 3 reposts · 33 likes · 3.8K views
Hong Liu @HongLiu9903
Great question! We did of course compare this with late chunking without any training, and voyage-context-3 improved a lot. However, as you said, this is a blog post, not a paper, and we didn't include those results in it. If you are interested, feel free to reach out to contact@voyageai.com and we might be able to discuss them with you.
0 replies · 0 reposts · 2 likes · 94 views
Manuel Faysse @ManuelFaysse
Last thing - the late chunking baseline (0-shot) should be done with voyage-3 for a fair comparison to @JinaAI_'s technique! Otherwise, we don't know if the boost is due to the training method or just in-domain training... (In fairness, this is a commercial blog post, not a paper.)
1 reply · 0 reposts · 5 likes · 408 views
Hong Liu @HongLiu9903
@tomaarsen @VoyageAI We used an objective that considers both doc-level relevance and chunk-level relevance
0 replies · 0 reposts · 3 likes · 63 views
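One plausible reading of "both doc-level and chunk-level relevance" in a standard contrastive setup is a weighted sum of two InfoNCE-style terms: one where the exact gold chunk is the positive, and one where any chunk of the gold document counts. This is purely a reader's sketch, not Voyage's actual objective.

    import torch
    import torch.nn.functional as F

    def combined_contrastive_loss(q, chunk_emb, chunk_doc_ids, gold_doc,
                                  gold_chunk, alpha=0.5, tau=0.05):
        """q: (d,) query embedding; chunk_emb: (n, d) candidate chunk embeddings."""
        sims = (chunk_emb @ q) / tau                     # (n,) scaled similarities
        # Chunk-level term: only the exact gold chunk is the positive.
        chunk_loss = F.cross_entropy(sims.unsqueeze(0), torch.tensor([gold_chunk]))
        # Doc-level term: ANY chunk of the gold document counts as a positive.
        pos = torch.tensor([d == gold_doc for d in chunk_doc_ids])
        doc_loss = torch.logsumexp(sims, 0) - torch.logsumexp(sims[pos], 0)
        return alpha * chunk_loss + (1 - alpha) * doc_loss

    q = F.normalize(torch.randn(8), dim=0)
    chunks = F.normalize(torch.randn(5, 8), dim=1)
    print(combined_contrastive_loss(q, chunks, ["A", "A", "B", "B", "C"], "A", 0))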
tomaarsen @tomaarsen
@VoyageAI And on the training side, do you train contrastively between queries and selected chunks per relevant document, or between queries and all chunks per relevant document? Or perhaps something in between (e.g., masking away non-relevant chunks from relevant documents) to avoid marking them as negatives?
1 reply · 0 reposts · 0 likes · 167 views
Voyage AI by MongoDB @VoyageAI
📢 voyage-context-3: contextualized chunk embeddings
- Automatically captures chunk-level detail & global document context, without metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs
4 replies · 24 reposts · 89 likes · 21.1K views
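The 192x storage figure checks out as straightforward arithmetic on bits per vector:

    float_bits = 3072 * 32   # float32, 3072 dims: 98,304 bits per vector
    binary_bits = 512 * 1    # binary, 512 dims: 512 bits per vector
    print(float_bits // binary_bits)  # 192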
Hong Liu @HongLiu9903
@tomaarsen @VoyageAI The model backbone processes the whole doc with attention across all chunks. Pooling is done in each chunk
0 replies · 0 reposts · 1 like · 50 views
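That two-step design, full-document attention followed by per-chunk pooling, can be sketched with toy tensors; the real model's backbone and pooling details are not public, so this only illustrates the shape of the computation.

    import torch

    # Token embeddings after a transformer that attended across the WHOLE
    # document, so every token has seen every chunk. Shape: (doc_tokens, d).
    hidden = torch.randn(12, 4)
    chunk_spans = [(0, 5), (5, 9), (9, 12)]  # token ranges of the 3 chunks

    # One embedding per chunk: mean-pool only that chunk's own tokens.
    chunk_embs = torch.stack([hidden[s:e].mean(0) for s, e in chunk_spans])
    print(chunk_embs.shape)  # torch.Size([3, 4]): one contextualized vector/chunk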
tomaarsen @tomaarsen
@VoyageAI Would love to hear more details - is it prechunking and then mean pooling per chunk after processing as a full document?
2 replies · 0 reposts · 0 likes · 241 views
Hong Liu reposted
Dev Ittycheria @dittycheria
We just launched voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, offering better retrieval performance than leading alternatives. When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it's critical to break those documents into smaller pieces, or "chunks," while still maintaining an understanding of the big picture. Most systems today lose important context or require complicated workarounds to stitch it back together. blog.voyageai.com/2025/07/23/voy…
2 replies · 13 reposts · 25 likes · 2.7K views