George Halal

33 posts

@halal_george

AI @Google | PhD Physics @Stanford

Joined December 2020
368 Following · 153 Followers

Pinned Tweet
George Halal (@halal_george):
Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them!
Contextual AI Reranker v2
🚀 Best performing, most efficient reranker
🤗 Open weights (1B, 2B, 6B)
🫡 Instruction-following (including recency-awareness)
🌐 Multilingual
1/4

George Halal (@halal_george):
Stop building heuristic-based graphs. Start building adaptable, agentic tools. We break down the full system design in this blog post: contextual.ai/blog/an-agenti… 🧵 4/4

George Halal (@halal_george):
Our solution: shift the traversal logic to the agent. Extract “metadata”/“aliases” for each chunk at indexing time. During retrieval, the agent dynamically chooses: Search raw text content OR search metadata to hop to a reference? 🧵 3/4
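A minimal sketch of the two retrieval paths described above, not the actual Contextual AI tool. The `Chunk` class, the `search_text` and `search_metadata` functions, and the token-overlap scoring are all hypothetical stand-ins; a real system would presumably use embedding search, and the agent (not the caller) would pick which tool to invoke.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    # Hypothetical "metadata"/"aliases" extracted at indexing time.
    aliases: List[str] = field(default_factory=list)


def _overlap(query: str, candidate: str) -> int:
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(candidate.lower().split()))


def search_text(index: List[Chunk], query: str) -> Optional[Chunk]:
    """Tool 1: search over raw chunk content."""
    best = max(index, key=lambda c: _overlap(query, c.text))
    return best if _overlap(query, best.text) > 0 else None


def search_metadata(index: List[Chunk], query: str) -> Optional[Chunk]:
    """Tool 2: search over extracted aliases to hop to a referenced chunk."""
    def score(c: Chunk) -> int:
        return max((_overlap(query, a) for a in c.aliases), default=0)

    best = max(index, key=score)
    return best if score(best) > 0 else None


INDEX = [
    Chunk("c1", "Payment terms are defined in Appendix B", ["Section 4", "Payment"]),
    Chunk("c2", "Invoices are due within 30 days", ["Appendix B", "Invoice schedule"]),
]

hit = search_text(INDEX, "payment terms")    # content search finds the citing chunk
ref = search_metadata(INDEX, "Appendix B")   # metadata search hops to the cited chunk
print(hit.chunk_id, ref.chunk_id)
```

Note that the content search alone would not surface `c2` for the query "Appendix B", since that chunk never names itself in its own text; the alias is what makes the hop possible.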

George Halal (@halal_george):
An agentic alternative to GraphRAG. We built a Metadata Search Tool to solve reference traversal without the rigid complexity of static graphs. The result? Agents resolve complex queries in fewer steps with higher accuracy. 🧵 1/4

George Halal (@halal_george):
This flexibility is the superpower. You control what to extract: section hierarchies, list of claims, questions the doc answers—whatever fits your use case. GraphRAG locks you into a static workflow. Metadata search adapts to yours. Thanks Jackie Zhang and @sheshanshag for their help on this project! This tool will be available on our platform soon, but contact @ContextualAI for early access.

George Halal (@halal_george):
Like GraphRAG, we extract structured info from docs at ingestion—each entry becoming a searchable node in the embedding space. Unlike GraphRAG, we skip the heuristic-based graph building and navigation methods, which are often specialized to various domains and show diminishing returns in our ablations. This keeps things fast and adaptable. Adding new docs or changing your metadata schema is trivial.

George Halal (@halal_george):
Your metadata IS your graph. Giving our agents access to a metadata search tool boosted our evals by 11%, providing the flexibility of GraphRAG while avoiding all the complexity. It unlocks new capabilities, including reference traversal.
Example:
1. Agent finds a doc with references.
2. Agent decides which references to traverse and searches over metadata to fetch them.
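The two-step example can be sketched roughly as follows. Everything here is an illustrative assumption: the `DOCS` store, the `Exhibit X` reference pattern, and the `should_follow` callback, which stands in for the agent's decision about which references are worth traversing. Exact alias matching replaces the embedding search a real tool would likely use.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical document store: each doc has text plus aliases extracted at ingestion.
DOCS: Dict[str, Dict] = {
    "policy": {
        "text": "Refunds follow Exhibit A; shipping follows Exhibit C.",
        "aliases": ["Refund policy"],
    },
    "exhibit_a": {"text": "Refunds are issued within 14 days.", "aliases": ["Exhibit A"]},
    "exhibit_c": {"text": "Shipping takes 5 business days.", "aliases": ["Exhibit C"]},
}


def metadata_search(alias: str) -> Optional[str]:
    """Return the id of the doc whose aliases contain `alias` (case-insensitive)."""
    wanted = alias.lower()
    for doc_id, doc in DOCS.items():
        if any(a.lower() == wanted for a in doc["aliases"]):
            return doc_id
    return None


def find_references(text: str) -> List[str]:
    """Step 1: spot references inside a retrieved doc (toy regex)."""
    return re.findall(r"Exhibit [A-Z]", text)


def traverse(start_id: str, should_follow: Callable[[str], bool]) -> List[str]:
    """Step 2: follow only the references the agent chooses, via metadata search."""
    visited = [start_id]
    for ref in find_references(DOCS[start_id]["text"]):
        if not should_follow(ref):
            continue
        target = metadata_search(ref)
        if target is not None and target not in visited:
            visited.append(target)
    return visited


# The "agent" here only cares about refunds, so it skips Exhibit C.
path = traverse("policy", should_follow=lambda ref: ref == "Exhibit A")
print(path)
```

No graph is built ahead of time in this sketch; the only persistent structure is the alias list attached to each document, which is the point the tweet makes.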

George Halal (@halal_george):
Woohoo! We made sure not to overfit to benchmarks and focused on its generalization capabilities, so glad to hear that worked :)
Quoting search founder (@n0riskn0r3ward):
.@ContextualAI 's new re-ranker ($0.05 per M tokens) is a bit better than voyage re-rank 2.5 (also $0.05 per M tokens) which is a pretty high bar IMO. ~2% better recall @ 10 in my eval. I'm also not exactly doing standard QA RAG either, so likely a bit out of domain for both.

George Halal (@halal_george):
@ethan_kim00 and that's why all other rerankers perform poorly on the recency benchmark. Ours was specifically trained to rank retrieved documents as: more_relevant_more_recent_doc > more_relevant_less_recent_doc > less_relevant_more_recent_doc > less_relevant_less_recent_doc
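The target ordering in this reply amounts to a lexicographic sort: relevance first, recency as the tie-breaker. The sketch below only illustrates that ordering with made-up scores and dates; it says nothing about how the reranker itself is trained or scores documents, and the `Doc` class and `rank` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Doc:
    name: str
    relevance: float   # made-up relevance score, higher is better
    published: date    # more recent is better, but only among equally relevant docs


def rank(docs: List[Doc]) -> List[Doc]:
    """Sort by relevance first, then by recency, both descending."""
    return sorted(docs, key=lambda d: (d.relevance, d.published), reverse=True)


docs = [
    Doc("less_relevant_more_recent_doc", 0.4, date(2025, 1, 1)),
    Doc("more_relevant_less_recent_doc", 0.9, date(2020, 1, 1)),
    Doc("less_relevant_less_recent_doc", 0.4, date(2020, 1, 1)),
    Doc("more_relevant_more_recent_doc", 0.9, date(2025, 1, 1)),
]

ordered = [d.name for d in rank(docs)]
print(" > ".join(ordered))
```

Exact ties in relevance are a simplification here; a learned reranker emits a single score per document, so the ordering has to be baked into that score rather than applied as a separate sort key.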

Ethan Kim (@ethan_kim00):
@halal_george Seems like recency ranking would be poorly calibrated and prone to drift for a pointwise reranker.

George Halal retweeted
Michael (@michael_chomsky):
Instruction following rerankers are so underrated. You can set arbitrary instructions like ‘sort by candidates that are a good fit for this role’ or ‘article mentions an early stage company’. This is the kind of thing I was hypothesizing years ago, and it’s cool to see the space catch up to theory. The next step will be small models that do binary classification based on a set of arbitrary criteria.
Quoting George Halal (@halal_george):
Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them! Contextual AI Reranker v2 🚀 Best performing, most efficient reranker 🤗 Open weights (1B, 2B, 6B) 🫡 Instruction-following (including recency-awareness) 🌐 Multilingual 1/4

George Halal (@halal_george):
@lgandecki “compared to the 2nd-best rerankers which are up to ~10x more expensive!” I’m now realizing that the line break makes it seem like it’s not part of the sentence above it.

George Halal retweeted
Douwe Kiela (@douwekiela):
We just released the latest version of our reranker: best performing, most efficient, open weights, instruction following, and multilingual. Try it out in your agentic RAG pipelines!
Quoting George Halal (@halal_george):
Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them! Contextual AI Reranker v2 🚀 Best performing, most efficient reranker 🤗 Open weights (1B, 2B, 6B) 🫡 Instruction-following (including recency-awareness) 🌐 Multilingual 1/4

George Halal retweeted
Sheshansh Agrawal (@sheshanshag):
Performance on standard retrieval benchmarks like BEIR/MMTEB hasn't correlated with performance on real world retrieval evaluation datasets for a while now. The causes are twofold:
- Relevance is ill-defined and subjective.
- Popular retrieval benchmarks are gameable.
Here I describe how we tackled these challenges while building our second generation of rerankers. 1/N
Quoting George Halal (@halal_george):
Excited to share that we trained rerankers at the cost/performance frontier and are open sourcing them! Contextual AI Reranker v2 🚀 Best performing, most efficient reranker 🤗 Open weights (1B, 2B, 6B) 🫡 Instruction-following (including recency-awareness) 🌐 Multilingual 1/4

George Halal retweeted
Contextual AI (@ContextualAI):
🏆 It's official - Contextual AI is now at the top of the FACTS leaderboard for groundedness, beating out strong competition from Gemini 2.5 Pro and GPT-5! Congrats to our research team @w33lliam @rajan__vivek @nandita__naik @Thienhn97 @sheshanshag @shikibmehri on this awesome achievement!
Quoting William Berrios (@w33lliam):
Tired of seeing O3 hallucinate? 😵‍💫 Today, I am excited to share how we built the least hallucinatory LLM in the 🌍 Our GLMv2, developed at @ContextualAI, just claimed 1st place 🥇 on the FACTS Grounded leaderboard by Google DeepMind — outperforming Gemini-2.5-pro, Claude 4, and O3 by 18%. 🤯 More details about our SFT and post-training recipe below 👇 1/N

George Halal retweeted
William Berrios (@w33lliam):
📢 As promised ✨, we're open-sourcing LMUnit! Our SoTA generative model for fine-grained criteria evaluation of your LLM responses 🎯
✅ SoTA on Flask & BigGbench
✅ SoTA generative reward model on RewardBench2
🤗 Models available on @huggingface: tiny.cc/qjzp001
💻 Github repo: github.com/ContextualAI/L…
📄 Paper: arxiv.org/abs/2412.13091
✍️ Blog: contextual.ai/lmunit/
See more details in the quoted tweet👇
Quoting William Berrios (@w33lliam):
Excited to share 🤯 that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2 🥇 How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 🧵 1/11
