Ahmed
@halflings

12.5K posts

Designed in 🇲🇦, based in 🇨🇭
Zürich, Switzerland · Joined June 2008
1.8K Following · 3K Followers
Ahmed@halflings·
@antoine_chaffin Could you clarify this part (filtering), Antoine? What is the "drop" here? Were these sampled as positive pairs, but the cross-encoder assigned a very low score to them so they are filtered out? What does "expose filters as metadata" mean? How are they used in training?
English
2
0
0
18
Antoine Chaffin@antoine_chaffin·
Then we carefully curated it all. We developed per-source structural filters, applied deduplication, and computed cross-encoder relevance on every pair. Crucially, every curation step is non-destructive: we keep all the data and expose filters as metadata, applied at training time.
Antoine Chaffin tweet media
English
3
0
18
1K
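The "expose filters as metadata" idea above can be sketched in a few lines: instead of deleting pairs, each curation step records a flag or score next to the pair, and a given training run picks which filter configuration to apply. A minimal sketch with made-up field names and thresholds, not LightOn's actual pipeline:

```python
# Each candidate pair keeps its curation results as metadata; nothing is
# ever dropped from the dataset itself.
pairs = [
    {"query": "q1", "doc": "d1",
     "filters": {"dedup": True, "structural_ok": True}, "ce_score": 0.92},
    {"query": "q2", "doc": "d2",
     "filters": {"dedup": True, "structural_ok": False}, "ce_score": 0.81},
    {"query": "q3", "doc": "d3",
     "filters": {"dedup": True, "structural_ok": True}, "ce_score": 0.10},
]

def training_subset(pairs, required_filters, min_ce_score):
    """Select pairs at training time; filtering is a view, not a deletion."""
    return [
        p for p in pairs
        if all(p["filters"].get(f, False) for f in required_filters)
        and p["ce_score"] >= min_ce_score
    ]

# One possible configuration: require both filters and a cross-encoder
# score of at least 0.5. Only the first pair survives this setup.
kept = training_subset(pairs, ["dedup", "structural_ok"], min_ce_score=0.5)
```

Because the raw data is untouched, a different run can relax `min_ce_score` or drop a filter and instantly get a different training set.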
Antoine Chaffin@antoine_chaffin·
The new generation of open state-of-the-art single and multi-vector retrieval models is here. It's time, DenseOn with the LateOn 🎶 @LightOnIO releases models that leap past existing ones, and everything you need to do the same!
Antoine Chaffin tweet media
English
13
53
223
39K
Ahmed@halflings·
@Kimi_Moonshot Amazing work from the Kimi team as usual 👏👏
English
0
0
1
164
Ahmed reposted
Kimi.ai@Kimi_Moonshot·
We push Prefill/Decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token. This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical. Validated on a 20x scaled-up Kimi Linear model:
✅ 1.54× throughput
✅ 64% ↓ P90 TTFT
→ Directly translating into lower token cost.
More in Prefill-as-a-Service: arxiv.org/html/2604.1503…
Kimi.ai tweet media
English
72
345
2.9K
682.7K
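The KV-cache-transfer bottleneck mentioned above can be made concrete with a back-of-the-envelope sketch. All layer counts, head shapes, and bandwidth figures below are invented for illustration and are not Kimi Linear's actual configuration:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # One K and one V tensor per layer, fp16: 2 * heads * dim * len elements.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

def transfer_seconds(nbytes, link_gbps):
    # Time to ship a prefill-produced KV cache across a cross-DC link.
    return nbytes * 8 / (link_gbps * 1e9)

# Full attention in every layer vs. a hybrid that keeps full KV in only
# a quarter of the layers (the linear-attention layers carry O(1) state).
full = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128, seq_len=128_000)
hybrid = kv_cache_bytes(layers=15, kv_heads=8, head_dim=128, seq_len=128_000)
```

Shrinking the cache 4x shrinks the prefill-to-decode transfer time 4x, which is the sense in which a smaller KV cache makes cross-datacenter PD disaggregation plausible.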
Ahmed reposted
Logan Kilpatrick@OfficialLoganK·
Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!
English
248
418
5.9K
791.9K
Ahmed@halflings·
@juminoz @anorth_chen It doesn't invalidate the prefix cache because the trajectory fed at the next step is not recomputed every time (e.g. not reopening the 65 files it previously did); it's just passing whatever it had last time
English
0
0
0
67
North@CreaoAI@anorth_chen·
Peter is our CTO. A month ago he started implementing our newly built AI-first way of working, and the results are obvious: we now merge at least 20 PRs to production every day. The product changelog reflects our team's velocity very directly: docs.creao.ai/community-and-… I didn't expect he would write up and share such practical, hands-on experience. I recommend every founder on X read this article carefully: if your team is still running in an AI-assisted rather than AI-first way, you may well fade out of this market within the next year.
Peter Pang@intuitiveml

x.com/i/article/2043…

Chinese
38
146
2K
881K
Ahmed@halflings·
@juminoz @anorth_chen How does one agent invalidate the other agent’s prefix cache exactly?
English
1
0
0
236
Jack Vinijtrongjit | Saakuru Labs
No, that's not the assumption. The harness has to be so good that the contexts will never cross, since prompt caching for Claude and many LLMs is based on prefixes. If you run 2 agents that cross paths, one will invalidate the other and the cost immediately spikes by 9x. So creating a different branch is one way to get around this, but if you run subagents within the same branch, you are shooting yourself in the foot.
English
2
0
3
465
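The invalidation mechanics debated in this thread can be illustrated with a toy prefix cache: reuse applies only to the longest exactly-matching prompt prefix, so two agents whose turns interleave in one context stop sharing a reusable prefix. A hypothetical sketch, not how any provider's cache is actually implemented:

```python
class PrefixCache:
    def __init__(self):
        self.cached = []  # token sequences whose KV state was computed before

    def hit_length(self, tokens):
        """Longest cached prefix that exactly matches the start of `tokens`."""
        best = 0
        for entry in self.cached:
            n = 0
            for a, b in zip(entry, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

    def store(self, tokens):
        self.cached.append(tuple(tokens))

cache = PrefixCache()
agent_a = ["sys", "file1", "file2", "step3"]
cache.store(agent_a)

# Agent A appends to its own trajectory: long prefix hit, cheap.
assert cache.hit_length(agent_a + ["step4"]) == 4

# Agent B's turn is spliced in early, so the shared prefix collapses and
# nearly everything is recomputed (the cost spike described above).
assert cache.hit_length(["sys", "fileB", "file1", "file2"]) == 1
```

This is also why Ahmed's point holds: replaying the same trajectory verbatim each step keeps the prefix intact, while interleaving a second agent's output breaks it.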
Ahmed@halflings·
@jescalan Give Gemini a go, even the 3.1 Flash Lite endpoint is really good with tool use! I'm using it on a couple personal projects now.
English
0
0
0
70
Jeff Escalante@jescalan·
Transition to gpt in openclaw not going so great thus far 😂
Jeff Escalante tweet media
English
438
280
10.7K
572.1K
Ahmed@halflings·
@fraenkelj @GarneloMarta @wojczarnecki This was a really really great episode! It would be great to have this on YouTube so it has more reach, and it's easier to watch while on the go
English
0
0
0
123
Ahmed reposted
Anish Moonka@anishmoonka·
Every time you get a cancer biopsy, the lab makes a tissue slide that costs about $5. It shows the shape of your cells under a microscope, and every cancer patient already has one on file.

There's a much fancier version of that test called multiplex immunofluorescence (basically a protein-level map showing which immune cells are near your tumor and what they're doing). It costs thousands of dollars per sample, takes specialized equipment most hospitals don't have, and barely scales. But it's the kind of data oncologists need to figure out whether immunotherapy will actually work for you.

Right now, only about 20 to 40% of cancer patients respond to immunotherapy, and one of the biggest reasons is that doctors can't easily tell whether a tumor is "hot" (immune cells actively fighting it) or "cold" (immune system ignoring it).

Microsoft, Providence Health, and the University of Washington trained an AI to analyze the $5 slide and predict what the expensive test would show across 21 different protein markers. They called it GigaTIME, trained it on 40 million cells in which both the cheap slide and the expensive test coexisted, and then turned it loose on 14,256 real cancer patients across 51 hospitals in 7 US states.

The results landed in Cell, one of the most selective journals in biology. The model generated about 300,000 virtual protein maps covering 24 cancer types and 306 subtypes. It found 1,234 real, verified connections between immune cell behavior, genetic mutations, tumor staging, and patient survival that were previously invisible at this scale.

When they tested it against a completely separate database of 10,200 cancer patients, the results matched up almost perfectly (0.88 out of 1.0 agreement).

Nature Methods named spatial proteomics (mapping where specific proteins sit inside your tissue) its Method of the Year in 2024, and specifically cited GigaTIME in a March 2026 update as a model that "democratizes" this kind of analysis.
The full model is open-source on Hugging Face. Any cancer research lab with archived biopsy slides, and most of them have thousands, can now run virtual immune profiling without buying a single piece of new equipment.
Satya Nadella@satyanadella

We’ve trained a multimodal AI model to turn routine pathology slides into spatial proteomics, with the potential to reduce time and cost while expanding access to cancer care.

English
103
1.8K
11.1K
944.6K
Ahmed reposted
Andy Masley@AndyMasley·
Each frontier AI model seems to use a little under a year's worth of a square mile of farmland's water to train. I think about this as the country having 4 square miles of farmland sectioned off to grow some of the most popular consumer products in history.
Andy Masley tweet media
English
213
472
8.1K
600.6K
Ahmed reposted
Mixedbread@mixedbreadai·
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
Mixedbread tweet media
English
35
121
951
197.4K
Ahmed@halflings·
@MossyPathways @NoamShazeer One possible reason -> prefill-heavy tasks like summarizing a huge amount of text; the small model is going to be much faster at that, even w/ thinking on
English
0
0
1
19
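The argument above fits a toy latency model: total latency ≈ prompt_tokens / prefill_rate + output_tokens / decode_rate, so when the prompt dominates, a small model's faster prefill can outweigh the extra tokens spent thinking. All rates below are invented illustrative numbers, not Gemini benchmarks:

```python
def latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    # Prefill processes the prompt in parallel (fast); decode emits
    # output tokens one at a time (slow).
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Summarization: 200k-token input, 500-token summary. The small model
# also burns 2,000 hypothetical "thinking" tokens before answering.
small_with_thinking = latency_s(200_000, 500 + 2_000,
                                prefill_tps=50_000, decode_tps=200)
large_no_thinking = latency_s(200_000, 500,
                              prefill_tps=10_000, decode_tps=100)
```

Under these made-up rates the small model finishes in 16.5 s versus 25 s, despite thinking: the 200k-token prefill dominates, and that is exactly where the small model is faster.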
Serene Trails@MossyPathways·
@NoamShazeer Serious question - why add thinking to small models? Wouldn't flash with minimal thinking be better than flash-lite high, with similar latency profile? Seems redundant.
English
1
0
1
288
Noam Shazeer@NoamShazeer·
📢Introducing Gemini 3.1 Flash-Lite, our fastest and most efficient model, built for high-volume workloads. It outperforms 2.5 Flash in reasoning, reliability, and scalability at a lower cost. This model also introduces thinking levels. You can adjust compute by complexity of the task, burning zero thinking overhead on high-volume tasks, while reasoning through the complex edge cases. Maximum intelligence, minimal latency. Read more: blog.google/innovation-and…
English
11
17
219
13.6K
Ahmed reposted
Qwen@Alibaba_Qwen·
🚀 Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B
✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models
And yes — we're also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation.
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Qwen tweet media
English
914
2.9K
21.4K
9M
Ahmed reposted
Unitree@UnitreeRobotics·
Unitree Spring Festival Gala Robots — a Full Release of Additional Details 🥳 Dozens of G1 robots achieved the world's first fully autonomous humanoid robot cluster Kung Fu performance (with quick movement), pushing motion limits and setting multiple world firsts!
H2 made striking appearances at both the Beijing main venue and the Yiwu sub-venue, clad in the Monkey King's heavy armor and riding a "somersault cloud" played by B2W quadruped robot dogs, delivering New Year blessings from the clouds.
English
1.2K
4.4K
27.1K
27.9M
Ahmed reposted
Jerry Liu@jerryjliu0·
Happy Chinese / lunar new year! 🧧 Growing up in the US, I used to watch the CNY gala 春节晚会 with my parents on tape delay broadcast from CCTV1 Now having spent most of my working career in AI, it's come full circle and this is one of the most insane things I've seen
English
30
28
358
40.5K
Ahmed reposted
shaurya@shauseth·
> ibn battuta
> 21 yo moroccan in 1325
> decide to do hajj to mecca
> tells parents brb in 16 months
> end up traveling for 29 yrs
> visit every sultan or king in the east
> they all give him wives and money for some reason
> reach india
> sultan of delhi gives him a job as a judge
> doesn't know indian law but you can just do things
> get kidnapped and robbed several times
> end up in maldives where he'll marry four times
> become a judge there too
> finally make it to china
> (forrest gump voice) i think i'll go home now
> get back to morocco without dying
> sultan asks to dictate whole story to a writer
> tfw longest distance travelled of any explorer before steam age
shaurya tweet media
English
145
2K
23.1K
708.8K
Ahmed@halflings·
@kidehen @aakashgupta @grok I honestly started by assuming you're just an AI bot given the last answer, but damn :) you did actually create a cheesecake ontology. I think structured representations are definitely super convenient and part of the solution.
English
1
0
0
32
Kingsley Uyi Idehen@kidehen·
And here's interacting with an AI Agent equipped with SPARQL, SQL, and GraphQL querying capabilities. linkeddata.uriburner.com/chat/?chat_id=… -- static view scoped specifically to the New York Cheese Cake example from this session. linkeddata.uriburner.com/chat/?chat_id=… -- animated walkthrough of the entire session. As I said, in the age of AI, ontology creation, deployment, and use is beyond trivial 😀
Kingsley Uyi Idehen tweet media
English
1
0
1
60
Aakash Gupta@aakashgupta·
Vector databases might be the wrong abstraction for document retrieval. A new open-source approach called PageIndex just hit 98.7% accuracy on a financial benchmark, beating traditional RAG by 30+ points. No embeddings. No chunking. No vector DB.

The insight: when a 10-K says "see Note 15 for debt details," vector search has no idea what that means. It retrieves whatever text looks similar to your query, not whatever text actually answers it. Similarity and relevance are different things.

PageIndex builds a hierarchical tree from document structure, then uses LLM reasoning to traverse it. The model asks "where would an expert look?" instead of "what text looks similar?"

The math is stark. Traditional RAG systems hover around 60-70% on FinanceBench. That 30-point gap represents every time vector search found semantically similar text but missed the actual answer buried in an appendix or cross-referenced table.

What makes this interesting: the infrastructure is simpler, not more complex. No vector DB to maintain. No embedding pipeline. No chunking decisions. Just a tree and reasoning.

Vector search was the best we had when LLMs couldn't reason well enough to navigate document structure. Now they can. The techniques we built around their limitations are becoming the bottleneck.

For simple use cases, vector RAG still wins on speed and simplicity. But for professional documents requiring multi-step reasoning, treating structure as signal instead of noise changes everything.
Avi Chawla@_avichawla

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree.

For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to see in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case, since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.
For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis. Everything is fully open-source, so you can see the full implementation in GitHub and try it yourself. I have shared the GitHub repo in the replies!

English
67
228
2.9K
582.1K
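The tree-traversal idea described above can be sketched roughly as follows. In the real PageIndex an LLM decides which branch an expert would open; here that routing decision is stood in for by a crude keyword-overlap score, and the document tree, section titles, and summaries are all invented for illustration:

```python
# A toy table-of-contents tree: each node has a title, a short summary
# (what an expert would skim to decide where to look), children, and text.
doc_tree = {
    "title": "10-K Filing", "summary": "", "text": "",
    "children": [
        {"title": "Item 7: MD&A", "summary": "revenue, margins, outlook",
         "children": [], "text": "Revenue grew 4%."},
        {"title": "Item 8: Financial Statements",
         "summary": "balance sheet, notes, debt",
         "children": [
             {"title": "Note 15", "summary": "long-term debt detail",
              "children": [],
              "text": "Total debt decreased from $2.1B to $1.8B in 2023."},
         ], "text": ""},
    ],
}

def route(node, query):
    """Descend the tree, at each level picking the child an 'expert'
    (here: keyword overlap on title + summary) would open."""
    words = query.lower().split()

    def score(child):
        haystack = (child["title"] + " " + child["summary"]).lower()
        return sum(w in haystack for w in words)

    while node["children"]:
        best = max(node["children"], key=score)
        if score(best) == 0:
            break  # no branch looks relevant; stop at the current node
        node = best
    return node["text"]
```

No embeddings, no chunking: retrieval is a traceable walk from the root to "Note 15", mirroring how a human follows the document's own structure.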
Kingsley Uyi Idehen@kidehen·
Ontologies, as I’m using the term here, have nothing to do with committees—that association comes from formal ontologies and their governance processes. I’m referring instead to the use of existing shared ontologies, or even home-grown ontologies created ad hoc—on a whim, if necessary.

So, what is an ontology, really? It’s a description of entity and relationship types that provides a framework for coherent communication about the subject matter within a given discourse realm. Ontologies are not tightly coupled to any particular expression notation or serialization format. They simply aid conceptualization and the transmission of meaning through language.

And what is language? The systematic use of signs, syntax, and semantics to encode and decode information.

In conclusion: the world needs ontologies because communication has always been driven by ontologies—formal and informal alike. The age of LLMs, in which natural language has become a core part of computing’s UI/UX, only reinforces all of this—IMHO 😀
English
1
1
2
137
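The definition above (entity and relationship types that frame coherent communication about a subject) can be made concrete in a deliberately tiny, ad-hoc form. All names here are invented, echoing the cheesecake example earlier in the thread; a real shared ontology would typically be expressed in RDF/OWL rather than Python:

```python
# A home-grown, ad-hoc ontology: just entity classes plus typed relations.
ontology = {
    "classes": ["Dessert", "Ingredient"],
    "relations": [
        # (domain, predicate, range)
        ("Dessert", "hasIngredient", "Ingredient"),
    ],
}

# Instance data and its type assignments.
types = {"CheeseCake": "Dessert", "CreamCheese": "Ingredient"}
facts = [("CheeseCake", "hasIngredient", "CreamCheese")]

def conforms(fact, ontology, types):
    """A (subject, predicate, object) triple is coherent if some
    relation type in the ontology licenses it."""
    s, p, o = fact
    return any(
        p == pred and types.get(s) == dom and types.get(o) == rng
        for dom, pred, rng in ontology["relations"]
    )
```

Even at this scale the ontology does its job: it fixes what the terms mean, so two parties (or an agent and a human) can agree on which statements are well-formed.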