Sofian Chaybouti

17 posts

@ChaySofian

PhD candidate at the University of Tübingen | RS intern at TII | Prev. at Huawei Noah's Ark Lab, Paris | MVA and ENSTA Paris

Abu Dhabi, UAE · Joined March 2023
583 Following · 72 Followers
Pinned Tweet
Sofian Chaybouti retweeted
Google Gemma@googlegemma·
Check out this amazing combo using Gemma4 + Falcon Perception for video tracking!
1️⃣ Give Gemma 4 video frames
2️⃣ It describes what it sees
3️⃣ Falcon Perception takes those descriptions, segments the objects, and tracks them across the video!
The best part? All running locally!
24 replies · 143 reposts · 1.2K likes · 82.6K views
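The three-step combo above, as a minimal Python sketch. Neither the Gemma 4 nor the Falcon Perception API is shown in the thread, so the `describe` and `segment_and_track` callables below are hypothetical placeholders; only the control flow (sample frames, describe, then segment and track) comes from the tweet.

```python
# Minimal sketch of the describe-then-track loop, assuming two hypothetical
# callables: `describe` (a VLM such as Gemma 4) and `segment_and_track`
# (an open-vocabulary segmenter/tracker such as Falcon Perception).
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TrackedObject:
    track_id: int
    description: str          # free-form text produced by the VLM
    masks: Sequence[object]   # one segmentation mask per frame


def describe_then_track(
    frames: Sequence[object],
    describe: Callable[[Sequence[object]], list[str]],
    segment_and_track: Callable[[Sequence[object], list[str]], list[TrackedObject]],
    sample_every: int = 30,
) -> list[TrackedObject]:
    """1) sample frames, 2) ask the VLM what it sees, 3) segment & track."""
    sampled = list(frames[::sample_every])          # step 1: give the VLM a few frames
    descriptions = describe(sampled)                # step 2: "what do you see?"
    return segment_and_track(frames, descriptions)  # step 3: segment + track across the video
```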
Sofian Chaybouti retweeted
Yasser Dahou@dahou_yasser·
okay, another demo of Gemma4 + Falcon Perception for automated video segmentation & tracking, no human prompts needed

the idea: you feed Gemma4 a few sampled frames and ask it to describe what it sees. those descriptions get passed to Falcon Perception, which segments and tracks them across the full video (using ByteTrack github.com/FoundationVisi…)

you can steer what Gemma4 focuses on with different prompt levels:
describe by visible text or brand -> dog with number 2 bib
describe by spatial position -> horse on the right, horse in center, horse on the left
describe by relationships -> rhinoceros walking with zebra

same pipeline, different instructions -> different segmentation results. zero human labeling from raw video to tracked output. all local on M3 using mlx-vlm @Prince_Canuma @MaziyarPanahi

Work done by @NarayanSanath

Check our Falcon Perception repo: github.com/tiiuae/Falcon-…
9 replies · 32 reposts · 347 likes · 51.3K views
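The "prompt levels" in the demo above come down to changing the instruction given to the describing VLM while keeping the rest of the pipeline fixed. A small sketch, assuming hypothetical prompt wordings; only the three level names and the example outputs are from the tweet.

```python
# Sketch of prompt-level steering: same pipeline, different instruction ->
# different descriptions -> different segmentation targets.
# The exact wording of each prompt is an assumption.
PROMPT_LEVELS = {
    "text_or_brand": (
        "Describe each object by any visible text or branding, "
        "e.g. 'dog with number 2 bib'."
    ),
    "spatial_position": (
        "Describe each object by its position in the frame, "
        "e.g. 'horse on the right', 'horse in center', 'horse on the left'."
    ),
    "relationships": (
        "Describe each object by its relationship to other objects, "
        "e.g. 'rhinoceros walking with zebra'."
    ),
}


def build_prompt(level: str) -> str:
    """Build the instruction handed to the describing VLM for one prompt level."""
    return f"Look at these video frames. {PROMPT_LEVELS[level]} One description per object."
```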
Sofian Chaybouti retweeted
Yasser Dahou@dahou_yasser·
tested Meta's Muse Spark @AIatMeta on level-1 tasks from our visres-bench (#CVPR2026)

what it does is impressive, it doesn't just pick an answer. it crops the region, zooms into each boundary, traces edge continuity, checks lighting gradients. actual visual chain-of-thought. and tbh the reasoning is spot on. it identifies exactly the right cues to look at, but it still gets some wrong.

and that's the interesting part, the failure isn't a reasoning failure. it's a perception issue imo. the model knows what to look for, it just can't resolve the fine-grained visual signal precisely enough to land on the right patch in all cases

like the gap isn't "can it think about images", it clearly can. the gap is low-level spatial precision. and that's kinda an easier problem to solve ... maybe

full traces here: YasserdahouML.github.io/visres-Bench
Yasser Dahou@dahou_yasser

Our Visual Reasoning Benchmark has been accepted to #CVPR2026

We wanted to know if VLMs can actually reason visually or if they're relying on text shortcuts. well -> take away the text context, and even the best models struggle hard

we built a benchmark with 19k real images across 3 levels of difficulty.
level 1: basic perception. can you complete the pattern or fix the occlusion?
level 2: single rules. think raven's matrices but with real objects (color, count, orientation).
level 3: multi-attribute. complex rules mixing everything together.

paper: arxiv.org/abs/2512.21194
hf page: visres-bench.github.io

with the amazing team @BrigiMala @andyhuynh1111 @NarayanSanath @lkhphuc @ChaySofian @griffintaur

0 replies · 3 reposts · 22 likes · 2.6K views
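The crop-and-zoom behaviour described above ("crops the region, zooms into each boundary") can be illustrated with a small helper. This is not Muse Spark's actual tool interface, just the kind of operation the trace suggests, assuming Pillow is available.

```python
# Illustrative crop-and-zoom helper, not Muse Spark's real tooling.
from PIL import Image


def zoom_into_region(image: Image.Image, box: tuple[int, int, int, int],
                     scale: int = 4) -> Image.Image:
    """Crop `box` (left, upper, right, lower) and upscale it `scale`x so
    fine-grained cues (edge continuity, lighting gradients) are easier to resolve."""
    region = image.crop(box)
    return region.resize((region.width * scale, region.height * scale),
                         Image.Resampling.LANCZOS)
```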
Sofian Chaybouti@ChaySofian·
Happy that SigLino is a #CVPR2026 Highlight.

It started as AMoE, focused purely on efficient MoE distillation (loss, data, multi-res management), and it is now a full series of Agglomerative ViTs (dense and MoE, from 30M to 0.6B params) distilled from SigLIP2 and DINOv3.

We used the AMoE variant to initialize the vision experts of an early-fusion grounding MoE with modality-specific experts and showed that it is a strong baseline in the small-scale training-data regime on the RefCOCO benchmarks. Later we found that full early fusion with a dense model works just as well, and even better, which led to Falcon Perception.

Models: huggingface.co/collections/ti…
Paper: arxiv.org/abs/2512.20157
Code: github.com/tiiuae/siglino…
x.com/dahou_yasser/s…

With @NarayanSanath @dahou_yasser @lkhphuc @griffintaur @HildeKuehne @hhacid
4 replies · 47 reposts · 278 likes · 12.7K views
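A hedged sketch of the multi-teacher distillation the tweet above refers to (one student matching SigLIP2 and DINOv3 features). The actual AMoE/SigLino loss may differ, since the tweet only mentions "loss, data, multi-res management"; this is a generic cosine-matching formulation, and the per-teacher projection heads are an assumption.

```python
# Generic multi-teacher feature-distillation loss, not the paper's exact loss.
import torch
import torch.nn.functional as F


def multi_teacher_distill_loss(
    student_feats: torch.Tensor,             # (B, N, D_s) patch tokens from the student ViT
    teacher_feats: dict[str, torch.Tensor],  # e.g. {"siglip2": (B, N, D1), "dinov3": (B, N, D2)}
    heads: dict[str, torch.nn.Module],       # one projection head per teacher (assumption)
) -> torch.Tensor:
    loss = student_feats.new_zeros(())
    for name, target in teacher_feats.items():
        pred = heads[name](student_feats)  # project student tokens into this teacher's space
        loss = loss + (1 - F.cosine_similarity(pred, target, dim=-1)).mean()
    return loss / len(teacher_feats)
```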
Sofian Chaybouti retweeted
Yasser Dahou@dahou_yasser·
haha fair questions, FP does open vocab + referring expressions, so the prompts for the agent are more flexible than SAM3's prompting. it can pass things like "the player on the right" or "the sign with whatever written on it" and FP handles it, fewer tool calls overall ... check the paper and PBench please arxiv.org/pdf/2603.27365, table 7, where SAM3 is restricted to levels 0 and 1 whereas FP can go up to level 4 (Relationships & interactions); check table 1 for the level definitions
Yasser Dahou tweet media
2 replies · 3 reposts · 7 likes · 341 views
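A toy illustration of the point above about referring expressions versus class-only prompting: one free-form call can name the exact instance, whereas a class-only detector needs extra post-processing (hence fewer tool calls with FP). The `segment` and `detect` callables are hypothetical, not APIs from the paper.

```python
# Toy comparison of referring-expression vs class-only prompting; both
# callables are hypothetical stand-ins, not real model interfaces.
from typing import Callable

Mask = object  # stand-in for a segmentation-mask type


def find_player_on_the_right(segment: Callable[[str], list[Mask]]) -> list[Mask]:
    # Referring-expression prompting: a single call resolves the instance.
    return segment("the player on the right")


def find_player_on_the_right_class_only(
    detect: Callable[[str], list[tuple[Mask, tuple[int, int, int, int]]]],
) -> list[Mask]:
    # Class-only prompting: detect every "person", then pick the right-most
    # box yourself, an extra post-processing step on top of the tool call.
    people = detect("person")
    if not people:
        return []
    rightmost = max(people, key=lambda item: item[1][2])  # largest right-edge x
    return [rightmost[0]]
```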
Sofian Chaybouti retweeted
Maziyar PANAHI@MaziyarPanahi·
I showed you SAM 3 all week. This is a 0.6B model that outperforms it. Falcon Perception. Type "detect the plane" and it segments every plane in the frame. Pixel-accurate masks from natural language. Fighter jets. Fire. Crowds. All on a MacBook via MLX. No cloud.
18 replies · 79 reposts · 892 likes · 62.7K views
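A generic helper for working with the kind of output described above (per-object masks for a text query like "detect the plane"). How Falcon Perception actually returns masks is not shown in the tweet, so boolean HxW NumPy arrays are an assumption.

```python
# Generic mask-overlay helper; the mask format (boolean HxW arrays) is assumed.
import numpy as np


def overlay_masks(frame: np.ndarray, masks: list[np.ndarray],
                  color: tuple[int, int, int] = (255, 0, 0),
                  alpha: float = 0.5) -> np.ndarray:
    """Blend each boolean HxW mask onto an HxWx3 uint8 frame."""
    out = frame.astype(np.float32).copy()
    tint = np.array(color, dtype=np.float32)
    for mask in masks:
        out[mask] = (1 - alpha) * out[mask] + alpha * tint
    return out.clip(0, 255).astype(np.uint8)
```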
Sofian Chaybouti retweeted
Prince Canuma@Prince_Canuma·
mlx-vlm v0.4.4 is out 🚀🔥

New models:
🦅 Falcon-Perception 300M by @TIIuae

Highlights:
⚡️ TurboQuant Metal kernels optimized — up to 1.90x decode speedup over baseline on longer context with 89% KV cache savings.
👀 VisionFeatureCache — multi-turn image caching so you don't re-encode the same image every turn.
🔧 Gemma 4 fixes — chunked prefill for KV-shared models & thinking, vision + text degradation, processor config, and nested tool parsing
📹 Video CLI fixes

Get started today:
> uv pip install -U mlx-vlm

Shoutout to the awesome @N8Programs for helping me spot and fix some critical yet subtle issues on Gemma 4 ❤️

Happy Easter everyone 🐣 and remember to leave us a star ⭐️ github.com/Blaizzy/mlx-vlm
Prince Canuma tweet media
16 replies · 41 reposts · 370 likes · 87.3K views
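For reference, the usual mlx-vlm Python entry points look roughly like this. `load`, `generate`, `apply_chat_template`, and `load_config` are the library's documented helpers, but argument names shift between releases, and the Falcon-Perception model id below is a placeholder, so treat this as a sketch rather than copy-paste.

```python
# Rough mlx-vlm usage sketch; the model id is a placeholder and signatures
# may differ by version (install with: uv pip install -U mlx-vlm).
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "tiiuae/Falcon-Perception-300M"  # placeholder id, not verified
model, processor = load(model_path)
config = load_config(model_path)

image = ["frame.jpg"]
prompt = "Describe what you see."

formatted = apply_chat_template(processor, config, prompt, num_images=len(image))
output = generate(model, processor, formatted, image, verbose=False)
print(output)
```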
Sofian Chaybouti retweeted
Prince Canuma@Prince_Canuma·
mlx-vlm v0.4.3 is here 🚀

Day-0 support:
🔥 Gemma 4 (vision, audio, MoE) by @GoogleDeepMind
🦅 Falcon-OCR + Falcon Perception by @TIIuae
🪨 Granite Vision 4.0 by @IBMResearch

New models:
🎯 SAM 3.1 with Object Multiplex by @facebook
🔍 RF-DETR detection & segmentation by @roboflow

Infra:
⚡ TurboQuant (KV cache compression)
🖥️ CUDA support for vision models (SAM and RF-DETR)

Get started today:
> uv pip install -U mlx-vlm

Leave us a star ⭐️ github.com/Blaizzy/mlx-vlm
Prince Canuma tweet media
77 replies · 192 reposts · 2K likes · 999.9K views
Sofian Chaybouti retweeted
Yasser Dahou@dahou_yasser·
People can find all model variants here huggingface.co/collections/ti… We’ve added dense variants alongside the Agglomerative MoE models.
Yasser Dahou tweet media
Yasser Dahou@dahou_yasser

Happy to share that our paper AMoE is accepted at #CVPR2026! we distill SigLIP2 and DINOv3 into a single MoE student.

📄 Paper: arxiv.org/pdf/2512.20157
🤗 Models: huggingface.co/tiiuae/amoe
💻 Code: github.com/tiiuae/amoe

with the amazing team @ChaySofian @lkhphuc @griffintaur @HildeKuehne @NarayanSanath

0 replies · 4 reposts · 12 likes · 1.6K views
Sofian Chaybouti retweeted
Yasser Dahou@dahou_yasser·
Our Visual Reasoning Benchmark has been accepted to #CVPR2026

We wanted to know if VLMs can actually reason visually or if they're relying on text shortcuts. well -> take away the text context, and even the best models struggle hard

we built a benchmark with 19k real images across 3 levels of difficulty.
level 1: basic perception. can you complete the pattern or fix the occlusion?
level 2: single rules. think raven's matrices but with real objects (color, count, orientation).
level 3: multi-attribute. complex rules mixing everything together.

paper: arxiv.org/abs/2512.21194
hf page: visres-bench.github.io

with the amazing team @BrigiMala @andyhuynh1111 @NarayanSanath @lkhphuc @ChaySofian @griffintaur
Yasser Dahou tweet media
2 replies · 10 reposts · 61 likes · 7.4K views
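A minimal sketch of per-level scoring for a 3-level benchmark like the one above. The record fields ("level", "answer", "prediction") are illustrative, not the benchmark's actual schema.

```python
# Per-level accuracy aggregation for a benchmark with difficulty levels 1-3.
# The record schema here is an assumption made for illustration.
from collections import defaultdict


def accuracy_by_level(records: list[dict]) -> dict[int, float]:
    correct: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for r in records:
        total[r["level"]] += 1
        correct[r["level"]] += int(r["prediction"] == r["answer"])
    return {level: correct[level] / total[level] for level in sorted(total)}


# example: level 1 = basic perception, level 2 = single rules, level 3 = multi-attribute
print(accuracy_by_level([
    {"level": 1, "answer": "A", "prediction": "A"},
    {"level": 1, "answer": "B", "prediction": "C"},
    {"level": 3, "answer": "D", "prediction": "D"},
]))
```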