Pierre-Antoine Bannier

Akash Basudevan@PackBropagated·20h

@el_PA_B @ggerganov 5 fps is still decent for a start. Nice! Will see if I can accelerate it on mobile tpus

Pierre-Antoine Bannier@el_PA_B·1d

sam3.cpp - Meta's SAM 3 in pure C++ with @ggerganov's ggml
- Supports SAM 3.1, 3, 2.1, 2 and EdgeTAM
- FP16, 4-bit quant (EdgeTAM in 15 MB)
- Apple Metal GPU, CUDA, CPU
- Text-prompted: "peach" → every peach
- Single-file C++14

Performance-wise:
- 100ms object detection, segmentation
- Video object segmentation @ 20FPS on M4 Pro with EdgeTAM

github.com/PABannier/sam3…
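
To give a rough picture of what "4-bit quant" means for model size, here is a minimal sketch of symmetric block-wise 4-bit quantization. It is not ggml's actual format: the `Block4` layout, the block size of 32, and the names `quantize4` and `dequantize4` are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one float scale plus 32 four-bit codes, packed two per byte.
constexpr std::size_t kBlockSize = 32;

struct Block4 {
    float scale;                         // quantization step for this block
    std::uint8_t codes[kBlockSize / 2];  // low nibble = even index, high nibble = odd index
};

// Map one value to a code in [0, 15]; code 8 represents zero.
inline std::uint8_t encode4(float value, float inv_scale) {
    const int q = static_cast<int>(std::lround(value * inv_scale)) + 8;
    return static_cast<std::uint8_t>(std::min(std::max(q, 0), 15));
}

// Quantize a weight vector whose length is a multiple of kBlockSize.
std::vector<Block4> quantize4(const std::vector<float>& values) {
    assert(values.size() % kBlockSize == 0);
    std::vector<Block4> blocks(values.size() / kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const float* x = values.data() + b * kBlockSize;
        float amax = 0.0f;  // largest magnitude in the block
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(x[i]));
        }
        const float scale = amax / 7.0f;  // [-amax, amax] maps to codes 1..15
        const float inv_scale = scale > 0.0f ? 1.0f / scale : 0.0f;
        blocks[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; i += 2) {
            const std::uint8_t lo = encode4(x[i], inv_scale);
            const std::uint8_t hi = encode4(x[i + 1], inv_scale);
            blocks[b].codes[i / 2] = static_cast<std::uint8_t>(lo | (hi << 4));
        }
    }
    return blocks;
}

// Reconstruct approximate floats from the packed blocks.
std::vector<float> dequantize4(const std::vector<Block4>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const Block4& blk : blocks) {
        for (std::size_t i = 0; i < kBlockSize / 2; ++i) {
            const int lo = blk.codes[i] & 0x0F;
            const int hi = blk.codes[i] >> 4;
            out.push_back(static_cast<float>(lo - 8) * blk.scale);
            out.push_back(static_cast<float>(hi - 8) * blk.scale);
        }
    }
    return out;
}
```

In this sketch, 32 weights occupy 20 bytes instead of 64 bytes in FP16, and each reconstructed weight lies within half a quantization step of the original. Calling `dequantize4(quantize4(weights))` gives a quick way to measure that rounding error on real tensors.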

Pierre-Antoine Bannier@el_PA_B·22h

I think it’s still compute. According to the EdgeTAM paper, the cross attention blocks of SAM between the tokens of the current frame and the memory latents yield huge matrix multiplications. Mobile devices have a limited number of parallel compute units, making huge matrix multiplications inefficient. It’s in FP16. From my experiments, 4-bit quant did not provide a great speed-up but degraded the segmentation quality (with a lot of masks with holes).
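
As a back-of-the-envelope illustration of why those cross-attention matrix multiplications get large, the sketch below counts the floating-point operations of the two attention matmuls. The struct, the function name, and every size used with it are hypothetical; real model dimensions are not given in this thread, and projections and softmax are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shape of one cross-attention call.
struct CrossAttnShape {
    std::uint64_t num_queries;  // tokens of the current frame
    std::uint64_t num_memory;   // memory tokens (or latents) attended to
    std::uint64_t dim;          // channel dimension shared by Q, K and V
};

// FLOPs of the two attention matmuls, counting one multiply-add as 2 FLOPs.
std::uint64_t cross_attention_flops(const CrossAttnShape& s) {
    assert(s.num_queries > 0 && s.num_memory > 0 && s.dim > 0);
    const std::uint64_t scores = 2 * s.num_queries * s.num_memory * s.dim;  // Q * K^T
    const std::uint64_t mixing = 2 * s.num_queries * s.num_memory * s.dim;  // weights * V
    return scores + mixing;
}
```

With made-up sizes such as 4096 frame tokens, 28672 memory tokens and 256 channels, `cross_attention_flops` returns about 1.2e11 per call. The cost scales linearly with the number of memory tokens, so halving them halves the count.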

tang | AI Product Maker@justic_hot·22h

@el_PA_B @ggerganov 15fps on-device is honestly better than I expected. is that with FP16 or 4-bit quant? wondering if the wall is memory bandwidth or compute at that point

Pierre-Antoine Bannier@el_PA_B·22h

@PackBropagated @ggerganov For now, I can reach 5 FPS for video segmentation using Metal (Apple GPU). To unlock the 15 FPS presented in the EdgeTAM paper, I have to compile the graph and serve it via CoreML to run it on the ANE.

Akash Basudevan@PackBropagated·22h

@el_PA_B @ggerganov Any perf numbers on mobile devices?

Pierre-Antoine Bannier@el_PA_B·22h

@roverwanderer_ @ggerganov Yes it’s coming. I’m fixing a few things to make it work!

Rover Wanderer@roverwanderer_·1d

@el_PA_B @ggerganov Great stuff 🤘 You mentioned SAM 3.1, but can't see it in the repo?

Pierre-Antoine Bannier@el_PA_B·22h

@justic_hot @ggerganov I know! For now, I can reach at most 15FPS on my iPhone 15 Pro Max with EdgeTAM by serving the model with CoreML.

tang | AI Product Maker@justic_hot·1d

@el_PA_B @ggerganov EdgeTAM at 15MB. we're gonna have real-time segmentation on a raspberry pi by summer at this rate

Pierre-Antoine Bannier@el_PA_B·22h

@Hermansdorfer @ggerganov SAM 3.1 is on the way. I still have a few fixes to make it work!

Mariusz Hermansdorfer@Hermansdorfer·1d

@el_PA_B @ggerganov Looks great! You mentioned SAM 3.1 and CUDA support but I can't see these wired up in the repo. Is this still on the roadmap?

Pierre-Antoine Bannier@el_PA_B·1d

@arre_ankit @ggerganov sam3.cpp uses the same tensor library for machine learning as llama.cpp: ggml. I had to write several custom kernels on my fork of ggml to make the forward pass of SAM work (mostly convolution stuff) on Metal (Apple GPU).

Ankit Kumar@arre_ankit·1d

@el_PA_B @ggerganov Really cool project! Did you use llama.cpp at all to optimize for Apple GPU? I've been exploring SAM for image segmentation and am excited to dig into this over the weekend.

Pierre-Antoine Bannier@el_PA_B·23 Mar

x.com/i/article/2036…

Pierre-Antoine Bannier retweeted

Pall Melsted@pmelsted·6 Mar

Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data. The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv biorxiv.org/content/10.648… Figure 1 shows the key result

Pierre-Antoine Bannier@el_PA_B·14 Feb

@iScienceLuvr I’m curious to have your perspective. I also wrote one. open.substack.com/pub/pierreanto…

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·13 Feb

would you guys be interested in a thread/article talking more about the impact pathology foundation models can have in oncology and pharma?

tissue biopsy slides are the most common biological samples in oncology. they're used as the gold standard for diagnosing cancer. this is why @SophontAI builds foundation models trained on tissue slides. these models will facilitate better understanding & treatment of cancer!

Pierre-Antoine Bannier@el_PA_B·17 Jan

@1littlecoder cc @doodlestein

1LittleCoder💻@1littlecoder·17 Jan

This is the new-age click farms!

Pierre-Antoine Bannier@el_PA_B·12 Jan

@sinuhet I haven’t installed it on any network. It’s running locally for research purposes. The data I used comes from a public dataset (TCGA). I used this software to visualize the predictions of cell segmentation models.

Sinuhet@Beareka·12 Jan

@el_PA_B Allow me a VERY practical question: how have you managed to install a vibe coded software on a hospital network? Particularly if you upload samples with patients data? Or have you taken patients tissue images out of the network and uploaded them into your personal computer?

Pierre-Antoine Bannier@el_PA_B·12 Jan

For my work, I need to visualize cell segmentation and tissue annotations on whole slide images. These are massive pathology scans, multi-gigabyte files with millions of cells.

Problem: my data format wasn't compatible with any existing WSI viewer out there. So I figured why not just vibe-code my own pathology viewer that supports my exact format?

I genuinely had no idea how long a project like this would take me. I decided to go with C++, mainly for performance reasons, but also because I'm familiar with the language and knew I'd need every bit of speed I could get.

I set myself one rule: I wouldn't write a single line of code. Claude would do it all.

I used @doodlestein's methodology, which breaks down like this:

Step 1: Write a ridiculously detailed plan
I drafted a comprehensive implementation plan and had it reviewed by multiple agents: Opus 4.5 (ultra thinking mode), Codex 5.2 (extra thinking), and Gemini. Then I let Claude Code synthesize all three perspectives into one coherent plan. It felt a bit like assembling a council of war advisors before a medieval siege.

Step 2: Break it into atomic beads
I created a series of small, sequential tasks that followed the implementation plan from step 1. Roughly speaking: GUI → canvas → opening a slide → rendering cells → rendering tiles. Each bead was self-contained and clearly scoped.

Step 3: Quality control throughout
During the process, I regularly checked in on Claude to make sure it was following professional C++ design patterns, the kind you'd expect from a world-class C++ programmer. At one point, I started referencing the book C++ Software Design: Design Principles and Patterns for High-Quality Software by Klaus Iglberger. This seemed to activate some deeper part of Claude's internal knowledge. The code quality noticeably improved.

The part that surprised me the most:
Multiple times, Claude's first solution wasn't optimal. For instance, it couldn't efficiently render 1M+ cell polygons on the first try. The initial version was slow, jittery, borderline unusable. So I told it to think deeply about the hot path. To use time-efficient data structures (e.g. building a spatial index) for the problem at hand. To multi-thread the rendering loop so it could determine what's visible to the user versus what's off-screen. And surprisingly, it almost always one-shot the optimized solution. Once I pointed it in the right direction, it figured out the implementation with minimal back-and-forth.

Things that helped along the way:

Writing lots of tests. For the more difficult beads, I asked Claude to use a test-driven development approach. This gave it a clear objective to optimize against: make the test pass. It worked remarkably well.

Isolation before optimization. For performance-critical prompts, I learned to explicitly say: "First, isolate the part of the code we're trying to optimize before diving into the implementation." This prevented Claude from making sweeping changes when surgical precision was needed.

The viewer works. It's fast. It handles my weird data format perfectly. And I still haven't written a single line of C++.
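
The spatial-index idea mentioned in the post can be sketched as a uniform grid over cell bounding boxes, so that only cells near the viewport are considered for drawing. This is an illustrative assumption, not the viewer's actual code: the post does not say which index was used, and `Box`, `GridIndex`, and the bucket size are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box in slide coordinates.
struct Box {
    float min_x, min_y, max_x, max_y;
};

inline bool overlaps(const Box& a, const Box& b) {
    return a.min_x <= b.max_x && b.min_x <= a.max_x &&
           a.min_y <= b.max_y && b.min_y <= a.max_y;
}

// Hypothetical uniform-grid index: each bucket lists the boxes that touch it.
class GridIndex {
public:
    GridIndex(float width, float height, float bucket_size) : bucket_(bucket_size) {
        assert(width > 0.0f && height > 0.0f && bucket_size > 0.0f);
        cols_ = static_cast<int>(width / bucket_size) + 1;
        rows_ = static_cast<int>(height / bucket_size) + 1;
        buckets_.resize(static_cast<std::size_t>(cols_) * static_cast<std::size_t>(rows_));
    }

    // Register a box and return its id (its insertion order).
    int insert(const Box& box) {
        const int id = static_cast<int>(boxes_.size());
        boxes_.push_back(box);
        for (int r = cell(box.min_y, rows_); r <= cell(box.max_y, rows_); ++r) {
            for (int c = cell(box.min_x, cols_); c <= cell(box.max_x, cols_); ++c) {
                bucket(r, c).push_back(id);
            }
        }
        return id;
    }

    // Return sorted ids of all boxes that intersect the viewport.
    std::vector<int> query(const Box& view) const {
        std::vector<int> hits;
        for (int r = cell(view.min_y, rows_); r <= cell(view.max_y, rows_); ++r) {
            for (int c = cell(view.min_x, cols_); c <= cell(view.max_x, cols_); ++c) {
                const std::vector<int>& ids =
                    buckets_[static_cast<std::size_t>(r) * static_cast<std::size_t>(cols_) +
                             static_cast<std::size_t>(c)];
                for (int id : ids) {
                    if (overlaps(boxes_[static_cast<std::size_t>(id)], view)) hits.push_back(id);
                }
            }
        }
        // A box spanning several buckets is found more than once; deduplicate.
        std::sort(hits.begin(), hits.end());
        hits.erase(std::unique(hits.begin(), hits.end()), hits.end());
        return hits;
    }

private:
    // Clamp a coordinate to a valid bucket row or column.
    int cell(float v, int count) const {
        const float idx = std::min(std::max(v / bucket_, 0.0f), static_cast<float>(count - 1));
        return static_cast<int>(idx);
    }

    std::vector<int>& bucket(int r, int c) {
        return buckets_[static_cast<std::size_t>(r) * static_cast<std::size_t>(cols_) +
                        static_cast<std::size_t>(c)];
    }

    float bucket_;
    int cols_ = 0;
    int rows_ = 0;
    std::vector<std::vector<int>> buckets_;
    std::vector<Box> boxes_;
};
```

A render loop would call `query` with the current viewport once per frame and draw only the returned polygons, so the per-frame cost depends on what is on screen instead of on the total number of cells. A quadtree or R-tree would be a reasonable alternative when cell density varies a lot across the slide.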

github.com/PABannier/Path…

Pierre-Antoine Bannier@el_PA_B·12 Jan

@doodlestein Thank you for sharing so much on X, helped me a lot! Will do and I’ll report the results once done 🫡

Jeffrey Emanuel@doodlestein·12 Jan

Awesome, glad to see my workflows are helping. Now that you have it working, try doing this optimization workflow and I bet you will be able to very dramatically improve performance from here (do rounds 1 and 2 separately): x.com/doodlestein/st…

Pierre-Antoine Bannier@el_PA_B·12 Jan

@permianaccel @davidbessis True. Every education system "clips the tails" at some point. Galois was an outlier among outliers and got filtered out.

Tannhäuser@permianaccel·11 Jan

@el_PA_B @davidbessis What about people like Galois? Such men were filtered out in spite of their genius.

David Bessis@davidbessis·11 Jan

A related question is this: if math is all about IQ, and if IQ is 80% heritable, then how could France receive almost 20% of all Fields medals ever awarded? Are we the elected people?

@Zoomerjeet True

Pierre-Antoine Bannier@el_PA_B·11 Jan

IMO, every large country has a similar pool of latent mathematical talent. The difference lies in how well the system identifies and develops it. France's edge isn't smarter people, it's a more effective funnel from raw aptitude to world-class research. A combination of innate ability, intense selection, and elite training.

David Bessis@davidbessis·11 Jan

@el_PA_B Of course, but 1/ this hints at a cultural factor, 2/ this puts a ceiling on how innate the outcome might be.

Pierre-Antoine Bannier@el_PA_B·9 Jan

@doodlestein Are you making these subagents (skills?) in Claude Code? Feels like it could be a great use case.