Florian Soudan

74 posts

@FSoudan

AI team lead, I manage cross functional teams delivering industrial AI solutions.

Montréal, Québec · Joined October 2013
195 Following · 20 Followers
Florian Soudan@FSoudan·
@RisingSayak @ben_nebulous Surprisingly, Nano Banana 2 is doing quite well, but I love the work done to get physically correct edits. Groundwork for world models!
1
0
1
76
Sayak Paul@RisingSayak·
Editing images is a series of state transitions between the source image and the edited image that we want. Yet, the existing paradigm doesn't explicitly include any transitioning priors in the editing process. This becomes particularly prevalent for edits involving causal dynamics (e.g., refraction, deformation). To model this kind of physics-informed information, we leverage the rich priors present in videos and introduce PhysicEdit 🔥 TL;DR: We fine-tune QwenImage Edit on a curated dataset of videos with reasoning traces and fixed-length transition queries to do solid physics-aware image editing! In the process, we introduce a cool dataset, "PhysicTran38K", consisting of 38K transition trajectories across five physical domains, and devise a method to provide supervision from it to QwenImage Edit. Hop in to learn more ⬇️
12
39
344
42.6K
Florian Soudan@FSoudan·
@PontiEdoardo Very interesting! Isn’t the model way slower as the inputs and outputs are now 3 to 5 times longer?
1
0
1
103
Edoardo Ponti@PontiEdoardo·
Finally, you can count the r's in strawberry and check if 3.11 is higher than 3.9 without tokenisation interfering: Here's Bolmo, a fully open byte-level LLM with latent tokenisation, derived from a SOTA LLM (Olmo 3). Promising on coding and char-level understanding!
Ai2@allen_ai

Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵

2
7
44
4.2K
Florian Soudan@FSoudan·
@mervenoyann Thanks for that. Generic question on your training notebooks: why don't you use Lightning or any other PyTorch training library? Do you need specific control that you would not get otherwise?
0
0
0
34
merve@mervenoyann·
made a small notebook on fine-tuning DINOv3 on image classification 🦖🦕 we will have DINOv3 task heads in transformers at some point, but you can customize and use this notebook in the meantime! 🤗
12
44
421
26.1K
Florian Soudan@FSoudan·
@Prince_Canuma Hey MLX Prince! I was wondering if you were planning to keep working on Phi 4? I tried the transformers code and the ONNX version, both on MPS, with no luck. Thanks again for all the work you do for the community.
1
0
1
99
Prince Canuma@Prince_Canuma·
Hell yeah! 🔥 Phi-4-multimodal port to MLX update #05 Language model only inference is working 🚀 Next step, load LoRAs and test vision and audio inference.
5
2
86
6.7K
François Chollet@fchollet·
Twitter used to be my favorite place on the Internet. I've derived enormous value from it in the past 16 years. Not true anymore. Most of the people I enjoyed reading have left. My feed, which used to feature art and science and technology and humor, has become constant political propaganda -- on the opposite side of the Enlightenment values I believe in. (I like the rule of law, free speech, free markets, and democracy. I like science and reject obscurantism. I can see that Putin is a dictator, not a genius role model, and that it is Russia that is invading Ukraine, not the other way around.) I have been coming here less and less as a result -- it only brings me negative emotions. I never thought that would be possible, but I might eventually stop coming altogether.
553
405
6.9K
457K
AK@_akhaliq·
4K followers on @huggingface 🔥
4
6
116
17.9K
Florian Soudan@FSoudan·
@simonw We did extensive OCR tests and Gemini Flash 1.5 is amazing; you can disable any safety check in the arguments of the call. No open-source model comes close to it.
1
0
4
115
Simon Willison@simonw·
Multimodal models like GPT-4o and Claude 3 Opus and Google Gemini seem great for OCR at first, but they're no good if they're going to refuse to return text because the content disagrees with their content policies, or they skip text labeled "ignore this text:" in the document!
10
6
133
14.2K
Simon Willison@simonw·
Any OCR models out there with LLM-like capabilities - like the ability to "guess" partial words based on context - but that don't follow extra instructions or apply safety filters of any kind? I want reliable OCR that can't be prompt injected and that won't sometimes refuse text
57
29
580
218.6K
Florian Soudan@FSoudan·
@Prince_Canuma Love the composition and design of B. I think it just lacks "Llama" for your usage
1
0
1
11
Prince Canuma@Prince_Canuma·
Excited to announce MLX-VLLM 🎉 The first local framework for Vision Large Language inference powered by MLX. Still WIP and we are open to contributions. It will be on PyPI soon. 🚀 github.com/Blaizzy/mlx-vl…
8
29
198
36.7K
arpit@arpitingle·
running llama3-8b-instruct-4bit using mlx
8
5
123
10.3K
Florian Soudan@FSoudan·
@julien_c Excellent, thanks 🤗 team! It works perfectly and Llama 3 70B is so fast!
0
0
0
38
Julien Chaumond@julien_c·
we just shipped HuggingChat on iOS 💬 The app is super polished and gives you access to the community's best open AI models, on the go. Give it a try! link to Appstore below ⤵️
76
133
829
176K
merve@mervenoyann·
Ever wanted to learn about fantastic vision language models and how to find and fine-tune them? 🧙🏻 We've just added support to train VLMs like LLaVa in TRL and wrote a walkthrough on vision language models! 🎉 Read about VLMs and SFTTrainer for vision hf.co/blog/vlms
12
29
177
37.5K
Florian Soudan@FSoudan·
@Lykon4072 Congrats! This result is amazing with very difficult features usually not well understood by diffusion models: a single contrastive color applied only on the proper spots and high contrast. Question: do you get the texture directly out of #SD3 (no upscaler or refiner or post)?
0
0
0
439
Ben Geskin@BenGeskin·
Lenovo's transparent laptop is so futuristic 🤩
269
460
4.2K
734K
Florian Soudan@FSoudan·
@Lykon4072 Thanks, great work! The model looks amazing, can’t wait to get access to it.
0
0
0
23
Florian Soudan@FSoudan·
@Lykon4072 SD3 seems to be very "raw" compared to MJ6 which tends to force its aesthetic. How much have you worked on the style in the prompt to get this image? Or was it a lucky generation?
2
0
0
162
Florian Soudan@FSoudan·
@bartczernicki Not really. It is undeniably better but we still have 6 fingers and weird morphs 😉
1
0
0
21