Luc Georges

367 posts

Luc Georges

@LucSGeorges

Software & ML Engineer @huggingface 🦀

Paris, France Katılım Aralık 2020

470 Takip Edilen1.7K Takipçiler

Sabitlenmiş Tweet

Luc Georges@LucSGeorges·12 Eyl

we've been pushing commits to transformers discretely, time to talk about we've been cooking the last few months: ⚡️ Continuous Batching is in transformers ⚡️ this will simplify, most notably, evaluation and your training loop: no need for extra dependencies or infra to get fast inference, and no need for convoluted code to update your weights note that speed is currently not on par with the best inference frameworks and servers out there and probably never will be the goal is *not* to become as fast: we want to complement the existing landscape with features like these, aiming for transformers to be the toolbox for tinkering with and building models

English

177

51.6K

Luc Georges@LucSGeorges·9 Mar

@eliebakouch 🥺🫶🔊🔊🐐

QME

116

elie@eliebakouch·9 Mar

today is my last day at hugging face feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do but first, taking a few weeks break :)

English

116

745

32.7K

Luc Georges@LucSGeorges·5 Mar

I seem to have found somewhat of a sweet spot. Talk into Claude for the ideation phase, write down the plan, and do everything by hand myself, apart from tests maybe, who likes writing test amiright I question / rework / ignore everything written in plan as it often misses the target, but it does help me think through the problem in great detail. I go from one big plan to smaller in depth plans for each substep which works quite nicely. Co-ideating with Claude keeps the fun alive imo, so long you ask it to tweak / give feedback on your original ideas and have vision for what you want to do. It kind of feels like pair programming!

Adam@adamdotdev

Programming was deeply satisfying work to me. Work for hours/days before getting the payoff of the code working well on your machine. I’m feeling so much friction now to open the editor and do this kind of task by hand, but also increasingly depressed with the nature of work in an AI assisted dev workflow. Back and forth prompting seems to eat at my soul. Need to find a balance that brings back some of the toil.

English

316

Luc Georges@LucSGeorges·20 Şub

@ggerganov Incredible news, let's go 🔥🔥

English

466

Georgi Gerganov@ggerganov·20 Şub

Today ggml.ai joins Hugging Face Together we will continue to build ggml, make llama.cpp more accessible and empower the open-source community. Our joint mission is to make local AI easy and efficient to use by everyone on their own hardware.

Georgi Gerganov@ggerganov

I've started a company: ggml.ai From a fun side project just a few months ago, ggml has now become a useful library and framework for machine learning with a great open-source community

English

140

232

1.6K

296.5K

Luc Georges@LucSGeorges·2 Şub

@XciD_ @huggingface Claude climbing the echelons of HF leadership was not on my bingo card

English

420

Adrien Carreira@XciD_·2 Şub

Finally updated the org chart. Yes, Claude gets a @huggingface.co email. No, we're not discussing their compensation.

English

7.9K

Luc Georges retweetledi

Lysandre@LysandreJik·26 Oca

Transformers v5's FINAL, stable release is out 🔥 Transformers' biggest release. The big Ws of this release: - Performance, especially for MoE (6x-11x speedups) - No more slow/fast tokenizers -> way simpler API, explicit backends, better performance - dynamic weight loading: way faster, and enabling: MoE now working w/ {quants, tp, peft, ...} We have a migration guide on the main branch; please take a look at it in case you run into issues. Come in our GH issues if you still do after reading it 😀

English

434

75.3K

Luc Georges@LucSGeorges·29 Ara

@steeve you clearly recognise the dance moves lol

English

Steeve Morin@steeve·27 Ara

Okay that one was worth it

English

1.8K

Luc Georges@LucSGeorges·18 Ara

safetensors save_file on mac go brrrr ⚡️ been working hard these last few weeks on trying to make safetensors loading & writing faster found that skipping the os page cache with `F_NOCACHE` for write operations yields about 30% speed improvement more coming up, stay tuned

English

461

Luc Georges@LucSGeorges·15 Ara

@art_zucker @jaredpalmer @github or in any huggingface repo 🥹

English

Arthur Zucker@art_zucker·15 Ara

@jaredpalmer @github Would be happy to try it with Hugging Face on github.com/huggingface/tr…!

English

2.3K

Jared Palmer@jaredpalmer·12 Ara

We are sending out a proposal for Stacked Diffs on @GitHub to trusted design partners to gather initial feedback over the next few days. From there we’ll iterate and share the gameplan

Jared Palmer@jaredpalmer

RE: Stacked Diffs on @GitHub After discussion w @ttaylorr_b, we can implement stacked PRs/PR groups already (in fact we kind of do with Copilot) but restacking (automatically fanning out changes from the bottom of the the stack upwards) would be wildly inefficient. To do it right, we need to migrate @GitHub to use git reftables instead of packed-refs so that multi-ref updates / restacking will be O(n) instead of ngmi. This will take some time but has been greenlit.

English

132

2.5K

494.3K

Luc Georges@LucSGeorges·13 Ara

@silasmarvin2 Well I think there are multiple things happening at once. I wouldn’t say the loop is “hot” per se, maybe ~160 calls I think the issue is that the ok_or call was chained to a `&PyBound<PyDict>::get_item` call, in a context where the GIL wasn’t released (no allow_threads)

English

Silas@silasmarvin2·13 Ara

@LucSGeorges Oh wild! Did you figure out why it added such a large overhead? Was it just in a tiny hot loop?

English

Luc Georges@LucSGeorges·12 Ara

fun 🦀 fact: you can nuke performance with a misplaced `ok_or`

English

1.1K

Luc Georges@LucSGeorges·13 Ara

@nicolas_clark2 Yes sir

English

Luc Georges@LucSGeorges·11 Ara

🥺🥺

Lysandre@LysandreJik

🪦text-generation-inference is now in maintenance mode. Going forward, we will accept pull requests for minor bug fixes, documentation improvements and lightweight maintenance tasks. TGI has initiated the movement for optimized inference engines to rely on a transformers model architectures. This approach is now adopted by downstream inference engines, which we contribute to and recommend using going forward: @vllm_project, @sgl_project, as well as local engines with inter-compatibility such as llama.cpp or MLX.

ART

732

Luc Georges@LucSGeorges·5 Ara

@remi_or_ 👁️👄👁️

QME

Rémi Ouazan@remi_or_·5 Ara

thanks @LucSGeorges for the idea of this graph

English

159

Rémi Ouazan@remi_or_·5 Ara

this is what it looks like when you query an llm api with 500 requests each white pixel is an actual token, each black pixel is padding the issue is not that you send too many requests. it's that they are decoding for too long