Undi (@Undi95)
50 posts

Yes.

Joined November 2018
79 Following · 82 Followers
Undi @Undi95 ·
@MelyTi Women in 2024
[image]
Camille G. @MelyTi ·
Over the past few days you have probably seen this terrible story going around. Here I'm addressing the men, who are often insecure about a receding hairline or patchy spots in their beard … it is NOT A BIG DEAL. NOBODY CARES. You are the only ones paying any attention to it.
Le Parisien @le_Parisien

Mathieu took his own life three months after receiving a beard transplant in Istanbul, Turkey. A disastrous result, excruciating pain, a feeling of betrayal… His father Jacques speaks out ➡️ l.leparisien.fr/BwqW

Undi @Undi95 ·
@ostrisai @bfl_ml @araminta_k LFG, really hyped to see the guide. Tried a training run with 300 pics and it came out burned. I'm not even sure it's the right config to use.
Ostris @ostrisai ·
Did a lot of testing on my LoRA training script for the @bfl_ml FLUX.1 dev model. Amazing model! I think it is finally ready. Running smoothly on a single 4090. Posting a guide tomorrow. Special thanks to @araminta_k for helping me test on her amazing original character and artwork.
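For readers following along: "LoRA training" here means training small low-rank adapter matrices on top of a frozen base model. Below is a minimal, generic PyTorch sketch of the idea; it is not Ostris's actual script, and the layer size and rank are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank (LoRA) residual."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank update: W + (alpha / rank) * B @ A; B starts at zero so
        # the adapter initially leaves the base model unchanged.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Toy usage: only the adapter parameters receive gradients.
layer = LoRALinear(nn.Linear(1024, 1024), rank=16)
layer(torch.randn(2, 1024)).sum().backward()
print([n for n, p in layer.named_parameters() if p.grad is not None])
# -> ['lora_a', 'lora_b']
```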
Undi @Undi95 ·
@bdsqlsz Did you already get some results from the other training run? Did it work?
washingtoncarver @washin111112ijj ·
@ostrisai pretty sure the simpletuner crew already beat you to it this week
Ostris @ostrisai ·
I take this back. I managed to squeeze LoRA training for FLUX.1-schnell onto a single 4090 with 8-bit mixed precision. We will see how well it works. 3 s/iter.
[image]
Ostris @ostrisai

This can be optimized further. I think you could maybe do mixed-precision 8-bit quantization on the transformer. But no matter how optimized it gets, I don't think it will ever be possible to train on current consumer hardware (≤24 GB). Someone please prove me wrong.

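Back-of-the-envelope numbers make the 24 GB debate concrete. A rough sketch, assuming FLUX.1's published ~12B-parameter transformer and standard AdamW bookkeeping; the ~50M adapter size is an assumption, not a figure from this thread.

```python
# Rough VRAM arithmetic for why full fine-tuning a ~12B-parameter model
# doesn't fit in 24 GB, while 8-bit base weights + LoRA can.
GIB = 1024**3
params = 12e9

weights_bf16 = params * 2 / GIB   # 2 bytes/param
grads_bf16 = params * 2 / GIB     # one gradient per param
adam_states = params * 8 / GIB    # two fp32 Adam moments (4 + 4 bytes)
print(f"full fine-tune: ~{weights_bf16 + grads_bf16 + adam_states:.0f} GiB")
# -> ~134 GiB before activations: hopeless on a 24 GB card.

weights_int8 = params * 1 / GIB                # 8-bit quantized base weights
lora_params = 50e6                             # assumed adapter size
lora_train = lora_params * (2 + 2 + 8) / GIB   # weights + grads + Adam states
print(f"8-bit base + LoRA: ~{weights_int8 + lora_train:.1f} GiB")
# -> ~11.7 GiB, leaving headroom for activations on a 4090.
```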
Undi @Undi95 ·
After a long wait, Ikari and I have finally made a new release of our latest model on the NeverSleep repo: Lumimaid-v0.2. This model comes in several sizes, from the small Llama-3.1-8B to the gigantic Mistral-Large-123B! huggingface.co/collections/Ne… Hope you will enjoy them!
[image]
Undi @Undi95 ·
@BrianRoemmele Next step, you'll run Grok on my Game Boy lmao. Good luck. Even 1-bit quantization will need a big GPU; 314B is enormous, I dunno if you realize.
Brian Roemmele @BrianRoemmele ·
BOOM! It’s HERE: xai-org/grok: Grok open release! We have been quantizing Grok! My goal is to have it run locally on ANY modern computer. We will even get a model for a Raspberry Pi. Soon. Here is the model link: github.com/xai-org/grok
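Undi's skepticism checks out with simple arithmetic. A quick sketch of the weight-storage math for Grok-1's 314B parameters; real quantization formats add per-group scales and metadata, so these are lower bounds.

```python
# Lower-bound weight storage for Grok-1 (314B parameters) at various
# quantization levels; activations, KV cache, and format overhead excluded.
GIB = 1024**3
params = 314e9

for bits in (16, 8, 4, 1):
    print(f"{bits:>2}-bit: {params * bits / 8 / GIB:,.0f} GiB")
# 16-bit: 585 GiB
#  8-bit: 292 GiB
#  4-bit: 146 GiB
#  1-bit:  37 GiB -> still far beyond a 24 GB consumer GPU,
#                    let alone a Raspberry Pi's 8 GB of RAM.
```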
Undi retweeted
OpenRouter @OpenRouter ·
Noromaid Mixtral 8x7B Instruct. An awesome new Mixtral fine-tune from IkariDev and @undi95, the creators of Remm Slerp and Noromaid, suitable for roleplaying: openrouter.ai/models/neversl…
Undi retweeted
Far El @far__el ·
Frankensteining models is a phenomenon that exists only thanks to a bunch of scrappy, GPU-poor open sorcerers cutting costs by crafting new larger and smaller models from existing pretrains and finetunes rather than training from scratch. Truly a form of digital alchemy. @nisten @Undi95 @chargoddard
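"Frankensteining" here means stitching layer ranges from existing checkpoints into a new, deeper model. A minimal sketch of what such a recipe looks like for Charles Goddard's mergekit (the tool named in the next tweet); the model name and layer ranges are illustrative assumptions, not any specific released merge.

```python
# Build a passthrough ("frankenmerge") config for mergekit: stack two
# overlapping slices of a 32-layer model into a 48-layer one.
import yaml

config = {
    "slices": [
        # First copy: bottom 24 of the 32 layers.
        {"sources": [{"model": "mistralai/Mistral-7B-v0.1",
                      "layer_range": [0, 24]}]},
        # Second copy: top 24 layers; stacking both yields 48 layers.
        {"sources": [{"model": "mistralai/Mistral-7B-v0.1",
                      "layer_range": [8, 32]}]},
    ],
    "merge_method": "passthrough",  # concatenate slices, no weight averaging
    "dtype": "bfloat16",
}

with open("depth-upscale.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# Then run: mergekit-yaml depth-upscale.yml ./merged-model
```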
Wing Lian (caseus) @winglian ·
It is sad to see other creators who did exactly this go uncredited in these "papers". Here's Mistral 11B, using the exact same "depth up-scaling" and layers as Solar (huggingface.co/Undi95/Mistral…), released 3 months ago. People have been doing this technique for a while now using Charles Goddard's mergekit repo: github.com/cg123/mergekit
Sung Kim @hunkims

#solarllm @upstageai Tech report: arxiv.org/abs/2312.15166
We introduce depth up-scaling (DUS), a novel technique to up-scale base LLMs efficiently and effectively in a simple manner. In contrast to mixture-of-experts (MoE), DUS does not require complex changes to train and inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs, such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.

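The 10.7B figure follows directly from the depth up-scaling recipe. A worked check, assuming Mistral-7B-shaped layers (hidden 4096, FFN 14336, grouped-query attention with a 1024-dim KV projection, 32k vocab); small terms like layer norms are ignored.

```python
# Parameter count for SOLAR-style depth up-scaling: take a 32-layer
# Mistral-7B-shaped model, duplicate it, drop 8 layers from each copy,
# and stack the remaining 24 + 24 = 48 layers.
hidden, ffn, vocab = 4096, 14336, 32000
kv_dim = 1024  # 8 KV heads x 128 head_dim (grouped-query attention)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
mlp = 3 * hidden * ffn                            # gate, up, down projections
per_layer = attn + mlp
embeddings = 2 * vocab * hidden                   # input + output embeddings

print(f"32 layers: {(32 * per_layer + embeddings) / 1e9:.2f}B")  # ~7.24B
print(f"48 layers: {(48 * per_layer + embeddings) / 1e9:.2f}B")  # ~10.73B
```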