Jay

27 posts

Jay

@memmaptensor

Independent researcher

Bangkok · Joined December 2018
82 Following · 602 Followers
Pinned Tweet
Jay @memmaptensor
I spent $30k and 3 months RL post-training an anime video model. This is only step 30 out of a planned 1000 step run. All samples are local text-to-video with no reference image/audio. Since it's based on LTX-2.3, each output takes under a minute on a single GPU. I'm 19 and a solo researcher. Most of the budget went into ablations, reward design, and trying different configurations before reaching this setup. The run is still extremely early, but the results already look much better than I expected. It's compute-limited, not idea-limited. I'm starting a company to continue scaling this and build frontier stylized video models. If you're an investor, compute partner, video team, or someone who wants to help build this, DMs are open.
36
31
334
18.9K
Jay @memmaptensor
Looking for a co-founder to build the next generation of waifu tech! Figured out the solution to create a new interactive experience but struggle with app dev. Kind of the downside of focusing too much on ML. Ideally someone with mobile/web and maybe some cloud ML experience.
5
1
13
1.3K
Jay @memmaptensor
@_akhaliq they cookin
0
0
8
1K
AK @_akhaliq
Tencent presents GameGen-O: Open-world Video Game Generation. We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, which can efficiently combine creative generation with interactive capabilities.
97
559
2.9K
366.9K
Jay @memmaptensor
dehumidifiers are so good when it's cold and humid
0
0
1
983
Jay @memmaptensor
I've set out some conditions for any future solvers to add:
• No implicit solvers: those require root-finding, meaning a Jacobian has to be computed by running a backward pass through the model during LBFGS optimization.
• No non-RK methods: this rules out linear multistep methods like Adams-Bashforth or Adams predictor-corrector. From my testing, RK methods perform better, even for explicit RK vs. predictor-corrector linear multistep.
• No duplicate methods: if two methods have different coefficients, they aren't duplicates (scipy methods have different coefficients and solver implementations).
That means the current 31 solvers are almost all the methods that satisfy the conditions above. Project's done! I need to figure out what to make next.
0
0
1
905
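A minimal sketch of why the "no implicit solvers" condition matters for cost, under loud assumptions: the scalar function below is a toy stand-in for a model call, backward Euler stands in for implicit solvers in general, and Newton iteration with a finite-difference slope stands in for the L-BFGS plus backward-pass root-finding mentioned in the post. None of these names come from the sampler node itself.

```python
class CountedModel:
    """Toy stand-in for a denoiser call; counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, t, y):
        self.calls += 1
        return -y - y ** 3


def forward_euler_step(f, t, y, h):
    # Explicit: one model call per step, no equation to solve.
    return y + h * f(t, y)


def backward_euler_step(f, t, y, h, max_iters=50, tol=1e-10, eps=1e-6):
    # Implicit: solve g(z) = z - y - h * f(t + h, z) = 0 for z with Newton.
    z = y
    for _ in range(max_iters):
        fz = f(t + h, z)
        g = z - y - h * fz
        if abs(g) < tol:
            break
        # Finite-difference slope; a real network would need a backward pass.
        dg = 1.0 - h * (f(t + h, z + eps) - fz) / eps
        z -= g / dg
    return z


def integrate(step, f, y0=1.0, t1=1.0, n_steps=10):
    h = t1 / n_steps
    y = y0
    for i in range(n_steps):
        y = step(f, i * h, y, h)
    return y


explicit_model, implicit_model = CountedModel(), CountedModel()
y_explicit = integrate(forward_euler_step, explicit_model)
y_implicit = integrate(backward_euler_step, implicit_model)
print("explicit model calls:", explicit_model.calls)
print("implicit model calls:", implicit_model.calls)
```

Both runs take the same 10 steps, but the implicit run pays several model calls per step for the inner solve, which is the overhead the condition is meant to avoid.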
Jay @memmaptensor
Last major update!
• Added solver settings for adaptive_scipy
• Adaptive solvers now show the number of steps taken
• Accurate 𝜎 timestep info is now displayed
Check out the most comprehensive fixed and adaptive higher-order samplers on ComfyUI! github.com/wootwootwootwo…
0
3
20
2.5K
Jay @memmaptensor
Refactored and fixed some bugs with the progress bar! Also wrapped the solvers from scipy.integrate. If you count the a-methods as 2 (since they work with both the adaptive_pid and fixed_scheduled controllers), then this node has 31 new samplers (excluding forward euler)! I also tried the implicit solvers and they didn't work: every implicit solver has a root-finding step, and that takes forever to converge. That leaves 3 new methods from scipy: se_RK23, se_RK45, and se_DOP853. I think this node has the most new working samplers for ComfyUI (a for adaptive, f for fixed, s for scipy, e for explicit).
0
0
5
652
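To illustrate what an adaptive explicit RK sampler does per step, here is a rough pure-Python sketch of the Bogacki-Shampine 3(2) embedded pair, which is the pair behind scipy's RK23. The toy ODE, the tolerance, and the simple step-size rule are assumptions for illustration; this is not the node's code and it does not reproduce the adaptive_pid controller named in the post.

```python
import math


def decay(t, y):
    # Toy ODE dy/dt = -y, standing in for the sampling ODE.
    return -y


def bs23_adaptive(f, t0, t1, y0, tol=1e-6, h=0.1):
    """Integrate from t0 to t1, adapting h from the embedded error estimate."""
    t, y = t0, y0
    accepted = rejected = 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
        y3 = y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)  # 3rd-order result
        k4 = f(t + h, y3)
        y2 = y + h * (7 / 24 * k1 + 1 / 4 * k2 + 1 / 3 * k3 + 1 / 8 * k4)  # 2nd-order
        err = abs(y3 - y2)
        if err <= tol:
            t, y = t + h, y3
            accepted += 1
        else:
            rejected += 1
        # Basic controller: scale h by the error ratio, clamped to [0.2, 5].
        factor = 0.9 * (tol / err) ** (1 / 3) if err > 0 else 5.0
        h *= min(5.0, max(0.2, factor))
    return y, accepted, rejected


y_end, n_accepted, n_rejected = bs23_adaptive(decay, 0.0, 1.0, 1.0)
print("accepted steps:", n_accepted, "rejected steps:", n_rejected)
print("abs error:", abs(y_end - math.exp(-1.0)))
```

The step count is an output of the run, not an input, which is why an adaptive sampler can only report the number of steps taken after the fact.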
Jay @memmaptensor
the new class of models idea didn't work out well, so i tried this instead (which works decently well)
0
0
5
577
Jay @memmaptensor
While trying to push the CFG scale up, I implemented some explicit RK solvers for ComfyUI:
- 10 new adaptive step samplers
- 8 unique fixed step samplers (excluding forward euler)
- Best new sampler (perhaps) -> fe_ralston3
Check it out! github.com/wootwootwootwo…
2
10
85
6.5K
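For readers unfamiliar with the method behind the fe_ralston3 name, the following is a small sketch of Ralston's third-order explicit Runge-Kutta step with a fixed step size, compared against forward Euler. The toy ODE and the helper names are assumptions for illustration and are not taken from the repository.

```python
import math


def ralston3_step(f, t, y, h):
    # Ralston's 3rd-order method: c = (0, 1/2, 3/4), b = (2/9, 1/3, 4/9).
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
    return y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)


def euler_step(f, t, y, h):
    return y + h * f(t, y)


def solve_fixed(step, f, y0, t0, t1, n_steps):
    """Apply a one-step method n_steps times with a constant step size."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    # Toy ODE dy/dt = -y with exact solution exp(-t).
    return -y


exact = math.exp(-1.0)
err_ralston = abs(solve_fixed(ralston3_step, decay, 1.0, 0.0, 1.0, 10) - exact)
err_ralston_fine = abs(solve_fixed(ralston3_step, decay, 1.0, 0.0, 1.0, 20) - exact)
err_euler = abs(solve_fixed(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
print("ralston3 error:", err_ralston)
print("euler error:", err_euler)
```

Each Ralston step costs three function evaluations instead of one, so in a sampler the fair comparison is at equal model-call budgets rather than equal step counts.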
Jay @memmaptensor
why do i prefer undersampled results 😭😭😭
1
0
1
609
Jay @memmaptensor
@amogh42 nope, something a lot simpler
1
0
0
72
Amogh Vaishampayan @amogh42
@_wootwoot you mean something like ELLA for SDXL? or Kolors approach with a LLM as a text encoder?
2
0
0
110
Jay @memmaptensor
i have an idea for a slightly modified class of SDXL models that would mostly be compatible with existing finetunes and loras. it's been proven to work well on SD1.5 with good results. definitely next on my bucket list. will post updates and releases soon, hopefully, if it works.
2
0
16
932
Jay @memmaptensor
diffusion models are definitely still not dead. sure, optimal transport conditional flow matching is provably better, but so much of the community was already built on discrete time diffusion. and with kolors out (an SDXL model trained with DDPM formulation and eps-pred objective), i doubt the switch from diffusion to OT-CFM will affect the quality as much as the other techniques shown in the technical report. if they made kolors work with the SDXL architecture, then it's shown that hybrid transformer-UNets are still competitive. they might just not scale as well as pure DiTs.
1
0
16
1.4K
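As a rough illustration of the two training formulations contrasted in the post, the sketch below builds one scalar training pair for each. It assumes the standard DDPM forward process with an eps-prediction target, and the straight-line conditional path commonly used for OT conditional flow matching with a hypothetical `sigma_min` parameter. The function names and values are made up for this example and say nothing about how Kolors or any specific model is actually trained.

```python
import math
import random


def ddpm_eps_pair(x0, alpha_bar, eps):
    # DDPM forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    # With eps-prediction, the network regresses the noise eps from x_t.
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps


def ot_cfm_pair(x1, t, noise, sigma_min=0.0):
    # Straight-line path from noise (t = 0) to data x1 (t = 1).
    # The network regresses the constant velocity along that path.
    x_t = (1.0 - (1.0 - sigma_min) * t) * noise + t * x1
    target = x1 - (1.0 - sigma_min) * noise
    return x_t, target


random.seed(0)
data, noise = 0.7, random.gauss(0.0, 1.0)
x_t_ddpm, eps_target = ddpm_eps_pair(data, alpha_bar=0.5, eps=noise)
x_t_flow, v_target = ot_cfm_pair(data, t=0.25, noise=noise)
print("ddpm input/target:", x_t_ddpm, eps_target)
print("flow input/target:", x_t_flow, v_target)
```

In both cases the clean sample can be recovered from the noisy input and the regression target, so the difference lies in the path and the parameterization rather than in what information the network is asked to learn.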
Jay @memmaptensor
@EsotericCofe @yifever bmi 17, i need a healthy way to gain weight and solve sleep deprivation homie
2
0
2
129
Nucleus☕️ @EsotericCofe
@yifever going on a diet now to achieve that trap aesthetic
2
0
11
1.3K
Jay @memmaptensor
i'd like to continue working on anime animation tech. version 1 is designed to be distilled for realtime inference. version 2 won't be concerned with realtime inference and would probably be based on a flow-matching mmdit with more fine-grained control and even better quality.
2
0
24
1K
Jay @memmaptensor
training is done in 2 days!!! as for inference compute requirements: it's basically the same burden as animatediff. will probably work on realtime inference next month after figuring out life. realtime txt/vid2vid on distilled models is already empirically shown to be possible.
1
0
9
977
Jay @memmaptensor
@EsotericCofe @anifusion_ai Not realtime yet (still in roadmap) but we could get this deployed on anifusion before. The pose sequence is derived from mocap data from XR Animator, but we can probably work out a custom pipeline.
3
1
21
1.4K
Jay @memmaptensor
yo @EsotericCofe @anifusion_ai wanna join forces? best manga tool + best character animation model = ???
12
29
254
25.6K