Jay

27 posts

@memmaptensor

Independent researcher

Bangkok Joined December 2018

82 Following 602 Followers

Pinned Tweet

Jay@memmaptensor·1d

I spent $30k and 3 months RL post-training an anime video model. This is only step 30 out of a planned 1000 step run. All samples are local text-to-video with no reference image/audio. Since it's based on LTX-2.3, each output takes under a minute on a single GPU. I'm 19 and a solo researcher. Most of the budget went into ablations, reward design, and trying different configurations before reaching this setup. The run is still extremely early, but the results already look much better than I expected. It's compute-limited, not idea-limited. I'm starting a company to continue scaling this and build frontier stylized video models. If you're an investor, compute partner, video team, or someone who wants to help build this, DMs are open.

328

18.5K

Jay@memmaptensor·3 Mar

Looking for a co-founder to build the next generation of waifu tech! Figured out the solution to create a new interactive experience but struggle with app dev. Kind of the downside of focusing too much on ML. Ideally someone with mobile/web and maybe some cloud ML experience.

1.3K

Jay@memmaptensor·13 Sep

@_akhaliq they cookin

AK@_akhaliq·13 Sep

Tencent presents GameGen-O: Open-world Video Game Generation. We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video generation and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, efficiently combining creative generation with interactive capabilities.

559

2.9K

366.9K

Jay@memmaptensor·27 Jul

dehumidifiers are so good when it's cold and humid

979

Jay@memmaptensor·24 Jul

I've set out some conditions for any future solvers to add:
• No implicit solvers: those require root-finding, meaning a Jacobian has to be computed by running a backward pass through the model during LBFGS optimization.
• No non-RK methods: this would cut off linear multistep methods like Adams-Bashforth or Adams predictor-corrector. From my testing, RK methods perform better, even for explicit RK vs. predictor-corrector linear multistep.
• No duplicate methods: if two methods have different coefficients, they aren't duplicates (scipy methods have different coefficients and solver implementations).
That means the current 31 solvers are almost all that exist to satisfy the conditions above. Project's done! I need to figure out what to make next.
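To illustrate the first condition: an implicit step defines the next state through an equation that has to be solved, so each step costs several model evaluations instead of one. The sketch below is a minimal pure-Python illustration, not the node's code. It uses backward Euler with plain fixed-point iteration as a stand-in for the root-finder (the tweet mentions LBFGS), and the function names are hypothetical.

```python
def explicit_euler_step(f, t, y, h):
    """Forward Euler: one evaluation of f per step."""
    return y + h * f(t, y), 1


def implicit_euler_step(f, t, y, h, tol=1e-10, max_iter=100):
    """Backward Euler: solve y_next = y + h * f(t + h, y_next).

    Assumption: fixed-point iteration replaces the LBFGS/Jacobian-based
    root-finder from the tweet. Returns (y_next, number of f evaluations).
    """
    guess = y
    evals = 0
    for _ in range(max_iter):
        new_guess = y + h * f(t + h, guess)
        evals += 1
        if abs(new_guess - guess) < tol:
            return new_guess, evals
        guess = new_guess
    raise RuntimeError("fixed-point iteration did not converge")


def decay(t, y):
    """Toy ODE dy/dt = -y, standing in for the denoising model."""
    return -y


y_explicit, evals_explicit = explicit_euler_step(decay, 0.0, 1.0, 0.1)
y_implicit, evals_implicit = implicit_euler_step(decay, 0.0, 1.0, 0.1)
print(evals_explicit, evals_implicit)
```

With a diffusion model in place of `decay`, every extra evaluation is a full forward pass, which is the cost the condition is meant to avoid.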

903

Jay@memmaptensor·24 Jul

Last major update!
• Added solver settings for adaptive_scipy
• Adaptive solvers now show the number of steps taken
• Accurate 𝜎 timestep info is now displayed
Check out the most comprehensive fixed and adaptive higher-order samplers on ComfyUI! github.com/wootwootwootwo…

2.5K

Jay@memmaptensor·22 Jul

Refactored and fixed some bugs with the progress bar! Also wrapped the solvers from scipy.integrate. If you count the a-methods as 2 (since they work with both the adaptive_pid and fixed_scheduled controllers), then this node has 31 new samplers (excluding forward euler)! I also tried the implicit solvers and they didn't work: every implicit solver has a root-finding step, and that takes forever to converge. That leaves 3 new methods from scipy: se_RK23, se_RK45, and se_DOP853. I think this node has the most new working samplers for ComfyUI (a for adaptive, f for fixed, s for scipy, e for explicit).
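As a rough picture of what an adaptive explicit sampler does, the sketch below runs an embedded RK pair with step-size control and counts accepted and rejected steps. It assumes the Bogacki-Shampine 3(2) pair (the pair scipy's RK23 is based on) and a simple proportional controller, not the adaptive_pid controller named above. The function names and the toy ODE are hypothetical.

```python
import math


def bs23_step(f, t, y, h):
    """One Bogacki-Shampine step: 3rd-order result plus an error estimate."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
    y3 = y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)
    k4 = f(t + h, y3)
    # Embedded 2nd-order solution, used only to estimate the local error.
    y2 = y + h * (7 / 24 * k1 + 1 / 4 * k2 + 1 / 3 * k3 + 1 / 8 * k4)
    return y3, abs(y3 - y2)


def integrate_adaptive(f, t0, t1, y0, tol=1e-6, h=0.1):
    """Integrate from t0 to t1; returns (y, accepted steps, rejected steps)."""
    t, y = t0, y0
    accepted = rejected = 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)  # never step past the end point
        y_new, err = bs23_step(f, t, y, h)
        if err <= tol:
            t, y = t + h, y_new
            accepted += 1
        else:
            rejected += 1
        # Proportional controller: rescale h from the error estimate.
        factor = 0.9 * (tol / err) ** (1 / 3) if err > 0 else 5.0
        h *= min(5.0, max(0.2, factor))
    return y, accepted, rejected


y_end, n_accepted, n_rejected = integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0)
print(y_end, math.exp(-1.0), n_accepted, n_rejected)
```

The step count is not known in advance, which is why reporting the number of steps taken is useful for adaptive samplers.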

650

Jay@memmaptensor·22 Jul

the new class of models idea didn't work out well, so i tried this instead (which works decently well)

577

Jay@memmaptensor·22 Jul

While trying to push the CFG scale up, I implemented some Explicit RK solvers for ComfyUI
- 10 new adaptive step samplers
- 8 unique fixed step samplers (excluding forward euler)
- Best new sampler (perhaps) -> fe_ralston3
Check it out! github.com/wootwootwootwo…
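For reference, a fixed-step explicit RK sampler amounts to a few weighted model evaluations per step. The sketch below assumes fe_ralston3 refers to Ralston's third-order method (nodes 0, 1/2, 3/4 and weights 2/9, 1/3, 4/9). That mapping is an assumption, and the code is a scalar toy rather than the ComfyUI implementation.

```python
import math


def ralston3_step(f, t, y, h):
    """One step of Ralston's third-order explicit Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
    return y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)


def euler_step(f, t, y, h):
    """Forward Euler, for comparison."""
    return y + h * f(t, y)


def integrate_fixed(step, f, t0, t1, y0, n_steps):
    """Apply a fixed-step method n_steps times between t0 and t1."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Toy ODE dy/dt = -y with exact solution exp(-t)."""
    return -y


exact = math.exp(-1.0)
err_ralston = abs(integrate_fixed(ralston3_step, decay, 0.0, 1.0, 1.0, 10) - exact)
err_euler = abs(integrate_fixed(euler_step, decay, 0.0, 1.0, 1.0, 10) - exact)
print(err_ralston, err_euler)
```

Ralston's method spends three evaluations per step where Euler spends one, so a fair comparison in a sampler matches total model calls rather than step counts.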

6.5K

Jay@memmaptensor·17 Jul

why do i prefer undersampled results 😭😭😭

609

Jay@memmaptensor·17 Jul

@amogh42 nope, something a lot simpler

Amogh Vaishampayan@amogh42·17 Jul

@_wootwoot you mean something like ELLA for SDXL? or Kolors approach with a LLM as a text encoder?

110

Jay@memmaptensor·17 Jul

i have an idea for a slightly modified class of SDXL models that would mostly be compatible with existing finetunes and loras. it's been proven to work well on SD1.5 with good results. definitely next on my bucket list. will post updates and releases soon, hopefully, if it works.

932

Jay@memmaptensor·17 Jul

diffusion models are definitely still not dead. sure, optimal transport conditional flow matching is provably better, but so much of the community was already built on discrete time diffusion. and with kolors out (an SDXL model trained with the DDPM formulation and an eps-pred objective), i doubt the switch from diffusion to OT-CFM will affect the quality as much as the other techniques shown in the technical report. if they made kolors work with the SDXL architecture, then it's shown that hybrid transformer-UNets are still competitive. they might just not scale as well as pure DiTs.
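For readers unfamiliar with the two formulations, the sketch below contrasts what the network regresses in each case. It is a scalar toy under stated assumptions: the flow-matching side uses the straight-line path with sigma_min = 0 and independent noise/data pairing (no minibatch optimal-transport coupling), with t = 0 as noise and t = 1 as data. The function names are hypothetical and nothing here is taken from the Kolors report.

```python
import math


def ddpm_eps_pred_pair(x0, eps, alpha_bar_t):
    """DDPM with eps-prediction: noisy input and regression target.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    and the network is trained to predict eps.
    """
    x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


def linear_flow_matching_pair(x_data, noise, t):
    """Flow matching on a straight path (assumes sigma_min = 0).

    x_t = (1 - t) * noise + t * x_data,
    and the network is trained to predict the constant velocity x_data - noise.
    """
    x_t = (1.0 - t) * noise + t * x_data
    return x_t, x_data - noise


# Same data point and noise sample, seen through both formulations.
x_t_ddpm, target_ddpm = ddpm_eps_pred_pair(x0=2.0, eps=-1.0, alpha_bar_t=0.25)
x_t_fm, target_fm = linear_flow_matching_pair(x_data=2.0, noise=-1.0, t=0.5)
print(x_t_ddpm, target_ddpm, x_t_fm, target_fm)
```

Both are regressions on a noised sample. The difference lies in the interpolation path and the target, which is consistent with the tweet's point that the choice of formulation may matter less than other training decisions.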

1.4K

Jay@memmaptensor·16 Jul

@EsotericCofe @yifever bmi 17, i need a healthy way to gain weight and solve sleep deprivation homie

129

Nucleus☕️@EsotericCofe·16 Jul

@yifever going on a diet now to achieve that trap aesthetic

1.3K

yifei e/λ (meetmeinshibuya may 24)@yifever·16 Jul

it should be illegal to cosplay if your BMI is greater than 20

448

42.9K

Jay@memmaptensor·15 Jul

i'd like to continue working on anime animation tech. version 1 is designed to be distilled for realtime inference. version 2 won't be concerned with realtime inference and would probably be based on a flow-matching mmdit with more fine-grained control and even better quality.

Jay@memmaptensor·15 Jul

training is done in 2 days!!! as for inference compute requirements: it's basically the same burden as animatediff. will probably work on realtime inference next month after figuring out life. realtime txt/vid2vid on distilled models is already empirically shown to be possible.

977

Jay@memmaptensor·14 Jul

@EsotericCofe @anifusion_ai Not realtime yet (still in roadmap) but we could get this deployed on anifusion before. The pose sequence is derived from mocap data from XR Animator, but we can probably work out a custom pipeline.

1.4K

Nucleus☕️@EsotericCofe·14 Jul

@_wootwoot @anifusion_ai Yoooo this is pretty sweet. Is this real time? Where did you get the initial animation data?

1.6K

Jay@memmaptensor·14 Jul

yo @EsotericCofe @anifusion_ai wanna join forces? best manga tool + best character animation model = ???

254

25.6K
