Jama 🚀

454 posts


@jama_______

☁️ founder and lifelong student 👨‍💼 MBA @UCLAAnderson + 🤓 alumnus @Caltech 🐕 rare breed of biz & tech nerd 🌴 #longLA

Los Angeles · Joined July 2014
994 Following · 273 Followers
Jama 🚀 retweeted
AK @_akhaliq
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
discuss: huggingface.co/papers/2408.03…
We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.
11 replies · 157 reposts · 812 likes · 66.4K views
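The core idea in the tweet above, storing a surface patch's geometry in the channels of a fixed 64x64 image, can be sketched in plain Python. This is a hypothetical toy layout (a UV-sampled sphere patch), not the paper's actual encoding:

```python
import math

# Toy "object image": a 64x64 grid whose pixels hold (x, y, z) surface
# coordinates instead of colors, so a 2D image model could emit geometry.
H = W = 64
object_image = []
for i in range(H):
    u = 0.1 + (math.pi - 0.2) * i / (H - 1)      # polar angle over the patch
    row = []
    for j in range(W):
        v = 2 * math.pi * j / (W - 1)            # azimuth
        row.append((math.sin(u) * math.cos(v),   # x channel
                    math.sin(u) * math.sin(v),   # y channel
                    math.cos(u)))                # z channel
    object_image.append(row)

# "Decoding" the image is just reinterpreting its pixels as a point cloud.
points = [p for row in object_image for p in row]
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in points]
print(len(points))                        # 4096 points from one 64x64 image
print(round(sum(radii) / len(radii), 6))  # every pixel lies on the unit sphere
```

Because the geometry lives in an ordinary image grid, an off-the-shelf image diffusion model can in principle generate it pixel by pixel, which is the paper's stated motivation.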
Jama 🚀 retweeted
Vaibhav (VB) Srivastav @reach_vb
Apple dropped 4M: Massively Multimodal Masked Modeling! 🔥 Is this what powers the on-device vision-text backbone?
> A framework for training any-to-any multimodal foundational models. Training/ Finetuning/ Inference.
> Release 4M-7 and 4M-21 model checkpoints (trained across tens of tasks and modalities).
> 198M, 705M and 2.8B model checkpoints.
> Release specialised Text to Image and image super resolution specialist model checkpoints.
> Apache 2.0 license for code and weights!
> Essentially a unified transformer encoder-decoder model trained on a masked modeling objective.
> Spread across RGB, Edge, Geometric, Text, Semantic, Feature map, and more modalities.
> Model checkpoints on the Hub 🤗
Kudos to EPFL and Apple. Especially liked the any-to-any generation bit paired with multimodal chained generation! ⚡
3 replies · 117 reposts · 664 likes · 71.8K views
Jama 🚀 retweeted
AK @_akhaliq
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion…
6 replies · 88 reposts · 549 likes · 194.4K views
Jama 🚀 retweeted
AK @_akhaliq
Meta announces SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our…
2 replies · 104 reposts · 546 likes · 46.1K views
Jama 🚀 retweeted
AK @_akhaliq
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains…
4 replies · 38 reposts · 186 likes · 22.9K views
Jama 🚀 retweeted
AK @_akhaliq
V3D: Video Diffusion Models are Effective 3D Generators
Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency.
4 replies · 47 reposts · 271 likes · 32.1K views
Jama 🚀 retweeted
AK @_akhaliq
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Feed-forward 3D generative models like the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, the transformer-based methods do not leverage the geometric…
3 replies · 72 reposts · 308 likes · 103.9K views
Jama 🚀 retweeted
MrNeRF @janusch_patas
Forget about #Sora. DUSt3R is the real deal. I took two pictures of our kitchen that barely overlap. It took << 2 sec on an RTX 4090 to reconstruct it at insane quality. Can we get out a point cloud for Gaussian Splatting #3DGS training + the camera poses?
29 replies · 158 reposts · 1.1K likes · 103.3K views
Jama 🚀 retweeted
AK @_akhaliq
Snap presents Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to…
3 replies · 39 reposts · 224 likes · 43K views
Jama 🚀 retweeted
AK @_akhaliq
Ideogram AI presents Ideogram 1.0, a text-to-image model that offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting.
4 replies · 90 reposts · 529 likes · 96.1K views
Jama 🚀 retweeted
Jim Fan @DrJimFan
A neural network can smell like humans do for the first time! 👃🏽 Digital smell is a modality the AI community has long ignored, but maybe one day useful for a robot chef 👩🏽‍🍳? Here's how to do smell2text:
1. Collect 5,000 molecules and ask humans to label them "creamy, chocolate, alcoholic, beefy, spicy, citrus", etc. This dataset is one of a kind and a huge contribution from the paper.
2. Train a graph neural network (GNN) to map each molecule to its labels. Each molecule is a graph of atoms described by valence, degree, hydrogen count, hybridization, formal charge, atomic number, etc.
3. The GNN's predictions match well with expert humans on novel smells.
4. The embeddings give us a "Principal Odor Map (POM)" that faithfully represents hierarchies and distances among odorants.
117 replies · 745 reposts · 3.8K likes · 836.2K views
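Step 2 of the recipe above (molecule graph in, odor scores out) can be sketched as toy message passing in plain Python. The atom features, weights, and odor labels below are made up for illustration; the actual paper trains a real GNN on the labeled dataset:

```python
# Ethanol's heavy atoms (C-C-O) as a tiny molecular graph.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [6.0, 4.0], 1: [6.0, 4.0], 2: [8.0, 2.0]}  # [atomic number, valence]

def message_pass(feats, adj):
    """One GNN round: each atom mixes in the average of its neighbors' features."""
    out = {}
    for node, vec in feats.items():
        nbrs = [feats[n] for n in adj[node]]
        agg = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out[node] = [0.5 * a + 0.5 * b for a, b in zip(vec, agg)]
    return out

# Two rounds of message passing, then mean-pool atoms into one embedding.
hidden = message_pass(message_pass(features, adjacency), adjacency)
pooled = [sum(col) / len(hidden) for col in zip(*hidden.values())]

# Readout: score each odor label with toy linear weights.
odor_weights = {"alcoholic": [0.10, -0.05], "beefy": [-0.02, 0.01]}
scores = {label: sum(w * x for w, x in zip(ws, pooled))
          for label, ws in odor_weights.items()}
print(scores)
```

A trained model would learn the mixing and readout weights from the human labels; the pooled embedding here plays the role of a point in the Principal Odor Map.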
Jama 🚀 retweeted
Micah Berkley - The 50 Cent of AI.
So I did a thing... I complained about it long enough, so I wrote my first peer-reviewed @arxiv paper... and I got shredded. 🤣 lmaooo. Not on the content necessarily; that was solid and Ivy League worthy. I got toasted on my use of urban language, professionalism, and depth of knowledge.

The paper is about how advanced AI like GANs & VAEs (Stable Diffusion, MJ, Dall-E) can overtly hyper-sexualize female/male representation & often boost the biases in their learning data, especially against my African American folks. Diving deep into how these monster models often get race & gender wrong with AA, and it's an insanely poor reflection of this culture. Breaking down the real talk on diversity in AI imagery & pushing for a change toward more inclusive & fair AI development.

Here's a shortened version of my abstract, not to bore you...

"Generative artificial intelligence (AI) models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have shown remarkable capabilities in synthesizing realistic images, including human likenesses. Despite their advancements, these models inherit and potentially amplify societal biases present in their training datasets, particularly concerning race and gender representations. This paper examines the portrayal of African American men and women by generative image models, hypothesizing that these models often rely on datasets that do not accurately represent the diversity and reality of African American life. By analyzing the outcomes of generated images, this study aims to uncover biases in attractiveness, economic status, and other societal attributes, contributing to the discussion on ethical AI development and the importance of inclusive training datasets."

Below is one of my many roasts. I will reveal the entire paper once I get these suggestions remediated.
#edTech #AIethics #DiversityInTech #InclusiveAI #BiasInAI #GenerativeModels #TechEquity #AIForGood #SocialImpactOfAI #RacialBiasInAI #GenderBiasInAI #AIArt #ChatGPT #Dalle #Midjourney #GPT4 #GPT5
Miami, FL 🇺🇸 · 8 replies · 1 repost · 26 likes · 2K views
Jama 🚀 retweeted
AK @_akhaliq
Google presents Genie: Generative Interactive Environments
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
77 replies · 500 reposts · 2.3K likes · 684.2K views
Jama 🚀 retweeted
AK @_akhaliq
Google releases Gemma, a family of lightweight, state-of-the-art open models for their class, built from the same research & tech used to create the Gemini models. Gemma is available worldwide starting today in two sizes (2B and 7B), supports a wide range of tools and systems, and runs on a developer laptop or workstation.
17 replies · 160 reposts · 808 likes · 125.2K views
Jama 🚀 retweeted
AK @_akhaliq
AnimateLCM-SVD-xt is out: fast image-to-video generation. AnimateLCM-SVD-xt can generally produce videos of good quality in 4 steps without requiring classifier-free guidance, and can therefore save 25 x 2 / 4 = 12.5 times the computation compared with normal SVD models.
6 replies · 60 reposts · 303 likes · 64.1K views
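The savings arithmetic in the tweet assumes classifier-free guidance doubles the per-step cost of standard SVD (two forward passes per denoising step), while the distilled model runs 4 unguided steps:

```python
# Cost model behind the "25 x 2 / 4 = 12.5" figure in the tweet.
svd_steps, svd_passes_per_step = 25, 2   # standard SVD: 25 steps, CFG doubles each step
lcm_steps, lcm_passes_per_step = 4, 1    # AnimateLCM-SVD-xt: 4 steps, no CFG
speedup = (svd_steps * svd_passes_per_step) / (lcm_steps * lcm_passes_per_step)
print(speedup)  # 12.5
```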
Jama 🚀 retweeted
Saining Xie @sainingxie
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community. What we have learned so far:
- Architecture: Sora is built on our diffusion transformer (DiT) model (published in ICCV 2023) — it's a diffusion model with a transformer backbone, in short: DiT = [VAE encoder + ViT + DDPM + VAE decoder]. According to the report, it seems there aren't many additional bells and whistles.
- "Video compressor network": Looks like it's just a VAE but trained on raw video data. Tokenization probably plays a significant role in getting good temporal consistency. By the way, VAE is a ConvNet, so DiT technically is a hybrid model ;) (1/n)
39 replies · 523 reposts · 2.6K likes · 1.3M views
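The DiT decomposition in the thread (VAE encoder → transformer denoiser over patch tokens → VAE decoder) can be followed as pure shape bookkeeping. The dimensions below are toy values chosen for illustration, not Sora's actual configuration:

```python
# Shape-level sketch of the DiT pipeline: pixels are compressed to a latent
# grid, the latent is patchified into tokens for the transformer core
# (the ViT + DDPM part), and the denoised latent is decoded back to pixels.

def vae_encode(frame_hw, downsample=8):
    """Pixels -> latent grid (spatial compression only, in this toy)."""
    h, w = frame_hw
    return (h // downsample, w // downsample)

def patchify(latent_hw, patch=2):
    """Latent grid -> number of transformer tokens."""
    h, w = latent_hw
    return (h // patch) * (w // patch)

def vae_decode(latent_hw, downsample=8):
    """Latent grid -> pixel resolution after decoding."""
    h, w = latent_hw
    return (h * downsample, w * downsample)

frame = (256, 256)
latent = vae_encode(frame)     # (32, 32) latent grid
tokens = patchify(latent)      # 256 tokens per frame for the transformer
decoded = vae_decode(latent)   # (256, 256) pixels after denoising
print(latent, tokens, decoded)
```

For video, the same bookkeeping would extend over time as well, which is where the "video compressor network" and its tokenization come in.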
Jama 🚀 retweeted
Eduardo Borges @duborges
This video was generated by Sora, the new model by OpenAI and the most advanced text-to-video tool created so far. I'll share the videos here. Absolutely insane. Prompt: This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird's head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird's striking appearance.
751 replies · 2.3K reposts · 16.8K likes · 17M views
Jama 🚀 retweeted
Sean McDonald @seanmcdonaldxyz
Incredible presenters this month for @AITinkerers LA: @WilliamBakst on "Extraction: Making Using Tools With OpenAI Clean And Simple", @ryanrsamii on an "AI POC Demo" for the legal industry, and @jama_______ on "AI based YouTube personalities".
2 replies · 5 reposts · 5 likes · 820 views