Jama 🚀

454 posts


@jama_______

☁️ founder and lifelong student 👨‍💼 MBA @UCLAAnderson + 🤓 alumnus @Caltech 🐕 rare breed of biz & tech nerd 🌴 #longLA

Los Angeles · Joined July 2014
994 Following · 273 Followers
Jama 🚀 retweeted
AK @_akhaliq
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
discuss: huggingface.co/papers/2408.03…
We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.
11 replies · 157 reposts · 812 likes · 66.4K views
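The core idea in the tweet above, storing a surface patch's geometry in the channels of a fixed 64x64 image, can be sketched in plain Python. This is a hypothetical toy layout (a UV-sampled sphere patch), not the paper's actual encoding:

```python
import math

# Toy "object image": a 64x64 grid whose pixels hold (x, y, z) surface
# coordinates instead of colors, so a 2D image model could emit geometry.
H = W = 64
object_image = []
for i in range(H):
    u = 0.1 + (math.pi - 0.2) * i / (H - 1)      # polar angle over the patch
    row = []
    for j in range(W):
        v = 2 * math.pi * j / (W - 1)            # azimuth
        row.append((math.sin(u) * math.cos(v),   # x channel
                    math.sin(u) * math.sin(v),   # y channel
                    math.cos(u)))                # z channel
    object_image.append(row)

# "Decoding" the image is just reinterpreting its pixels as a point cloud.
points = [p for row in object_image for p in row]
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in points]
print(len(points))                        # 4096 points from one 64x64 image
print(round(sum(radii) / len(radii), 6))  # every pixel lies on the unit sphere
```

Because the geometry lives in an ordinary image grid, an off-the-shelf image diffusion model can in principle generate it pixel by pixel, which is the paper's stated motivation.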
Jama 🚀 retweeted
Vaibhav (VB) Srivastav @reach_vb
Apple dropped 4M: Massively Multimodal Masked Modeling! 🔥 Is this what powers the on-device vision-text backbone?
> A framework for training any-to-any multimodal foundational models. Training/ Finetuning/ Inference.
> Release 4M-7 and 4M-21 model checkpoints (trained across tens of tasks and modalities).
> 198M, 705M and 2.8B model checkpoints.
> Release specialised Text to Image and image super resolution specialist model checkpoints.
> Apache 2.0 license for code and weights!
> Essentially a unified transformer encoder-decoder model trained on a masked modeling objective.
> Spread across RGB, Edge, Geometric, Text, Semantic, Feature map, and more modalities.
> Model checkpoints on the Hub 🤗
Kudos to EPFL and Apple. Especially liked the any-to-any generation bit paired with multimodal chained generation! ⚡
3 replies · 117 reposts · 664 likes · 71.8K views
Jama 🚀 retweeted
AK @_akhaliq
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion…
6 replies · 88 reposts · 549 likes · 194.4K views
Jama 🚀 retweeted
AK @_akhaliq
Meta announces SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our…
2 replies · 104 reposts · 546 likes · 46.1K views
Jama 🚀 retweeted
AK @_akhaliq
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains…
4 replies · 38 reposts · 186 likes · 22.9K views
Jama 🚀 retweeted
AK @_akhaliq
V3D: Video Diffusion Models are Effective 3D Generators
Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency.
4 replies · 47 reposts · 271 likes · 32.1K views
Jama 🚀 retweeted
AK @_akhaliq
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Feed-forward 3D generative models like the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, the transformer-based methods do not leverage the geometric…
3 replies · 72 reposts · 308 likes · 103.9K views
Jama 🚀 retweeted
MrNeRF @janusch_patas
Forget about #Sora. DUSt3R is the real deal. I took two pictures of our kitchen that barely overlap. It took << 2 sec on an RTX 4090 to reconstruct it at insane quality. Can we get out a point cloud for Gaussian Splatting #3DGS training + the camera poses?
29 replies · 158 reposts · 1.1K likes · 103.3K views
Jama 🚀 retweeted
AK @_akhaliq
Snap presents Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to…
3 replies · 39 reposts · 224 likes · 43K views
Jama 🚀 retweeted
AK @_akhaliq
Ideogram AI presents Ideogram 1.0, a text-to-image model that offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting.
4 replies · 90 reposts · 529 likes · 96.1K views
Jama 🚀 retweeted
Jim Fan @DrJimFan
A neural network can smell like humans do for the first time! 👃🏽 Digital smell is a modality the AI community has long ignored, but maybe one day useful for a robot chef 👩🏽‍🍳? Here's how to do smell2text:
1. Collect 5,000 molecules and ask humans to label them "creamy, chocolate, alcoholic, beefy, spicy, citrus", etc. This dataset is one of a kind and a huge contribution from the paper.
2. Train a graph neural network (GNN) to map each molecule to its labels. Each molecule is a graph of atoms described by valence, degree, hydrogen count, hybridization, formal charge, atomic number, etc.
3. The GNN's predictions match well with expert humans on novel smells.
4. The embeddings give us a "Principal Odor Map (POM)" that faithfully represents hierarchies and distances among odorants.
117 replies · 745 reposts · 3.8K likes · 836.2K views
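Step 2 of the recipe above (molecule graph in, odor scores out) can be sketched as toy message passing in plain Python. The atom features, weights, and odor labels below are made up for illustration; the actual paper trains a real GNN on the labeled dataset:

```python
# Ethanol's heavy atoms (C-C-O) as a tiny molecular graph.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [6.0, 4.0], 1: [6.0, 4.0], 2: [8.0, 2.0]}  # [atomic number, valence]

def message_pass(feats, adj):
    """One GNN round: each atom mixes in the average of its neighbors' features."""
    out = {}
    for node, vec in feats.items():
        nbrs = [feats[n] for n in adj[node]]
        agg = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out[node] = [0.5 * a + 0.5 * b for a, b in zip(vec, agg)]
    return out

# Two rounds of message passing, then mean-pool atoms into one embedding.
hidden = message_pass(message_pass(features, adjacency), adjacency)
pooled = [sum(col) / len(hidden) for col in zip(*hidden.values())]

# Readout: score each odor label with toy linear weights.
odor_weights = {"alcoholic": [0.10, -0.05], "beefy": [-0.02, 0.01]}
scores = {label: sum(w * x for w, x in zip(ws, pooled))
          for label, ws in odor_weights.items()}
print(scores)
```

A trained model would learn the mixing and readout weights from the human labels; the pooled embedding here plays the role of a point in the Principal Odor Map.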
Jama 🚀 retweeted
Micah Berkley - The 50 Cent of AI.
So I did a thing... I complained about it long enough, so I wrote my first peer-reviewed @arxiv paper... and I got shredded. 🤣 lmaooo. Not on the content necessarily; that was solid and Ivy League worthy. I got toasted on my use of urban language, professionalism, and depth of knowledge.

The paper is about how advanced AI like GANs & VAEs (Stable Diffusion, MJ, Dall-E) can overtly hyper-sexualize female/male representation & often boost the biases in their learning data, especially against my African American folks. Diving deep into how these monster models often get race & gender wrong with AA, and it's an insanely poor reflection of this culture. Breaking down the real talk on diversity in AI imagery & pushing for a change toward more inclusive & fair AI development.

Here's a shortened version of my abstract, not to bore you...

"Generative artificial intelligence (AI) models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have shown remarkable capabilities in synthesizing realistic images, including human likenesses. Despite their advancements, these models inherit and potentially amplify societal biases present in their training datasets, particularly concerning race and gender representations. This paper examines the portrayal of African American men and women by generative image models, hypothesizing that these models often rely on datasets that do not accurately represent the diversity and reality of African American life. By analyzing the outcomes of generated images, this study aims to uncover biases in attractiveness, economic status, and other societal attributes, contributing to the discussion on ethical AI development and the importance of inclusive training datasets."

Below is one of my many roasts. I will reveal the entire paper once I get these suggestions remediated.
#edTech #AIethics #DiversityInTech #InclusiveAI #BiasInAI #GenerativeModels #TechEquity #AIForGood #SocialImpactOfAI #RacialBiasInAI #GenderBiasInAI #AIArt #ChatGPT #Dalle #Midjourney #GPT4 #GPT5
Miami, FL 🇺🇸 · 8 replies · 1 repost · 26 likes · 2K views
Jama 🚀 retweeted
AK @_akhaliq
Google presents Genie: Generative Interactive Environments
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
77 replies · 500 reposts · 2.3K likes · 684.2K views
Jama 🚀 retweeted
AK @_akhaliq
Google releases Gemma, a family of lightweight, state-of-the-art open models for their class, built from the same research & tech used to create the Gemini models. Gemma is available worldwide starting today in two sizes (2B and 7B), supports a wide range of tools and systems, and runs on a developer laptop or workstation.
17 replies · 160 reposts · 808 likes · 125.2K views
Jama 🚀 retweeted
AK @_akhaliq
AnimateLCM-SVD-xt is out: fast image-to-video generation. AnimateLCM-SVD-xt can generally produce videos of good quality in 4 steps without requiring classifier-free guidance, and can therefore save 25 x 2 / 4 = 12.5 times the computation compared with normal SVD models.
6 replies · 60 reposts · 303 likes · 64.1K views
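The savings arithmetic in the tweet assumes classifier-free guidance doubles the per-step cost of standard SVD (two forward passes per denoising step), while the distilled model runs 4 unguided steps:

```python
# Cost model behind the "25 x 2 / 4 = 12.5" figure in the tweet.
svd_steps, svd_passes_per_step = 25, 2   # standard SVD: 25 steps, CFG doubles each step
lcm_steps, lcm_passes_per_step = 4, 1    # AnimateLCM-SVD-xt: 4 steps, no CFG
speedup = (svd_steps * svd_passes_per_step) / (lcm_steps * lcm_passes_per_step)
print(speedup)  # 12.5
```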
Jama 🚀 retweeted
Saining Xie @sainingxie
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community. What we have learned so far:
- Architecture: Sora is built on our diffusion transformer (DiT) model (published in ICCV 2023) — it's a diffusion model with a transformer backbone, in short: DiT = [VAE encoder + ViT + DDPM + VAE decoder]. According to the report, it seems there aren't many additional bells and whistles.
- "Video compressor network": Looks like it's just a VAE but trained on raw video data. Tokenization probably plays a significant role in getting good temporal consistency. By the way, VAE is a ConvNet, so DiT technically is a hybrid model ;) (1/n)
39 replies · 523 reposts · 2.6K likes · 1.3M views
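The DiT decomposition in the thread (VAE encoder → transformer denoiser over patch tokens → VAE decoder) can be followed as pure shape bookkeeping. The dimensions below are toy values chosen for illustration, not Sora's actual configuration:

```python
# Shape-level sketch of the DiT pipeline: pixels are compressed to a latent
# grid, the latent is patchified into tokens for the transformer core
# (the ViT + DDPM part), and the denoised latent is decoded back to pixels.

def vae_encode(frame_hw, downsample=8):
    """Pixels -> latent grid (spatial compression only, in this toy)."""
    h, w = frame_hw
    return (h // downsample, w // downsample)

def patchify(latent_hw, patch=2):
    """Latent grid -> number of transformer tokens."""
    h, w = latent_hw
    return (h // patch) * (w // patch)

def vae_decode(latent_hw, downsample=8):
    """Latent grid -> pixel resolution after decoding."""
    h, w = latent_hw
    return (h * downsample, w * downsample)

frame = (256, 256)
latent = vae_encode(frame)     # (32, 32) latent grid
tokens = patchify(latent)      # 256 tokens per frame for the transformer
decoded = vae_decode(latent)   # (256, 256) pixels after denoising
print(latent, tokens, decoded)
```

For video, the same bookkeeping would extend over time as well, which is where the "video compressor network" and its tokenization come in.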
Jama 🚀 retweeted
Eduardo Borges @duborges
This video was generated by Sora, the new model by OpenAI and the most advanced text-to-video tool created so far. I'll share the videos here. Absolutely insane. Prompt: This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird's head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird's striking appearance.
751 replies · 2.3K reposts · 16.8K likes · 17M views
Jama 🚀 retweeted
Sean McDonald @seanmcdonaldxyz
Incredible presenters this month for @AITinkerers LA: @WilliamBakst on "Extraction: Making Using Tools With OpenAI Clean And Simple", @ryanrsamii on an "AI POC Demo" for the legal industry, and @jama_______ on "AI based YouTube personalities".
2 replies · 5 reposts · 5 likes · 820 views