PeizeSun

22 posts

@PeizeSun

@xAI AGI | @grok Multimodal, Imagine | prev Meta HKU XJTU

Joined July 2022
78 Following · 301 Followers
PeizeSun@PeizeSun·
Imagine team at @xai is the best media generation team!
Yukun@YknZhu

At @xai we are making media creation more accessible, enjoyable and useful. Lots more creative features coming soon! Thanks to the amazing team that worked around the clock to ship this, and our captains @Guodzh @imhaotian ❤️

Haotian Liu@imhaotian·
I left xAI earlier this week. It was a difficult decision. The past two years have been an intense, fun, and deeply rewarding journey, and I accomplished things I could not have imagined two years ago.

Thank you @elonmusk for the opportunity and for everything I learned at xAI. Thank you @Guodzh for the trust you placed in me and for all the days and late nights we worked through together. And thank you to the entire Omni / Imagine team: thank you for your trust, and for growing together with me. It has been an honor, and I am incredibly proud of what we achieved together. I feel fortunate to have had the chance to work with all of you.

At xAI, everything feels possible. I had the chance to work with and learn from some of the most exceptional people I have ever met. I was able to explore across domains: from pretraining to post-training, from language models to multimodal, from perception to generation. Joining xAI was one of the best decisions I have ever made.

@grok imagine is special to me. Building video generation models, where I started with almost zero prior knowledge, from 0 to No.1, as an IC and as a lead, alongside an extraordinary team, and shipping it as a great product used by millions, all within 6 months, at age 28: I feel proud.

But now it's time for me to move on. I'm burnt out, and I know my happiness is no longer maximized in my current state. It is sad to say goodbye, but it is just the right time for a change. Best wishes to the Imagine team, you are absolutely the best, and you deserve the best. I will cherish all our memories for the rest of my life.

For now, I'm taking a break and giving myself time to figure out what comes next. Posted from Hawaii.
Guodong Zhang@Guodzh·
Last day at xAI. Wild journey these past three years, but excited about the next chapter. Thanks all for the love and support yesterday. So many friends made along the way, and I will miss you all!
PeizeSun@PeizeSun·
Grok Imagine supports video extension now!
Martin Ma@martin_ma_007

Since the @Grok Imagine v1.0 release, many of you have requested video extension. We hear you! It's available today 🎬 Enjoy, with more to come!!

PeizeSun@PeizeSun·
@hangg70 It is a great pleasure to work with you, Hang. Best wishes to your next journey~
Hang Gao@hangg70·
I left xAI today. It was truly rewarding to contribute to the grok imagine video series: 0.9 as our first release, then 1.0, which recently topped competitive leaderboards and user feedback. I saw a mix of humble craftsmanship and ambitious vision throughout the team. It taught me about what I want and how I want to proceed in my career. Thank you to everyone who made this journey unique and memorable.
PeizeSun reposted
AK@_akhaliq·
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative…
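The core of the recipe the abstract describes is to treat image generation exactly like language modeling: a VQ tokenizer maps an image to a grid of discrete codes, the grid is flattened in raster order, and a plain transformer is trained on shifted (input, target) pairs. A minimal sketch of that data layout (function names are illustrative, not from the paper's code):

```python
def flatten_raster(code_grid):
    """Flatten an HxW grid of VQ code indices into a 1D token sequence
    (row-major raster order, as in standard autoregressive image models)."""
    return [tok for row in code_grid for tok in row]

def next_token_pairs(tokens, bos=0):
    """Shifted (input, target) pairs for next-token prediction:
    the model sees tokens up to position t and must predict token t."""
    seq = [bos] + tokens
    return list(zip(seq[:-1], seq[1:]))

grid = [[5, 7],
        [2, 9]]                    # toy 2x2 grid of codebook indices
tokens = flatten_raster(grid)      # [5, 7, 2, 9]
pairs = next_token_pairs(tokens)   # [(0, 5), (5, 7), (7, 2), (2, 9)]
```

Sampling then runs the same loop in reverse: generate tokens one by one, reshape the sequence back into a grid, and decode it with the VQ decoder.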
PeizeSun reposted
AK@_akhaliq·
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

paper page: huggingface.co/papers/2307.03…

Instruction tuning large language models (LLMs) on image-text pairs has achieved unprecedented vision-language multimodal abilities. However, their vision-language alignment is only built at the image level; the lack of region-level alignment limits their advancement to fine-grained multimodal understanding. In this paper, we propose instruction tuning on regions of interest. The key design is to reformulate the bounding box as a spatial instruction. Interleaved sequences of visual features extracted by the spatial instruction and the language embedding are fed to the LLM, which is trained on transformed region-text data in instruction-tuning format.

Our region-level vision-language model, termed GPT4RoI, brings a brand-new conversational and interactive experience beyond image-level understanding. (1) Controllability: users can interact with our model through both language and spatial instructions to flexibly adjust the level of detail of the question. (2) Capacities: our model supports not only single-region but also multi-region spatial instructions. This unlocks more region-level multimodal capacities such as detailed region captioning and complex region reasoning. (3) Composition: any off-the-shelf object detector can serve as a spatial instruction provider to mine informative object attributes from our model, such as color, shape, material, action, and relation to other objects.
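The key design in the abstract, reformulating a bounding box as a spatial instruction, amounts to replacing each referenced region in the question with a placeholder token so that RoI features extracted at the corresponding box can be interleaved with the language embeddings at that position. A hedged sketch of that formatting step (the slot syntax and function names are illustrative assumptions, not GPT4RoI's actual API):

```python
import re

def format_spatial_instruction(template, boxes):
    """Replace {name} slots with <roiN> placeholder tokens and return the
    boxes in order of appearance, ready to be swapped for visual features.
    boxes: dict mapping name -> (x1, y1, x2, y2) in normalized coordinates."""
    order = []
    def repl(match):
        order.append(boxes[match.group(1)])
        return f"<roi{len(order)}>"
    text = re.sub(r"\{(\w+)\}", repl, template)
    return text, order

text, roi_boxes = format_spatial_instruction(
    "What is {cat} doing next to {sofa}?",
    {"cat": (0.10, 0.20, 0.40, 0.60), "sofa": (0.50, 0.10, 0.90, 0.80)},
)
# text == "What is <roi1> doing next to <roi2>?"
```

The two-slot example mirrors the multi-region capacity the tweet mentions: any number of boxes can be referenced in one question, each contributing its own placeholder.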
AK@_akhaliq·
Semantic-SAM: Segment and Recognize Anything at Any Granularity

paper page: huggingface.co/papers/2307.04…

We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic awareness and granularity abundance. To achieve semantic awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts, allowing our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work represents the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves semantic awareness and granularity abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.
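The multi-choice learning scheme means a single click is decoded into several candidate masks (one per granularity level), each of which must be paired with one of several ground-truth masks before a loss can be computed. A toy sketch of that matching step, with masks as pixel sets and greedy best-IoU assignment (the real model uses a learned decoder and Hungarian-style matching; this is only an illustration of the many-to-many supervision idea):

```python
def iou(a, b):
    """IoU of two binary masks represented as sets of pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_predictions(preds, gts):
    """Greedily assign each ground-truth mask its best unused prediction,
    returning (gt_index, pred_index, iou) triples."""
    assigned, used = [], set()
    for gi, gt in enumerate(gts):
        best = max(
            (pi for pi in range(len(preds)) if pi not in used),
            key=lambda pi: iou(preds[pi], gt),
            default=None,
        )
        if best is not None:
            used.add(best)
            assigned.append((gi, best, iou(preds[best], gt)))
    return assigned

part = {(0, 0), (0, 1)}            # fine-granularity mask (e.g. a part)
whole = part | {(1, 0), (1, 1)}    # coarse-granularity mask (the object)
matches = match_predictions([whole, part], [part, whole])
# the part GT is matched to the part prediction, the whole GT to the whole one
```

Because every level of prediction gets its own ground truth, one click can supervise masks from part level up to whole-object level simultaneously, which is what enables the multi-level output at inference time.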
PeizeSun@PeizeSun·
@altryne @_akhaliq 4. We will properly discuss the related work in our next arXiv version.
PeizeSun@PeizeSun·
@altryne @_akhaliq 3. The semantics of SSA are mostly object-level, as the pre-trained BLIP has no part-level training data and cannot generate part-level semantics. Instead, we train on part-level segmentation data to be aware of both object- and part-level semantics.
PeizeSun@PeizeSun·
@altryne @_akhaliq 2. SSA is a combination of two pre-trained models. Based on a pre-trained SAM, it uses another pre-trained vision-language model (BLIP) to generate semantics. In contrast, we train an end-to-end model to directly generate semantics from user interactions.
PeizeSun@PeizeSun·
@altryne @_akhaliq 1. Semantic-SAM is a new model for segmenting images with rich semantics beyond SAM, while SSA is an augmented annotation set with semantics beyond SA. Apart from semantics, our model can also produce multi-granularity predictions. We can generate up to six levels of granularity.