PeizeSun

22 posts

@PeizeSun

@xAI AGI | @grok Multimodal, Imagine | prev Meta HKU XJTU

Joined July 2022
78 Following · 301 Followers
PeizeSun@PeizeSun·
Imagine team at @xai is the best media generation team!
Yukun@YknZhu

At @xai we are making media creation more accessible, enjoyable and useful. Lots more creative features coming soon! Thanks to the amazing team that worked around the clock to ship this, and our captains @Guodzh @imhaotian ❤️

Haotian Liu@imhaotian·
I left xAI earlier this week. It was a difficult decision. The past two years have been an intense, fun, and deeply rewarding journey, and I accomplished things I could not have imagined two years ago.

Thank you @elonmusk for the opportunity and for everything I learned at xAI. Thank you @Guodzh for the trust you placed in me and for all the days and late nights we worked through together. And thank you to the entire Omni / Imagine team: thank you for your trust, and for growing together with me. It has been an honor, and I am incredibly proud of what we achieved together. I feel fortunate to have had the chance to work with all of you.

At xAI, everything feels possible. I had the chance to work with and learn from some of the most exceptional people I have ever met. I was able to explore across domains: from pretraining to post-training, from language models to multimodal, from perception to generation. Joining xAI was one of the best decisions I have ever made.

@grok imagine is special to me. Building video generation models, where I started with almost zero prior knowledge, from 0 to No.1, as an IC and as a lead, alongside an extraordinary team, and shipping it as a great product used by millions, all within 6 months, at age 28: I feel proud.

But now it's time for me to move on. I'm burnt out, and I know my happiness is no longer maximized in my current state. It is sad to say goodbye, but it is just the right time for a change. Best wishes to the Imagine team, you are absolutely the best, and you deserve the best. I will cherish all our memories for the rest of my life.

For now, I'm taking a break and giving myself time to figure out what comes next. Posted from Hawaii.
Guodong Zhang@Guodzh·
Last day at xAI. Wild journey these past three years, but excited about the next chapter. Thanks all for the love and support yesterday. So many friends made along the way, and I will miss you all!
PeizeSun@PeizeSun·
Grok Imagine supports video extension now!
Martin Ma@martin_ma_007

Since the @Grok Imagine v1.0 release, many of you have requested video extension. We hear you! It's available today 🎬 Enjoy, with more to come!!

PeizeSun@PeizeSun·
@hangg70 It is a great pleasure to work with you, Hang. Best wishes to your next journey~
Hang Gao@hangg70·
I left xAI today. It was truly rewarding to contribute to the grok imagine video series: 0.9 as our first release, then 1.0, which recently topped competitive leaderboards and user feedback. I saw a mix of humble craftsmanship and ambitious vision throughout the team. It taught me about what I want and how I want to proceed in my career. Thank you to everyone who made this journey unique and memorable.
PeizeSun reposted
AK@_akhaliq·
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative…
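The core of the recipe the abstract describes is to treat image generation exactly like language modeling: a VQ tokenizer maps an image to a grid of discrete codes, the grid is flattened in raster order, and a plain transformer is trained on shifted (input, target) pairs. A minimal sketch of that data layout (function names are illustrative, not from the paper's code):

```python
def flatten_raster(code_grid):
    """Flatten an HxW grid of VQ code indices into a 1D token sequence
    (row-major raster order, as in standard autoregressive image models)."""
    return [tok for row in code_grid for tok in row]

def next_token_pairs(tokens, bos=0):
    """Shifted (input, target) pairs for next-token prediction:
    the model sees tokens up to position t and must predict token t."""
    seq = [bos] + tokens
    return list(zip(seq[:-1], seq[1:]))

grid = [[5, 7],
        [2, 9]]                    # toy 2x2 grid of codebook indices
tokens = flatten_raster(grid)      # [5, 7, 2, 9]
pairs = next_token_pairs(tokens)   # [(0, 5), (5, 7), (7, 2), (2, 9)]
```

Sampling then runs the same loop in reverse: generate tokens one by one, reshape the sequence back into a grid, and decode it with the VQ decoder.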
PeizeSun reposted
AK@_akhaliq·
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

paper page: huggingface.co/papers/2307.03…

Instruction tuning large language models (LLMs) on image-text pairs has achieved unprecedented vision-language multimodal abilities. However, their vision-language alignment is only built at the image level; the lack of region-level alignment limits their advancement to fine-grained multimodal understanding. In this paper, we propose instruction tuning on regions of interest. The key design is to reformulate the bounding box as a spatial instruction. Interleaved sequences of visual features extracted by the spatial instruction and the language embedding are fed to the LLM, which is trained on transformed region-text data in instruction-tuning format.

Our region-level vision-language model, termed GPT4RoI, brings a brand-new conversational and interactive experience beyond image-level understanding. (1) Controllability: users can interact with our model through both language and spatial instructions to flexibly adjust the level of detail of the question. (2) Capacities: our model supports not only single-region but also multi-region spatial instructions. This unlocks more region-level multimodal capacities such as detailed region captioning and complex region reasoning. (3) Composition: any off-the-shelf object detector can serve as a spatial instruction provider to mine informative object attributes from our model, such as color, shape, material, action, and relation to other objects.
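The key design in the abstract, reformulating a bounding box as a spatial instruction, amounts to replacing each referenced region in the question with a placeholder token so that RoI features extracted at the corresponding box can be interleaved with the language embeddings at that position. A hedged sketch of that formatting step (the slot syntax and function names are illustrative assumptions, not GPT4RoI's actual API):

```python
import re

def format_spatial_instruction(template, boxes):
    """Replace {name} slots with <roiN> placeholder tokens and return the
    boxes in order of appearance, ready to be swapped for visual features.
    boxes: dict mapping name -> (x1, y1, x2, y2) in normalized coordinates."""
    order = []
    def repl(match):
        order.append(boxes[match.group(1)])
        return f"<roi{len(order)}>"
    text = re.sub(r"\{(\w+)\}", repl, template)
    return text, order

text, roi_boxes = format_spatial_instruction(
    "What is {cat} doing next to {sofa}?",
    {"cat": (0.10, 0.20, 0.40, 0.60), "sofa": (0.50, 0.10, 0.90, 0.80)},
)
# text == "What is <roi1> doing next to <roi2>?"
```

The two-slot example mirrors the multi-region capacity the tweet mentions: any number of boxes can be referenced in one question, each contributing its own placeholder.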
AK@_akhaliq·
Semantic-SAM: Segment and Recognize Anything at Any Granularity

paper page: huggingface.co/papers/2307.04…

We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic awareness and granularity abundance. To achieve semantic awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts, allowing our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work represents the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves semantic awareness and granularity abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.
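The multi-choice learning scheme means a single click is decoded into several candidate masks (one per granularity level), each of which must be paired with one of several ground-truth masks before a loss can be computed. A toy sketch of that matching step, with masks as pixel sets and greedy best-IoU assignment (the real model uses a learned decoder and Hungarian-style matching; this is only an illustration of the many-to-many supervision idea):

```python
def iou(a, b):
    """IoU of two binary masks represented as sets of pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_predictions(preds, gts):
    """Greedily assign each ground-truth mask its best unused prediction,
    returning (gt_index, pred_index, iou) triples."""
    assigned, used = [], set()
    for gi, gt in enumerate(gts):
        best = max(
            (pi for pi in range(len(preds)) if pi not in used),
            key=lambda pi: iou(preds[pi], gt),
            default=None,
        )
        if best is not None:
            used.add(best)
            assigned.append((gi, best, iou(preds[best], gt)))
    return assigned

part = {(0, 0), (0, 1)}            # fine-granularity mask (e.g. a part)
whole = part | {(1, 0), (1, 1)}    # coarse-granularity mask (the object)
matches = match_predictions([whole, part], [part, whole])
# the part GT is matched to the part prediction, the whole GT to the whole one
```

Because every level of prediction gets its own ground truth, one click can supervise masks from part level up to whole-object level simultaneously, which is what enables the multi-level output at inference time.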
PeizeSun@PeizeSun·
@altryne @_akhaliq 4. We will properly discuss the related work in our next arXiv version.
PeizeSun@PeizeSun·
@altryne @_akhaliq 3. The semantics of SSA are mostly object-level, as the pre-trained BLIP has no part-level training data and cannot generate part-level semantics. Instead, we train on part-level segmentation data to be aware of both object- and part-level semantics.
PeizeSun@PeizeSun·
@altryne @_akhaliq 2. SSA is a combination of two pre-trained models. Based on a pre-trained SAM, it uses another pre-trained vision-language model (BLIP) to generate semantics. In contrast, we train an end-to-end model to directly generate semantics from user interactions.
PeizeSun@PeizeSun·
@altryne @_akhaliq 1. Semantic-SAM is a new model for segmenting images with rich semantics beyond SAM, while SSA is an augmented annotation set with semantics beyond SA. Apart from semantics, our model can also produce multi-granularity predictions. We can generate up to six levels of granularity.