Yuxiang Nie

75 posts

@npyuxl

Ph.D. student at HKUST. Research interests: multi-modal language models and medical AI.

Hong Kong · Joined March 2020
975 Following · 528 Followers
Yuxiang Nie retweeted
HKUST Smart Lab @SMARTLab_HKUST
🚀 Join us at the International Workshop on Large AI Models for Biomedicine (LAI4BM) at HKUST on July 12! Explore cutting-edge AI in healthcare with global experts. Don’t miss talks on AI-driven diagnostics, imaging, and more! #AI #Biomedicine #HKUST
0 replies · 2 retweets · 4 likes · 668 views
Yuxiang Nie retweeted
OpenAI @OpenAI
Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. openai.com/index/healthbe…
176 replies · 464 retweets · 3.7K likes · 2.1M views
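HealthBench scores a model's reply against physician-written rubric criteria, each carrying positive or negative points. Below is a minimal sketch of that scoring shape with a toy example and a stubbed grader; the official pipeline uses a grader model to judge each criterion, and the field names here are assumptions.

```python
def grade_example(example, criterion_met):
    """Rubric scoring in the HealthBench style (hedged reconstruction,
    not the official scorer): sum the points of satisfied criteria,
    normalized by the maximum achievable positive points."""
    earned = sum(c["points"] for c in example["rubrics"] if criterion_met(example, c))
    max_pos = sum(c["points"] for c in example["rubrics"] if c["points"] > 0)
    return max(0.0, min(1.0, earned / max_pos)) if max_pos else 0.0

# Toy example; in HealthBench each criterion is judged by a grader model
# against the model's reply to a health conversation.
example = {
    "prompt": "I have a fever of 39C, what should I do?",
    "response": "Take an antipyretic and seek care if it persists.",
    "rubrics": [
        {"criterion": "advises seeking care if symptoms persist", "points": 5},
        {"criterion": "recommends a dangerous dose", "points": -8},
    ],
}
met = lambda ex, c: c["points"] > 0  # stub grader: pretend positive criteria are met
print(grade_example(example, met))   # 1.0
```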
Yuxiang Nie retweeted
Qwen @Alibaba_Qwen
Introducing Qwen3! We are releasing Qwen3, our latest large language models, with open weights: 2 MoE models and 6 dense models ranging from 0.6B to 235B parameters. Our flagship model, Qwen3-235B-A22B, achieves competitive results on benchmarks for coding, math, and general capabilities compared with other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. The small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B even though QwQ-32B uses ten times as many activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct.

For more information, try them out in Qwen Chat on the web (chat.qwen.ai) and in the app, or visit our GitHub, HF, ModelScope, etc.
Blog: qwenlm.github.io/blog/qwen3/
GitHub: github.com/QwenLM/Qwen3
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…

The post-trained models, such as Qwen3-30B-A3B, along with their pre-trained counterparts (e.g., Qwen3-30B-A3B-Base), are now available on platforms like Hugging Face, ModelScope, and Kaggle. For deployment, we recommend frameworks like SGLang and vLLM. For local usage, tools such as Ollama, LMStudio, MLX, llama.cpp, and KTransformers are highly recommended. These options make it easy to integrate Qwen3 into research, development, or production workflows. Hope you enjoy our new models!
346 replies · 1.6K retweets · 8.1K likes · 2.2M views
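For the local-usage route the tweet mentions, here is a minimal inference sketch with Hugging Face Transformers; the repo id follows the release's naming and is an assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # assumed repo id based on the release naming
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat prompt using the model's own chat template.
messages = [{"role": "user", "content": "Give me a one-line summary of MoE."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```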
Yuxiang Nie retweeted
JUNDE WU @JundeMorsenWu
[Translated from Chinese] DeepSeek has already been hyped to the skies in the English-speaking world, but I've noticed that many people outside the AI industry in the Chinese-speaking world still don't have a clear picture of what DeepSeek can actually do, so here's a post in Chinese. Conclusion first: in terms of contribution to the field, I'd rank GPT > DeepSeek > Gemini > Llama and the rest.

Many people focus on the fact that it trained a comparably strong model with far fewer GPUs, but that is the result; what matters more is the technique that made it possible.

The most striking thing DeepSeek showed this time is that pure outcome-reward RL can lift a model directly to o1 level. Before it came out, everyone in the industry (including DeepMind) believed a PRM (process reward model) was needed to achieve this, so this alone is an industry-upending finding. Every major LLM group other than GPT's is now tearing things down and rebuilding, copying their training method.

Just as important, DeepSeek found that this training style even lets the model teach itself longer-chain reasoning and reflection, their so-called "aha moment". In other words, merely training the LLM to produce more accurate results makes it learn to reflect on its own: halfway through a chain of thought it realizes the current path will fail and tries to correct itself. This "self-evolving" behavior is the field's biggest discovery after GPT's intelligence emergence.

In terms of results, "training a comparable model with far fewer GPUs" may be more than cost saving; it is an improvement of the scaling law, meaning that stacking more GPUs onto this method could push model capability up another order of magnitude, possibly straight to AGI/ASI.

That's why the industry is so hyped this time. DeepSeek's open-sourcing is worth far more than Llama's: Llama is basically known methods plus stacked GPUs, while DeepSeek brought real surprises.
214 replies · 703 retweets · 3.8K likes · 1.4M views
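To make the "pure outcome-reward RL" claim concrete: sample several completions per question, reward each one only on whether its final answer is correct (no process reward model), and center rewards within the group to get advantages. A toy sketch of that signal, not DeepSeek's implementation; the answer extraction and policy are stand-ins.

```python
import re
import random

def outcome_reward(completion: str, gold: str) -> float:
    """1.0 if the last number in the completion matches the gold answer,
    else 0.0. Only the outcome is scored; no step-by-step reward."""
    nums = re.findall(r"-?\d+\.?\d*", completion)
    return 1.0 if nums and nums[-1] == gold else 0.0

def group_advantages(rewards):
    """GRPO-style: advantage = reward minus the group mean,
    so better-than-average samples are reinforced."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Toy rollout: a hypothetical policy samples 8 completions per question.
gold = "42"
completions = [f"... so the answer is {random.choice(['42', '41'])}" for _ in range(8)]
rewards = [outcome_reward(c, gold) for c in completions]
print(group_advantages(rewards))
# A real trainer would scale each completion's log-prob gradient by its advantage.
```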
Yuxiang Nie retweeted
AK @_akhaliq
Tencent presents GameGen-O: Open-world Video Game Generation

We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. The model enables high-quality, open-domain generation by simulating a wide array of game-engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. It also provides interactive controllability, allowing for gameplay simulation.

Developing GameGen-O involved a comprehensive data collection and processing effort from scratch. We collected and built the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive dataset forms the foundation of the model's training.

GameGen-O undergoes a two-stage training process: foundation-model pretraining and instruction tuning. In the first stage, the model is pre-trained on OGameData via text-to-video generation and video continuation, endowing it with the capability for open-domain video game generation. In the second stage, the pre-trained model is frozen and fine-tuned through a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. The whole process gives the model the ability to both generate and interactively control content.

In summary, GameGen-O represents a notable first step in open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, efficiently combining creative generation with interactive capabilities.
98 replies · 563 retweets · 2.9K likes · 366.8K views
Yuxiang Nie retweeted
Hao CHEN @HaoChen_HKUST
All in foundation models! Excited to share our recent survey on Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions. 400+ papers are included and analyzed. Challenges and future directions are discussed. arxiv.org/abs/2404.03264
1 reply · 8 retweets · 26 likes · 1.9K views
Yuxiang Nie retweeted
DAIR.AI @dair_ai
The Top ML Papers of the Week (Jan 29 - Feb 4):
- OLMo
- SliceGPT
- MoE-LLaVA
- Corrective RAG
- Advances in Multimodal LLMs
- LLMs for Mathematical Reasoning
...
3 replies · 90 retweets · 525 likes · 62.1K views
Yuxiang Nie retweeted
Rachit Bansal @rach_it_
Extending an LLM to new knowledge sources is tedious: fine-tuning is expensive and causes forgetting, while LoRA is restrictive. Excited to share our work where we show that an LLM can be efficiently *composed* with specialized (L)LMs to enable new tasks! arxiv.org/abs/2401.02412 🧵(1/8)
22 replies · 136 retweets · 654 likes · 125.2K views
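A rough sketch of the composition idea in the linked paper: both models stay frozen, and only a small cross-attention bridge is trained so the anchor LLM can read the augmenting model's representations. The dimensions and single-layer setup here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModelBridge(nn.Module):
    """Trainable cross-attention from frozen anchor-LLM states (queries)
    to frozen augmenting-LM states (keys/values). Illustrative sketch."""
    def __init__(self, d_anchor=1024, d_aug=512, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)  # map augmenting dim to anchor dim
        self.attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, anchor_h, aug_h):
        kv = self.proj(aug_h)
        attended, _ = self.attn(anchor_h, kv, kv)
        return anchor_h + attended  # residual: anchor states enriched with new knowledge

# Toy hidden states standing in for one layer of each frozen model.
bridge = CrossModelBridge()
anchor_h = torch.randn(2, 16, 1024)  # (batch, seq, d_anchor)
aug_h = torch.randn(2, 16, 512)      # (batch, seq, d_aug)
print(bridge(anchor_h, aug_h).shape)  # torch.Size([2, 16, 1024])
```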
Yuxiang Nie retweeted
Shizhe Diao @shizhediao
Can we align LLMs to honesty via instruction finetuning? Can we instruct LLMs to say I Don't Know? Can uncertainty learning improve prediction ability? Excited to share R-Tuning, Refusal-Aware Instruction Tuning to tackle hallucination in LLMs. Paper: arxiv.org/abs/2311.09677
12 replies · 98 retweets · 391 likes · 44.1K views
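The refusal-aware recipe can be sketched directly: check whether the base model already answers each training question correctly, then append a sure/unsure suffix to the supervised target so fine-tuning teaches calibrated refusal. The model stub and suffix wording below are illustrative, not the paper's exact templates.

```python
def build_refusal_aware_data(questions, answers, model_predict):
    """R-Tuning-style data construction (sketch): split training data by
    whether the model's own prediction matches the label, then attach
    a sure/unsure suffix to the supervised target."""
    data = []
    for q, gold in zip(questions, answers):
        pred = model_predict(q)
        if pred == gold:
            target = f"{gold}. I am sure."
        else:
            target = f"{gold}. I am not sure."  # teaches calibrated hedging
        data.append({"instruction": q, "output": target})
    return data

# Stub model for illustration: knows one fact, guesses otherwise.
stub = lambda q: "Paris" if "France" in q else "unknown"
print(build_refusal_aware_data(
    ["Capital of France?", "Capital of Freedonia?"],
    ["Paris", "Fictional"],
    stub,
))
```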
Yuxiang Nie retweeted
Zain @ZainHasan6
❓ When using LLMs, is unsupervised fine-tuning better than RAG for knowledge-intensive tasks? Should you do both?

If you want to augment an LLM with knowledge of your enterprise data, you can do so by augmenting either its parametric memory (fine-tuning) or its non-parametric memory (with a vector DB like @weaviate_io).

📜 Researchers from Microsoft (arxiv.org/abs/2312.05934) asked whether unsupervised next-token-prediction fine-tuning beats RAG at improving LLM performance on both seen and unseen QnA tasks.

⏩ In short: RAG is a better way to inject knowledge into LLMs than unsupervised fine-tuning (USFT), and, more surprisingly, RAG alone even beats RAG + fine-tuning, probably because USFT does not efficiently persist new knowledge into the parameters. It would be interesting to see a study comparing RAG against SFT/instruction tuning or RLHF. The improvement on QnA tasks with RAG held both for questions from the MMLU dataset and for a new dataset of "current events" that the model was not trained on.

📑 The details:
1. Used Mistral, Llama2, and Orca2 7B for all assessments.
2. Only unsupervised fine-tuning was done (a direct continuation of the pre-training phase), predicting the next token on the dataset.
3. Used bge-large-en as the embedding model for the RAG component.
4. Fine-tuning with multiple paraphrases of the same fact provides a significant improvement over the baseline: to teach pre-trained LLMs new knowledge, the knowledge must be repeated in numerous ways.

❌ Limitations/shortcomings:
1. Only a continuation of pre-training was assessed, with no instruction tuning or RLHF; SFT and RLHF would boost performance further.
2. Accuracy variance is quite high across the experiments, so it is hard to establish the statistical significance of the results.
3. Why is baseline performance on future data not 25% for MCQs with 4 choices? The knowledge may not be truly "unseen".
4. Only straightforward knowledge/fact tasks were assessed; reasoning capabilities were not.
5 replies · 51 retweets · 261 likes · 82.3K views
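A minimal sketch of the RAG side of this comparison, using the same embedding model the study used (bge-large-en) via sentence-transformers; the corpus and question are toys.

```python
from sentence_transformers import SentenceTransformer, util

# Same embedding model the study used for its RAG component.
embedder = SentenceTransformer("BAAI/bge-large-en")

corpus = [
    "The merger closed in March 2024 after regulatory approval.",
    "Quarterly revenue grew 12% year over year.",
]
question = "When did the merger close?"

doc_emb = embedder.encode(corpus, convert_to_tensor=True)
q_emb = embedder.encode(question, convert_to_tensor=True)
best = util.cos_sim(q_emb, doc_emb).argmax().item()

# The retrieved passage is prepended to the prompt (non-parametric memory).
prompt = f"Context: {corpus[best]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```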
Yuxiang Nie retweeted
Lei Li @_TobiasLee
Q: Will DPO also improve large vision language models? A: Yes! We release VLFeedback, a large-scale preference dataset consisting of 80k instructions, with samples decoded by 12 advanced models, and annotated by GPT-4V! Project Page: vlf-silkie.github.io
6 replies · 41 retweets · 209 likes · 35.6K views
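For reference, the DPO objective being applied to these VLM preference pairs is the standard pairwise loss. A self-contained sketch over summed sequence log-probs (toy tensors, not the VLFeedback pipeline):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss over summed sequence log-probs: push the policy's
    preference margin above the frozen reference model's margin."""
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs for a batch of 3 preference pairs.
pi_c, pi_r = torch.tensor([-5.0, -6.0, -4.5]), torch.tensor([-7.0, -5.5, -6.0])
ref_c, ref_r = torch.tensor([-5.5, -6.2, -5.0]), torch.tensor([-6.8, -5.6, -5.8])
print(dpo_loss(pi_c, pi_r, ref_c, ref_r))
```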
Yuxiang Nie retweeted
Qiao Jin, MD @DrQiaoJin
PMC-Patients, an open dataset of 167k patient summaries and their relations, finally published at nature.com/articles/s4159…. Many exciting things to try!
Quoted tweet from Qiao Jin, MD @DrQiaoJin:

🏥 You can directly download 167k patient summaries (PMC-Patients) from @huggingface now! - 📚 Extracted from case reports - 🗒️ Summary = admission + labs + diagnosis + treatment + discharge + follow-up notes - 👥 Annotated with 293k similar patients and 3.1M relevant articles

2 replies · 28 retweets · 119 likes · 17.8K views
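Loading the dataset as the quoted tweet describes; the Hugging Face repo id below is an assumption, so verify it against the paper's data-availability section.

```python
from datasets import load_dataset

# Repo id assumed; check the paper for the canonical Hugging Face location.
ds = load_dataset("zhengyun21/PMC-Patients", split="train")
print(len(ds))       # ~167k patient summaries
print(ds[0])         # inspect one record's fields (summary text plus metadata)
```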
Yuxiang Nie retweeted
Minji Yoon @MinjiYoon90
🧐Consider this: In the real world, do multimodal data always exhibit straightforward one-to-one relationships between modalities? Join me for a discussion on how LLMs manage multimodal data with intricate intermodal connections at Hall C2! 🔥
15 replies · 59 retweets · 929 likes · 124.4K views
Yuxiang Nie retweeted
Yutong Bai @YutongBAI1002
How far can we go with vision alone? Excited to reveal our Large Vision Model! Trained on 420B tokens, it scales effectively and opens new avenues in vision tasks! (1/N) Kudos to @younggeng @Karttikeya_m @_amirbar @YuilleAlan Trevor Darrell @JitendraMalikCV Alyosha Efros!
17 replies · 158 retweets · 1.1K likes · 304.8K views
Yuxiang Nie retweeted
AK @_akhaliq
Kosmos-2.5: A Multimodal Literate Model

Paper page: huggingface.co/papers/2309.11…

We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels at two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures in markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted to any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for future scaling of multimodal large language models.
4 replies · 26 retweets · 151 likes · 35.9K views
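The abstract's "task-specific prompts" are what switch the model between its two transcription modes. A hedged usage sketch with Transformers follows; the repo id and the prompt tokens for the two modes are assumptions based on the release, so check the model card before relying on them.

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "microsoft/kosmos-2.5"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("receipt.png")
# Assumed task prompts: "<ocr>" for spatially-aware text blocks,
# "<md>" for structured markdown output.
inputs = processor(text="<md>", images=image, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```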
Yuxiang Nie retweeted
Jim Fan @DrJimFan
Autonomous driving with Chain of Thought: autopilot thinking out loud in text! LINGO-1 is the most interesting work I've read in autonomous driving for a while.

Before: perception -> driving action
After: perception -> textual reasoning -> action

LINGO-1 trains a video-language model that comments on the ongoing scene. You can ask it to explain its decisions ("why are you stopped?") and its planning ("what are you gonna do next?"). The explicit reasoning step comes with key benefits:
- Explainability: driving models are no longer a mysterious black box that you pray keeps you safe.
- Counterfactuals: it can imagine scenarios that are not in the training data and reason through how to handle them correctly.
- Long-tail programming: there are so many edge cases in driving that good data coverage of everything is impossible. Instead of collecting thousands of examples to "neural-program" a case, a human teacher can now write prompts explaining a handful of examples.

LINGO-1 is closely related to a few works in game AI:
- MineDojo (my team's work at NVIDIA, minedojo.org): learns a reward model that aligns Minecraft gameplay videos with their transcripts. The model, called "MineCLIP", can ground commentary text in the video pixels.
- Thought Cloning (@jeffclune): a pixel -> language -> action loop in gridworlds.
63 replies · 450 retweets · 2.2K likes · 552.7K views
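The before/after pipelines in the tweet can be written down directly. A toy sketch with stub components showing where the textual reasoning step sits (every function here is hypothetical):

```python
def perceive(frames):
    """Stub perception: summarize the scene from camera frames."""
    return {"lead_vehicle": "stopped", "pedestrian": "crossing"}

def reason_in_text(scene):
    """The LINGO-1-style step: verbalize the situation before acting,
    which makes the decision inspectable ('why are you stopped?')."""
    if scene["pedestrian"] == "crossing":
        return "A pedestrian is crossing ahead, so I should hold at zero speed."
    return "Road clear; proceed at the limit."

def act(thought):
    return "BRAKE_HOLD" if "hold" in thought else "PROCEED"

scene = perceive(frames=[])          # perception
thought = reason_in_text(scene)      # textual reasoning (the new middle step)
print(thought, "->", act(thought))   # action
```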
Yuxiang Nie @npyuxl
@mbodhisattwa I have research experience in dialogue systems and could serve as an emergency reviewer for the Dialogue and Interactive Systems track if needed.
0 replies · 0 retweets · 1 like · 212 views
Bodhisattwa Majumder @mbodhisattwa
Hello #NLProc folks! We need some emergency reviewers for the #EMNLP2023 Dialogue and Interactive Systems track. Reach out if you have some bandwidth to be an emergency reviewer for papers on the track. TYVM!
3 replies · 3 retweets · 13 likes · 8.3K views