Cheng-Yu Hsieh

54 posts

@cydhsieh

PhD student @UWcse

Seattle, USA · Joined September 2022
482 Following · 574 Followers
Pinned Tweet
Cheng-Yu Hsieh @cydhsieh
Excited to introduce FocalLens: an instruction tuning framework that turns existing VLMs/MLLMs into text-conditioned vision encoders that produce visual embeddings focusing on relevant visual information given natural language instructions! 📢: @HPouransari will be presenting the work @FM_in_Wild workshop #ICLR tomorrow (4/27) 12:30-1:30pm. Come say hi! 📜: arxiv.org/abs/2504.08368 More in 🧵!
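The core idea can be pictured as conditioning a vision backbone on the instruction so that the pooled image embedding changes with the text. The minimal PyTorch sketch below is purely illustrative: the class name, the cross-attention fusion, and the mean pooling are my assumptions, not the released FocalLens architecture or API.

```python
# Illustrative sketch only: a text-conditioned vision encoder in the spirit of
# FocalLens. The image embedding depends on the natural-language instruction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextConditionedVisionEncoder(nn.Module):
    def __init__(self, vision_backbone: nn.Module, text_backbone: nn.Module, dim: int = 512):
        super().__init__()
        self.vision_backbone = vision_backbone   # e.g. a ViT taken from a pretrained VLM
        self.text_backbone = text_backbone       # encodes the instruction into token features
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, image_tokens: torch.Tensor, instruction_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N_img, dim) patch features; instruction_tokens: (B, N_txt, dim)
        v = self.vision_backbone(image_tokens)
        t = self.text_backbone(instruction_tokens)
        # Cross-attend image features to the instruction so the pooled embedding
        # emphasizes the visual content the instruction asks about.
        fused, _ = self.fusion(query=v, key=t, value=t)
        pooled = fused.mean(dim=1)                       # mean-pool over image patches
        return F.normalize(self.proj(pooled), dim=-1)    # one embedding per (image, instruction)

# Toy usage with identity backbones and random features.
enc = TextConditionedVisionEncoder(nn.Identity(), nn.Identity())
emb = enc(torch.randn(2, 196, 512), torch.randn(2, 16, 512))   # shape (2, 512)
```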
Cheng-Yu Hsieh retweeted
Jae Sung Park @jjaesungpark
🔥We are excited to present our work Synthetic Visual Genome (SVG) at #CVPR25 tomorrow! 🕸️ Dense scene graph with diverse relationship types. 🎯 Generate scene graphs with SAM segmentation masks! 🔗Project link: bit.ly/4e1uMDm 📍 Poster: #32689, Fri 2-4 PM 👇🧵
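For readers unfamiliar with the format, a scene graph ties localized objects (here, segmentation masks) to one another through typed relationships. The toy record below only illustrates that structure; the field names are assumptions, not the actual SVG schema.

```python
# Minimal sketch of a scene-graph record with segmentation masks.
# Field names are illustrative, not the actual SVG data format.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    object_id: int
    category: str        # e.g. "person", "horse"
    mask_rle: str        # run-length-encoded segmentation mask (e.g. from SAM)

@dataclass
class Relationship:
    subject_id: int
    predicate: str       # e.g. "riding", "next to", "holding"
    object_id: int

@dataclass
class SceneGraph:
    image_id: str
    objects: list[SceneObject] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

graph = SceneGraph(
    image_id="example.jpg",
    objects=[SceneObject(0, "person", "..."), SceneObject(1, "horse", "...")],
    relationships=[Relationship(0, "riding", 1)],
)
```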
Cheng-Yu Hsieh retweeted
Alex Ratner @ajratner
Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D!

Snorkel Evaluate is our new data-centric agentic AI evaluation platform for specialized, mission-critical enterprise settings where vibe checks and out-of-the-box metrics driven by simple LLM prompts are not enough.

Snorkel Expert Data-as-a-Service is our white glove service for expert-level AI datasets, powering frontier LLM developers in areas like expert knowledge, reasoning, agentic action and tool use, and more!

Both built on top of @SnorkelAI’s Data Development Platform, using our programmatic technology to drive higher-quality expert data, faster– for getting specialized AI to real production value.

If you’re building enterprise AI and want to partner around the key ingredient in AI today–the data–book a demo and let's talk! snorkel.ai/demo/

Finally, see thread for details on 🧵👇
- 📽️ A walkthrough of Snorkel Evaluate and Expert Data-as-a-Service on an agentic AI enterprise task
- 📅 An upcoming event on Enterprise Agentic AI with innovators from @Accenture @BNY @Comcast @Stanford @QBE & others
- 📊 An upcoming series of benchmark datasets and model artifact releases

👀 Want early access to the full agentic AI dataset? Retweet this post and we'll send you the link!
Cheng-Yu Hsieh retweeted
Peter @PeterSushko
1/8🧵 Thrilled to announce RealEdit (to appear at CVPR 2025)! We introduce a real-world image-editing dataset sourced from Reddit. Along with the training and evaluation datasets, we release our model, which achieves SOTA performance on a variety of real-world editing tasks.
Cheng-Yu Hsieh retweeted
Jason Ramapuram @jramapuram
Stop by poster #596 at 10A-1230P tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention! We just pushed 8 trajectory checkpoints each for two 7B LLMs for Sigmoid Attention and a 1:1 Softmax Attention (trained with a deterministic dataloader for 1T tokens):
- Sigmoid: gs://axlearn-public/experiments/gala-7B-sigmoid-hybridnorm-alibi-sprp-2024-12-03-1002/checkpoints/
- Softmax: gs://axlearn-public/experiments/gala-7B-hybridnorm-alibi-sprp-2024-12-02-1445/checkpoints/
Inference code at github.com/apple/ml-sigmo…
Jason Ramapuram@jramapuram

Small update on SigmoidAttn (arXiv incoming).
- 1B and 7B LLM results added and stabilized.
- Hybrid Norm [on embed dim, not seq dim], `x + norm(sigmoid(QK^T / sqrt(d_{qk}))V)`, stabilizes longer sequences (n=4096) and larger models (7B). H-norm is used with Grok-1, for example.
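The hybrid-norm expression quoted above maps to code almost directly. Here is a minimal single-head sketch: LayerNorm stands in for `norm`, the ALiBi biases and the paper's other training details are omitted, so treat it as an illustration of the quoted formula rather than the released implementation.

```python
# Minimal single-head sketch of the quoted hybrid-norm sigmoid-attention block:
#   x + norm(sigmoid(Q K^T / sqrt(d_qk)) V)
# ALiBi biases and other details from the paper are intentionally left out.
import math
import torch
import torch.nn as nn

class SigmoidAttentionBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)   # norm applied on the embedding dim, not the sequence dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
        attn = torch.sigmoid(scores)        # elementwise sigmoid instead of softmax
        return x + self.norm(attn @ v)      # residual plus norm of the attention output
```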

Cheng-Yu Hsieh @cydhsieh
🚀Being able to better focus on the relevant visual information, FocalLens shows improvements over standard CLIP models on a variety of downstream tasks, including image-image retrieval, image-text retrieval, and classification tasks!
Cheng-Yu Hsieh retweeted
Jieyu Zhang @JieyuZhang20
The 2nd Synthetic Data for Computer Vision workshop at @CVPR! We had a wonderful time last year, and we want to build on that success by fostering fresh insights into synthetic data for CV. Join us! We welcome submissions! Please consider submitting your work! (deadline: March 31) Website: syndata4cv.github.io #CVPR2025
Cheng-Yu Hsieh retweeted
Mahtab Bigverdi @MahtabBg
I'm excited to announce that our work (AURORA) got accepted to #CVPR2025🎉! Special thanks to my coauthors: @ch1m1m0ry0, @cydhsieh, @ethnlshn, @Dongping0612, Linda Shapiro, and @RanjayKrishna. This work wouldn’t have been possible without them! See you all in Nashville 🎸!
Mahtab Bigverdi@MahtabBg

Introducing AURORA 🌟: Our new training framework to enhance multimodal language models with Perception Tokens; a game-changer for tasks requiring deep visual reasoning like relative depth estimation and object counting. Let’s take a closer look at how it works.🧵[1/8]

Cheng-Yu Hsieh retweeted
Yung-Sung Chuang @YungSungChuang
(1/5)🚨LLMs can now self-improve to generate better citations✅ 📝We design automatic rewards to assess citation quality 🤖Enable BoN/SimPO w/o external supervision 📈Perform close to “Claude Citations” API w/ only 8B model 📄arxiv.org/abs/2502.09604 🧑‍💻github.com/voidism/SelfCi…
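The BoN piece can be pictured as sampling several candidate answers and keeping the one the automatic citation reward scores highest. The toy sketch below assumes placeholder interfaces, `generate_candidates` and `citation_reward`, standing in for the paper's actual generator and reward components.

```python
# Toy best-of-N selection driven by an automatic citation-quality reward.
# `generate_candidates` and `citation_reward` are hypothetical interfaces,
# not the functions released with the paper.
from typing import Callable

def best_of_n(
    question: str,
    documents: list[str],
    generate_candidates: Callable[[str, list[str], int], list[str]],
    citation_reward: Callable[[str, list[str]], float],
    n: int = 8,
) -> str:
    candidates = generate_candidates(question, documents, n)
    # Keep the candidate whose citations the reward judges as best supported.
    return max(candidates, key=lambda answer: citation_reward(answer, documents))
```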
Cheng-Yu Hsieh retweeted
Mahtab Bigverdi @MahtabBg
Introducing AURORA 🌟: Our new training framework to enhance multimodal language models with Perception Tokens; a game-changer for tasks requiring deep visual reasoning like relative depth estimation and object counting. Let’s take a closer look at how it works.🧵[1/8]
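As rough intuition for what a "perception token" could look like: take an intermediate visual quantity such as a depth map and discretize it into a small vocabulary the language model can emit and reason over. The uniform binning below is only an illustrative stand-in I made up; it is not AURORA's actual tokenizer.

```python
# Illustrative only: discretize a depth map into a small token vocabulary so a
# language model could emit it as a sequence. AURORA's real tokenizer differs.
import numpy as np

def depth_to_tokens(depth: np.ndarray, num_bins: int = 16, grid: int = 8) -> list[str]:
    # Crop so the map divides evenly, then average-pool it down to a grid x grid map.
    h, w = (depth.shape[0] // grid) * grid, (depth.shape[1] // grid) * grid
    coarse = depth[:h, :w].reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    # Normalize to [0, 1] and quantize each cell into one of `num_bins` depth tokens.
    rng = coarse.max() - coarse.min() + 1e-8
    norm = (coarse - coarse.min()) / rng
    bins = np.minimum((norm * num_bins).astype(int), num_bins - 1)
    return [f"<depth_{b}>" for b in bins.flatten()]

tokens = depth_to_tokens(np.random.rand(224, 224))  # 64 tokens like "<depth_7>"
```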
Cheng-Yu Hsieh retweeted
Cheng-Yu Hsieh @cydhsieh
🤔 In training vision models, what value do AI-generated synthetic images provide compared to the upstream (real) data used to train the generative models in the first place? 💡 We find that using "relevant" upstream real data still leads to much stronger results than using synthetic data, calling for attention to this strong baseline when developing new synthetic data methods. Check out more in our work led by @scottgeng00!! 👇 📜: t.co/K8PCArnFLD
Scott Geng@scottgeng00

Will training on AI-generated synthetic data lead to the next frontier of vision models?🤔 Our new paper suggests NO—for now. Synthetic data doesn't magically enable generalization beyond the generator's original training set. 📜: arxiv.org/abs/2406.05184 Details below🧵(1/n)

Cheng-Yu Hsieh @cydhsieh
🧵(5/n) 3⃣Finally, we show our method is complementary to existing re-ordering based methods that place relevant documents at the beginning/end of the input prompt, offering a new layer to improve current RAG pipelines.
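For context, re-ordering baselines of this kind can be sketched in a few lines: given relevance scores, the most relevant documents go to the two ends of the context, where models tend to attend more reliably. This is an illustrative sketch, not the specific baseline implementations compared in the paper.

```python
# Illustrative re-ordering baseline: put the highest-scored documents at the
# beginning and end of the context, leaving the least relevant in the middle.
def reorder_edges_first(documents: list[str], scores: list[float]) -> list[str]:
    ranked = [d for _, d in sorted(zip(scores, documents), key=lambda p: p[0], reverse=True)]
    front, back = [], []
    for i, doc in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]   # most relevant at both ends, least relevant in the middle
```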
Cheng-Yu Hsieh @cydhsieh
🧵(4/n) We show that: 1⃣ Models' calibrated attention reflects the relevance of a document to a user query well, outperforming existing re-ranking metrics. 2⃣ Calibrated attention further improves models' RAG performance (by over 10 points) over the standard baseline.
Cheng-Yu Hsieh @cydhsieh
🧵(3/n) We mitigate this by proposing to calibrate model attention: Removing positional bias from model attention so that models can more faithfully attend to relevant contexts, regardless of their position in the input prompt.
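A rough picture of the calibration idea: estimate how much attention each context position receives on average (the positional bias) and subtract that baseline from the observed attention before using it as a relevance signal. The sketch below assumes per-document attention scores have already been extracted from the model, and the bias estimate shown (averaging over shuffled document orderings) is a simplification of the paper's procedure.

```python
# Simplified sketch of attention calibration: subtract an estimated positional
# bias from observed per-document attention scores.
import numpy as np

def positional_bias(attn_by_ordering: np.ndarray) -> np.ndarray:
    # attn_by_ordering: (num_orderings, num_positions) attention mass the model
    # put on each position, with the documents shuffled for each ordering.
    # Averaging over orderings leaves the position-only component.
    return attn_by_ordering.mean(axis=0)

def calibrated_attention(observed: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # observed: (num_positions,) attention for one particular document ordering.
    # A higher calibrated score suggests a more relevant document, regardless of position.
    return observed - bias
```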