This project explores the demographic disparities within the computer vision models used by Instagram and TikTok. We found that they do, in fact, show demographic disparities. The full paper is available at arxiv.org/abs/2403.19717.
Recently, I was accepted to present a project I have been working on at the IEEE Symposium on Security and Privacy. The paper is titled "A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok."
@anyware Thanks. As always, a super cool blog post with many insights. I never thought any industry would dare to go back to layered encoding... I was super excited to see Apple come along. Even if it's not for regular videos...
@Mallesh__ I don't think so. Almost everything I've produced (and seen from others) across many content types, lens distances, etc. shows nearly independent streams that are almost exactly the same size. As if they were both targeting the same average bitrate.
@anyware I see... but even in the case of Immersive TV videos, in your figure, the size seems to be at least 50% of the original... do you think that's because the view overlap may be larger than the iPhone stereo?
@Mallesh__ They should be smaller, but they aren't in the current AVFoundation MV-HEVC encoder. But Apple's Immersive streams in Apple TV are. Hopefully the version they're using for their streams eventually makes its way into AVFoundation so all apps can benefit.
@anyware The inter-layer references (combined with temporal references across frames) are definitely a computational nightmare... but even without that, just referencing between the views, MV-HEVC should have reduced the data for layer 1 quite a bit.
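The back-of-envelope math in this thread can be sketched as a toy calculation: if inter-view prediction shaved some fraction of the dependent layer's bits, the total stereo size relative to a monoscopic encode would shrink accordingly. The function name and the savings fractions below are hypothetical, purely for illustration.

```python
def spatial_overhead(layer1_savings: float) -> float:
    """Total stereo size as a multiple of the base (single-eye) layer.

    layer1_savings: fraction of the dependent layer's bits removed by
    inter-view prediction (0.0 = none, i.e. two independent streams).
    """
    base_layer = 1.0                          # layer 0, coded normally
    dependent_layer = 1.0 - layer1_savings    # layer 1, after inter-view savings
    return base_layer + dependent_layer

spatial_overhead(0.0)  # 2.0 — two independent streams, as observed today
spatial_overhead(0.5)  # 1.5 — if layer 1 were halved by inter-view prediction
```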
@anyware It's strange that spatial videos are double the size of regular videos. I don't see the point of using MV-HEVC then? We could just compress the two streams independently, right?
@Mallesh__ Yes, it does. Apple is streaming their Apple TV Immersive content using HLS, and I've also (just yesterday) set up a test stream just to walk through the steps. I've had many questions, and it feels like a blog post might be in order.
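While a full walkthrough deserves its own post, the key HLS piece is small: the current HLS specification defines a REQ-VIDEO-LAYOUT attribute (requiring playlist version 12) for signaling stereoscopic renditions. A minimal sketch of a multivariant playlist entry; the bandwidth, resolution, and path values here are placeholders, not taken from a real stream:

```
#EXTM3U
#EXT-X-VERSION:12
#EXT-X-STREAM-INF:BANDWIDTH=25000000,RESOLUTION=1920x1080,REQ-VIDEO-LAYOUT="CH-STEREO"
stereo/prog_index.m3u8
```

A player that doesn't understand REQ-VIDEO-LAYOUT is expected to skip that variant, which is how a single playlist can carry both stereo and mono renditions.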
The QuickTime Player in Sonoma 14.4 has added support for spatial video. You can now see "Spatial" in the Video Format of the Inspector, and there's a new "View Spatial" menu that lets you choose which eye (or eyes) to view. Thank you, Apple!
I am recruiting several research assistants, starting Jan 2024: PhD, MS & UG students!
If you are interested in working at the intersection of computer networks, vision and graphics with a key focus on building XR systems, reach out.
More details at: mallesham.com