Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Contributions:
1) A framework for obtaining real-world, dynamic, and pseudo-metric 4D reconstructions and camera poses at scale from existing online video.
2) DynaDUSt3R, a method that takes a pair of frames from any real-world video and predicts a pair of 3D point clouds along with the corresponding 3D motion trajectories that connect them in time.
Calling all Apple Vision Pro goggles owners: have you ever thought you could do a better job than the chancellor of the exchequer or the Fed chair?
A new app built by the FT and Infosys offers you the chance to play master of the economy from your home: on.ft.com/4gqAeAy
Super excited about our new research direction for aligning smarter-than-human AI:
We finetune large models to generalize from weak supervision—using small models instead of humans as weak supervisors.
Check out our new paper:
openai.com/research/weak-…
# On the "hallucination problem"
I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
We direct their dreams with prompts. The prompts start the dream, and based on the LLM's hazy recollection of its training documents, most of the time the result goes someplace useful.
It's only when the dreams go into deemed factually incorrect territory that we label it a "hallucination". It looks like a bug, but it's just the LLM doing what it always does.
At the other end of the extreme consider a search engine. It takes the prompt and just returns one of the most similar "training documents" it has in its database, verbatim. You could say that this search engine has a "creativity problem" - it will never respond with something new. An LLM is 100% dreaming and has the hallucination problem. A search engine is 0% dreaming and has the creativity problem.
All that said, I realize that what people *actually* mean is they don't want an LLM Assistant (a product like ChatGPT etc.) to hallucinate. An LLM Assistant is a much more complex system than just the LLM itself, even if one is at the heart of it. There are many ways to mitigate hallucinations in these systems - using Retrieval Augmented Generation (RAG) to more strongly anchor the dreams in real data through in-context learning is maybe the most common one. Disagreements between multiple samples, reflection, verification chains. Decoding uncertainty from activations. Tool use. All of these are active and very interesting areas of research.
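The "disagreement between multiple samples" idea above can be sketched in a few lines: sample the model several times and only trust an answer the samples converge on. This is a minimal illustration, not anyone's actual system; `sample_llm` is a hypothetical callable standing in for a real model API, and the threshold is an arbitrary choice.

```python
from collections import Counter

def consensus_answer(sample_llm, prompt, n=5, threshold=0.6):
    # Draw n independent samples from the model for the same prompt.
    samples = [sample_llm(prompt) for _ in range(n)]
    answer, count = Counter(samples).most_common(1)[0]
    if count / n >= threshold:
        return answer  # samples agree: answer is more likely grounded
    return None        # samples disagree: flag as a possible hallucination

# Toy stand-in "model" that returns canned answers for the demo.
answers = iter(["Paris", "Paris", "Paris", "Lyon", "Paris"])
print(consensus_answer(lambda p: next(answers), "Capital of France?"))
```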
TLDR I know I'm being super pedantic but the LLM has no "hallucination problem". Hallucination is not a bug, it is the LLM's greatest feature. The LLM Assistant has a hallucination problem, and we should fix it.
Okay I feel much better now :)
In Creative Software 2.0, machines push the pixels. Machines draw. We direct. We create with machines that can create anything. Constraints come from a lack of imagination, not from a lack of specialized knowledge. The most successful creators will be the most imaginative.
"R Plus Seven" by @0PN came out 10 years ago this week. By far one of the strangest album covers I've made, and one that's taken on a weird life of its own. Here's a thread with some backstory on how it came to be.
Cosine Similarity is now playing DJ for my tunes.
It's astounding how well it recommends what to play next based on the vibes of the current track. Additionally, I've implemented support for Pro DJ Link to push / pull tracks directly from my CDJs.
All built in TouchDesigner!
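The core of a "vibes-based" recommender like the one described above is just cosine similarity over per-track feature vectors. Here's a minimal sketch, assuming each track has a precomputed audio-feature vector (the tracks, names, and feature values are made up for illustration; the real setup lives in TouchDesigner):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def next_track(current, library):
    # Pick the library track most similar to the current one (excluding itself).
    return max(
        (t for t in library if t["name"] != current["name"]),
        key=lambda t: cosine_similarity(current["features"], t["features"]),
    )

current = {"name": "Track A", "features": [0.9, 0.1, 0.4]}
library = [
    current,
    {"name": "Track B", "features": [0.8, 0.2, 0.5]},
    {"name": "Track C", "features": [0.1, 0.9, 0.2]},
]
print(next_track(current, library)["name"])  # → Track B
```

Because cosine similarity compares direction rather than magnitude, it picks the track whose feature mix is most proportionally similar to the current one, regardless of overall loudness or energy scale.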
As a Creative Technologist, I can't stress enough the importance of iteration in our work.
Solutions are initially only hazy ideas.
Use iteration as a tool to gain clarity. Refocus and refine after each attempt, inching closer to the elegant solution you aspire to.
As a Creative Technologist I like to KISS - "keep it simple, stupid"
Strive for easy, solid, and clear technical solutions.
Stay away from unnecessary complexities.
Why has electronic music lost its soul? Is it because our short attention span isn’t able to appreciate more soulful and jazz-driven beats compared to bleeping sounds and hard kicks?
@_vade I sometimes forget that our job is all about simplifying, that's it.
Everything else is only a tech-bro culture that rewards those who can use more acronyms to describe their over-engineered software architecture to their manager.
@MarcoMartignone 💯 That is the best compliment you can be given. Assuming your code works any code that is well written, succinct and legible that also solves a problem is close to fucking perfect. Don’t change that shit at all!
I'm often disappointed when people understand my code at a glance, it feels like I haven't abstracted enough.
I need to remind myself that the power of software is to translate complex problems into readable instructions.
Complexity does not solve problems, it creates them.