Peter Kocsis
272 posts
@Peter4AI
PhD student at TUM, Visual Computing & Artificial Intelligence Group
Munich · Joined November 2021
137 Following · 714 Followers
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
📢 3D world models from video diffusion suffer from inconsistent frames -> blurry output. Our fix: instead of naïve 3D reconstruction, we non-rigidly align each frame into a globally-consistent 3DGS representation -> sharp visuals on top of any VDM! lukashoel.github.io/video_to_world
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
📢📢📢 Data release: high-res, multi-view, OLAT face recordings 📢📢📢
We captured individuals in our custom light stage with 16 high-end, global-shutter cameras (72 fps) and 40 LED modules, totaling 2.8M precisely calibrated frames. We use the data for BecomingLit (#NeurIPS2025): intrinsically decomposed Gaussian avatars, enabling photorealistic and real-time relighting via hybrid neural shading.
Code & Data: jonathsch.github.io/becominglit/
Great work by @jnthnschmdt, @SGiebenhain
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
📢 Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image 📢
We directly regress neural parametric head models (NPHMs) from a single image — fast, stable, and significantly more expressive than classical 3DMMs such as FLAME.
Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control.
Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity.
Key to successful and generalized training of our ViT-based network are:
(1) large-scale registration of existing 3D head datasets, and
(2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals.
Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization.
🌍 simongiebenhain.github.io/Pix2NPHM
🎥 youtu.be/MgpEJC5p1Ts
Great work by @SGiebenhain, @TobiasKirschst1, @liamschoneveld, Davide Davoli, Zhe Chen
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
📢📢📢 𝐌𝐞𝐬𝐡𝐑𝐢𝐩𝐩𝐥𝐞: Structured Autoregressive Generation of Artist-Meshes
High-fidelity, topologically complete 3D assets that expand naturally like a ripple on a surface! 🌊
Existing AR models often rely on sliding-window inference over truncated segments. This truncation breaks long-range geometric dependencies, causing holes and fragmentation. Instead, MeshRipple uses frontier-aware BFS and sparse-attention global memory to ensure coherent growth with an unbounded receptive field.
-> Highly detailed mesh generations
-> Artist-like meshing quality
-> Works on room-scale environments
🌍 maymhappy.github.io/MeshRipple
🎥 youtu.be/AHvaLslzXQU
Great work by Junkai Lin, Hang Long, Huipeng Guo, Jielei Zhang, Jiayi Yang, Tianle Guo, Yang Yang, Jianwen Li, Wenxiao Zhang, Wei Yang
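The "frontier-aware BFS" growth order can be pictured as a plain breadth-first traversal over face adjacency, so each new face borders already-generated geometry. A toy sketch only; the function name and adjacency are invented here, and the actual model grows face-token sequences with sparse-attention global memory:

```python
from collections import deque

def ripple_order(adjacency, seed=0):
    """Emit faces in frontier-expanding (BFS) order from a seed face.

    adjacency: dict mapping face index -> list of neighboring face indices.
    Faces come out ring by ring, like a ripple, so every newly emitted
    face is adjacent to geometry that was already generated.
    """
    visited = {seed}
    frontier = deque([seed])
    order = []
    while frontier:
        face = frontier.popleft()
        order.append(face)
        for nb in adjacency[face]:
            if nb not in visited:
                visited.add(nb)
                frontier.append(nb)
    return order

# Toy adjacency: a strip of five faces, each touching the next.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(ripple_order(adj))  # [0, 1, 2, 3, 4]
```

In contrast, a sliding window over a truncated face sequence can cut such adjacency chains, which is the fragmentation the tweet describes.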
Peter Kocsis @Peter4AI
@karanjagtiani04 @MattNiessner For the occluded regions we don't have any predictions, so they currently don't receive gradients; one option would be to backprop through multiple bounces.
Karan Jagtiani @karanjagtiani04
@MattNiessner Interesting approach to texture consistency. How do you handle occlusions in the monocular predictions?
Matthias Niessner @MattNiessner
📢 Intrinsic Image Fusion for Multi-View 3D Material Reconstruction 📢
We combine generative material priors with inverse path tracing:
1) define a parametric texture space
2) fuse monocular predictions across views into consistent textures
3) optimize low-dimensional parameters for physically-grounded reconstructions.
The results are relightable PBR textures for 3D scenes: check out the result on a real-world 3D scan from the ScanNet++ dataset!
🌍 peter-kocsis.github.io/IntrinsicImage…
🎥 youtu.be/-Vs3tR1Xl7k
Great work by @Peter4AI @LukasHollein!
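Step 2, fusing per-view monocular predictions into a consistent texture, can be pictured as confidence-weighted averaging in a shared texture space. A minimal sketch under stated assumptions: the predictions are already warped into a common UV space, and the function and weighting scheme are invented here (the actual method optimizes low-dimensional parametric textures against the views):

```python
import numpy as np

def fuse_views(predictions, weights):
    """Fuse per-view material predictions into one texture.

    predictions: (V, H, W, C) per-view maps (e.g. albedo) already
                 warped into a shared UV texture space.
    weights:     (V, H, W) per-texel confidences (e.g. visibility or
                 view-angle weights); zero where a texel is unseen.
    Returns the per-texel weighted average; unseen texels stay zero.
    """
    w = weights[..., None]                      # broadcast over channels
    total = np.clip(w.sum(axis=0), 1e-8, None)  # avoid divide-by-zero
    return (predictions * w).sum(axis=0) / total

# Two views disagree (0.2 vs 0.6) with equal confidence -> fused 0.4.
views = np.stack([np.full((2, 2, 3), 0.2), np.full((2, 2, 3), 0.6)])
conf = np.ones((2, 2, 2))
print(fuse_views(views, conf)[0, 0])  # [0.4 0.4 0.4]
```

Averaging removes per-view noise but not systematic disagreement, which is why the method constrains the result to a low-dimensional parametric texture space instead of keeping free per-texel values.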
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
Releasing Echo today is incredibly exciting for me — because it is a critical step for generative AI, enabling the creation of virtual worlds.
Echo is our first world model at SpAItial AI. It turns text or images into explorable 3D environments — spaces you can move through, inspect, and build on. Seeing this work in real time still feels a bit surreal.
My fascination with this goes back a long way: video games, virtual environments, and the idea of capturing the real world in 3D. As a researcher, I spent years working on 3D reconstruction, neural rendering, and scene understanding — all driven by the same question: how do we teach machines to understand the world?
One thing became clear over time: the biggest bottleneck isn't compute or rendering — it's 3D worlds themselves. High-quality, consistent environments are expensive to create by hand and don't scale to the experiences we want to build. In particular, I believe that the ability to generate virtual worlds is ultimately key towards understanding the real world.
That's why we founded SpAItial AI. We're building spatial world models that combine geometric understanding with creative generation — models that can generate, edit, and eventually reason about 3D environments. Echo is just the beginning.
For me, this feels like the moment when decades of research finally meet the imagination that got many of us into graphics, games, and 3D understanding in the first place.
🌍 spaitial.ai
SpAItial AI @SpAItial_AI

🚀 Announcing Echo — our new frontier model for 3D world generation. Echo turns a simple text prompt or image into a fully explorable, 3D-consistent world. Instead of disconnected views, the result is a single, coherent spatial representation you can move through freely.
This is part of a bigger shift in AI: from generating pixels and tokens to generating spaces. Echo predicts a geometry-grounded 3D scene at metric scale, meaning every novel view, depth map, and interaction comes from the same underlying world — not independent hallucinations.
Once generated, the world is interactive in real time. You control the camera, explore from any angle, and render instantly — even on low-end hardware, directly in the browser. High-quality 3D world exploration is no longer gated by expensive equipment.
Under the hood, Echo infers a physically grounded 3D representation and converts it into a renderable format. For our web demo, we use 3D Gaussian Splatting (3DGS) for fast, GPU-friendly rendering — but the representation itself is flexible and can be easily adapted.
Why this matters: consistent 3D worlds unlock real workflows — digital twins, 3D design, game environments, robotics simulation, and more. From a single photo or a line of text, Echo builds worlds that are reliable, editable, and spatially faithful.
Echo also enables scene editing and restyling. Change materials, remove or add objects, explore design variations — all while preserving global 3D consistency. Editing no longer breaks the world.
This is only the beginning. Echo is the foundation for future world models with dynamics, physical reasoning, and richer interaction — environments that don't just look right, but behave right.
Explore the generated worlds on our website and sign up for the closed beta. The era of spatial intelligence starts here.
🌍 #Echo #WorldModels #SpatialAI #3DFoundationModels
Check it out: spaitial.ai

Peter Kocsis retweeted
George Kopanas @gkopanas
Radiance Meshes for Volumetric Reconstruction 🎉 We made a thing! A radiance field composed of triangle meshes that accurately renders the underlying volumetric field, with no popping AND faster than approximate methods like Gaussian splatting.
Peter Kocsis retweeted
Black Forest Labs @bfl_ml
FLUX.2 is here - our most capable image generation & editing model to date. Multi-reference. 4MP. Production-ready. Open weights. Into the new.
Peter Kocsis @Peter4AI
Congrats @yawarnihal! Amazing work, really well-deserved!
Matthias Niessner @MattNiessner

Congrats to @yawarnihal for winning the @MdsiTum best paper award for his amazing 𝐌𝐞𝐬𝐡𝐆𝐏𝐓 work 🎉
MeshGPT autoregressively generates compact, artist-style triangle meshes by tokenizing faces into a learned discrete vocabulary (VQ-style codebook) and training a decoder-only transformer to predict those face tokens — because discrete tokenization + attention lets GPT-style models learn long-range geometric & topological patterns and produce coherent, high-fidelity 3D assets.
MeshGPT's use cases go far beyond traditional content creation applications in computer graphics. For instance, the method was developed in collaboration with @Audi to help rapid prototyping of car designs, where explicit and precise mesh design is essential.
In the research community, there have already been many follow-ups such as MeshAnything, MeshXL, Meshtron, and many more - finally, we can use AI to generate high-fidelity 3D content :)
Project: nihalsid.github.io/mesh-gpt/
Video: youtu.be/UV90O1_69_o
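The tokenization step can be pictured as VQ-style nearest-neighbor assignment of per-face features to a learned codebook; the decoder-only transformer then models the resulting id sequence autoregressively, like text. A toy sketch only; the features and codebook below are invented, whereas MeshGPT learns both:

```python
import numpy as np

def quantize_faces(face_features, codebook):
    """Map per-face features to discrete token ids (VQ-style lookup).

    face_features: (F, D) continuous features, one row per mesh face.
    codebook:      (K, D) embedding vectors (learned, in the real model).
    Each face gets the id of its nearest codebook entry; this id
    sequence is what a GPT-style transformer is trained to predict.
    """
    # Pairwise squared distances between every face and codebook entry.
    d2 = ((face_features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
faces = np.array([[0.1, -0.1], [0.9, 1.2]])
print(quantize_faces(faces, codebook))  # [0 1]
```

Discreteness is the point: once faces are ids in a finite vocabulary, next-token prediction with attention can capture long-range geometric and topological regularities exactly as language models do over words.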

Peter Kocsis retweeted
Angela Dai @angelaqdai
📢ProcGen3D: Learning Neural Procedural Graphs for Image-to-3D Reconstruction @xinyi092298 learns neural procedural graphs to generate high-fidelity 3D - MCTS-guided sampling maintains consistency with the input image, even from real images! Check it out: xzhang-t.github.io/project/ProcGe…
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
📢📢 𝐏𝐞𝐫𝐜𝐇𝐞𝐚𝐝: 𝐏𝐞𝐫𝐜𝐞𝐩𝐭𝐮𝐚𝐥 𝐇𝐞𝐚𝐝 𝐌𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐒𝐢𝐧𝐠𝐥𝐞-𝐈𝐦𝐚𝐠𝐞 𝟑𝐃 𝐇𝐞𝐚𝐝 𝐑𝐞𝐜𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 & 𝐄𝐝𝐢𝐭𝐢𝐧𝐠 📢📢
PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text.
At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly-used methods such as LPIPS.
Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions for a coherent, high-fidelity output.
In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we can control geometry through - potentially hand-drawn - segmentation maps, and condition style via image or text prompt. We also provide an interactive GUI to enable the exploration of our editing pipeline.
🌍 antoniooroz.github.io/PercHead/
📽️ youtu.be/4hFybgTk4kE
Great work by @antonio_oroz and @TobiasKirschst1
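Perceptual supervision boils down to comparing renderings and targets in the feature space of a frozen vision backbone rather than in pixel space. A generic sketch; the tweet says the supervision comes from DINOv2 and SAM 2.1 features, but the function name, the L2 form, and the per-location normalization below are assumptions of this illustration, not the paper's exact loss:

```python
import numpy as np

def perceptual_loss(feat_pred, feat_target):
    """Feature-space distance, the core idea behind perceptual supervision.

    feat_pred / feat_target: (H, W, C) feature maps produced by a frozen
    vision backbone for the rendered and target images. Comparing
    unit-normalized features means the loss reflects semantic agreement
    rather than raw pixel differences.
    """
    def normalize(f):
        return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)
    diff = normalize(feat_pred) - normalize(feat_target)
    return float((diff ** 2).mean())

a = np.ones((4, 4, 8))
print(perceptual_loss(a, a))  # 0.0
```

Identical feature maps give zero loss; the more the frozen backbone "sees" the two images differently, the larger the penalty, which is what pushes the decoder toward perceptually faithful heads.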
Peter Kocsis retweeted
Angela Dai @angelaqdai
📢 New in ScanNet++: High-Res 360° Panos!
@chandan__yes & @liuyuehcheng have added pano captures for 956 ScanNet++ scenes, fully aligned with the 3D meshes, DSLR, and iPhone data - multiple panos per scene
Check it out:
Docs: kaldir.vc.in.tum.de/scannetpp/docu…
Code: github.com/scannetpp/scan…
Peter Kocsis @Peter4AI
📢 Code for 𝙄𝙣𝙩𝙧𝙞𝙣𝙨𝙞𝙓: High-Quality PBR Generation using Image Priors 📢 We have just released the inference code and the pre-trained weights! Our model generates intrinsic properties (albedo, roughness, metallic, normal) directly from text. peter-kocsis.github.io/IntrinsiX/
Peter Kocsis retweeted
Matthias Niessner @MattNiessner
Fantastic retreat this weekend by our research groups! Internal reviews, brainstorming ideas, paper reading, and much more! Of course also many social activities -- the highlight being our kayaking trip - lots of fun :)
Peter Kocsis retweeted
Anand Bhattad @anand_bhattad
So You Want to Be an Academic? A couple of years into your PhD, but wondering: "Am I doing this right?" Most of the advice is aimed at graduating students. But there's far less for junior folks who are still finding their academic path. My candid takes: anandbhattad.github.io/blogs/jr_grads…
Peter Kocsis @Peter4AI
🎉 Our paper, 𝗜𝗻𝘁𝗿𝗶𝗻𝘀𝗶𝗫, just got accepted to #NeurIPS2025! 🎉 Our model generates renderable PBR maps directly from text and can also be used for room-scale scene texturing using SDS. peter-kocsis.github.io/IntrinsiX/ Let's meet in San Diego!
Matthias Niessner @MattNiessner

All six of our submissions were accepted to #NeurIPS2025 🎉🥳 Awesome works about Gaussian Splatting Primitives, Lighting Estimation, Texturing, and much more GenAI :) Great work by @Peter4AI, @YujinChen_cv, @ZheningHuang, @jiapeng_tang, @nicolasvluetzow, @jnthnschmdt 🔥🔥🔥
