Zhenghao (Mark) Peng

12 posts

@pengzh97

彭正皓. PhD student at CS@UCLA. Interested in Multi-agent RL, Human-AI interaction, Gaming x RL.

Joined July 2020
68 Following · 26 Followers
Zhenghao (Mark) Peng retweeted
Bolei Zhou @zhoubolei
My students and I will be at #NeurIPS2023 to present a Spotlight paper on a new Human-in-the-loop Learning method. It achieves extraordinary sample efficiency and AI alignment in various environments. Paper and code: metadriverse.github.io/pvp/
0 replies · 8 reposts · 81 likes · 9.5K views
Zhenghao (Mark) Peng @pengzh97
It's been a great time at #NeurIPS2023 sharing my recent work on human-in-the-loop policy learning! Our method learns a policy without reward, trains from scratch with 10x better sample efficiency, greatly improves training-time safety, and is extremely simple. metadriverse.github.io/pvp/
0 replies · 1 repost · 4 likes · 278 views
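
The tweet sketches the idea at a high level; below is a minimal toy illustration of the reward-free, intervention-based scheme described on the linked PVP page. The labeling rule assumed here (push the value of the human's takeover action toward +1 and the agent's overridden action toward -1) follows the proxy-value idea; everything else, including all names, is illustrative and not the authors' implementation.

```python
import numpy as np

# Toy tabular illustration: one state, two discrete actions.
# Action 0 is what the agent proposed; action 1 is what the human did
# when intervening. No environment reward is used anywhere.
q = np.zeros(2)  # Q-values for the two actions
lr = 0.5

for _ in range(20):
    # Gradient steps on the proxy-value loss (q_human - 1)^2 + (q_agent + 1)^2
    q[1] -= lr * 2.0 * (q[1] - 1.0)  # human-endorsed action -> value +1
    q[0] -= lr * 2.0 * (q[0] + 1.0)  # overridden agent action -> value -1

print(q)  # approaches [-1., 1.]: acting greedily now copies the human
```

The "propagation" part of the full method, which spreads these value labels beyond the states the human actually visited, is omitted here; the sketch only shows the labeling step.
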
Bolei Zhou @zhoubolei
For tomorrow morning's poster session, my student and I will present our NeurIPS paper on ScenarioNet, an open-source platform for large-scale traffic scenario simulation and modeling based on MetaDrive. Paper, code, and data are available at metadriverse.github.io/scenarionet/ Poster #517
1 reply · 3 reposts · 26 likes · 2.5K views
Zhenghao (Mark) Peng @pengzh97
Finished my #ICML2023 rebuttal. I posted multiple rebuttal comments for each reviewer (to work around the word limit) and it worked. Just want to know: (1) can @icmlconf post the score distribution? (2) why does the rebuttal page show that only Program Chairs and Authors can read it?
1 reply · 0 reposts · 2 likes · 2.3K views
Zhenghao (Mark) Peng @pengzh97
@cong_ml @triple_agi The diffusion model also generates actions and rewards? That sounds quite amazing. Have you tried measuring the accuracy of the generated rewards?
1 reply · 0 reposts · 0 likes · 130 views
Cong Lu @cong_ml
@triple_agi Hey! One of the key differences between some of those is the generation of the full RL transition: observations, actions, and rewards. We believe this gives it a greater chance to generalise to novel dynamics!
1 reply · 0 reposts · 0 likes · 251 views
Cong Lu @cong_ml
RL agents🤖need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI in order to vastly upsample⬆️ an agent’s training data! [1/N]
5 replies · 34 reposts · 182 likes · 45.3K views
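
A hedged sketch of the upsampling idea in this thread: fit a generative model to real transitions (observation, action, reward, next observation), then sample it to vastly enlarge the replay buffer. The thread's method uses a diffusion model; the Gaussian `TransitionGenerator` below is a deliberately crude stand-in so the example stays self-contained, and all names are illustrative.

```python
import numpy as np

class TransitionGenerator:
    """Toy density model: fits an axis-aligned Gaussian to flattened transitions."""

    def fit(self, transitions: np.ndarray) -> None:
        self.mean = transitions.mean(axis=0)
        self.std = transitions.std(axis=0) + 1e-6  # avoid zero variance

    def sample(self, n: int) -> np.ndarray:
        return np.random.normal(self.mean, self.std, size=(n, self.mean.size))

def upsample_replay_buffer(real: np.ndarray, ratio: int = 10) -> np.ndarray:
    """Return the real transitions plus `ratio`-times-as-many synthetic ones."""
    gen = TransitionGenerator()
    gen.fit(real)
    synthetic = gen.sample(ratio * len(real))
    return np.concatenate([real, synthetic], axis=0)

# Usage: each row packs (obs, action, reward, next_obs) into one flat vector.
real_transitions = np.random.randn(256, 10)  # placeholder data
buffer = upsample_replay_buffer(real_transitions)
print(buffer.shape)  # (2816, 10): 256 real rows + 2560 synthetic rows
```

Because the reward is generated jointly with the rest of the transition, its accuracy directly shapes the learned policy, which is what the question upthread about measuring reward accuracy is probing.
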
Zhenghao (Mark) Peng @pengzh97
@emollick What if adding nonstop flights is a result of good economics? A big city concentrates more companies as it grows, and meanwhile gains more flights.
0 replies · 0 reposts · 0 likes · 60 views
Ethan Mollick @emollick
New paper shows that meeting in person is a catalyst for innovation. Adding non-stop flights between two cities increases patenting & collaboration in those areas, especially for firms with offices in both places. It works best when bridging time zones & big cultural differences.
26 replies · 222 reposts · 1.4K likes · 320.7K views
Eugene Vinitsky 🦋 @EugeneVinitsky
@j_foerst @elonmusk It is astonishing how hard it is to convince self-driving people that multi-agent has any relevance to the problem, so I'm glad to see this
2 replies · 0 reposts · 26 likes
Zhenghao (Mark) Peng retweeted
Jakob Foerster @j_foerst
"this amounts to solving the multi-agent planning problem" Tesla has now realised that self-driving is a multi-agent problem.. youtu.be/ODSJsviD_SU?t=… 4 years ago I tried to explain to @elonmusk that once CV etc was working, this was the next frontier. He said SL is all you need.
9 replies · 40 reposts · 441 likes
Zhenghao (Mark) Peng @pengzh97
Diversity/novelty in policy space is something that clearly exists but is hard to pin down. We know it is important from research on self-play, task-agnostic learning, evolution, and so on. Though we don't know how to describe it formally, we can measure it via observation/action differences, or even latent-feature differences.
Deepak Pathak @pathak2206

RL agents get specific to tasks they are trained on. What if we remove the task itself during training? Turns out, a self-supervised planning agent can both explore efficiently & achieve SOTA on test tasks w/ zero or few samples in DMControl from images! ramanans1.github.io/plan2explore

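One way to make the "measure it via obs/act diff" suggestion concrete: score the behavioral diversity of two policies as the mean difference between their actions on a shared batch of observations. A minimal sketch follows; all names are illustrative, and the toy linear policies are only there to make it runnable.

```python
import numpy as np

def action_diversity(policy_a, policy_b, observations: np.ndarray) -> float:
    """Mean L2 distance between two policies' actions on the same observations."""
    acts_a = np.stack([policy_a(obs) for obs in observations])
    acts_b = np.stack([policy_b(obs) for obs in observations])
    return float(np.linalg.norm(acts_a - acts_b, axis=-1).mean())

# Usage with toy deterministic linear "policies" on random observations.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 4, 2))  # two random 4-dim-obs -> 2-dim-action maps
obs_batch = rng.normal(size=(128, 4))
print(action_diversity(lambda o: o @ W1, lambda o: o @ W2, obs_batch))
```

The same template extends to the latent-feature variant mentioned in the tweet by replacing raw actions with encoder features of the visited states.
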
Pin-Yu Chen @pinyuchenTW
My self-rated best-quality submission to #neurips2020 was desk rejected. Just thought 2020 couldn't be more wrong 😅 But seriously, I am very proud of the results and breakthroughs we've made. The rejection did not override my happiness. Now, who's the lucky conference?😉
19 replies · 10 reposts · 381 likes