Zhenghao (Mark) Peng

12 posts

@pengzh97

彭正皓. PhD student at CS@UCLA. Interested in Multi-agent RL, Human-AI interaction, Gaming x RL.

Joined July 2020
68 Following · 26 Followers
Zhenghao (Mark) Peng retweeted
Bolei Zhou @zhoubolei
My students and I will be at #NeurIPS2023 to present a Spotlight paper on a new Human-in-the-loop Learning method. It achieves extraordinary sample efficiency and AI alignment in various environments. Paper and code: metadriverse.github.io/pvp/
0 replies · 8 reposts · 81 likes · 9.5K views
Zhenghao (Mark) Peng @pengzh97
It's been a great time at #NeurIPS2023 sharing my recent work on human-in-the-loop policy learning! Our method learns a policy without reward, trains from scratch with 10x better sample efficiency, greatly improves training-time safety, and is extremely simple. metadriverse.github.io/pvp/
0 replies · 1 repost · 4 likes · 278 views
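
The tweet sketches the idea at a high level; below is a minimal toy illustration of the reward-free, intervention-based scheme described on the linked PVP page. The labeling rule assumed here (push the value of the human's takeover action toward +1 and the agent's overridden action toward -1) follows the proxy-value idea; everything else, including all names, is illustrative and not the authors' implementation.

```python
import numpy as np

# Toy tabular illustration: one state, two discrete actions.
# Action 0 is what the agent proposed; action 1 is what the human did
# when intervening. No environment reward is used anywhere.
q = np.zeros(2)  # Q-values for the two actions
lr = 0.5

for _ in range(20):
    # Gradient steps on the proxy-value loss (q_human - 1)^2 + (q_agent + 1)^2
    q[1] -= lr * 2.0 * (q[1] - 1.0)  # human-endorsed action -> value +1
    q[0] -= lr * 2.0 * (q[0] + 1.0)  # overridden agent action -> value -1

print(q)  # approaches [-1., 1.]: acting greedily now copies the human
```

The "propagation" part of the full method, which spreads these value labels beyond the states the human actually visited, is omitted here; the sketch only shows the labeling step.
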
Bolei Zhou @zhoubolei
For tomorrow morning's poster session, my student and I will present our NeurIPS paper on ScenarioNet, an open-source platform for large-scale traffic scenario simulation and modeling based on MetaDrive. Paper, code, and data are available at metadriverse.github.io/scenarionet/ Poster #517
1 reply · 3 reposts · 26 likes · 2.5K views
Zhenghao (Mark) Peng @pengzh97
Finished my #ICML2023 rebuttal. I posted multiple rebuttal comments for each reviewer (to work around the word limit) and it worked. Just want to know: (1) can @icmlconf post the score distribution? (2) why does the rebuttal page show that only Program Chairs and Authors can read it?
1 reply · 0 reposts · 2 likes · 2.3K views
Zhenghao (Mark) Peng @pengzh97
@cong_ml @triple_agi The diffusion model also generates actions and rewards? That sounds quite amazing. Have you tried measuring the accuracy of the generated rewards?
1 reply · 0 reposts · 0 likes · 130 views
Cong Lu @cong_ml
@triple_agi Hey! One of the key differences between some of those is the generation of the full RL transition: observations, actions, and rewards. We believe this gives it a greater chance to generalise to novel dynamics!
1 reply · 0 reposts · 0 likes · 251 views
Cong Lu @cong_ml
RL agents🤖need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI in order to vastly upsample⬆️ an agent’s training data! [1/N]
5 replies · 34 reposts · 182 likes · 45.3K views
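
A hedged sketch of the upsampling idea in this thread: fit a generative model to real transitions (observation, action, reward, next observation), then sample it to vastly enlarge the replay buffer. The thread's method uses a diffusion model; the Gaussian `TransitionGenerator` below is a deliberately crude stand-in so the example stays self-contained, and all names are illustrative.

```python
import numpy as np

class TransitionGenerator:
    """Toy density model: fits an axis-aligned Gaussian to flattened transitions."""

    def fit(self, transitions: np.ndarray) -> None:
        self.mean = transitions.mean(axis=0)
        self.std = transitions.std(axis=0) + 1e-6  # avoid zero variance

    def sample(self, n: int) -> np.ndarray:
        return np.random.normal(self.mean, self.std, size=(n, self.mean.size))

def upsample_replay_buffer(real: np.ndarray, ratio: int = 10) -> np.ndarray:
    """Return the real transitions plus `ratio`-times-as-many synthetic ones."""
    gen = TransitionGenerator()
    gen.fit(real)
    synthetic = gen.sample(ratio * len(real))
    return np.concatenate([real, synthetic], axis=0)

# Usage: each row packs (obs, action, reward, next_obs) into one flat vector.
real_transitions = np.random.randn(256, 10)  # placeholder data
buffer = upsample_replay_buffer(real_transitions)
print(buffer.shape)  # (2816, 10): 256 real rows + 2560 synthetic rows
```

Because the reward is generated jointly with the rest of the transition, its accuracy directly shapes the learned policy, which is what the question upthread about measuring reward accuracy is probing.
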
Zhenghao (Mark) Peng @pengzh97
@emollick What if adding nonstop flights is a result of good economics? A big city concentrates more companies as it grows, and meanwhile gains more flights.
0 replies · 0 reposts · 0 likes · 60 views
Ethan Mollick @emollick
New paper shows that meeting in person is a catalyst for innovation. Adding non-stop flights between two cities increases patenting & collaboration in those areas, especially for firms with offices in both places. It works best when bridging time zones & big cultural differences.
26 replies · 222 reposts · 1.4K likes · 320.7K views
Eugene Vinitsky 🦋 @EugeneVinitsky
@j_foerst @elonmusk It is astonishing how hard it is to convince self-driving people that multi-agent has any relevance to the problem, so I'm glad to see this
2 replies · 0 reposts · 26 likes
Zhenghao (Mark) Peng retweeted
Jakob Foerster @j_foerst
"this amounts to solving the multi-agent planning problem" Tesla has now realised that self-driving is a multi-agent problem.. youtu.be/ODSJsviD_SU?t=… 4 years ago I tried to explain to @elonmusk that once CV etc was working, this was the next frontier. He said SL is all you need.
9 replies · 40 reposts · 441 likes
Zhenghao (Mark) Peng @pengzh97
Diversity/novelty in policy space is something that clearly exists but is hard to pin down. We know it is important from research on self-play, task-agnostic learning, evolution, and so on. Though we don't know how to describe it formally, we can measure it via observation/action differences, or even latent-feature differences.
Deepak Pathak @pathak2206

RL agents get specific to tasks they are trained on. What if we remove the task itself during training? Turns out, a self-supervised planning agent can both explore efficiently & achieve SOTA on test tasks w/ zero or few samples in DMControl from images! ramanans1.github.io/plan2explore

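One way to make the "measure it via obs/act diff" suggestion concrete: score the behavioral diversity of two policies as the mean difference between their actions on a shared batch of observations. A minimal sketch follows; all names are illustrative, and the toy linear policies are only there to make it runnable.

```python
import numpy as np

def action_diversity(policy_a, policy_b, observations: np.ndarray) -> float:
    """Mean L2 distance between two policies' actions on the same observations."""
    acts_a = np.stack([policy_a(obs) for obs in observations])
    acts_b = np.stack([policy_b(obs) for obs in observations])
    return float(np.linalg.norm(acts_a - acts_b, axis=-1).mean())

# Usage with toy deterministic linear "policies" on random observations.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 4, 2))  # two random 4-dim-obs -> 2-dim-action maps
obs_batch = rng.normal(size=(128, 4))
print(action_diversity(lambda o: o @ W1, lambda o: o @ W2, obs_batch))
```

The same template extends to the latent-feature variant mentioned in the tweet by replacing raw actions with encoder features of the visited states.
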
Pin-Yu Chen @pinyuchenTW
My self-rated best-quality submission to #neurips2020 was desk rejected. Just thought 2020 couldn't be more wrong 😅 But seriously, I am very proud of the results and breakthroughs we've made. The rejection did not override my happiness. Now, who's the lucky conference?😉
19 replies · 10 reposts · 381 likes