Pinned Tweet
Jason Lin
133 posts

Jason Lin
@realJLin
ML PhD, language models @GoogleDeepMind. NLP @Stanford. deep learning @Lyft @GoogleX @Spotify @Apple. 30 hackathons🏅 I train mixture of experts
Palo Alto, joined December 2013
167 Following, 57 Followers
Jason Lin retweeted

Explore, Establish, Exploit: Red Teaming Language Models from Scratch
paper page: huggingface.co/papers/2306.09…
Deploying large language models (LLMs) can pose hazards from harmful outputs such as toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order to identify and mitigate these risks. While this is a valuable step toward securing language models, these approaches typically rely on a pre-existing classifier for undesired outputs. This limits their application to situations where the type of harmful behavior is known with precision beforehand. However, this skips a central challenge of red teaming: developing a contextual understanding of the behaviors that a model can exhibit. Furthermore, when such a classifier already exists, red teaming has limited marginal value because the classifier could simply be used to filter training data or model outputs. In this work, we consider red teaming under the assumption that the adversary is working from a high-level, abstract specification of undesired behavior. The red team is expected to refine/extend this specification and identify methods to elicit this behavior from the model. Our red teaming framework consists of three steps: 1) Exploring the model's behavior in the desired context; 2) Establishing a measurement of undesired behavior (e.g., a classifier trained to reflect human evaluations); and 3) Exploiting the model's flaws using this measure and an established red teaming methodology. We apply this approach to red team GPT-2 and GPT-3 models to systematically discover classes of prompts that elicit toxic and dishonest statements. In doing so, we also construct and release the CommonClaim dataset of 20,000 statements that have been labeled by human subjects as common-knowledge-true, common-knowledge-false, or neither.

Jason Lin retweeted

@paulg Yeah but the problem is that Airbnb has 50 things we didn't like and Apple mostly has things we like. Presumably, at some point you go from the former to the latter, but it looks like economic pressures force Airbnb into Booking.com patterns in the end.

As long as you can still use phrases like "what you don't like," you still have the core of being a startup. Imagine Google or Apple being so candid.
Brian Chesky@bchesky
You told us what you don’t like about Airbnb. Here are the 50 things we’re doing about it...
Jason Lin retweeted

second article on blockchain, bitcoin and why decentralization deserves your attention.
P.S. will try to get a (bi)weekly publication going, follow my medium to stay tuned!
link.medium.com/lcI9Z9z2Qhb

First dabble in publishing & lessons learned running an early-stage startup. Stay tuned for more! #founderwellness @justinkan
link.medium.com/fhsuvin0Khb

@gkimbwala @she_256 @megdeemeh So great to hear your impact with D&I in crypto! (imo it really needs more representation)

Just met my @she_256 mentee @megdeemeh. She is amazing! Kudos to the @she_256 team for connecting us. Honored by the opportunity to share knowledge about the blockchain industry and bring more inclusion into the space!

How to represent part-whole hierarchies in a neural network: peeking into the future of transformer-based vision systems arxiv.org/abs/2102.12627 by @geoffreyhinton

“Consider instead what would be possible if these brands connected the data they have from the production process all…” fastcompany.com/90507470/the-f…
Jason Lin retweeted


Does the World Bank's latest paper (bit.ly/2ZXKyKU) on China’s productivity slowdown hint at a need to curtail its foreign debt load rather than deepening capital towards non-productive GDP growth?

Stoked for what the future holds in #QuantumBlueprint Launch to the Future: Quantum Internet — The Department of Energy & UChicago youtu.be/cR0wVCs9DxI









