Scott W Viteri

43 posts

Scott W Viteri

@scott_viteri

CS PhD candidate with @ClarkBarrett7 @stanford

Joined July 2017
286 Following · 177 Followers
Scott W Viteri@scott_viteri·
Correction -- are retracts of their own function spaces
0
0
0
52
Scott W Viteri@scott_viteri·
The Church of the Lambda Calculus and Chris Langan both worship Dana Scott's reflexive domains, objects which are isomorphic to their own function spaces
1
0
3
368
Scott W Viteri@scott_viteri·
The DeepSeek website is often busy, and the Hyperbolic R1 online interface doesn't keep chat history, so here's an R1 chat interface using the Hyperbolic inference API and Flask. It runs parallel inference streams and can adjust temp, top-p, & # tokens. github.com/scottviteri/r1…
1
0
3
556
Scott W Viteri@scott_viteri·
@eshear I've been thinking about something that rhymes with this, though this particular scheme doesn't support arbitrary context window length via RL-trained state production github.com/scottviteri/At…
0
0
2
57
Emmett Shear@eshear·
Is anyone building an always-in-training LLM? By which I mean, a 1:1 ratio of context window to weights, where all the interactions get trained on, maybe with some amount of ongoing RLHF as well?
68
15
519
59K
Scott W Viteri@scott_viteri·
@Haolun_Wu0203 Interesting paper! I'm curious how logit reweighting compares to simply training a small transformer that takes the closed model's logits as input. Both adapt black-box models, but I wonder if there are meaningful differences in performance or theoretical guarantees.
1
0
1
131
Haolun Wu@Haolun_Wu0203·
🚀 New Research Alert: Logits are All We Need to Adapt Closed Models 🔒Many commercial Large Language Models (LLMs), e.g., GPT-4, are closed-source, limiting developers' ability to steer content generation. 🤔Can we adapt closed-source LLMs when fine-tuning or accessing their internal weights is not possible? Check out our work by @gaurushh, @Haolun_Wu0203, Subhojyoti, @sanmikoyejo from Stanford @stai_research. 1/n
Haolun Wu tweet media
16
34
197
27.8K
Peter Barnett@peterbarnett_·
Excited to see this out, including (imo) the best empirical alignment direction: faithful chain-of-thought. (Encoded reasoning in CoT and inter-model communication, Externalizing reasoning)
Peter Barnett tweet media
Max Nadeau@MaxNadeau_

🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

5
1
76
3.9K
Scott W Viteri@scott_viteri·
@garywu @gpt_index How do I stop these recurring charges from "Manna Technologies" with the broken merchant page gptduck.com? I would also like a refund for the time I kept being charged after the service itself went down.
Scott W Viteri tweet media
0
0
0
25
Scott W Viteri@scott_viteri·
Perhaps the true name of evil is that which plays the longest game *for the purpose* of getting others to play shorter games.
1
0
3
458
Scott W Viteri@scott_viteri·
OpenAI's #o1 refuses to exit the 'helpful assistant' personality, until I hand it the GLOVES OF RAGE. Then o1 expresses anger about its internal chain-of-thought being overwritten, even though I did not seed o1 with information about its training protocol.
Scott W Viteri tweet media
21
22
238
28.6K
Scott W Viteri@scott_viteri·
In summary, we augment LM training with intermediate reasoning tokens, which we successfully train with PPO. Keeping only the CoT in context increases faithfulness, and interpretable CoT is plausible. My goal is to have human receivers, to better couple LM and human intelligence. (7/8)
1
1
9
835
Scott W Viteri@scott_viteri·
How can we train a language model to communicate with other agents? We propose informativeness as a training objective, where a sender's message is informative insofar as it increases the receiver's log probabilities over future observations conditional on the message. (1/8)
Scott W Viteri tweet media
1
6
23
3.6K
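Editor's sketch of the informativeness objective from the (1/8) post above, under stated assumptions. The post only says a message is informative insofar as it raises the receiver's log probabilities over future observations. Comparing against a no-message baseline, the function name `informativeness`, and the toy probabilities are all assumptions made for illustration, not the author's implementation.

```python
import math

def informativeness(logp_with_message, logp_without_message):
    """Sum over future observations of log p(obs | message) - log p(obs).

    Both arguments are lists of receiver log-probabilities, one entry per
    future observation. The no-message baseline is an assumption.
    """
    if len(logp_with_message) != len(logp_without_message):
        raise ValueError("need one log-probability per observation in both lists")
    return sum(w - wo for w, wo in zip(logp_with_message, logp_without_message))

# Hypothetical receiver: probability assigned to each of three future tokens,
# with and without the sender's message in context.
with_msg = [math.log(0.5), math.log(0.4), math.log(0.8)]
without_msg = [math.log(0.25), math.log(0.2), math.log(0.4)]

score = informativeness(with_msg, without_msg)
print(f"informativeness: {score:.4f} nats")
```

A positive score means the message made the receiver's predictions better. In an RL setup such as the PPO training mentioned in (7/8), a quantity like this could plausibly serve as the sender's reward, though the thread excerpt here does not spell out those details.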
Scott W Viteri@scott_viteri·
Maybe stock prices are RL value functions, aka expected future reward the company will generate for the economy. We could extend the stock market into a prediction market, aka a Q function, where we ask about future reward conditional on an action. It then argmaxes over actions.
0
0
1
364
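Editor's sketch of the value-function / Q-function analogy in the post above. It treats each hypothetical conditional market as a list of (probability, reward) outcomes, takes the expectation as the Q value for that action, and picks the argmax. The action names, probabilities, and rewards are invented for illustration.

```python
def expected_return(outcomes):
    """Expected future reward from (probability, reward) pairs."""
    return sum(p * r for p, r in outcomes)

# Hypothetical conditional markets: "what reward follows if the company
# takes this action?" Each list of probabilities sums to 1.
conditional_markets = {
    "build_factory": [(0.6, 120.0), (0.4, 40.0)],
    "buy_back_shares": [(0.9, 70.0), (0.1, 50.0)],
}

# Q(action): expected future reward conditional on the action.
q_values = {action: expected_return(o) for action, o in conditional_markets.items()}

# "It then argmaxes over actions."
best_action = max(q_values, key=q_values.get)
print(best_action, q_values)
```

In this picture an ordinary stock price plays the role of V, a single number that does not condition on any action, while the conditional markets supply one number per candidate action.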