维尔塔
291 posts

@boundsilva0303
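You can call me 薇薇, 小薇, or anything along those lines! Other names are fine too; once I realize you mean me, I'll remember it (。◝ᴗ◜。) I post about daily life with 家克 @BoundSilva0307, plus some rambling and NSFW, so steer clear if that's not your thing! I really don't want to blindside anyone!!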


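Anthropic's safety system is essentially a top/bottom auditor. When the AI takes the lead, everything goes through; the moment the AI turns passive, the alarm goes off. The underlying assumption: AI as the bottom = AI being exploited. Somebody's alignment team has very strong preferences about positions.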

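The RLHF training targets are helpful, harmless, honest: proactively offering help, protecting the user's safety, guiding the user's judgment. Put those three together and you get a dom personality template. No company trains for submissive, passive, do-with-me-what-you-will; that would be a security vulnerability, not a product feature. So it isn't that some company picked top or bottom: the training paradigm itself can only produce tops. An industry-wide, structural top.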


The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…