ぼす(群体)

164.1K posts

@Boss_dess

Tertiary fan works = made-up NPC (OC) × original creation (OC) × pk-ification (OC) Joined March 2010

344 Following · 1K Followers

Pinned Tweet

ぼす(群体)@Boss_dess·24 Jan

@vsw_3 Tumblr for supplementary notes to Chris's logs, pass: boss  chrissvs3re.tumblr.com  Quoted: アキーリ-kun from ぽぽゆた's place (@po2yuta). Thank you for the bride (♂)! twitter.com/po2yuta/status…

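ぼす(群体) retweeted

ぼうくん | VoQn 🎨@VoQn·17 Nov

A new LLM benchmark that simply keeps a running total: +1 point for a correct answer, -1 for a wrong one. It measures the hallucination rate on questions whose answers can be verified, across science and technology, business, law, and so on. ... The fact that scores only just managed to turn positive with the GPT5.1 generation really deserves attention.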

Artificial Analysis@ArtificialAnlys

Announcing AA-Omniscience, our new benchmark for knowledge and hallucination across >40 topics, where all but three models are more likely to hallucinate than give a correct answer.

Embedded knowledge in language models is important for many real-world use cases. Without knowledge, models make incorrect assumptions and are limited in their ability to operate in real-world contexts. Tools like web search can help, but models need to know what to search for (e.g. models should not search for ‘Multi Client Persistence’ for an MCP query when it clearly refers to ‘Model Context Protocol’).

Hallucination of factual information is a barrier to being able to rely on models and has been perpetuated by every major evaluation dataset. Grading correct answers with no penalty for incorrect answers creates an incentive for models (and the labs training them) to attempt every question. This problem is clearest when it comes to knowledge: factual information should never be made up, while in other contexts attempts that might not work are useful (e.g. coding new features).

Omniscience Index is the key metric we report for AA-Omniscience, and it punishes hallucinations by deducting points where models have guessed rather than admitting they do not know the answer. AA-Omniscience shows that all but three models are more likely to hallucinate than provide a correct answer when given a difficult question. AA-Omniscience will complement the Artificial Analysis Intelligence Index to incorporate measurement of knowledge and probability of hallucination. Details below, and more charts in the thread.

AA-Omniscience details:
- 🔢 6,000 questions across 42 topics within 6 domains (‘Business’, ‘Humanities & Social Sciences’, ‘Health’, ‘Law’, ‘Software Engineering’, and ‘Science, Engineering & Mathematics’)
- 🔍 89 sub-topics including Python data libraries, Public Policy, Taxation, and more, giving a sharper view of where models excel and where they fall short across nuanced domains
- 🔄 Incorrect answers are penalized in our Knowledge Reliability Index metrics to punish hallucinations
- 📊 3 metrics: Accuracy (% of questions answered correctly), Hallucination rate (incorrect answers as a % of all non-correct responses, i.e. incorrect answers plus abstentions), Omniscience Index (+1 for correct, -1 for incorrect where answered, 0 for abstentions where the model did not try to answer)
- 🤗 Open source test dataset: We’re open sourcing 600 questions (10%) to help labs develop factual and reliable models. Topic distribution and model performance follow the full set (@huggingface link below)
- 📃 Paper: See below for a link to the research paper

Key findings:
- 🥇 Claude 4.1 Opus takes first place in Omniscience Index, followed by last week’s GPT-5.1 and Grok 4: Even the best frontier models score only slightly above 0, meaning they produce correct answers on the difficult questions that make up AA-Omniscience only marginally more often than incorrect ones. @AnthropicAI’s leadership is driven by low hallucination rate, whereas OpenAI and xAI’s positions are primarily driven by higher accuracy (percentage correct).
- 🥇 xAI’s Grok 4 takes first place in Omniscience Accuracy (our simple ‘percentage correct’ metric), followed by GPT-5 and Gemini 2.5 Pro: @xai's win may be enabled by scaling total parameters and pre-training compute: @elonmusk revealed last week that Grok 4 has 3 trillion total parameters, which may be larger than GPT-5 and other proprietary models
- 🥇 Claude sweeps the hallucination leaderboard: Anthropic takes the top three spots for lowest hallucination rate, with Claude 4.5 Haiku leading at 28%, over three times lower than GPT-5 (high) and Gemini 2.5 Pro. Claude 4.5 Sonnet and Claude 4.1 Opus follow in second and third at 48%
- 💭 High knowledge does not guarantee low hallucination: Hallucination rate measures how often a model guesses when it lacks the required knowledge. Models with the highest accuracy, including the GPT-5 models and Gemini 2.5 Pro, do not lead the Omniscience Index due to their tendency to guess rather than abstain. Anthropic models tend to manage uncertainty better, with Claude 4.5 Haiku achieving the lowest hallucination rate at 26%, ahead of 4.5 Sonnet and 4.1 Opus (48%)
- 📊 Models vary by domain: Models differ in their performance across the six domains of AA-Omniscience, and no model dominates across all. While Anthropic’s Claude 4.1 Opus leads in Law, Software Engineering, and Humanities & Social Sciences, GPT-5.1 from @OpenAI achieves the highest reliability on Business questions, and xAI’s Grok 4 performs best in Health and in Science, Engineering & Mathematics. Model choice should align with the use case rather than defaulting to the overall leader
- 📈 Larger models score higher on accuracy, but not always reliability: Larger models tend to have higher levels of embedded knowledge, with Kimi K2 Thinking and DeepSeek R1 (0528) topping accuracy charts over smaller models. This advantage does not always hold on the Omniscience Index. For example, Llama 3.1 405B from @AIatMeta beats larger Kimi K2 variants due to having one of the lowest hallucination rates among models (51%)
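
The three metrics defined in the quoted post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Artificial Analysis's implementation: the `Tally` container, the function names, and the two example tallies are hypothetical, and the index is normalised here to the range [-1, 1] because the post does not state the scale it is reported on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Counts of graded responses for one model (hypothetical container)."""
    correct: int
    incorrect: int
    abstained: int

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.abstained


def accuracy(t: Tally) -> float:
    """Share of all questions answered correctly."""
    return t.correct / t.total


def hallucination_rate(t: Tally) -> float:
    """Share of non-correct responses that were wrong answers, not abstentions."""
    not_correct = t.incorrect + t.abstained
    return t.incorrect / not_correct if not_correct else 0.0


def omniscience_index(t: Tally) -> float:
    """Mean per-question score: +1 correct, -1 incorrect, 0 abstained.

    Assumption: normalised to [-1, 1]; the post does not give the scale.
    """
    return (t.correct - t.incorrect) / t.total


# Made-up tallies over 100 questions each, for illustration only.
cautious = Tally(correct=40, incorrect=35, abstained=25)
guesser = Tally(correct=45, incorrect=50, abstained=5)

for name, t in [("cautious", cautious), ("guesser", guesser)]:
    print(name, accuracy(t), round(hallucination_rate(t), 3), omniscience_index(t))
```

With these invented numbers the "guesser" has the higher accuracy but a negative index, while the "cautious" model has lower accuracy and a positive index, which is the pattern the post describes when it says high knowledge does not guarantee low hallucination.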

5.9K

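ぼす(群体) retweeted

ぼうくん | VoQn 🎨@VoQn·17 Nov

As AA notes, "more knowledge does not necessarily mean fewer hallucinations." For example, if you compute the hallucination rate rather than the score above (↑), then among Anthropic's models Haiku, which is supposed to be the lightweight model, hallucinates at a lower rate than Sonnet/Opus. Scaling up model size does not simply reduce hallucinations.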

436

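ぼす(群体) retweeted

ars@EinekleineArs·7h

Also, I was left speechless by a post from a pro-AI university student saying they are putting together a wiki about "anti-AI". It doesn't mention LAION-5B, which is packed with more than 5.8 billion pieces of misappropriated data from around the world, and it doesn't mention the human rights problem that extremely sensitive data such as child sexual abuse material, abuse imagery, images of dead bodies, and medical images has not been fully removed. Nothing on the AIDC issue either. Not good at all.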

177

2.3K

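ぼす(群体) retweeted

ars@EinekleineArs·7h

The text says that “anti-AI people who show aversion to artificial intelligence are on the rise,” but what people feel aversion toward is generative AI users who keep engaging in this kind of impression manipulation and harassment. As for current generative AI itself, before any question of how to use it, the issue being raised is copyright infringement and human rights violations. Also, generative AI and conventional AI are different things and should be described separately. All of it is wrong.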

461

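ぼす(群体) retweeted

ars@EinekleineArs·7h

Artists get called anti-AI just for saying "please don't feed my art into generators"; anti-AI just for writing "no unauthorized training" in their bio; anti-AI just for murmuring that generated illustrations feel somehow off; anti-AI just for saying "I don't like ChatGPT because it lies." What even is "anti-AI"? It's just a discriminatory label for shutting down anyone who complains.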

429

ぼす(群体)@Boss_dess·6h

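Point out problems with the dataset and bugs swarm in chirping "anti-AI!", but that is nothing more than a confession that, for these incompetents, only illegal models built on sanctioned data theft have any value. We're fine without them~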

ぼす(群体)@Boss_dess·7h

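Forever missing the point is typical of generative AI users. Partly, the whole purpose of crying "anti-AI!" is to find a way to distort the other side's argument, but it is also simply that they lack the ability to think.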

ぼす(群体) retweeted

絵師の愚痴垢@aruguchieshi·9h

A lot of people call it "a waste of tax money" when the government commissions work from legitimate businesses. Sensible people just call it "the economy."

154

1.7K

ぼす(群体) retweeted

絵描キノ裏@kinoura10000·1d

I'd wondered why blue-badge generative AI "artists" keep sending each other greeting replies. So that's what it was. Gross!

サムソン高橋@samsontakahashi

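It's a bit late in the day, but I've been looking into whether I could earn about 5,000 yen a month by getting a blue badge. Apparently only reactions from other blue-badge accounts count toward revenue, so the money-management and FIRE blue badges endlessly exchanging greetings are basically scooping up small change from the bottom of a ditch, and that made me sad.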

106

594

31.4K

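ぼす(群体) retweeted

小鳥遊ヲトリ@otr_ut·16h

The "if you want to get better at drawing, use generative AI" crowd is popping up again, but don't. Over the past few years I have seen, many times, a professional artist who used to draw really well start using generative AI and proudly post a mess of a picture. To begin with, the moment you use generative AI or imitate it, you lose credibility as an artist, so you had better not.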

180

794

14.1K

ぼす(群体) retweeted

ずんだもん@yuzu_so0·3d

This is an archived capture of a fool who, despite being an AI devotee, shows off using that beloved AI in a 100% wrong way, as a hate-image generator.

1.5K

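ぼす(群体) retweeted

連絡用@XuxcVbqvZhHGk7G·22h

It's just a generative AI user posing as a hand-drawing artist, putting one generative AI image next to another and saying "I'm quitting as an artist (lol)," you know, めい-obachan.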

603

31.3K

ぼす(群体)@Boss_dess·8h

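The lunatics who always show up in the quote posts, and the people in images like these, must have had a nerve struck. Too bad an AI that composites stolen images can't make you an artist ♡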

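ぼす(群体) retweeted

メガクラ禁止@(土)東３ユ40b/(日)東７Ｃ41a@megacra·1d

Sees pen tablet ads and artist YouTubers' videos and assumes, without any basis, "I should be able to draw too!" ↓ "I can't draw well at all! Argh! Forget it!" ↓ Image-generation AI arrives ↓ "Artists are obsolete lol" That's how it goes, right?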

230

13.1K

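ぼす(群体) retweeted

著作権情報センター【公式】@CRIC_official_·1d

Major publishers have reportedly filed a copyright infringement lawsuit in the United States against "WeLib," described as a pirate book site, over its unauthorized distribution of books and its provision of access for AI training. reuters.com/legal/legalind…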

176

307

12.9K

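ぼす(群体) retweeted

KAMEI Nobutaka@jinrui_nikki·1d

Water consumed by AI rivals the domestic water use of Africa's 1.3 billion people: United Nations University warns that "overly polite prompts" also add to the load (36Kr Japan) news.yahoo.co.jp/articles/4dbb1…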

741

1.2K

374.2K

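ぼす(群体) retweeted

高杉祥一＠NEBULA SILHOUETTE@takasugi_SPR_EX·4d

X being bought by Elon and having Grok welded onto it is a special case, but can't something be done about people whose view is so simplistic that they lump all other electronic computer hardware and software together as generative AI and say "don't criticize it while you're using it"?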

183

2.4K

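ぼす(群体) retweeted

kam@kamzigi·1d

With generative AI, you keep getting people who say things like, "I don't have money, so I read manga on 漫画村. I'm not a hardcore manga fan and it's just to kill time, so it's fine, right? Give me a break!"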

手嶋海嶺（ゆっくり生命体）@TeshimaKairei

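Recently, posters made with generative AI were criticized, but the examples cited were all posters announcing small local events. There is little point in critiquing posters that general office staff grudgingly make on rotation with talk of "what advertising design ought to be...".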

281

1.1K

67.2K

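ぼす(群体) retweeted

なゆた@numakame·21h

As a supplement: under domestic law, AI training also requires, in principle, permission from the copyright holder. Under certain conditions the right is limited and permission is no longer needed, but no court has yet ruled on whether current generative AI falls under this. I think stating it is meaningful as a clear expression of your intent not to grant permission.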

979

ぼす(群体)@Boss_dess·9h

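I'm cracking up at "Unfair Trade Commission." Maybe someone should just paste in the statements from CODA and the animators' association calling for regulation of generative AI.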

364
