nikoster

91 posts

@nikosters

https://t.co/yaT0AlhjWB

Saint Petersburg, Russia Joined October 2021
109 Following 6 Followers
nikoster
nikoster@nikosters·
@atorixa00 I would have agreed with this claim a couple of years ago, but not now. Maybe it's still true for the average user, but for autists like me, who can spend a ton of time configuring every pixel on the display, desktop Linux is the best PC experience
Russian
0
0
0
1
atorixa🏳️‍⚧️
Linux on the desktop is the worst thing that exists
Russian
59
0
67
11.9K
nikoster reposted
Osinachi
Osinachi@sin4ch·
I dream about having @theo's Excalidraw presentation skills.
English
2
1
46
10.8K
nikoster
nikoster@nikosters·
A new era of PC. 25.0528, 121.5990
English
0
0
0
7
nikoster
nikoster@nikosters·
@uwukko someone needs to make a fork and call it huism
Russian
1
0
10
553
nikoster reposted
Theo - t3.gg
Theo - t3.gg@theo·
Struggling to pick what agent, model, and effort levels to use? Miss the "slot machine" feel of Claude Code when using other tools? `npx slotslop "[prompt]"`
English
137
159
4.1K
279.2K
nikoster reposted
ThePrimeagen
ThePrimeagen@ThePrimeagen·
benchmarks are stupid for models. Just ignore them
English
124
55
1.5K
58.8K
nikoster reposted
wukko
wukko@uwukko·
ZXX
13
90
811
28.8K
nikoster reposted
maria
maria@maria_rcks·
@adonis_singh i hope everyone drops swe bench pro and just starts using deepswe, swe bench pro is a joke atp
English
2
5
66
1.8K
nikoster reposted
Haider.
Haider.@haider1·
the reason why anthropic is still keeping "mythos" locked in the lab:
user: hey
mythos spends 3 minutes deciding whether "hey" could mean urgency, affection, or a threat
mythos: hey :)
api cost: $200
English
92
284
9.7K
222.7K
nikoster
nikoster@nikosters·
3) it's cool that the computer use benchmark shows higher numbers, but I still won't use it until claude code becomes a decent harness. I won't comment on the other benchmarks, since kimi 2.6 is enough for my ordinary everyday tasks
Russian
0
0
0
13
nikoster
nikoster@nikosters·
2) for some reason Anthropic still hasn't made a decent harness for working in the terminal, as if coding and terminal work were the only use cases for frontier models
Russian
1
0
0
2
nikoster
nikoster@nikosters·
okay, here's what we can take away from these benchmarks: 1) swe bench pro shows nothing, since gemini 3.1 scores 4 percent lower than gpt 5.5 (the difference in output quality between these models isn't 4 percent, it's a full 50)
Claude@claudeai

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Russian
1
0
0
7
nikoster
nikoster@nikosters·
Can't wait to see cost of benchmark runs
Artificial Analysis@ArtificialAnlys

Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks. Opus 4.8 scored 1890 on GDPval-AA at launch with its 'max' effort setting, +137 points from Opus 4.7 and +121 points ahead of the next-best model, GPT-5.5 xhigh. Compared head-to-head on the GDPval task set, this implies a ~67% win rate against GPT-5.5 xhigh. @AnthropicAI shared access with us ahead of the public release to benchmark this model and we’re glad to see our benchmarks referenced in today’s launch. The rest of the Artificial Analysis Intelligence Index is in progress - we’ll share final results soon!

English
0
0
0
10
nikoster
nikoster@nikosters·
@michalmalewicz I heard mandarin burns even less. Also Chinese models seem to work better with prompts in mandarin
English
0
0
0
90
Michal Malewicz
Michal Malewicz@michalmalewicz·
Polish language burns the least tokens. Learn Polish.
English
32
28
970
47K
nikoster reposted
wukko
wukko@uwukko·
A fresh Brave install in 2026:
sponsored ad wallpapers on new tab page by default (opt out).
Brave VPN, News, Talk, Leo (AI), Rewards and other revenue-milking bloat is advertised/pinned by default.
Analytics and "phoning home" by default.
Google as default search engine in most regions by default.
Sponsored search engines like Russian Yandex in CIS countries by default: github.com/brave/brave-co…

Brave has an ad branch that handles advertising within the browser: brave.com/ads. Brave does on-device ad targeting based on cohorts and interests, just like what Chrome used to do and what Google was largely hated for (remember FLoC?). This applies to additional (opt-in) rewarded ads, shipped as part of Brave.

Brave has injected referral IDs to crypto-related URLs entered into the omnibox in the past, intentionally, by design: x.com/CR1337/status/… github.com/brave/brave-br… reddit.com/r/privacytools…

Brave also uses dark patterns to drive users away from turning off ads in their browser. For example, an article linked from the "opt out" button in the browser has a wall of text making excuses for ads before the actual steps needed to be taken to disable them: support.brave.app/hc/en-us/artic…

kind of hypocritical for brave to judge firefox for lesser bullshit, don't you think?
English
40
546
5.2K
321K