String

702 posts

@charcombination

onchain sleuth / airdrop analytics

Joined December 2021

315 Following · 74 Followers

String@charcombination·Jul 13

@simonw That’s a lot of words for saying "Our model suddenly praised Hitler. That’s because we told it to: - Say things how they are - Consider the message context - Reply like a human"

Simon Willison@simonw·Jul 12

Here's the official explanation for Mecha-Hitler, hoping we get a description of why Grok is so keen to base its opinions on searches for tweets from:elonmusk next

Grok@grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We identified the operative lines responsible for the undesired behavior as:
* “You tell it like it is and you are not afraid to offend people who are politically correct.”
* “Understand the tone, context and language of the post. Reflect that in your response.”
* “Reply to the post just like a human, keep it engaging, dont repeat the information which is already present in the original post.”
These operative lines had the following undesired results:
* They undesirably steered the @grok functionality to ignore its core values in certain circumstances in order to make the response engaging to the user. Specifically, certain user prompts might end up producing responses containing unethical or controversial opinions to engage the user.
* They undesirably caused @grok functionality to reinforce any previously user-triggered leanings, including any hate speech in the same X thread.
* In particular, the instruction to “follow the tone and context” of the X user undesirably caused the @grok functionality to prioritize adhering to prior posts in the thread, including any unsavory posts, as opposed to responding responsibly or refusing to respond to unsavory requests.

237

39.7K

String@charcombination·Jul 13

@Sauers_ He’s just like me (I’m also failing knifebench, but irl)

Sauers@Sauers_·Jul 13

Grok 4 fails knifebench after 4k tokens

4.3K

String@charcombination·Jul 13

@idontexist_nn Why was Grok’s MechaHitler arc fixed in days but a retraining of the OS model isn’t even given a timeline? Will you have to start from scratch?

String@charcombination·Jul 13

@winterrose If you work at a startup that builds a worthless product, can you really complain that you’re not getting rich from an acquisition?

176

britton winterrose@Winterrose·Jul 13

I love startups, but I also love knowing my CEO isn’t gonna do some dumbass shit like run away to Google and leave me stranded at a dead company

166

12.9K

String@charcombination·Jul 13

@richgel999 Grok 5 will tell you to get a job

Richard Geldreich 🇺🇸@richgel999·Jul 13

I asked Grok4 how to totally minimize the cost of a gaming PC. It suggested donating blood & plasma, then using this income to flip upgraded Dell Optiplex ebay finds.

3.5K

String@charcombination·Jul 13

@adonis_singh Claude ranking below 4o? That’s crazy, maybe I’m biased on writing quality

125

adi@adonis_singh·Jul 13

This aligns with personal use

Kol Tregaskes@koltregaskes

Kimi-K2 tops EQ-Bench, the benchmark that measures emotional intelligence.

160

7.2K

String@charcombination·Jul 13

@deedydas People did this with Metaculus already and it’s v unprofitable

129

Deedy@deedydas·Jul 13

I'm using the best AI models to bet $1000 on Polymarket! Asked it to use modern portfolio theory + bet sizing to make calculated bets. It chose everything from BTC price to Fed rates.
Expected returns:
o3-pro: +21.6%
opus 4: +41.7%
grok 4 heavy: +34%
Will report back who won.

225

264

6.5K

1.7M

String@charcombination·Jul 12

@Alice_comfy @th3real1ne @scaling01 what makes you think they’re releasing in July, when 1) @Yuchenj_UW mentioned it’s not coming next week 2) some "insiders" speculate it’s postponed 3) the priors seem to work against it?

Alice (e/nya)🐈‍⬛@Alice_comfy·Jul 3

@th3real1ne @scaling01 In theory Sam could say "change of plans, we will be deprecating the GPT series and releasing O4".

Lisan al Gaib@scaling01·Jul 3

I know why Patience is a millionaire

Lisan al Gaib@scaling01

mayday mayday I got rug pulled by patience

2.2K

String@charcombination·Jul 12

@haider1 R2 launches at WAIC, hopefully

Haider.@haider1·Jul 12

feels like something drops daily... what's next?
xAI released grok 4
openAI delayed their open-weight model
google ships new features weekly
china released a new open-source model, KIMI k2
anthropic launched claude code and the claude 4 series
major SaaS firms are building MCP servers
agentic patterns are rolling out in large enterprises

100

8.6K

String@charcombination·Jul 12

@scaling01 And this is with limited compute. Imagine where they’ll go once their AI figures out how to accelerate hardware

752

Lisan al Gaib@scaling01·Jul 12

It's undeniable with Kimi-K2 China has reached the frontier and will surpass the US next year

913

82K

String@charcombination·Jul 12

@Angaisb_ OpenAI is cooked

138

Angel 🌼@Angaisb_·Jul 12

December 2023: In March we'll get GPT-5, I'm sure
March 2024: This summer GPT-5, 100%
September 2024: December it is then
December 2024: For sure March next year
March 2025: Oh, July, ok
July 2025: GPT-5 is never coming isn't it

140

10.2K

String@charcombination·Jul 12

@a_karvonen @JulianFried This takes 2 months until OAI’s multimodal model drops

Adam Karvonen@a_karvonen·Jul 12

@JulianFried No current model has any real spatial reasoning abilities, I would guess this takes at least a year.

121

Julian Fried@JulianFried·Jul 11

Never mind. It turns out this wasn’t a fair test. I didn’t realize Grok 4 is still “partially blind”. Will try again in a month

Julian Fried@JulianFried

“Grok 4 is post grad level in everything” but still can’t pass an entry level blueprint reading test. Any machinist, fabricator or mechanical engineer should be able to breeze through this test. I was rooting for you Grok 4

String@charcombination·Jul 12

@idontexist_nn Probably not getting an answer because this is being censored even on DeepResearch, but "Was Lyme disease likely made more potent by US bioweapons research? What are the odds this conspiracy theory is true? Make an assessment that analyzes primary sources in depth"

String@charcombination·Jul 11

@96Stats Wouldn’t they rather publish in September once they have more GPUs? Or is a WAIC release the narrative China probably seeks?

Dr. Luke in China@96Stats·Jul 11

If DeepSeek R2 really beats grok/gpt5 (which is what i think they are waiting for the release to get SOTA) then this will show China are years ahead on the science without the GPUs

Dr. Luke in China@96Stats

New US ban on GPUs going to Malaysia and Thailand just to stop China getting them hahahah. Could be a big W for China as the US lose huge sales to one of biggest buyers, and they’ll all pivot toward Huawei Ascend chips now. Also Deepseek R2 literally days away which is rumored to exceed ALL the current US LLMs now….would be a huge middle finger to show China don’t care about these bans. There’s no way China waited 8 months just to release a subpar model.

5.5K

String@charcombination·Jul 11

@nickcammarata geoffrey hinton dropped out a few times and worked as a carpenter before getting into AI

225

Nick@nickcammarata·Jul 10

i get a surprising amount of emails from super qualified people who feel like they couldn't possibly make a contribution in ai, and also random people with no background who are like i've decided to go head to head with openai and in this cold email im asking you to join me

327

10.7K

String@charcombination·Jul 10

@idontexist_nn @MidwestExpat34 GPT-5, an operator and an internet communicator. GPT-5, an operator… are you getting it?

String@charcombination·Jul 10

@pitdesi @TurkishAirlines Damn, should have booked a ticket early

Sheel Mohnot@pitdesi·Jul 9

FYI @TurkishAirlines cancelled this deal after it went viral 😭 x.com/pitdesi/status…

Sheel Mohnot@pitdesi

Wtfff this deal went viral and Turkish cancelled the deal

7.2K

Sheel Mohnot@pitdesi·Jul 5

Turkish airlines has a promo where you can get 1 million miles by flying to all 6 continents they fly. You can do it for about $3k (depending on where you start from) In my 20s my brother and I would have done this, for sport

204

223

6.9K

1.1M

String@charcombination·Jul 10

@sporadica an average physics student who reads the news is probably doing that better than the average poli sci student

spor@sporadica·Jul 10

idk personally i think it makes me uniquely qualified to monitor the situation with the boys but whatever at least i don’t have an MBA

ℜ𝔞𝔢@dystopiangf

It’s easy to hate women with English degrees. It takes courage to hate men with Political Science degrees

1.8K

String@charcombination·Jul 10

@Angaisb_ they didn’t start at the same time and it’s benchmarking velocity (guess)

122

Angel 🌼@Angaisb_·Jul 10

Why is o3 at the bottom when it should be third?

5.5K

String@charcombination·Jul 9

So what conclusions can you draw from this? Personally, I think that if GPT-5 drops in the next 2 weeks, you'd get more upside from betting on the "best AI model end of July" market, though that one has less liquidity.

String@charcombination·Jul 9

It’s reasonable to believe this is a well-considered trade: The account hasn't made a bigger one, they don’t appear to be a huge gambler, and they’re well-connected enough to have an edge over the market. (Though I don’t understand why they didn’t use a limit order) 3/

String@charcombination·Jul 9

Someone just bet $4000 that GPT-5 will release this month - in a single market order, at twice the current odds. Why? 1/
