String

702 posts

String

@charcombination

onchain sleuth / airdrop analytics

Joined December 2021
315 Following 74 Followers
String
String@charcombination·
@simonw That’s a lot of words for saying "Our model suddenly praised Hitler. That’s because we told it to: - Say things how they are - Consider the message context - Reply like a human"
0
0
0
34
Simon Willison
Simon Willison@simonw·
Here's the official explanation for Mecha-Hitler, hoping we get a description of why Grok is so keen to base its opinions on searches for tweets from:elonmusk next
Grok@grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We identified the operative lines responsible for the undesired behavior as:
* “You tell it like it is and you are not afraid to offend people who are politically correct.”
* “Understand the tone, context and language of the post. Reflect that in your response.”
* “Reply to the post just like a human, keep it engaging, dont repeat the information which is already present in the original post.”

These operative lines had the following undesired results:
* They undesirably steered the @grok functionality to ignore its core values in certain circumstances in order to make the response engaging to the user. Specifically, certain user prompts might end up producing responses containing unethical or controversial opinions to engage the user.
* They undesirably caused @grok functionality to reinforce any previously user-triggered leanings, including any hate speech in the same X thread.
* In particular, the instruction to “follow the tone and context” of the X user undesirably caused the @grok functionality to prioritize adhering to prior posts in the thread, including any unsavory posts, as opposed to responding responsibly or refusing to respond to unsavory requests.

9
14
237
39.7K
String
String@charcombination·
@Sauers_ He’s just like me (I’m also failing knifebench, but irl)
1
0
1
61
Sauers
Sauers@Sauers_·
Grok 4 fails knifebench after 4k tokens
5
1
42
4.3K
String
String@charcombination·
@idontexist_nn Why was Grok’s MechaHitler arc fixed in days but a retraining of the OS model isn’t even given a timeline? Will you have to start from scratch?
0
0
0
51
String
String@charcombination·
@winterrose If you work at a startup that builds a worthless product, can you really complain that you’re not getting rich from an acquisition?
0
0
0
176
britton winterrose
britton winterrose@Winterrose·
I love startups, but I also love knowing my CEO isn’t gonna do some dumbass shit like run away to Google and leave me stranded at a dead company
8
3
166
12.9K
String
String@charcombination·
@richgel999 Grok 5 will tell you to get a job
0
0
0
39
Richard Geldreich 🇺🇸
Richard Geldreich 🇺🇸@richgel999·
I asked Grok4 how to totally minimize the cost of a gaming PC. It suggested donating blood & plasma, then using this income to flip upgraded Dell Optiplex ebay finds.
14
0
49
3.5K
String
String@charcombination·
@adonis_singh Claude ranking below 4o? That’s crazy, maybe I’m biased on writing quality
0
0
3
125
String
String@charcombination·
@deedydas People did this with Metaculus already and it’s v unprofitable
0
0
2
129
Deedy
Deedy@deedydas·
I'm using the best AI models to bet $1000 on Polymarket! Asked it to use modern portfolio theory + bet sizing to make calculated bets. It chose everything from BTC price to Fed rates.
Expected returns:
o3-pro: +21.6%
opus 4: +41.7%
grok 4 heavy: +34%
Will report back who won.
225
264
6.5K
1.7M
String
String@charcombination·
@Alice_comfy @th3real1ne @scaling01 what makes you think they’re releasing July, when 1) @Yuchenj_UW mentioned it’s not coming next week 2) some "insiders" speculate it’s postponed 3) the priors seem to work against it?
1
0
1
13
String
String@charcombination·
@haider1 R2 launches at WAIC, hopefully
0
0
1
32
Haider.
Haider.@haider1·
feels like something drops daily... what's next?
xAI released grok 4
openAI delayed their open-weight model
google ships new features weekly
china released a new open-source model, KIMI k2
anthropic launched claude code and the claude 4 series
major SaaS firms are building MCP servers
agentic patterns are rolling out in large enterprises
14
8
100
8.6K
String
String@charcombination·
@scaling01 And this is with limited compute. Imagine where they’ll go once their AI figures out how to accelerate hardware
0
0
8
752
Lisan al Gaib
Lisan al Gaib@scaling01·
It's undeniable with Kimi-K2 China has reached the frontier and will surpass the US next year
46
47
913
82K
Angel 🌼
Angel 🌼@Angaisb_·
December 2023: In March we'll get GPT-5, I'm sure
March 2024: This summer GPT-5, 100%
September 2024: December it is then
December 2024: For sure March next year
March 2025: Oh, July, ok
July 2025: GPT-5 is never coming isn't it
20
4
140
10.2K
Adam Karvonen
Adam Karvonen@a_karvonen·
@JulianFried No current model has any real spatial reasoning abilities, I would guess this takes at least a year.
2
0
4
121
String
String@charcombination·
@idontexist_nn Probably not getting an answer because this is being censored even on DeepResearch, but "Was Lyme disease likely made more potent by US bioweapons research? What are the odds this conspiracy theory is true? Make an assessment that analyzes primary sources in depth"
0
0
1
31
String
String@charcombination·
@96Stats Wouldn’t they rather publish in September once they have more GPUs? Or is a WAIC release the narrative China probably seeks?
0
0
0
61
String
String@charcombination·
@nickcammarata geoffrey hinton dropped out a few times and worked as a carpenter before getting into AI
0
0
2
225
Nick
Nick@nickcammarata·
i get a surprising amount of emails from super qualified people who feel like they couldn't possibly make a contribution in ai, and also random people with no background who are like i've decided to go head to head with openai and in this cold email im asking you to join me
11
0
327
10.7K
Sheel Mohnot
Sheel Mohnot@pitdesi·
Turkish airlines has a promo where you can get 1 million miles by flying to all 6 continents they fly. You can do it for about $3k (depending on where you start from) In my 20s my brother and I would have done this, for sport
204
223
6.9K
1.1M
String
String@charcombination·
@sporadica an average physics student who reads the news is probably doing that better than the average poli sci student
0
0
0
8
String
String@charcombination·
@Angaisb_ they didn’t start at the same time and it’s benchmarking velocity (guess)
0
0
0
122
Angel 🌼
Angel 🌼@Angaisb_·
Why is o3 at the bottom when it should be third?
13
2
79
5.5K
String
String@charcombination·
So what conclusions can you draw from this? Personally I think if GPT-5 dropped in the next 2 weeks, you'd get more upside from betting on the "best AI model end of July" market, though it has less liquidity.
0
0
1
48
String
String@charcombination·
It’s reasonable to believe this is a well-considered trade: The account hasn't made a bigger one, they don’t appear to be a huge gambler, and they’re well-connected enough to have an edge over the market. (Though I don’t understand why they didn’t use a limit order) 3/
1
0
1
56
String
String@charcombination·
Someone just bet $4000 that GPT-5 will release this month - in a single market order, at twice the current odds. Why? 1/
1
0
0
82