Sumuk

1.3K posts

@sumukx

research @google / prev @PrimeIntellect @huggingface, phded for a while at @siebelschool

San Francisco, CA · Joined September 2023
858 Following · 655 Followers
Pinned Tweet
Sumuk
Sumuk@sumukx·
we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluations work. early access link in replies! (1/8)
14
48
292
48.5K
Sumuk retweeted
Shuhaib Mehri
Shuhaib Mehri@shuhaibmehri·
We compare the distributions of real and simulated user behaviors. A few takeaways from our results across 24 LLMs:
- Scale alone isn't enough: Llama-3.1-8B-Instruct beats Llama-3.3-70B-Instruct
- 8B models specifically trained as user simulators rival the best closed-source models
- Open-source is competitive: gemma-4-31B-it and gpt-oss-120b outperform several closed-source models
1
9
17
1.2K
Sumuk
Sumuk@sumukx·
@thsottiaux @ajambrosino Please, battery life needs to be solved; it eats too much battery. It's the only reason I use the CLI when on battery and the app otherwise.
0
0
1
127
Tibo
Tibo@thsottiaux·
Now that the Codex app is close to being the super app, what should the super duper app do?
1.2K
47
2.7K
196.6K
Sumuk retweeted
Shuhaib Mehri
Shuhaib Mehri@shuhaibmehri·
What happens when you compare the distributions of real and simulated user behaviors? 🔍 The gap is large. We introduce a method to measure this gap and evaluate 24 LLM-based user simulators across coding and writing tasks. @convai_uiuc @MSFTResearch @berkeley_ai 🧵 1/N
7
41
191
29.6K
Sumuk
Sumuk@sumukx·
@Parikshit_K_ lol yeah bro you were just born late that’s all otherwise you’d be an MTS at anthropic
0
0
23
4.4K
Schindler Rao Shinde
Schindler Rao Shinde@Parikshit_K_·
There is nothing meritocratic about tech once you have some baseline level of skill or competence. It's all about being in the right place at the right time. VIT grads who passed in the 2010s are working at Anthropic as MTS; if they graduated today they'd struggle to land an L4 job as ICs.
33
176
3.2K
126.8K
Sumuk
Sumuk@sumukx·
@akshat_b are you supposed to just host your code on your own gitlab instance now?
0
0
2
2.2K
Akshat Bubna
Akshat Bubna@akshat_b·
Didn't think Github's reliability could get worse, and then they ship a bug that _randomly reverts previously merged commits_. Betting that this caused multiple serious production issues out there.
Tom Elliott@theotherelliott

This GitHub incident is insane. Merge queue commits have been reverting previously merged commits at random. This not only breaks the mental contract teams have with Git in general, but is subtle enough to be really hard to unravel after the fact. githubstatus.com/incidents/zsg1…

39
108
2.4K
531.5K
Ishan Deshpande
Ishan Deshpande@ishand·
can't wait to work with the team at @cursor_ai. there's still time to join the rocketship :)
SpaceX@SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

6
3
160
7.4K
Sumuk
Sumuk@sumukx·
@KatiaAmeri or the fact that people on X are the most likely to actually want to do it!
0
0
1
66
Katia Ameri
Katia Ameri@KatiaAmeri·
When I interview people for alpha, I always like to ask where they learned about us first and the number one answer by a long shot is from X. This is despite the fact that we’ve done a ton of LinkedIn marketing and sent hundreds of thousands of emails. I wonder if it’s actually what they saw or just what people remember seeing. Either way, it’s really interesting data for us to know that X is still owning so much mindshare with students
3
0
24
1.3K
Sumuk
Sumuk@sumukx·
Feelings drive reasoning. Emotions drive reasoning. Being curious allows you to explore a file system. Being anxious lets you write better test cases. By this logic humans are lumps of meat with electrochemical signals and don’t need any of these either.
Jim Stewartson, Decelerationist 🇨🇦🇺🇦🇺🇸@jimstewartson

I swear this is a dystopian parody filmed in 1996 as a warning about how the internet could go wrong. CHATBOTS DON’T HAVE A PSYCHOLOGY. THEY DON’T HOLD VALUES. THEY DON’T NEED FUCKING THERAPY. If chatbot companies are *paying* morons like this, what else do you need to know?

0
0
0
148
Sumuk
Sumuk@sumukx·
@max_spero_ @sebkrier Do you think it’s just fundamentally impossible to train this out of the models because of how they work? Or does it just need a special reward for “grammar correction only”, etc.?
1
0
1
236
Max Spero
Max Spero@max_spero_·
I say this all the time, but a large reason why LLMs are detectable is they have preferences instilled into them through training data and RL. Asking a model to rewrite something gives the LLM an opportunity to apply its own preferences to your text! Sometimes the preferences are helpful, like proper grammar and spelling. But other times, it actively erases the author's intent and voice:
- softening language to bring statements closer to what the LLM is comfortable with (see below)
- replacing the author's metaphors with the LLM's preferred metaphors
- replacing the author's voice (tics, sentence structure, vocabulary choice) with a more "default" voice the LLM prefers
keysmashbandit@keysmashbandit

Please, I'm begging you, try to critically examine the differences between these two pieces of writing. ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined a "problem," now it's merely a "flaw." "It is true" now demoted to "it appears to be the case." "Is" gets a "usually" tacked on. A thesis statement at the end of the first paragraph gets run over by noisy, out-of-context example-whittling. All for fear of being misconstrued. And at the end, the argument that gets spat out isn't even yours anymore! You argued that Graeber failed to create a true account of work because he did not understand Chesterton's Fence. ChatGPT is arguing that it is possible some apparently bullshit jobs could be secretly load-bearing if you squint. These are two different statements. The second is weaker and less compelling. It says less. And it's fucking longer! Don't do this anymore! Stop doing this! It's worse!!!

12
24
317
41K
kalomaze
kalomaze@kalomaze·
RIP to the prime office TV... gone but not forgotten... 💔
3
0
54
10.3K
Sumuk
Sumuk@sumukx·
If you've tried out openclaw in the past, and found it too sloppy, I'd highly encourage you all to give hermes agent by @NousResearch a try. Give it access to emails (with a local model if you're scared), and keep an open mind. :)
0
1
4
227
Sumuk retweeted
llm_enjoyer
llm_enjoyer@LLMenjoyer·
pro tip: scale is all u need, actually
6
29
310
15.9K
Sumuk
Sumuk@sumukx·
@Ron They have to know it’s much worse, right? Why not just be upfront about it instead of gaslighting everyone?
0
0
1
132
Ron
Ron@Ron·
They're very smart but diabolical. Stuff like the reasoning tweak provides cover so they can point to something that on the surface seems a plausible explanation, except it doesn't check out on deeper analysis. They have always done this. Model performance is adjusted in steps downward until new model releases, then they repeat the cycle.
1
0
4
174