oreghall

985 posts

@oreghall

less certain than I sound

Joined February 2022
316 Following · 685 Followers
oreghall
oreghall@oreghall·
@simonsarris I've personally been making good progress on this, but in general there's a big difference between knowing what you "should" do and actually being able to do it.
English
0
0
0
218
oreghall
oreghall@oreghall·
@tenobrus I don't see why they couldn't both make specialized models only available through specialized software e.g. Codex, AND keep API access for a model that is more generalized but less capable at any specific task. Same as they are doing now.
English
0
0
1
58
Tenobrus
Tenobrus@tenobrus·
here are some further near-term reasons why. the longer term reason: once they have real superhuman RSI, it would be both incredibly unsafe and an insane loss of power to hand that to others at any price. x.com/tenobrus/statu…
Tenobrus@tenobrus

- labs already have and will continue to build application layers on top of their own models, eg claude code and codex
- they have strongly signaled that they're going to just keep doing this in many other verticals, likely folding capabilities into their "everything apps" (claude cowork etc, openai's upcoming consolidated app), allowing them to do shit like... law, and bio research
- enterprise users are very very happy to pay large sums to use these specific apps because they do huge amounts of very valuable work, they don't need direct API access for these, and going forward as the labs get more into harness engineering etc the direct API access will just be less useful. people pay to solve problems!
- increasingly, exposing the newest models in public APIs just allows companies to 1. trivially build and maintain competitors to frontier labs and 2. distill rollouts at scale
- as models become significantly more capable, the labs need more and more control over what people are doing with them
- so if their revenue mostly comes from first-party non-API offerings, and exposing the APIs just leads to competition + distillation + safety concerns... they'll just stop adding their best models to the API
- openai has already effectively experimented with this, with periods of the -codex suffixed models only being usable through codex!

English
6
1
105
7.2K
Tenobrus
Tenobrus@tenobrus·
i'll reiterate since it's buried in this longpost: how much longer do you *really* think OpenAI and Anthropic will continue to serve their raw frontier models through publicly accessible APIs? that was always a revenue and data bootstrap. it's ending within the next two years.
Tenobrus@tenobrus

"who cares if Cursor used Kimi 2.5 as a base, starting with a commoditized pretrained model was always the right move anyway" nah, sorry, what it proves is Cursor is still fundamentally reliant on frontier labs. Kimi 2.5 is only as capable as it is because it's a distill of Opus 4.5. the only open model that ever showed it was capable of trading blows w the frontier was deep seek, and it really seems that moment has passed. the question was whether Cursor could really break the dependency chain and start building improvements based entirely on their own expertise and data. and Composer 2 shows that they *can't*, that they need the general model quality and intelligence from 4.5 to get anywhere, and that really what they're doing is laundering culpability through Chinese labs so they don't have to get their hands dirty doing distillation themselves. when Opus 5 and GPT 6 are significantly more capable along many dimensions, more RL with coding rollouts aren't going to be enough to save Composer 3, they'll either need to have caught up with whatever the frontier labs are doing internally, which right now we have pretty strong evidence they just don't have the research capacity for or... wait for another distill. and how much longer do you *really* think OpenAI and Anthropic will continue to serve their frontier models through publicly accessible APIs? that was always a revenue and data bootstrap. it's ending within the next two years.

English
54
13
687
63.4K
zeta
zeta@zeta_globin·
ever since I learned about toxoplasmosis at 12yo I never really understood why anyone would have a cat, but maybe it's a cultural carryover from when mouse-borne plagues were a greater threat
English
97
19
2K
205.1K
Crémieux
Crémieux@cremieuxrecueil·
It is kind of annoying that seed oils are good for your health compared to alternatives like beef tallow. It's also a bother that saturated fat is so bad for you. Tallow fries and a nice burger taste good. Alas...
English
135
11
578
99.2K
Namidaka
Namidaka@Namidaka1·
@suchnerve @musaesayy This is not up for debate.

x = 0.999...
10x = 9.999...
10x - x = 9
9x = 9
x = 1

Same if you write it as 9/10 + 9/100 + 9/1000 + ...: the series converges to 1. The equality is well defined in mathematics.
English
4
0
56
3.5K
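The same argument written out as a geometric series (a standard identity, stated here in LaTeX for clarity):

```latex
x = \sum_{n=1}^{\infty} \frac{9}{10^n}
  = \frac{9}{10} \cdot \frac{1}{1 - \frac{1}{10}}
  = \frac{9}{10} \cdot \frac{10}{9}
  = 1
```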
oreghall
oreghall@oreghall·
@fchollet The same thing happens with chess: models can perform very well using highly specific formats similar to what is found in their training data, but their performance drops significantly otherwise.
English
0
0
0
124
François Chollet
François Chollet@fchollet·
Interesting finding on frontier model performance on ARC -- due to extensive direct targeting of the benchmark, models are overfitting to the original ARC encoding format. Frontier model performance remains largely tied to a familiar input distribution.
Melanie Mitchell@MelMitchell1

@mikeknoop We found that if we change the encoding from numbers to other kinds of symbols, the accuracy goes down. (Results to be published soon.) We also identified other kinds of possible shortcuts.

English
45
26
397
50.5K
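A minimal sketch of the kind of encoding change Mitchell describes: the same grid serialized with the familiar digit encoding versus arbitrary substitute symbols. The grid, symbol set, and serialization format here are illustrative assumptions, not the paper's actual protocol:

```python
# Hypothetical illustration: one ARC-style grid, two surface encodings.
# The symbol choice is invented; only the digit form resembles what
# models saw in training data.

GRID = [
    [0, 0, 3],
    [0, 3, 0],
    [3, 0, 0],
]

SYMBOLS = "#$%&@*+=?!"  # arbitrary stand-ins for color indices 0-9

def encode(grid, as_symbols=False):
    """Serialize a grid row by row, one character per cell."""
    cell = (lambda v: SYMBOLS[v]) if as_symbols else str
    return "\n".join("".join(cell(v) for v in row) for row in grid)

print(encode(GRID))                   # familiar digit encoding
print(encode(GRID, as_symbols=True))  # same task, unfamiliar surface form
```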
oreghall
oreghall@oreghall·
@jttiehen @KelseyTuoc Something being trained specifically and only to predict the next token does not imply that it wouldn't create an internal world model or use causal reasoning. Both of those things are very useful for the task of predicting the next token.
English
1
0
0
25
Justin Tiehen
Justin Tiehen@jttiehen·
@KelseyTuoc Suppose we built machines that really were *mere* next token predictors. They don’t create world models, are incapable of genuine causal reasoning, don’t really reason, etc. They might still cause a lot of disemployment—at least I don’t have strong intuitions that they couldn’t.
English
4
0
1
521
Kelsey Piper
Kelsey Piper@KelseyTuoc·
Joseph Heath coined the term 'highbrow misinformation' for climate reporting that was technically correct, but arranged every line to give readers a worse understanding of the subject. I think that 'stochastic parrots/spicy autocomplete' is, similarly, highbrow misinformation.
Kelsey Piper tweet media
English
33
78
1K
131.1K
oreghall
oreghall@oreghall·
@Arealmfngl Both of my parents were only children. My dad, brother, and grandma on my mom's side have all passed away and I never met my grandparents on my dad's side. My only relatives are my mom and grandpa.
English
0
0
0
25
𝘽𝙡𝙖𝙠𝙚
𝘽𝙡𝙖𝙠𝙚@Arealmfngl·
I just met someone whose parents were both an “only child” so he has zero aunts, uncles, cousins, nothing😭
English
406
766
35.4K
2.2M
Shivers
Shivers@thinkingshivers·
@justalexoki This is a common opinion. People have no idea how percentages or samples work. Here’s what they say when asked: “If you had to guess, what percentage of American adults..."
Shivers tweet media
English
9
0
52
1.4K
oreghall
oreghall@oreghall·
@Puckmeat @cozywitchcakes @wolfestar4 @YellowCoatPunk This is why you use hypochlorous acid instead of bleach. It's more potent than bleach and safe to spray anywhere including your face. Your body produces it on its own as part of your innate defense against pathogens, and it's even an ingredient in some eye drops.
English
0
0
0
62
Puck
Puck@Puckmeat·
@cozywitchcakes @wolfestar4 @YellowCoatPunk I'd be scared of accidentally spraying myself in the face. I usually mix a smaller amount in a bowl and sponge it on with a washcloth, let it dry a bit, and repeat. Or saturate the washcloth and let it rest on the intended area. Then rinse off in the shower.
English
1
0
15
3.9K
Puck
Puck@Puckmeat·
My skin is clearing up after like half a year of horrible acne and the thing that seems to be causing this change is, unfortunately, regularly treating my skin with diluted bleach which MY DERMATOLOGIST TOLD ME TO DO
Puck tweet media
English
108
299
49K
1.1M
oreghall
oreghall@oreghall·
@KevubASelene @cremieuxrecueil Natural selection is not really "survival of the fittest", it's more like "survival of the fit enough". Suboptimal genes can and do proliferate, and we are making it worse with modern treatments making up for bad genetics.
English
0
0
0
17
Kevin
Kevin@KevubASelene·
@cremieuxrecueil This does not pass the sniff test. If there were no tradeoffs then why wouldn't nature have already selected it?
English
5
0
13
1.3K
Crémieux
Crémieux@cremieuxrecueil·
Lots of people have asked: If you do embryo selection, isn't there a risk you select away good traits or select for bad traits? Maybe selecting for IQ will also lead to more myopia, for example. This new paper shows that virtually all such selection is beneficial, not harmful.
Crémieux tweet media
Jonathan Anomaly@JonathanAnomaly

1/ Today we launch an ambitious paper on the ethics of embryo screening. While the technology is new, our hopes and fears about our future children are as old as the Greek myths, including stories about Hera, goddess of fertility and the namesake of our company @herasight

English
57
127
1.6K
116.6K
oreghall
oreghall@oreghall·
@UpperCayce @adrusi quote retweet views count as views and there are some popular ones, so that's probably why
English
0
0
0
14
Cayce
Cayce@UpperCayce·
@adrusi The like:view ratio here is surprisingly low. I would've guessed 5-10x that just off of vague affinities for cats.
English
1
0
0
209
autumn
autumn@adrusi·
i am convinced that growing up with cats makes someone better at sex

to pet a cat, you need to learn to understand their subtle body cues and respond intuitively

learning partner dancing or certain martial arts probably also works for the same reason
English
48
18
1.3K
621.4K
oreghall
oreghall@oreghall·
@harristic_ surprised nobody is mentioning the very flat lighting, it desperately needs more contrast and saturation and that's a very easy fix
English
0
0
0
314
oreghall
oreghall@oreghall·
@imagesaicouldnt The last person alive from the 1800s was born on November 29th, 1899 and lived to 117 years old. Her name was Emma Morano.
English
0
0
2
103
Shivers
Shivers@thinkingshivers·
Added portals!
English
66
89
4K
173.4K
Taelin
Taelin@VictorTaelin·
No it hasn't, we found some counter-examples. Seems like 'x' *can* be accessed more than once even if it occurs in different branches in unexpected ways? Sorry everyone ): On the bright side, it *could* be an implementation error.

In any case, I'm... tired. And sad with personal matters unrelated to work... There wasn't much progress on SupGen since the last time I reported. To recap, we built a symbolic regression tool, sort of, that is faster than all other published tools, and synthesizing functions like sort is possible, for the first time. All else is still uncertain. Composition is slow, learning is still a big mystery, and many of these things depend on this branching issue being solved...

I think we should just launch HVM4 / SupGen as is, and keep researching. But for now I guess I need a break ):

Time with the people you love matters more than work
Taelin@VictorTaelin

This has been solved! (or so it seems?) I wanted to give context on how amazing that is, and how much this should unlock - things that puzzled me for 10 years - but I'll just explain the solution itself, because I'm so excited about it!

Turns out it is really simple, elegant, and even obvious - which is embarrassing, because I should probably have realized it years ago... but that also makes it more likely to be correct, so, I have no complaints!

To arrive at it, I isolated the simplest example that, intuitively, "should fuse" but doesn't, and evaluated it manually, using a new, hypothetical "mov node", which allows a variable to be used more than once across different branches. I then asked: which interactions are missing, to get to the expected output? Turns out, we just needed two interactions:

- MOV-SUP: exactly like DUP-SUP (just commutes).
- MOV-MOV: just compacts two MOV's into one.

And... that's it. See the image below. But... why *these* interactions? Because *MOV nodes are just unordered DUPs*! (Yes, this is something that I have tried before, and even posted here, but it didn't work back then. Turns out my *implementation* was wrong, but the idea is solid.) This key insight justifies everything:

- MOV-SUP is just DUP-SUP, which is fair enough.
- Since MOV is unordered, x₀ and x₁ are just x and x.
- This allows MOV vars to be used more than once.
- It also explains MOV-MOV: it just compacts MOVs!

This new interaction, MOV-MOV, is the missing piece: it prevents fan node accumulation, allowing us to dispatch linear variables to different branches, while still letting the function "fuse", in the sense that the normal form of F^N(x) has constant size, which, in turn, drops the complexity of their iterated application from O(N) to O(log2(N)) in an optimal λ-calculus evaluator such as the HVM.

So, I quickly added MOV nodes to HVM4, and the result speaks for itself: applying clr() 2^65535 times is down from 7,343,093 to just 4,064 interactions; a measly 1806x speedup! Obviously, that is just a silly way to convey that we observe there is, indeed, an exponential speedup.

This should unlock so many things. For many years I've attempted so many promising algorithms on HVM, only to get blocked by this very issue. Not being able to use a variable on different branches is VERY limiting. Now that blocker is gone, I have so many old ideas to revisit. Even SupGen was profoundly affected by this...

Below is the full fusion of `F.F` where `F = const False`. Also thanks Lorenzo, LeeeeT and others who posted insights and partial solutions that helped me get to this.

English
7
2
186
39.1K
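HVM's actual mechanism is optimal interaction-net reduction, which the thread doesn't spell out and this sketch doesn't attempt; as a loose analogy for the complexity claim above (a constant-size normal form of F^N turns O(N) iteration into O(log2 N)), here is the classic repeated-squaring trick for a function class, affine maps, whose compositions have a constant-size representation. Everything in this sketch is an assumption for illustration:

```python
# Loose analogy only (HVM itself fuses via interaction nets): if composing
# two functions in a class yields a constant-size representation, then F^N
# needs O(log2 N) compositions by repeated squaring, not N applications.
# The class here is affine maps f(x) = a*x + b, represented as (a, b).

def compose(f, g):
    """(a1, b1) after (a2, b2): f(g(x)) = a1*(a2*x + b2) + b1."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a1 * b2 + b1)

def iterate(f, n):
    """Constant-size representation of f^n, built in O(log2 n) steps."""
    result = (1, 0)          # identity map x -> x
    while n > 0:
        if n & 1:
            result = compose(result, f)
        f = compose(f, f)    # square: f^(2^k) at step k
        n >>= 1
    return result

a, b = iterate((1, 1), 2**16)  # f(x) = x + 1, iterated 65536 times
print(a * 0 + b)               # -> 65536, after only ~16 compositions
```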
oreghall
oreghall@oreghall·
@bryancsk it's also impossible for regular clouds to get that tall, they would have to be volcanic
English
0
0
3
1K
Bryan Cheong
Bryan Cheong@bryancsk·
This thing is terrifying. The marine layer prevents atmospheric convection near San Francisco, so either something has boiled the ocean near SF or it has been a season of record heat and wildfires.
Freeman Jiang@freemanjiangg

more of this

English
16
20
1.1K
102.3K
oreghall
oreghall@oreghall·
@Lovandfear I think the main problem here is basing all of your appreciation for someone on what they do for you.
English
0
0
0
34
🍂
🍂@Lovandfear·
After you get married, you’re going to meet ‘better’ people than your spouse. You’re going to meet more good-looking people; kinder and more romantic people; more intelligent and funny people. You will meet people who have in abundance what your partner lacks.

The mushy and romanticized idea that your partner will be everything to you, and will satisfy all your needs and wants, is idolatry. Contentment in marriage is a virtue not often spoken about.

You must wake up every day appreciating everything your partner is to you, everything they have, their beauty and the things that made you marry them, because if you focus on everything they don’t do well, you’ll always meet better people. Protect your heart!

See their best part, and always remember that your commitment to marry is more of a duty than it is of mushy feelings. You have to stay committed even on the days you feel your spouse is no longer the best fit for you…

-Buchi
🍂 tweet media
English
444
6.3K
48.2K
5.5M
oreghall
oreghall@oreghall·
@giffmana @suchenzang They never claimed that doing well in ARC is indicative of AGI. The goal is to show that there are tasks that humans can do very easily that AIs cannot. As long as they can still create new test sets for which that is true, they conclude we haven't achieved AGI.
English
0
0
2
58
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
@suchenzang Yeah, this. I agree, I have never liked ARC-AGI. The fact that we're at v2 and they are working on v3 already says it has nothing to do with AGI :)
English
4
0
25
3K
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
I see multiple QTs saying "train on test". But the way I understand it, I don't think he is doing anything wrong? And this does not look like the classic "oops I trained on test" to me?

ARC-AGI is a meta-learning benchmark, but they don't like to call it that.

- On the left, he is just using the whole (meta)training set. A (meta)training example is a "dataset" comprised of a few training pairs and a test pair. This is all (meta)training data.
- On the right, he is doing "test-time training" by doing gradient steps on the test (meta)examples, which are mini-datasets with labeled train examples and the test example without label (you have to predict that). TTT is fine too, crucially without using the test label, ofc.

We can argue about realistic or not, but I'd argue the whole benchmark is not realistic in the first place.

ARC-AGI nomenclature is really bad, which it inherited from the whole meta-learning field but decided to drop the word "meta" everywhere, probably to look more novel. This is not the first time that it has led to confusion.

I'm a little surprised that multiple people thought this was wrong somehow, so there's also a small chance I'm completely missing something. If so please point it out!
Mithil Vakde@evilmathkid

In technical terms, this is joint self-supervised compression. To justify this, I rely on the Minimum Description Length (MDL) principle. I take MDL to its logical conclusion by compressing every possible source of information

English
24
9
236
45.6K
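A minimal sketch of the test-time-training loop Beyer describes, assuming a generic PyTorch model; the model, loss, and task format are placeholders, not the actual ARC setup:

```python
# Sketch of test-time training (TTT): adapt on the labeled demonstration
# pairs that ship with ONE test task. The test input's label is never
# used, so this is not "training on test".

import torch
import torch.nn as nn

def test_time_train(model, demo_pairs, steps=32, lr=1e-4):
    """Take gradient steps on a single task's demonstration pairs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(steps):
        for x, y in demo_pairs:   # labeled examples given with the task
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Toy usage: a tiny regression "task" with two demonstration pairs.
model = nn.Linear(4, 4)
demos = [(torch.randn(1, 4), torch.randn(1, 4)) for _ in range(2)]
adapted = test_time_train(model, demos)
prediction = adapted(torch.randn(1, 4))  # predict the unlabeled test input
```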
oreghall
oreghall@oreghall·
@katewillett Buying things like mansions and yachts is actually one of the most harmful ways to spend money. They require a ton of labor for the benefit of very few people.
English
0
0
0
8