Jan

771 posts

@AnInsanityCheck

CS-PhD. Inquiring non-expert. The world is too complicated to be certain.

Joined June 2019
306 Following · 53 Followers
John Locke
John Locke@NeverSayClever·
@BenShindel @SkyeSharkie I don't think this is within the bounds of Noah's statement, although I guess you could read it that way. He's not saying that someone who would otherwise be a euthanasia candidate is invalidated by depression; he's saying not to treat depression with death. Right?
3
0
6
144
Ben
Ben@BenShindel·
What a bizarre interaction?
Ben tweet media
114
9
788
73.9K
Jan
Jan@AnInsanityCheck·
@FakePsyho True. 100% is almost impossible, and basically unachievable within Kaggle's compute limits. I'd argue the previous grand prizes were also unachievable, which makes marketing with the prize pool a bit dishonest. Still a nice benchmark, though.
0
0
0
180
Psyho
Psyho@FakePsyho·
At this point, I've played through ALL of the games there. I might do a longer write-up describing all of the flaws. Sadly, this would probably get lost in a flood of "AI is 100x worse than humans!!!111" posts from clueless influencers. While it's clear this whole thing is designed so that the models will artificially score as low as possible, I also suspect that the game designers and tester team were just not experienced enough to design something fair. I'd expect that the private test games have an order of magnitude more terrible ideas (considering they are expected to be harder).
9
11
251
10.3K
Psyho
Psyho@FakePsyho·
AI (or any human) will never get 100% in ARC-AGI-3. Let me introduce you to the worst game mechanic you can find in a puzzle game: fog of war. At the start, if you go right instead of down, you're wasting many moves. Your score on this level literally depends on a coin flip!
Psyho tweet media
67
23
525
70.4K
Rishi Mehta
Rishi Mehta@rishicomplex·
@FakePsyho Except the humans get to retry after they figure it out x.com/i/status/20373…
Rishi Mehta@rishicomplex

@fchollet according to your paper: "Participants were limited to a single attempt per environment and could not revisit previously completed levels. However, they were allowed to reset the current level at any time. In some cases, participants reset levels after reaching a solution in order to improve efficiency, though this typically increased total interaction time." So humans could play around with the task a bunch, and then just reset the game when they figured it out to get the optimal trajectory? Is AI allowed to do this?

2
0
5
674
Jan
Jan@AnInsanityCheck·
@DeryaTR_ If I understand correctly, 100% is *defined* as being as good as a human, so ofc humans score 100%
0
0
0
5
Derya Unutmaz, MD
Derya Unutmaz, MD@DeryaTR_·
ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on, I think, 2 people. This is a really unscientific approach, as it assumes all humans are the same, or that previous exposure to puzzles or video games, for example, doesn't matter. What education level and background did these humans have? I am sure humans will still score highly, but it would be very surprising if this was 100%. Without this data and scientific measurement, this appears to be a biased test that assumes solving 100% of the puzzles is purely intrinsic intelligence common to all humans.
ARC Prize@arcprize

Announcing ARC-AGI-3. The only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.

55
22
280
39K
Jan
Jan@AnInsanityCheck·
@pfau Pretty certain it is a scoring bug due to a fully AI-written pipeline. It is very common in the ARC community, so much so that the community has developed their own generated test-set server to easily check for answer leakage.
0
0
14
1.1K
Jan
Jan@AnInsanityCheck·
@loveofdoing I am 99% certain your code has a bug. For example, in sampling mode you do not compare the proposed solution to the correct one. Always check predictions and always check counterexamples. This is just AI slop, and everyone falling for it needs better epistemic heuristics.
1
0
6
743
Jan
Jan@AnInsanityCheck·
@anart1c @levelsio @ankahira Yeah. I have a friend who immigrated to Germany when they were 5 years old, 26 years ago. Their children will still have a "migrant background". It is a highly misleading and harmful term imo.
0
0
0
58
anartic
anartic@anart1c·
@levelsio @ankahira That news is so misleading. “Migrant background” can mean only one parent wasn’t born in Germany. Does that really make you think the kid shouldn’t get welfare?
3
0
4
743
Dr. Kahira
Dr. Kahira@ankahira·
I have lived in Europe for more than 10 years and I don’t kid you when I say all the people I have met on welfare were citizens. Not even once have I met an immigrant on welfare. Not to say they don’t exist, they do. But I have never met one and my circle is full of immigrants.
@levelsio@levelsio

Current state of Western Europe

33
0
44
39.7K
Jan retweeted
Jan
Jan@AnInsanityCheck·
@max_spero_ @pangramlabs Depending on model cost and storage cost you might consider briefly caching results for specific websites (i.e. social media). Probably makes no sense rechecking viral tweets thousands of times.
1
0
1
72
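The caching idea in that reply can be sketched as a small time-to-live (TTL) cache. This is only an illustration of the concept; `detect` and the URL keys are hypothetical stand-ins for whatever detector pipeline is actually used.

```python
import time

class TTLCache:
    """Remember a detector's result per URL for a short time-to-live."""
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (timestamp, result)

    def get_or_compute(self, url, compute):
        now = time.time()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh enough: skip the model call
        result = compute(url)      # cache miss or stale entry: recompute
        self._store[url] = (now, result)
        return result

calls = []
def detect(url):                   # hypothetical expensive detector call
    calls.append(url)
    return f"verdict-for-{url}"

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("x.com/status/1", detect)
second = cache.get_or_compute("x.com/status/1", detect)  # served from cache
```

With this shape, a viral tweet checked thousands of times costs one model call per TTL window instead of thousands.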
Jan
Jan@AnInsanityCheck·
@goblinodds That's nice and all, but excuse me for being skeptical when Kristi Noem lies publicly about the case and then doubles down later on: youtu.be/T-UL_rEH1WQ
0
0
0
89
2HP goblin advisor
2HP goblin advisor@goblinodds·
as for people assuming the US officials lied, i mean, possibly, but i went to a temporary chat w chatgpt to sanity check myself
2HP goblin advisor tweet media
10
0
48
2.7K
Jan
Jan@AnInsanityCheck·
@arithmoquine Skill issue. Simply always think that you should follow your soul's call rn.
0
0
0
42
Jan
Jan@AnInsanityCheck·
@IsThisA3DModel What is the chance for half life 3?
0
0
0
76
Jan
Jan@AnInsanityCheck·
@celestepoasts The NoPE architecture also works; it just doesn't do positional encoding at all. IIRC it does exactly what you think: it uses the causal mask to infer position implicitly. This also means that these approaches will not work on e.g. masked diffusion LLMs or diffusion models.
0
0
3
174
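A minimal numpy sketch of that implicit-position effect (my illustration, not any particular NoPE implementation): with no positional encoding at all, a repeated token still gets different attention outputs at different positions, purely because the causal mask gives each position a different visible prefix.

```python
import numpy as np

# Toy single-head attention with NO positional encoding.
# Token "a" appears at positions 0 and 2; the causal mask alone makes
# their outputs differ, because each attends over a different prefix.
np.random.seed(0)
d = 8
a, b, c = np.random.randn(3, d)
x = np.stack([a, b, a, c])                # same embedding at positions 0 and 2
n = len(x)

Wq, Wk, Wv = np.random.randn(3, d, d)
q, k, v = x @ Wq, x @ Wk, x @ Wv

scores = (q @ k.T) / np.sqrt(d)
scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf  # causal mask

w = np.exp(scores - scores.max(-1, keepdims=True))
w /= w.sum(-1, keepdims=True)
out = w @ v

# Position 0 attends only to itself; position 2 attends over {a, b, a},
# so the two copies of "a" produce different outputs.
```

Note this breaks the permutation symmetry of unmasked attention, which is exactly why the trick fails for bidirectional setups like masked diffusion models.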
Celeste (in london dm to hang)
Celeste (in london dm to hang)@celestepoasts·
oh ok wait, yeah, causal masking already kind of does this, like at least the permutation symmetry stuff and the absolute number of tokens, if I'm understanding this right. still cursed as fuck that this works
2
0
107
4.7K
vipli
vipli@viplismism·
A bit on the W_o matrix and why it's so critical for transformers (btw, this was in my drafts from 3 weeks back!). Without the W_o projection, multi-head attention would be basically useless. Here's the intuition behind it:

In multi-head attention, we split our embedding into multiple heads, where head_dim = embedding_dim / num_heads, and each head does its own separate attention calculation: head₁ → output₁, head₂ → output₂, ..., head_n → output_n.

If we just concatenated these outputs together, the heads would never share information! Think about it - if head₁ learns syntax and head₂ learns entities, they'd never combine that knowledge. Concatenated heads → no information sharing → the model can't combine different patterns. It's like separate juices sitting next to each other in the same glass but not actually mixing: the concatenation just puts them in the same container.

W_o is the blender that combines all these different "flavor patterns" (what each head learned) into a rich mixture where they can interact and enhance each other. Since we want the model to blend these specialized representations, we need that extra step: output = W_o × concat(output₁, output₂, ..., output_n). When we apply W_o (a d_model × d_model matrix), it literally mixes the outputs from all heads - like taking different juices and blending them together!

Without W_o: heads are independent silos. With W_o: each output channel gets info from every head. So now the model can actually combine syntax understanding with entity recognition, or any kind of logical relations. For transformer models, we want all these different "attention head juices" to blend into one rich understanding; without this matrix, multi-head attention is useless!
vipli tweet media
7
16
227
9.2K
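The blender metaphor in that thread can be checked numerically. This is a toy sketch with random vectors standing in for head outputs, not a real transformer layer: after plain concatenation each channel depends on exactly one head, while W_o mixes every head into every channel.

```python
import numpy as np

# After concat, each head occupies its own disjoint slice of the vector;
# W_o makes every output channel a linear combination of ALL heads.
np.random.seed(1)
d_model, num_heads = 16, 4
head_dim = d_model // num_heads  # 4

heads = [np.random.randn(head_dim) for _ in range(num_heads)]
concat = np.concatenate(heads)           # (d_model,) -- heads in disjoint slices
W_o = np.random.randn(d_model, d_model)
mixed = W_o @ concat                     # every channel combines all heads

# Perturb one head: concat changes only in that head's slice,
# but the W_o-projected output changes in (generically) every channel.
heads[2] = np.zeros(head_dim)
concat_p = np.concatenate(heads)
mixed_p = W_o @ concat_p
```

Zeroing head 2 touches only 4 of the 16 concat channels, yet every channel of `mixed` moves: that channel-level mixing is the information sharing the tweet describes.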
SDL
SDL@SocDoneLeft·
Right-wing perpetrators committed 93% of political killings since 2000. Even if we include anti-police terrorist killings by non-leftists, "the left" did just 6%. Exclude those? Just 5 killings, 1.4%. The US right is far, far more violent than the US left.
SDL tweet media
Skscartoon@skscartoon

The modern left has firmly established themselves as the ideology of violence and terrorism. They don't believe in conversation, reason, or civility; and unfortunately, I don't think they can be dealt with as such.

426
325
1.6K
203.1K
Agrippa
Agrippa@Agrippa84837·
@OriginalGoalie @JimSTruthBTold @SocDoneLeft I don't know the other three. But the Buffalo shooter wrote in his manifesto that he identifies as a communist. The Club Q shooter identifies as non-binary. Typical white nationalists.🤡
2
0
4
205
Jan
Jan@AnInsanityCheck·
@littmath @davidbessis @dean_weez Wait, I don't understand. Afaict, the expected ratio should be 1 (same number of boys and girls in expectation), but the expectation of the ratio is bigger than 1:1, is that what this is about?
0
0
0
112
Daniel Litt
Daniel Litt@littmath·
No. It depends on the formalism. If the population consists of N families who have kids until they have a girl and then stop, the expected ratio will depend on N but be greater than 1/2. It will tend to 1/2 as N → ∞. If you model things differently (e.g. kids have more kids) you get a different answer.
5
0
7
1.1K
David Bessis
David Bessis@davidbessis·
Call me dumb, but I don't even understand how this could be a problem about conditional probabilities. If the chances of having a girl are 50%, and if all draws are independent, then—come on!—what else could happen than 50/50 long term trend? Am I this bad at probabilities?
Lyman Stone 石來民 🦬🦬🦬@lymanstoneky

reasonably confident this is a Yale demography prof getting a fairly common quant demo problem wrong, very publicly. i could be wrong, but the probability nerds in the comments and the people like me doing simulations all seem to agree that it's 50/50

26
1
105
26.7K
Jan
Jan@AnInsanityCheck·
@lymanstoneky It's just an interpretation difference. The number of boys and girls will be the same in expectation, but the expected ratio can behave weirdly. I.e. 1/1, 1/2, 1/3 and so on. Calculating the expectation over these ratios gives a different result (as it is a different question)
0
0
1
75
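The interpretation difference is easy to see in a quick Monte Carlo. This is a sketch of the standard "stop at the first girl" model, one family per trial: the ratio of expectations is 1, while the expected boy *fraction* is 1 − ln 2 ≈ 0.307.

```python
import math
import random

# "Have kids until the first girl, then stop", simulated one family at a time.
# E[boys]/E[girls] = 1, but E[boys/(boys+girls)] = 1 - ln(2) ~= 0.307:
# the expectation of a ratio is not the ratio of expectations.
random.seed(0)
trials = 200_000
total_boys = 0
frac_boys_sum = 0.0
for _ in range(trials):
    boys = 0
    while random.random() < 0.5:   # each child is a boy with probability 1/2
        boys += 1
    # the loop exits at the first girl, so every family has exactly 1 girl
    total_boys += boys
    frac_boys_sum += boys / (boys + 1)

mean_boys = total_boys / trials        # ~ 1.0, same as the mean girl count
mean_frac = frac_boys_sum / trials     # ~ 0.307, not 0.5
```

Both numbers are correct answers, just to different questions, which is the point of the thread.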
Jan
Jan@AnInsanityCheck·
@willdepue This is apparently calculated including benefits, which is why more children increase the marginal tax rate, as people lose these benefits at some income level. A dishonest/misleading way of displaying it.
1
0
3
496