Jan

771 posts

Jan

@AnInsanityCheck

CS-PhD. Inquiring non-expert. The world is too complicated to be certain.

Tham gia Haziran 2019

306 Đang theo dõi53 Người theo dõi

Jan@AnInsanityCheck·1d

@NeverSayClever @BenShindel @SkyeSharkie You are not eligible for MAID in canada if you have only psychological conditions, it *requires by law* a physical condition.

English

John Locke@NeverSayClever·1d

@BenShindel @SkyeSharkie I don't think this is within the bounds of Noah's statement, although I guess you could read it that way. He's not saying that someone who would otherwise be a euthanasia candidate is invalidated by depression; he's saying not to treat depression with death. Right?

English

144

Ben@BenShindel·1d

What a bizarre interaction?

English

114

788

73.9K

Jan@AnInsanityCheck·3d

@FakePsyho True. 100% is almost impossible, and basically unachievable in Kaggle compute limits. I'd argue the previous grand prizes were also unachievable, which makes marketing with the prize pool a bit dishonest. Still a nice benchmark though.

English

180

Psyho@FakePsyho·4d

At this point, I've played through ALL of the games there. I might do a longer write-up describing all of the flaws. Sadly, this would probably get lost in a flood of "AI is 100x worse than humans!!!111" posts from clueless influencers. While it's clear this whole thing is designed in a way that the models will artificially get as low score as possible, I also suspect that game designers / tester team were just not experienced enough to design something fair. I'd expect that the private test games have an order of magnitude more terrible ideas (considering they are expected to be harder).

English

251

10.3K

Psyho@FakePsyho·4d

AI (or any human) will never get 100% in ARC-AGI-3 Let me introduce you to the worst game mechanic you can find in a puzzle game: fog of war At the start, if you go right instead of bottom, you're wasting many moves. Your score on this level literally depends on a conflip!

English

525

70.4K

Jan@AnInsanityCheck·3d

@rishicomplex @FakePsyho AI gets to do that too

English

Rishi Mehta@rishicomplex·3d

@FakePsyho Except the humans get to retry after they figure it out x.com/i/status/20373…

Rishi Mehta@rishicomplex

@fchollet according to your paper: "Participants were limited to a single attempt per environment and could not revisit previously completed levels. However, they were allowed to reset the current level at any time. In some cases, participants reset levels after reaching a solution in order to improve efficiency, though this typically increased total interaction time." So humans could play around with the task a bunch, and then just reset the game when they figured it out to get the optimal trajectory? Is AI allowed to do this?

English

674

Jan@AnInsanityCheck·4d

@DeryaTR_ If I understand correctly, 100% is *defined* as being as good as a human, so ofc humans score 100%

English

Derya Unutmaz, MD@DeryaTR_·5d

ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on I think 2 people. This is really an unscientific way, as it assumes all humans are the same or that previous exposure to puzzles or video games, for example, is not considered. What education level and background did these humans have? I am sure humans will still score highly, but it would be very surprising if this was 100%. Without this data and scientific measurement, this appears a biased test that assumes solving 100% of the puzzles is purely intrinsic intelligence common to all humans.

ARC Prize@arcprize

Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

English

280

39K

Jan@AnInsanityCheck·5d

@pfau Pretty certain it is a scoring bug due to fully AI written pipeline. It is very common in the arc community, so much so that the community has developed their own generated test-set server to easily check for answer leakage.

English

1.1K

David Pfau@pfau·5d

Wtf.

loveofdoing@loveofdoing

316 ARC-AGI tasks solved with zero learning. No neural net, no training, no DSL — just 19th-century projective geometry. Encode grid cell relationships as Plücker lines in P³, find transversals via Schubert calculus, score candidates by geometric incidence. 95% solve rate on the eval set (of non-timeout tasks). Single C file, runs in seconds.

QST

269

77K

Jan@AnInsanityCheck·5d

@loveofdoing I am 99% certain your code has a bug. For example, in sampling mode you do not compare proposed solution to correct one. Always check predictions and always check counterexamples. This is just AI slop, and everyone falling for it needs better epistemic heuristics.

English

743

loveofdoing@loveofdoing·5d

Beff (e/acc)@beffjezos

The masculine urge to try to hack a new solution to ARC-AGI benchmarks

English

961

187.7K

Jan@AnInsanityCheck·14 Şub

@anart1c @levelsio @ankahira Yeah. I have a friend, who was immigrated to germany when they were 5 years old, 26 years ago. Their children will have "migrant background". It is a highly misleading and harmful term imo.

English

anartic@anart1c·14 Şub

@levelsio @ankahira that news are so misleading. “migrant background” can mean only one parent wasn’t born in germany does that really make you think the kid shouldn’t get welfare?

English

743

Dr. Kahira@ankahira·14 Şub

I have lived in Europe for more than 10 years and I don’t kid you when I say all the people I have met on welfare were citizens. Not even once have I met an immigrant on welfare. Not to say they don’t exist, they do. But I have never met one and my circle is full of immigrants.

@levelsio@levelsio

Current state of Western Europe

English

39.7K

Jan đã retweet

Marvin Baumann@MarvinTBaumann·20 Oca

This is the real Europe.

Ursula von der Leyen@vonderleyen

Good discussion with CEOs today in Davos. Europe has all the assets it needs to attract investment. From capital to innovation. We’re working on mobilising them fully. Soon, we’ll put forward a single set of rules for companies to scale and grow across Europe - EU Inc. Because we need European champions for a global market. We are also working on a large-scale, deep and liquid capital market that attracts a wide range of investors. An interconnected and affordable energy market. And we’re shifting our focus to the high value sectors - from AI to defence. My message is: Europe is the right place to invest.

English

171

1.9K

93.3K

Jan@AnInsanityCheck·19 Oca

@max_spero_ @pangramlabs Depending on model cost and storage cost you might consider briefly caching results for specific websites (i.e. social media). Probably makes no sense rechecking viral tweets thousands of times.

English

Max Spero@max_spero_·19 Oca

Roadmap for @pangramlabs Chrome Extension 1) Train pangram-small 2) Proactive AI checks on every surface of the internet. AdBlock for AI-generated content

@jason@Jason

X Product request: a toggle to turn off all AI generated content. Thank you for your attention to this matter!

English

168

13.8K

Jan@AnInsanityCheck·18 Oca

@pangramlabs @tenobrus @iruletheworldmo Ohno, the AI checker is showing signs of malicious misalignment!

English

Pangram Labs@pangramlabs·18 Oca

@tenobrus @iruletheworldmo We believe that this document is fully human-written Disclaimer: For text under 75 words, results may be less accurate. pangram.com/history/5b745a…

Pangram Labs@pangramlabs

@tenobrus @iruletheworldmo We are confident that this document is primarily human-written, with some AI-generated content detected pangram.com/history/4cecf1…

English

452

🍓🍓🍓@iruletheworldmo·17 Oca

x.com/i/article/2012…

ZXX

201

29.3K

Jan@AnInsanityCheck·15 Oca

@goblinodds That's nice and all, but excuse me for being skeptical when kristi noem lies publicly about the case and then doubles down later on: youtu.be/T-UL_rEH1WQ

YouTube

English

2HP goblin advisor@goblinodds·15 Oca

as for people assuming the US officials lied, i mean, possibly, but i went to a temporary chat w chatgpt to sanity check myself

English

2.7K

2HP goblin advisor@goblinodds·15 Oca

i sure hope everyone who was insisting that Good's car completely missed the officer right up until this news broke is updating their beliefs rn

CBS News@CBSNews

BREAKING: The ICE agent who fatally shot Renee Good on Jan. 7 in Minneapolis, Jonathan Ross, suffered internal bleeding to the torso following the incident, according to two U.S. officials briefed on his medical condition.

English

426

24.8K

Jan@AnInsanityCheck·13 Oca

@arithmoquine Skill issue. Simply always think that you should follow your souls call rn.

English

henry@arithmoquine·13 Oca

3 years later and everyone i know who's tried this to any extent has discovered entirely new ways of being fucked up collapse the bountiful multiplicity of the human soul with one easy trick

shaurya@shauseth

you have access to your optimal policy at all times. you just choose not to follow it. you can literally access it by asking “what should i be doing rn?”

English

1.7K

176.4K

Jan@AnInsanityCheck·13 Oca

@IsThisA3DModel What is the chance for half life 3?

English

Is this a 3D model?@IsThisA3DModel·12 Oca

there is absolutely zero chance of this

Guillermo Rauch@rauchg

There's definitely a non-zero chance we get "generate your own GTA6 in a few minutes" before GTA6

English

8.9K

178.2K

Jan@AnInsanityCheck·12 Oca

@celestepoasts The NoPE architecture also works, it just doesn't do positional encoding at all. IIrc it does exactly what you think: Uses causal mask to infert position implicitly. Also means that these approaches will not work on e.g. Masked diffusion LLMs or diffusion models.

English

174

Celeste (in london dm to hang)@celestepoasts·12 Oca

oh ok wait yeah causal masking already kind of does this like atleast the permutation symmetry stuff and absolute number of tokens if im understanding this right still cursed as fuck this works

English

107

4.7K

Celeste (in london dm to hang)@celestepoasts·12 Oca

ml is fundamentally unserious man what do you mean this works

hardmaru@hardmaru

One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization. We found that if you simply delete them after pretraining and recalibrate for < 1% of the original budget, you unlock massive context windows.

English

1.4K

110.2K

Jan@AnInsanityCheck·9 Oca

@viplismism @pangramlabs ai?

vipli@viplismism·9 Oca

well a bit on W_o matrix and why it's so critical for transformers? btw, this was in my drafts from 3 weeks back! well without the W_o projection, multi-head attention would be basically useless! here's the intuition behind it: now, so in multi-head attention, we split our embedding into multiple heads right where head_dim = embedding_dim / num_heads and each head does its own separate attention calculation: head₁ → output₁ head₂ → output₂ ... head_n → output_n and to think about it, if we just concatenated these outputs together, the heads would never share information! think about it - if head₁ learns syntax and head₂ learns entities, they'd never combine that knowledge, so concatenated heads → no information sharing → model can't combine different patterns! it's like there are separate juices sitting next to each other in the same glass but not actually mixing and the concatenation just puts them in the same container but W_o is like the blender that combines all these different "flavor patterns" (what each head learned) into a rich mixture where they can interact and enhance each other! now as we want the model to blend these specialized representations, we need that extra step: output = W_o × concat(output₁, output₂, ..., output_n) when we apply W_o (which is a d_model × d_model matrix), it literally mixes the outputs from all heads! it's like taking different juices and blending them together! and that's where the magic kicks in: without W_o: heads are independent silos with W_o: each output position gets info from every head so now the model can actually combine syntax understanding with this entity recognition or maybe any kind of logical relations! it's like saying for transformer models, we actually want all these different "attention head juices" to blend into one rich understanding, i mean without this matrix, it's just multi head is useless!

English

227

9.2K

Jan@AnInsanityCheck·20 Ara

@Agrippa84837 @OriginalGoalie @JimSTruthBTold @SocDoneLeft You claimed that the shooter was not a white nationalist. His manifesto shows otherwise. You ignore it. Why would they continue arguing as if you were a good faith interlocutor?

English

Agrippa@Agrippa84837·20 Ara

@OriginalGoalie @AnInsanityCheck @JimSTruthBTold @SocDoneLeft Now you're threatening me with violence? I can't say I'm surprised. That always happens when you leftists run out of arguments.

English

SDL@SocDoneLeft·18 Ara

Right-wing perpetrators committed 93% of political killings since 2000. Even if we include anti-police terrorist killings by non-leftists, "the left" did just 6%. Exclude those? Just 5 killings, 1.4%. The US right is far, far more violent than the US left.

Skscartoon@skscartoon

The modern left has firmly established themselves as the ideology of violence and terrorism. They don't believe in conversation, reason, or civility; and unfortunately, I don't think they can be dealt with such.

English

426

325

1.6K

203.1K

Jan@AnInsanityCheck·20 Ara

@Agrippa84837 @OriginalGoalie @JimSTruthBTold @SocDoneLeft You seriously want to tell me the guy who wrote this is left wing? Read the first page hoplofobia.info/wp-content/upl… Literally the first sentence: "If there’s one thing I want you to get from these writings, it’s that White birth rates must change." Absolute joke. Brainworms

English

Agrippa@Agrippa84837·19 Ara

@OriginalGoalie @JimSTruthBTold @SocDoneLeft I don't know the other three. But the Buffalo shooter wrote in his manifesto that he identifies as a communist. The Club Q shooter identifies as non-binary. Typical white nationalists.🤡

English

205

Jan@AnInsanityCheck·3 Ara

@littmath @davidbessis @dean_weez Wait, I don't understand. Afaict, the expected ratio should be 1 (same number of boys and girls in expectation), but the expectation of the ratio is bigger than 1:1, is that what this is about?

English

112

Daniel Litt@littmath·3 Ara

No. It depends on the formalism. If the population consists of N families who have kids until they have a girl and then stop, the expected ratio will depend on N but be greater than 1/2. It will tend to 1/2 as N-> \infty. If you model things differently (e.g. kids have more kids) you get a different answer.

English

1.1K

David Bessis@davidbessis·2 Ara

Call me dumb, but I don't even understand how this could be a problem about conditional probabilities. If the chances of having a girl are 50%, and if all draws are independent, then—come on!—what else could happen than 50/50 long term trend? Am I this bad at probabilities?

Lyman Stone 石來民 🦬🦬🦬@lymanstoneky

reasonably confident this is a yale demog prof publicly getting a fairly common quant demo problem wrong very publicly i could be wrong, but the probability nerds in the comments and the people like me doing simulations all seem to agree that it's 50/50

English

105

26.7K

Jan@AnInsanityCheck·3 Ara

@lymanstoneky It's just an interpretation difference. The number of boys and girls will be the same in expectation, but the expected ratio can behave weirdly. I.e. 1/1, 1/2, 1/3 and so on. Calculating the expectation over these ratios gives a different result (as it is a different question)

English

Lyman Stone 石來民 🦬🦬🦬@lymanstoneky·2 Ara

Emma Zang@DrEmmaZang

Interesting problem. It looks obvious until you actually think about it. Many people gave the wrong answers because they assumed: E[G/(G+B)] = E[G] / (E[G] + E[B]), which is not true. Ratios are sneaky. The denominator moves, and suddenly all our nice intuition falls apart. The quick intuition: once family sizes vary, each family contributes differently to the population ratio, so you can’t just take expectations on the top and bottom and call it a day. And if you try to “just simulate it,” the result depends on how many families you draw. With small N you might get something that looks close to 0.5, but it’s always a bit above 0.5. Simulation will happily mislead you if you let it. If I ever teach PhD formal demography, I’m definitely putting this in the problem set. It’s too good not to. 😄

English

182

60.4K

Jan@AnInsanityCheck·2 Kas

@willdepue This is apparently calculated including benefits, which is why more children increases marginal tax rate, as people lose these benefits at some income level. Dishonest/Misleading way of displaying it.

English

496

will depue@willdepue·1 Kas

lmao what is this tax structure

Armin Ronacher ⇌@mitsuhiko

One of the most bizarre things in the UK is that it has a maximum wage for high income families. You have to push through this *hard* if you want to earn more. I don’t understand who designed that.

English

384

62.9K

Khám phá

@NeverSayClever @BenShindel @SkyeSharkie @FakePsyho @rishicomplex @DeryaTR_ @pfau @loveofdoing