biased estimator

2.8K posts

@selfattentive

deep learning stuff mostly

local minima (chicago) · Joined July 2023
1.7K Following · 246 Followers
Pinned Tweet
biased estimator @selfattentive
I cannot overstate how crowded "ai research" has gotten and how low quality the published material is becoming as a result. Nearly every week now I am seeing throwaway experiments my lab looked into 1 or 2 years ago being dressed up as "research" and posted to arxiv.
FeFe 🤺 @Fefe_no_covfefe
@actsmaniac The socialists seem to be doing pretty well in Spain. Just shows that a representative democracy is superior to revolution in every way.
cephalopod @macrocephalopod
Apropos of nothing in particular, but if you are just copy trading someone you saw online, you are a bit of an idiot. And if you are copy trading them in options without knowing strikes/expiries/hedges etc., then you are an even bigger idiot.
biased estimator @selfattentive
@alz_zyd_ Maybe your experience is different from mine but I think most people need to work through and overcome being stuck.
alz @alz_zyd_
The hard part about learning math in the past was getting stuck and not knowing what to do next. Now, math learning is no longer hard, because LLMs can unstick you instantly whenever you get stuck.
biased estimator @selfattentive
@ptuomov Then it’s “batteries are cool but the energy density of chemical propellants can’t be beat” and a few years from now they will have rediscovered the cruise missile from first principles.
biased estimator retweeted
Mad Engineer @1llegalEngineer
Performative Robotics is a very large industry
baufinanciaphaster 👹 @bauhiniacapital
@selfattentive I know exactly what it does. I have written about it on Twitter for years and off-Twitter for longer. Maybe 20-25 years. Tell me clearly how transporting oil and gas from a foreign country to the US or vice versa is currently prohibited by the Jones Act.
baufinanciaphaster 👹 @bauhiniacapital
This Jones Act waiver - designed to allow US companies to do trade with Venezuela - does nothing to improve trade of oil with Venezuela. Jones Act didn't limit it before. This is garbage-y.
biased estimator retweeted
biased estimator @selfattentive
@AlHendiify It isn’t a “blockade” at all. Anyone can trade with Cuba, America just refuses to subsidize our enemies. But it should be a blockade.
David AttenBruh @AlHendiify
Mind you Cuba is literally just some island nation that has never launched a military attack at us. The US has them under a blockade because wealthy americans don't like their economy. That's literally it.
Acyn @Acyn

CNN: Breaking news. Cuba's electrical grid has suffered a complete and total collapse. This is according to the country's power operator. It's the first nationwide blackout since the US effectively shut off the flow of oil to Cuba

🇺🇸MichaelPhillipZ @MikeyPhillipZ
@ianmiles If a ship docks with Cuba it can’t dock with the US for a year and a half. Fuck you, chode
Harry Partridge @part_harry_
Attention residuals and mixture of expert reuse (x.com/yichen4nlp/sta…) are two independent results pointing in the same direction: a single transformer layer, looped n times, is more efficient than n independent transformer layers. As @willccbb has often remarked, the best, most enduring discoveries are when you get improved performance by making the architecture LESS complicated. It seems abundantly clear to me that a single ultra wide layer, looped n times, can be made into a strict generalisation of the current paradigm, whilst also being more elegant in its simplicity.
Kimi.ai @Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…

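The announcement describes Attention Residuals only at a high level: instead of a fixed running sum of layer outputs, each layer's input becomes a learned, input-dependent attention-weighted mix of earlier outputs. The toy sketch below contrasts the two in plain Python. It is not Moonshot's implementation: the per-layer query vectors, the scaled dot-product scoring over depth, the tanh layer body, and every function name are assumptions made for illustration, and Block AttnRes is not modeled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_layer(W, x):
    # hypothetical stand-in for a transformer block: tanh(W x)
    return [math.tanh(dot(row, x)) for row in W]

def standard_residual(layers, x):
    # fixed, uniform accumulation: h_l = h_{l-1} + f_l(h_{l-1})
    h = list(x)
    for W in layers:
        h = [a + b for a, b in zip(h, toy_layer(W, h))]
    return h

def depth_weights(query, history):
    # assumed scoring: scaled dot product between a learned query and each earlier output
    scale = math.sqrt(len(query))
    return softmax([dot(query, v) / scale for v in history])

def mix(query, history):
    # attention-weighted sum over depth instead of an unweighted sum
    w = depth_weights(query, history)
    return [sum(wi * v[j] for wi, v in zip(w, history)) for j in range(len(history[0]))]

def attention_residual(layers, queries, x):
    # queries: one hypothetical learned vector per layer, plus one for the final readout
    assert len(queries) == len(layers) + 1
    history = [list(x)]  # history[0] is the embedding, history[i] is layer i's output
    for W, q in zip(layers, queries):
        history.append(toy_layer(W, mix(q, history)))
    return mix(queries[-1], history)
```

With all-zero queries the depth weights are uniform, so for a single layer the sketch returns exactly half of the standard residual sum; learning the queries is what lets the weights depend on the input rather than stay fixed.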
biased estimator @selfattentive
@part_harry_ @ccui42 @willccbb This has been explored extensively; there is a reason nobody uses Universal Transformer (UT) models outside of toy problems. Also, once your model is big enough you need model parallelism and lose the advantages of parameter sharing.
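The parameter-sharing point in this exchange can be made concrete with a toy comparison: a stack of n independent layers versus one layer applied n times, as in Universal Transformer-style weight tying. This is a plain-Python sketch under loose assumptions: a single d×d matrix with a tanh residual update stands in for a whole transformer block, and all function names are hypothetical.

```python
import math
import random

def make_layer(d, rng):
    # hypothetical: one d x d weight matrix standing in for a full transformer block
    return [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(d)]

def apply_layer(W, x):
    # residual update: x + tanh(W x)
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x))) for xi, row in zip(x, W)]

def run_stack(layers, x):
    # n independent layers, each with its own weights
    for W in layers:
        x = apply_layer(W, x)
    return x

def run_looped(W, x, n):
    # one layer applied n times (weights shared across depth)
    for _ in range(n):
        x = apply_layer(W, x)
    return x

def n_params(layers):
    # count each distinct weight matrix once, so shared weights are not double counted
    unique = {id(W): W for W in layers}
    return sum(len(W) * len(W[0]) for W in unique.values())
```

Looping is exactly a weight-tied stack, and it stores n times fewer parameters, but it performs the same number of layer applications, so compute per forward pass is unchanged. The saving is in parameter memory, which is the advantage the reply argues disappears once the model has to be split across devices anyway.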
biased estimator @selfattentive
@mattparlmer But we can also see F-18s doing strafing runs. My guess is there are some parts of the country that are safer for American planes than others.
mattparlmer 🪐 🌷 @mattparlmer
The CSIS guy who came on Odd Lots today mentioned the talking point that we are transitioning to gravity bombs in the Iran fight. We can all see the B-52s loaded with JASSMs on OSINT feeds, so govt is not being honest here, and it would be good for natsec policy ppl to flag that.
biased estimator retweeted