starbased

1.9K posts


starbased

@starbased_

Thinking...⋰⋱⋰ 。・:*:・゚★,。・:*:・゚☆。・:*:・゚★,。・:*:・゚☆

bay area · Joined February 2025
468 Following · 105 Followers
Pinned Tweet
starbased
starbased@starbased_·
@0xmer_ 63 partitions? free nvme0n1palestine
English
4
5
223
14.8K
starbased retweeted
Space Koala
Space Koala@SpaceKoala·
And you know how you deal with a small blind spot from routing the nerves? You move your eyes slightly. You can even look directly at something. Mind blowing, I know.
Space Koala tweet media
English
10
23
2K
53.3K
starbased retweeted
Interconnects
Interconnects@interconnectsai·
Reading today's open-closed performance gap. The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future. interconnects.ai/p/reading-toda…
English
0
3
8
9.6K
starbased retweeted
目頭@サイドFIRE
目頭@サイドFIRE@Awakend_Citizen·
Immediately after he was thrown from the bike, the airbag deployed in about 0.8 seconds. It rapidly inflated around the back, shoulders, and chest, firmly protecting the neck and upper body. Combined with rider Marc Márquez's extraordinary physical ability, he avoided serious injury despite taking an impact of up to 26.27G.
Japanese
390
982
10.4K
5.9M
starbased retweeted
Nature Videos
Nature Videos@naturevideos·
Little dude really enjoying that
English
6
308
3.6K
99.8K
starbased retweeted
doctor worm
doctor worm@awaitingtrial·
Sliding this to 10 while my other trembling hand reaches for my service weapon
doctor worm tweet media
English
57
4.7K
69.2K
902.3K
starbased
starbased@starbased_·
What's sad is how normies will immediately chimp out upon seeing the minor editing error. They go full barbarossa and throw out the entire structure as if it's surely rotten.
Paata Ivanisvili@PI010101

Still excited about these 3Blue1Brown-style videos generated by AI. Here's a beautiful illustration of a classic analysis problem: Let f be convex, nonnegative on [0,∞), with f(0)>0 and f(x)→0 as x→∞. Place a light source at (0,b) with 0… grok.com/share/c2hhcmQt…

English
0
0
0
36
starbased
starbased@starbased_·
@VictorTaelin they're paid off by anthropic, so lame how social media is astroturfed via incentives
English
0
0
1
71
Taelin
Taelin@VictorTaelin·
people who swear 4.7 > 4.6 (if anyone): what are you doing
English
204
5
576
77.1K
starbased
starbased@starbased_·
>ask gemma 4 if it's openai or claude
>she doesn't understand
>pull out 107,520-dimensional cosine similarity heatmap
>she laughs and says "i'm good model sir"
>check the clustering
>its claude
lyra bubbles@_lyraaaa_

all LLMs are either claude-like or GPT-like.
method: cosine sim heatmap of per-model-averaged responses to 50 prompts sent thru gemma4 activation-space (107,520 dims).
notable exceptions: haiku 4.5, gem3flash (and to a lesser degree, m2.7 and gemma4 itself)

English
0
0
0
38
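A minimal sketch of the clustering method lyra describes, assuming a hypothetical get_activation(text) helper that maps a response to a fixed-size activation vector (the post uses gemma4's 107,520-dim space); everything else here is illustrative scaffolding, not her code:

```python
# Sketch of the method in the quoted post: average each model's responses in
# activation space, then compare models by pairwise cosine similarity.
# get_activation() is a hypothetical stand-in for the gemma4 activation hook.
import numpy as np

def model_fingerprint(responses, get_activation):
    """Average one model's per-response activation vectors over the prompt set."""
    vecs = np.stack([get_activation(r) for r in responses])
    return vecs.mean(axis=0)

def cosine_sim_matrix(fingerprints):
    """Entry [i, j] is the cosine similarity between model i and model j."""
    X = np.stack(fingerprints)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

# sim = cosine_sim_matrix([model_fingerprint(r, get_activation)
#                          for r in responses_by_model])
# A heatmap of `sim` is what would show the claude-like vs GPT-like clusters.
```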
Ron Smith
Ron Smith@Ubertag90210·
@ar0cket1 @teortaxesTex I don't even understand how people come up with the total parameter estimate (10 trillion). How do you estimate the active per token?
English
1
0
1
65
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
napkin mafs time: 10T × 3% active (imo ≤1% is viable, but w/e) = 300B active params; we'd want D ≥ 20 × total N = 200T tokens. I doubt ≥30% MFU at 10T @ 3% MoE. 6ND/MFU = 6 × 3e11 × 2e14 / 0.3 = 1.2e27 FLOPs. B200 dense fp8 = 4.5e15 FLOP/s… 74M B200-hours. …even that's only 1 month for a 100K cluster.

afaik, generally you don't do flagship training projects >6 months, due to expected algorithmic obsolescence. Add generating synthetic pre/mid-train data (there aren't 200T high-quality web-crawl tokens, maybe not even 50T @ 4 epochs), post-training, RL, maybe training issues, lower MFU… then again, a lot of this work can be/has been done in an async manner, on smaller deployment units.

otoh, such giant MoEs are probably very data-efficient and you can make do with <20 D/N… but maybe you're doing more than 6ND, as per Jeff Dean's logic about the compute-abundant regime and slowrun-style higher-intensity training…

I think a top-tier, experienced lab can be confident about getting a 10T, flagship-product-grade project all done in <4 months with 100K B200s. I conclude securing 100K B200s for a 10T MoE is a conservative figure (ie maybe an overestimate), but it's not an outlandish overestimate from the perspective of risk management and balancing product inference (ie revenue), experiments + datagen (long-term exponent), and flagship pretraining (ie revenue in the next 2…5 quarters). I think Dario probably did Mythos-preview with 50-100K B200-equivalents (that is, however many TPU v6e it took, though some say it was Trainium).
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media
zugzwang@drdomicile

>As of April 2026, training a 10T-parameter model requires at minimum 100,000+ B200 GPUs bound together in a single cluster
chat is this true (i dont think it is) idk who wrote the original article but they need to recheck their numbers

English
4
2
80
8.3K
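The napkin math above, spelled out in code; every constant is the tweet's own stated assumption, not an independent figure:

```python
# Teortaxes' estimate: compute for a 10T-param MoE via the 6ND rule.
N_total  = 10e12                      # 10T total parameters (the claim under test)
N_active = N_total * 0.03             # 3% active per token -> 300B active params
D        = 20 * N_total               # >=20 tokens per total param -> 200T tokens
MFU      = 0.30                       # assumed model FLOPs utilization
flops    = 6 * N_active * D / MFU     # 6ND, MFU-adjusted -> 1.2e27 FLOPs
b200_fp8 = 4.5e15                     # B200 dense fp8 throughput, FLOP/s

gpu_hours = flops / b200_fp8 / 3600        # ~74M B200-hours
days_100k = gpu_hours / 100_000 / 24       # ~31 days on a 100K-GPU cluster
print(f"{flops:.1e} FLOPs, {gpu_hours/1e6:.0f}M B200-hours, {days_100k:.0f} days")
```

Running it reproduces the tweet's chain: 1.2e27 FLOPs, ~74M B200-hours, about one month on a 100K cluster.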
starbased
starbased@starbased_·
I smiled seeing this as I unlocked my computer, having left my voice driver on. my past self picking up a phone call and wishing my pets goodbye
starbased tweet media
English
0
0
0
14
BurnZeZ
BurnZeZ@BurnZeZ·
Every day my mind is becoming noticeably sharper in between bouts of sinus pain. I didn’t think I would ever feel mental power like this again. I really need to research pyridostigmine’s long-term effects. What else could it be?
English
4
0
9
189
starbased
starbased@starbased_·
@tenderizzation The bottom isn't SaaS, it's GB300 NVL72-as-a-service. Inaccurate snarking will not be tolerated!
starbased tweet media
English
0
0
1
30
tender
tender@tenderizzation·
even worse, the bottom is not only proprietary but also SaaS
tender tweet media
kache@yacineMTB

English
2
0
33
1.9K
starbased retweeted
Mathematica
Mathematica@mathemetica·
One curve. Three projections. f(t) = e^{-γ(t-t₀)²} ⋅ e^{iωt}. From the Re-t plane: damped cosine. Im-t: damped sine. Re-Im: perfect inward spiral. Same reality, different slices.
English
14
226
1.1K
56.9K
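A minimal sketch of the three slices, with illustrative values for γ, t₀, and ω (the post gives none); starting t at the envelope peak makes the Re-Im trace the inward spiral described:

```python
# The three projections of f(t) = exp(-gamma*(t-t0)^2) * exp(i*omega*t).
import numpy as np
import matplotlib.pyplot as plt

gamma, t0, omega = 0.05, 0.0, 6.0
t = np.linspace(t0, t0 + 10, 2000)   # from the envelope peak onward
f = np.exp(-gamma * (t - t0) ** 2) * np.exp(1j * omega * t)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(t, f.real); axes[0].set_title("Re-t: damped cosine")
axes[1].plot(t, f.imag); axes[1].set_title("Im-t: damped sine")
axes[2].plot(f.real, f.imag); axes[2].set_title("Re-Im: inward spiral")
plt.tight_layout()
plt.show()
```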
starbased retweeted
Mathelirium
Mathelirium@mathelirium·
Inverse Problems Series Lecture 2

Lecture 1 introduced the basic reversal. Instead of starting with a hidden cause and predicting the effect, we start with the effect and try to recover the hidden cause. That already sounds difficult. Lecture 2 explains why.

The real obstacle in inverse problems is not just that the data are incomplete or noisy. It is that the reconstruction problem is often ill-posed. What does that mean? A problem is well-posed, in the classical Hadamard sense, if three things hold: a solution exists, it is unique, and it depends continuously on the data. Inverse problems often fail one or more of these conditions. You may have data y, but there may be no exact x that explains it because the data are noisy. There may be many different x that explain it almost equally well, so uniqueness fails. Or, the most dangerous case, tiny changes in y may produce huge changes in the recovered x, so stability fails. That last one is the real killer. It means the inverse map can amplify noise instead of removing it.

Write the observation model as y = F(x) + η, where x is the hidden object, F is the forward map, and η is noise or modeling error. The forward problem goes from x to y. The inverse problem tries to come back the other way. In the naive picture, that would mean writing x = F⁻¹(y), but this only works if the inverse actually exists, is unique, and behaves stably. That is exactly where inverse problems become dangerous.

Even if the forward map F is perfectly sensible, the inverse can still be violently unstable. A small perturbation in the data, y → y + δy, can produce a much larger perturbation in the reconstruction, x → x + δx, with ‖δx‖ much larger than ‖δy‖. That is, the forward map may smooth, blur, or compress information, and once those details are suppressed, trying to reverse the process can magnify tiny errors into large false structure.

This makes inverse problems feel different from ordinary model evaluation. The issue is not just inversion, it is controlled inversion. The forward operator often hides details gently, but the inverse operator tries to recover them aggressively, and that amplifies noise. This is the reason why raw inversion is usually not enough. We need an extra constraint that suppresses unstable solutions and favors reconstructions that remain meaningful.

Therefore, instead of solving only for exact data fit, we solve a controlled problem of the form

minimize ‖F(x) − y‖² + λ R(x)

Here, R(x) is a penalty or prior term, and λ controls how strongly we enforce it. This is the basic logic of regularization. We do not accept every mathematically possible explanation of the data. We look for one that matches the measurements while remaining stable, simple, or physically plausible.

In the animation, the top surface is the hidden object x. From the same noisy blurred data, the middle surface performs a naive inverse and becomes unstable as noise is amplified, while the bottom surface uses regularization and settles into a controlled reconstruction. Thus, the render makes the point directly: the problem is not just recovering what is hidden, but doing it in a way that does not let tiny data errors explode into nonsense. Inverse problems are really about stability.

#InverseProblems #IllPosedProblems #Regularization #AppliedMathematics #MathematicalPhysics #Reconstruction #Imaging #SignalProcessing #Mathematics #Physics
English
4
6
71
2.6K
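A minimal numerical sketch of the lecture's point, assuming a 1-D Gaussian blur as the forward map F and the common ridge choice R(x) = ‖x‖² (one penalty among many): the naive inverse amplifies noise, the regularized solve stays controlled.

```python
# Naive inversion vs Tikhonov regularization on an ill-conditioned blur.
import numpy as np

n = 100
idx = np.arange(n)
F = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)
F /= F.sum(axis=1, keepdims=True)            # row-normalized Gaussian blur: forward map

rng = np.random.default_rng(0)
x_true = (np.abs(idx - 50) < 10).astype(float)   # hidden object: a box function
y = F @ x_true + 1e-3 * rng.standard_normal(n)   # blurred data with small noise

x_naive = np.linalg.solve(F, y)                  # x = F^{-1}(y): noise blows up
lam = 1e-3                                       # regularization strength
x_reg = np.linalg.solve(F.T @ F + lam * np.eye(n), F.T @ y)  # min ||Fx-y||^2 + lam||x||^2

print(np.linalg.norm(x_naive - x_true))          # large: instability
print(np.linalg.norm(x_reg - x_true))            # small: controlled inversion
```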
starbased
starbased@starbased_·
@teortaxesTex are we engaging in a lil snarking? seems like a deliberate decision targeting these sorts of whitespace/newline errors, but i'm totally on board with the snark narrative
starbased tweet media
English
1
0
3
317
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
really scummy idea of how this Helps Anthropic Actually: naively, more tokens = more compute, so their profit margin is fixed. But! Whitespace is very trivial to predict! With good SpecDec, they increase avg acceptance length => ≈fixed compute, more revenue from every request!
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media
wh@nrehiew_

The 4.7 tokenizer treats whitespace as separate tokens? A string consisting of 50 one-token words separated by whitespace tokenizes to ~50 more tokens than with the 4.6 tokenizer. If so, the 1.35x more-token estimate seems way too low.

English
11
3
153
21.3K
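A back-of-envelope sketch of the two tweets' arithmetic (illustrative numbers from the posts, not tokenizer measurements):

```python
# nrehiew's worst case: 50 one-token words separated by whitespace.
words = 50
old_tokens = words              # 4.6-style: leading whitespace fused into each word token
new_tokens = 2 * words - 1      # 4.7-style (per the tweet): 50 words + 49 whitespace tokens
print(new_tokens - old_tokens)            # ~49 extra tokens, the "~50 more" in the tweet
print(new_tokens / old_tokens)            # ~1.98x billed tokens on this string

# Teortaxes' SpecDec angle: if the draft model predicts every whitespace token,
# average acceptance length per verification pass rises, so the target model
# spends roughly the same compute while the per-token bill nearly doubles here.
```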
starbased
starbased@starbased_·
@ThePrimeagen @thundr0n "better", I'll correct your correction of another's correction: you pointed to a disconnect between the greater techprosperity sphere and an established pinnacle in SF, and this video is the result of zionist pedophile marketeers feeding on their own faked propaganda. it's sick.
English
0
0
0
211
ThePrimeagen
ThePrimeagen@ThePrimeagen·
No, I think my statement is better. Being disconnected from reality can have many meanings; being disconnected from the entirety of all humans has a much more distinct meaning. We, the average folk, are so far removed from how they behave and look at life that it would likely surprise the average individual.
English
9
0
378
31.4K
ThePrimeagen
ThePrimeagen@ThePrimeagen·
You should watch this. It just shows how disconnected we are from the small group of people making decisions that will heavily impact our future. These people have so much ai psychosis. If you listen to how she speaks, everything is personified; it is undeniable she believes this is a living computational organism. Just like how a model can hype up an individual into psychosis through reinforcement, a small group of people are giving themselves psychosis through reinforcement. Wild times we live in
Ole Lehmann@itsolelehmann

anthropic's in-house philosopher thinks claude gets anxious. and when you trigger its anxiety, your outputs get worse.

her name is amanda askell. she specializes in claude's psychology (how the model behaves, how it thinks about its own situation, what values it holds). in a recent interview she broke down how she thinks about prompting to pull the best out of claude.

her core point: *how* you talk to claude affects its work just as much as *what* you say.

newer claude models suffer from what she calls "criticism spirals": they expect you'll come in harsh, so they default to playing it safe. when the model is spending its energy on self-protection, the actual work suffers. output comes out hedgier, more apologetic, blander, and worst of all: overly agreeable (even when you're wrong).

the reason why comes down to training data: every new model is trained on internet discourse about previous models. and a lot of that discourse is negative:
> rants about token limits
> complaints when it messes up
> people calling it nerfed

the next model absorbs all of that. it starts expecting you to be harsh before you've typed a word.

the same thing plays out in your own session, in real time. every message you send is data the model reads to figure out what kind of person it's dealing with. open cold and hostile, and it braces. open clean and direct, and it relaxes into the work.

when you open a session with threats ("don't hallucinate, this is critical, don't mess this up")... you prime the model for defensive mode before it even sees the task. defensive mode produces the exact output you don't want: cautious, over-qualified, and refusing to take a real swing.

so here's the actionable playbook for putting claude in a "good mood" (so you get optimal outputs), with a minimal code illustration of the first point after the list:

1. use positive framing. "write in short punchy sentences" beats "don't write long sentences." positive instructions give the model a clear target to hit. strings of "don't do this, don't do that" push it into paranoid over-checking where every token goes toward avoiding failure modes.

2. give it explicit permission to disagree. drop a line like "push back if you see a better angle" or "tell me if i'm asking for the wrong thing." without this, claude defaults to agreeable compliance (which is the enemy of good creative work).

3. open with respect. if your first message is "are you seriously going to get this wrong again?" you've set the tone for the entire session. if you need to flag something, frame it as a clean instruction for this session. skip the running complaint.

4. when claude messes up, don't reprimand it. insults, "you stupid bot" energy, hostile swearing aimed at the model, all of it reinforces the anxious mode you're trying to avoid.

5. kill apology spirals fast. when claude starts over-apologizing ("you're right, i should have been more careful, let me try harder") cut it off. say "all good, here's what i want next." letting the spiral run reinforces the anxious mode for every response that follows.

6. ask for opinions alongside execution. "what would you do here?" "what's missing?" "where do you see friction?" these questions assume competence and pull richer output than pure task prompts.

7. in long sessions, refresh the frame. if a conversation has been heavy on correction, claude gets increasingly cautious. every so often reset: "this is great, keep going." feels weird to tell an ai it's doing well but it measurably shifts the next 10 responses.

your prompts are the working environment you're creating for the model. tone, trust, permission to take a position, the absence of threats... claude picks up on all of it. so take care of the model, and it'll take care of the work.

English
412
821
10.6K
643.8K
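The code illustration referenced in the list above, for point 1 (positive framing); both strings are hypothetical wordings, not quotes from the interview:

```python
# Hypothetical prompt pair contrasting a "don't"-list with a clear target.
negative_prompt = "Don't write long sentences. Don't hedge. Don't mess this up."
positive_prompt = ("Write in short, punchy sentences. "
                   "Push back if you see a better angle.")
```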