dt

19 posts

dt

@dtthinky

working on smth new.

Joined March 2024
42 Following · 52 Followers
dt@dtthinky·
xsa is already in ~20 submissions so far in Parameter Golf, averaging ~1.13 BPB in those runs. i built a board online to track it. it'll be fun to watch new techniques & contributors rise. wanna follow along? parameter-golf[.]vercel[.]app we've still got 39 days left!
i) how often each technique appears in Parameter Golf
ii) which technique pairs produce the best scores? greener = lower avg BPB
iii) score dist scales with technique count
iv) historical progression of merged records and current open contenders
[4 images attached]
Shuangfei Zhai@zhaisf

Wow, things are happening really fast. Apparently, XSA has already become a standard component of the leading solutions in the openai parameter golf challenge. github.com/openai/paramet…

0 replies · 0 reposts · 4 likes · 223 views
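A minimal polars sketch of the kind of aggregation a board like this might run for panel i and the per-technique side of panel ii; the submissions schema below is made up for illustration, not the site's actual data model:

```python
import polars as pl

# hypothetical submissions table: one row per Parameter Golf submission,
# with the list of techniques it uses and its bits-per-byte score
submissions = pl.DataFrame(
    {
        "submission_id": [1, 2, 3],
        "techniques": [["xsa", "mup"], ["xsa"], ["rope-scaling", "xsa"]],
        "bpb": [1.12, 1.15, 1.11],
    }
)

per_technique = (
    submissions.explode("techniques")           # one row per (submission, technique)
    .group_by("techniques")
    .agg(
        pl.len().alias("n_submissions"),        # panel i: how often each technique appears
        pl.col("bpb").mean().alias("avg_bpb"),  # lower avg BPB = "greener"
    )
    .sort("avg_bpb")
)
print(per_technique)
```

Panel ii would run the same aggregation keyed on technique pairs rather than single techniques.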
dt@dtthinky·
long builds, deep copies, large file migrations and edits, large installs - all get frequently killed mid-run in Claude Code. the timeout is easily extendable by adding just this to ~/.claude/settings.json:
[image attached]
0 replies · 0 reposts · 2 likes · 139 views
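The config in the attached screenshot isn't recoverable here, so below is a hedged sketch of the idea in script form; it assumes the documented BASH_DEFAULT_TIMEOUT_MS / BASH_MAX_TIMEOUT_MS settings are what's being raised, and the timeout values are illustrative:

```python
# sketch: raise Claude Code's bash-tool timeouts by merging env settings into
# ~/.claude/settings.json (assumption: the screenshot sets these same keys).
import json
from pathlib import Path

settings_path = Path.home() / ".claude" / "settings.json"
settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}

settings.setdefault("env", {}).update(
    {
        "BASH_DEFAULT_TIMEOUT_MS": "1800000",  # default per-command timeout: 30 min
        "BASH_MAX_TIMEOUT_MS": "3600000",      # ceiling a command may request: 60 min
    }
)

settings_path.parent.mkdir(parents=True, exist_ok=True)
settings_path.write_text(json.dumps(settings, indent=2))
```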
dt retweeted
aubrey@aubymori·
the problem with modern linux is that it doesn't look like this
[4 images attached]
307 replies · 1.2K reposts · 15.1K likes · 524.9K views
dt retweeted
Fermat's Library@fermatslibrary·
Here's a special countdown for 2026. Happy New Year 🎉
[image attached]
32 replies · 418 reposts · 1.9K likes · 216.2K views
dt retweeted
Prof. Feynman@ProfFeynman·
The fun part of knowledge is realizing how much you were wrong.
152 replies · 1.9K reposts · 8.3K likes · 222.4K views
dt@dtthinky·
nothing beats fireside vibe coding
0 replies · 0 reposts · 3 likes · 165 views
dt@dtthinky·
there's exactly one query that hard-locks gemini thinking into hal-9000 mode every time for me
[image attached]
0 replies · 0 reposts · 2 likes · 150 views
dt@dtthinky·
you can just fold things
[image attached]
0 replies · 0 reposts · 1 like · 96 views
dt@dtthinky·
idk the word of 2025, but nothing is more real than 'everything is computer'
0 replies · 0 reposts · 1 like · 59 views
dt@dtthinky·
mentioned notebooklm to my close friend’s parents a couple days ago. if ppl use chatgpt for knowledge work or doc analysis, moving to notebooklm is SOO easy. saw it happen live: reg docs + expert reports, whole visual analysis on nano banana pro done in an hour
0 replies · 0 reposts · 2 likes · 109 views
dt@dtthinky·
so the 3‑gen pattern is actually here.

ampere (a100): tensor cores are warp‑synchronous (32‑lane lockstep). kernels rely on shared‑memory <> register shuffles & manual tiling. no dedicated hardware engine for global -> shared copies; cp.async can stage data, but otherwise movement is hand‑pipelined (gmem -> smem -> reg)

hopper (h100): warp‑group tensor cores (128‑thread "collective" mma). the tensor memory accelerator (tma) enables hardware‑managed asynchronous global -> shared copies, offloading data movement & increasing effective flop/byte. shared memory is still the staging point

blackwell (b100/b200): thread‑level mma (tcgen05), compute issued per thread, not tied to a warp or warp‑group. mma operands/results can bypass shared memory via a new on‑chip tensor memory (tmem), ~256 KB per SM, when kernels are written to leverage it. new dataflow path: tc.cp -> tmem -> thread‑mma, avoiding shared memory in hot loops where supported

result: reduced tiling overhead, deeper overlap of copy and compute & higher tensor throughput (as public docs and micro‑benchmarking show)
[3 images attached]
Charles 🎉 Frye @ ICLR '26@charles_irl

arxiv.org/abs/2512.02189

1 reply · 1 repost · 7 likes · 1.3K views
dt@dtthinky·
>> lazyframe = an optimized data-skipping engine. brings wins on filter / projection–heavy workloads, but limited payoff on full-scan or join-bound queries.

lazy cuts I/O + memory by avoiding rows and columns your query will never touch. the plan gets compiled into a full logical DAG, then aggressively optimized before a single byte is scanned. then the engine runs a fused pipeline (filter → projection → group-by → aggregation), avoiding intermediate materialization + repeated passes, which is where a lot of the efficiency comes from.

it’s part of my default stack for every recommender pipeline now
[2 images attached]
0 replies · 0 reposts · 1 like · 44 views
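A minimal sketch of the lazy pattern described above; the file name and columns are placeholders rather than anything from the screenshots:

```python
import polars as pl

# lazy: nothing is read yet, this only builds a logical plan
plan = (
    pl.scan_parquet("interactions.parquet")
    .filter(pl.col("event") == "click")              # predicate pushdown: skipped rows are never read
    .select(["user_id", "item_id", "dwell_ms"])      # projection pushdown: untouched columns are skipped
    .group_by("user_id")
    .agg(pl.col("dwell_ms").mean().alias("avg_dwell_ms"))
)

# the optimizer rewrites the DAG, then runs one fused
# filter -> projection -> group-by -> aggregation pipeline
result = plan.collect()
```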
dt@dtthinky·
learning infinitely is cool i guess
0 replies · 0 reposts · 0 likes · 102 views
dt@dtthinky·
rl infra is getting too smooth. to deploy training and inference on the same group of gpus: ray + --colocate, and somehow it’s a full async RL stack with slime. this little flag runs rollouts and training on the same stack, instead of separating them into different GPU groups / nodes
[image attached]
0 replies · 0 reposts · 0 likes · 65 views
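A conceptual sketch of what colocation means here, not slime's actual internals: with ray, giving the rollout and training actors fractional GPU requests lets them share the same devices instead of claiming separate GPU groups (assumes at least one visible GPU):

```python
import ray

ray.init()

@ray.remote(num_gpus=0.5)
class RolloutWorker:
    def generate(self):
        return "rollout batch"          # stand-in for inference-engine rollouts

@ray.remote(num_gpus=0.5)
class Trainer:
    def step(self, batch):
        return f"trained on: {batch}"   # stand-in for the training step

# 0.5 + 0.5 GPU requests let both actors pack onto one physical GPU,
# i.e. rollouts and training run colocated instead of on separate GPU groups
rollout, trainer = RolloutWorker.remote(), Trainer.remote()
print(ray.get(trainer.step.remote(rollout.generate.remote())))
```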