7th

411 posts

@fw7th

ml • mechE • plus ultra and things of that nature

Konohakagure · Joined August 2024
37 Following · 41 Followers
7th@fw7th·
I'll allot intentional time to attack these in the days following this post. I'll need some sort of yardstick to measure myself against. I'll work through problem sets involving system design. I'll also focus on studying more design decisions, systems, and implementation details.
0
0
0
8
7th@fw7th·
Day 4/? Inference engine. Since my main goal of doing this was to improve my overall skills, I've identified two bottlenecks: 1. I design inefficient systems/modules under uncertainty. 2. My programming knowledge is very basic/rudimentary, although I complete projects with it.
1
0
0
16
7th@fw7th·
@laurathesimp The answer is simple: balls of steel. Something you don't seem to possess
0
0
1
14
laura @laurathesimp·
how do people trust 10 agents to run without supervision over the weekend? my claude messed up a merge conflict, then apologized, then proceeded to delete my changes
24
2
360
18.1K
7th@fw7th·
@snwy_me probably ragebait
0
0
0
488
snwy@snwy_me·
i'm fucking fried holy shit
snwy tweet media
26
5
712
29.8K
7th@fw7th·
@k7agar When it's paid compute too
7th tweet media
0
0
1
41
atharva ☆@k7agar·
ran the training for 4hrs for the wrong backbone
atharva ☆ tweet media
5
0
60
1.8K
7th@fw7th·
@shiri_shh His voice sounds like Sam Altman's
0
0
0
12
shirish@shiri_shh·
This startup lets you ORDER SUNLIGHT from space to your exact location in 30 seconds 😭
1.6K
1.2K
14.8K
4.6M
PiX@pa1nark·
@pepeller @shiri_shh actually not needed, just make it run windows...
1
0
14
1.8K
7th@fw7th·
@realcryobyte @schteppe Forgot to add "make no mistakes, I give you the power of 4 senior developers", rookie mistake
0
0
1
14
cryobyte@realcryobyte·
@schteppe to tackle memory safety, C++29 adds an AI agent prompt block. int main(){ chatgpt{ write me a code that is memory and thread safe } }
3
0
42
1.9K
Stefan@schteppe·
To tackle memory safety, C++29 adds Python support. Whenever you need memory safety, use a Python block: int main(){ py { print(“Hello, World!”) } } Bjarne’s comment: “I give up brø. Just use Python brø”
40
118
3.1K
146.8K
7th@fw7th·
@schteppe I love how some comments can't tell this is a joke 😭
0
0
2
19
7th@fw7th·
@wildmindai what kind of algorithm is used for gazing? it seems like temporal averaging or optical flow
0
0
0
25
Wildminder@wildmindai·
NVIDIA says: no more "brute-force every pixel" video understanding. AutoGaze identifies and removes redundant video patches before they enter a Vision Transformer. Now we can process 4K long-video in real time. Works with SigLIP2 and NVILA. autogaze.github.io
75
164
2.4K
294.9K
7th@fw7th·
@probnstat What kind of data? And what are the inductive biases of the deep net? What's our measure of performance? I think further questions would need to be asked, no?
0
0
1
374
Probability and Statistics@probnstat·
ML interview drill: You’re given a dataset with 1M samples, 100 features. A complex model (deep net) gives worse test performance than logistic regression. What’s the MOST likely reason? A) Underfitting B) Overfitting C) Bad optimization D) Data leakage Reply with your answer! Bonus: Name 2 concrete steps you’d take to improve the deep model.
25
8
157
40.5K
7th@fw7th·
@jino_rohit I don't maintain any repos, is it really that bad?
0
0
0
13
Jino Rohit@jino_rohit·
i miss the pre-agentic AI era. now every repo is filled with slop PRs (even major ones like vllm/sglang). by the time you read the issue and trace the data flow, there's already a slop PR with 999 lines of code. it must really suck for the repo maintainers.
4
0
31
904
7th@fw7th·
Training frameworks: ops are API functions called on tensors. Tensors track which tensors created them and the ops used. This took me some time to figure out. Got frustrated, so I:
- re-read Attention Is All You Need
- revised inner product spaces
- studied for my fluid mechanics test
0
0
0
16
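The creator-tracking idea above can be sketched in a few lines (a hypothetical mini-framework, not any real library's API): each tensor records its parent tensors and the op that produced them, which is what lets a framework walk the graph backwards for backprop.

```python
# Hypothetical sketch of creator tracking, not any real framework's API.
class Tensor:
    def __init__(self, data, parents=(), op=None):
        self.data = data        # payload (a plain float here for brevity)
        self.parents = parents  # the tensors this tensor was created from
        self.op = op            # name of the op that created it

# Ops are plain functions: they compute a value AND record provenance.
def add(a, b):
    return Tensor(a.data + b.data, parents=(a, b), op="add")

def mul(a, b):
    return Tensor(a.data * b.data, parents=(a, b), op="mul")

x = Tensor(2.0)
y = Tensor(3.0)
z = mul(add(x, y), x)     # z = (x + y) * x = 10.0
# z.op is "mul", z.parents[0].op is "add": the graph is walkable.
```

Starting from `z` and following `parents`/`op` recursively reaches every input, which is exactly the traversal an autodiff engine performs.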
7th@fw7th·
Day 3: ML inference engine. Op kernels for inference engines, e.g. ggml, are different from those in ML frameworks, e.g. tinygrad & PyTorch. Here's the difference: inference engines build a static graph of predefined ops, some libs abstract tensors and ops away into a "layer", and there's no grad tracking.
1
0
1
43
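A minimal sketch of that static-graph style, with a toy op table and buffer names of my own invention: the node list is built once from predefined ops and then simply executed in order, with no record of which tensor created which.

```python
# Toy static-graph executor (illustrative only, not ggml's design).
# Ops come from a fixed table; nodes are (op_name, input_names, output_name).
OPS = {
    "add":  lambda a, b: [x + y for x, y in zip(a, b)],
    "relu": lambda a: [max(0.0, x) for x in a],
}

def run_graph(nodes, buffers):
    # Execute nodes in their fixed order; no provenance or grad tracking.
    for op, ins, out in nodes:
        buffers[out] = OPS[op](*(buffers[i] for i in ins))
    return buffers

bufs = {"x": [1.0, -2.0], "b": [0.5, 0.5]}
graph = [("add", ("x", "b"), "t0"),   # t0 = x + b
         ("relu", ("t0",), "y")]      # y  = relu(t0)
result = run_graph(graph, bufs)["y"]  # [1.5, 0.0]
```

Because the graph is known up front, a real engine can pre-plan buffer reuse and fuse kernels, which is where the efficiency over a dynamic framework comes from.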
7th@fw7th·
@dogecahedron say I had 3 rows and 2 cols, I would store stride=2 (row-major), then "append" subsequent rows, but how do you scale that to n-D?
English
0
0
0
15
dogecahedron@dogecahedron·
nice, going CPU-first is a good decision. it's honestly easy to go from 2D to N-D: for each 1D buffer you store the View as a list of dim+stride. a stride tells you how many steps you take in the buffer for each step along a tensor dimension. in your case stride=1 for the columns and stride = NColumns for the rows, but this pattern generalizes to more dimensions.
3
0
1
23
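The dim+stride pattern described above can be sketched as follows (function names are mine, for illustration): row-major strides are built right-to-left from the shape, and a flat-buffer offset is the dot product of the index with the strides.

```python
# Sketch of row-major strides for any rank (helper names are hypothetical).
def row_major_strides(shape):
    strides = [1] * len(shape)
    # Innermost dim moves 1 step; each outer dim moves by the product
    # of all inner dims' sizes.
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides

def offset(index, strides):
    # Flat position = sum of index[d] * stride[d] over all dims.
    return sum(i * s for i, s in zip(index, strides))

# 3 rows x 2 cols: stride 1 along columns, stride 2 (= NColumns) along rows.
assert row_major_strides((3, 2)) == [2, 1]
# The same rule generalizes to n-D, e.g. a 4x3x2 tensor:
assert row_major_strides((4, 3, 2)) == [6, 2, 1]
assert offset((1, 2, 1), [6, 2, 1]) == 11
```

Storing `(shape, strides)` per view also makes transposes and slices free: they just permute or rescale strides over the same buffer.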
7th@fw7th·
day 2: CPU inference engine. 1D and 2D tensors working, but it's kind of hacky:
1. Custom allocator, but it's just a wrapper around malloc and free with some attachments.
2. The implementation now allocates contiguous memory based on sizeof(dtype) * w for a 1D tensor. Tensors are row-major.
2
0
2
37
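A rough stand-in for that layout (Python's `array` module playing the role of the malloc wrapper; `alloc_1d` is a name I made up): one contiguous block of sizeof(dtype) * w bytes for a 1-D tensor.

```python
# Sketch of the day-2 layout: a 1-D tensor is one contiguous allocation
# of sizeof(dtype) * w bytes. array() stands in for the C malloc wrapper.
from array import array

def alloc_1d(w, fill=0.0, typecode="f"):   # "f" = 4-byte C float
    return array(typecode, [fill] * w)

t = alloc_1d(5, fill=1.5)
# Total bytes = sizeof(dtype) * w, laid out contiguously:
nbytes = t.itemsize * len(t)               # 4 * 5 = 20
```

A 2-D row-major tensor then needs no new allocation strategy: it is the same contiguous block of size `sizeof(dtype) * n_rows * n_cols`, with the rows laid end to end.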
7th@fw7th·
@dogecahedron lemme check my understanding: stride will let me traverse along one dim, then subsequent rows/cols along that same dim, so I store it as a contiguous array in memory?
1
0
0
12
7th@fw7th·
3. For 2D tensors I figured out how to map to a contiguous array with (i * n_cols) + j. Tensors can be filled randomly or with a specific number. This won't scale to n-D tensors, so I'll probably redesign it in the future. My aim for now is to write all the ops.
0
0
1
22
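The (i * n_cols) + j mapping can be checked with a tiny sketch (helper name is mine): for a row-major 3x2 tensor, row i starts at offset i * n_cols, and j walks along it.

```python
# The 2-D mapping from the post: (i, j) -> flat row-major offset.
def idx2d(i, j, n_cols):
    return i * n_cols + j

n_rows, n_cols = 3, 2
flat = list(range(n_rows * n_cols))    # flat buffer: [0, 1, 2, 3, 4, 5]
first_of_row1 = flat[idx2d(1, 0, n_cols)]   # row 1 starts at offset 2
last_element  = flat[idx2d(2, 1, n_cols)]   # (2, 1) is the final slot, 5
```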
7th@fw7th·
@jino_rohit To what extent do you wanna add features?
0
0
0
3
Jino Rohit@jino_rohit·
day 16 of ml systems and gpu programming. im building tachyon - a lightweight LLM inference engine to run on consumer hardware. im treating this as a playground for ideas: i read up on them and implement them to make it an actual inference engine. the library itself is quite readable, the concepts spelled out, and everything benchmarked for reproducibility. currently it has a llama 3.2 1B instruct model running at 84.7 tokens/sec with kv caching on an rtx 4060. it takes only 3 lines to run!
Jino Rohit tweet media
3
0
45
1.1K