swh

227 posts

@swhsiang

building humanoid | prev ML @CashApp Infra @salesforce Purdue ECE

New York, NY · Joined December 2019
209 Following · 60 Followers
swh@swhsiang·
@zivdotcat True love. Altman is following Elon
dev@zivdotcat·
🚨 BREAKING: Sam Altman is looking into creating a rocket company to challenge Elon Musk’s SpaceX.
swh@swhsiang·
This Agent tutorial on YouTube is underrated. Best video I’ve found so far.
swh reposted
anshuman@athleticKoder·
Techniques I'd master to fine-tune LLMs in production. Bookmark this:
1. LoRA & QLoRA for parameter-efficient fine-tuning
2. PEFT library for adapter methods
3. Instruction tuning
4. Dataset formatting (ChatML, Alpaca, ShareGPT)
5. DeepSpeed ZeRO for memory optimization
6. Flash Attention 2 for efficient training
7. Gradient checkpointing for longer contexts
8. BitsAndBytes for 4-bit/8-bit quantization
9. RLHF & DPO for alignment
10. Tokenizer training & vocabulary extension
11. Evaluation metrics (perplexity, ROUGE, human eval)
12. Unsloth for 2x faster fine-tuning
13. Multi-GPU strategies (FSDP, DDP)
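To make item 1 concrete, here is a minimal pure-Python sketch of the LoRA idea, assuming the commonly described formulation: the pretrained weight W stays frozen, and a trainable low-rank update (alpha / r) * B @ A is added, with B initialized to zero so training starts from the base model's behavior. The names `LoRALinear` and `matmul` are hypothetical helpers for illustration only; this is not the PEFT library's API, and real fine-tuning would use a tensor framework rather than lists.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x m) @ (m x p) -> (n x p)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class LoRALinear:
    """Toy linear layer: frozen weight W plus a low-rank update (alpha / r) * B @ A.

    W is d_out x d_in, B is d_out x r, A is r x d_in. Only A and B would be trained.
    """

    def __init__(self, weight, r, alpha, rng):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated here
        self.r = r
        self.scale = alpha / r
        # A gets small random values and B starts at zero, so B @ A == 0 at init
        # and the adapted layer initially behaves exactly like the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def trainable_params(self):
        """r * (d_out + d_in) adapter parameters instead of d_out * d_in."""
        return self.r * (len(self.weight) + len(self.weight[0]))

    def forward(self, x):
        """Compute W x + scale * B (A x) without materializing B @ A."""
        wx = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        bax = [sum(b * v for b, v in zip(row, ax)) for row in self.B]
        return [u + self.scale * v for u, v in zip(wx, bax)]

    def merged_weight(self):
        """Fold the adapter into a single matrix W + scale * B @ A for inference."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 64
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
    layer = LoRALinear(W, r=4, alpha=8, rng=rng)
    print("full fine-tune params:", d * d)
    print("LoRA adapter params:  ", layer.trainable_params())
```

With d = 64 and r = 4, the adapter has 512 trainable parameters versus 4,096 for the full matrix; `merged_weight` shows why LoRA adds no inference latency once the update is folded back into W.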
Adan@Adnubiquitous·
I'm hiring a Founding Engineer to help build the Fitbit for cows.
Goal: ship 200 ear tags this month, then 1 billion over the next 10 years.
Requirements:
- Strong embedded + low-power design fundamentals
- Not afraid of cows (bonus if you grew up on a farm)
- Experience taking hardware (PCB!!) from prototype to scale
- Based in SF
Reply with the coolest thing you've shipped.
swh@swhsiang·
@oprydai ᕙ(⇀‸↼‶)ᕗᕙ(⇀‸↼‶)ᕗ
Mustafa@oprydai·
I wanna connect with people who are into:
1. Engineering
2. Robotics
3. Hardware Startups
4. Manufacturing
5. AI + Control Systems
6. Building Real Tech
If you're building deep tech or hardware, or you're just obsessed with creating real systems, this account is your space.
swh@swhsiang·
Been thinking about what the endgame of humanoid robotics is. First of all, it's fucking difficult to build hardware that works as smoothly as a human. Second, the long-tail problem you've seen in self-driving cars will happen again in humanoid robotics, because both products operate in open-world environments. The question is how much your customers are willing to pay for your product. Coding and marketing are the more valuable skills of the 21st century, and junior programmers and marketing specialists are being replaced by AI agents. Now we are targeting blue-collar jobs. That makes sense in the US, because it's expensive to hire an electrician or plumber to solve your problem. The rest of the world? I doubt it. Perhaps it also makes sense to send robots to develop Mars for humans.
swh@swhsiang·
your next 6 months:
- learning RL/ML
- learning Robotics
- doing open source
- sharing your work regularly
- actually selling your stuff

one of these pays rent
swh@swhsiang·
In the end, combine all your work and build something like nanoGPT. Your future self will thank you later. github.com/karpathy/nanoG…
Ahmad@TheAhmadOsman

step-by-step LLM Engineering Projects
each project = one concept learned the hard (i.e. real) way

Tokenization & Embeddings
> build byte-pair encoder + train your own subword vocab
> write a “token visualizer” to map words/chunks to IDs
> one-hot vs learned-embedding: plot cosine distances

Positional Embeddings
> classic sinusoidal vs learned vs RoPE vs ALiBi: demo all four
> animate a toy sequence being “position-encoded” in 3D
> ablate positions, watch attention collapse

Self-Attention & Multihead Attention
> hand-wire dot-product attention for one token
> scale to multi-head, plot per-head weight heatmaps
> mask out future tokens, verify causal property

Transformers, QKV, & Stacking
> stack the Attention implementations with LayerNorm and residuals → single-block transformer
> generalize: n-block “mini-former” on toy data
> dissect Q, K, V: swap them, break them, see what explodes

Sampling Parameters: temp/top-k/top-p
> code a sampler dashboard: interactively tune temp/k/p and sample outputs
> plot entropy vs output diversity as you sweep params
> nuke temp=0 (argmax): watch repetition

KV Cache (Fast Inference)
> record & reuse KV states; measure speedup vs no-cache
> build a “cache hit/miss” visualizer for token streams
> profile cache memory cost for long vs short sequences

Long-Context Tricks: Infini-Attention / Sliding Window
> implement sliding window attention; measure loss on long docs
> benchmark “memory-efficient” (recompute, flash) variants
> plot perplexity vs context length; find context collapse point

Mixture of Experts (MoE)
> code a 2-expert router layer; route tokens dynamically
> plot expert utilization histograms over dataset
> simulate sparse/dense swaps; measure FLOP savings

Grouped Query Attention
> convert your mini-former to grouped query layout
> measure speed vs vanilla multi-head on large batch
> ablate number of groups, plot latency

Normalization & Activations
> hand-implement LayerNorm, RMSNorm, SwiGLU, GELU
> ablate each: what happens to train/test loss?
> plot activation distributions layerwise

Pretraining Objectives
> train masked LM vs causal LM vs prefix LM on toy text
> plot loss curves; compare which learns “English” faster
> generate samples from each, note quirks

Finetuning vs Instruction Tuning vs RLHF
> fine-tune on a small custom dataset
> instruction-tune by prepending tasks (“Summarize: ...”)
> RLHF: hack a reward model, use PPO for 10 steps, plot reward

Scaling Laws & Model Capacity
> train tiny, small, medium models, plot loss vs size
> benchmark wall-clock time, VRAM, throughput
> extrapolate scaling curve: how “dumb” can you go?

Quantization
> code PTQ & QAT; export to GGUF/AWQ; plot accuracy drop

Inference/Training Stacks
> port a model from Hugging Face to DeepSpeed, vLLM, ExLlama
> profile throughput, VRAM, latency across all three

Synthetic Data
> generate toy data, add noise, dedupe, create eval splits
> visualize model learning curves on real vs synth

each project = one core insight.
build. plot. break. repeat.
> don’t get stuck too long in theory
> code, debug, ablate, even meme your graphs lol
> finish each and post what you learned

your future self will thank you later
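As a starting point for the “Sampling Parameters” project, here is a minimal stdlib-only sketch of how temperature, top-k, and top-p (nucleus) filtering can combine before a token is drawn. It assumes top-k is applied before top-p (a common ordering, not the only one) and treats temperature 0 as greedy argmax. `filter_logits` and `sample` are hypothetical names, not any library's API.

```python
import math
import random


def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Turn raw logits into a sampling distribution.

    temperature == 0 means greedy (argmax) decoding; top_k == 0 and top_p == 1.0
    disable the respective filters. Top-k is applied before top-p.
    """
    n = len(logits)
    if temperature == 0:
        best = max(range(n), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate tokens, most likely first.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    if top_p < 1.0:
        # Nucleus: smallest prefix whose renormalized mass reaches top_p.
        mass = sum(probs[i] for i in order)
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i] / mass
            if cum >= top_p:
                break
        order = kept

    # Zero out everything that was filtered and renormalize the rest.
    keep = set(order)
    kept_mass = sum(probs[i] for i in order)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(n)]


def sample(logits, rng, **kwargs):
    """Draw one token index from the filtered distribution."""
    probs = filter_logits(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    logits = [2.0, 1.0, 0.0, -1.0]
    print(filter_logits(logits, top_p=0.8))
    print([sample(logits, rng, temperature=0.8, top_p=0.9) for _ in range(10)])
```

Sweeping `temperature`, `top_k`, and `top_p` over this function is enough to back a simple dashboard; with `temperature=0` every draw returns the argmax token, which is the repetition-prone regime the project asks you to observe.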

dev@zivdotcat·
Choose wisely anons: Perplexity Comet or ChatGPT Atlas?
Craig Weiss@craigzLiszt·
unpopular opinion: a good cofounder is inherently rare. most people are better off building solo
swh@swhsiang·
@zekramu That was true before the GPT moment. Now we have all these models. I believe the gap between EE and SWE is smaller than ever.
zek@zekramu·
an EE can easily be a SWE but a SWE can’t even dream about being an EE.
swh@swhsiang·
@yoobinray smart idea. it's useful for all the new grads
ray🖤🇰🇷@yoobinray·
fastest path to a job that i see rn: join a hackathon every month and win one of them
swh@swhsiang·
@kmeanskaran @kanavtwt A data pipeline is just infra… the key is the labeled data. A Next.js app by itself isn't scalable. You need an experienced engineer to scale your web app…
Karan🧋@kmeanskaran·
@kanavtwt Yes, but even that's possible with AI. Anyone can learn and build the Next.js app, auth, Supabase, Node.js. But using AI, you can't even write a functional data pipeline.
Karan🧋@kmeanskaran·
Fact: ML engineers can learn web dev in 1-2 months, but web devs can't learn ML in such a short time.
swh@swhsiang·
@QuickScreenAI Exactly, the key is to adopt new technologies as quickly as possible. The only advantage in the AI era is speed.
QuickScreen.AI@QuickScreenAI·
@swhsiang (4) ability to learn — and teach. Hire people who not only use AI but raise team workflow. Pro tip: give a short, tool-driven task to see how they think, justify outputs, and document assumptions.
swh@swhsiang·
Reinventing the Interview in the AI Era

In 2025, the AI boom sparked by ChatGPT has entered its third year. Like every technological upheaval in human history, LLMs have permanently changed how we work. 1/n
swh@swhsiang·
Future Technical Interviews

The rules have shifted. Software hiring went from brainteasers to LeetCode; in the AI era, we need another way to spot talent. New failure modes show up when candidates accept every answer from an agent without judgment, so interviewers must watch how candidates challenge and calibrate their AI collaborators. And yes, many companies complain that candidates use AI tools to cheat (platforms like cluely.ai make it obvious), but interviews should identify the people who leverage tools best. Cheating is fundamentally an integrity check: if someone is willing to hide the assist, can they really take ownership? n/n
swh@swhsiang·
I encourage all companies to keep hiring in the US, especially new grads. Today's new grads will become tomorrow's experienced engineers who know how to employ AI agents. 5/n