sajal

18.8K posts

@sajalshlan

ai. computers. notes to self.

Joined September 2021
249 Following · 11.4K Followers
Pinned Tweet
sajal
sajal@sajalshlan·
my biggest kpi for 2026 is moving from "fear" driven actions to "curiosity" driven actions
0
0
2
623
sajal retweeted
Digvijay Rathore
Digvijay Rathore@rathorehq·
The entire assumption that vibe coded applications are inferior comes from assumption that people vibe code because they are bad engineers. The real strength of vibe coding unlocks when you already are a master of entire scope of work being done by AI. Then you do not let bad decisions compound, correct them early, test thoroughly and get to same 1000 concurrent connections significantly quickly.
Saeed Anwar@saen_dev

@fidexcode vibe coders aren't cooked, they just need to learn what happens after the demo works. the gap between "it runs" and "it handles 1000 concurrent users without falling over" is where actual engineering lives.

1
2
12
1.1K
Samarth
Samarth@awesamarth_·
the fuck was i up to in 2022 😭😭🙏🙏
Samarth tweet media
7
0
27
963
sajal
sajal@sajalshlan·
there's so much to do. yes but you don't need to do everything, do you?
0
0
0
73
Ishaan
Ishaan@auto_grad_·
career update: i have joined @smallest_AI as a researcher engineer to work on improving small models and scale them. hoping to contribute a lot to the team and product!
Ishaan tweet media
58
5
568
33.7K
sajal
sajal@sajalshlan·
@neervj oh man i too stumbled upon this earlier 😭😭😭
0
0
1
23
Neeraj
Neeraj@neervj·
Neeraj tweet media (2 images)
1
0
4
248
sajal
sajal@sajalshlan·
@arcinston never using no code tools for anything
0
0
0
25
arush
arush@arcinston·
never using no code tools for creating landing pages
2
0
10
254
sajal
sajal@sajalshlan·
"your credit balance is too low"
sajal tweet media
0
0
2
110
sajal
sajal@sajalshlan·
setting up openclaw made me feel so tokenpoor
0
0
0
180
himanshu
himanshu@himanshustwts·
Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!
himanshu tweet media
209
31
1.9K
63.8K
Neeraj
Neeraj@neervj·
Composing music myself for a new film. Hope it sounds good to others’ ears as well, lol. Hit me up if you wanna take a look when it’s ready.
Neeraj tweet media
3
0
10
670
sajal
sajal@sajalshlan·
difference between max_length and context_length:

1. max_length is a parameter used in LLM API calls. It's an upper cap on your decode (output generation) only, not the prompt. Output stops either at the EOS token or when max_length is hit, whichever comes first. If max_length is 1000 and your output was only 200 tokens, the remaining 800 weren't wasted: no compute wastage, no memory wastage (each new token generated is incrementally added to the KV cache).

Why have max_length at all? Two reasons:
- Cap the output length explicitly, prevent runaway generation
- Indirectly affects concurrency by controlling how long requests hold KV cache pages: longer max_length means requests stay alive in the batch longer, occupying pages longer, leaving less room for new requests

2. context_length is a parameter used when serving LLMs via an inference engine like vLLM or SGLang. It's a ceiling on the total token limit per request: system prompt + input + output combined. The scheduler uses this ceiling when making admission decisions. Actual memory usage is always just what's been generated so far; the ceiling only affects scheduling decisions, not physical page allocation. It directly impacts concurrency and throughput.

Example: global KV cache pool holds 24K tokens worth of pages, context_length is set to 8K, and 3 requests are currently running at 3K tokens each. Each request is physically holding 3K worth of KV cache pages right now, but each could still grow up to 8K before finishing. The scheduler has to account for this worst-case growth, so with 3 requests potentially consuming up to 24K pages total, it can't safely admit new requests until one completes (hits EOS or max_length) and frees its pages back to the pool. This directly throttles throughput.
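
the arithmetic in that example, as a rough sketch. This is a toy model of the worst-case admission rule described above, not the actual scheduler of vLLM or SGLang; `Request`, `physical_usage`, `worst_case_usage` and `can_admit` are made-up names for illustration, and "24K / 8K / 3K" are taken as 24,000 / 8,000 / 3,000 tokens.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record: prompt + generated tokens currently held in the KV cache.
    tokens_so_far: int


def physical_usage(running):
    """Tokens' worth of KV cache pages actually occupied right now."""
    return sum(r.tokens_so_far for r in running)


def worst_case_usage(running, context_length):
    """Upper bound if every running request grows to the context_length ceiling."""
    return len(running) * context_length


def can_admit(pool_tokens, context_length, running):
    """Toy admission rule (an assumption): admit a new request only if it and
    every running request could all reach context_length without overflowing the pool."""
    return worst_case_usage(running, context_length) + context_length <= pool_tokens


POOL_TOKENS = 24_000     # global KV cache pool, in tokens
CONTEXT_LENGTH = 8_000   # per-request ceiling: system prompt + input + output

running = [Request(tokens_so_far=3_000) for _ in range(3)]

used_now = physical_usage(running)                      # 9000 tokens physically held
reserved = worst_case_usage(running, CONTEXT_LENGTH)    # 24000 tokens in the worst case
admit_before = can_admit(POOL_TOKENS, CONTEXT_LENGTH, running)

# One request finishes (EOS or max_length) and frees its pages.
running.pop()
admit_after = can_admit(POOL_TOKENS, CONTEXT_LENGTH, running)

print(used_now, reserved, admit_before, admit_after)
```

only 9,000 of 24,000 tokens are physically in use, but the worst case already fills the pool, so nothing new gets in until a request finishes.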
0
0
1
118
sajal
sajal@sajalshlan·
can't even trust claude on docs level stuff now???? gave me completely wrong info on how mem_fraction_static caps not only model_weights but also kv_cache_pool
sajal tweet media
0
0
3
150
sajal
sajal@sajalshlan·
finishing my irl conversations with "...thats the only thing that matters, everything else is just noise"
sajal tweet media
0
0
2
87