sajal

18.8K posts

@sajalshlan

ai. computers. notes to self.

Joined September 2021

249 Following · 11.4K Followers

Pinned Tweet

sajal@sajalshlan·5 Feb

my biggest kpi for 2026 is moving from "fear" driven actions to "curiosity" driven actions

sajal retweeted

Digvijay Rathore@rathorehq·23 Mar

The entire belief that vibe-coded applications are inferior comes from the assumption that people vibe code because they are bad engineers. The real strength of vibe coding unlocks when you are already a master of the entire scope of work being done by the AI. Then you don't let bad decisions compound: you correct them early, test thoroughly, and get to the same 1000 concurrent connections significantly faster.

Saeed Anwar@saen_dev

@fidexcode vibe coders aren't cooked, they just need to learn what happens after the demo works. the gap between "it runs" and "it handles 1000 concurrent users without falling over" is where actual engineering lives.

sajal@sajalshlan·20 Mar

@amritwt on "cursorbench"

amrit@amritwt·19 Mar

CURSOR MADE A MODEL BETTER AND CHEAPER THAN OPUS

Cursor@cursor_ai

Composer 2 is now available in Cursor.

sajal@sajalshlan·20 Mar

@awesamarth_ 😭😭

Samarth@awesamarth_·19 Mar

true story

satvik@sxtvik

@awesamarth_ No, remove written by Claude Code Now commit

sajal@sajalshlan·20 Mar

@awesamarth_

GIF

Samarth@awesamarth_·17 Mar

@sajalshlan me asfff how are you doing bhai g?

Samarth@awesamarth_·17 Mar

the fuck was i up to in 2022 😭😭🙏🙏

sajal@sajalshlan·20 Mar

????

Politics Global@PolitlcsGlobal

🚨🇫🇷 NEW: The location of the French aircraft carrier, FS Charles de Gaulle, has been given away by a sailor using Strava whilst jogging on the ship deck [@lemondefr]

sajal@sajalshlan·18 Mar

there's so much to do. yes but you don't need to do everything, do you?

sajal@sajalshlan·18 Mar

@auto_grad_ @smallest_AI congrats ishaan!

Ishaan@auto_grad_·18 Mar

career update: i have joined @smallest_AI as a research engineer to work on improving small models and scaling them. hoping to contribute a lot to the team and product!

sajal@sajalshlan·17 Mar

@neervj oh man i too stumbled upon this earlier 😭😭😭

Neeraj@neervj·16 Mar

sajal@sajalshlan·17 Mar

@arcinston never using no code tools for anything

arush@arcinston·17 Mar

never using no code tools for creating landing pages

sajal@sajalshlan·17 Mar

"your credit balance is too low"

sajal@sajalshlan·16 Mar

setting up openclaw made me feel so tokenpoor

sajal@sajalshlan·16 Mar

@himanshustwts @smallest_AI congrats himanshu!!

himanshu@himanshustwts·16 Mar

Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!

sajal@sajalshlan·13 Mar

@neervj im all ears

Neeraj@neervj·13 Mar

Composing music myself for a new film. Hope it sounds good to others’ ears as well, lol. Hit me up if you wanna take a look when it’s ready.

sajal@sajalshlan·11 Mar

x.com/i/article/2031…

sajal@sajalshlan·9 Mar

difference between max_length and context_length:

1. max_length is a parameter used in LLM API calls (often named max_tokens, e.g. in the OpenAI API and vLLM's SamplingParams). It's an upper cap on decode (output generation) only, not the prompt. Output stops at the EOS token or when max_length is hit, whichever comes first. If max_length is 1000 and your output was only 200 tokens, the remaining 800 weren't wasted: no compute wastage, no memory wastage (each newly generated token is added incrementally to the KV cache).

Why have max_length at all? Two reasons:
- Cap the output length explicitly and prevent runaway generation.
- Indirectly affect concurrency by controlling how long requests hold KV cache pages: a longer max_length means requests stay alive in the batch longer, occupying pages longer and leaving less room for new requests.

2. context_length is a parameter used when serving LLMs via an inference engine like vLLM or SGLang (--context-length in SGLang, --max-model-len in vLLM). It's a ceiling on the total tokens per request: system prompt + input + output combined. The scheduler uses this ceiling when making admission decisions. Actual memory usage is always just what the request has filled so far (prompt plus tokens generated); the ceiling only affects scheduling decisions, not physical page allocation. It directly impacts concurrency and throughput.

Example: the global KV cache pool holds 24K tokens' worth of pages, context_length is set to 8K, and 3 requests are currently running at 3K tokens each. Each request is physically holding 3K tokens' worth of KV cache pages right now, but each could still grow up to 8K before finishing. The scheduler has to account for this worst-case growth: 3 requests × 8K = 24K, the entire pool, so it can't safely admit new requests until one completes (hits EOS or max_length) and frees its pages back to the pool. This directly throttles throughput.
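
A minimal Python sketch of the worst-case admission check described above. The Request and Scheduler classes are hypothetical illustrations, not vLLM or SGLang internals; real engines use more elaborate policies (vLLM, for instance, can preempt running requests when the cache fills). As an extra assumption, each request's worst case is taken as min(context_length, prompt + max_length), which shows how max_length indirectly affects concurrency.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    prompt_len: int     # system prompt + input tokens
    max_length: int     # cap on decode (output) tokens
    generated: int = 0  # tokens decoded so far

    def current_tokens(self) -> int:
        # KV cache actually held right now: prompt + output so far.
        return self.prompt_len + self.generated

    def worst_case_tokens(self, context_length: int) -> int:
        # Assumption: the request can grow until it hits context_length
        # or prompt_len + max_length, whichever is smaller.
        return min(context_length, self.prompt_len + self.max_length)


@dataclass
class Scheduler:
    pool_tokens: int     # capacity of the global KV cache pool, in tokens
    context_length: int  # per-request ceiling on prompt + output

    def used(self, running: list[Request]) -> int:
        return sum(r.current_tokens() for r in running)

    def reserved(self, running: list[Request]) -> int:
        return sum(r.worst_case_tokens(self.context_length) for r in running)

    def can_admit(self, running: list[Request], new: Request) -> bool:
        # Admit only if every running request AND the new one could grow
        # to its worst case without overflowing the pool.
        need = self.reserved(running) + new.worst_case_tokens(self.context_length)
        return need <= self.pool_tokens


sched = Scheduler(pool_tokens=24_000, context_length=8_000)

# The example from the post: 3 requests at 3K tokens each, with max_length
# loose enough that context_length is the binding ceiling.
running = [Request(prompt_len=1_000, max_length=16_000, generated=2_000) for _ in range(3)]
print(sched.used(running))      # 9000
print(sched.reserved(running))  # 24000
print(sched.can_admit(running, Request(prompt_len=500, max_length=16_000)))  # False

# Same pool, but a tight max_length shrinks each worst case and frees headroom.
tight = [Request(prompt_len=2_500, max_length=1_000, generated=500) for _ in range(3)]
print(sched.reserved(tight))    # 10500
print(sched.can_admit(tight, Request(prompt_len=500, max_length=1_000)))  # True
```

Reserving the worst case guarantees the pool never overflows, but in the first scenario 15K tokens' worth of pages sit physically free and unusable for new requests; that idle headroom is the throughput cost the post describes.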

sajal@sajalshlan·7 Mar

x.com/i/article/2030…

sajal@sajalshlan·9 Mar

can't even trust claude on docs-level stuff now???? it gave me completely wrong info on mem_fraction_static, which caps not only the model weights but also the kv_cache_pool
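
A back-of-the-envelope Python sketch of why this matters: if mem_fraction_static budgets the weights and the KV cache pool together, the pool gets whatever the weights leave over. This is a simplified assumption that ignores activations, CUDA graph buffers and other overhead the engine reserves; the model shape (32 layers, 8 KV heads, head_dim 128, fp16), the 16 GB weight size and the 80 GB GPU are all hypothetical.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int) -> int:
    # Factor of 2: one key and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_pool_tokens(total_gpu_bytes: int, mem_fraction_static: float,
                   weight_bytes: int, per_token_bytes: int) -> int:
    # Simplified model: the static budget covers weights + KV pool,
    # so the KV pool is the static budget minus the weights.
    static_budget = int(total_gpu_bytes * mem_fraction_static)
    kv_budget = static_budget - weight_bytes
    if kv_budget <= 0:
        raise ValueError("weights alone exceed the static memory budget")
    return kv_budget // per_token_bytes


per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2)
print(per_token)  # 131072 bytes (128 KiB) per token

tokens = kv_pool_tokens(total_gpu_bytes=80 * 10**9, mem_fraction_static=0.9,
                        weight_bytes=16 * 10**9, per_token_bytes=per_token)
print(tokens)  # 427246 tokens fit in the KV pool
```

Under this model, lowering mem_fraction_static shrinks only the KV pool (the weights still need their full 16 GB), which in turn lowers how many tokens, and therefore requests, the scheduler can keep in flight.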

sajal@sajalshlan·9 Mar

finishing my irl conversations with "...thats the only thing that matters, everything else is just noise"
