Raj Saha

578 posts

@cloudwithraj

Building Stealth Startup | Former Principal SA @AWSCloud | YouTuber

NYC · Joined October 2021
167 Following · 770 Followers
Raj Saha@cloudwithraj·
“Close your eyes while answering.” That’s what an interviewer told a candidate to prevent Gen AI cheating. Sounds extreme? Maybe. But it signals something much bigger. Interview patterns are already changing:

❌ Fewer “What is X?” questions. With AI agents, information is cheap; knowing definitions is no longer a differentiator.

✅ More system design deep dives. Why you made those choices, the tradeoffs, and how it scales in the real world. More importantly, can you communicate it clearly?

✅ How you use Gen AI in real projects. Can you tell when it’s wrong? Can you show how you augment your thinking with Gen AI? Do you understand Gen AI under real-world constraints such as running on the cloud, security, and cost optimization?

✅ Leadership over knowledge recall. Driving teams, influencing decisions, and resolving conflicts. Behavioral and executive communication matter more than ever.

That's what I've been building toward. Tomorrow, I'm announcing something I've been working on for a long time.

👉 If you're serious about switching your career to cloud and Gen AI, join the waitlist at sabootcamp.com. Cohort 8 launches Saturday with a live webinar! Here’s what you get when you show up LIVE:

▶ My exclusive Solutions Architect framework to prep you for today's job market!
▶ Full bootcamp details, the Cloud and Gen AI curriculum, my new stealth product reveal to help you prepare better, AND a special offer for live participants only.
▶ The chance to interact with me and ask questions.

And good news: it already worked for last cohort's students, who secured cloud jobs at top companies including AWS, Microsoft, Google, Oracle, JPMorgan, Reddit, and more 💰. See you there!
Raj Saha@cloudwithraj·
Did you know that even Anthropic is operating at a loss? Running Gen AI is EXPENSIVE! But companies are not going to stop utilizing Gen AI. So what gives? My prediction is that more unique Gen AI cost optimizations will come into play:

- Switching from LLMs (Large Language Models) to SLMs (Small Language Models). An LLM is overkill for most tasks. SLMs specializing in certain tasks are emerging, and they are so small that they can run on the edge.
- More use of semantic caching. A semantic cache matches on intent rather than requiring an exact match on the actual words. When the intent matches, the cached answer is returned without hitting the LLM.
- Memory prices are crazy. Even for short-term memory, we are seeing a shift to low-cost object storage. I believe we will sacrifice some speed for cost optimization.
- Training will stay on GPUs, but inference will shift more to general compute. We are already seeing the rise of ONNX models.
- Last but not least, I hope electricity prices come down. Either supply needs to increase (nuclear? more solar?), or consumption needs to go down.

Question to readers - what’s your theory about sustainable Gen AI consumption?

—

If you are a mid-career IT professional looking to switch to Cloud and Gen AI, attend the free launch workshop of Cohort 8 of SA Bootcamp this Saturday: sabootcamp.com
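The semantic caching idea above can be sketched in a few lines. This is a toy illustration, not a production cache: the `embed` function here is a stand-in bag-of-words counter (a real system would use a sentence-embedding model), and the `SemanticCache` class, `ask` helper, and 0.8 threshold are hypothetical names and values chosen for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": bag-of-words token counts. A real semantic
    # cache would use a sentence-embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, cached_answer)

    def get(self, query: str):
        qv = embed(query)
        best_answer, best_sim = None, 0.0
        for vec, answer in self.entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best_answer, best_sim = answer, sim
        # Return the cached answer only if intent is "close enough".
        return best_answer if best_sim >= self.threshold else None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

def ask(cache: SemanticCache, query: str, call_llm) -> str:
    cached = cache.get(query)
    if cached is not None:
        return cached          # cache hit: no LLM call, no token cost
    answer = call_llm(query)   # cache miss: pay for one LLM call
    cache.put(query, answer)
    return answer
```

The point of the sketch: once a similar question has been answered, repeat questions never reach the model, which is where the cost savings come from.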
Raj Saha@cloudwithraj·
Ran into the legend Kunal Kushwaha at an event in the Bay Area. He is wise beyond his years. He generously shared his time and tips on arranging hackathons and building community. Until next time 🙌
Raj Saha@cloudwithraj·
One limiting belief I used to have: everyone working in cutting-edge tech must be smarter than me.

When I was writing code in COBOL, I thought the people working in the cloud must be way smarter, and I surely didn't deserve to be there. That kept me stuck for a decade. After switching to the Cloud team at Verizon, I thought the people in Big Tech must be doing rocket science - who am I to even try?

Then I got into AWS. Even became an L7 Principal. And I realized two things that changed how I approach harder things:

- A passionate fool can beat a distracted genius. I just kept chopping the tree at the same spot until it went down, beating others with sharper axes who were all over the place.
- There are people across the entire smartness band in every technology and every company. Sometimes it's about being in the right place at the right time.

I tell my students the same - don't lose the fight before it even starts. Unless you're literally designing rockets or doing PhD-level quant work, you always have a fighting chance.

I used to think the same about starting a company and having employees. And I have to practice what I preach. I'm releasing my startup this Saturday. It might succeed, it might fail. One thing is certain - I'll keep chopping.

#startup #cloud #keepchopping
Kunal Kushwaha@kunalstwt·
Fantastic evening connecting with the @OracleDevs team and fellow community members before the main event in San Francisco! 🇺🇸 Looking forward to tomorrow’s technical workshops at Oracle HQ. I’ll be diving deep into building agentic applications and speaking with the executive team.
Raj Saha@cloudwithraj·
Super pumped to combine three of my favorite things again - Claude, Kubernetes, and Carlos! Looking forward to presenting with Carlos Santana at KCD NY on how to run Claude Code in Kubernetes securely and automate it, so platform teams can adopt it. Last time we presented together at KubeCon Paris, we built a working demo, and this will be no different. We are heads down building a demo we want to show you in NYC. See you all June 10th in NYC. Talk and ticket link in the comments 👇
Raj Saha@cloudwithraj·
@petergyang seriously, my kid's middle school sends more email than Big Tech
Peter Yang@petergyang·
You know what would be a good AI automation: When I receive those 10-page weekly newsletters from my kid's school, I want AI to tell me if there's early dismissal or anything I should pay attention to.
Peter Yang@petergyang·
Met some legends and friends today at Code with Claude. Of all the AIs, Claude still feels the most like a trusted friend and I’m glad that now @AnthropicAI has the compute to scale.
Raj Saha@cloudwithraj·
We ran the first-ever SA Bootcamp Hackathon. Students had one week to build something real.

The winning team? Rishu Gandhi, Shravanth Venkatesh, and Taranmeet K. They built a flashcard app for people with dyslexia. It's not a tutorial project or a GitHub clone - it's something that could actually help someone. They implemented real-world concerns including login, security, and cost optimization of Cloud and Gen AI systems!

What made it even better - the judges were SA Bootcamp alumni who've since landed roles at Google, AWS, and Amazon. People who were in these students' exact shoes not long ago.

They even shipped a live version; link to try it in the comments. I also recorded a video - any specific area y'all wanna see: demo, design, learnings? The full video is quite long, hence the question.

------

Cohort 8 of SA Bootcamp opens May 16. If you want to be in a community that builds real stuff, not just learns, attend the free launch workshop. 👇 Links in comments.
Raj Saha@cloudwithraj·
DynamoDB is a NoSQL database, so it doesn't support indexes - WRONG! Let's dispel some myths about Dynamo:

Dynamo supports both a primary key and secondary indexes! The primary key can have up to two fields, called the partition key and the sort key. Just like in relational SQL databases, each record's primary key must be unique.

In addition, Dynamo can also have secondary indexes - Local and Global Secondary Indexes. If the secondary index's partition key is the same as the primary key's partition key, it's local; otherwise it's global. The Global Secondary Index (GSI) is more popular because it lets users query on non-primary-key fields. AWS recently expanded GSIs to have up to 8 fields!

It is strongly recommended to utilize indexes and keys and query the table instead of scanning it. However, as always, there are tradeoffs! GSIs consume extra read/write capacity, which costs you money. And unlike SQL, NoSQL Dynamo can't have foreign keys.

-----

If you want more actionable tips with examples, and cloud interview guides from a real-world architect and mentor, subscribe at lnkd.in/eG7XdHmN (FREE)

#aws #systemdesign #nosql
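To make the GSI point concrete, here is a sketch of the low-level DynamoDB `Query` request you would send to query by a non-key field through a GSI. The `Orders` table, `status-index` GSI, and attribute names are all hypothetical; the request shape follows the DynamoDB Query API, and with boto3 you would send it via `boto3.client("dynamodb").query(**request)`.

```python
# Hypothetical "Orders" table: primary key = (customer_id, order_id).
# Hypothetical "status-index" GSI: partition key = status, letting us
# query by order status even though it isn't in the table's primary key.

def build_gsi_query(status: str, limit: int = 25) -> dict:
    """Build a low-level DynamoDB Query API request against a GSI."""
    return {
        "TableName": "Orders",           # hypothetical table name
        "IndexName": "status-index",     # hypothetical GSI name
        # The key condition targets the GSI's partition key, not the
        # table's. "status" is aliased because it is a reserved word.
        "KeyConditionExpression": "#s = :status",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":status": {"S": status}},
        "Limit": limit,
    }

request = build_gsi_query("SHIPPED")
```

A `Query` like this reads only the matching items; the `Scan` the post warns against would read every item in the table and filter afterward.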
unusual_whales@unusual_whales·
BREAKING: Anthropic has released a feature called "dreaming" which allows AI agents to self-improve
Raj Saha@cloudwithraj·
@asha_shar Wow, such welcome changes. What a legend
Asha@asha_shar·
Xbox needs to move faster, deepen our connection with the community, and address friction for both players and developers. Today, we promoted leaders who helped build Xbox, while also bringing in new voices to help push us forward. This balance is important as we get the business back on track. As part of this shift, you’ll see us begin to retire features that don’t align with where we’re headed. We will begin winding down Copilot on mobile and will stop development of Copilot on console.
Raj Saha@cloudwithraj·
RAG is simple, but that's not how real-world projects use it. Let's dive in:

👎 Bad: Uploading whole documents

Most beginners think RAG works like this - upload a 100-page PDF, ask questions, get answers. This is the worst approach possible. Why? When you query “What are the security requirements?”, the system retrieves massive chunks containing everything EXCEPT what you need. The LLM gets overwhelmed with irrelevant context, burns through tokens, and gives you mediocre answers.

👍 Good: Strategic document chunking

Smart developers chunk documents into smaller pieces. But here’s where it gets interesting - there’s no one-size-fits-all chunking strategy. Different strategies exist for different content types:

- Fixed-size chunking - split every x tokens (e.g. 512). Simple, but breaks context mid-sentence.
- Semantic chunking - split at paragraph or section boundaries. Preserves meaning but creates uneven sizes.
- Recursive chunking - try large chunks first, split recursively if needed. Best for technical docs.
- Context-aware chunking - keep code blocks intact, preserve table structures, keep bullet point groups together.

Here’s the trade-off - smaller chunks give precise retrieval but lose context. Larger chunks maintain context but dilute relevance. You need to experiment with your specific content.

The next one also shocked me!

👎 Bad: Using a random model for embedding

We were using random embedding models and saw the accuracy of RAG changing. Why?

👍 Good: Embedding model alignment (the secret sauce)

... I am at the LinkedIn word count. To learn about different types of embeddings, and other advanced techniques, read the full article here: lnkd.in/eRxBpHYC

#RAG #GenAI #SystemDesign
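The first two chunking strategies in the list can be sketched in a few lines. The helper names `chunk_fixed` and `chunk_semantic` are hypothetical, and the 512/64 defaults are illustrative starting points, not recommendations.

```python
def chunk_fixed(tokens: list, size: int = 512, overlap: int = 64) -> list:
    """Fixed-size chunking with overlap, so a sentence cut at a chunk
    boundary still appears intact at the start of the next chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

def chunk_semantic(text: str) -> list:
    """Semantic chunking in its simplest form: split at blank lines,
    i.e. paragraph boundaries. Chunks are uneven but meaningful."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

Note the trade-off the post describes shows up directly in the parameters: shrinking `size` sharpens retrieval but each chunk carries less context, and the `overlap` is what papers over the mid-sentence breaks.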
Kun Chen@kunchenguid·
what does it feel like to quit your job and go solo? it's been just a month since I quit my big tech career as an L8 engineer i wrote down my experience of this past month in the most transparent way possible. hope it's helpful to fellow builders! blog.kunchenguid.com/p/the-month-af…
Raj Saha@cloudwithraj·
Claude models felt nerfed last week. The LLM was fine, but the agent harness broke, per Anthropic's post mortem. This is why the agent harness matters more than most people realize.

What is an agent harness, and why does it matter? Each agent works in a Re (Reasoning) and Act (Action) cycle. It keeps executing this Re-Act cycle until the objective is accomplished - like a loop. That’s why it’s called an agent loop. The agent harness is the surroundings and rules that make the loop reliable:

- How you wire tools (MCP, API, inside code)
- Where you write artifacts (files, database)
- How you manage memory (short-term, long-term, semantic, episodic, etc.)
- How you prevent the agent drowning in context (context engineering)
- How you log/trace behavior

The harness is the difference between a cool demo and something you can ship. If you are going for interviews, or building production agents, you must understand the agent harness.

👉 Explore the deep dives on these topics (FREE): fandf.co/3QFUc1J

Next week I'm heading to San Francisco to build real-world agents with the Oracle Developers team. A decade ago, I was in the Bay Area presenting at conferences about deploying cloud systems to production (pics attached!). Feels full circle that now I am going there to build production agents on the cloud!

#GenAI #AIAgent
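The Re-Act loop, plus two of the harness rules above (a bounded step budget, and feeding observations back into short-term memory), can be sketched like this. The `llm` callable and its decision format are hypothetical stand-ins for a real model API; a real harness would add tracing, persistent memory, and context management.

```python
def agent_loop(goal: str, llm, tools: dict, max_steps: int = 10):
    """Minimal Re-Act loop: Reason (ask the llm), then Act (run a tool),
    until the llm declares a final answer or the step budget runs out."""
    history = [f"Goal: {goal}"]           # short-term memory
    for _ in range(max_steps):            # harness rule: bounded loop
        decision = llm(history)           # Reason
        if decision["type"] == "final":
            return decision["answer"]
        tool = tools[decision["tool"]]    # Act: dispatch to a wired tool
        observation = tool(decision["input"])
        history.append(f"Observed: {observation}")  # feed context back
    raise RuntimeError("agent exceeded step budget")  # harness guardrail
```

Everything outside the two lines marked Reason and Act is harness, which is the post's point: the loop itself is trivial, and the reliability lives in the rules wrapped around it.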
Raj Saha@cloudwithraj·
They said AI will take over painful tasks so we can relax. It’s quite the opposite. Every person I talk to is being asked to do more, learn more. AI increased the pace of “doing”, but instead of the time saved going to something fun, more tasks are assigned. Now we have to learn:

- A new model every month
- Agents. Sub-agents. Agent teams. Multi-agent orchestration
- RAG. Hybrid search. Re-ranking. Graph RAG. Agentic RAG
- Prompt engineering. Then context engineering. Evals
- MCP servers. Tool use. Skills. Subagents. Hooks. Memory
- Guardrails. Red-teaming. Jailbreak defense
- Fine-tuning. LoRA. QLoRA. RLHF. DPO. RLAIF. Distillation
- Vector databases. Chunking strategy. Embedding models
- Inference optimization. Batching. Speculative decoding
- Model routing. Prompt caching. Session management. Multi-tenant isolation
- Data residency for AI. Compliance for AI. Audit logs for AI

Your PM discovered Cursor last week and now has opinions about your architecture. And still: system design, on-call, 2x2 docs, stakeholder management, 1:1s, the promo packet, the backlog, the incident from last Thursday.

AI didn't take the work off your plate. It added a second plate and told you to balance both.

Question to readers - are you feeling overwhelmed or relaxed by AI?

---

Get byte-sized tips on career switch, cloud, AI, system design, behavioral, and interviews in a weekly newsletter (FREE): lnkd.in/eG7XdHmN
Raj Saha@cloudwithraj·
Every cloud interview has this question. As a former Principal Solutions Architect at AWS and Distinguished Cloud Architect at Verizon, I conducted 300+ interviews.

"How will you make your application scalable for a big traffic day?"

And almost every candidate gives the same answer: "I'll use an Auto Scaling Group with EC2s and a load balancer to distribute traffic."

Technically correct. Completely average. Three years ago that answer was fine. Today it will not get you hired.

Start with the foundation - then go deeper. Yes, you use an Auto Scaling Group and a load balancer. That is table stakes. But a big traffic day like Thanksgiving or Diwali means a massive burst of traffic in a very short window. The standard autoscaling response is too slow for that. So here is what you add:

- Pre-warm the load balancer. Elastic Load Balancer scales automatically, but if the traffic spike is sudden and enormous, there can be a brief lag.
- Use Scheduled Scaling. If you know traffic is hitting at 8pm on Black Friday, why wait for a CPU threshold to trigger scaling? Schedule your EC2s to provision ahead of time.
- Pair that with a Warm Pool - a set of pre-initialized EC2 instances sitting ready in your Auto Scaling Group. When traffic arrives, those instances spin up faster because the initialization work is already done.
- Keep your AMI lightweight. Every unnecessary package on your AMI adds provisioning time. Trim it down so new EC2s come online as fast as possible.
- Add RDS Proxy for database connections. This one trips up a lot of candidates because they forget the database layer entirely.

There are many more techniques. Read the full article (FREE): lnkd.in/eniZEtee

#systemdesign #interview #aws
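The Scheduled Scaling and Warm Pool steps above can be sketched as the parameters for the boto3 Auto Scaling calls `put_scheduled_update_group_action` and `put_warm_pool`. The group name, capacities, and times are hypothetical; the dicts are shown plain so the pre-warming logic is visible, and each would be passed as keyword args, e.g. `boto3.client("autoscaling").put_scheduled_update_group_action(**sched)`.

```python
from datetime import datetime, timezone

ASG = "web-asg"  # hypothetical Auto Scaling Group name

def black_friday_schedule(start: datetime) -> dict:
    """Scale out at a known time instead of waiting on a CPU alarm."""
    return {
        "AutoScalingGroupName": ASG,
        "ScheduledActionName": "black-friday-prewarm",
        "StartTime": start,        # e.g. 7:30pm UTC, before the 8pm rush
        "MinSize": 20,
        "MaxSize": 100,
        "DesiredCapacity": 50,     # capacity is already up when traffic hits
    }

def warm_pool() -> dict:
    """Keep pre-initialized instances ready so scale-out is fast."""
    return {
        "AutoScalingGroupName": ASG,
        "MinSize": 10,             # always keep 10 pre-initialized instances
        "PoolState": "Stopped",    # stopped instances avoid compute charges
    }

sched = black_friday_schedule(datetime(2025, 11, 28, 19, 30, tzinfo=timezone.utc))
pool = warm_pool()
```

The two work together: the schedule raises desired capacity before the spike, and the warm pool makes each added instance boot from pre-initialized state instead of running full setup.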
Raj Saha@cloudwithraj·
Walking away from AWS after almost seven years was the hardest thing I’ve ever done.

I still remember the day I went to my interview in New York City. I was standing near the Empire State Building, feeling nervous but excited. I had worked so hard for this chance, and now it was finally here.

My time at Amazon has been a collection of incredible moments… Intense pride seeing the projects I architected go live and have global impact. The feeling of attending a meeting at the Amazon Spheres for the first time. Speaking publicly at major conferences like re:Invent and the DC Summit about things I am passionate about.

I switched my career from Mainframe, started at Amazon at the age of 38, and left at 45. I always had an idea of a product I wanted to build, but I kept telling myself “later”. Around your mid-40s, you see your grandparents pass away and your parents start showing their age. The “later” became “if not now, then when?”

For the last 8 months after leaving Amazon, I have been building a product to help others switch their careers. I am going to reveal it with the 8th cohort launch of sabootcamp.com on May 16th (register to get the launch invite). Honestly, I am nervous, but whatever the outcome, at least now I can tell myself I took my shot!

This is the chapter I've been working toward since that nervous kid stood outside those glass doors in New York City. Onwards and upwards 🙌🚀

#startup #Amazon #takingmyshot
Raj Saha@cloudwithraj·
Are you struggling to come up with creative words in replies and conversations? You are not hallucinating; this is real. A recent study shows that the more you use AI to write and summarize emails/docs, the more your vocabulary shrinks.

I am seeing it myself, first-hand. Before, I’d reply to each LinkedIn response myself. That made me think, and when I came up with something witty, I’d chuckle and feel self-satisfaction. In the AI world, I noticed that only shallow, short responses come to mind. Kind of scary.

This is what I am doing:

- I turned off my AI pipeline that summarizes emails, posts, and the latest blogs. I started reading full articles again.
- In my startup, I have restricted minimal "lazy" prompting. We have to think through what the design and code structure should be, and then put that into the prompt. Not just "Code me a microservice reading from table X".
- It takes a bit more time, but may I say, my developer and I feel a little more joy coding and completing tasks this way.

Serious question to readers - have you been experiencing this? If yes, how are you combating it?

----

Get byte-sized tips on career switch, cloud, AI, system design, behavioral, and interviews in a weekly newsletter (FREE): lnkd.in/eG7XdHmN