Devan💧
@639639_DEVAN

518 posts

Working at @mysten_labs, Regional Solutions Engineer. https://t.co/npVgXrp4EQ Opinions are my own.

Joined March 2025
248 Following · 26 Followers
Devan💧 reposted
Berryxia.AI @berryxia
Agent memory has gotten insanely competitive, and honestly, the more people pile into this race, the better. Tencent's AI team spent a full 6 months grinding on a single problem: AI agents losing context like crazy in long sessions. They finished building a memory system and open-sourced the whole thing. After reading their write-up, my biggest takeaway is that 99% of people are still competing on context length, while what actually puts agents back on track are these three hardcore moves.

Move 1: compress stale context in real time. That alone cut token consumption by 61%. Agents used to blow the context window constantly; now they slim down mid-session and stay clear-headed.

Move 2: give the agent a structured task map, generated directly in mermaid syntax. In complex 30+ step workflows, the chance of losing track drops sharply. The agent stops flying around blind; it knows which step it is on and where to go next.

Move 3: build dedicated persona memory for the agent. Persona consistency jumped from 48% straight to 76%. It no longer flips between professional and casual; the response style stays locked to the character setting.

None of this is theory; it came out of six months of real trial and error. The repo is already up, so anyone building agents should go try it. We used to think agent memory was hard because we oversimplified the problem. The real difficulty is not storing more information; it is getting the agent to recall the right thing, in the right way, at the right time. Are you still throwing tokens at the agent memory problem?

Project: github.com/Tencent/Tencen…
[image]
Tencent AI @TencentAI_News

We spent 6 months on one problem: agents losing context in long sessions. Ended up building and open-sourcing an agent memory system. A few things we learned:
🪄 compressing stale context mid-session cut token usage by 61%
🪄 giving agents a structured task map (mermaid-based) made them way less likely to lose track in 30+ step workflows
🪄 persona coherence jumped from 48% to 76% once we added dedicated persona memory
repo 👉 github.com/Tencent/Tencen…
Agent memory is genuinely hard and we don't have all the answers. Happy to dig into architecture, benchmarks, tradeoffs, whatever. AMA👇 @TencentDBAbxo2 team is here to talk about it.
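The first of those three techniques is easiest to picture with a sketch. Below is a hypothetical mid-session compression pass in Python; count_tokens, summarize, and the budget are invented stand-ins for illustration, not the API of the Tencent repo:

```python
# Hypothetical sketch of mid-session context compression, NOT the actual
# Tencent implementation: once the transcript exceeds a token budget, older
# turns are folded into a single summary while recent turns stay verbatim.

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken).
    return len(text.split())

def summarize(turns: list[str]) -> str:
    # Placeholder: in practice this would be an LLM call that condenses
    # stale turns into a few sentences of durable facts and decisions.
    return "[summary of %d earlier turns]" % len(turns)

def compress_context(history: list[str], budget: int = 4000,
                     keep_recent: int = 6) -> list[str]:
    """Keep recent turns verbatim; fold everything older into one summary."""
    if sum(count_tokens(t) for t in history) <= budget:
        return history
    stale, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(stale)] + recent
```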

Devan💧 reposted
Avid @Av1dlive
In 15 minutes, 2 Senior Staff Engineers at Airbnb are giving a live lecture on agentic coding. Airbnb has already shipped one of the most ambitious LLM-agent migrations in production. Tonight, two of their senior engineers show how they actually build with agents in 2026. Most builders are guessing. These guys ship. Bookmark and watch this, then read the complete article below.
Avid @Av1dlive: x.com/i/article/2053…

Devan💧 @639639_DEVAN
@devkoriel Thanks for the explanation. 🙏 Am I right to understand that the one with the dinged panel in the photo is the 220,000 KRW one?
Jinsoo Heo @devkoriel
@639639_DEVAN Right, haha. Other than what's visible in the photo it's in perfect shape, and of course it has no impact on functionality.
Jinsoo Heo @devkoriel
If I sold Starlink Standard 4 units at a bargain, one for 150,000 KRW and the other for 220,000 KRW, I wonder if anyone would buy them...
[2 images]
Devan💧 @639639_DEVAN
@devkoriel Aha?? Hahaha, I see. So is the 220,000 KRW one the higher-grade unit?
Devan💧 reposted
Abhishek Singh @0xlelouch_
As a Senior Backend Engineer trying to move towards Staff, I can tell you one thing clearly:

At Senior level, knowing system design fundamentals is not enough anymore. You are expected to design a good system.

At Staff level, you are expected to design the right system for the business, explain the tradeoffs, influence multiple teams, reduce long-term operational pain, and make sure the system does not collapse when traffic, teams, and complexity grow.

So if you are already good at system design but still feel stuck at Senior, spend the next 3-6 months building these Staff Engineer muscles.

Architecture & Technical Strategy
↬ System boundaries
↬ Platform thinking
↬ Build vs buy decisions
↬ Monolith decomposition
↬ Multi-region architecture
↬ Migration strategies
↬ Backward compatibility
↬ API contracts
↬ Long-term maintainability
↬ Reducing operational complexity
↬ Designing for org structure
↬ Architecture decision records
↬ Technical roadmap planning
↬ Removing accidental complexity
↬ Identifying single points of failure
↬ Choosing boring technology
↬ Knowing when not to build
↬ Designing systems that teams can own

Scalability & Distributed Systems
↬ Caching strategy
↬ Queueing strategy
↬ Partitioning
↬ Sharding
↬ Replication
↬ Leader election
↬ Rate limiting
↬ Load shedding
↬ Backpressure
↬ Fan-out/Fan-in
↬ Idempotency
↬ Retry storms
↬ Consistency models
↬ Eventual consistency
↬ Distributed transactions
↬ Data locality
↬ Hot partitions
↬ Graceful degradation
↬ Capacity planning
↬ Failure mode analysis

Databases & Data Architecture
↬ Data modeling
↬ Indexing strategy
↬ Query patterns
↬ Read/write scaling
↬ OLTP vs OLAP
↬ CDC
↬ WAL
↬ Transaction isolation
↬ Schema evolution
↬ Data retention
↬ Backup and restore
↬ Archival strategy
↬ Hot/cold storage
↬ Multi-tenant data design
↬ Event sourcing
↬ CQRS
↬ Denormalization tradeoffs
↬ Data correctness
↬ Reprocessing pipelines
↬ Analytics vs product database separation

Reliability & Operations
↬ SLO/SLI/SLA
↬ Error budgets
↬ Alert quality
↬ Incident response
↬ Postmortems
↬ Runbooks
↬ On-call pain reduction
↬ Canary deployments
↬ Rollbacks
↬ Feature flags
↬ Disaster recovery
↬ Load testing
↬ Chaos testing
↬ Health checks
↬ Circuit breakers
↬ Distributed tracing
↬ Metrics design
↬ Log quality
↬ Dependency failure handling
↬ Designing for recovery, not perfection

Execution & Influence
↬ Writing design docs
↬ Getting alignment
↬ Mentoring seniors
↬ Reviewing architecture
↬ Asking better questions
↬ Challenging vague requirements
↬ Explaining tradeoffs simply
↬ Driving cross-team projects
↬ Creating technical standards
↬ Reducing duplicate systems
↬ Unblocking other teams
↬ Making hidden risks visible
↬ Communicating with product
↬ Saying no with reasoning
↬ Turning ambiguity into execution
↬ Making other engineers more effective

The Senior to Staff jump is not just about "I can build complex systems." It is: "I can help the org make better technical decisions, avoid expensive mistakes, and create systems that other engineers can safely build on top of."

That is the mindset shift imo.
Puneet Patwari @system_monarch

As a Principal Backend Engineer with over 12 years of experience, I can tell you quite certainly that if you're still getting rejected in system design interviews despite putting in good effort, your fundamentals are probably not strong enough. Dedicate 2-3 months to mastering these design fundamentals, then practice designing a few systems (and do plenty of mock interviews).

Scaling & Architecture
↬ CDN
↬ Caching
↬ Sharding
↬ Queueing
↬ Replication
↬ Partitioning
↬ API Gateway
↬ Rate Limiting
↬ CAP Theorem
↬ Microservices
↬ Load Balancing
↬ Fault Tolerance
↬ Database Scaling
↬ Service Discovery
↬ Consistency Models
↬ Eventual Consistency
↬ Distributed Transactions
↬ Monolith vs Microservices
↬ Leader Election

Databases & Storage
↬ Leader-Follower Replication
↬ WAL (Write Ahead Log)
↬ Asynchronous Processing
↬ Transaction Isolation
↬ Read/Write Patterns
↬ Consistent Hashing
↬ Redis/Memcached
↬ Backup & Restore
↬ Hot/Cold Storage
↬ Data Partitioning
↬ Object Storage
↬ SQL vs NoSQL
↬ Data Retention
↬ Data Modeling
↬ OLAP vs OLTP
↬ ACID & BASE
↬ Bloom Filters
↬ File Systems
↬ S3 Basics
↬ B+ Trees
↬ Indexing

Communication & APIs
↬ JWT
↬ CORS
↬ OAuth
↬ Throttling
↬ Serialization
↬ API Security
↬ Long Polling
↬ WebSockets
↬ API Gateway
↬ Idempotency
↬ Service Mesh
↬ Retry Patterns
↬ REST vs gRPC
↬ API Versioning
↬ Circuit Breaker
↬ API Rate Limits
↬ Fan-out/Fan-in
↬ Protocol Buffers
↬ Message Queues
↬ Dead Letter Queue

Reliability & Observability
↬ Metrics
↬ Alerting
↬ Failover
↬ Logging
↬ Rollbacks
↬ Monitoring
↬ Heartbeats
↬ Retry Logic
↬ Autoscaling
↬ SLO/SLI/SLA
↬ Load Testing
↬ Error Budgets
↬ Health Checks
↬ Circuit Breaker (see the sketch after this list)
↬ Incident Response
↬ Chaos Engineering
↬ Distributed Tracing
↬ Canary Deployments
↬ Graceful Degradation
↬ Blue-Green Deployment
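Both lists above name the circuit-breaker pattern. Here is a minimal sketch of the idea, with toy thresholds and an invented class, not taken from any specific library; production implementations add half-open probing, metrics, and per-endpoint state:

```python
# Minimal circuit-breaker sketch: after enough consecutive failures, calls
# fail fast for a cooldown period instead of hammering a sick dependency.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow a trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```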

Devan💧 reposted
lucas @lucas_flatwhite
I shared the Musinsa AI-native hiring series trilogy at an internal tech session and we talked it over. I tried to go through it line by line to really absorb it, and there is so much careful thinking baked in that I read parts 1, 2, and 3 with genuine excitement. This is why philosophy matters..! And sure enough, harness engineering costs far more in execution than in design. Musinsa had to hire a lot of people, and that looks to have taken quite a lot of resources, so these questions were unavoidable; the way they worked through them was impressive. Is this kind of thinking limited to developer hiring? I thought about that, and I don't believe so at all. There must have been a lot of deliberation and discussion along the way. I strongly recommend reading it. I think it applies to everyone building things with agents, well beyond hiring. 💬

- LeetCode-style algorithm problems no longer discriminate, since an AI agent can solve them in seconds.
- In an era where implementation cost has dropped dramatically, the bottleneck moves to deciding what to build.
- The hardest problem in test design is tuning the level of ambiguity: too open and there is not enough signal, too closed and the AI solves it for you.
- The requirement "document it like an open-source project" was the key to resolving the dilemma between testability and ambiguity.
- Context isolation is critical in multi-agent architectures: if one agent grades submissions sequentially, memory of earlier submissions contaminates later grading.
- Defining the pipeline in Markdown means edits take effect immediately; the harness can be fixed quickly without compiling or deploying code.
- Without JSON Schema strict mode forcing the structure of AI output, format mismatches pile up at a scale of 400 candidates.
- Since every candidate's API endpoints differ, the AI has to read the docs and source and map them dynamically.
- Code review is possible even when the build fails. Not being able to run it doesn't mean you can't read it.
- Early in grading, the same applicant's score fluctuated by ±6-11 points; after adding an evidence checklist it converged to ±3.
- Extra implementations like caching, auth, and monitoring should be judged by integration quality and documentation level, not by mere presence.
- A kind of mid-tier rank diverges most dramatically in interviews: people classified under the same pattern sometimes show top-tier thinking and sometimes cannot explain their own code.
- Interview questions should start from the candidate's actual code (file:line), not generic prompts, to probe real depth.
- Prompt histories came in all shapes and were often AI-retouched summaries, so they discriminated poorly.
- Design docs (why it was built this way) should have been the center of evaluation. Code is only the shadow of intent..
- Interviewers also split between "following" the AI guide and "using" it. The depth with which you use the tool decides the outcome, the same principle as for candidates.
- Right now "people who understand AI-written code" matter, but as with car navigation, there is no telling how long that will stay true.

Start here! techblog.musinsa.com/the-philosophy…
[3 images]
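One bullet above credits JSON Schema strict mode with stopping format drift at 400-candidate scale. A small sketch of that idea using the jsonschema package; the schema and field names here are invented for illustration, since the actual harness is not public:

```python
# Hypothetical sketch: validating an LLM grading response against a strict
# JSON Schema so format drift is caught immediately, not downstream.
import json
from jsonschema import validate  # pip install jsonschema

GRADE_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
        "evidence": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "verdict": {"type": "string", "enum": ["pass", "borderline", "fail"]},
    },
    "required": ["score", "evidence", "verdict"],
    "additionalProperties": False,  # the "strict" part: no stray fields
}

def parse_grade(raw: str) -> dict:
    obj = json.loads(raw)          # raises on malformed JSON
    validate(obj, GRADE_SCHEMA)    # raises ValidationError on schema drift
    return obj

print(parse_grade('{"score": 82, "evidence": ["caching integrated"], "verdict": "pass"}'))
```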
Devan💧 reposted
Joruno @wsl8297
The University of California open course "Reinforcement Learning for Large Language Models" takes a "theory + practice" approach to teaching the key techniques of AI training from zero to one, helping you build a complete framework from reinforcement learning to LLM training.

The coverage is comprehensive and the materials are complete: lecture slides, full videos, and hands-on exercises, so you can apply it as soon as you finish.

Course page: ernestryu.com/courses/RL-LLM…

You will learn:
- Deep RL core: MDPs, policy gradients, A3C, PPO, and other key algorithms
- LLM fundamentals: NLP, language modeling, RNNs, and how it all fits together
- The full RLHF pipeline: training with human feedback and how to apply it
- RL with verifiable rewards: a training paradigm aimed at safer, more reliable models
- Hands-on practice: Jupyter code examples plus homework, learning by doing

The course is taught by an assistant professor in the UCLA math department, with the full video set on YouTube. Solid content for anyone who wants to truly understand RL + LLM training.
[image]
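As a taste of the policy-gradient material in that syllabus, here is a toy REINFORCE loop on a 3-armed bandit. Illustration only, with made-up rewards and learning rate; the course works through the full MDP treatment:

```python
# Toy REINFORCE sketch: a softmax policy over 3 arms, with parameters
# nudged along the log-probability gradient scaled by the sampled reward.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(3)                      # policy parameters
true_rewards = np.array([0.2, 0.5, 0.8])  # hidden mean reward per arm

for step in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax policy
    a = rng.choice(3, p=probs)                       # sample an action
    r = rng.normal(true_rewards[a], 0.1)             # noisy reward
    grad = -probs                                    # d log pi(a) / d logits
    grad[a] += 1.0
    logits += 0.05 * r * grad                        # REINFORCE update

# The policy should now strongly prefer the best arm (index 2).
print("learned probs:", np.round(np.exp(logits) / np.exp(logits).sum(), 3))
```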
Devan💧 reposted
Chao Ma @ickma2311
Efficient AI Lecture 12: Transformer and LLM This lecture is not only about how LLMs work. It also explains the building blocks behind them: multi-head attention, positional encoding, Transformer variants, and KV cache. LLMs are not one single trick. They are a stack of design choices where architecture and efficiency are deeply connected. My note: ickma2311.github.io/ML/HW-SW-codes…
[image]
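Of the building blocks that lecture lists, the KV cache is the easiest to show in a few lines. A minimal single-head, decode-time sketch in NumPy; shapes and initialization are invented for illustration:

```python
# Minimal KV-cache sketch: keys/values for past tokens are stored and
# reused each decode step, so per-step attention cost is O(seq_len)
# instead of recomputing the whole prefix every time.
import numpy as np

d = 16
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
K_cache, V_cache = [], []

def decode_step(x):                    # x: (d,) embedding of the new token
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K_cache.append(k)                  # grow the cache by one entry
    V_cache.append(v)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)        # attend over all cached positions
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                 # attention output for this step

for _ in range(5):
    out = decode_step(np.random.randn(d))
print(out.shape)  # (16,)
```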
Devan💧 reposted
Alexey Grigorev @Al_Grigor
A new cohort of LLM Zoomcamp starts on June 8, 2026. It’s a free 10-week course where you go from LLM basics to building a production-ready AI assistant. For this cohort, I'll update the course content during a series of live workshops. In the course, you'll learn:
- Retrieval-Augmented Generation
- Vector search and embeddings
- AI agents
- Function calling and tool use
- Evaluation of RAG and agentic systems
- Monitoring LLM applications
Join the new cohort and build your LLM application step by step: github.com/DataTalksClub/…
[image]
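The retrieval piece of that syllabus fits in a short sketch. A toy RAG loop with a hashed bag-of-words stand-in for embeddings; real pipelines use a trained embedding model and a vector store, and every document and name here is invented:

```python
# Toy RAG sketch: embed documents, retrieve the nearest one for a question,
# and build the augmented prompt an LLM would answer from.
import numpy as np

DOCS = [
    "Our refund window is 30 days from purchase.",
    "Support is available Monday through Friday, 9am-5pm.",
    "Premium plans include priority onboarding.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in embedding: hash each word into a bucket, then normalize.
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(question: str) -> str:
    sims = DOC_VECS @ embed(question)   # cosine similarity (unit vectors)
    return DOCS[int(np.argmax(sims))]

question = "How long do I have to get a refund?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
print(prompt)
```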
Devan💧 reposted
Tech with Mak @techNmak
Jay Alammar is the best teacher in AI. Period. If you have ever seen "The Illustrated Transformer," you know his diagrams are legendary. He also open-sourced the entire codebase for his O'Reilly book: Hands-On Large Language Models. It’s effectively a visual masterclass in LLMs, for free.

Chapter 1: Introduction to Language Models
Chapter 2: Tokens and Embeddings
Chapter 3: Looking Inside Transformer LLMs
Chapter 4: Text Classification
Chapter 5: Text Clustering and Topic Modeling
Chapter 6: Prompt Engineering
Chapter 7: Advanced Text Generation Techniques and Tools
Chapter 8: Semantic Search and Retrieval-Augmented Generation
Chapter 9: Multimodal Large Language Models
Chapter 10: Creating Text Embedding Models
Chapter 11: Fine-tuning Representation Models for Classification
Chapter 12: Fine-tuning Generation Models

I will put the repo link in the comments.
[image]
Devan💧 reposted
Jason Zhu @GoSailGlobal
In Stanford CS336, Tatsu gave a lecture on LLM architecture, taking apart every mainstream LLM of the past 3 years to find their shared template.

The conclusion is striking: 90% of architecture choices have converged. Pick any open-source large model at random and it looks almost identical to the others along these dimensions.

In the lecturer's own words:
- In 2024 everyone was cosplaying Llama 2
- 2025's theme was "how to train without blowing up"
- 2026's theme is "how to survive long context"

Below is the standard template for a 2026 open-source LLM. If you're training your own model, you can copy it outright.

[Architecture: 7 choices that have converged]

1) Move LayerNorm out of the residual stream (pre-norm). The original Transformer put LN inside the residual; almost all modern models move it outside. The reason: keep your residual stream clean, so gradients backpropagate more stably.

2) RMSNorm instead of LayerNorm. LayerNorm's mean subtraction and bias term don't actually help much. Dropping them saves only 0.17% of FLOPs but up to 25% of runtime (the bottleneck is data movement; compute is secondary).

3) Delete all bias terms. Same logic as RMSNorm: it saves memory movement at the systems level.

4) SwiGLU or GeGLU activations. Gated linear units are in nearly every modern model. The Llama family / Qwen / Mistral use SwiGLU; the Google family (Gemma / T5) uses GeGLU. The difference is tiny; either works.

5) RoPE for position encoding. Essentially unified after 2024. The idea: rotate each pair of dimensions by an angle proportional to position, so the inner product depends only on relative position.

6) Serial (not parallel) Transformer blocks. GPT-J / PaLM tried parallel blocks; that's now mostly abandoned. Serial implementations are too well optimized, and the small systems saving from parallel blocks isn't worth the loss in expressiveness.

7) LayerNorm can be "sprinkled": add LN wherever training is unstable. Before attention, after attention, or both (double norm) all work; many modern models do this.

[Hyperparameters: 5 numbers that have converged]

1) Feedforward dim / hidden dim: non-GLU models 4x; GLU models 8/3 ≈ 2.67x (GLU has an extra matrix, so this keeps total parameter count the same); the Llama family 3.5x. T5 1.0 tried 64x and T5 1.1 went back to standard. Don't copy that.

2) num_heads × head_dim ≈ hidden dim. Almost every model obeys this; T5 is one of the few exceptions.

3) Aspect ratio (hidden / num_layers) ≈ 100. Too deep makes pipeline parallelism hard; too wide limits expressiveness. 100 is the balance point between systems constraints and expressiveness.

4) Vocab size: monolingual models around 30K (early GPT-2 style); multilingual / general models 100K-200K (GPT-4 / Llama 3 / Gemma are all in this range). Modern models are basically the latter.

5) Weight decay is still used everywhere, but research finds that what it really does in LLMs is act as an optimizer intervention, letting you converge to a deeper optimum. It has little to do with the "preventing overfitting" you'd expect, so don't turn it off just because a single epoch can't overfit.

[Stability: three lifesaving tricks]

The scariest thing in large-model training is the loss suddenly spiking mid-run, then NaNs wiping out the whole run. Modern models use three tricks against this:

1) Z-loss. The output softmax's normalizer can blow up; adding a (log Z)² regularizer keeps Z near 1. DCLM / Olmo both use it.

2) QK norm. Add an LN to attention's Q and K before the matmul, so the softmax input is always at unit scale. The multimodal community used it first; now every large model adds it.

3) Logit soft cap (Google family only). Hard-cap attention logits with tanh. Gemma 2/3/4 use it, but it costs a little performance. Use with care.

[Attention: two new trends]

1) GQA (Grouped Query Attention) is nearly universal. With vanilla multi-head attention, the KV cache at inference drives arithmetic intensity down to 1/h. GQA shares K and V while keeping multiple Qs: almost no loss in expressiveness, and inference cost cut by 80%. Every production-bound large model uses GQA now.

2) Alternating local + global attention. The new way to handle long context, started by Cohere Command A; Llama 4 / Gemma 4 / Olmo 3 all use it now. For example, 1 in every 4 layers is full attention and the other 3 are sliding-window, looking only at nearby tokens. More stable than pure SSM, far cheaper than pure full attention. (Qwen 3.5 did a variant that swaps the 3 sliding-window layers for SSM.)

To wrap up: if you're training your own LLM, the above is the 2026 "default config". No need to reinvent it; just copy it. And if you only want to read the modeling_xxx.py files on GitHub, this is enough to stop being scared off by the jargon.
Roan @RohOnChain

Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch. Stanford taught the entire thing in a 1-hour lecture and released it for free. Bookmark and watch this today before someone takes it down.
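The template in the thread above maps almost line-for-line onto code. A minimal PyTorch sketch of the converged choices (pre-norm, RMSNorm, no biases, SwiGLU at the 8/3 multiple); RoPE, GQA, and the causal mask are omitted to stay short, and this is an illustration, not any specific model's modeling_xxx.py:

```python
# Sketch of the "2026 default template": serial pre-norm blocks with
# RMSNorm, bias-free linears, and a SwiGLU MLP.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # scale only: no mean, no bias
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        hidden = int(8 * dim / 3)  # ~2.67x keeps params comparable to a 4x MLP
        self.up = nn.Linear(dim, hidden, bias=False)
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))

class Block(nn.Module):
    """Serial pre-norm block: x + Attn(norm(x)), then x + MLP(norm(x))."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, bias=False, batch_first=True)
        self.mlp_norm = RMSNorm(dim)
        self.mlp = SwiGLU(dim)

    def forward(self, x):
        h = self.attn_norm(x)                             # norm outside the residual
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.mlp_norm(x))

x = torch.randn(2, 10, 64)    # (batch, seq, dim); n_heads * head_dim == dim
print(Block(64, 4)(x).shape)  # torch.Size([2, 10, 64])
```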

Devan💧 reposted
크롱 @Krongggggg
Wow... ElevenLabs just published a video on how to train an LLM locally from scratch. Incredible... It comes with a hands-on guide where you can train a model yourself in about 15 minutes on a T4 GPU in Google Colab. youtube.com/watch?v=UsB70T…
[YouTube video]
Devan💧 reposted
Zak 🦈 (e/acc) @ZakShark
Train yourself in inference/kernel engineering. Knowing how to properly optimize GPU kernels in inference workloads is worth gold. Mastering CUDA or Triton, vLLM, SGLang, and TensorRT-LLM is a real differentiator if you want to stand out as an AI/ML Engineer in 2026-2027.
Devan💧 reposted
How To AI @HowToAI_
The entire RAG industry is about to get cooked. Researchers have built a new RAG approach that:
- does not need a vector DB
- does not embed data
- involves no chunking
- performs no similarity search
It's called PageIndex. Instead of chunking your docs and stuffing them into Pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book. It hit 98.7% on FinanceBench and beats every vector RAG on the leaderboard. No embeddings. No chunking. No vector DB. 100% open source.
[image]
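A sketch of the tree-navigation idea as the tweet describes it: instead of vector similarity, an LLM repeatedly picks the most relevant child node until it reaches a leaf. The node structure and the ask_llm helper are invented placeholders, not PageIndex's actual index format:

```python
# Hypothetical sketch of reasoning-based tree retrieval over a document.
import re

TREE = {
    "summary": "Annual report",
    "children": [
        {"summary": "Financial statements: revenue, costs, margins",
         "children": [], "text": "Revenue grew 12% year over year..."},
        {"summary": "Risk factors: competition, regulation",
         "children": [], "text": "We face intense competition..."},
    ],
}

def _words(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))

def ask_llm(question: str, options: list[str]) -> int:
    # Placeholder for an LLM call that returns the index of the most
    # relevant option. Here: naive keyword overlap so the sketch runs.
    q = _words(question)
    return max(range(len(options)), key=lambda i: len(q & _words(options[i])))

def retrieve(node: dict, question: str) -> str:
    while node["children"]:   # descend until we hit a leaf section
        i = ask_llm(question, [c["summary"] for c in node["children"]])
        node = node["children"][i]
    return node["text"]

print(retrieve(TREE, "How did revenue change?"))
```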
Devan💧 reposted
Pallavi @pallavishekhar_
Learn LLM internals step by step - from tokenization to attention to inference optimization github.com/amitshekhariit…
[image]
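The tokenization end of that pipeline fits in a few lines. A toy byte-pair-encoding merge loop, illustration only; real tokenizers operate on bytes with pretokenization rules, but the core idea is the same:

```python
# Toy BPE sketch: repeatedly merge the most frequent adjacent symbol pair.
from collections import Counter

def bpe_merges(text: str, num_merges: int = 5):
    tokens = list(text)          # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        out, i = [], 0
        while i < len(tokens):   # rewrite the sequence with the new symbol
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, merges

toks, merges = bpe_merges("low lower lowest low low")
print(merges)  # most frequent pairs merge first, e.g. 'lo', then 'low', ...
```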
Devan💧 reposted
Jahir Sheikh @jahirsheikh8
90% of AI Engineer interviews in 2026 will test these concepts:
* Transformers / Attention
* Embeddings / Vector Search
* RAG Architecture
* Fine-Tuning / LoRA / PEFT
* Prompt Engineering / Structured Outputs
* LLM Evaluation / Benchmarking
* Hallucination / Guardrails
* Inference / Latency Optimization
Not just “build a chatbot.”
Devan💧 reposted
Mr Panda @PandaTalk8
176 pages, 500,000 downloads, a deep learning textbook that fits on your phone. The Little Book of Deep Learning, written by University of Geneva professor François Fleuret, is the most information-dense AI primer I have seen:

Part I, Foundations: machine learning, loss functions, gradient descent, backpropagation, scaling laws
Part II, Models: convolutional networks, attention, Transformers, GPT, ViT
Part III, Applications: image classification, object detection, speech recognition, text generation, image generation

Every page has a diagram, and every concept is covered just enough, with no filler. It suits two kinds of readers best: practitioners who want to systematically refresh the fundamentals of AI, and beginners who have been scared off by thousand-page textbooks. It does one thing: compress the full arc of deep learning, from CNNs to Transformers to GPT, into something a person can read in a week, without sacrificing any mathematical rigor. Fleuret's principle is simple: don't try to be exhaustive, teach only what is necessary to understand the core models. If you have always wanted to work through deep learning systematically but were put off by the doorstoppers, this book may be the best starting point. Free: fleuret.org/public/lbdl.pdf
[image]
Devan💧 reposted
Jahir Sheikh @jahirsheikh8
As an AI Infrastructure Engineer, please learn:
- GPU/VRAM fundamentals, quantization & batching (quantization sketch below)
- vLLM / TensorRT-LLM / inference optimization
- KV caching, speculative decoding & token throughput
- Distributed training basics (DDP/FSDP/DeepSpeed)
- Model serving & autoscaling
- Vector DB retrieval pipelines
- Prompt caching & cost optimization
- Observability for LLM apps
This is what production AI teams actually care about.
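The first item on that list reduces to a short round-trip. A minimal symmetric int8 weight-quantization sketch with a single per-tensor scale; production stacks use per-channel or per-group scales plus calibration, so treat this as illustration only:

```python
# Minimal symmetric int8 quantization sketch: store weights as int8 plus
# one float scale (4x smaller than float32), dequantize on the fly.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max abs round-trip error: {err:.4f}")
```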