Tech with Mak

4.9K posts

@techNmak

AI, coding, software, and whatever’s on my mind.

Joined July 2024
773 Following · 38.2K Followers
Pinned Tweet
Tech with Mak@techNmak·
In 1948, a 32-year-old at Bell Labs published a paper nobody fully understood. Engineers found it too mathematical. Mathematicians found it too engineering-focused. One prominent mathematician reviewed it negatively.

That paper, "A Mathematical Theory of Communication," became the founding document of the digital age. The man was Claude Shannon. Father of Information Theory.

At 21, he wrote the most important master's thesis of the 20th century.
Working at MIT on an early mechanical computer, Shannon noticed its relay switches had exactly two states - open or closed. He had just taken a philosophy course introducing Boolean algebra, which also operated on two values: true and false. Nobody had ever connected these two things. His 1937 thesis proved that Boolean algebra and electrical circuits are mathematically identical, and that any logical operation could be built from simple switches. Howard Gardner called it "possibly the most important, and also the most famous, master's thesis of the century." Every digital computer ever built traces back to this insight.

At 29, he proved that perfect encryption exists.
During WWII, Shannon worked on classified cryptography at Bell Labs. His work contributed to SIGSALY, the secure voice system used for confidential communications between Roosevelt and Churchill. In a classified 1945 memorandum, he mathematically proved that the one-time pad provides perfect secrecy: unbreakable not just computationally, but provably, permanently, against an adversary with infinite power. When declassified in 1949, it transformed cryptography from an art into a science. It laid the foundations for DES, AES, and every modern encryption standard.

At 32, he defined what information is.
His 1948 paper introduced one equation:

H = −Σ p(x) log p(x)

Shannon entropy. The average uncertainty in a probability distribution. The minimum average number of bits per symbol needed to encode messages drawn from it.

Three things followed:
> He defined the bit - the fundamental unit of all information. His colleague John Tukey coined the name.
> He proved the channel capacity theorem: every communication channel has a maximum rate of reliable transmission. You can approach it. You can never exceed it.
> He unified telegraph, telephone, and radio into a single mathematical framework for the first time.

Robert Lucky of Bell Labs called it the greatest work "in the annals of technological thought."

Where his equation lives in AI today:
Cross-entropy loss, the function used to train most classifiers and language models, is derived directly from H. Decision tree splits use information gain, which is H applied to data. Perplexity, a standard LLM evaluation metric, is an exponentiation of cross-entropy. Every time a neural network trains on cross-entropy loss, Shannon's formula runs inside it.

He also built the first AI learning device.
In 1950, Shannon built Theseus, a mechanical mouse that navigated a maze through trial and error, learned the correct path, and repeated it perfectly. Mazin Gilbert of Bell Labs said: "Theseus inspired the whole field of AI." That same year he published the first paper on programming a computer to play chess. He co-organized the 1956 Dartmouth Workshop, the founding event of AI as a field.

The man:
He rode a unicycle through Bell Labs hallways while juggling. He built a flame-throwing trumpet, a rocket-powered Frisbee, and Styrofoam shoes to walk on the lake behind his house. He called his home Entropy House. When asked what motivated him: "I was motivated by curiosity. Never by the desire for financial gain. I just wondered how things were put together."

In 1985, he appeared unexpectedly at a conference in Brighton. The crowd mobbed him for autographs. Persuaded to speak at the banquet, he talked briefly, then pulled three balls from his pockets and juggled instead. One engineer said: "It was as if Newton had showed up at a physics conference."

He died in 2001 after a decade with Alzheimer's, the cruel irony of information slowly leaving the mind of the man who defined what information was.

Claude, the AI model, is named after Claude Shannon, the mathematician who laid the foundation for the digital world we rely on today.
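To make the entropy, cross-entropy, and perplexity relationship concrete, here is a minimal Python sketch. The function names and the two coin distributions are illustrative assumptions, not anything from Shannon's paper, and base-2 logs are used so results come out in bits.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p(x) log2 p(x), in bits."""
    # Terms with p(x) == 0 contribute nothing, so they are skipped.
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum p(x) log2 q(x), in bits.

    Assumes q(x) > 0 wherever p(x) > 0 (otherwise log2 raises ValueError).
    """
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def perplexity(p, q):
    """Perplexity is 2 raised to the cross-entropy (when measured in bits)."""
    return 2 ** cross_entropy(p, q)

# Hypothetical example distributions over two outcomes.
fair_coin = [0.5, 0.5]     # the "true" distribution
biased_guess = [0.9, 0.1]  # a model's (wrong) prediction

print(entropy(fair_coin))                    # 1.0 bit: maximum uncertainty for two outcomes
print(cross_entropy(fair_coin, biased_guess))  # larger than the entropy: the model is wrong
print(perplexity(fair_coin, biased_guess))     # about 3.33
```

Cross-entropy is never smaller than the entropy of the true distribution, which is why minimizing it pushes a model's predictions toward the data.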
Tech with Mak@techNmak·
Someone finally built a security database for the Claude ecosystem. It's called ClaudeSec, and Pluto Security just launched it for free.

Here's the gap it fills => 53 new Claude connectors shipped in the last 30 days. Your security team reviewed zero of them. Someone on your team authorized at least one. Most enterprises adopting Claude have no process to evaluate connectors before authorization.

ClaudeSec tracks 384 connectors. 103 flagged high risk. That's around 27% of the ecosystem.

Every entry shows:
→ What capabilities the connector actually has
→ What tools it exposes to the model
→ Why it's rated risky
→ Source-code findings where they did the review

Security guides are live for Claude Managed Agents and Cowork. Real configuration - policies, hooks, permission scopes, allow/deny rules.

The Cowork guide is the one Enterprise teams need to read first. Cowork runs code, browses with real user sessions, and operates unattended. The architecture is solid: gVisor sandbox, layered network controls. But Cowork activity is excluded from Audit Logs, the Compliance API, and Data Exports. All plan tiers. Including Enterprise. Your visibility tools don't see what Cowork is doing.

Claude Code and Office Agents guides ship next.

The curated news feed flags CVEs and incidents as they happen. The window between a connector being compromised and detection is roughly 3 hours. The feed is built around that window.

Read here:
ClaudeSec: claudesec.pluto.security
Launch blog: pluto.security/blog/introduci…
Cowork teardown: pluto.security/blog/claude-co…

Thanks to @pluto_security for supporting this post.
Tech with Mak@techNmak·
The full AI engineering curriculum is now free. It's called AI Engineering from Scratch.

20 phases, 428 lessons, roughly 320 hours end to end. Free. MIT license. Runs on your own laptop.

The design principle that makes it different from everything else => every algorithm gets built from raw math before a single framework loads. Backprop by hand. Tokenizer by hand. Attention by hand. Agent loop by hand. Then you implement the same thing in PyTorch or sklearn. By the time the production library appears, you already know what it's doing underneath.

Every lesson ends with something you keep:
→ Prompt templates for any AI assistant
→ Skill files for Claude, Cursor, Codex, OpenClaw, Hermes
→ Agent definitions you wrote the loop for yourself
→ MCP servers built from scratch in Phase 13

428 lessons means 428 artifacts by the end. Tools you built and actually understand.

The full 20 phases:
→ Phase 0 - Setup & Tooling (12 lessons)
→ Phase 1 - Math Foundations (22 lessons)
→ Phase 2 - ML Fundamentals (18 lessons)
→ Phase 3 - Deep Learning Core (13 lessons)
→ Phase 4 - Computer Vision (28 lessons)
→ Phase 5 - NLP (29 lessons)
→ Phase 6 - Speech & Audio (17 lessons)
→ Phase 7 - Transformers Deep Dive (14 lessons)
→ Phase 8 - Generative AI (14 lessons)
→ Phase 9 - Reinforcement Learning (12 lessons)
→ Phase 10 - LLMs from Scratch (22 lessons)
→ Phase 11 - LLM Engineering (15 lessons)
→ Phase 12 - Multimodal AI (25 lessons)
→ Phase 13 - Tools & Protocols (23 lessons)
→ Phase 14 - Agent Engineering (42 lessons)
→ Phase 15 - Autonomous Systems (22 lessons)
→ Phase 16 - Multi-Agent & Swarms (25 lessons)
→ Phase 17 - Infrastructure & Production (28 lessons)
→ Phase 18 - Ethics, Safety & Alignment (30 lessons)
→ Phase 19 - Capstone Projects (17 projects, 20-40 hours each)

Python, TypeScript, Rust, Julia throughout.

GitHub Repo: github.com/rohitg00/ai-en…
Tech with Mak@techNmak·
I finally found someone who explained why LLM inference is fundamentally different from regular inference… without overcomplicating it. Just a guy casually walking and dropping one of the clearest AI explanations on the internet.
Tech with Mak@techNmak·
For 38 years, computer scientists believed Dijkstra's algorithm was optimal for sparse graphs.

The logic seemed airtight: Dijkstra visits vertices in order of distance, which effectively sorts them. Comparison-based sorting has a lower bound of Ω(n log n). Therefore shortest paths can't be faster.

5 researchers proved the assumption wrong. Shortest paths only needs the distances, not the sorted order.

The trick => combine Dijkstra's priority queue with Bellman-Ford's dynamic programming. Divide and conquer on vertex sets. Shrink the frontier.

Result: O(m log^(2/3) n), versus O(m + n log n) for Dijkstra with a Fibonacci heap.

First improvement for directed graphs since Fibonacci heap in 1987. Tsinghua. Stanford. Max Planck. 17 pages.
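For reference, here is a minimal sketch of the classic baseline this result is measured against: Dijkstra with a binary heap. This is not the new algorithm. The graph format (a dict of adjacency lists) and the sample graph are assumptions for illustration, and edge weights must be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths with a binary-heap priority queue.

    graph: dict mapping vertex -> list of (neighbor, weight) pairs,
           with non-negative weights (hypothetical input format).
    Returns a dict of shortest distances to every reachable vertex.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, vertex)
    while heap:
        d, u = heapq.heappop(heap)  # closest unsettled vertex: the implicit "sort"
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Small illustrative directed graph.
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
print(dijkstra(graph, "s"))  # distances: s=0, b=1, a=3, c=4
```

The heap pops vertices in increasing distance order, which is exactly the sorting step the new result avoids doing in full.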
Tech with Mak@techNmak·
Most devs are barely scratching the surface with Claude Code. Here's a full cheat sheet.

Core building blocks -
✦ CLAUDE.md - persistent project instructions and repo context loaded every session. Keep it short and durable.
✦ Skills - reusable workflows packaged in SKILL.md files, loaded on demand or when relevant. Use for repeatable tasks like code review, release checklists, and debugging playbooks.
✦ Hooks - user-defined shell commands, HTTP endpoints, or LLM prompts that run automatically at lifecycle points. Use for enforcement, not vague advice.
✦ Subagents - focused specialists with isolated context for side investigations, security reviews, and test generation. Don't spawn them for trivial edits.
✦ Agent Teams - multiple Claude Code sessions coordinated by a lead, with teammates working independently in parallel.
✦ MCP - Model Context Protocol connections to external tools and data. Connect only what you actually need.
✦ Plugins - installable bundles of skills, agents, hooks, and MCP servers for teams and repeated use.

Commands -

Setup:
/init → generate starter CLAUDE.md
/memory → edit memory files
/permissions → manage allow / ask / deny rules
/mcp → manage MCP server connections
/agents → manage subagents

During a task:
/plan → read-only mode before large changes
/model → switch model for the session
/effort → adjust reasoning depth
/context → visualize context window usage
/compact → summarize conversation to free context
/goal → keep working until a condition is met
/btw → ask a side question without polluting history

Review & ship:
/diff → inspect uncommitted changes
/review → deeper read-only code review pass
/security-review → security-focused review of pending changes
/doctor → diagnose installation and runtime issues

Advanced:
/background → detach session as a background agent
/batch → decompose a large change into parallel work units
/tasks → list running background tasks
/loop → run a prompt repeatedly on a schedule

Session control:
/clear → fresh conversation, keeps project memory
/resume → reopen an earlier conversation
/rewind → roll back code and conversation to a checkpoint (aliases: /checkpoint, /undo)

Best Practices -
✅ Keep CLAUDE.md short, durable, and repo-specific
✅ Use skills for repeatable procedures
✅ Use hooks for rules that must run automatically
✅ Use /context and /compact to control context growth
✅ Connect only the MCP servers you actually need
✅ Review diffs and run tests before shipping
✅ Use subagents for deep side investigations so the main thread stays clean
✅ Prefer skills over giant always-on memory files

Failure Patterns -
✗ Giant CLAUDE.md with tutorials and fast-changing info
✗ No verification loop after edits
✗ Context drift from long sessions without /compact
✗ Too many connected MCP tools adding noise
✗ Touching unrelated code during a focused task

Treat it like an engineering environment, not a chat interface. The setup investment pays back on every task.
Tech with Mak@techNmak·
This math sits underneath every AI model being trained right now.

Gradient. Jacobian. Hessian. Three words that look intimidating at first. But they are really just three ways of measuring change.

1. Gradient ∇f
Takes a scalar function: f : ℝⁿ → ℝ
Returns a vector of first-order partial derivatives.
It answers: "Which direction makes f increase fastest?"
That is why gradients are central to optimization. Gradient descent moves in the opposite direction because the gradient points uphill. Backpropagation efficiently computes gradients during training.

2. Jacobian J_F
Takes a vector-valued function: F : ℝⁿ → ℝᵐ
Returns an m × n matrix of first-order partial derivatives.
It answers: "How does each output change with each input?"
The Jacobian is the local linear map of a vector-valued function. It shows up in:
→ sensitivity analysis
→ change of variables
→ automatic differentiation
→ forward-mode AD
→ reverse-mode AD / backpropagation
In simple terms: forward-mode AD uses Jacobian-vector products. Reverse-mode AD uses vector-Jacobian products.

3. Hessian H_f
Takes a scalar function: f : ℝⁿ → ℝ
Returns an n × n matrix of second-order partial derivatives.
It answers: "How does the gradient itself change?"
That means the Hessian measures curvature. When the second partial derivatives are continuous, the Hessian is symmetric. At a critical point:
→ positive definite Hessian → strict local minimum
→ negative definite Hessian → strict local maximum
→ indefinite Hessian → saddle point

The clean mental model
Gradient = first derivatives of one output → tells you direction
Jacobian = first derivatives of many outputs → tells you sensitivity
Hessian = second derivatives of one output → tells you curvature

And the relationship between them is simple: The Hessian is the Jacobian of the gradient. For a scalar output, the Jacobian contains the same partial derivatives as the gradient, up to row/column convention.

Same idea: measure change. Different object: direction, sensitivity, curvature.

Once this clicks, optimization stops looking like a pile of formulas. It starts looking like a map of the problem.
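The "Hessian is the Jacobian of the gradient" relationship can be checked numerically. Below is a rough pure-Python sketch using central finite differences. The function names, step sizes, and the example functions f and F are all assumptions for illustration; real training code would use automatic differentiation instead.

```python
def jacobian(F, x, h=1e-5):
    """Central-difference Jacobian of F: R^n -> R^m, returned as m rows of n entries."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        # Column j holds dF_i/dx_j for every output i.
        cols.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows

def gradient(f, x, h=1e-5):
    """Gradient of scalar f: the single row of the 1 x n Jacobian."""
    return jacobian(lambda v: [f(v)], x, h)[0]

def hessian(f, x, h=1e-4):
    """Hessian of scalar f, computed literally as the Jacobian of the gradient."""
    return jacobian(lambda v: gradient(f, v, h), x, h)

# Hypothetical scalar function: f(x, y) = x^2 + 3xy + 2y^2
def f(v):
    x, y = v
    return x**2 + 3 * x * y + 2 * y**2

# Hypothetical vector-valued function: F(x, y) = (xy, x + y)
def F(v):
    x, y = v
    return [x * y, x + y]

point = [1.0, 2.0]
print(gradient(f, point))  # close to [8, 11], i.e. [2x + 3y, 3x + 4y]
print(jacobian(F, point))  # close to [[2, 1], [1, 1]], i.e. [[y, x], [1, 1]]
print(hessian(f, point))   # close to [[2, 3], [3, 4]]
```

For this f, the Hessian has determinant 2·4 − 3·3 = −1, so it is indefinite, and the critical point at the origin is a saddle point.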
Tech with Mak@techNmak·
Learn LLMs from Stanford this weekend.

Stanford's Autumn 2025 Transformers & LLMs course is fully public and 100% free.

In 9 lectures, you’ll unlock the exact mechanics behind:
• Flash Attention (3x faster models)
• LoRA (90% cheaper fine-tuning)
• Mixture of Experts (Massive efficiency scaling)

➕What's covered:

➡️Lecture 1: Transformer Fundamentals
→ Tokenization and word representation
→ Self-attention mechanism explained
→ Complete transformer architecture
→ Detailed implementation example

➡️Lecture 2: Advanced Transformer Techniques
→ Position embeddings (RoPE, ALiBi, T5 bias)
→ Layer normalization and sparse attention
→ BERT deep dive and finetuning
→ Extensions of BERT

➡️Lecture 3: LLMs & Inference Optimization
→ Mixture of Experts (MoE) explained
→ Decoding strategies (greedy, beam search, sampling)
→ Prompting and in-context learning
→ Chain-of-thought reasoning
→ Inference optimizations (KV cache, PagedAttention)

➡️Lecture 4: LLM Training & Fine-tuning
→ Pretraining and scaling laws (Chinchilla law)
→ Training optimizations (ZeRO, model parallelism)
→ Flash Attention for 3x speedup
→ Quantization and mixed precision
→ Parameter-efficient finetuning (LoRA, QLoRA)

➡️Lecture 5: LLM Tuning
→ Preference tuning
→ RLHF overview
→ Reward modeling
→ RL approaches (PPO and variants)
→ DPO

➡️Lecture 6: LLM Reasoning
→ Reasoning models
→ RL for reasoning
→ GRPO
→ Scaling

➡️Lecture 7: Agentic LLMs
→ Retrieval-augmented generation
→ Advanced RAG techniques
→ Function calling
→ Agents
→ ReAct framework

➡️Lecture 8: LLM Evaluation
→ LLM-as-a-judge overview
→ Best practices and benefits
→ Biases and pitfalls

➡️Lecture 9: Recap & Trending topics

From Stanford Online: Rigorous instruction. Latest techniques. Free access.

Perfect for:
→ ML engineers building with LLMs
→ AI engineers understanding transformers
→ Researchers working on language models
→ Anyone learning beyond API calls

This weekend: learn the techniques that separate good engineers from great ones.

(I will put the playlist in the comments.)

♻️Repost to save someone $$$ and a lot of confusion.
✔️Follow @techNmak for more AI/ML insights.
Tech with Mak@techNmak·
Lecture 7: Agentic LLMs
- Retrieval-augmented generation
- Advanced RAG techniques
- Function calling
- Agents
- ReAct framework
Watch here: youtube.com/watch?v=h-7S6H…
Tech with Mak@techNmak·
Lecture 4: LLM Training
- Pretraining
- FLOPs, FLOPS
- Scaling laws, Chinchilla law
- Training optimizations overview
- Data parallelism with ZeRO
- Model parallelism
- Flash Attention
- Quantization
- Mixed precision training
- Supervised finetuning
- Instruction tuning
- Parameter-efficient finetuning with LoRA
- QLoRA
Watch here: youtube.com/watch?v=VlA_jt…
Tech with Mak@techNmak·
Lecture 3: Transformers & Large Language Models
- LLM definition
- Mixture of Experts
- Dense & Sparse MoE
- MoE in LLMs
- Response generation
- Greedy decoding & beam search
- Sampling-based methods
- Impact of temperature on predictions
- Guided decoding
- Prompting strategies
- In-context learning
- Chain-of-thought, self-consistency
- Inference optimizations with KV cache
- PagedAttention, MLA
Watch here: youtube.com/watch?v=Q5baLe…
Tech with Mak@techNmak·
Lecture 2: Transformer-Based Models & Tricks
- Overview of position embeddings
- Sinusoidal embeddings
- T5 bias, ALiBi
- RoPE
- Layer normalization
- Sparse attention
- Sharing attention heads
- Transformer-based models
- BERT deep dive
- BERT finetuning
- Extensions of BERT
Watch here: youtube.com/watch?v=yT84Y5…
Tech with Mak@techNmak·
Lecture 1: Transformer
- Class logistics
- NLP overview
- Tokenization
- Word representation
- Recurrent neural networks
- Self-attention mechanism
- Transformer architecture
Watch here: youtube.com/watch?v=Ub3GoF…