Sameer goel

118 posts

@sameer_goel

Computer Vision Engineer. Fixing broken CV deployments (ONNX, opsets, latency). PyTorch → real-time inference systems. Production-first.

India · Joined February 2026

46 Following · 38 Followers

Sameer goel@sameer_goel·21 Apr

JWT is stateless, meaning the server doesn’t store the token, so there’s nothing to delete. The token stays valid on the client until it expires. To actually log a user out, either blacklist the token and check the blacklist on each request, or keep access tokens short-lived and revoke the refresh token. Without that, the token keeps working.
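The blacklist approach described above can be sketched roughly as follows. This is a minimal illustration, not a real JWT library: the token is a simplified HMAC-signed payload standing in for a JWT, and `SECRET`, `revoked_jtis`, `issue_token`, `verify_token`, and `logout` are all hypothetical names. In a real deployment the denylist would presumably live in a shared store so every server sees it.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"   # hypothetical signing key, for illustration only
revoked_jtis = set()      # server-side denylist of token IDs (assumed in-memory here)


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_token(user: str, jti: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a signed token carrying a subject, a unique ID, and an expiry."""
    now = time.time() if now is None else now
    claims = {"sub": user, "jti": jti, "exp": now + ttl_seconds}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic, unexpired, and not revoked."""
    now = time.time() if now is None else now
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    claims = json.loads(_unb64(payload))
    if claims["exp"] <= now:
        return None                      # short expiry limits the damage window
    if claims["jti"] in revoked_jtis:
        return None                      # the extra stateful check on each request
    return claims


def logout(token: str) -> None:
    """Immediate logout: remember the token ID so later requests are rejected."""
    claims = verify_token(token)
    if claims is not None:
        revoked_jtis.add(claims["jti"])
```

The trade-off is that the denylist lookup reintroduces a small amount of server-side state, which is the price of immediate revocation.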

Suni@suni_code·21 Apr

Interviewer: If JWT tokens are stateless, how can a system log a user out immediately?

Sameer goel@sameer_goel·21 Apr

“Please” and “thank you” just tokenize into a handful of extra subword units: literally a few more integers in the sequence. The transformer still runs full attention over the entire context, and attention cost scales as ~O(n²) in sequence length, so the marginal compute from a few polite tokens is noise compared to long prompts, chain-of-thought, or giant context windows. If you’re optimizing GPU usage, those words are a rounding error; the real cost is in sequence length and repeated tokens, not courtesy.
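A rough back-of-envelope check of that claim, counting only the token-pair scores of full self-attention in a single layer. The figure of 4 extra tokens and the prompt lengths are assumptions for illustration; real tokenizers vary, and the feed-forward layers (which scale linearly with tokens) are ignored here.

```python
def attention_pairs(n_tokens: int) -> int:
    """Token-pair scores computed by full (non-causal) self-attention in one layer."""
    return n_tokens * n_tokens


def marginal_overhead(prompt_tokens: int, extra_tokens: int) -> float:
    """Relative increase in attention pairs from appending extra_tokens."""
    base = attention_pairs(prompt_tokens)
    return (attention_pairs(prompt_tokens + extra_tokens) - base) / base


# Hypothetical prompt lengths; 4 extra tokens stands in for "please" + "thank you".
for n in (50, 2_000, 100_000):
    print(f"{n:>7} prompt tokens: +{marginal_overhead(n, 4):.4%} attention work")
```

The relative overhead shrinks as the prompt grows, which is the sense in which courtesy is a rounding error next to sequence length.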

Tyler@rezoundous·20 Apr

Stop saying “please” and “thank you” to AI. Save the GPUs.

Sameer goel@sameer_goel·21 Apr

@sflorimm Human–AI integration: tighter loops, possibly through interfaces like brain-computer links, where the boundary between using AI and thinking with it blurs.

Floro S.@sflorimm·21 Apr

What will come after AI?

Sameer goel@sameer_goel·21 Apr

@chribjel Great, so now we’re optimizing LLM costs by inventing employees again. Full circle innovation.

Christoffer Bjelke@chribjel·21 Apr

We hired a junior developer to write the simple code, so we don't have to spend a ton of money on tokens for those basic/primitive tasks

Sameer goel@sameer_goel·9 Apr

Got my stipend today. Not much yet but working to make it bigger.

Sameer goel@sameer_goel·31 Mar

If quantum breaks Bitcoin, it does not instantly nuke the entire internet.
• HTTPS doesn’t rely on a single primitive
• Systems can rotate keys, switch algorithms, and deploy post-quantum crypto
• Most infrastructure is upgradeable; it’s not a one-shot collapse
Also:
• Bitcoin specifically relies on exposed public keys → easier target surface
• Many banking systems don’t expose keys the same way
• Military / critical systems already plan for crypto migration
So no, it’s not: “quantum = instant apocalypse”
It’s: gradual break → patch → migrate → repeat

Quinten | 048.eth@QuintenFrancois·31 Mar

If quantum “kills” Bitcoin, it also kills:
• The global banking system
• SWIFT transfers
• Stock exchanges
• Military communications
• Nuclear command systems
• Every HTTPS website on earth
If Bitcoin is dead from quantum, your portfolio is the least of your problems.

Sameer goel@sameer_goel·31 Mar

What’s even more interesting is that this turns optimization into a runtime property, not a training artifact: the model isn’t getting smarter, the system design is. Given a fixed set of weights, it’s effectively performing online meta-optimization over its own execution graph, tuning prompts, tool policies, memory structures, and control flow in a closed loop. At that point, the real question isn’t performance, it’s stability:
• Does the harness converge or drift?
• Can it overfit to its own eval loop?
• What prevents degenerate self-reinforcing behaviors?
Because once agents can rewrite their own scaffolding, you’re no longer optimizing outputs; you’re optimizing the process that generates them.

Akshay 🚀@akshay_pachaar·31 Mar

The first AI that improves without retraining. (it rewrites its own agent harness)
Every developer I know has one thing in common: they obsess over their setup. The terminal, the scripts, the shortcuts. They don't just write code. They constantly refine how they work. The code gets better because the environment gets better.
MiniMax just released M2.7, and I think the most interesting thing about it isn't a benchmark number. It's the fact that M2.7 improves its own agent harness. Autonomously.
Let's break this down:
When you run an AI agent today, it operates inside a "harness." Think of it as the agent's operating environment: the skills it can invoke, the tools it can call, its memory, and the rules it follows. Normally, a human engineer builds this harness, and the agent operates within it. The harness stays fixed. M2.7 treats its harness as something it can rewrite.
Here's what the loop looks like:
- The agent runs a task and analyzes where things went wrong
- It plans changes to its own scaffold: skills, MCPs, memory
- It applies those changes, runs evaluations against a benchmark
- It compares the results and decides whether to keep or revert
- It writes self-criticism into memory so the next round starts smarter
Then it loops back and does it again. And again.
Think of it like a developer who finishes a project, writes a retrospective, restructures their workflow based on what they learned, and shows up the next day with a better setup. Except the developer here is the model itself.
MiniMax ran this self-optimization loop for over 100 rounds internally. Along the way, the model discovered things on its own: it systematically searched for optimal sampling parameters (temperature, penalties), wrote workflow-specific guidelines for itself (like automatically checking for the same bug pattern in other files after a fix), and even added loop detection to avoid getting stuck. No human had to tell it to do any of this.
They also tested this in a more controlled setting. They had M2.7 compete in 22 ML competitions from OpenAI's MLE Bench Lite. Each trial ran for 24 hours, fully autonomous. After each iteration, the agent wrote a memory file and performed self-criticism, feeding those insights into the next round. With every round, the ML models it trained achieved higher medal rates. The best run earned 9 gold medals.
I've summarized the self-evolving architecture in the graphic below.
The reason I find this compelling: this isn't about making a smarter model. It's about making a model that makes itself smarter. The weights never change. What changes is the system around it: better skills, better memory, better workflow rules. And that distinction matters because it means the improvement loop can run continuously without any retraining.
We're entering a phase where agents don't just follow instructions. They redesign their own playbook.
If you want to learn more, I've shared a link to their official blog post in the next tweet.
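The propose / evaluate / keep-or-revert loop described above can be sketched as a toy. This is not MiniMax's implementation: the harness is reduced to a single hypothetical `temperature` setting, and `self_improve`, `toy_propose`, and `toy_evaluate` are illustrative names with a made-up scoring function.

```python
import random
from typing import Callable, Dict, List, Tuple

Harness = Dict[str, float]


def self_improve(
    harness: Harness,
    propose: Callable[[Harness, List[str]], Harness],
    evaluate: Callable[[Harness], float],
    rounds: int,
) -> Tuple[Harness, float, List[str]]:
    """Run a keep-or-revert loop over harness changes; the model weights never appear."""
    best_score = evaluate(harness)
    notes: List[str] = []                      # stands in for the agent's memory file
    for r in range(rounds):
        candidate = propose(harness, notes)    # plan and apply a change to the scaffold
        score = evaluate(candidate)            # run the benchmark on the changed harness
        if score > best_score:                 # keep the change only if it helped
            harness, best_score = candidate, score
            notes.append(f"round {r}: kept change, score {score:.3f}")
        else:                                  # otherwise revert by discarding it
            notes.append(f"round {r}: reverted change, score {score:.3f}")
    return harness, best_score, notes


# Toy setup (assumed): the benchmark score peaks at temperature 0.3.
rng = random.Random(0)


def toy_evaluate(h: Harness) -> float:
    return -(h["temperature"] - 0.3) ** 2


def toy_propose(h: Harness, notes: List[str]) -> Harness:
    candidate = dict(h)
    candidate["temperature"] += rng.uniform(-0.2, 0.2)
    return candidate


start = {"temperature": 1.0}
final, score, notes = self_improve(start, toy_propose, toy_evaluate, rounds=100)
```

Because a change is only kept when the benchmark score improves, the loop can never score worse than where it started, but it can overfit to that benchmark.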

Sameer goel@sameer_goel·31 Mar

2 questions:
• What is the latency improvement from removing NMS compared to total inference time on GPU?
• When an image is passed through a traditional YOLO model, at what stage is NMS applied, and how does it process multiple overlapping bounding box predictions to produce the final detections?
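On the second question: in a traditional YOLO pipeline, NMS usually runs as a post-processing step, after the network's forward pass and confidence filtering, rather than inside the network. A minimal sketch of greedy NMS in plain Python, assuming boxes are `(x1, y1, x2, y2)` tuples and a hypothetical IoU threshold of 0.5. Production pipelines typically use a vectorized, per-class variant.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving detections
```

The sequential, data-dependent loop is part of why this step is awkward to batch efficiently on a GPU.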

Akshay 🚀@akshay_pachaar·28 Feb

Real-time object detection will never be the same. Traditional YOLO needs NMS to remove duplicate boxes; it's slow and inconsistent. YOLO26 skips it entirely: single-pass predictions, faster inference and up to 300 detections per image. Download model: platform.ultralytics.com/ultralytics/yo…

Akshay 🚀@akshay_pachaar

x.com/i/article/2025…

Sameer goel@sameer_goel·31 Mar

@tom_doerr How would you handle noisy raster PDFs (scanned images) vs vector PDFs in the same pipeline?

Tom Dörr@tom_doerr·31 Mar

GPU accelerated 3D data processing in Python and C++ github.com/isl-org/Open3D

Sameer goel@sameer_goel·31 Mar

@GergelyOrosz What’s interesting is not the leak itself, but that a from-scratch Python reimplementation is reportedly matching or surpassing the original from Anthropic.

Gergely Orosz@GergelyOrosz·31 Mar

This is either brilliant or scary: Anthropic accidentally leaked the TS source code of Claude Code (which is closed source). Repos sharing the source are taken down with DMCA. BUT this repo rewrote the code using Python, and so it violates no copyright & cannot be taken down!

Sameer goel@sameer_goel·31 Mar

The interesting part isn’t that Meta is optimizing harnesses; it’s that they’re trying to solve the credit assignment mess we’ve all been ignoring. Because in practice:
• a change in prompt/tooling today
• affects eval scores hours or days later
• across multiple tasks and traces
and nobody really knows which change actually mattered.
Meta-Harness feels like the inevitable direction: treat the entire harness (prompts, tools, routing, evals) as one optimization surface, instead of patching pieces blindly.
If this works, “prompt engineering” stops being artisanal tweaking and starts looking more like gradient descent over workflows.

Yoonho Lee@yoonholeee·30 Mar

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

Sameer goel@sameer_goel·31 Mar

Meta just dropped the Efficient Universal Perception Encoder on Hugging Face. Curious how it plays with VLM-style fine-tuning. Like:
• does it take LoRAs cleanly across both vision + language bridges
• how well multi-teacher distilled features adapt under low-rank updates
• whether you can stack lightweight adapters instead of retraining heads

DailyPapers@HuggingPapers·30 Mar

Meta just released the Efficient Universal Perception Encoder on Hugging Face.
A vision backbone for edge devices that unifies image understanding, vision-language modeling, and dense prediction via multi-teacher distillation.

Sameer goel@sameer_goel·31 Mar

The inevitable part isn’t the attack; it’s the speed. AI is already writing code, reviewing PRs, and publishing packages, so of course it’s also accelerating supply chain attacks. The window between malicious publish and production impact is collapsing to minutes. Which basically forces a new equilibrium: AI attackers vs AI defenders, both operating faster than humans can even context-switch. The takeaway isn’t “nice catch”; it’s that manual review as a security layer is quietly becoming irrelevant.

Scott Wu@ScottWu46·31 Mar

Devin Review caught the axios supply chain attack for multiple Cognition customers before the attack was publicly known. These attacks will be 10x more frequent in the age of AI; it is critical that repo maintainers start using AI for defense as well. (showing one example below where Devin Review caught the attack within an hour of its release - text minorly edited for anonymization)

Sameer goel@sameer_goel·31 Mar

@amaan8429 And I thought my app.tsx was bad

Amaan@amaan8429·31 Mar

Claude code main.tsx file is 4600 lines long ☠️☠️

Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

Sameer goel@sameer_goel·31 Mar

RF-DETR is the best open-source detector for aerial and drone footage… but now I’m just wondering how it handles:
• a half-visible bike under a tree shadow at 200m altitude
• 3 pixels pretending to be a human
• cars that are basically just vibes + motion blur
• and that one guy wearing camouflage who is literally the background

SkalskiP@skalskip92·30 Mar

RF-DETR is the best open-source detector for aerial and drone footage link: github.com/roboflow/rf-de…

Sameer goel@sameer_goel·31 Mar

Claude rolling out “enterprise-grade security”…
Meanwhile:
attack vector: view-source
impact: full source disclosure
severity: politely labeled “oops”
Somewhere in a security report: Threat model included prompt injection, jailbreaks, adversarial attacks… Did not include “someone opens the .map file”
Red team:
“Did you exfiltrate weights?” “No.”
“Find a jailbreak?” “No.”
“…so what did you do?” “I clicked a CDN link.”
Zero-day exploits ❌
Zero-click curiosity exploit ✅

Chaofan Shou@Fried_rice·31 Mar

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

Sameer goel@sameer_goel·31 Mar

Non-technical client: We need real-time video processing.
Me: We should use NVIDIA…
Non-technical client: Let’s use TPUs to train and deploy our system; they’re free on Google Colab.
Me: Yeah, but you won’t be able to host it on-prem.
Non-technical client: No, we can. We’ll just buy the TPUs.
Me: Nice.

Sameer goel@sameer_goel·30 Mar

Not that surprising if you look under the hood. MoE ≠ full model capacity per token. You’re routing through a few experts, not the entire network. So in tasks like UI recreation:
• layout consistency
• spatial reasoning
• deterministic structure
a smaller dense model can actually be more stable.
Bigger helps with breadth. Not always with precision.
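The "routing through a few experts" point can be illustrated with a toy top-k router. This is a generic sketch of the technique, not the routing used by any specific model mentioned here: the experts are stand-in scalar functions, and `top_k_route` and `moe_forward` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]  # shifted for numerical stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    router_logits: Sequence[float],
    k: int,
) -> float:
    """Only the k selected experts run; the rest contribute nothing for this token."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_logits, k))


# Toy experts (assumed): four scalar functions standing in for expert networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
logits = [0.0, 5.0, 5.0, -1.0]          # hypothetical router scores for one token
routes = top_k_route(logits, k=2)        # experts 1 and 2, each weighted 0.5
output = moe_forward(2.0, experts, logits, k=2)
```

With `k=2` of 4 experts, half of the expert parameters are never touched for this token, which is the gap between total and active capacity the post refers to.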

0xMarioNawfal@RoundtableSpace·29 Mar

Someone tested all three Qwen3.5 models on the same UI recreation task same ss, same prompt - 27B dense vs 35B-A3B MoE vs 122B-A10B MoE - Task: recreate 4 UI components from a screenshot BIGGER DOESN'T ALWAYS MEAN BETTER. THE RESULTS MIGHT SURPRISE YOU

Sameer goel@sameer_goel·30 Mar

@heynavtoor 397B parameters… running on a MacBook. At this point the laptop isn’t “running a model” — it’s just politely pretending not to be a data center. Cloud GPU startups watching this like: “yeah… but can your SSD scale horizontally?”

Nav Toor@heynavtoor·30 Mar

🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop.
Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store.
It's called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a MacBook Pro with 48GB RAM. At 4.4 tokens per second. With tool calling. No Python. No PyTorch. No frameworks. Just raw C and hand-tuned Metal shaders.
Here's why this should not be possible:
→ The model is 209GB. The laptop has 48GB of RAM.
→ It streams the entire model from the SSD in real time
→ Only loads the 4 experts needed per token out of 512
→ Uses just 5.5GB of actual memory during inference
→ Production-quality output with full tool calling
→ 58 experiments. Hand-optimized Metal compute kernels.
→ The entire engine is ~7,000 lines of C and ~1,200 lines of Metal shaders
Here's the wildest part: One person built this. A VP of AI at CVS Health. Not Google. Not OpenAI. A healthcare company executive. Side project. Used Claude Code as his coding partner. Built the entire engine in 24 hours.
Running a 397B model on cloud GPUs costs hundreds of dollars per hour. Companies spend millions per year on inference infrastructure for models this size.
This runs on a $3,499 laptop. Offline. Private. No API key. No monthly bill. Forever.
Trending on GitHub. 332 points on Hacker News. 100% Open Source.

Sameer goel@sameer_goel·30 Mar

@hxxwhite No, they aren’t. Manual QA doesn’t die; it gets pushed up the stack. Vision agents can click buttons. They can’t:
• understand intent behind features
• catch subtle UX regressions
• question why something exists
What dies is repetitive QA. What survives is judgment.

hayden@hxxwhite·30 Mar

Manual QA is dead.
Vision-based agents can now test mobile apps the way a human would, but at 100x the scale, at 1/100th the price.
Engineering velocity has nowhere to go but vertical.
