Sameer goel

118 posts

@sameer_goel

Computer Vision Engineer. Fixing broken CV deployments (ONNX, opsets, latency). PyTorch → real-time inference systems. Production-first.

India · Joined February 2026
46 Following · 38 Followers
Sameer goel
Sameer goel@sameer_goel·
JWT is stateless, meaning the server doesn’t store the token, so there’s nothing to delete. The token stays valid on the client until it expires. To actually log a user out, either blacklist the token and check that blacklist on each request, or keep access tokens short-lived and revoke the refresh token. Without that, the token keeps working.
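The blacklist option can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes the token's signature has already been verified, that each token carries the standard `jti` (token ID) and `exp` (expiry) claims, and it uses an in-memory dict where a real system would more likely use a shared store. `TokenDenylist` and `is_request_allowed` are hypothetical names.

```python
import time


class TokenDenylist:
    """In-memory set of revoked token IDs (hypothetical helper)."""

    def __init__(self):
        self._revoked = {}  # jti -> expiry timestamp of the revoked token

    def revoke(self, jti, exp):
        """Mark a token as logged out until it would have expired anyway."""
        self._revoked[jti] = exp

    def is_revoked(self, jti, now=None):
        now = time.time() if now is None else now
        # Expired tokens are rejected by the exp check, so their entries can go.
        self._revoked = {j: e for j, e in self._revoked.items() if e > now}
        return jti in self._revoked


def is_request_allowed(claims, denylist, now=None):
    """Check run on every request, after signature verification (assumed)."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        return False  # token has expired on its own
    return not denylist.is_revoked(claims["jti"], now)
```

Calling `denylist.revoke(claims["jti"], claims["exp"])` at logout makes the next `is_request_allowed` check fail immediately, at the cost of one lookup per request. That lookup is exactly the server-side state a pure JWT setup avoids.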
0
0
0
350
Suni
Suni@suni_code·
Interviewer: If JWT tokens are stateless, how can a system log a user out immediately?
17
2
72
22K
Sameer goel
Sameer goel@sameer_goel·
“Please” and “thank you” just tokenize into a handful of extra subword units, literally a few more integers in the sequence. The transformer still runs full attention over the entire context, and since attention cost grows roughly quadratically with sequence length, the marginal compute from politeness is noise compared to long prompts, chain-of-thought, or giant context windows. If you’re optimizing GPU usage, those words are a rounding error; the real cost is in sequence length and repeated tokens, not courtesy.
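A rough back-of-the-envelope check of that claim, using the number of token pairs in full self-attention as a cost proxy. The prompt length and the politeness token count are made-up illustrative values, and the proxy ignores the parts of the model whose cost grows only linearly with sequence length.

```python
def attention_pairs(n_tokens):
    """Token-pair count for full self-attention over n tokens (rough cost proxy)."""
    return n_tokens * n_tokens


def extra_pairs(n_tokens, added_tokens):
    """Additional token pairs caused by appending `added_tokens` to the sequence."""
    return attention_pairs(n_tokens + added_tokens) - attention_pairs(n_tokens)


prompt_len = 2000       # hypothetical prompt length in tokens
politeness_tokens = 4   # hypothetical token count for "please" + "thank you"

polite_cost = extra_pairs(prompt_len, politeness_tokens)   # 16,016 extra pairs
doubling_cost = extra_pairs(prompt_len, prompt_len)        # 12,000,000 extra pairs
relative = polite_cost / attention_pairs(prompt_len)       # about 0.4% of the baseline

print(polite_cost, doubling_cost, round(relative, 6))
```

Under these assumptions the polite tokens add about 0.4% to the attention cost, while doubling the prompt triples it.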
0
0
4
359
Tyler
Tyler@rezoundous·
Stop saying “please” and “thank you” to AI. Save the GPUs.
611
56
697
75.4K
Sameer goel
Sameer goel@sameer_goel·
@sflorimm Human–AI integration: tighter loops, possibly through interfaces like brain-computer links, where the boundary between using AI and thinking with it blurs.
0
0
0
163
Floro S.
Floro S.@sflorimm·
What will come after AI?
2.4K
82
1.3K
253.9K
Sameer goel
Sameer goel@sameer_goel·
@chribjel Great, so now we’re optimizing LLM costs by inventing employees again. Full circle innovation.
2
31
1.6K
53.8K
Christoffer Bjelke
Christoffer Bjelke@chribjel·
We hired a junior developer to write the simple code, so we don't have to spend a ton of money on tokens for those basic/primitive tasks
243
469
12.6K
1M
Sameer goel
Sameer goel@sameer_goel·
Got my stipend today. Not much yet but working to make it bigger.
Sameer goel tweet media
0
0
1
394
Sameer goel
Sameer goel@sameer_goel·
If quantum breaks Bitcoin, it does not instantly nuke the entire internet.
• HTTPS doesn’t rely on a single primitive
• Systems can rotate keys, switch algorithms, and deploy post-quantum crypto
• Most infrastructure is upgradeable; it’s not a one-shot collapse
Also:
• Bitcoin specifically relies on exposed public keys → easier target surface
• Many banking systems don’t expose keys the same way
• Military / critical systems already plan for crypto migration
So no, it’s not: “quantum = instant apocalypse”
It’s: gradual break → patch → migrate → repeat
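The "switch algorithms" point is usually made possible by tagging every signature or MAC with an algorithm version, so old and new schemes can coexist during a migration. The sketch below shows only that versioning pattern. It uses HMAC with SHA-256 and SHA-512 from the Python standard library as stand-ins, which are not post-quantum signature schemes, and the `ALGORITHMS` registry and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry: version tag -> hash function used for the MAC.
ALGORITHMS = {"v1": hashlib.sha256, "v2": hashlib.sha512}
CURRENT_VERSION = "v2"


def sign(key, message, version=CURRENT_VERSION):
    """Return a tag of the form '<version>:<hex digest>'."""
    digest = hmac.new(key, message, ALGORITHMS[version]).hexdigest()
    return f"{version}:{digest}"


def verify(key, message, tag):
    """Verify a tag with whichever algorithm its version prefix names."""
    version, _, digest = tag.partition(":")
    if version not in ALGORITHMS:
        return False  # unknown or retired algorithm
    expected = hmac.new(key, message, ALGORITHMS[version]).hexdigest()
    return hmac.compare_digest(expected, digest)


def needs_upgrade(tag):
    """True if the tag was produced with an older algorithm version."""
    return not tag.startswith(CURRENT_VERSION + ":")
```

Retiring a broken algorithm then means removing its entry from the registry once `needs_upgrade` no longer fires for stored tags, which is the "patch → migrate" step in miniature.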
1
0
4
1K
Quinten | 048.eth
Quinten | 048.eth@QuintenFrancois·
If quantum “kills” Bitcoin, it also kills:
• The global banking system
• SWIFT transfers
• Stock exchanges
• Military communications
• Nuclear command systems
• Every HTTPS website on earth
If Bitcoin is dead from quantum, your portfolio is the least of your problems.
802
1.7K
13.7K
756.3K
Sameer goel
Sameer goel@sameer_goel·
What’s even more interesting is that this turns optimization into a runtime property, not a training artifact: the model isn’t getting smarter, the system design is. Given a fixed set of weights, it’s effectively performing online meta-optimization over its own execution graph, tuning prompts, tool policies, memory structures, and control flow in a closed loop.
At that point, the real question isn’t performance, it’s stability:
• Does the harness converge or drift?
• Can it overfit to its own eval loop?
• What prevents degenerate self-reinforcing behaviors?
Because once agents can rewrite their own scaffolding, you’re no longer optimizing outputs, you’re optimizing the process that generates them.
0
0
0
51
Akshay 🚀
Akshay 🚀@akshay_pachaar·
The first AI that improves without retraining. (it rewrites its own agent harness)
Every developer I know has one thing in common: they obsess over their setup. The terminal, the scripts, the shortcuts. They don't just write code. They constantly refine how they work. The code gets better because the environment gets better.
MiniMax just released M2.7, and I think the most interesting thing about it isn't a benchmark number. It's the fact that M2.7 improves its own agent harness. Autonomously.
Let's break this down:
When you run an AI agent today, it operates inside a "harness." Think of it as the agent's operating environment: the skills it can invoke, the tools it can call, its memory, and the rules it follows. Normally, a human engineer builds this harness, and the agent operates within it. The harness stays fixed.
M2.7 treats its harness as something it can rewrite. Here's what the loop looks like:
- The agent runs a task and analyzes where things went wrong
- It plans changes to its own scaffold: skills, MCPs, memory
- It applies those changes, runs evaluations against a benchmark
- It compares the results and decides whether to keep or revert
- It writes self-criticism into memory so the next round starts smarter
Then it loops back and does it again. And again.
Think of it like a developer who finishes a project, writes a retrospective, restructures their workflow based on what they learned, and shows up the next day with a better setup. Except the developer here is the model itself.
MiniMax ran this self-optimization loop for over 100 rounds internally. Along the way, the model discovered things on its own: it systematically searched for optimal sampling parameters (temperature, penalties), wrote workflow-specific guidelines for itself (like automatically checking for the same bug pattern in other files after a fix), and even added loop detection to avoid getting stuck. No human had to tell it to do any of this.
They also tested this in a more controlled setting. They had M2.7 compete in 22 ML competitions from OpenAI's MLE Bench Lite. Each trial ran for 24 hours, fully autonomous. After each iteration, the agent wrote a memory file and performed self-criticism, feeding those insights into the next round. With every round, the ML models it trained achieved higher medal rates. The best run earned 9 gold medals.
I've summarized the self-evolving architecture in the graphic below.
The reason I find this compelling: this isn't about making a smarter model. It's about making a model that makes itself smarter. The weights never change. What changes is the system around it: better skills, better memory, better workflow rules. And that distinction matters because it means the improvement loop can run continuously without any retraining.
We're entering a phase where agents don't just follow instructions. They redesign their own playbook.
If you want to learn more, I've shared a link to their official blog post in the next tweet.
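The keep-or-revert loop described in this post can be sketched as a toy hill climb. This is not MiniMax's implementation: the harness is reduced to a single `temperature` setting, `evaluate` is a made-up score that peaks at 0.3 in place of a real benchmark, and every function name is hypothetical.

```python
import random


def evaluate(harness):
    """Toy stand-in for a benchmark score; peaks at temperature 0.3 (assumption)."""
    return 1.0 - abs(harness["temperature"] - 0.3)


def propose_change(harness, rng):
    """Plan a change: return a modified copy, leaving the current harness intact."""
    candidate = dict(harness)
    nudged = harness["temperature"] + rng.uniform(-0.2, 0.2)
    candidate["temperature"] = min(1.0, max(0.0, nudged))
    return candidate


def self_optimize(harness, rounds, seed=0):
    """Run propose -> evaluate -> keep-or-revert, logging each round to memory."""
    rng = random.Random(seed)
    best_score = evaluate(harness)
    memory = []  # stands in for the self-criticism notes
    for i in range(rounds):
        candidate = propose_change(harness, rng)
        score = evaluate(candidate)
        kept = score > best_score
        if kept:
            harness, best_score = candidate, score  # keep the change
        # otherwise the candidate is dropped, which is the "revert" branch
        memory.append({"round": i, "score": score, "kept": kept})
    return harness, best_score, memory
```

Because a change is kept only when the score improves, the score never decreases across rounds. It also means the loop optimizes whatever `evaluate` measures, so a harness tuned this way can end up fitted to its own benchmark.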
GIF
17
20
106
8.3K
Sameer goel
Sameer goel@sameer_goel·
Two questions:
1. What is the latency improvement from removing NMS, relative to total inference time on GPU?
2. When an image is passed through a traditional YOLO model, at what stage is NMS applied, and how does it process multiple overlapping bounding box predictions to produce the final detections?
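For context on the second question, greedy NMS as it is commonly described runs as a post-processing step on the decoded boxes: sort by confidence, keep the best box, and drop any remaining box that overlaps a kept one too much. The pure-Python sketch below is class-agnostic and illustrative only; the 0.5 IoU threshold is an assumed default, and real pipelines typically use a vectorized or GPU implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

The pairwise comparison loop is the part that NMS-free detectors remove. How much latency that saves depends on the number of candidate boxes and on the implementation.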
0
0
0
41
Sameer goel
Sameer goel@sameer_goel·
@tom_doerr How would you handle noisy raster PDFs (scanned images) vs vector PDFs in the same pipeline?
0
0
0
94
Sameer goel
Sameer goel@sameer_goel·
@GergelyOrosz What’s interesting is not the leak itself, but that a from-scratch Python reimplementation is reportedly matching or surpassing the original from Anthropic.
0
0
3
1.3K
Gergely Orosz
Gergely Orosz@GergelyOrosz·
This is either brilliant or scary: Anthropic accidentally leaked the TS source code of Claude Code (which is closed source). Repos sharing the source are taken down with DMCA. BUT this repo rewrote the code using Python, and so it violates no copyright & cannot be taken down!
Gergely Orosz tweet media
442
1.2K
12.9K
2.2M
Sameer goel
Sameer goel@sameer_goel·
The interesting part isn’t that Meta is optimizing harnesses; it’s that they’re trying to solve the credit assignment mess we’ve all been ignoring.
Because in practice:
• a change in prompt/tooling today
• affects eval scores hours or days later
• across multiple tasks and traces
and nobody really knows which change actually mattered.
Meta-Harness feels like the inevitable direction: treat the entire harness (prompts, tools, routing, evals) as one optimization surface, instead of patching pieces blindly.
If this works, “prompt engineering” stops being artisanal tweaking and starts looking more like gradient descent over workflows.
0
0
0
255
Yoonho Lee
Yoonho Lee@yoonholeee·
How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
Yoonho Lee tweet media
78
283
1.7K
572.1K
Sameer goel
Sameer goel@sameer_goel·
Meta just dropped the Efficient Universal Perception Encoder on Hugging Face. Curious how it plays with VLM-style fine-tuning. Like:
• does it take LoRAs cleanly across both vision + language bridges
• how well multi-teacher distilled features adapt under low-rank updates
• whether you can stack lightweight adapters instead of retraining heads
0
0
0
333
DailyPapers
DailyPapers@HuggingPapers·
Meta just released the Efficient Universal Perception Encoder on Hugging Face A vision backbone for edge devices that unifies image understanding, vision-language modeling, and dense prediction via multi-teacher distillation.
DailyPapers tweet media
8
29
223
24.4K
Sameer goel
Sameer goel@sameer_goel·
The inevitable part isn’t the attack; it’s the speed. AI is already writing code, reviewing PRs, and publishing packages, so of course it’s also accelerating supply chain attacks. The window between malicious publish and production impact is collapsing to minutes. That basically forces a new equilibrium: AI attackers vs AI defenders, both operating faster than humans can even context-switch. The takeaway isn’t “nice catch”; it’s that manual review as a security layer is quietly becoming irrelevant.
0
0
0
889
Scott Wu
Scott Wu@ScottWu46·
Devin Review caught the axios supply chain attack for multiple Cognition customers before the attack was publicly known. These attacks will be 10x more frequent in the age of AI; it is critical that repo maintainers start using AI for defense as well. (showing one example below where Devin Review caught the attack within an hour of its release - text minorly edited for anonymization)
Scott Wu tweet media
95
146
1.7K
326.8K
Sameer goel
Sameer goel@sameer_goel·
RF-DETR is the best open-source detector for aerial and drone footage… but now I’m just wondering how it handles:
• a half-visible bike under a tree shadow at 200m altitude
• 3 pixels pretending to be a human
• cars that are basically just vibes + motion blur
• and that one guy wearing camouflage who is literally the background
0
0
0
67
Sameer goel
Sameer goel@sameer_goel·
Claude rolling out “enterprise-grade security”…
Meanwhile:
attack vector: view-source
impact: full source disclosure
severity: politely labeled “oops”
Somewhere in a security report: Threat model included prompt injection, jailbreaks, adversarial attacks… Did not include “someone opens the .map file”
Red team:
“Did you exfiltrate weights?” “No.”
“Find a jailbreak?” “No.”
“…so what did you do?” “I clicked a CDN link.”
Zero-day exploits ❌
Zero-click curiosity exploit ✅
0
0
0
3.7K
Sameer goel
Sameer goel@sameer_goel·
Non-technical client: We need real-time video processing.
Me: We should use NVIDIA…
Non-technical client: Let’s use TPUs to train and deploy our system, they are free on Google Colab.
Me: Yeah, but you won’t be able to host it on-prem.
Non-technical client: No, we can. We will just buy the TPUs.
Me: Nice.
0
0
1
290
Sameer goel
Sameer goel@sameer_goel·
Not that surprising if you look under the hood. MoE ≠ full model capacity per token. You’re routing through a few experts, not the entire network.
So in tasks like UI recreation:
• layout consistency
• spatial reasoning
• deterministic structure
a smaller dense model can actually be more stable.
Bigger helps with breadth. Not always with precision.
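A minimal sketch of what "routing through a few experts" means in practice: a gate scores every expert for a token, only the top-k are run, and their outputs are mixed with renormalized weights. The gate logits below are made up, and the parameter counts read a name like "35B-A3B" as roughly 35B total parameters with about 3B active per token, which is the usual naming convention rather than something stated in the thread.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def active_fraction(total_params_b, active_params_b):
    """Share of the model's parameters that a single token actually uses."""
    return active_params_b / total_params_b


routes = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)  # hypothetical gate scores
print(routes)
print(active_fraction(35, 3))    # 35B total, ~3B active per token
print(active_fraction(122, 10))  # 122B total, ~10B active per token
```

Under that reading, each token in the two MoE models passes through fewer parameters than a token in the 27B dense model, which is the comparison the tweet is pointing at.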
0
0
0
43
0xMarioNawfal
0xMarioNawfal@RoundtableSpace·
Someone tested all three Qwen3.5 models on the same UI recreation task same ss, same prompt - 27B dense vs 35B-A3B MoE vs 122B-A10B MoE - Task: recreate 4 UI components from a screenshot BIGGER DOESN'T ALWAYS MEAN BETTER. THE RESULTS MIGHT SURPRISE YOU
25
18
146
64.7K
Sameer goel
Sameer goel@sameer_goel·
@heynavtoor 397B parameters… running on a MacBook. At this point the laptop isn’t “running a model” — it’s just politely pretending not to be a data center. Cloud GPU startups watching this like: “yeah… but can your SSD scale horizontally?”
0
0
0
805
Nav Toor
Nav Toor@heynavtoor·
🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop.
Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store.
It's called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a MacBook Pro with 48GB RAM. At 4.4 tokens per second. With tool calling. No Python. No PyTorch. No frameworks. Just raw C and hand-tuned Metal shaders.
Here's why this should not be possible:
→ The model is 209GB. The laptop has 48GB of RAM.
→ It streams the entire model from the SSD in real time
→ Only loads the 4 experts needed per token out of 512
→ Uses just 5.5GB of actual memory during inference
→ Production-quality output with full tool calling
→ 58 experiments. Hand-optimized Metal compute kernels.
→ The entire engine is ~7,000 lines of C and ~1,200 lines of Metal shaders
Here's the wildest part: One person built this. A VP of AI at CVS Health. Not Google. Not OpenAI. A healthcare company executive. Side project. Used Claude Code as his coding partner. Built the entire engine in 24 hours.
Running a 397B model on cloud GPUs costs hundreds of dollars per hour. Companies spend millions per year on inference infrastructure for models this size.
This runs on a $3,499 laptop. Offline. Private. No API key. No monthly bill. Forever.
Trending on GitHub. 332 points on Hacker News. 100% Open Source.
Nav Toor tweet media
115
339
2.6K
207.1K
Sameer goel
Sameer goel@sameer_goel·
@hxxwhite No, they aren’t. Manual QA doesn’t die; it gets pushed up the stack.
Vision agents can click buttons. They can’t:
• understand intent behind features
• catch subtle UX regressions
• question why something exists
What dies is repetitive QA. What survives is judgment.
1
1
11
2K
hayden
hayden@hxxwhite·
Manual QA is dead. Vision-based agents can now test mobile apps the way a human would, but at 100x the scale, at 1/100th the price. Engineering velocity has nowhere to go but vertical.
25
36
491
70.5K