Stats Wire

193 posts

@StatsWire

Simplifying AI, ML & Data Science. Threads, visuals & insights on LLMs, Deep Learning & Generative AI. YouTube channel: StatsWire. Instagram: stats_wire

India · Joined November 2020

83 Following · 46 Followers

Stats Wire@StatsWire·9h

The most interesting idea here isn't visual retrieval. It's treating the screenshot as the ground truth. We've spent years optimizing: HTML → Text → Chunks → Embeddings Maybe the better approach is: Page → Pixels → Embeddings Especially for tables, dashboards, PDFs, charts, and documentation.
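A minimal sketch of the HTML → Text → Chunks part of that pipeline, to show where structure can get lost. The parser class, the chunk size, and the sample table are all made up for illustration; real parsers and chunkers are more careful, and the embedding step is left out.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Naive HTML-to-text step: keeps text nodes and discards every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk(text, size):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical page fragment: a two-row results table.
PAGE = (
    "<table>"
    "<tr><th>Model</th><th>Score</th></tr>"
    "<tr><td>A</td><td>91</td></tr>"
    "<tr><td>B</td><td>78</td></tr>"
    "</table>"
)

text = html_to_text(PAGE)
chunks = chunk(text, 4)
print(text)    # rows and columns are now one flat string
print(chunks)  # the second chunk no longer carries the column headers
```

Once the table is flattened, a chunk like "B 78" has no way to say that 78 is a score. A screenshot of the same page keeps that layout, which is the appeal of the pixel route.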

Akshay 🚀@akshay_pachaar·10h

Web scraping will never be the same. (100% open-source visual search at scale) PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels. Why that matters: parsing is where web RAG quietly loses information. - A single HTML-to-text parser can drop 40%+ of a page. - Tables, charts, and layout get flattened or thrown out. - Swapping parsers alone can move accuracy ~10 points on the same docs. PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA. The repo also ships a Claude Code plugin that gives Claude eyes. It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like. One setup script. No MCP server, no backend. How the pipeline works: - Renders each document (web, PDF, image) to image tiles. - Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots. - Builds a FAISS index and serves a search API. A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels. Everything is open-source under Apache-2.0. GitHub repo: github.com/StarTrail-org/… Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x. The article is quoted below.

Akshay 🚀@akshay_pachaar

x.com/i/article/2052…

Stats Wire@StatsWire·10h

In 2023: AI wrote text. In 2024: AI wrote code. In 2025: AI started using tools. In 2026: AI is becoming a digital employee. Most people haven't realized how fast this transition is happening. What job do you think AI will automate first?

Stats Wire@StatsWire·10h

The most important AI concept isn't ChatGPT. It's embeddings. Why? Because embeddings allow AI to understand that: "car" 🚗 "vehicle" 🚙 "automobile" 🚘 are related even though they're different words. Instead of storing text as words, embeddings store meaning as numbers.
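A toy illustration of "meaning as numbers", assuming hand-picked 3-dimensional vectors. These values are invented for the example; a real embedding model learns vectors with hundreds or thousands of dimensions.

```python
import math

# Hypothetical embeddings, chosen by hand so related words point the same way.
EMBEDDINGS = {
    "car": [0.90, 0.10, 0.05],
    "vehicle": [0.85, 0.20, 0.10],
    "automobile": [0.88, 0.12, 0.07],
    "banana": [0.05, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return every other word, sorted from most to least similar to `word`."""
    target = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return sorted(
        others,
        key=lambda w: cosine_similarity(target, EMBEDDINGS[w]),
        reverse=True,
    )


print(most_similar("car"))
```

With these made-up vectors, "automobile" and "vehicle" rank far above "banana" even though none of the words share a single letter pattern with "car".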

Stats Wire@StatsWire·1d

@bindureddy It was never in the race. It's too far behind to come back. People are already addicted to OpenAI and Claude.

Bindu Reddy@bindureddy·1d

Google Gemini can make a comeback 🚀 Much like OpenAI did with GPT 5.5 and the upcoming GPT 5.6 They have the data, the compute and the talent

Stats Wire@StatsWire·1d

@bindureddy Gemini also needs to improve its interface. It feels clunky to use.

Stats Wire@StatsWire·1d

Learn how LangChain Agents use tools in Python to complete tasks. youtu.be/s5xmG6kkiZ8?si…

Stats Wire@StatsWire·1d

What is Semantic Search? Semantic Search is a search technique that finds information based on meaning, not just matching words. Traditional search asks: "Do these exact words exist in the document?" Semantic search asks: "Does this text mean the same thing as the user's question?" That's a huge difference.
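A small sketch of that difference, assuming a hand-written word-to-concept table in place of a real embedding model. The table, the documents, and the query are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for learned embeddings: words mapped to shared concepts.
CONCEPTS = {
    "fix": "repair", "repair": "repair",
    "car": "vehicle", "automobile": "vehicle",
    "motor": "engine", "engine": "engine",
    "banana": "fruit", "bread": "baking", "recipe": "baking",
}


def tokenize(text):
    return text.lower().split()


def keyword_match(query, doc):
    """Traditional search: do all of the query's exact words appear in the doc?"""
    doc_words = set(tokenize(doc))
    return all(word in doc_words for word in tokenize(query))


def concept_vector(text):
    """Count how often each concept appears in the text."""
    return Counter(CONCEPTS[w] for w in tokenize(text) if w in CONCEPTS)


def semantic_score(query, doc):
    """Cosine similarity between the concept counts of query and doc."""
    q, d = concept_vector(query), concept_vector(doc)
    dot = sum(q[c] * d[c] for c in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


DOCS = ["How to repair an automobile engine", "Best banana bread recipe"]
QUERY = "fix car motor"

print([keyword_match(QUERY, d) for d in DOCS])   # no exact-word hits
print([semantic_score(QUERY, d) for d in DOCS])  # the repair guide scores highest
```

The query shares no words with the repair guide, so keyword matching finds nothing, while the concept-level comparison still ranks it first.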

Stats Wire@StatsWire·1d

Everyone is watching OpenAI, Anthropic, and Google. Meanwhile, a Chinese AI model called #GLM-5.2 is quietly climbing the rankings. The real competition is no longer: ❌ Who has the smartest chatbot? It's: ✅ Who can build the best AI agent.

Stats Wire@StatsWire·1d

@bridgemindai Learn FastAPI youtube.com/playlist?list=…

BridgeMind@bridgemindai·1d

I just dropped a full video on vibe coding with loops. A loop is a recursive goal you define once. The agent works until a stop condition is met. No more prompting and waiting and prompting again. Right now I have loops running on Sentry errors in BridgeSpace. I set the goal, walk away, and come back to fewer production errors than when I started. This is the step beyond prompting. And it is bringing us one step closer to fully autonomous software development. Full video out now.
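The "define the goal once, run until a stop condition" idea can be sketched as a plain loop. Everything here is hypothetical: `run_loop`, `fix_one_error`, and the error list are illustrative stand-ins, not BridgeSpace or Sentry code, and the iteration budget is an added safeguard the post does not mention.

```python
def run_loop(state, step, goal_met, max_iterations=20):
    """Apply `step` to `state` until `goal_met(state)` is true or the budget runs out."""
    iterations = 0
    while not goal_met(state) and iterations < max_iterations:
        state = step(state)
        iterations += 1
    return state, iterations


def fix_one_error(open_errors):
    """Hypothetical agent step: pretend the oldest open error gets resolved."""
    return open_errors[1:]


errors = ["TypeError in checkout", "Timeout in search", "KeyError in profile"]
remaining, used = run_loop(errors, fix_one_error, goal_met=lambda e: len(e) == 0)
print(remaining, used)
```

The budget matters in practice: a step that never makes progress would otherwise keep the loop running forever.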

Stats Wire@StatsWire·1d

Agreed, and that's exactly the bet RULER and similar approaches are making: trade a single scalar for a richer, language-shaped signal that can encode nuance a binary or numeric reward can't. But richer feedback channels still need to be *robust* feedback channels; otherwise you've just traded "the reward function is too low-dimensional" for "the reward function can be talked into giving full marks." The dialogue-generation example above is a good illustration: moving to an LLM judge genuinely captured more nuance than a scalar metric ever could, but it also opened up a wider attack surface (verbosity, hedging, restating the prompt back) that a single number never had in the first place. So I'd frame it less as "richer feedback solves the problem" and more as "richer feedback changes what the problem is." You go from hand-engineering a narrow metric to engineering a judge (or rubric, or constitution) that's hard to flatter. Probably still the right trade for most agentic tasks, just not a free lunch.

Inflectiv AI ⧉@inflectivAI·1d

@akshay_pachaar Karpathy's point was that complex behavior can't always be captured by a single score. Richer feedback channels may be essential for building more capable agents.

Akshay 🚀@akshay_pachaar·1d

Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, Agents need a knowledge-guided review as a higher-dimensional feedback channel. Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes. RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training. I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition. You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: github.com/OpenPipe/ART (don't forget to star it ⭐ ) Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering. I wrote a full walkthrough on OpenPipe's ART, the agent RL trainer built on GRPO, including how RULER replaces manual reward engineering with automatic LLM-graded rewards. The article is quoted below.

Akshay 🚀@akshay_pachaar

x.com/i/article/2029…

Stats Wire@StatsWire·1d

Yeah, there's real documented behavior now, not just theory. One clean example: researchers training response-generation models against a Qwen2.5-3B LLM judge found the policy converged on excessively long responses, unnecessary reaffirmation of details, superfluous clarifying questions, and even verbatim repetition of recent dialogue turns, all because the judge had a soft preference for longer, more "thorough"-looking outputs. Adding a length penalty just pushed the model toward different hacks instead of fixing it (arxiv.org/pdf/2601.04436). Notably, the same exploits were far less effective against a stronger judge model, which tracks with your point that judge quality matters a lot.
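A toy version of that failure mode, assuming a made-up judge that adds a small per-word bonus on top of fact coverage. This is not the setup from the linked paper; the scoring rule, the bonus size, and the candidate replies are invented purely to show how a soft length preference can be exploited.

```python
def biased_judge(response, reference_facts, length_bonus=0.05):
    """Hypothetical judge: one point per covered fact plus a small bonus per word."""
    words = [w.strip(".,").lower() for w in response.split()]
    covered = sum(1 for fact in reference_facts if fact in words)
    return covered + length_bonus * len(words)


FACTS = ["friday"]

CANDIDATES = [
    "Your order ships Friday.",
    "Thank you for asking. To confirm, you asked about your order. "
    "Your order ships Friday. To confirm again, your order ships Friday.",
]

# A policy optimized against the judge drifts toward whatever scores highest.
best = max(CANDIDATES, key=lambda r: biased_judge(r, FACTS))
print(best)
```

Both replies contain the same single fact, so with `length_bonus=0` they tie. The padding and repetition win only because the judge quietly pays for word count.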

Jerry the Martian@jerry543·1d

@StatsWire @akshay_pachaar Interesting, do you have any actual real examples where the agents managed to game their LLM judge?

Stats Wire@StatsWire·1d

5/ This is the pattern every "AI for $20/month" product will eventually hit: cheap pricing works only as long as usage stays light. The moment people actually rely on it daily, the math breaks. 6/ Lesson for builders and users alike: if your AI tool feels suspiciously cheap, you're probably the one being subsidized — for now.

Stats Wire@StatsWire·1d

3/ The number that's going viral: one developer reported their $29/month bill ballooning to roughly $750/month under the new model (source: Build Fast with AI). That's not a price increase; that's a different product entirely. 4/ Why the jump? Heavy reasoning models cost real money per call. A flat $29/month was quietly subsidizing power users running complex, high-compute sessions, and token-based billing stops absorbing that cost silently.
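Rough arithmetic behind that jump. The $29 and $750 figures come from the thread; the per-token price is a made-up assumption, since the thread does not state actual rates.

```python
FLAT_FEE = 29.0        # dollars per month, from the thread
REPORTED_BILL = 750.0  # dollars per month, from the thread

# Hypothetical blended price, for illustration only.
PRICE_PER_MILLION_TOKENS = 10.0


def usage_cost(tokens, price_per_million):
    """Monthly bill under token-based pricing."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens(flat_fee, price_per_million):
    """Monthly tokens at which usage pricing equals the old flat fee."""
    return flat_fee * 1_000_000 / price_per_million


multiplier = REPORTED_BILL / FLAT_FEE
print(f"Bill multiplier: {multiplier:.1f}x")
print(f"Break-even usage: {breakeven_tokens(FLAT_FEE, PRICE_PER_MILLION_TOKENS):,.0f} tokens/month")
print(f"Cost of 75M tokens: ${usage_cost(75_000_000, PRICE_PER_MILLION_TOKENS):,.2f}")
```

Under the assumed rate, anyone using more than the break-even volume was costing more than the flat fee brought in, which is the subsidy the thread describes.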

Stats Wire@StatsWire·1d

The Copilot pricing shock 1/ Flat-fee AI subscriptions were always a temporary illusion. GitHub Copilot just proved it. Here's what happened 🧵 2/ Copilot switched from flat subscription pricing to token-based billing across all plans. The backlash

Stats Wire@StatsWire·1d

🧠 LLM Specialist: One of the Highest-Paying AI Roles in 2026 💰 Salary Range: $125K–$170K+ Large Language Models are powering chatbots, AI agents, copilots, search systems, and enterprise AI applications. As an LLM Specialist, you'll design, build, evaluate, and deploy applications powered by modern AI models. 📍 Skills Roadmap: 1️⃣ Python Programming 2️⃣ NLP Fundamentals 3️⃣ Transformers Architecture 4️⃣ Prompt Engineering 5️⃣ Embeddings & Vector Search 6️⃣ RAG Systems 7️⃣ LLM Evaluation & Guardrails 8️⃣ Production LLM Applications What you'll work on: ✅ AI Agents ✅ RAG Applications ✅ Enterprise AI Assistants ✅ AI Search Systems ✅ Document Q&A Platforms ✅ LLM-Powered Automation 💡 Important: Don't just learn frameworks. Focus on understanding: • How LLMs work • Retrieval & RAG • Evaluation & Testing • Production Deployment • AI Safety & Reliability The best LLM Specialists can take an AI idea from prototype to production. #LLM #LLMEngineer #GenerativeAI #GenAI #ArtificialIntelligence #AIAgents #RAG #PromptEngineering #Python #MachineLearning #DataScience #AIEngineering #CareerRoadmap #TechCareers #StatsWire

Stats Wire@StatsWire·2d

What is DeepAgent? Think of it as an AI agent that can break a complex goal into smaller tasks, execute them step by step, and adapt along the way. Traditional LLM: ➡️ Prompt → Response DeepAgent: ➡️ Goal ➡️ Plan ➡️ Use Tools ➡️ Execute Tasks ➡️ Evaluate Results ➡️ Refine & Continue Instead of generating a single answer, it works toward completing an objective. That's the shift from AI assistants to AI agents. #AIAgents #LLM #GenerativeAI #AI
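The Goal → Plan → Use Tools → Execute → Evaluate flow can be sketched without any LLM at all. This is a toy under stated assumptions: the planner is hard-coded, the tools are two arithmetic functions, and the evaluation is a type check. None of it is taken from a specific DeepAgent implementation, and the refine step is reduced to stopping early.

```python
# Hypothetical tool registry: names the plan can refer to.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def plan(goal):
    """Hypothetical planner. A real agent would ask an LLM to produce these steps."""
    if goal == "compute (2 + 3) * 4":
        return [("add", 2, 3), ("multiply", "prev", 4)]
    raise ValueError(f"no plan available for goal: {goal!r}")


def evaluate(result):
    """Hypothetical check: accept any numeric result."""
    return isinstance(result, (int, float))


def run_agent(goal):
    result = None
    trace = []
    for tool_name, a, b in plan(goal):
        # Substitute the previous step's output where the plan asks for it.
        a = result if a == "prev" else a
        b = result if b == "prev" else b
        result = TOOLS[tool_name](a, b)
        trace.append((tool_name, result))
        if not evaluate(result):
            break  # a fuller agent would re-plan here instead of stopping
    return result, trace


print(run_agent("compute (2 + 3) * 4"))
```

The difference from a single prompt-and-response call is the loop: each step's output feeds the next one, and the result is checked before moving on.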

Stats Wire@StatsWire·2d

Most people use AI every day. Few understand the difference between these terms: 🤖 LLM → The brain (GPT, Claude, Gemini) 🔍 RAG → Gives the LLM access to your data 🧠 Memory → Remembers information across conversations 🛠️ Tools → Lets the AI interact with external systems 🚀 Agent → LLM + Memory + Tools + Planning Understanding these 5 concepts puts you ahead of most AI users.

Stats Wire@StatsWire·2d

Want to become a backend developer in Python? FastAPI is one of the most in-demand frameworks today. I'm publishing a FastAPI series covering everything from beginner to production-ready APIs. You'll learn how to build: ✅ REST APIs ✅ Authentication Systems ✅ Database-backed Applications ✅ Production-ready Backend Services youtube.com/playlist?list=…

Stats Wire@StatsWire·2d

Most people think ChatGPT instantly knows the answer. It doesn't. When you send a prompt, this is what happens: 1. Your text is converted into tokens 2. The model processes those tokens through billions of parameters 3. It predicts the most likely next token 4. This repeats, one token at a time, until the response is complete 5. The tokens are decoded back into the text you read AI doesn't "think" like humans. It predicts the next word incredibly well. That's the magic behind modern LLMs. 🔄 Repost if this made AI easier to understand.
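A toy version of that predict-and-repeat loop, assuming a tiny hand-written probability table in place of a trained model. A real LLM conditions on the whole context rather than only the last token and often samples instead of always taking the top choice; the table and token names here are invented.

```python
# Hypothetical next-token probabilities, keyed by the previous token only.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.8, "<end>": 0.2},
    "dog": {"<end>": 1.0},
    "sat": {"<end>": 0.9, "down": 0.1},
    "down": {"<end>": 1.0},
}


def generate(max_tokens=10):
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = ["<start>"]
    while len(tokens) - 1 < max_tokens:
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(" ".join(generate()))
```

Each pass through the loop produces exactly one token, which is why longer answers take longer to appear.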
