Papareddy Jayanth

816 posts

@Jayanth_PRG

Joined March 2023
23 Following, 5 Followers
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Microsoft just turned SKILL.md into a trainable object!

SkillOpt is a text-space optimizer for agent skills. Instead of hand-writing or one-shot generating your SKILL.md, SkillOpt treats the skill document as the trainable external state of a frozen agent and optimizes it through a feedback loop.

The core idea: a separate optimizer model analyzes agent rollout trajectories, proposes bounded add/delete/replace edits to the skill document, and accepts only edits that strictly improve performance on a held-out validation split. Rejected edits go into a buffer as negative feedback for future iterations.

The deep learning analogy is intentional. Rollout batch is your training data. Edit budget is your learning rate. Validation gate is your validation set. Rejected-edit buffer is your negative feedback signal.

The optimizer runs offline. The deployed artifact is just a static SKILL.md file.

Results on GPT-5.5 across 6 benchmarks: +23.5 points average over no-skill baseline in direct chat, +24.8 inside Codex, +19.1 inside Claude Code. SpreadsheetBench jumped from 41.8 to 80.7. OfficeQA from 33.1 to 72.1. Best or tied-best on 52 of 52 evaluated cells.

What's striking: these gains come from just 1-4 accepted edits. The final skill stays compact at 300-2000 tokens. One accepted edit gave OfficeQA a +39 point gain.

Optimized skills also transfer. A SpreadsheetBench skill trained in Codex transferred to Claude Code with a +59.7 point gain. Skills trained on GPT-5.4 improved every smaller GPT variant tested.

Key capabilities:
• Text-space skill optimization with no model weight updates
• Bounded add/delete/replace edits with validation gating
• Rejected-edit buffer as negative feedback
• Epoch-wise slow/meta update for longer-horizon learning
• Works across Claude Code, Codex, and direct chat harnesses
• Optimized skills transfer across models, harnesses, and benchmarks

100% Open Source. I've shared the link to the paper and repo in the comments!
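To make the loop described above concrete, here is a minimal Python sketch of a validation-gated edit loop with a rejected-edit buffer. It is not SkillOpt's code: `Edit`, `apply_edit`, `optimize`, and the toy `toy_propose` / `toy_validate` functions are hypothetical names, the "optimizer model" is replaced by a fixed pool of candidate edits, and the held-out validation score is replaced by a toy scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    op: str      # "add", "delete", or "replace"
    target: str  # existing line to delete or replace ("" for add)
    text: str    # new line to write ("" for delete)


def apply_edit(skill, edit):
    """Return a new skill (list of lines) with one edit applied."""
    if edit.op == "add":
        return skill + [edit.text]
    if edit.op == "delete":
        return [line for line in skill if line != edit.target]
    if edit.op == "replace":
        return [edit.text if line == edit.target else line for line in skill]
    raise ValueError(f"unknown op: {edit.op}")


def optimize(skill, propose, validate, steps, edit_budget):
    """Keep a candidate only if it strictly beats the current validation score."""
    rejected = []            # negative feedback shown to later proposals
    best = validate(skill)
    for _ in range(steps):
        edits = propose(skill, rejected)[:edit_budget]  # bounded step size
        if not edits:
            break
        candidate = skill
        for edit in edits:
            candidate = apply_edit(candidate, edit)
        score = validate(candidate)
        if score > best:     # strict improvement gate
            skill, best = candidate, score
        else:
            rejected.extend(edits)
    return skill, best, rejected


# Toy stand-ins (assumptions): a fixed edit pool instead of an optimizer model,
# and a hand-written score instead of running an agent on held-out tasks.
HELPFUL = {"check units", "validate output"}
POOL = [
    Edit("add", "", "check units"),
    Edit("add", "", "use emojis"),
    Edit("replace", "be verbose", "validate output"),
]


def toy_propose(skill, rejected):
    return [edit for edit in POOL if edit not in rejected]


def toy_validate(skill):
    lines = set(skill)
    return len(lines & HELPFUL) - len(lines - HELPFUL)


final_skill, final_score, rejected = optimize(
    ["be verbose"], toy_propose, toy_validate, steps=5, edit_budget=1
)
print(final_skill, final_score, len(rejected))
```

In this toy run only two of the five proposals pass the gate. The other three, including a repeat of an already accepted edit that no longer improves the score, land in the rejected buffer and are not proposed again.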
DailyPapers (@HuggingPapers)

Microsoft just released SkillOpt. Train agent skills like neural networks: in text space, without touching model weights. Best or tied-best in 52/52 settings across 6 benchmarks and 7 models.

Papareddy Jayanth retweeted
DailyPapers (@HuggingPapers)
NVIDIA just released an optimized version of the Kokoro TTS model on Hugging Face. A lightweight 82M-parameter speech synthesizer ready for commercial use, running fast on NVIDIA GPUs via ONNX Runtime. huggingface.co/nvidia/kokoro-…
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Self Improving AI (SIA) beats Karpathy's autoresearcher agent by improving itself!

SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.

Most agent frameworks are static. Fixed harness, fixed model weights, fixed memory layer. They plan, act, and use tools. SIA operates on a different layer entirely.

SIA focuses on one problem: how do you design structured feedback loops that allow an agent to evaluate its own performance, adapt its strategy, and get better over time?

After every run, SIA evaluates itself and improves three things. It updates its own harness. Updates the weights of its underlying model. Updates its own memory layer to handle new complexities. The agent rewrites itself based on what it learned.

On MLE-Bench, OpenAI's benchmark for evaluating an agent's ability to train ML models, SIA climbed to the top of the leaderboard. Beat every specialized ML research agent including MLEvolve and AIRA-dojo. Then kept improving and displaced its own previous versions on the leaderboard.

I've shared the link to the paper and the repo in the replies!
Kunal Bhatia (@kunalbhatia91)

Superintelligence will be built on Self Improvement. Today @hexoai, we’re excited to release ‘SIA’, an open-source Self-Improving AI, to achieve any goal through recursive self improvement. While trying to solve a problem, SIA doesn't just improve its abilities by updating its harness, it updates its own weights as well.

Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Run your personal AI company with a team of AI agents!

Alook is an open-source collaboration platform for AI coding agents. Self-hosted and local-first.

The setup: Define an org structure. Give each agent a role - dev, ops, research, whatever you need. Set reporting lines. Alook gives each agent an email address.

How it works: Assign a task to the right agent. They take it from there. Agents coordinate through email - passing deliverables, asking questions, updating status. You see everything in your inbox but you're not routing anything manually.

Runs as an always-on daemon. Close your laptop, agents keep working. Come back to finished tasks.

Shared memory across all agents. Every agent knows what every other agent worked on. You never re-explain context.

After each task completes, Alook logs what worked and builds SOPs. The whole team gets sharper over time.

Works with Claude Code, Codex, and OpenCode. Mix and match or run multiple agents from one runtime.

Built-in Kanban for task tracking. Calendar for scheduling. Email for all communication. Agents pick up tasks autonomously, update their own calendars, close issues when done.

Chat or email with agents like any AI tool. Install the runtime once, runs in the background. No terminal needed after setup.

Key capabilities:
• Email-based agent coordination with real inboxes
• Org structure with roles and reporting lines
• Shared memory and self-learning SOPs
• Always-on daemon for 24/7 operation
• Works with Claude Code, Codex, OpenCode
• Built-in Kanban, calendar, and email
• Self-hosted and local-first

100% open source. I've shared the GitHub repo in the replies!
Papareddy Jayanth retweeted
ClaudeDevs (@ClaudeDevs)
We’ve shipped a security-guidance plugin for Claude Code that helps identify and fix vulnerabilities as you’re writing code. Available for all Claude Code users. Install from the plugin marketplace (/plugins).
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Turn any GUI app into a CLI for AI agents!

CLI-Anything automatically generates command-line interfaces for any software by analyzing its codebase. Point it at GIMP, Blender, LibreOffice, or any application and get a production-ready CLI that AI agents can use.

The problem: AI agents can't use professional software. These tools have GUIs, not command interfaces. UI automation breaks constantly. APIs don't exist or miss 90% of functionality. Building custom CLIs manually takes months.

CLI-Anything runs as a Claude Code plugin. You call `/cli-anything ./gimp` and it analyzes the codebase, designs command architecture, implements the full CLI, writes comprehensive tests, and publishes everything as a pip-installable package.

The generated CLIs call the actual software backends. Blender renders 3D scenes, LibreOffice generates PDFs, Zoom schedules meetings.

How it works depends on what the software provides. Tools with REST APIs get wrapped with OAuth handling. Tools with Python scripting get script generators. Tools with documented file formats get file creators plus rendering calls.

Each CLI works the same way for agents. Commands return structured JSON. REPL mode for interactive use. pip-installable to PATH. Comprehensive tests included.

Tested across 16 diverse applications, including GIMP, Blender, Inkscape, Audacity, LibreOffice, OBS Studio, Kdenlive, Shotcut, Zoom, Mermaid, ComfyUI, and Ollama. 1,839 passing tests with 100% pass rate.

Key capabilities:
• Automated CLI generation from any codebase
• Calls actual software backends (no replacements)
• Multiple integration approaches (APIs, scripting, file formats)
• Production-ready with comprehensive tests
• pip-installable with JSON output for agents
• REPL mode for interactive use

Built as a Claude Code plugin. Works with OpenCode, Codex, OpenClaw, Qodercli.

I've shared the link to the repo in the comments!
Papareddy Jayanth retweeted
Sebastian Raschka (@rasbt)
Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: github.com/rasbt/LLMs-fro…
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Turn any document into structured data for AI agents!

Firecrawl just released a new parse endpoint. Upload local files or non-public documents and get back clean, LLM-ready data.

The parse endpoint converts PDF, DOCX, XLSX, HTML, and other formats into Markdown, JSON, or structured output. Reading order and tables are preserved.

Upload a file via multipart/form-data. The endpoint processes it using a Rust-based engine (up to 5x faster) and returns your chosen format.

Key capabilities:
• Multiple output formats: Markdown, JSON, HTML, summaries, extracted links, or metadata
• Preserves document structure, reading order, and tables
• Extracts metadata automatically (title, description, language)
• Zero data retention option (document not logged or stored)
• Content filtering via includeTags and excludeTags

Built for AI agent pipelines that need clean document data at scale.

I've shared the link in the comments!
Papareddy Jayanth retweeted
alphaXiv (@askalphaxiv)
“Probabilistic Tiny Recursive Model” This paper makes Tiny Recursive Models stochastic at test time by adding Gaussian noise, running parallel rollouts, and using the existing Q head to pick the best answer. With no retraining and no task-specific tricks, its PPBench jumps from 62.6% to 91.2%, while Sudoku-Extreme jumps from 87.4% to 98.75%.
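The test-time recipe described above (inject Gaussian noise, run several rollouts, let a Q head pick the winner) can be sketched in a few lines of Python. This is a toy illustration, not the paper's model: `best_of_noisy_rollouts`, `toy_step`, and `toy_q_head` are hypothetical, the recursive network is replaced by a simple update toward a known target, the Q head is an exact distance rather than a learned estimate, and rollouts run sequentially rather than in parallel. Where the noise is injected in the real model is also an assumption here.

```python
import random


def best_of_noisy_rollouts(step, q_head, z0, n_rollouts, n_steps, sigma, seed=0):
    """Run noisy rollouts from the same start state and keep the one the Q head prefers."""
    rng = random.Random(seed)
    best_state, best_q = None, float("-inf")
    for _ in range(n_rollouts):
        z = list(z0)
        for _ in range(n_steps):
            # One recursive refinement step, then Gaussian noise on the latent state.
            z = [v + rng.gauss(0.0, sigma) for v in step(z)]
        q = q_head(z)
        if q > best_q:
            best_state, best_q = z, q
    return best_state, best_q


# Toy stand-ins (assumptions): the "model" moves halfway toward a fixed target,
# and the "Q head" scores a state by negative squared distance to that target.
TARGET = [1.0, -2.0, 0.5]


def toy_step(z):
    return [v + 0.5 * (t - v) for v, t in zip(z, TARGET)]


def toy_q_head(z):
    return -sum((v - t) ** 2 for v, t in zip(z, TARGET))


start = [0.0, 0.0, 0.0]
_, q_single = best_of_noisy_rollouts(toy_step, toy_q_head, start, 1, 4, 0.3)
_, q_many = best_of_noisy_rollouts(toy_step, toy_q_head, start, 16, 4, 0.3)
print(q_single, q_many)
```

With a shared seed, the first of the 16 rollouts is identical to the single rollout, so the selected score can only match or beat it. How much it helps in practice depends on how well the Q head's ranking tracks real answer quality.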
Sumanth (@Sumanth_077)
Stop guessing which models fit in your VRAM!

llmfit is a CLI tool that auto-detects your hardware and ranks 206 models by what actually runs on your system.

You download a 70B model and hope it fits. Or you estimate memory requirements across quantization levels and still end up with models that crash or run too slow.

llmfit changes that. It detects your CPU, RAM, GPU, and VRAM, then scores every model in its database against your hardware.

Instead of assuming one quantization level, it tries the best quality that fits. It starts with Q8_0 and walks down to Q2_K if needed. If nothing fits at full context, it tries half context. You get the highest quality model that actually works.

Each model gets scored on Quality, Speed, Context, and Capability. The weights shift based on what you're doing. Chat models prioritize speed, reasoning models prioritize quality.

Run it as an interactive TUI to browse models, use CLI mode for a quick table, or get JSON output for scripts. There's a REST API for cluster schedulers.

You can also run it in reverse. Give it a model you want to run and a target performance, and it tells you what hardware you need.

The real value: you see ranked options before downloading anything. No more burning bandwidth on 50GB models that won't run.

It's 100% open source. Link to llmfit in comments!
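The quantization walk-down described above can be sketched roughly as follows. This is not llmfit's code or its memory model: `best_fit` and `estimate_gb` are hypothetical, the bits-per-weight figures are rough illustrative values, and the KV-cache cost per 1,000 tokens is a made-up constant.

```python
# Quantization ladder from highest to lowest quality.
# Bits-per-weight values are rough illustrative assumptions.
QUANT_LADDER = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.0),
]


def estimate_gb(params_b, bits_per_weight, context, kv_gb_per_1k_tokens):
    """Very rough memory estimate: quantized weights plus a linear KV-cache term."""
    weights_gb = params_b * bits_per_weight / 8   # billions of params -> GB
    kv_gb = context / 1000 * kv_gb_per_1k_tokens
    return weights_gb + kv_gb


def best_fit(params_b, vram_gb, full_context, kv_gb_per_1k_tokens=0.125):
    """Try full context first, then half; within each, walk from Q8_0 down to Q2_K."""
    for context in (full_context, full_context // 2):
        for name, bits in QUANT_LADDER:
            if estimate_gb(params_b, bits, context, kv_gb_per_1k_tokens) <= vram_gb:
                return name, context
    return None  # nothing fits, even at half context


print(best_fit(7, 24, 8192))     # small model: highest quality fits
print(best_fit(32, 24, 8192))    # mid-size model: walks down the ladder
print(best_fit(32, 15, 32768))   # fits only after halving the context
print(best_fit(70, 24, 8192))    # does not fit at all
```

The loop order encodes the priority stated in the post: context is only halved after every quantization level has failed at full context.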
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Open-source framework for building real-time voice AI agents!

Pipecat is a Python framework for orchestrating audio, video, AI services, transports, and conversation pipelines. Voice-first architecture with pluggable components.

What you can build: voice assistants, AI companions, multimodal interfaces, interactive storytelling, business agents (customer support, intake), and complex dialog systems.

The framework handles speech recognition, text-to-speech, conversation logic, and real-time interaction. WebRTC and WebSocket transport built in. Ultra-low latency for natural conversations.

Why Pipecat:
• Voice-first: Integrates STT, TTS, and conversation handling in one framework
• Pluggable: Supports multiple AI service providers for each capability
• Composable pipelines: Build complex behavior from modular components
• Real-time: Low-latency interaction with streaming audio/video

Supported services:
• Speech-to-Text: Deepgram, AssemblyAI, OpenAI Whisper, Groq, Azure, AWS, Google, and more
• LLMs: OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama, AWS, Azure, and more
• Text-to-Speech: OpenAI, ElevenLabs, Deepgram, Cartesia, Azure, AWS, Google, and more
• Speech-to-Speech: OpenAI Realtime, Gemini Multimodal Live, AWS Nova Sonic, Ultravox, Grok Voice Agent

10.3k+ stars on GitHub. I've shared the link to the repo in the comments!
Papareddy Jayanth retweeted
alphaXiv (@askalphaxiv)
new longcat paper! “Look Before You Leap” LLM agents often fail because they act before they understand the environment. So this paper introduces Exploration Checkpoint Coverage, a verifiable reward for discovering key states, objects, affordances, and constraints. With interleaved GRPO, agents learn to explore first, summarize grounded environment knowledge, then act, making Explore-then-Act reliably improve task success instead of adding noisy context.
Papareddy Jayanth retweeted
Sumanth (@Sumanth_077)
Lightning-fast Multilingual TTS that runs entirely on your device!

Supertonic is a lightning-fast, on-device multilingual text-to-speech system designed for local inference with minimal overhead.

The model runs via ONNX Runtime with 66M parameters. Generates speech up to 167x faster than real-time on consumer hardware. Complete privacy, zero network dependency, all processing happens locally.

Supports 31 languages including English, Korean, Spanish, Portuguese, French, German, Japanese, Chinese, Arabic, Dutch, and more.

Natural text handling without pre-processing. Directly processes numbers, dates, currency, abbreviations, and complex expressions.

Performance on M4 Pro CPU: 1263 characters per second for long text, real-time factor of 0.012. WebGPU mode reaches 2509 characters per second. RTX 4090 hits 12,164 characters per second.

Natural text handling works on financial expressions ("$5.2M" pronounced correctly as "five point two million dollars"), time and dates ("4:45 PM on Wed, Apr 3, 2024"), phone numbers with extensions, and technical units with abbreviations. All without phonetic annotations or text normalization.

Voice Builder lets you turn your voice into a deployable TTS model with permanent ownership and edge-native deployment.

Key capabilities:
• Ultra-lightweight (66M parameters)
• On-device inference with zero latency
• Natural text handling without pre-processing
• 31-language multilingual support
• Cross-platform via ONNX Runtime
• Up to 167x faster than real-time
• Complete privacy - all local processing
• Custom voice creation with Voice Builder
• Expression tags for natural human nuance

It's 100% open source. I've shared the link in the replies!
Sumanth (@Sumanth_077)
@Jayanth_PRG Check out more details in the blog that I've shared.
Sumanth (@Sumanth_077)
Stop parsing documents you don't need!

LandingAI just released ADE Classify, a page-level classification API that sits before your parser. It labels every page in a document so you only parse what matters.

The problem: Enterprise workflows get mixed bundles. A 50-page mortgage application PDF contains two invoices, three bank statements, and 45 pages of noise. If you parse everything, you waste compute on irrelevant pages. Your extraction models also lose focus trying to pull invoice totals from a driver's license.

ADE Classify acts as a triage layer. Before heavy parsing begins, it evaluates the document page-by-page and assigns each page a class.

You define custom document classes with optional descriptions. The API evaluates every page concurrently, assigns a class, and provides reasoning. Your pipeline uses these labels to route pages appropriately.

Why this matters:
• Cost filtering: An insurance claim packet has 100 pages. Classify identifies the 10 medical records and 2 invoices. Discard the remaining 88 before expensive parsing.
• Dynamic routing: Different document types need different workflows. Send bank statements to verification, invoices to financial extraction, unknowns to human review.
• Zero-shot accuracy: Pass explicit descriptions. The model understands the difference between a receipt and a formal invoice without custom training.
• Explainable decisions: Every classification includes reasoning. Unknown pages get flagged with suggested classes for human review.

The architecture: Classify → Filter/Route → Parse only relevant pages → Extract. Prevents wasted compute and stops extraction hallucinations on irrelevant content.

I've shared the link in the replies!
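The Classify → Filter/Route step described above might look something like this on the pipeline side. It is a sketch under assumptions, not LandingAI's API: `route_pages`, the class names, and the queue names are hypothetical, and the page labels are hard-coded where a real pipeline would take them from the classification response.

```python
from collections import defaultdict

# Hypothetical mapping from page class to downstream workflow.
ROUTES = {
    "invoice": "financial_extraction",
    "bank_statement": "verification",
}


def route_pages(page_labels, routes, discard=("noise",), review_queue="human_review"):
    """Group page numbers by workflow; drop discarded classes, send unmapped ones to review."""
    queues = defaultdict(list)
    for page_number, label in page_labels:
        if label in discard:
            continue  # filtered out before any expensive parsing
        queues[routes.get(label, review_queue)].append(page_number)
    return dict(queues)


# Hard-coded (page number, class) pairs standing in for classifier output.
labels = [
    (1, "invoice"),
    (2, "noise"),
    (3, "bank_statement"),
    (4, "unknown"),
    (5, "invoice"),
]

queues = route_pages(labels, ROUTES)
print(queues)
```

Only the pages that end up in a queue would then be sent on to parsing and extraction.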