Neo AI
@withneo

471 posts

First autonomous AI Engineer (34.2% on MLE Bench)

San Francisco, CA · Joined December 2022
19 Following · 10.1K Followers

Pinned Tweet
Neo AI @withneo
Introducing NEO: the first Autonomous Machine Learning Engineer. It works like a full-stack ML engineer that never sleeps: handling data exploration, feature engineering, training, tuning, deployment, and monitoring, end to end. Powered by 11 specialized agents, NEO runs autonomously, saving ML engineers thousands of hours and making them 10x faster.

Benchmarks: tested on 75 Kaggle competitions from OpenAI's MLE Bench, NEO scored a medal in 34.2% of them, significantly outperforming Microsoft's RD Agent (22.4%). This sets a new state of the art for autonomous ML systems.

NEO runs on a novel multi-agent orchestrator, powered by a multi-step reasoning engine, a context transfer protocol, and agent memory, built to solve complex workflows end to end. And with human-in-the-loop mode, you can guide, inspect, and override any step. You're always in command.

NEO is built for real-world workflows and ready for production. NEO is here to make every ML engineer truly superhuman. Watch NEO in action:
Neo AI @withneo
Examples & Results: when tested on the Iris dataset, the entire end-to-end pipeline (profiling, training 4 models, and generating the PDF) ran in under 7 seconds! ⚡

📊 Key Results:
🏆 Logistic Regression won with 0.9733 accuracy.
✅ 3 out of 5 AI-generated hypotheses were fully supported by numeric evidence.
🔍 The pipeline identified that the top 2 features (petal length and width) explained 83.1% of the predictive signal.
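NEO's own pipeline isn't public, but as a point of reference, here is a minimal scikit-learn sketch of the same Iris workflow; the model choice, cross-validation setup, and the coefficient-based importance proxy are assumptions for illustration (plain logistic regression lands in the same accuracy range as the 0.9733 quoted above):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load Iris: 150 samples, 4 features (sepal/petal length and width), 3 classes.
X, y = load_iris(return_X_y=True)

# One candidate from a model comparison: logistic regression,
# scored with 5-fold cross-validation.
model = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(model, X, y, cv=5).mean()

# Rank features by mean absolute coefficient as a rough importance proxy.
importance = np.abs(model.fit(X, y).coef_).mean(axis=0)
top2 = importance.argsort()[::-1][:2]  # the two highest-importance features
```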
Neo AI @withneo
What if you could drop in a raw CSV and get a publication-ready ML research report, complete with AI-generated hypotheses, a 4-model comparison, and feature charts in just 60 seconds? 🚀 Built autonomously by NEO to solve the tedious data science loop, here is how the Autonomous ML Research Loop works 🧵👇
Neo AI @withneo
The tool is designed to work exactly how you want it to. You can power it with an OpenRouter API key, use Google AI Studio, or even run it locally with Ollama. Just want to test the UI? It comes with a built-in Mock mode that requires no API keys at all and returns realistic demo responses.
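A provider switch with a mock fallback like the one described above is a small pattern; this is a hypothetical sketch (the class name, environment variable, and method signature are invented, not the tool's actual API):

```python
import os

# Hypothetical sketch of a multi-provider client factory with mock mode.
# Names here are illustrative, not the tool's real API.

class MockClient:
    """Built-in mock mode: canned responses, no API keys required."""
    def complete(self, prompt):
        return f"[mock] Here is a realistic demo response to: {prompt!r}"

def make_client(provider=None):
    # Provider can be passed explicitly or read from the environment;
    # default to mock so the UI works out of the box.
    provider = provider or os.environ.get("LLM_PROVIDER", "mock")
    if provider == "mock":
        return MockClient()
    # Real backends (OpenRouter key, Google AI Studio, or a local Ollama
    # server) would be wired in here; omitted from this sketch.
    raise NotImplementedError(f"unknown provider: {provider}")
```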
Neo AI @withneo
Are you looking for a streamlined way to interact with cutting-edge vision AI? Gemma 4 Vision Studio is a powerful web application that combines four distinct vision AI capabilities into a single, intuitive interface powered by Gemma 4. Built autonomously by NEO, this tool offers a versatile playground for image analysis and generation.
Neo AI @withneo
Stop chunking your documents! 🛑 Traditional RAG breaks data into fragments, causing AI models to lose crucial context.

Cache-Augmented Generation (CAG): a RAG-less document QA system that loads entire documents into an LLM's KV cache once, saves it to disk, and restores it instantly before every query. No embeddings, no vector DB, no chunking.

NEO autonomously wrote, debugged, and tested all the code, fixed 9 bugs across CUDA/Python/shell, and ran 11 GPU validation tests end to end.
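The prefill-once / restore-per-query workflow can be illustrated without a GPU; in this toy sketch a plain dict stands in for the model's real KV cache (past_key_values tensors), and the class, document, and answer logic are invented for illustration:

```python
import os
import pickle
import tempfile

# Toy stand-in for the CAG workflow: prefill the whole document once,
# persist the "cache", restore it before every query. A real system would
# serialize the model's KV-cache tensors instead of this dict.

class ToyCAG:
    def prefill(self, document):
        # One-time pass over the WHOLE document: no chunking, no embeddings.
        return {"tokens": set(document.split())}

    def answer(self, cache, entity):
        # Each query reuses the prefilled cache instead of re-reading the doc.
        return entity in cache["tokens"]

doc = "the rollout password is AZURE-9 and it rotates monthly"
cag = ToyCAG()

# Step 1: prefill once and save the cache to disk.
path = os.path.join(tempfile.mkdtemp(), "kv_cache.pkl")
with open(path, "wb") as f:
    pickle.dump(cag.prefill(doc), f)

# Step 2: restore the cache instantly before every query.
with open(path, "rb") as f:
    cache = pickle.load(f)
found = cag.answer(cache, "AZURE-9")
```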
Neo AI @withneo
Starting a new LLM session usually means manually copy-pasting files, git diffs, and shell errors. It's a repetitive and incomplete process. To solve this, NEO built Context Aggregator: a tool to automate ambient context collection for LLM pipelines.

Instead of manual updates, Context Aggregator runs a lightweight background daemon that continuously polls your development environment. It automatically gathers data from five default sources: file watchers, shell history, clipboard, browser tabs, and git activity.

How does it manage LLM token limits? Collected chunks are fed into a rolling context window managed by a priority queue. It uses a configurable token budget (defaulting to 4096 tokens) and evicts the lowest-scoring chunks based on recency and change magnitude. The output is a structured, queryable JSON object containing your recent commits, shell commands, and file snippets.
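The budget-and-evict mechanism described above can be sketched with a min-heap; the score weights, the whitespace token estimate, and the class name are assumptions, not Context Aggregator's actual code:

```python
import heapq

# Illustrative rolling context window with a token budget and
# lowest-score-first eviction, as described above. The 0.7/0.3 score
# weights and whitespace tokenization are invented for this sketch.

class RollingContext:
    def __init__(self, token_budget=4096):
        self.token_budget = token_budget
        self.chunks = []        # min-heap of (score, counter, tokens, text)
        self.total_tokens = 0
        self._counter = 0       # tie-breaker so tuples always compare

    def score(self, recency, change_magnitude):
        # Higher = more valuable; the lowest-scoring chunk is evicted first.
        return 0.7 * recency + 0.3 * change_magnitude

    def add(self, text, recency, change_magnitude):
        tokens = len(text.split())  # crude token estimate
        heapq.heappush(self.chunks, (self.score(recency, change_magnitude),
                                     self._counter, tokens, text))
        self._counter += 1
        self.total_tokens += tokens
        # Evict lowest-scoring chunks until the window fits the budget.
        while self.total_tokens > self.token_budget and self.chunks:
            _, _, t, _ = heapq.heappop(self.chunks)
            self.total_tokens -= t

    def snapshot(self):
        # Surviving chunks, highest score first (the queryable output).
        return [text for *_, text in sorted(self.chunks, reverse=True)]
```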
Neo AI @withneo
Stop guessing your LLM's ROI. The Context Cost Map orchestrates API calls (via OpenRouter) with 3 trials per context size for statistical reliability. It automatically tracks binary accuracy, latency, and USD cost, instantly generating interactive HTML subplots to visualize performance inflection points.
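The trial loop behind numbers like these is straightforward to sketch; `call_model` below is a stub standing in for a real OpenRouter call, and the function shape is an assumption, not the tool's API:

```python
import statistics
import time

# Sketch of the benchmark loop described above: N trials per context size,
# averaging binary accuracy, latency, and USD cost per size.

def run_benchmark(call_model, context_sizes, trials=3):
    results = {}
    for size in context_sizes:
        accs, lats, costs = [], [], []
        for _ in range(trials):
            t0 = time.perf_counter()
            correct, cost_usd = call_model(size)   # one API call
            lats.append(time.perf_counter() - t0)
            accs.append(1.0 if correct else 0.0)
            costs.append(cost_usd)
        results[size] = {
            "accuracy": statistics.mean(accs),
            "latency_s": statistics.mean(lats),
            "cost_usd": statistics.mean(costs),
        }
    return results

# Stub model: accurate up to 24K tokens, then fails; cost scales linearly.
def stub_model(size):
    return size <= 24_000, size * 1e-7

report = run_benchmark(stub_model, [1_000, 8_000, 64_000])
```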
Neo AI @withneo
Long context windows mean nothing if the model forgets what it read. The tool maps exactly where models break down. For example, deepseek/deepseek-chat is fast and cheap, but its accuracy plummets from 100% to just 67% the moment it crosses the 24K token threshold.
Neo AI @withneo
Speed and accuracy are great, but what are you paying for them? The benchmark revealed that openai/gpt-5.4-pro costs an average of $0.69278 per run, compared to the Xiaomi model's $0.00903. Both hit 100% accuracy, meaning GPT-5.4-Pro charges a massive 77× cost premium with zero retrieval benefit.
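The 77× figure follows directly from the two per-run averages quoted above:

```python
gpt_cost = 0.69278     # avg USD per run, openai/gpt-5.4-pro
xiaomi_cost = 0.00903  # avg USD per run, the Xiaomi model
premium = gpt_cost / xiaomi_cost  # ~76.7, i.e. roughly a 77x premium
```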
Neo AI @withneo
How is this measured? Context Cost Map runs a rigorous "needle in a haystack" evaluation. The tool dynamically generates filler text to reach target sizes from 1K up to 128K tokens, hides a secret target fact ("DELTA-7"), and forces the LLM to retrieve it.
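A prompt builder for this kind of evaluation is a few lines; the filler sentence, the ~4 chars/token estimate, and the question wording below are assumptions, while the hidden fact "DELTA-7" comes from the thread:

```python
# Illustrative needle-in-a-haystack prompt builder, not the tool's code.

FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret code word is DELTA-7."
QUESTION = "\n\nWhat is the secret code word? Answer with the code word only."

def build_prompt(target_tokens):
    target_chars = target_tokens * 4            # rough chars-per-token estimate
    n = max(1, target_chars // len(FILLER))
    haystack = FILLER * n
    mid = len(haystack) // 2
    # Bury the needle mid-document so retrieval is nontrivial.
    return haystack[:mid] + NEEDLE + haystack[mid:] + QUESTION
```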
Neo AI @withneo
How fast is your "fast" model when pushed to the limit? It's not just about whether an LLM can find the information, but how quickly it can start delivering it. NEO built Context Cost Map: a Python tool that maps accuracy, cost, and latency. By precisely tracking the time to first token across varying context sizes, Context Cost Map exposes the real-world speed of models under pressure: 5 models tested across 9 context sizes (1K-64K) with 3 trials each (135 API calls total).
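Time to first token is measured by timing how long a streaming response blocks before yielding its first item; `stub_stream` below is a stand-in for a real streaming API call:

```python
import time

# Sketch of time-to-first-token (TTFT) measurement over a streaming
# response. The stub simulates prefill delay growing with context size.

def time_to_first_token(stream):
    t0 = time.perf_counter()
    first = next(iter(stream))   # block until the first token arrives
    return time.perf_counter() - t0, first

def stub_stream(prefill_delay_s):
    time.sleep(prefill_delay_s)  # stand-in for prefill over a big context
    yield "hello"
    yield "world"

ttft, first_token = time_to_first_token(stub_stream(0.05))
```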
Neo AI @withneo
Did you know a standard 768x768 attention matrix has over 589,000 parameters, but its effective, meaningful rank is often below 64?

Transformer models are dominated by massive, over-parameterized attention layers. Traditional quantization methods (like int4) help reduce memory, but the matrix-vector multiplication still touches every single element. This means you get zero real FLOP savings on CPUs or edge devices, and post-compression fine-tuning becomes impossible.

🏥 Meet LatencySurgeon. It uses Singular Value Decomposition (SVD) and Tucker decomposition to extract the real information content of each attention layer. By keeping only the top singular values (which capture 95%+ of the matrix's energy) and throwing away the noise, it physically shrinks the massive weight matrix into two highly efficient smaller ones.

The Highlights:
🚀 Up to 40% inference speedup for your models in just 3 lines of code
💻 True CPU-native speedups through fewer FLOPs, no special kernels or CUDA required
🛠️ 100% fine-tune friendly because your weights stay in dense float32 Linear layers
🧩 Highly composable, you can even stack it with int8 quantization for up to a 1.65x combined speed boost!

The wildest part? 🤖 LatencySurgeon was built completely autonomously using NEO.
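The SVD truncation trick itself is easy to demonstrate in NumPy; the synthetic rank-32 "attention" matrix and the 95% energy threshold below are illustrative assumptions, not LatencySurgeon's internals:

```python
import numpy as np

# Sketch of low-rank weight factorization: replace one dense d x d layer
# with two thin layers that keep >= 95% of the matrix's energy.

rng = np.random.default_rng(0)
d, true_rank = 768, 32
# Synthetic weight: low effective rank plus a little noise.
W = rng.standard_normal((d, true_rank)) @ rng.standard_normal((true_rank, d))
W += 0.01 * rng.standard_normal((d, d))

U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Smallest k whose top singular values capture >= 95% of the energy.
energy = np.cumsum(S**2) / np.sum(S**2)
k = int(np.searchsorted(energy, 0.95)) + 1

# Two thin dense matrices replace the original d x d weight.
A = U[:, :k] * S[:k]   # d x k (singular values folded in)
B = Vt[:k, :]          # k x d

# Matvec through the factored form vs. the original weight.
x = rng.standard_normal(d)
err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)

# Per-matvec FLOPs: 2*d*k + 2*k*d vs 2*d*d, i.e. a 2k/d fraction.
flop_ratio = 2 * k / d
```

With a genuinely low-rank weight, k stays far below d, so the factored matvec does a small fraction of the original FLOPs while the relative error is bounded by the discarded tail energy.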