@AlphaSignalAI Fan-out-and-merge is also great for divergent-then-convergent sequences: spawning fresh context windows brings fresh eyes to each branch.
An open-source model just hacked Chrome better than nation-state hackers.
A new paper introduces AgentFlow, a system that automatically designs teams of AI agents to hunt software bugs.
Most setups wire agents together by hand.
One analyst, one explorer, one verifier, fixed forever.
AgentFlow searches across every knob at once.
Which roles exist, what prompts they use, how they talk, what tools they touch.
When a trial fails, it reads the actual crash logs and coverage maps.
Then it rewrites the weakest agent.
Running on Chrome with a mid-tier open-weight Chinese model, the framework uncovered ten zero-days.
Two were Critical sandbox escapes, meaning a single webpage could take over your machine.
The setup spun up 192 parallel explorers across seven browser subsystems.
All bugs were confirmed by Google.
What makes it work:
> Typed graph language for harnesses
> Runtime feedback instead of pass/fail
> Structural checks catch broken proposals
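The search-and-rewrite loop described above can be sketched roughly as follows. This is an illustrative stand-in, not AgentFlow's actual API: the role names, scoring, and rewrite step are all hypothetical.

```python
import random

random.seed(0)  # deterministic for the sketch

def evaluate(team):
    # Stand-in for running a bug-hunting trial: returns an overall
    # score plus the weakest-performing role.
    scores = {role: random.random() for role in team}
    weakest = min(scores, key=scores.get)
    return sum(scores.values()) / len(scores), weakest

def rewrite(prompt):
    # Stand-in for the step that reads crash logs and coverage maps,
    # then rewrites the weakest agent's prompt.
    return prompt + " (revised from runtime feedback)"

team = {
    "analyst": "Triage crash logs.",
    "explorer": "Mutate inputs to grow coverage.",
    "verifier": "Confirm crashes are real bugs.",
}

best_score = 0.0
for trial in range(5):
    score, weakest = evaluate(team)
    if score > best_score:
        best_score = score
    else:
        # Failed trial: rewrite only the weakest agent, keep the rest.
        team[weakest] = rewrite(team[weakest])
```

The key design point is that the whole team configuration (roles, prompts, tools) sits inside the search space, not just one agent's prompt.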
Human data annotators just became the next job AI replaces.
AI models need huge amounts of training data.
But for specialized tasks like fraud detection or medical analysis, real data is scarce, private, or expensive to label.
Google Research just dropped Simula, a framework that generates synthetic datasets from scratch using reasoning.
Instead of starting with seed examples or random prompts, it acts like an architect.
It plans the dataset structure first, then fills it in with controlled coverage and complexity.
Here's what makes it different:
> Seedless and fully agentic
> Fine-grained control over quality
> Explainable generation choices
> Scales without human annotation
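The "plan the structure first, then fill it in" idea can be sketched in a few lines. The schema axes and the fill step below are made up for illustration; Simula would use an LLM, not a template.

```python
import itertools

def plan_schema():
    # Architect step: decide coverage axes up front, before any example exists.
    return {
        "fraud_type": ["phishing", "refund scam", "account takeover"],
        "difficulty": ["easy", "hard"],
    }

def fill(cell):
    # Generation step: write one labeled example per planned cell
    # (a template here; an LLM in the real framework).
    return {
        "text": f"A {cell['difficulty']} {cell['fraud_type']} message.",
        "label": cell["fraud_type"],
    }

schema = plan_schema()
cells = [dict(zip(schema, combo)) for combo in itertools.product(*schema.values())]
dataset = [fill(c) for c in cells]
# 3 fraud types x 2 difficulties -> 6 examples with controlled coverage
```

Because every cell of the plan gets filled, coverage and complexity are decided explicitly rather than emerging from random prompting.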
The framework already powers real products.
It trained safety classifiers for Gemini, scam detection on Android calls, and spam filtering in Messages.
It also fueled specialized models like ShieldGemma and MedGemma.
The shift is treating data creation as a controllable science, not a guessing game.
The fix for agents losing context is a folder of markdown files.
AGENTS.md falls apart as codebases grow.
A single flat file buries design decisions and forces agents to hallucinate missing context.
Lat.md replaces it with a knowledge graph.
It lives in a lat.md/ directory at your project root.
Markdown files describe architecture, business logic, and test specs.
Wiki-style links connect sections to each other, point into source code symbols, and tie implementation back to concepts through inline comments.
The CLI keeps everything consistent:
> lat init scaffolds the directory
> lat check validates all references
> lat search runs semantic queries
> lat section navigates the graph
Agents stop grepping blindly through files.
They search the graph to find decisions, constraints, and domain context in seconds.
Knowledge from past sessions persists instead of vanishing.
Test specs can require backlinks from test code, so coverage gets enforced automatically.
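A reference check like `lat check` could conceptually work as below. The file layout and `[[wiki-style]]` link syntax here are assumptions for illustration, not Lat.md's actual spec.

```python
import re

# Toy stand-in for a lat.md/ directory: filename -> markdown body.
docs = {
    "architecture.md": "Auth flows live in [[business-logic]].",
    "business-logic.md": "Covered by [[test-specs]] and [[missing-page]].",
    "test-specs.md": "Backlinks to [[business-logic]].",
}

link_re = re.compile(r"\[\[([^\]]+)\]\]")
broken = [
    (name, target)
    for name, body in docs.items()
    for target in link_re.findall(body)
    if f"{target}.md" not in docs
]
# broken -> [("business-logic.md", "missing-page")]
```

The same pass, run in reverse, is what would let test specs demand backlinks from test code.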
Installs with one npm command.
An AI just invented a virus part that doesn't exist on Earth.
A team at Stanford and Arc Institute trained a language model on DNA instead of text.
Then asked it to write a virus from scratch.
It wrote hundreds. 16 came out alive in the lab.
The model is called Evo 2.
They prompted it with a known phage that infects E. coli, then synthesized the top candidates and tested them on real bacteria.
Several outperformed the original.
They killed bacterial cells faster and grew better in competition.
One generated phage uses a DNA packaging protein unlike anything found in nature.
The system invented biological machinery evolution never produced.
What stands out:
> Whole genomes written from scratch
> Open-source paper and method
> Cocktail beat resistant E. coli strains
> Trained on two million phage genomes
This shifts AI pathogen design from theoretical debate to published wet-lab result.
A former MIT researcher just mapped the path from a worm to a digital human brain.
The plan scales from a 302-neuron worm to 86 billion neurons.
Three technologies are making this tractable.
1. High-resolution imaging now maps neurons at scale
2. Functional scans capture whole brains in young fish
3. Biologically accurate neuron models can run on GPUs
Connectomics cost dropped from $16,500 per neuron to $100.
A complete fruit fly brain with 140,000 neurons has already been reconstructed.
Rough estimates suggest simulating a human brain needs 600 exaFLOP/s of compute.
That is roughly 50,000 H100 chips.
One major AI lab already runs over 200,000.
The real bottleneck is data.
Hundreds of next-gen microscopes must run for years to stain receptors and map connectivity.
Early worm and fish emulations are already live.
Someone built an agent that fixes its own bugs on autopilot.
Agents fail in production constantly.
Hallucinated tool calls, refusal loops, redundant arguments.
Fixing these usually means engineers digging through thousands of traces by hand.
A new open-source framework called HALO automates the whole thing.
It uses a specialized reasoning language model to read execution traces, find systemic failures, and feed those findings to a coding agent that rewrites the harness.
The loop repeats until performance plateaus.
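That trace-diagnose-patch loop can be schematized like this. Every function is an illustrative stand-in (the real system uses a reasoning model and a coding agent), and the scores are fabricated to show the plateau exit condition.

```python
def run_benchmark(harness):
    # Stand-in for an AppWorld-style eval; rises with patches, capped at 90.
    return min(90.0, 70.0 + 5.0 * harness["patches"])

def diagnose(traces):
    # Stand-in for the reasoning model reading execution traces.
    return "systemic failure: hallucinated tool calls"

def patch(harness, finding):
    # Stand-in for the coding agent rewriting the harness.
    return {"patches": harness["patches"] + 1}

harness = {"patches": 0}
score, history = run_benchmark(harness), []
while True:
    history.append(score)
    harness = patch(harness, diagnose(traces=[]))
    new_score = run_benchmark(harness)
    if new_score - score < 1.0:  # plateau: stop when gains stall
        break
    score = new_score
```

The stopping rule is the whole trick: the loop keeps spending compute only while each patch still buys measurable benchmark gains.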
The results on AppWorld, a benchmark covering multi-app tasks like Spotify and Venmo:
> Sonnet jumped from 73.7 to 89.5
> Gemini 3 Flash climbed from 36.8 to 52.6
> Test split confirmed no overfitting
> Findings verified against raw traces
Why a specialized model instead of general-purpose tools?
Traces get massive, and broad coding agents tend to overfit to single errors instead of catching harness-wide patterns.
The full framework, evals, and data are open source.
Google DeepMind just turned image generators into the best vision model.
They introduced Vision Banana.
It's a single model that treats every vision task as image generation.
Segmentation, depth estimation, 3D understanding, all framed as pictures to draw.
The approach is simple but radical.
Take a generative image model, mix a tiny amount of vision task data into its training, and let it learn.
The outputs are RGB images that can be decoded back into measurable geometry or labels.
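The "tasks as pictures" framing is easiest to see with segmentation: a class-label map becomes an RGB image the model can draw, then decodes back losslessly. The palette below is an assumption for illustration, not the paper's encoding.

```python
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 255, 0)}  # class -> RGB
INVERSE = {rgb: cls for cls, rgb in PALETTE.items()}

def encode(labels):
    # Turn a 2D grid of class ids into an RGB image the model can generate.
    return [[PALETTE[c] for c in row] for row in labels]

def decode(rgb):
    # Recover the label map from the generated image.
    return [[INVERSE[px] for px in row] for row in rgb]

labels = [[0, 1], [2, 1]]
assert decode(encode(labels)) == labels  # round trip is lossless
```

Depth or 3D outputs would follow the same pattern, with pixel values decoding to metric quantities instead of class ids.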
Results that stand out:
> Beats Segment Anything 3 on segmentation
> Rivals Depth Anything on metric depth
> Handles both 2D and 3D tasks
> Keeps original image generation intact
Human evaluations showed 53.5% win rates against the base model on text-to-image.
This suggests generative pretraining plays the same role for vision that next-token prediction plays for language.
One interface. Every visual task.
Researchers just taught AI to think 12x faster without using words.
Reasoning chains are powerful but expensive.
Every token a model "thinks" costs time and money.
A new paper called Abstract Chain-of-Thought proposes a fix.
Instead of reasoning in full sentences, the model invents its own compressed language.
It uses reserved placeholder tokens as shorthand for entire thoughts.
The result is up to 11.6x fewer reasoning tokens with comparable accuracy.
Training happens in two stages:
1. A warm-up loop teaches the model what these abstract tokens mean using a teacher's verbal reasoning.
2. Reinforcement learning then refines how the tokens are sequenced for better answers.
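The compression idea, stripped to a toy: reserved tokens stand in for whole verbal reasoning steps. The token names and the mapping below are invented for illustration; the real method learns them during warm-up and RL rather than hard-coding them.

```python
# Hypothetical mapping from abstract tokens to the verbal thoughts they replace.
ABSTRACT = {
    "<R1>": "First, isolate the variable on the left-hand side.",
    "<R2>": "Then substitute the value back into the equation.",
}

verbal = " ".join(ABSTRACT.values()).split()   # full verbal chain, word by word
abstract = list(ABSTRACT.keys())               # compressed chain of 2 tokens
ratio = len(verbal) / len(abstract)
# two thoughts collapse from 16 words to 2 tokens (8x here; the paper
# reports up to 11.6x on real reasoning chains)
```

Each abstract token still has to be grounded in a verbal meaning during warm-up, which is what keeps the compressed chain interpretable rather than arbitrary.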
Tested on math, instruction-following, and multi-hop benchmarks, performance held up against verbal chains.
Even stranger, the abstract vocabulary started forming patterns similar to real language.
Frequent tokens dominated like common words do.