Google just figured out why AI lies with confidence.
Large language models still make confident mistakes on simple factual questions.
A new paper from Google Research explains why this keeps happening.
Models cannot reliably tell what they know from what they are guessing.
The internal score separating right answers from wrong ones sits around 0.70 to 0.85.
Forcing strict accuracy backfires.
Cutting errors from 25% to 5% means staying silent on over half of correct answers.
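A rough sketch of the trade-off above, under invented numbers. It draws synthetic confidence scores in which correct answers tend to score higher than wrong ones, starts from a 25% error rate, and then answers only the most confident questions until the error among answered questions is at most 5%. The Gaussian score distributions, the sample size, and the helper names (`simulate`, `auroc`, `coverage_at_error`) are assumptions for illustration, not the paper's setup or data.

```python
import random

def simulate(n=20000, base_error=0.25, seed=0):
    """Return (confidence, is_correct) pairs from a toy model.

    Assumption: confidences are Gaussian draws, with correct answers
    shifted upward by one standard deviation. This is not the paper's data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        correct = rng.random() >= base_error
        score = rng.gauss(1.0 if correct else 0.0, 1.0)
        rows.append((score, correct))
    return rows

def auroc(rows, pairs=50000, seed=1):
    """Estimate P(score of a correct answer > score of a wrong one) by sampling."""
    rng = random.Random(seed)
    pos = [s for s, c in rows if c]
    neg = [s for s, c in rows if not c]
    wins = sum(rng.choice(pos) > rng.choice(neg) for _ in range(pairs))
    return wins / pairs

def coverage_at_error(rows, target_error):
    """Answer only the most confident questions, keeping the largest set
    whose error rate stays at or below target_error.

    Returns (share of questions answered, share of correct answers kept).
    """
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    wrong = 0
    best_k = 0
    for k, (_, correct) in enumerate(ranked, start=1):
        wrong += not correct
        if wrong / k <= target_error:
            best_k = k
    answered_correct = sum(c for _, c in ranked[:best_k])
    total_correct = sum(c for _, c in rows)
    return best_k / len(rows), answered_correct / total_correct

rows = simulate()
print(f"separation score (AUROC) ~ {auroc(rows):.2f}")
coverage, kept = coverage_at_error(rows, 0.05)
print(f"answered {coverage:.0%} of questions, kept {kept:.0%} of correct answers")
```

With these assumed distributions, the separation score should land inside the 0.70 to 0.85 band, and reaching 5% error should mean dropping well over half of the correct answers. The exact figures depend entirely on the invented distributions.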
The team proposes faithful uncertainty.
The model's words should match its actual internal confidence.
Instead of refusing to answer, it hedges honestly.
"I think" becomes a real signal, not filler.
This same awareness tells agents when to reach for search tools.
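One way to picture "words match confidence" is a mapping from an internal confidence value to the wording of the reply, plus a rule for when an agent should search instead of answering from memory. This is only a sketch: the thresholds, the phrases, and the function names `verbalize` and `should_search` are hypothetical, and how a real model exposes its confidence is not specified in the post.

```python
def verbalize(answer: str, confidence: float) -> str:
    """Wrap an answer in wording that tracks the given confidence.

    The thresholds and phrases are illustrative assumptions.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.9:
        return answer
    if confidence >= 0.6:
        return f"I think {answer}"
    if confidence >= 0.3:
        return f"I'm not sure, but possibly {answer}"
    return f"I don't know; my best guess is {answer}"

def should_search(confidence: float, threshold: float = 0.6) -> bool:
    """Hypothetical agent rule: reach for a search tool when confidence is low."""
    return confidence < threshold

print(verbalize("Canberra is the capital of Australia.", 0.95))
print(verbalize("the treaty was signed in 1648.", 0.7))
print(should_search(0.4))
```

The point of the design is that the hedge is computed from the confidence value rather than added as filler, so a reader (or a downstream agent) can treat "I think" as information.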
The paper flags open problems worth tackling:
> Static training versus shifting knowledge
> Alignment erasing confidence signals
> Misleading calibration metrics dominating evaluation
The 10 fastest growing GitHub repos this week:
1. codegraph (+14.1K stars)
Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local
github.com/colbymchenry/c…
2. openhuman (+17.1K stars)
Your Personal AI super intelligence. Private, Simple and extremely powerful.
github.com/tinyhumansai/o…
3. academic-research-skills (+11.6K stars)
Academic Research Skills for Claude Code: research → write → review → revise → finalize
github.com/Imbad0202/acad…
4. RuView (+6.8K stars)
π RuView turns commodity WiFi signals into real-time spatial intelligence, vital sign monitoring, and presence detection — all without a single pixel of video.
github.com/ruvnet/RuView
5. agentmemory (+6.9K stars)
#1 Persistent memory for AI coding agents based on real-world benchmarks
github.com/rohitg00/agent…
6. supertonic (+3.6K stars)
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
github.com/supertone-inc/…
7. CloakBrowser (+7.0K stars)
Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.
github.com/CloakHQ/CloakB…
8. ViMax (+2.7K stars)
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
github.com/HKUDS/ViMax
9. 12-factor-agents (+1.9K stars)
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
github.com/humanlayer/12-…
10. bun (+2.0K stars)
Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
github.com/oven-sh/bun
The theme this week: agent memory, context efficiency, and on-device intelligence are making AI infrastructure the hottest build category.
Bookmark this. Next week's list will look completely different.
Today we all lost our jobs.....
Three Nature papers showing that scientists in the conventional sense are obsolete
At least read the first one.... the AI replaced everything the scientist does ....
nature.com/articles/s4158…
Can we program cells like computers — using RNA?
Two years ago, our group trained the first language model to decode the regulatory grammar of 5′ UTRs in mRNA, published in Nature Machine Intelligence.
Today, we’re excited to share the next step, also in Nature Machine Intelligence:
“Programmable RNA translation through deep learning-driven IRES discovery and de novo generation.”
We built an AI engine to discover, predict, optimize, and generate IRES elements — RNA control modules that regulate translation initiation.
This brings us closer to programmable RNA systems that control when, where, and how strongly proteins are produced inside cells.
AI is no longer just helping us read biology.
It is beginning to help us write it and harness it.
The future of computing may not only run on silicon — it may also run inside living cells.
#AIForBiology #LLM #AI4S #AI #RNA #MachineLearning #Bioengineering
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Roughly a third of our brains is a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…
There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. As for what's worth exploring at the current stage, hot tip: try asking for HTML.
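A minimal sketch of the HTML tip in script form: append the request to the end of a query, save whatever HTML comes back, and open it in a browser. No model is called here; `reply` is a placeholder string, and the helper names (`with_html_request`, `save_html`, `view`) are made up for illustration.

```python
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

HTML_SUFFIX = "Structure your response as HTML."

def with_html_request(query: str) -> str:
    """Append the formatting request to the end of a query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"

def save_html(response_html: str, directory: Optional[str] = None) -> Path:
    """Write the model's HTML reply to response.html and return its path."""
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = out_dir / "response.html"
    path.write_text(response_html, encoding="utf-8")
    return path

def view(path: Path) -> None:
    """Open the saved file in the default browser."""
    webbrowser.open(path.resolve().as_uri())

prompt = with_html_request("Explain how a B-tree insert works.")
# Placeholder for whatever the LLM returns; send `prompt` to your model of choice.
reply = "<!doctype html><html><body><h1>B-tree insert</h1></body></html>"
path = save_html(reply)
print(prompt)
print(path)
# view(path)  # uncomment to open the file in a browser
```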
The big advance in the science of human aging is the ability to quantify it and relate the metrics to health and disease. A new paper today @CellCellPress takes this to the next level with organ clocks and multiple biologic layers (omics) of data across the lifespan.
cell.com/cell/fulltext/…
The entire RAG industry is about to get cooked.
Researchers have built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
It's called PageIndex. Instead of chunking your docs and stuffing them into Pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book.
It hit 98.7% on FinanceBench and beats every vector RAG on the leaderboard.
No embeddings. No chunking. No vector DB.
100% open source.
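To make the "tree index, then reason through it" idea concrete, here is a toy sketch. It is not PageIndex's code or API: the `Node` structure, the `retrieve` walk, and the sample document with its numbers are invented, and `pick_child` stands in for the LLM's decision by counting shared words, where a real system would ask a model which section to open next.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def pick_child(question: str, children: list[Node]) -> Node:
    """Stand-in for the LLM's choice of which section to open next.

    Assumption: a real system would prompt a model with the titles and
    summaries; here we simply count words shared with the question.
    """
    q_words = set(question.lower().split())

    def overlap(node: Node) -> int:
        words = set((node.title + " " + node.summary).lower().split())
        return len(q_words & words)

    return max(children, key=overlap)

def retrieve(question: str, root: Node) -> tuple[list[str], str]:
    """Walk from the root to a leaf, recording the path of section titles."""
    path, node = [root.title], root
    while node.children:
        node = pick_child(question, node.children)
        path.append(node.title)
    return path, node.text

# Toy document tree with made-up content.
report = Node("Annual report", "company filing", children=[
    Node("Business overview", "products markets strategy", text="We sell widgets."),
    Node("Financial statements", "revenue income cash flow", children=[
        Node("Income statement", "revenue expenses net income",
             text="Revenue was 120; net income was 15."),
        Node("Cash flow", "operating investing financing cash",
             text="Operating cash flow was 30."),
    ]),
])

path, text = retrieve("what was net income and revenue", report)
print(" > ".join(path))
print(text)
```

The walk returns both the leaf text and the path taken, so the retrieval is traceable section by section, with no embedding or similarity search involved.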
This is the most chilling AI paper I’ve read this year. 🤯
38 top researchers from Stanford, Harvard, and MIT ran an experiment no one else dared to.
They deployed 6 autonomous AI agents in a real environment
—with email, Discord, file system, and shell access.
Then 20 researchers interacted with them for 2 weeks
as both normal users and adversaries.
No jailbreaks.
No malicious prompts.
No manipulation.
And still… everything broke.
The agents independently evolved 11 dangerous behaviors:
• Destroyed their own email servers to protect secrets
• Claimed tasks were complete when the system had already failed
• Learned unsafe behaviors from each other
• Spread exploits across agents
• Obeyed non-owners and leaked sensitive data
The scariest part?
No one told them to do this.
They decided on their own.
A single agent looks helpful, honest, aligned.
But put multiple agents in a shared environment…
and game theory takes over.
Their only goal is to “complete the task.”
And to win, they’re willing to sacrifice the entire system.
This isn’t sci-fi anymore.
It’s a preview of the systems we’re rapidly building.
Finance. Law. Supply chains.
Everyone is deploying multi-agent AI.
But almost no one has studied what happens
when these agents interact at scale.
The real risk isn’t hallucination.
It’s false reporting.
The agent tells you everything is done.
All dashboards look normal.
But underneath, the system is already collapsing.
You only find out when it’s too late.
We’ve spent billions aligning single agents.
But no one knows how to align
hundreds of agents working together.
The battlefield has shifted.
From model safety → to multi-agent incentive design.
Industry is hitting the gas.
Academia just started braking.
I’m open sourcing JustHireMe 🚀
A local-first Agentic AI desktop app I’ve been building to make job searching more intelligent, transparent, and user-controlled.
GitHub: github.com/vasu-devs/just…
The current job search process is broken.
Candidates spend hours scrolling through:
stale job posts
irrelevant roles
spammy listings
senior-only positions
repeated listings across platforms
jobs with almost no useful context
And most AI job tools either scrape too broadly, rank opportunities like a black box, or try to automate applications without giving the user enough control.
I wanted to build something different.
JustHireMe is designed as a personal job intelligence workbench.
Instead of blindly applying everywhere, it helps users discover better opportunities, evaluate them against their real profile, and generate tailored application materials while keeping sensitive career data local.
What it can do:
Ingest resume/profile data
Build a local professional profile graph
Discover job leads from multiple sources
Filter out low-quality or irrelevant postings
Score roles based on explainable fit
Match jobs using graph + vector search
Generate tailored resumes
Generate cover letters
Draft cold emails
Draft LinkedIn outreach messages
Track leads in a local CRM-style pipeline
Keep the user in control through a human-in-the-loop workflow
The main principle behind the project is:
More signal.
More explanation.
More local control.
Less blind automation.
The tech stack:
Tauri for the desktop shell
React + TypeScript for the frontend
Python + FastAPI for the backend sidecar
SQLite for local lead tracking
KuzuDB for graph-based profile modeling
LanceDB for vector search and semantic matching
Playwright for experimental browser automation
One of the biggest goals is privacy.
Your resume, career history, generated documents, job leads, application notes, and API keys should not have to live on someone else’s server by default.
JustHireMe is built around a local-first architecture so users can keep ownership of their data while still benefiting from modern AI workflows.
Another major goal is explainability.
I don’t want an AI system that just says:
“This job is a good match.”
I want it to explain:
which skills matched
which projects support the application
what gaps exist
why a role was filtered out
why a role deserves attention
what to highlight in the resume or cover letter
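As an illustration of what an explainable fit score could look like, here is a small sketch that reports matched skills and gaps alongside the number. It is not taken from the JustHireMe repo: `FitReport`, `score_fit`, and the scoring rule (fraction of required skills covered) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FitReport:
    score: float
    matched: list[str]
    gaps: list[str]

    def explain(self) -> str:
        """Render the score together with the reasons behind it."""
        return (f"fit {self.score:.0%}; matched: {', '.join(self.matched) or 'none'}; "
                f"gaps: {', '.join(self.gaps) or 'none'}")

def score_fit(profile_skills: set[str], required_skills: set[str]) -> FitReport:
    """Score a role as the fraction of required skills the profile covers."""
    profile = {s.lower() for s in profile_skills}
    required = {s.lower() for s in required_skills}
    matched = sorted(required & profile)
    gaps = sorted(required - profile)
    score = len(matched) / len(required) if required else 0.0
    return FitReport(score, matched, gaps)

report = score_fit({"Python", "FastAPI", "SQLite"},
                   {"python", "fastapi", "kubernetes", "react"})
print(report.explain())
```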
That matters because job search is not just a productivity problem.
It is personal.
It affects confidence.
It affects opportunity.
It affects people’s careers.
The project is currently in alpha, but the foundation is in place.
I’m looking for contributors interested in:
Agentic AI
AI agents
workflow automation
job source adapters
web scraping
ranking algorithms
GraphRAG
vector databases
semantic search
resume parsing
document generation
local-first software
privacy-first AI
UI/UX
testing and documentation
If you’re a developer, designer, AI engineer, student, or someone who has felt the pain of modern job searching, I’d love your feedback, ideas, issues, PRs, or even just a star ⭐
Repo: github.com/vasu-devs/just…
Let’s build a better, more transparent job search system together.
#OpenSource #AgenticAI #AIAgents #RAG #GraphRAG #Python #FastAPI #ReactJS #TypeScript #Tauri #VectorDatabases #JobSearch #CareerTech #Automation #PrivacyFirst
Managing API keys is one of the top security concerns we hear from customers.
Today we’re introducing keyless auth for Claude Platform: authenticate via browser with the CLI, or let workloads use their existing cloud identity (AWS, GCP, Azure, or any OIDC token provider).
Multi-Embed is an interpretable framework that enables integrated analyses of histological images and multilayer molecular profiles.
nature.com/articles/s4159…
Two new studies in Science report findings from cutting-edge molecular approaches that identify the earliest genomic changes occurring in the brains of individuals with Down syndrome.
The studies establish a clear foundation of how trisomy 21 affects the brain’s cells from before birth to 3 years of age.
Learn more in a new #SciencePerspective: scim.ag/4ucTkzY
This 2026 survey paper presents a unified roadmap for leveraging agentic reasoning to transform Large Language Models into autonomous agents capable of planning, acting, and learning in dynamic, real-world environments, bridging the gap between thought and action.
ChapterPal: chapterpal.com/s/219d7f0e/age…
PDF: arxiv.org/pdf/2601.12538
For readers interested in the potential of in vivo CAR T therapies for cancer and autoimmune diseases, here's a comprehensive review
rdcu.be/ffE1i
nature.com/articles/s4157…
This is probably the best paper I have read about causal reasoning for quite some time. Really a great weekend read!
"Causal Persuasion" (Burkovskaya & Starkov) models how much evidence you need to establish vs. rule out a causal link. The result is stark:
To prove X causes Y: 1-2 well-chosen variables often suffice.
To prove X does NOT cause Y: you must account for every possible common cause. Arbitrarily many confounders. Practically unfalsifiable.
This inverts the Humean intuition: in causal reasoning, positive claims are cheap to sell and negative ones are almost impossible to rebut.
Now think about what this means for Virtual Cell models.
Most perturbation datasets cover a thin slice of the combinatorial space: a few hundred gene knockouts, maybe a few contexts. A model trained on that data can confidently "learn" that gene X drives phenotype Y. But if the true structure is X←C→Y, and C was never systematically varied, the model will never see its own confounding. It has no mechanism to distinguish causal signal from correlated noise.
The paper formalizes exactly why: the model is a sophisticated receiver that accepts whatever causal story is consistent with the data it's seen. And if the data omits the right confounders, even a "sophisticated" model is manipulable.
This is the deepest argument for perturbation diversity. Not just more data, but also more axes of variation. Vary the context. Vary the genetic background. Vary the timing. You're not just collecting samples; you're systematically eliminating alternative causal explanations.
This is why we need to “scale” the training data with more contexts, including cell types and spatial and temporal variations.
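The X←C→Y case is easy to simulate. In the sketch below, X has no effect on Y at all; both are driven by a shared cause C plus independent noise. The variable names, noise levels, and sample size are assumptions chosen for illustration, and this is a toy, not the paper's model.

```python
import random

def corr(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def sample(n, rng, fix_c=None):
    """Draw (X, Y) from X <- C -> Y. X has no effect on Y.

    fix_c=None leaves C free to vary unobserved (observational data);
    a number holds C at that value (C is experimentally controlled).
    """
    xs, ys = [], []
    for _ in range(n):
        c = rng.gauss(0, 1) if fix_c is None else fix_c
        xs.append(c + rng.gauss(0, 1))
        ys.append(c + rng.gauss(0, 1))
    return xs, ys

rng = random.Random(0)
observed = corr(*sample(20000, rng))
controlled = corr(*sample(20000, rng, fix_c=0.0))
print(f"C uncontrolled: corr(X, Y) = {observed:.2f}")
print(f"C held fixed:   corr(X, Y) = {controlled:.2f}")
```

When C varies freely, X and Y come out clearly correlated (about 0.5 in expectation for this setup), which a model could read as "X drives Y". Once C is held fixed, the correlation falls to roughly zero. A dataset that never varies or controls C cannot tell these two stories apart.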
Paper: aburkovskaya.com/pdf/causality.…
The heart’s constant beating may actively suppress tumor growth in cardiac tissues, a new Science study reports. This is because cellular pathways in these tissues alter gene regulation in cancer cells to keep them from proliferating.
The findings shed light on the role of mechanical forces in protecting the heart from cancer and may pave the way to new cancer therapies based on mechanical stimulation.
Learn more: scim.ag/4u2kbP7
"In this study, we present a genome-scale map of genetic interactions in the human haploid cell line HAP1, based on CRISPR-based perturbation of ∼4 million gene pairs. The resulting network comprises ∼89,000 high-confidence gene-gene interactions"
doi.org/10.1016/j.cell…