Srikar

3.6K posts

@itsmutnuri

PhD @CS_UVA

United States · Joined July 2011
2.8K Following · 489 Followers
Srikar retweeted
Dwarkesh Patel @dwarkesh_sp
The Terence Tao episode.

We begin with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can often make worse predictions (Copernicus's model of circular orbits around the sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to articulate, much less codify into an RL loop. Hope you enjoy!

0:00:00 – Kepler was a high temperature LLM
0:11:44 – How would we know if there's a new unifying concept within heaps of AI slop?
0:26:10 – The deductive overhang
0:30:31 – Selection bias in reported AI discoveries
0:46:43 – AI makes papers richer and broader, but not deeper
0:53:00 – If AI solves a problem, can humans get understanding out of it?
0:59:20 – We need a semi-formal language for the way that scientists actually talk to each other
1:09:48 – How Terry uses his time
1:17:05 – Human-AI hybrids will dominate math for a lot longer

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify.
102 replies · 555 reposts · 3.9K likes · 807.7K views
Srikar retweeted
Ryan sikorski @Ryansikorski10
Biological Maxwell's demons (BMDs) are systems with information-processing capabilities that allow them to select their inputs and direct their outputs toward targets. These "biological Maxwell's demons" operate in open systems, amid a wide availability of free energy, and their role consists of channeling the energy transformations governed by information. pmc.ncbi.nlm.nih.gov/articles/PMC85…

Besides coding for ubiquitous structures, minimal genomes encode a wealth of functions that dissipate energy in an unanticipated way. Analysis of these functions shows that they are meant to manage information under conditions when discrimination of substrates in a noisy background is preferred over a simple recognition process. Many of these functions, including transporters and the ribosome construction machinery, behave as a material implementation of the information-managing agent theorized by Maxwell almost 150 years ago and commonly known as Maxwell's demon (MxD). A core gene set encoding these functions belongs to the minimal genome required to construct an autonomous cell. These MxDs allow the cell to perform computations in an energy-efficient way that is vastly better than our contemporary computers. pmc.ncbi.nlm.nih.gov/articles/PMC63…

It all began with the observation of the phenomenon of enhanced enzyme diffusion (EED), phys.org/news/2025-02-p…, in which enzymes transiently move faster after catalysis. Instead of treating enhanced diffusion as a secondary effect, the researchers asked whether it could play an active functional role in chemical reactions. They simulated a scenario where chemical energy generated during a catalytic reaction is used by the enzyme to transiently increase its mobility, and tested whether this change in motion altered subsequent reactions; in particular, they studied the composition of substrates and products. In their simulations, the ratio of substrate to product exhibited a clear deviation from the expected chemical equilibrium.

The key insight came from recognizing that the enzyme's behavior resembled a famous thought experiment known as Maxwell's demon: an imaginary being that uses information about molecular motion to create order without doing work, seemingly violating the second law of thermodynamics. Based on this, the researchers constructed a theoretical model where the transient increase in motility serves as a "memory" of the enzyme's immediate past reaction event. The enzyme uses this information to move away from the product molecules, thereby suppressing the reverse reaction. This disrupts the balance between forward and reverse reactions and drives the system to a new steady state that deviates from chemical equilibrium.

This study overturns the traditional passive role of enzymes by showing that they can process information to actively control the directionality of chemical reactions. It also provides a concrete biological realization of the theoretical Maxwell's demon, and suggests that nature may have been exploiting information-to-energy conversion in biomolecules all along.
phys.org/news/2026-02-e…
📄 Enzyme as Maxwell's Demon: Steady-state Deviation from Chemical Equilibrium by Enhanced Enzyme Diffusion
arxiv.org/html/2503.1758…

Information Thermodynamics on Causal Networks: "Our result implies that the entropy production in a single system in the presence of multiple other systems is bounded by the information flow between these systems. Our theory is applicable to quite a broad class of nonequilibrium dynamics, such as information transfer between multiple Brownian particles and information processing in autonomous nanomachines. We illustrate our result by a chemical model of biological adaptation with time-delayed feedback. Our result implies that information processing plays a crucial role in biochemical reactions." arxiv.org/pdf/1306.2756
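For intuition, here is a minimal toy Monte Carlo of that mechanism (my own sketch, not the paper's simulation; the rates and the `memory`/`SUPPRESS` parameters are made up): a molecule hops between substrate and product, and for a few steps after each catalytic event the reverse rate is suppressed, standing in for the enzyme transiently diffusing away from its product. Without the memory the occupancy sits at detailed balance; with it, the steady state shifts.

```python
import numpy as np

# Toy model (illustrative only, not the paper's simulation): one molecule
# hops between substrate (S=0) and product (P=1) with equal base rates, so
# detailed balance predicts a 50/50 steady state. After each catalytic
# S->P event the enzyme is "hot" for `memory` steps (enhanced diffusion
# carrying it away from the product), which suppresses the reverse rate.
rng = np.random.default_rng(0)
K_F = K_R = 0.1          # forward / reverse rates per step
SUPPRESS = 0.1           # reverse-rate multiplier while the enzyme is "hot"

def product_fraction(memory, n_steps=500_000):
    state, hot, occupancy = 0, 0, 0
    for _ in range(n_steps):
        if state == 0 and rng.random() < K_F:
            state, hot = 1, memory          # catalysis -> transient mobility
        elif state == 1:
            k_r = K_R * (SUPPRESS if hot > 0 else 1.0)
            if rng.random() < k_r:
                state = 0
        hot = max(hot - 1, 0)
        occupancy += state
    return occupancy / n_steps

print("no memory  :", round(product_fraction(0), 3))    # ~0.5, detailed balance
print("with memory:", round(product_fraction(20), 3))   # >0.5, driven steady state
```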
Matthew Oliphant @MatthewOli52917

@Ryansikorski10 All tech is applied demonology. Paul Davies, The Demon in the Machine. As usual ~ you on point 🎲🎲 💯

16 replies · 171 reposts · 1.2K likes · 51.5K views
Srikar retweeted
Satya Nadella @satyanadella
We’ve trained a multimodal AI model to turn routine pathology slides into spatial proteomics, with the potential to reduce time and cost while expanding access to cancer care.
442 replies · 1.9K reposts · 11.4K likes · 2.7M views
Srikar retweeted
Valerio Capraro @ValerioCapraro
Here's the longer version of our Nature piece.

Our argument is simple: statistical approximation is not the same thing as intelligence. Strong benchmark scores often say very little about how LLMs behave under novelty, uncertainty, or shifting goals. Even more importantly, similar behaviors can arise from fundamentally different processes.

In another paper, we identified seven epistemological fault lines between humans and LLMs. For example, LLMs have no internal representation of what is true. They often generate confident contradictions, especially in longer interactions, because they do not track what is actually true. Another example: yes, LLMs have solved some open mathematical problems, but these cases typically involve applying known methods to well-defined problems. LLMs cannot invent anything that is truly new and true at the same time, because they lack the epistemic machinery to determine what is true.

None of this means LLMs are useless. Quite the opposite: they are extraordinarily useful. But we should be careful about what they are and what they are not. Producing plausible text is not the same as understanding. Statistical prediction is not the same as intelligence. So despite the hype from the usual suspects, AGI has not been achieved.

* paper in the first reply

Joint with @Walter4C and @GaryMarcus
85 replies · 187 reposts · 777 likes · 137.7K views
Srikar retweeted
Surya Ganguli @SuryaGanguli
Our work on causal mechanistic interpretability across brains and machines: arxiv.org/abs/2603.06557, to appear at #ICLR26, expertly led by @melandrocyte, @Zaki_Alaoui1, @sunnyliu1220 with Steve Baccus.

Key idea: there are two ways to understand hidden representations in a neural network: 1) how do inputs activate them? 2) what *causal impact* do they have on output? We introduce CODEC (COntribution DEComposition) to find sparse codes for all contributions network elements make to the input-output map, combining *both* input activation *and* causal output impact.

In both brains and machines we find:
1) sparser codes for contributions than activations
2) separation into interpretable excitatory/inhibitory effects
3) improved steerability
4) elucidation of causal computations that can't be seen through activations alone

For more details see the excellent thread below.
melandrocyte @melandrocyte

Trying to interpret how a neural network does what it does? Activations tell you if a neuron responded. Contributions tell you if a neuron mattered! New paper from myself, @Zaki_Alaoui1, @sunnyliu1220, @SuryaGanguli, and Steve Baccus: arxiv.org/abs/2603.06557
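To make the activation-vs-contribution distinction above concrete, here is a toy example (mine, not the paper's CODEC method): in a network with a linear read-out, a hidden unit's contribution to the output is its activation times its output weight, so a strongly active unit can still matter not at all.

```python
import numpy as np

# Toy illustration (not CODEC itself): in a linear read-out, a hidden
# unit's *contribution* to the output is its activation times its output
# weight, so a unit can be highly active yet contribute nothing if its
# causal path to the output is weak.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 8))            # inputs
W_in = rng.normal(size=(8, 4))            # input -> hidden weights
w_out = np.array([2.0, 0.0, -1.5, 0.01])  # hidden -> output; unit 1 is causally dead

h = np.maximum(x @ W_in, 0.0)             # hidden activations (ReLU)
y = h @ w_out                             # network output
contrib = h * w_out                       # per-unit contribution to y

print("mean |activation|  :", np.abs(h).mean(axis=0).round(2))
print("mean |contribution|:", np.abs(contrib).mean(axis=0).round(2))
# Unit 1 looks important by activation but has zero causal impact on y.
```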

2 replies · 46 reposts · 296 likes · 28.2K views
Srikar @itsmutnuri
@deedydas Great initiative! Do you also plan to include epubs?
0 replies · 0 reposts · 0 likes · 20 views
Deedy @deedydas
Today, I'm excited to launch my lifelong passion project, Grand Old Books!! 🚀 There are 1000s of beautiful novels of the past, not in English, locked up in old PDFs, with no physical copies left. We started with Indian texts and brought back 12 books in 6 languages with pictures and annotations. This is, and will always be, completely free. We can't let time wash away history. Please comment to let me know what book you'd like to see added.
162 replies · 280 reposts · 2.6K likes · 133.8K views
Srikar retweeted
Kanaka Rajan @KanakaRajanPhD
Can we predict a thought before it happens? 🧠 To know what one neuron will do next, you have to know what the entire brain is doing right now. In our latest @KempnerInst blog, @yuven_duan introduces POCO: a tool paving the way for adaptive neurotechnologies.
Kempner Institute at Harvard University @KempnerInst

🧠📈 A new foundation model for #neuroscience! In the latest Deeper Learning blog, @yuven_duan and @KanakaRajanPhD describe their state-of-the-art POCO model, which accurately forecasts neural dynamics across individuals and species. bit.ly/4afqx5A #NeuroAI
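The "whole brain now, one neuron next" point can be illustrated with a deliberately simple forecaster (my sketch; it shares nothing with POCO's actual architecture): ridge-regress each neuron's next-step activity on the full current population state versus on its own past alone.

```python
import numpy as np

# Toy illustration (not POCO): forecast a neuron's next-step activity from
# the *current whole-population* state with ridge regression, and compare
# against forecasting it from its own past alone.
rng = np.random.default_rng(2)
N, T = 50, 5000
A = rng.normal(0, 1 / np.sqrt(N), size=(N, N))   # random recurrent dynamics
X = np.zeros((T, N))
for t in range(T - 1):
    X[t + 1] = np.tanh(X[t] @ A.T) + 0.1 * rng.normal(size=N)

def ridge_r2(inputs, targets, lam=1e-2):
    # Closed-form ridge regression, scored by R^2 on the training data.
    W = np.linalg.solve(inputs.T @ inputs + lam * np.eye(inputs.shape[1]),
                        inputs.T @ targets)
    pred = inputs @ W
    return 1 - ((targets - pred) ** 2).sum() / ((targets - targets.mean(0)) ** 2).sum()

past, future = X[:-1], X[1:]
print("R2, population -> neuron 0:", round(float(ridge_r2(past, future[:, [0]])), 3))
print("R2, neuron 0   -> neuron 0:", round(float(ridge_r2(past[:, [0]], future[:, [0]])), 3))
# Knowing the whole population now predicts the single neuron far better.
```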

1 reply · 15 reposts · 110 likes · 11.2K views
Srikar retweeted
LAIKA @LAIKAStudios
Follow the crows... all will be revealed. #Wildwood
69 replies · 2.2K reposts · 12.6K likes · 317.1K views
Srikar retweeted
λux @novasarc01
new blog is out! i went way too deep into the policy-optimization rabbit hole and dumped my "policy optimization techniques beyond ppo" notes into one post. i have covered grpo, dr.grpo, gspo, dapo, cispo, gmpo, rspo, and sapo. blog link in next tweet.
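As a reference point for the first of those methods: GRPO's core move is the group-relative advantage, normalizing each sampled completion's reward by its group's mean and standard deviation instead of training a value network, then applying a PPO-style clipped surrogate. A minimal sketch (notation and toy numbers mine):

```python
import numpy as np

# Minimal GRPO-style update sketch (simplified; not the blog's code).
# For one prompt: sample G completions, score them, form group-relative
# advantages, then apply a PPO-style clipped surrogate per completion.
rng = np.random.default_rng(3)
G = 8
rewards = rng.normal(size=G)                               # reward per completion
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # group-relative advantage

logp_new = rng.normal(size=G)                  # log-prob under current policy
logp_old = logp_new - 0.1 * rng.normal(size=G) # log-prob under sampling policy
ratio = np.exp(logp_new - logp_old)            # importance ratio

eps = 0.2                                      # PPO clip range
loss = -np.mean(np.minimum(ratio * adv,
                           np.clip(ratio, 1 - eps, 1 + eps) * adv))
print("GRPO surrogate loss:", round(float(loss), 4))
```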
19 replies · 123 reposts · 1K likes · 73.7K views
Srikar retweeted
Surya Ganguli @SuryaGanguli
Beautiful picture! The Hermitian version of it is well understood: if you gradually add random Gaussian Hermitian matrices to any fixed matrix, this induces a Markovian stochastic process on the real eigenvalues, known as Dyson's Brownian motion (eigenvalues repel each other while being driven by noise). It helped us understand analog optimization, for example in our paper on "Geometric landscape annealing...": journals.aps.org/prx/abstract/1…

The non-Hermitian case is much more complicated because eigenvectors are non-orthogonal. You only get a Markovian process on the joint eigenvalues and eigenvector overlaps; marginalizing over these overlaps gives you a non-Markovian process on the eigenvalues shown below. One possible reference on this: arxiv.org/abs/1403.7738
Simone Conradi @S_Conradi

Take two large random matrices and linearly interpolate between them at several hundred steps. Compute the eigenvalues for each interpolated matrix, then plot them in the complex plane. The result is shown here. Made with #python #numpy #matplotlib
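The quoted experiment is only a few lines of numpy; a minimal version (matrix size and step count are arbitrary choices of mine):

```python
import numpy as np
import matplotlib.pyplot as plt

# Reproduces the quoted experiment: linearly interpolate between two
# random (non-Hermitian) matrices and track the complex eigenvalues.
rng = np.random.default_rng(4)
n, steps = 200, 300
A = rng.normal(size=(n, n)) / np.sqrt(n)
B = rng.normal(size=(n, n)) / np.sqrt(n)

pts = []
for t in np.linspace(0.0, 1.0, steps):
    pts.append(np.linalg.eigvals((1 - t) * A + t * B))
pts = np.concatenate(pts)

plt.scatter(pts.real, pts.imag, s=0.2, alpha=0.3)
plt.gca().set_aspect("equal")
plt.xlabel("Re(λ)"); plt.ylabel("Im(λ)")
plt.title("Eigenvalues along a linear interpolation of two random matrices")
plt.show()
```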

15 replies · 59 reposts · 587 likes · 57.9K views
Srikar retweeted
Michael Vinyard @vinyard_m
How does a stem cell "decide" its fate? Development requires both reliability (consistent cell types) AND flexibility (diverse outcomes from identical progenitors). Cells achieve this by dynamically tuning deterministic drift and stochastic diffusion. New in @NatMachIntell: scDiffEq models state-dependent drift AND diffusion, improving fate prediction by ~8% over SOTA. scDiffEq also enables genome-wide in silico perturbation screens and reveals temporal gene dynamics. 🧵nature.com/articles/s4225…
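The underlying modeling idea, state-dependent drift plus state-dependent diffusion, is a stochastic differential equation integrated with Euler–Maruyama. A toy sketch (mine, not scDiffEq's code) with a double-well drift and undecided-state noise shows how identical progenitors split reliably into two fates:

```python
import numpy as np

# Toy state-dependent SDE in the spirit of drift+diffusion fate models
# (illustrative only, not scDiffEq): dx = f(x) dt + g(x) dW, where both
# the deterministic drift f and the noise amplitude g depend on state x.
rng = np.random.default_rng(5)

def drift(x):                 # double-well: two attracting "fates" at x = ±1
    return x - x ** 3

def diffusion(x):             # noise is largest near the undecided state x = 0
    return 0.5 * np.exp(-x ** 2)

def euler_maruyama(x0, dt=0.01, n_steps=2000):
    x = np.full(1000, x0, dtype=float)       # 1000 identical progenitors
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x += drift(x) * dt + diffusion(x) * dW
    return x

final = euler_maruyama(0.0)
print("fraction reaching fate +1:", (final > 0).mean())
print("fraction reaching fate -1:", (final < 0).mean())
# Deterministic drift gives reliability (two fixed points); state-dependent
# noise gives flexibility (identical initial states split into two fates).
```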
6 replies · 65 reposts · 362 likes · 80.6K views
Srikar retweeted
Jeff Dean @JeffDean
Performance Hints

Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles, and we've recently published a version of it externally. We'd love any feedback you might have!

Read the full doc at: abseil.io/fast/hints.html
106 replies · 1.1K reposts · 7.7K likes · 2.1M views
Srikar retweeted
Markus J. Buehler @ProfBuehlerMIT
How does an embryo reliably "compute" its form, "cell by cell", using only local interactions and mechanics, yet produce a precise global body plan?

I'm excited to share our Nature Methods paper "MultiCell: geometric learning in multicellular development", presenting #AIxBiology research led by @HaiqianYang and the result of a great collaboration with Ming Guo, George Roy, Tomer Stern, Anh Nguyen and Dapeng Bi.

A long-standing challenge in developmental biology is to predict how thousands of cells collectively self-organize as tissues fold, divide, and rearrange. In MultiCell, we represent a developing embryo as a dual graph that unifies two complementary views of tissue mechanics at single-cell resolution: cells as moving points (granular) and cells as a connected foam (junction network). This lets the model learn dynamics from both geometry and cell-cell connectivity.

On whole-embryo 4D light-sheet movies of Drosophila gastrulation (~5,000 cells), our model predicts key cell behaviors and the timing of events, including junction loss, rearrangements, and divisions, with high accuracy at single-cell resolution. Beyond prediction, the same representation supports robust time alignment across embryos and offers interpretable activation maps that highlight the morphogenetic "drivers" of development. The broader goal is a foundation for cell-by-cell forecasting in more complex tissues, and eventually for detecting subtle dynamical signatures of disease.

Kudos to the team for this inspiring collaboration with brilliant researchers to push the boundary of AI for biology!

Citation: Yang, H., Roy, G., Nguyen, A.Q., Buehler, M.J., et al. MultiCell: geometric learning in multicellular development. Nature Methods (2025), DOI: 10.1038/s41592-025-02983-x. Code/data links are in the manuscript.
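The dual-graph idea can be sketched in a few lines (a toy construction of mine, not the paper's pipeline): from 2D cell centers, the Voronoi diagram simultaneously gives the granular view (neighboring cell pairs) and the foam view (the shared wall between two cells as their junction).

```python
import numpy as np
from scipy.spatial import Voronoi

# Toy dual-graph construction (illustrative, not MultiCell's pipeline).
# From 2D cell centers, the Voronoi diagram yields both views at once:
# - granular view: cells as points, edges = neighboring cell pairs
# - foam view: the shared Voronoi wall between two cells = their junction,
#   with a geometric feature (wall length) attached to each edge.
rng = np.random.default_rng(6)
cells = rng.uniform(size=(200, 2))           # 2D cell centers
vor = Voronoi(cells)

edges, wall_lengths, center_dists = [], [], []
for (i, j), ridge in zip(vor.ridge_points, vor.ridge_vertices):
    if -1 in ridge:                          # skip unbounded boundary walls
        continue
    v0, v1 = vor.vertices[ridge]
    edges.append((i, j))
    wall_lengths.append(np.linalg.norm(v1 - v0))              # foam feature
    center_dists.append(np.linalg.norm(cells[i] - cells[j]))  # granular feature

print("junction edges       :", len(edges))
print("mean wall length     :", round(float(np.mean(wall_lengths)), 3))
print("mean center distance :", round(float(np.mean(center_dists)), 3))
```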
118 replies · 677 reposts · 4.3K likes · 386.3K views
Srikar retweeted
Jorge Bravo Abad @bravo_abad
Biology-constrained RNNs that still learn like deep nets

In the brain, each neuron has a clear "sign": it is either excitatory or inhibitory, and it keeps that identity for all its outgoing connections. This is Dale's law. By contrast, most recurrent neural networks used in neuroscience ignore this completely: units mix positive and negative outputs freely, and everything is densely connected to everything else. That's fine if you only care about task performance, but it becomes a real problem when you want to trust the model as a circuit-level explanation of brain data.

Aishwarya Balwani and coauthors tackle this by building RNNs that are both biologically constrained and competitively trainable. They introduce Dale's backpropagation, a variant of backprop that enforces Dale's law at every update: excitatory neurons keep all-positive outgoing weights, inhibitory neurons all-negative, via a simple projection step in parameter space. On top of that, they apply a topologically informed pruning rule ("top-prob pruning") that sparsifies the network while preferentially preserving high-magnitude, structurally important weights, those that form the backbone of the network's effective connectivity. The result is a family of RNNs that are sign-constrained, highly sparse, and yet match the performance of unconstrained models on synthetic and ML-style benchmarks.

The payoff comes when they apply this framework to multi-region two-photon data from mouse visual cortex (V1 and LM) during a visual change-detection task. Their "CelltypeRNN" explicitly separates pyramidal, Sst, and Vip populations across layers and areas, training under Dale's law and experimentally informed connectivity motifs to predict population activity one step ahead. The inferred interactions are not just a good fit to the data: they line up with predictive-coding-like dynamics, with enhanced feedforward drive during prediction errors, layer- and timescale-dependent feedback, and a prominent role for Vip interneurons in omission and surprise conditions.

The broader message is clear: by baking in basic anatomical rules and realistic sparsity during training, we can move from generic RNN fits toward models that behave like deep nets and read like testable hypotheses about cortical circuitry.

Paper: science.org/doi/10.1126/sc…
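The projection step is simple enough to show in full (my reading of the idea, not the paper's code): take an ordinary gradient step, then clip each neuron's outgoing weight row back onto its allowed sign.

```python
import numpy as np

# Minimal sketch of a Dale's-law projection step (my reading of the idea,
# not the paper's exact implementation): after each gradient update, project
# each neuron's *outgoing* weights back onto its sign constraint, so
# excitatory units keep nonnegative rows and inhibitory units nonpositive rows.
rng = np.random.default_rng(7)
n = 10
sign = np.where(np.arange(n) < 8, 1.0, -1.0)   # 80% excitatory, 20% inhibitory

W = rng.normal(size=(n, n)) * 0.1              # recurrent weights, rows = outgoing
W = np.abs(W) * sign[:, None]                  # initialize inside the constraint set

def dale_projected_step(W, grad, lr=0.05):
    W = W - lr * grad                          # ordinary gradient step
    # Projection: clip each row to the half-space allowed by its neuron's sign.
    return np.where(sign[:, None] > 0, np.maximum(W, 0.0), np.minimum(W, 0.0))

grad = rng.normal(size=(n, n))
W = dale_projected_step(W, grad)
assert (W[sign > 0] >= 0).all() and (W[sign < 0] <= 0).all()
print("Dale's law holds after the update.")
```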
14 replies · 45 reposts · 264 likes · 17.6K views
Srikar retweeted
National Weather Service
Today, we're rolling out a new suite of AI-driven global weather models, including a "first of its kind" hybrid ensemble system! Want to get geeky with us? Come check it out: noaa.gov/news-release/n…
48 replies · 203 reposts · 1.2K likes · 137K views
Srikar retweeted
H.SHIMAZAKI @h_shimazaki
🚀 New paper alert — Nature Communications: The first study to show time-varying entropy flow from neuronal spiking activity and link it to the behavioral performance of individual mice. nature.com/articles/s4146… 🧠 A step toward the thermodynamics of neural computation
3 replies · 28 reposts · 165 likes · 14.1K views
Srikar retweeted
Philipp Schmid @_philschmid
What comes after Transformers? Neural Memory and Test-Time Training!

@GoogleResearch presented 2 new papers during NeurIPS with an architecture that actively learns and updates its own parameters during inference, acting as a "long-term neural memory" rather than a static context window.

Implementation
1️⃣ Titans replaces the fixed-state memory of linear RNNs with a deep Multi-Layer Perceptron (MLP) memory module.
2️⃣ The model updates this memory at test time by calculating a "surprise metric" based on the gradient of the input data.
3️⃣ The MIRAS framework generalizes this by treating memory as an optimization problem with customizable loss functions and regularization.
4️⃣ Training is parallelized by chunking sequences, using linear operations within chunks and non-linear updates across chunks.
5️⃣ Models incorporate "Persistent Memory" (fixed learnable weights) alongside the dynamic "Contextual Memory" to store task-specific knowledge.

Insights
💡 Attention mechanisms are excellent at short-term memory but fail at efficient long-term storage due to quadratic costs.
📈 Deep memory structures (MLPs) significantly outperform the vector/matrix-based compression used in Mamba and other linear RNNs.
🛠️ Memory updates are effective when driven by "surprise": high gradients indicate unexpected, memorable data.
📚 Forgetting mechanisms in recurrent models are mathematically equivalent to retention regularization (weight decay).
📉 Standard L2 (mean squared error) objectives make memory sensitive to outliers; L1 or Huber loss provides better stability.
🧠 Titans outperforms GPT-4 on "Needle in a Haystack" tasks with 2M+ token contexts despite having fewer parameters.
⚡ Deep memory modules exhibit a trade-off where increased depth improves perplexity but slightly reduces training throughput.

Titans and MIRAS show potential to replace, or at least augment, pure Transformer architectures. The hybrid approach (using attention for the immediate context and neural memory for the deep history) suggests the future might be a convergence of RNN efficiency and Transformer performance.
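A toy version of the surprise-driven update described above (my simplification; the real Titans memory is a deep MLP, not a linear map): a linear memory M is updated at test time by the gradient of its prediction error on each new key-value pair, with weight decay as the forgetting term.

```python
import numpy as np

# Sketch of a surprise-driven test-time memory update in the spirit of the
# description above (my simplification, not Google's implementation). A
# linear "memory" M maps keys to values; each new (k, v) pair updates M by
# the gradient of its prediction error (the "surprise"), while weight decay
# plays the role of the forgetting / retention term.
rng = np.random.default_rng(8)
d = 16
M = np.zeros((d, d))                        # contextual memory, updated at test time

def memory_update(M, k, v, lr=0.5, decay=0.001):
    err = M @ k - v                         # prediction error on the new pair
    surprise = np.outer(err, k)             # grad of 0.5*||M k - v||^2 w.r.t. M
    return (1 - decay) * M - lr * surprise  # forget a little, learn the surprise

for _ in range(300):                        # stream of associations at test time
    k = rng.normal(size=d); k /= np.linalg.norm(k)
    v = np.roll(k, 1)                       # a fixed rule the memory can pick up
    M = memory_update(M, k, v)

k = rng.normal(size=d); k /= np.linalg.norm(k)
print("recall error on a new key:",
      round(float(np.linalg.norm(M @ k - np.roll(k, 1))), 3))
```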
40 replies · 301 reposts · 1.7K likes · 90.3K views