Gilbert Nduwayezu, PhD

557 posts

@Gilbert_Yezu

Geospatial Scientist | GeoAI & Spatial Analytics | GeoAI-Driven Exposure Risk Assessment for Natural, Environmental & Health Inequities

Kigali, Rwanda · Joined August 2017
832 Following · 394 Followers
Gilbert Nduwayezu, PhD retweeted
Yohan @yohaniddawela
Physics-based weather models still beat AI when it matters most. Not on average. On the most extreme days. This is the opposite of what we've been hearing...

A new paper in Science Advances ran every major AI weather model (GraphCast, Pangu-Weather, Fuxi) against ECMWF's HRES across 162,751 record-breaking heat events, 32,991 cold records, and 53,345 wind records in 2020.

On average conditions, the AI models win. GraphCast, Fuxi, and the rest outperform HRES on standard temperature and wind benchmarks across most lead times. This matches what every prior benchmark study has shown. AI weather forecasting is genuinely impressive.

Then the researchers asked a different question. What happens when the event is unprecedented? Not extreme. Not the 95th percentile. Actually beyond anything in the training data.

HRES won every single category. Heat records. Cold records. Wind records. Nearly every lead time. The performance gap was largest at short lead times, where AI models should have the most information and the least uncertainty.

The bias pattern is pretty massive. The AI models systematically underestimated how extreme the events were. The bigger the record exceedance, the larger the underprediction. The researchers describe it as an implicit 'soft cap': the models behave as if they can't forecast values much beyond the most extreme thing in their training data. The bias grows almost linearly with how far the event exceeded the record. HRES showed no such pattern.

This isn't a fluke. The same result held in 2018 and 2020, which had opposite ENSO conditions. It held across the tropics, subtropics, mid-latitudes, and high latitudes. It held for all three variables. It held when the researchers ran an alternative evaluation specifically designed to avoid the forecaster's dilemma.

The mechanism is pretty straightforward. AI weather models are trained on ERA5 reanalysis data from 1979 to 2017. They learn to interpolate between historical weather patterns. When a new initial condition arrives, they find the nearest analogues in training and produce something in between. Record-breaking events, by definition, have no close analogues. The model has never seen anything quite like this, so it regresses toward the most extreme things it has.

Physics-based models like HRES don't work this way. They solve partial differential equations describing atmospheric dynamics. They don't need a historical analogue for a 48°C heatwave in Siberia. The physics doesn't care whether it's happened before.

The authors are careful about what this means. AI models remain faster, cheaper, and competitive on average conditions. Probabilistic AI forecasting is developing rapidly. Data augmentation with simulated extreme events and hybrid physics-AI architectures are plausible paths forward. This isn't a verdict on AI weather forecasting broadly.

But the policy implication is quite important. The events where AI models fail hardest are exactly the events where accurate forecasting matters most. Record-shattering heat. Unprecedented wind storms. The scenarios that overwhelm emergency response, strain infrastructure, and kill people because no one expected them to be that bad.

The authors wrote it plainly: it remains vital to fund and run physics-based NWP and AI weather models in parallel. I find it an unusually direct recommendation in a methods paper.

Climate change means record-breaking events are becoming more frequent, not less. The training distribution is shifting. AI models trained on 1979 to 2017 data will see more and more out-of-distribution events as the climate diverges from that baseline. The extrapolation problem the researchers identified isn't going away. It's getting harder. The models that can't forecast records are being asked to forecast a world that's setting them constantly.

Link to full paper: science.org/doi/10.1126/sc…
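The interpolation-vs-extrapolation mechanism can be illustrated with a toy sketch (my own construction, not from the paper): an analogue-style nearest-neighbour forecaster averages the targets of its closest training examples, so it can never predict far beyond the most extreme value it has seen, while a model of the underlying relationship extrapolates freely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "historical" record: predictor x (an upstream state), target y = 2x + noise.
# Training covers x in [0, 40] only -- the analogue of the 1979-2017 window.
x_train = rng.uniform(0, 40, 500)
y_train = 2 * x_train + rng.normal(0, 1, 500)

def analogue_forecast(x_new, k=5):
    """Analogue-style forecast: average the targets of the k nearest training points."""
    idx = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[idx].mean()

# A record-shattering input, beyond anything in the training distribution.
x_record = 50.0
pred = analogue_forecast(x_record)

# The analogue forecast is soft-capped near the training maximum, while the
# true relationship (y = 2x = 100 here) keeps growing past it.
print(pred, y_train.max(), 2 * x_record)
```

Because the forecast is an average of training targets, it is bounded by the training maximum by construction; the underprediction grows with how far `x_record` exceeds the training range, mirroring the near-linear bias the paper reports.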
11 replies · 101 reposts · 347 likes · 28.4K views
Gilbert Nduwayezu, PhD retweeted
Joachim Schork @JoachimSchork
Check out this R Shiny app for dimension reduction with UMAP and t-SNE. It allows you to transform and visualize complex data directly in your browser.

You can:
✔️ Create interactive UMAP and t-SNE plots
✔️ Adjust parameters for each algorithm
✔️ Apply K-means or GMM clustering
✔️ Uncover patterns in high-dimensional data without coding

You can upload your own CSV file or use a built-in R data set, and the tool will provide instant visual feedback along with clustering results.

The image shows the app's interface, including data upload, preprocessing, parameter tuning, and visualization panels. The example outputs feature UMAP and t-SNE plots with K-means and GMM cluster assignments.

Thanks to Dean Smith for creating this app! Try it here: atamaianalytics.shinyapps.io/DimRedWithUMAP…

Subscribe to my newsletter for more practical tips on statistics, data science, R, and Python. More info: statisticsglobe.com/newsletter

#RStats #rstudioglobal #database #DataViz #programming #Statistics #VisualAnalytics #DataAnalytics
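For anyone who wants the same pipeline in code rather than a browser app, here is a minimal Python sketch using scikit-learn's t-SNE and K-means (the app's UMAP option has an equivalent in the `umap-learn` package, whose `UMAP` class follows the same fit/transform pattern; the data set and parameters below are illustrative, not taken from the app):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load a built-in data set and standardise it, as the app does in preprocessing.
X = StandardScaler().fit_transform(load_iris().data)

# Project the 4-D data down to 2-D with t-SNE (perplexity is the key tuning knob).
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Cluster in the embedded space with K-means (the app also offers GMM,
# available here as sklearn.mixture.GaussianMixture).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)

print(embedding.shape, sorted(set(labels)))
```

Swapping the sample data for your own CSV is a one-line change (`pandas.read_csv` plus selecting the numeric columns).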
0 replies · 4 reposts · 42 likes · 1.8K views
Gilbert Nduwayezu, PhD retweeted
Nature Methods @naturemethods
Multi-Embed is an interpretable framework that enables integrated analyses of histological images and multilayer molecular profiles. nature.com/articles/s4159…
1 reply · 33 reposts · 88 likes · 9K views
Gilbert Nduwayezu, PhD retweeted
Tom Yeh @ProfTomYeh
ReLU vs Leaky ReLU 👉 byhand.ai/qRHNCA

= ReLU =

ReLU is the default activation in modern deep learning — cheap to compute, and stable enough to train networks hundreds of layers deep.

To see what it does, picture five boba tea shops on the same block — 𝚊, 𝚋, 𝚌, 𝚍, 𝚎 — each running their own books. Each value is a shop's monthly profit — receipts minus rent, ingredients, and wages. When profit is positive, the shop stays open and the owner pockets every dollar. When profit turns negative, the shop runs out of cash and shutters — the lights go off, the books are wiped to zero. ReLU is exactly that rule, applied one shop at a time.

Read the diagram left to right. The first column is the raw value x — each shop's profit at month's end. The second column is the gate: 1 if the shop is open (x > 0), 0 if it has shuttered. The last column is the ReLU output: open shops pass their profit through untouched, while shuttered ones are zeroed out.

Five rows means five parallel shops on the same block, each evaluated independently. That's why ReLU is called an element-wise activation: every neuron decides its own fate.

= Leaky ReLU =

Plain ReLU wipes negative values to zero — clean, but a shop that shutters can never recover, since both its output and its gradient stay pinned at zero. This is the dying ReLU problem, and in deep networks it can quietly kill a meaningful fraction of the units.

Leaky ReLU is the one-line fix: instead of shuttering, the shop files for Chapter 11 protection and keeps the lights on at reduced capacity. Its debt is restructured down to a fraction α (typically 0.1) — the rest is forgiven, and the shop is wounded, not killed. A small negative signal still flows through, so the gradient survives, and the shop can crawl back to life if a TikTok goes viral.

Read the diagram left to right. The first column is the raw value x — each shop's profit at month's end. The second column is the leakage α — the fraction of the loss held over after restructuring (default 0.1, editable). The third column is the gate: 1 for shops still in the black, α for those operating under bankruptcy protection. The last column is the Leaky ReLU output: y = x · gate. Profitable shops pass through untouched; struggling ones shrink by a factor of α but still carry a sign.

Five rows means five parallel shops, each evaluated independently. Like ReLU, this is an element-wise activation: every neuron's fate is decided on its own merits.

#aibyhand
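The two rules fit in a few lines of NumPy (a minimal sketch; the five profit values are made up to mirror the five-shop picture):

```python
import numpy as np

# Five shops' monthly profits -- one value per shop, as in the diagram.
x = np.array([3.0, -2.0, 5.0, -0.5, 1.5])

def relu(x):
    # Open shops (x > 0) keep their profit; shuttered shops are zeroed out.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # Shops in the black pass through untouched; struggling shops keep a
    # small signed fraction alpha of their loss, so the gradient survives.
    return np.where(x > 0, x, alpha * x)

print(relu(x))        # values: [3, 0, 5, 0, 1.5]
print(leaky_relu(x))  # values: [3, -0.2, 5, -0.05, 1.5]
```

Both operate element-wise: each of the five entries is gated on its own, independently of its neighbours.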
5 replies · 80 reposts · 529 likes · 31.1K views
Gilbert Nduwayezu, PhD retweeted
Tom Yeh @ProfTomYeh
Single vs Multi-head Attention by hand ✍️ Resize matrices yourself 👉 byhand.ai/qNmYKw

The most important fact about multi-head attention: it has the same parameter count as single-head attention. The difference is purely structural — same total Wqkv weights, partitioned into smaller q–k–v triples.

Look at the two diagrams below. Both Wqkv matrices have the same height — same number of weight rows, same number of parameters. What changes is how that single tall block is sliced.

• Left. One head. The full Wqkv produces one big QKV: a tall Q (36 rows), a tall K, a tall V. One scoring computation runs over those full-width tensors.
• Right. 3 heads. The same-height Wqkv is sliced into 3 smaller q–k–v triples — each 12 rows tall. 3 scoring computations run in parallel, each a thinner version of the left.

The compute trade-off — kind of. Same Wqkv weights. Multi-head runs the attention scoring S = Kᵀ × Q once per head, so the dot-product count multiplies by H.

• Single-head: seq × seq = 40² = 1600 dot products
• Multi-head: seq × seq × H = 40² × 3 = 4800 dot products (3×)

But each multi-head dot product is narrower — its inner dimension is head_dim instead of H × head_dim. So when you count actual scalar multiplications, the totals are equal:

• Single-head: seq² × (H × head_dim) = 40² × 36 = 57600
• Multi-head: seq² × H × head_dim = 40² × 3 × 12 = 57600

Same FLOPs. Multi-head buys you H independent attention patterns at no extra weight cost and no extra arithmetic cost — it's the same total compute, sliced into H finer-grained heads.
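The counting argument can be checked in a few lines of Python (using the post's illustrative sizes: seq = 40, H = 3, head_dim = 12):

```python
# Sizes from the diagram: sequence length, number of heads, per-head width.
seq, H, head_dim = 40, 3, 12
d_model = H * head_dim  # 36: total Wqkv width is identical in both layouts

# Attention scoring S = K^T x Q, counted two ways.
single_dots = seq * seq          # 1600 dot products, each of width d_model
multi_dots = seq * seq * H       # 4800 dot products (3x), each of width head_dim

single_mults = single_dots * d_model   # scalar multiplications, single head
multi_mults = multi_dots * head_dim    # scalar multiplications, summed over heads

# Same total arithmetic: more dot products, but each one is proportionally narrower.
print(single_dots, multi_dots, single_mults, multi_mults)  # 1600 4800 57600 57600
```

The equality holds for any choice of H that divides d_model, since H cancels: seq² · (H · head_dim) = seq² · H · head_dim.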
14 replies · 95 reposts · 558 likes · 34.5K views
Gilbert Nduwayezu, PhD retweeted
tetsuo @tetsuoai
How CNNs see images. 16 boxes covering the core CNN stack: tensors, filters, feature maps, stride, padding, channels, pooling, receptive fields, mental model.
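The stride-and-padding arithmetic behind those feature-map sizes follows the standard convolution output formula; a minimal sketch (the sizes below are my own examples, not from the thread's images):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial size of a conv feature map: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

# A 3x3 filter over a 32x32 input:
print(conv_output_size(32, 3))                       # 30 ("valid": no padding shrinks the map)
print(conv_output_size(32, 3, padding=1))            # 32 ("same": padding preserves size)
print(conv_output_size(32, 3, stride=2, padding=1))  # 16 (stride 2 halves the resolution)
```

The same formula governs pooling layers, with the pool size playing the role of k.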
12 replies · 147 reposts · 841 likes · 33.6K views
Gilbert Nduwayezu, PhD retweeted
Turing Post @TheTuringPost
13+ Attention mechanisms you should know:
▪️ Self-attention
▪️ Cross-attention
▪️ Causal attention
▪️ Linear attention
▪️ Softmax attention
▪️ Sliding Window (local attention)
▪️ Global attention
▪️ FlashAttention
▪️ Multi-Head Attention (MHA)
▪️ Multi-Query Attention (MQA)
▪️ Grouped-Query Attention (GQA)
▪️ Multi-Head Latent Attention (MLA)
▪️ Interleaved Head Attention (IHA)
+ Slim Attention, KArAt, XAttention, Mixture-of-Depths Attention (MoDA)

Save the list and explore more about them here: turingpost.com/p/attention-ty…
13 replies · 294 reposts · 2K likes · 138.8K views
Gilbert Nduwayezu, PhD retweeted
Tom Dörr @tom_doerr
Python and Numpy tutorials for Stanford and Cornell machine learning courses github.com/kuleshov/teach…
1 reply · 29 reposts · 172 likes · 7.3K views
Gilbert Nduwayezu, PhD retweeted
Nainsi Dwivedi @NainsiDwiv50980
Stop wasting hours trying to learn AI. 📘📚 I have already done it for you. With one list. Zero confusion. And no fluff.

📹 Videos:
1. LLM Introduction: t.co/kyDon6qLrb
2. LLMs from Scratch: t.co/2hyMhuKoiI
3. Agentic AI Overview (Stanford): t.co/FXu6cAqITC
4. Building and Evaluating Agents: t.co/ZigR1tdOFL
5. Building Effective Agents: t.co/uYwfwO55mO
6. Building Agents with MCP: t.co/4arFTW1b3i
7. Building an Agent from Scratch: t.co/eOmveyM9Hz
8. Philo Agents: t.co/zLu7x1tx9m

🗂️ Repos:
1. GenAI Agents: t.co/eXCl2YaRPv
2. Microsoft's AI Agents for Beginners: t.co/3CSW4zPAwf
3. Prompt Engineering Guide: t.co/GVzvxPYDVO
4. Hands-On Large Language Models: t.co/0rgDvhx3pI
5. AI Agents for Beginners: t.co/3CSW4zPAwf
6. GenAI Agents: https://lnkd.in/dEt72MEy
7. Made with ML: t.co/9z5KHF9DMe
8. Hands-On AI Engineering: t.co/dldAj5Xkr6
9. Awesome Generative AI Guide: t.co/U2WZhT4ERV
10. Designing Machine Learning Systems: t.co/sYAZX34YdQ
11. Machine Learning for Beginners from Microsoft: t.co/NjFxHbC9jZ
12. LLM Course: t.co/N34YTPu1OK

🗺️ Guides:
1. Google's Agent Whitepaper: t.co/bW3Ov3vMW0
2. Google's Agent Companion: t.co/wredwWAbBA
3. Building Effective Agents by Anthropic: t.co/fxtE4alVrJ
4. Claude Code Best Agentic Coding Practices: t.co/lLSwJ9pG7C
5. OpenAI's Practical Guide to Building Agents: t.co/xgkEIogGfh

📚 Books:
1. Understanding Deep Learning: t.co/CjcKpTemmV
2. Building an LLM from Scratch: t.co/DaWBxOx8o3
3. The LLM Engineering Handbook: t.co/ZA1n0N41Mf
4. AI Agents: The Definitive Guide - Nicole Koenigstein: t.co/boLkl1VlKb
5. Building Applications with AI Agents - Michael Albada: t.co/H1Xf5EkJLL
6. AI Agents with MCP - Kyle Stratis: t.co/JI3ELQZE6a
7. AI Engineering: t.co/Xk0JzMIf7o

📜 Papers:
1. ReAct: t.co/QNqE4UU55w
2. Generative Agents: t.co/CwEpoJgY1U
3. Toolformer: t.co/5m9xZd5teZ
4. Chain-of-Thought Prompting: t.co/KjVlgdWi77

🧑‍🏫 Courses:
1. HuggingFace's Agent Course: t.co/7FSUYKxIdG
2. MCP with Anthropic: t.co/IkZGiWm2yS
3. Building Vector Databases with Pinecone: t.co/2YRoMfLdXd
4. Vector Databases from Embeddings to Apps: t.co/23A50ixbHJ
5. Agent Memory: t.co/uc3L9BrNF7

Repost for your network ♻️
15 replies · 244 reposts · 843 likes · 117.3K views