OptimaLab


@optimalab1

Optimization for ML at Rice University (CS), led by Associate Prof. Anastasios Kyrillidis. Efficient training methods, non-convex optimization, and more.

Houston, Texas · Joined December 2020
234 Following · 1.4K Followers · 387 posts

OptimaLab @optimalab1
Personal take: in a world where every problem gets a gradient descent hammer, there's something satisfying about pure algorithmic design. No loss function. No backprop. Just geometry and linear algebra. Creativity in CS didn't end with transformers.

OptimaLab @optimalab1
📊 New paper, and no, it's not about LLMs: "Exploiting Low-Rank Structure in Max-K-Cut Problems." 74× faster than heuristics on structured graphs, with provable optimality. No training loop in sight. arxiv.org/abs/2602.20376 🧵

OptimaLab @optimalab1
I'm trying five-minute, ungraded in-class checks across the term. Students complete ~6–7 of 11, and we keep the snapshots. When homework is perfect but exams aren't, these checks give a steadier signal of what students actually know, and they ease "blackout" anxiety. One more worry: NBC reported that some professors now read assignments aloud because certain students struggle to read a single sentence, citing figures like up to 50% of students at Northwestern Kellogg describing themselves as novice or reluctant readers. If scrolling culture erodes reading stamina and we add AI on top, the gap widens. Thoughts?

OptimaLab @optimalab1
LLM optimizers are starting to feel like close cousins: many variants, small gaps :). I'd love to see ideas with little overlap with Adam, even if they start out behind on benchmarks. The "bitter lesson" says simple and scalable wins, but we still need room for first-principles algorithms. Reviewers/practitioners: what evidence would make you back a new, lower-performing optimizer? Strong theory, scaling curves, robustness, or hardware efficiency?

OptimaLab @optimalab1
Tight deadline, 13-page report, one hour. I used AI to draft, then edited and owned the final. It sparked a feeling I'll call "AI shaming": the sense that typing every word yourself is more "real." We had this debate with Google vs. the library. Tools change; responsibility doesn't. Use AI, then read closely, fix errors, and make the judgment calls on structure, claims, and story. Save some writing for your own voice, and keep the big ideas human.

OptimaLab @optimalab1
Happy New Year! Health above all to you and yours. ✨

I spent Winter Break building the first course for the new AI Major at Rice CS: COMP 282, Computational Optimization. The course bridges discrete math to the continuous optimization powering modern AI. The build:
✅ 100+ pages of rigorous LaTeX (double column... looks shorter 😉)
✅ Custom slides for 6 modules
✅ "From-scratch" Python notebooks
✅ Problem sets mixing theory & code
Soon to be released!

Enough bragging: what is this post about? "Red Teaming" LLM Study Modes. 🧵

I took a linear algebra HW problem and fed it to a top LLM's new "Study Mode." Prompt: "Help me study this." The result was... "mehhh." (See image attached.)

The LLM didn't give the answer instantly. But it said: "Excellent! Let's state it cleanly... First eq gives alpha=2... plugging into third gives contradiction..." It did 90% of the reasoning. It found the path. It connected the algebra to the geometry. This strips the student of the most critical part of learning: getting stuck. Current "Study Modes" are just "Slow Reveal Modes." Learning math is about building the map in your head.

My take on what a TRUE AI Tutor needs:
1️⃣ Student-led: the AI waits for the user to propose a path.
2️⃣ Socratic friction: don't fix errors; ask "Does Eq 3 still hold?"
3️⃣ No reasoning trace: the logic chain IS the assignment. Don't output it.

We aren't banning AI (we use it!). But we are adding "Analog Audits": unannounced 5-minute pen & paper checks in class. We need to verify the reasoning trace lives in the student's brain, not their chat history.

#AI #Education #ComputerScience #EdTech #RiceUniversity
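For a feel of the failure mode, here is a hypothetical problem of the kind described (made up for illustration; not the actual HW problem): deciding whether a vector b lies in the span of v1 and v2 produces exactly the "first equation gives alpha = 2, third gives a contradiction" pattern.

```latex
% Hypothetical example: is b in span{v_1, v_2}?
% Solve alpha * v_1 + beta * v_2 = b componentwise.
\[
\alpha \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
+ \beta \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 3 \\ 6 \end{pmatrix}
\;\Longrightarrow\;
\begin{cases}
\alpha = 2 & \text{(eq. 1)} \\
\beta = 3 & \text{(eq. 2)} \\
\alpha + \beta = 6 & \text{(eq. 3)}
\end{cases}
\]
% Eq. 1 and eq. 2 force alpha = 2, beta = 3, but then alpha + beta = 5 != 6:
% eq. 3 fails, so b is not in the span (geometrically, b lies off the plane
% spanned by v_1 and v_2).
```

Spotting that eq. 3 is the one that breaks, and reading it geometrically, is exactly the struggle a "slow reveal" study mode takes away.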

OptimaLab @optimalab1
Big news from @RiceCompSci: Jianqiang Li and our team are finalists in the global @xprize Quantum Applications competition (@GoogleQuantumAI + @GESDAglobal), 1 of just 7 worldwide. We were recognized for a new quantum linear systems algorithm with implications for hard graph problems like MIS. A huge milestone for Jianqiang (on the job market!) and for @RiceUniversity.

Backed by QuanTAS + the Ken Kennedy Institute, we're building practical quantum algorithms: better-conditioned QLSP (AAAI oral), Quantum EigenGame for excited states/PCA (CPAL), smarter VQA scheduling (SIGMETRICS), and new MaxCUT/Max-3-Cut work coming soon.

More on the XPRIZE news + algorithm: news.rice.edu/news/2025/rice…

Thanks to Christopher Jermaine and Rice CS for supporting Jianqiang's position. Let's connect if you're exploring quantum algorithms, QAOA/VQEs, and real pathways to quantum advantage.

OptimaLab @optimalab1
Big news and a big shout-out to Fangshuo (Jasper) Liao. Over ~5 years together (UG → MS → almost-PhD), Jasper led a result on joint training for Mixture-of-Experts.

Most theory separates router/experts or uses toy top-1 routing. We handle soft/top-K-style joint training in a student–teacher setup. Key idea: guided gating. Experts specialize first; the router then "snaps" into place. One-to-one matching emerges, and the extra experts stay orthogonal, so pruning them is safe. After pruning, a short fine-tune converges linearly, with rates that depend on the router nonlinearity, not model size.

Why it matters: it connects practice and theory, explains why letting experts specialize accelerates routing, and gives a recipe: mild over-parameterization → prune → fine-tune.

Jasper is on the postdoc market. If you want someone who turns hairy Hermite math into usable insights, talk to him. Next: beyond Gaussian data, expert scaling laws, soft→top-k.

Preprint: arxiv.org/pdf/2510.07205
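To make "soft/top-K-style" routing concrete, here is a minimal numpy sketch of one mixture-of-experts forward pass with a softmax router and top-K selection. This is an illustration under assumed shapes and names (moe_forward, W_router, etc. are made up here), not the paper's actual setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, W_router, experts, k=2):
    """Soft top-K mixture-of-experts layer (illustrative only).

    x:        (d,) input vector
    W_router: (num_experts, d) router weights
    experts:  list of (d_out, d) expert weight matrices
    k:        number of experts kept per input
    """
    logits = W_router @ x                  # one router score per expert
    topk = np.argsort(logits)[-k:]         # indices of the k largest scores
    gates = softmax(logits[topk])          # renormalize over selected experts
    # Output = gate-weighted sum of the selected experts' outputs.
    return sum(g * (experts[i] @ x) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, d_out, n_exp = 8, 4, 6
x = rng.normal(size=d)
W_router = rng.normal(size=(n_exp, d))
experts = [rng.normal(size=(d_out, d)) for _ in range(n_exp)]
print(moe_forward(x, W_router, experts, k=2))
```

In the guided-gating story above, the experts specialize first and the router logits then concentrate on the matched experts, so the unmatched, near-orthogonal extras receive vanishing gates and can be pruned safely.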

OptimaLab @optimalab1
1/ Thrilled to share the CrysFormer journey: 3 years, multiple papers, code/data incoming. Thanks to co-authors (Tom Pan, Chen Dun, Shikai Jin, Evan Dramko, Ria Stevens, Mitchell D. Miller, George N. Phillips Jr.) and the Welch Foundation + partners.

2/ Start: IUCrJ (2023) showed a 3D CNN can learn electron densities directly from Patterson maps for peptides, the first proof we can bypass phases in simple cases.

3/ Next: CrysFormer (arXiv:2310.03899) uses a 3D Transformer with partial-structure attention to fuse Patterson maps + residue priors. Better PC, lower phase error, less compute vs. U-Nets.

4/ RecCrysFormer (CPAL'25): "recycling" training: predict → refine (SHELXE) → reuse maps as templates. On 15-residue examples with variable cells/angles, mean phase error drops from ~65° to ~35° in one run; refinement success is up.

5/ New: model completion with AlphaFold templates (Acta Cryst D, accepted; arXiv:2511.10440). Feed Patterson + incomplete AF-derived densities → predict complete maps.

6/ Why it matters: this bridges experimental crystallography and AI. Use real diffraction (Patterson) to fill low-confidence AF regions and improve phases: faster, more robust paths to validated structures.

7/ Links:
CrysFormer: arxiv.org/abs/2310.03899
RecCrysFormer: openreview.net/forum?id=U9DhM…
Model completion (AF): arxiv.org/abs/2511.10440
Project site (code/data updates soon): akyrillidis.github.io/crysformer/

8/ Onward: scaling to 50–150 aa domains, multiple space groups, bulk solvent/noise, and tighter loops with refinement. Thanks for the support!

OptimaLab @optimalab1
“LLMs just memorize.” We say it like it’s a flaw. But most of human problem-solving is memory + experience + small epiphanies. The real question isn’t “Do they memorize?”—it’s “Can they retrieve, compose, and adapt what they know to new constraints?” Thanks @RiceUniversity @RiceCS for the great talk by Zhaozhuo Xu.

OptimaLab @optimalab1
Every week: a new acronym, "paradigm," or clever rebrand. Great for attention and funding, but it can feel like constant reinvention when many results are old ideas + scale. Plenty of papers today = what would've been a sharp blog post: a smarter data transform, a better schedule, a different way to run the same algo. Valuable ≠ fundamental.

Industry's role is a net positive (data, compute, iteration). But let's label contributions clearly:

Applied gain: same model/optimizer run differently; restructured data; prompt engineering; more GPUs → wins.
Fundamental shift: new function class, objective, optimization regime, or compute graph → new capability.

Example: Chain-of-Thought. Powerful in practice by changing X→Y into X→(reasoning→answer). But it's data structuring/prompting, not a new architecture or optimizer. And that's okay. These "usage" wins paved the way for HRM/TRM, latent reasoning, deep supervision. The point isn't to dismiss; it's to label, so newcomers don't chase mirages and teams double down where it moves the needle.

My ask:
1. If it's a new way to feed/schedule the same model, call it that (and celebrate the lift).
2. If it's a new model class, training signal, or credit assignment, say so (and show ablations).

Applied work often pays the bills and pushes SOTA. Clear tags help us balance foundational bets with pragmatic wins. Just my 2¢.
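To see the X→Y vs. X→(reasoning→answer) distinction concretely, a toy sketch (the prompt strings and the model.generate call are hypothetical, not from any cited work): same model, same decoding, only the target structure changes.

```python
# Toy illustration: Chain-of-Thought as data/prompt restructuring.
# The model itself is unchanged; only the requested output format differs.

question = "A train travels 60 km in 1.5 hours. What is its average speed?"

# Plain X -> Y formatting: ask the model for the answer directly.
plain_prompt = f"Q: {question}\nA:"

# X -> (reasoning -> answer) formatting: ask the same model to emit
# intermediate steps before the final answer.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# Either string would be sent to the *same* model, e.g.:
#   output = model.generate(plain_prompt)  # hypothetical call
#   output = model.generate(cot_prompt)    # hypothetical call
print(plain_prompt)
print(cot_prompt)
```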

OptimaLab @optimalab1
Using LLMs ≠ building them ≠ studying smaller ones (that's me). Question: do LLMs "reread your entire chat" every turn?

My mental model:
- Finite context window = working memory.
- The new message + the most relevant recent history get packed in.
- If too long, truncation/summarization should drop/compress older parts.
- The model runs self-attention over what's inside and generates tokens.

Open questions for builders:
- Is this accurate in practice? Why do some UIs just hit "max tokens" instead of summarizing?
- How does "context length" work in multimodal settings (doctor's report + X-rays + labs)?
- Beyond sliding windows: favorite stable memory patterns? RAG, summarizers, vector stores, tool-augmented notes, task graphs?

Curious to hear from folks who've implemented this end-to-end.
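Here is a minimal sketch of the "packing" step in that mental model, under loud simplifying assumptions: a whitespace word count stands in for a real tokenizer, and a placeholder line marks where a real system might insert an LLM-written summary.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())

def pack_context(history, new_message, budget=512):
    """Greedily keep the newest turns that fit; note what got dropped.

    history:     list of strings, oldest first
    new_message: the incoming user turn (always included)
    budget:      max tokens the context window allows
    """
    packed = [new_message]
    used = count_tokens(new_message)
    dropped = 0
    for turn in reversed(history):       # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            dropped += 1                 # older turns fall out of "working memory"
            continue
        packed.append(turn)
        used += cost
    if dropped:
        # A real system might put an LLM-written summary here instead.
        packed.append(f"[{dropped} older turns omitted]")
    return list(reversed(packed))        # restore oldest-first order

history = [f"turn {i}: " + "word " * 50 for i in range(20)]
print("\n".join(pack_context(history, "latest question?", budget=300)))
```

On this view, "rereading the entire chat" really means re-running attention over whatever survives this packing step, not over the full history.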