Amaroc
@ABDmarok25
950 posts
Joined April 2025
147 Following · 35 Followers
Séb Krier@sebkrier·
In AI discourse, there seems to be this eternal dance between people who put a lot of emphasis on process and people who focus on outcomes. It seems pretty visible in the 'is it reasoning?' debate. I think both are right in different ways: of course a model is reasoning in some sense, and functionally I don't really care if the mechanism that leads to the CoT and output is not analogous to how biological brains do it. But also, there are important considerations about the kinds of reasoning used, the rationale for certain chosen logical chains, and the degree to which these generalise robustly in out-of-distribution situations. The 'process people' are not always blind naysayers, and the 'functional equivalence' people aren't fundamentally incorrect either.

The process failures (e.g. R in strawberries) used to be easier to catch, though, and a lot of 'ideological skeptics' rely on them to make all sorts of unsupported claims, which makes it tempting for 'narrative activists' to dismiss process concerns entirely. Clearly the models are improving at an incredible rate, and this is great. But there remain failures or lacunae in the process by which an outcome is generated; this is less of an issue in coding, math, formal logic or areas where verification is easy, but more so in blurrier domains where we value diversity of processes precisely because we don't know the 'correct way', to the degree there even is a single one.

With humans you had this cultural and scientific evolution that over time refines heuristics and mechanisms; I think it's important that we maintain a degree of model multiplicity and cognitive diversity with models too. If you optimize hard enough on outcomes alone, you could easily converge on reasoning monocultures that perform well in-distribution but fail in precisely the situations where diverse reasoning approaches would have generated useful signal.

Hence why I'm so keen on the 'letting a thousand flowers bloom' approach to normative alignment, and generally insistent that a much wider set of people and groups should be able to customize and align models, beyond whoever happens to be in position to do so at the labs. Of course, a lot of human cognitive diversity can also be noise, like motivated reasoning, systematic biases, and cultural path dependencies that don't track truth, so you don't just want diversity for the sake of it. You need verification mechanisms that actually stress-test reasoning: e.g. adversarial collaborations are underused. You need institutions designed to promote truth-seeking, which are genuinely hard to build, as well as strong cultural and legal protections for the marketplace of ideas, which are increasingly under pressure. And you need better epistemic infrastructure broadly: there's an enormous amount we could do to improve how science is done.
Séb Krier tweet media
8 replies · 12 reposts · 84 likes · 13.6K views
Amaroc@ABDmarok25·
@fchollet ... I would be very interested in your perspective on this work and its implications for future architectures that move beyond scaling as the dominant paradigm.
0 replies · 0 reposts · 0 likes · 14 views
François Chollet@fchollet·
I don't think the rise of AGI will lead to a sudden exponential explosion in AI capabilities. There are bottlenecks on the sources of new capability improvements, and horizontally scaling intelligence in silicon (even by a massive factor) doesn't lift those bottlenecks.
42 replies · 21 reposts · 410 likes · 34.4K views
Amaroc@ABDmarok25·
@fchollet ..why current architectures cannot be interpreted or extended by sheer scale alone. The paper argues that the structural limitations of such systems are inherent to their recursivity and contextual dynamics. Given your insightful critique of scaling and AGI, I would be very ...
0 replies · 0 reposts · 0 likes · 21 views
Amaroc@ABDmarok25·
@fchollet ..“Epistemic limits of local interpretability in self-modulating cognitive architectures” doi.org/10.3389/frai.2… In this work I analyze how local interpretability methods fail in recursive, self-modulating cognitive systems and I provide formal, epistemological insights into...
0 replies · 0 reposts · 0 likes · 19 views
Amaroc@ABDmarok25·
@fchollet Your critique, that scaling alone, no matter how massive, does not remove the fundamental limitations on general intelligence, resonates strongly with a recent peer-reviewed article I published in Frontiers in Artificial Intelligence: ...
0 replies · 0 reposts · 0 likes · 16 views
Amaroc@ABDmarok25·
@omarsar0 ..Agentic RAG architecture that explicitly addresses these failure modes (grounded decoding, persistent belief state, VoI-based action policy). If you’re interested, I’d be happy to share the link and get your thoughts: linkedin.com/posts/abdelaal…
0 replies · 0 reposts · 0 likes · 8 views
elvis@omarsar0·
Is Agentic RAG worth it?

RAG systems have evolved from simple retriever-generator pipelines to sophisticated workflows. It remains unclear when to use Enhanced RAG (fixed pipelines with dedicated modules) versus Agentic RAG (the LLM orchestrates the entire process dynamically). This research provides the first empirical comparison.

Enhanced RAG adds pre-defined components to address specific weaknesses: routers to determine if retrieval is needed, query rewriters to improve alignment, and rerankers to refine document selection. The workflow is fixed and manually engineered.

Agentic RAG takes a different approach. The LLM decides which actions to perform, when to perform them, and whether to iterate. No extra components beyond the basic knowledge base, retriever, and generator. The model controls everything.

The researchers evaluated both paradigms across four dimensions on QA and information retrieval tasks:
- User intent handling: Agentic slightly outperforms Enhanced on most tasks, but Enhanced wins decisively on FEVER (+28.8 F1 points), where the agent often retrieves unnecessarily.
- Query rewriting: Agentic RAG achieves 55.6 average NDCG@10 compared to 52.8 for Enhanced, showing the agent can adaptively rewrite queries when beneficial.
- Document refinement: Enhanced RAG with reranking (49.5 NDCG@10) outperforms Agentic (43.9). Dedicated reranker modules beat iterative retrieval attempts.
- Model sensitivity: Agentic RAG is far more sensitive to model capability. With weaker models, Enhanced RAG maintains stability while Agentic performance degrades significantly.

Cost analysis reveals Agentic RAG requires 2-10x more computation time and tokens due to multi-step reasoning.

The choice between Enhanced and Agentic RAG depends on your constraints. Enhanced RAG offers predictability, lower costs, and stability with weaker models. Agentic RAG provides flexibility but requires stronger models and more compute.

Paper: arxiv.org/abs/2601.07711
Learn to build effective Agentic RAG systems in our academy: dair-ai.thinkific.com/pages/courses
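The structural contrast between the two paradigms can be sketched in a few lines of Python. Everything below is a toy illustration invented for this sketch, not code from the paper: `ToyLLM`, `ToyRetriever`, the "?"-based router heuristic, and the retrieve-once agent policy are all hypothetical stand-ins for real model calls.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "retrieve" or "answer"
    query: str = ""

class ToyRetriever:
    """Keyword-overlap 'retrieval' over a tiny in-memory corpus."""
    def __init__(self, corpus):
        self.corpus = corpus
    def search(self, query, k=3):
        words = set(query.lower().split())
        return sorted(self.corpus,
                      key=lambda d: -len(words & set(d.lower().split())))[:k]

class ToyLLM:
    """Stand-in for real LLM calls; each method mimics one module."""
    def needs_retrieval(self, query):           # router heuristic (toy)
        return "?" in query
    def rewrite(self, query):                   # query rewriter (toy)
        return query.rstrip("?")
    def rerank(self, query, docs):              # no-op reranker (toy)
        return docs
    def decide(self, query, history):           # agent policy: retrieve once, then answer
        return Action("retrieve", query) if not history else Action("answer")
    def generate(self, query, docs):
        return docs[0] if docs else "no-context answer"

def enhanced_rag(query, retriever, llm):
    """Fixed pipeline: router -> rewriter -> retriever -> reranker -> generator."""
    if not llm.needs_retrieval(query):
        return llm.generate(query, [])
    rewritten = llm.rewrite(query)
    docs = llm.rerank(rewritten, retriever.search(rewritten))
    return llm.generate(query, docs)

def agentic_rag(query, retriever, llm, max_steps=4):
    """Agent loop: the LLM picks each action and decides when to stop."""
    history, docs = [], []
    for _ in range(max_steps):
        act = llm.decide(query, history)
        if act.kind == "retrieve":
            docs = retriever.search(act.query)
            history.append(act)
        else:
            break
    return llm.generate(query, docs)

corpus = ["paris is the capital of france",
          "rag retrieves documents from a knowledge base",
          "agents decide when to call tools"]
llm, retriever = ToyLLM(), ToyRetriever(corpus)
```

The key asymmetry the study measures falls out of the shape of the code: `enhanced_rag` has a fixed, auditable control flow regardless of model quality, while `agentic_rag` delegates the loop to `llm.decide`, which is exactly where weaker models degrade and where extra steps multiply token cost.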
elvis tweet media
26 replies · 108 reposts · 609 likes · 32.7K views
Amaroc@ABDmarok25·
@omarsar0 ...over-retrieval and cost explosion seem less like architectural trade-offs and more like missing decision-theoretic constraints (belief-state verification loops, value-of-information-driven tool use). I recently published a technical blueprint on LinkedIn proposing a controlled..
0 replies · 0 reposts · 0 likes · 222 views
Amaroc@ABDmarok25·
@omarsar0 Very solid empirical comparison, especially the cost, stability, and model-sensitivity analysis. One aspect I think is still missing, though, is the control layer: how to stabilize Agentic RAG rather than just compare it to Enhanced RAG, in particular issues like hallucination ...
0 replies · 0 reposts · 0 likes · 230 views
Amaroc@ABDmarok25·
@sarahookr ... are not abuses; they are structurally possible. I can’t share the paper yet, but I’d be happy to send the abstract if you’d like a quick overview.
Amaroc tweet media
0 replies · 0 reposts · 0 likes · 4 views
Amaroc@ABDmarok25·
@sarahookr What this episode really exposes isn’t just bad incentives; it’s a technical failure of evaluation design. Current benchmarks don’t enforce model identity, structural traceability, or internal consistency. As long as evaluation is output-only, swapping variants, routing....
2 replies · 0 reposts · 1 like · 198 views
Sara Hooker@sarahookr·
Honestly, this is pretty meaningful. When we wrote the leaderboard illusion paper, which showed Meta had submitted 36 private variants in the lead-up to Llama 4 to game LMArena, my biggest question was: how do rigorous people who care about actual progress let this happen? It looks like, from the latest profile in @FT by @Melissahei, at least Mark raised the alarm.
Sara Hooker tweet media
23 replies · 30 reposts · 440 likes · 119.1K views
Amaroc@ABDmarok25·
@sarahookr Thank you 🙏... I recently submitted a paper that looks at exactly this issue, but from the perspective of the evaluation protocols themselves. The core idea is that as long as evaluation does not constrain identity, longitudinal coherence, and traceability, these kinds of outcomes...
Amaroc tweet media
0 replies · 0 reposts · 0 likes · 9 views
Amaroc@ABDmarok25·
@sarahookr ...guarantees that claims cannot be optimized independently of structure. Without that, leaderboard illusions aren’t anomalies; they’re the expected outcome.
0 replies · 0 reposts · 0 likes · 20 views
Amaroc@ABDmarok25·
@sarahookr ..models per task or tuning to the metric is not cheating; it’s permitted by the protocol. If we want rigor, evaluation has to constrain what a system is allowed to be, not just measure what it outputs. That means fixed identities, auditable internal invariants, and guarantees that...
0 replies · 0 reposts · 0 likes · 234 views
Amaroc@ABDmarok25·
@iScienceLuvr Tanishq, your vision for Sophont resonates with my work on real-time symbolic transparency. My article: doi.org/10.3389/frobt.… shows how to anticipate and explain AI decisions. I am convinced this approach can strengthen your open medical AI ecosystem.
0 replies · 0 reposts · 0 likes · 9 views
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Our goal at Sophont is to be a frontier medical AI lab building the open medical AI ecosystem
11 replies · 1 repost · 84 likes · 6.9K views
Amaroc@ABDmarok25·
@omarsar0 Hey Elvis that little heart you dropped on my post basically lit up my entire night... To return the favor (and because I clearly can't handle the fame) here is my latest article, just for you... Your feedback would be the cherry on top. 😉 doi.org/10.3389/frai.2….
0 replies · 0 reposts · 0 likes · 7 views
Sanjeev Arora@prfsanjeevarora·
I'm glad this paper of ours is getting attention. It shows that there are more efficient and effective ways for models to use their thinking tokens than generating a long uninterrupted thinking trace. Our PDR (parallel/distill/refine) orchestration gives much better final accuracy, while avoiding context bloat. (So it might be much cheaper to serve than today's thinking models.) I'm guessing that so-called "deep research models" rely on such orchestrations.
DAIR.AI@dair_ai

NEW Research from Meta Superintelligence Labs and collaborators.

The default approach to improving LLM reasoning today remains extending chain-of-thought sequences. Longer reasoning traces aren't always better: they conflate reasoning depth with sequence length and inherit long-context failure modes.

This new research introduces Parallel-Distill-Refine (PDR), a framework that treats LLMs as improvement operators rather than single-pass reasoners. Instead of one long reasoning chain, PDR operates in phases:
- Generate diverse drafts in parallel.
- Distill them into a bounded textual workspace.
- Refine conditioned on this workspace.
- Repeat.

Context length becomes controllable via degree of parallelism, no longer conflated with total tokens generated. The model accumulates wisdom across rounds through compact summaries rather than replaying full histories.

On AIME 2024, PDR achieves 93.3% accuracy compared to 79.4% for standard long chain-of-thought at matched latency budgets. For o3-mini at 49k effective tokens, accuracy improves from 76.9% (long CoT) to 86.7% (PDR), a 9.8-percentage-point gain. PDR also achieves the same accuracy as sequential refinement with a 2.57x smaller sequential budget by converting parallel compute into accuracy without lengthening per-call context.

The researchers also trained an 8B model with operator-consistent RL to make training match the PDR inference interface. Mixing standard and operator RL yields an additional 5% improvement on both AIME benchmarks.

Bounded-memory iteration can substitute for long reasoning traces while holding latency fixed. Strategic parallelism and distillation is shown to beat brute-force sequence extension.

Paper: arxiv.org/abs/2510.01123
Learn to build effective AI Agents in our academy: dair-ai.thinkific.com
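The phased loop described above can be sketched as a tiny orchestration function. The numeric "model" below is a stand-in invented purely for illustration (noisy drafts pulled toward a hidden target, with the mean as the distilled summary), not the paper's setup; the point is only the control flow: parallel drafts, a bounded distilled workspace, and refinement conditioned on that workspace rather than on the full history.

```python
import random

def pdr(generate, distill, refine, rounds=3, parallelism=4):
    """Parallel-Distill-Refine sketch: each round generates drafts in
    parallel, distills them into a compact workspace, and the next round
    conditions only on that workspace (never the full trace), so context
    stays bounded no matter how many rounds run."""
    workspace = ""
    for _ in range(rounds):
        drafts = [generate(workspace) for _ in range(parallelism)]  # parallel drafts
        workspace = distill(drafts)                                 # bounded summary
    return refine(workspace)

# Toy instantiation (hypothetical): drafts are noisy estimates nudged toward
# a hidden target; the workspace is just the round's mean, stored as text.
rng = random.Random(0)
TARGET = 42.0

def generate(workspace):
    prev = float(workspace) if workspace else 0.0   # read the prior consensus
    return prev + 0.5 * (TARGET - prev) + rng.uniform(-1.0, 1.0)

def distill(drafts):
    return str(sum(drafts) / len(drafts))           # compact textual workspace

def refine(workspace):
    return float(workspace)                         # final answer from workspace

estimate = pdr(generate, distill, refine, rounds=6, parallelism=8)
```

Averaging the parallel drafts each round plays the role of distillation here: per-call context is one number, yet the consensus improves across rounds, which is the bounded-memory substitution for a single long trace that the tweet describes.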

11 replies · 38 reposts · 283 likes · 73.7K views
Amaroc@ABDmarok25·
@prfsanjeevarora @dair_ai ..focus on tracking cognitive states, detecting narrative bifurcations, and structuring memory over trajectories; not just summaries...
0 replies · 0 reposts · 0 likes · 252 views
Amaroc@ABDmarok25·
@prfsanjeevarora @dair_ai ...spot on. What our article adds is not an alternative method but a conceptual lens: iterative refinement is not neutral; each round can traverse distinct cognitive regimes, with non-smooth narrative transitions that compact distillation alone cannot make visible. We...
0 replies · 0 reposts · 1 like · 267 views
Amaroc@ABDmarok25·
@prfsanjeevarora @dair_ai Your PDR work is elegant and deeply insightful. Treating LLMs as improvement operators rather than single-pass reasoners clarifies why parallelism plus distillation can outperform long, uninterrupted chains of thought, both in accuracy and efficiency. The engineering intuition is
0 replies · 0 reposts · 0 likes · 163 views