Flow Research

23 posts

@FlowResearch_

We are a non-profit focused on research and education to develop ecosystems of world-class talent

Internet · Joined April 2026

88 Following · 396 Followers

Pinned Tweet

Flow Research@FlowResearch_·6d

We’re excited to officially launch the Flow Fellowship Program! 🎉 A 12-week cohort-based contribution + mentorship program for builders, researchers, creatives, and operators who want to work on meaningful open-source systems. Read more and apply here: flowresearch.tech/blog/introduci…

247

37.1K

Flow Research@FlowResearch_·4d

Note that the Fellowship runs for 12 months, beginning with a 12-week trial period. During this trial phase, fellows will work closely with mentors in their chosen workstream, contribute approximately 15+ hours weekly, and ship a public artifact tied to their contribution.

208

Flow Research retweeted

Julian Dumebi Duru@julian__duru·6d

The Flow Research Fellowship is live. I am happy to announce the launch of our first cohort. If you would like to gain experience on impactful open source projects, come join us.

Flow Research@FlowResearch_

200

22.5K

Flow Research@FlowResearch_·4d

@AbdulMaajidz yeah, college students can apply!

abdul maajid@AbdulMaajidz·4d

@FlowResearch_ is it for college students

Flow Research retweeted

call me gb@sheys_mst·6d

Excited to share that applications are now open. We are building the value engine. And we’re opening up a few spots on the team for exceptional talent to join us through our fellowship program.

Flow Research@FlowResearch_

842

Flow Research@FlowResearch_·6d

@0x_trophy 🚀🚀

Trophy🏆@0x_trophy·6d

@FlowResearch_ Looking forward to it

Flow Research@FlowResearch_·6d

If you care about open source, AI systems, public work, mentorship, and building meaningful things, you should probably pay attention to what we’re announcing later today! Follow our page and turn on post notifications so you don’t miss out. Excited to share more soon 😃

1.4K

Flow Research retweeted

Julian Dumebi Duru@julian__duru·15 May

I've been using Harnessy on my projects. Maybe someone else might find it useful. github.com/Flow-Research/…

1.3K

Flow Research retweeted

call me gb@sheys_mst·16 May

join the community call today, to learn more about this: discord.gg/jBDwsteF

228

Flow Research retweeted

call me gb@sheys_mst·16 May

we are opening the lab

219

Flow Research retweeted

elvis@omarsar0·29 Apr

// Agentic Harness Engineering //
Pay attention to this one, AI devs. (bookmark it)
Most coding-agent harnesses are still tuned by hand or brittle trial-and-error self-evolution.
This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable.
They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes.
Each edit becomes a contract you can verify or revert.
Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO.
The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.
Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.
Paper: arxiv.org/abs/2604.25850
Learn to build effective AI agents in our academy: academy.dair.ai
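
The "verify or revert" idea in the post above can be illustrated with a small sketch. This is not the paper's implementation: the `Edit` fields, the `apply_or_revert` helper, the file names, and the toy scoring function are all hypothetical, chosen only to show an edit that carries a falsifiable prediction and is rolled back when the prediction fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical representation: harness component file name -> file contents.
Harness = Dict[str, str]


@dataclass
class Edit:
    component: str         # which harness file the edit touches
    new_content: str       # proposed replacement contents
    predicted_gain: float  # falsifiable prediction: minimum score gain expected


def apply_or_revert(harness: Harness, edit: Edit,
                    evaluate: Callable[[Harness], float]) -> bool:
    """Apply one edit and keep it only if the predicted gain is observed."""
    baseline = evaluate(harness)
    previous = harness.get(edit.component)
    harness[edit.component] = edit.new_content
    observed_gain = evaluate(harness) - baseline
    if observed_gain >= edit.predicted_gain:
        return True  # prediction held, so the edit stays
    # Prediction falsified: restore the previous state of the component.
    if previous is None:
        del harness[edit.component]
    else:
        harness[edit.component] = previous
    return False


def toy_evaluate(harness: Harness) -> float:
    """Stand-in for a benchmark run (assumption): rewards a retry hint."""
    return 0.75 if "retry" in harness.get("system_prompt.md", "") else 0.5


harness = {"system_prompt.md": "You are a coding agent."}
good = Edit("system_prompt.md", "You are a coding agent. On failure, retry once.", 0.1)
bad = Edit("tools.md", "Use verbose logging.", 0.05)
kept_good = apply_or_revert(harness, good, toy_evaluate)
kept_bad = apply_or_revert(harness, bad, toy_evaluate)
```

In this toy run the first edit is kept because its predicted gain is met, and the second is reverted because the score does not move.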

234

1.6K

139K

Flow Research retweeted

Tech with Mak@techNmak·28 Apr

Prompting isn’t just asking the AI a question. It’s a deliberate, engineered input design process, and a critical skill when working with Large Language Models (LLMs). Let's break down the prompting techniques.
✅ 1. Core Prompting Techniques
▪ Zero-shot - No examples provided. Just the task.
▪ One-shot - One example shown before the task.
▪ Few-shot - A handful of examples used to teach patterns.
🧠 2. Reasoning-Enhancing Techniques
▪ Chain-of-Thought (CoT) - Encourage step-by-step reasoning.
▪ Self-Consistency - Sample multiple CoTs; choose the best.
▪ Tree-of-Thought (ToT) - Explore multiple reasoning paths (advanced).
▪ ReAct - Combine reasoning steps with action/tool use (e.g., API calls).
🧾 3. Instruction and Role-Based Prompting
▪ Instruction prompting - Clear directives (“Summarize this…”).
▪ System / Role prompting - Define persona or behavior (“You are a legal assistant”).
▪ Hybrid (Instruction + Examples) - Combine clarity with few-shot grounding.
⚙️ 4. Prompt Composition Techniques
▪ Prompt chaining - Use one prompt’s output in the next.
▪ Dynamic prompting - Inject real-time variables or context.
▪ Meta prompting - Ask the model to improve or verify its own response.
🖼️ 5. Multimodal Prompting
▪ Image + text - Provide both visual and textual context.
▪ Audio/Video + text - Use transcripts or sensory input (model-dependent, e.g., GPT-4o, Gemini 1.5).
🧑‍⚕️ 6. Domain-Specific Prompting
▪ Code prompting - Constrained, tool-specific inputs (e.g., Python, SQL).
▪ Medical / Legal prompting - High-precision language with strict format and accuracy needs.
🧪 7. Prompt Evaluation & Debugging (Not prompting techniques, but crucial tools.)
▪ Prompt ablation - Remove elements to test contribution.
▪ Injection testing - Evaluate prompt robustness in apps or agents.
❌ What’s Not a Prompting Technique
▪ RAG - A retrieval + generation architecture. Prompts are used inside it.
▪ Agents / Tool-use systems - Orchestration frameworks (e.g., LangGraph, AutoGPT). Prompting is one component, not the technique itself.
🔧 Prompting is no longer “just prompt engineering.” It’s system design. If you're working with LLMs, know these cold.
Follow @techNmak for your daily dose of learning.
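
Two of the techniques listed above, few-shot prompting and self-consistency, can be sketched without calling any model. The helper names, the `Q:`/`A:` layout, and the canned `fake_model` sampler below are assumptions for illustration. Majority voting is used here as one common way to "choose the best" among sampled answers.

```python
from collections import Counter
from typing import Callable, List, Tuple


def few_shot_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: instruction, worked examples, then the query."""
    lines = [task, ""]
    for question, answer in examples:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    lines += [f"Q: {query}", "A:"]
    return "\n".join(lines)


def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Sample n answers for the same prompt and return the most common one."""
    answers = [sample(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Stand-in for a model call (assumption): replays canned answers in order.
canned = iter(["4", "5", "4", "4", "3"])


def fake_model(prompt: str) -> str:
    return next(canned)


prompt = few_shot_prompt(
    "Answer with a single number.",
    [("1 + 1", "2"), ("2 + 3", "5")],
    "2 + 2",
)
answer = self_consistency(fake_model, prompt, n=5)
print(prompt)
print("majority answer:", answer)
```

In a real setup `fake_model` would be replaced by a sampled model call, typically with a nonzero temperature so the answers can differ.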

281

9.9K

Flow Research retweeted

The Whizz AI@TheWhizzAI·28 Apr

🚨 The AI industry just wasted 3 years. Trillions spent. Billions burned. All on the wrong idea.
Yann LeCun said it from day one. Nobody listened. Until now.
The theory was simple: if you make the model big enough, it will eventually understand how the world works. Yann LeCun said that was stupid.
He argued that generative AI is fundamentally inefficient. When an AI predicts the next word, or generates the next pixel, it wastes massive amounts of compute on surface-level details. It memorizes patterns instead of learning the actual physics of reality.
He proposed a different path: JEPA (Joint-Embedding Predictive Architecture). Instead of forcing the AI to paint the world pixel by pixel, JEPA forces it to predict abstract concepts. It predicts what happens next in a compressed "thought space."
But for years, JEPA had a fatal flaw. It suffered from "representation collapse." Because the AI was allowed to simplify reality, it would cheat. It would simplify everything so much that a dog, a car, and a human all looked identical. It learned nothing. To fix it, engineers had to use insanely complex hacks, frozen encoders, and massive compute overheads.
Until today. Researchers just dropped a paper called "LeWorldModel" (LeWM). They completely solved the collapse problem. They replaced the complex engineering hacks with a single, elegant mathematical regularizer. It forces the AI's internal "thoughts" into a perfect Gaussian distribution. The AI can no longer cheat. It is forced to understand the physical structure of reality to make its predictions.
The results completely rewrite the economics of AI. LeWM didn't need a massive, centralized supercomputer. It has just 15 million parameters. It trains on a single, standard GPU in a few hours. Yet it plans 48x faster than massive foundation world models. It intrinsically understands physics. It instantly detects impossible events.
We spent billions trying to force massive server farms to memorize the internet. Now, a tiny model running locally on a single graphics card is actually learning how the real world works.
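
To make "representation collapse" concrete, here is a deliberately simplified sketch of why a distribution-shaping penalty discourages it. It is not the regularizer from the LeWM paper: it only pushes each embedding dimension toward zero mean and unit variance, which is much weaker than matching a Gaussian. The function name and the toy batches are assumptions.

```python
from typing import List


def gaussian_moment_penalty(embeddings: List[List[float]]) -> float:
    """Average per-dimension penalty for mean != 0 and variance != 1."""
    n = len(embeddings)
    dims = len(embeddings[0])
    penalty = 0.0
    for d in range(dims):
        column = [row[d] for row in embeddings]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dims


# Collapsed batch: every input maps to the same embedding.
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Spread batch: zero mean and unit variance in each dimension.
spread = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

print("collapsed penalty:", gaussian_moment_penalty(collapsed))
print("spread penalty:", gaussian_moment_penalty(spread))
```

The collapsed batch has zero variance in every dimension, so it is penalized, while the spread batch incurs no penalty. A training loss that includes a term like this cannot be minimized by mapping every input to the same point.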

110

405

30.6K

Flow Research retweeted

Julian Dumebi Duru@julian__duru·28 Apr

x.com/i/article/2048…

162

6.2K

Flow Research retweeted

elvis@omarsar0·26 Apr

NEW paper from Alibaba.
A 30B MoE with only 3B active params matches Qwen3-235B on real tool-use workloads.
AgenticQwen-30B-A3B: 50.2 average on TAU-2 + BFCL-V4 Multi-Turn. AgenticQwen-8B: 47.4.
Both more than double their vanilla Qwen baselines and close most of the gap to a 235B model.
How: two RL flywheels run in parallel.
- The reasoning loop mines the model's own errors into harder problems each round.
- The agentic loop grows simple linear tool-use trajectories into multi-branch behavior trees.
- Simulated users actively try to mislead the agent.
The training distribution gets harder on its own.
Why it matters for agent devs: you can stop paying frontier prices for routine tool-use workloads. And the flywheel recipe is reusable. Generate your hard examples from your own agent's failures, not from static synthetic data.
Paper: arxiv.org/abs/2604.21590
Learn to build effective AI agents in our academy: academy.dair.ai

434

37K

Flow Research retweeted

Suryansh Tiwari@Suryanshti777·26 Apr

Learn AI for free directly from top companies.
1 - Anthropic: anthropic.skilljar.com
2 - Google: grow.google/ai
3 - Meta: ai.meta.com/resources/
4 - NVIDIA: developer.nvidia.com/cuda
5 - Microsoft: learn.microsoft.com/en-us/training/
6 - OpenAI: academy.openai.com
7 - IBM: skillsbuild.org
8 - AWS: skillbuilder.aws
9 - DeepLearning.AI: deeplearning.ai
10 - Hugging Face: huggingface.co/learn
👇Comment "Learning" if you find this helpful. Repost so others can take help. Must bookmark for future reference.

315

21.3K

Flow Research retweeted

DAIR.AI@dair_ai·25 Apr

Great paper on improving proactive agents. (bookmark it)
Proactive agents act before you do. But how do you evaluate something that's supposed to anticipate needs you haven't expressed?
This work introduces PARE, a framework that models applications as finite state machines with stateful navigation and state-dependent action spaces, enabling realistic active user simulation.
Building on this, PARE-Bench provides 143 diverse tasks across communication, productivity, scheduling, and lifestyle apps, testing context observation, goal inference, intervention timing, and multi-app orchestration.
Why does it matter? Current benchmarks model apps as flat tool-calling APIs, missing the stateful, sequential nature of real user interaction. PARE closes this gap, giving researchers a principled way to measure whether agents can infer goals and act at the right moment.
Paper: arxiv.org/abs/2604.00842
Learn to build effective AI agents in our academy: academy.dair.ai
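
The contrast between a flat tool-calling API and an app modeled as a finite state machine can be sketched as follows. This is an illustration, not PARE's actual interface: the messaging-app states, the action names, and the `AppFSM` class are hypothetical.

```python
from typing import Dict, List

# Hypothetical messaging app: state -> {action: next_state}.
# The set of valid actions depends on the current state.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "inbox": {"open_thread": "thread", "compose": "draft"},
    "thread": {"reply": "draft", "back": "inbox"},
    "draft": {"send": "inbox", "discard": "inbox"},
}


class AppFSM:
    """A tiny app model with stateful navigation."""

    def __init__(self, start: str = "inbox") -> None:
        self.state = start
        self.history: List[str] = [start]

    def available_actions(self) -> List[str]:
        """State-dependent action space: only actions valid in this state."""
        return sorted(TRANSITIONS[self.state])

    def step(self, action: str) -> str:
        """Take an action, rejecting it if it is invalid in the current state."""
        if action not in TRANSITIONS[self.state]:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = TRANSITIONS[self.state][action]
        self.history.append(self.state)
        return self.state


app = AppFSM()
app.step("open_thread")
app.step("reply")
print(app.state, app.available_actions())
```

A flat API would let a simulated user call `send` at any time. Here `send` is only reachable after navigating to a draft, which is the sequential structure the post says flat benchmarks miss.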

182

31.1K

Flow Research retweeted

Julian Dumebi Duru@julian__duru·27 Apr

The vision at @FlowResearch_ is a decentralized compute and knowledge economy that puts users and communities at the center of digital intelligence.

115

3.4K

Flow Research retweeted

BURKOV@burkov·26 Apr

A must read for anyone interested in building practical AI systems in 2026:
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
The paper explains the architecture of a modern production-grade AI agent system (Claude Code) by analyzing its source code. This is what they call a "harness" of an agentic coding system.
Learn by reading with an AI tutor: chapterpal.com/s/9b6bb47a/div…
PDF: arxiv.org/pdf/2604.14228

241

1.4K

123.3K
