Flow Research

23 posts

@FlowResearch_

We are a non-profit focused on research and education to develop ecosystems of world class talent

Internet · Joined April 2026
88 Following · 396 Followers
Pinned Tweet
Flow Research @FlowResearch_
We’re excited to officially launch Flow Fellowship Program! 🎉 A 12-week cohort-based contribution + mentorship program for builders, researchers, creatives, and operators who want to work on meaningful open-source systems. Read more and apply here: flowresearch.tech/blog/introduci…
[Image]
4 replies · 36 retweets · 247 likes · 37.1K views
Flow Research @FlowResearch_
Note that the Fellowship runs for 12 months, beginning with a 12-week trial period. During this trial phase, fellows will work closely with mentors in their chosen workstream, contribute approximately 15+ hours weekly, and ship a public artifact tied to their contribution.
0 replies · 1 retweet · 4 likes · 208 views
Flow Research retweeted
call me gb @sheys_mst
Excited to share that applications are now open. We are building the value engine. And we’re opening up a few spots on the team for exceptional talent to join us through our fellowship program.
Flow Research @FlowResearch_

We’re excited to officially launch Flow Fellowship Program! 🎉 A 12-week cohort-based contribution + mentorship program for builders, researchers, creatives, and operators who want to work on meaningful open-source systems. Read more and apply here: flowresearch.tech/blog/introduci…

2 replies · 4 retweets · 5 likes · 842 views
Flow Research @FlowResearch_
If you care about open source, AI systems, public work, mentorship, and building meaningful things, you should probably pay attention to what we’re announcing later today! Follow our page and turn on post notifications so you don’t miss out. Excited to share more soon 😃
1 reply · 4 retweets · 22 likes · 1.4K views
Flow Research retweeted
call me gb @sheys_mst
join the community call today, to learn more about this: discord.gg/jBDwsteF
0 replies · 1 retweet · 1 like · 228 views
Flow Research retweeted
call me gb @sheys_mst
we are opening the lab
[Image]
2 replies · 2 retweets · 2 likes · 219 views
Flow Research retweeted
elvis @omarsar0
// Agentic Harness Engineering //
Pay attention to this one, AI devs. (bookmark it)
Most coding-agent harnesses are still tuned by hand or brittle trial-and-error self-evolution.
This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable.
They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes. Each edit becomes a contract you can verify or revert.
Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO.
The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.
Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.
Paper: arxiv.org/abs/2604.25850
Learn to build effective AI agents in our academy: academy.dair.ai
[Image]
69 replies · 234 retweets · 1.6K likes · 139K views
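The "each edit becomes a contract you can verify or revert" idea in the post above can be pictured with a small sketch. This is an illustration only, not the paper's implementation: `HarnessEdit`, `apply_edit`, and the `evaluate` callback are hypothetical names, and the harness is reduced to a dict of file contents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Harness = Dict[str, str]  # hypothetical: file name -> file contents


@dataclass
class HarnessEdit:
    file: str
    new_content: str
    predicted_gain: float  # falsifiable prediction: minimum expected score gain


def apply_edit(
    harness: Harness,
    edit: HarnessEdit,
    evaluate: Callable[[Harness], float],
) -> Tuple[Harness, bool]:
    """Try an edit and keep it only if the measured gain meets the prediction."""
    baseline = evaluate(harness)
    candidate = dict(harness)  # copy, so the original stays available for revert
    candidate[edit.file] = edit.new_content
    gain = evaluate(candidate) - baseline
    if gain >= edit.predicted_gain:
        return candidate, True  # prediction held: keep the edit
    return harness, False  # prediction falsified: revert to the original
```

In this sketch `evaluate` stands in for a benchmark run that returns a score such as a pass rate. Returning the untouched original on failure is what makes the edit revertible.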
Flow Research retweeted
Tech with Mak @techNmak
Prompting isn’t just asking the AI a question. It’s a deliberate, engineered input design process, and a critical skill when working with Large Language Models (LLMs).
Let's break down the prompting techniques.
✅ 1. Core Prompting Techniques
▪ Zero-shot - No examples provided. Just the task.
▪ One-shot - One example shown before the task.
▪ Few-shot - A handful of examples used to teach patterns.
🧠 2. Reasoning-Enhancing Techniques
▪ Chain-of-Thought (CoT) - Encourage step-by-step reasoning.
▪ Self-Consistency - Sample multiple CoTs; choose the best.
▪ Tree-of-Thought (ToT) - Explore multiple reasoning paths (advanced).
▪ ReAct - Combine reasoning steps with action/tool use (e.g., API calls).
🧾 3. Instruction and Role-Based Prompting
▪ Instruction prompting - Clear directives (“Summarize this…”).
▪ System / Role prompting - Define persona or behavior (“You are a legal assistant”).
▪ Hybrid (Instruction + Examples) - Combine clarity with few-shot grounding.
⚙️ 4. Prompt Composition Techniques
▪ Prompt chaining - Use one prompt’s output in the next.
▪ Dynamic prompting - Inject real-time variables or context.
▪ Meta prompting - Ask the model to improve or verify its own response.
🖼️ 5. Multimodal Prompting
▪ Image + text - Provide both visual and textual context.
▪ Audio/Video + text - Use transcripts or sensory input (model-dependent, e.g., GPT-4o, Gemini 1.5).
🧑‍⚕️ 6. Domain-Specific Prompting
▪ Code prompting - Constrained, tool-specific inputs (e.g., Python, SQL).
▪ Medical / Legal prompting - High-precision language with strict format and accuracy needs.
🧪 7. Prompt Evaluation & Debugging (Not prompting techniques, but crucial tools.)
▪ Prompt ablation - Remove elements to test contribution.
▪ Injection testing - Evaluate prompt robustness in apps or agents.
❌ What’s Not a Prompting Technique
▪ RAG - A retrieval + generation architecture. Prompts are used inside it.
▪ Agents / Tool-use systems - Orchestration frameworks (e.g., LangGraph, AutoGPT). Prompting is one component, not the technique itself.
🔧 Prompting is no longer “just prompt engineering.” It’s system design.
If you're working with LLMs, know these cold.
Follow @techNmak for your daily dose of learning.
[Image]
8 replies · 75 retweets · 281 likes · 9.9K views
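Two of the techniques listed above, few-shot prompting and self-consistency, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `few_shot_prompt`, `self_consistent`, and the `sample` callback are hypothetical names, `sample` stands in for whatever model call you use, and "choose the best" is read here as a majority vote over the sampled final answers.

```python
from collections import Counter
from typing import Callable, List, Tuple


def few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Build a prompt: the instruction, worked examples, then the new input."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def self_consistent(sample: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Sample n answers for the same prompt and return the most common one."""
    votes = Counter(sample(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]
```

Passing an empty `examples` list yields a zero-shot prompt, and a single pair yields a one-shot prompt, so the same helper covers the three core techniques in the list.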
Flow Research retweeted
The Whizz AI @TheWhizzAI
🚨 The AI industry just wasted 3 years. Trillions spent. Billions burned. All on the wrong idea. Yann LeCun said it from day one. Nobody listened. Until now.
The theory was simple: if you make the model big enough, it will eventually understand how the world works. Yann LeCun said that was stupid.
He argued that generative AI is fundamentally inefficient. When an AI predicts the next word, or generates the next pixel, it wastes massive amounts of compute on surface-level details. It memorizes patterns instead of learning the actual physics of reality.
He proposed a different path: JEPA (Joint-Embedding Predictive Architecture). Instead of forcing the AI to paint the world pixel by pixel, JEPA forces it to predict abstract concepts. It predicts what happens next in a compressed "thought space."
But for years, JEPA had a fatal flaw. It suffered from "representation collapse." Because the AI was allowed to simplify reality, it would cheat. It would simplify everything so much that a dog, a car, and a human all looked identical. It learned nothing. To fix it, engineers had to use insanely complex hacks, frozen encoders, and massive compute overheads.
Until today. Researchers just dropped a paper called "LeWorldModel" (LeWM). They completely solved the collapse problem. They replaced the complex engineering hacks with a single, elegant mathematical regularizer. It forces the AI's internal "thoughts" into a perfect Gaussian distribution. The AI can no longer cheat. It is forced to understand the physical structure of reality to make its predictions.
The results completely rewrite the economics of AI. LeWM didn't need a massive, centralized supercomputer. It has just 15 million parameters. It trains on a single, standard GPU in a few hours. Yet it plans 48x faster than massive foundation world models. It intrinsically understands physics. It instantly detects impossible events.
We spent billions trying to force massive server farms to memorize the internet. Now, a tiny model running locally on a single graphics card is actually learning how the real world works.
[Image]
45 replies · 110 retweets · 405 likes · 30.6K views
Flow Research retweeted
elvis @omarsar0
NEW paper from Alibaba.
A 30B MoE with only 3B active params matches Qwen3-235B on real tool-use workloads.
AgenticQwen-30B-A3B: 50.2 average on TAU-2 + BFCL-V4 Multi-Turn.
AgenticQwen-8B: 47.4.
Both more than double their vanilla Qwen baselines and close most of the gap to a 235B model.
How: two RL flywheels run in parallel.
- The reasoning loop mines the model's own errors into harder problems each round.
- The agentic loop grows simple linear tool-use trajectories into multi-branch behavior trees.
- Simulated users actively try to mislead the agent.
The training distribution gets harder on its own.
Why it matters for agent devs: you can stop paying frontier prices for routine tool-use workloads. And the flywheel recipe is reusable. Generate your hard examples from your own agent's failures, not from static synthetic data.
Paper: arxiv.org/abs/2604.21590
Learn to build effective AI agents in our academy: academy.dair.ai
[Image]
18 replies · 76 retweets · 434 likes · 37K views
Flow Research retweeted
Suryansh Tiwari @Suryanshti777
Learn AI for free directly from top companies.
1 - Anthropic: anthropic.skilljar.com
2 - Google: grow.google/ai
3 - Meta: ai.meta.com/resources/
4 - NVIDIA: developer.nvidia.com/cuda
5 - Microsoft: learn.microsoft.com/en-us/training/
6 - OpenAI: academy.openai.com
7 - IBM: skillsbuild.org
8 - AWS: skillbuilder.aws
9 - DeepLearning.AI: deeplearning.ai
10 - Hugging Face: huggingface.co/learn
👇 Comment "Learning" if you find this helpful. Repost so others can take help. Must bookmark for future reference.
[Image]
15 replies · 93 retweets · 315 likes · 21.3K views
Flow Research retweeted
DAIR.AI @dair_ai
Great paper on improving proactive agents. (bookmark it)
Proactive agents act before you do. But how do you evaluate something that's supposed to anticipate needs you haven't expressed?
This work introduces PARE, a framework that models applications as finite state machines with stateful navigation and state-dependent action spaces, enabling realistic active user simulation.
Building on this, PARE-Bench provides 143 diverse tasks across communication, productivity, scheduling, and lifestyle apps, testing context observation, goal inference, intervention timing, and multi-app orchestration.
Why does it matter? Current benchmarks model apps as flat tool-calling APIs, missing the stateful, sequential nature of real user interaction. PARE closes this gap, giving researchers a principled way to measure whether agents can infer goals and act at the right moment.
Paper: arxiv.org/abs/2604.00842
Learn to build effective AI agents in our academy: academy.dair.ai
[Image]
12 replies · 33 retweets · 182 likes · 31.1K views
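To make "applications as finite state machines with state-dependent action spaces" concrete, here is a toy sketch. It is not taken from PARE: the mail app, its screens, its actions, and the `AppFSM` class are all hypothetical, chosen only to show how the set of valid actions changes with the current state, in contrast to a flat tool-calling API where every call is always available.

```python
from typing import Dict, List, Tuple

# Hypothetical mail app: (current state, action) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("inbox", "open_message"): "message",
    ("inbox", "compose"): "draft",
    ("message", "reply"): "draft",
    ("message", "back"): "inbox",
    ("draft", "send"): "inbox",
    ("draft", "discard"): "inbox",
}


class AppFSM:
    def __init__(self, start: str = "inbox") -> None:
        self.state = start

    def available_actions(self) -> List[str]:
        """Return the actions valid on the current screen only."""
        return sorted(a for (s, a) in TRANSITIONS if s == self.state)

    def step(self, action: str) -> str:
        """Take an action, rejecting any that the current state does not allow."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

An agent or simulated user driving this object has to navigate: `send` only becomes reachable after `compose` or `reply`, which is the sequential structure the post says flat APIs miss.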
Flow Research retweeted
Julian Dumebi Duru @julian__duru
The vision at @FlowResearch_ is a decentralized compute and knowledge economy that puts users and communities at the center of digital intelligence.
5 replies · 36 retweets · 115 likes · 3.4K views
Flow Research retweeted
BURKOV @burkov
A must read for anyone interested in building practical AI systems in 2026:
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
The paper explains the architecture of a modern production-grade AI agent system (Claude Code) by analyzing its source code. This is what they call a "harness" of an agentic coding system.
Learn by reading with an AI tutor: chapterpal.com/s/9b6bb47a/div…
PDF: arxiv.org/pdf/2604.14228
[4 images]
52 replies · 241 retweets · 1.4K likes · 123.3K views