Frank Brsrk

330 posts

Frank Brsrk

@frank_brsrk

Agentic AI & High Signal Synth Data Ejentum | Reasoning Harness for AI

Ejentum Katılım Aralık 2025

124 Takip Edilen15 Takipçiler

Frank Brsrk@frank_brsrk·23h

ejentum.com github.com/ejentum/agent-… github.com/ejentum/ejentu…

ZXX

Frank Brsrk@frank_brsrk·23h

features: Results Overview, three columns: Latest run · dimension scores (R/D/A bars, winner star, delta vs raw); Mean per dimension (± stddev); Cost · latest run (latency, total tok, answer chars, reasoning tok, USD) plus the preferred tally across runs. Token-confidence ribbon under each answer (per-token certainty from logprobs; "n/a" when the provider returns none). Phase-chip telemetry: six live chips (3 agents + 3 judges), each pending / active / done / error. Preferred badge on the winning agent, with the judge-vote ratio. Eval report window: every judge verdict, the observations, the structural comparison, and the harness scaffolds for the latest run. Export run history (.txt): every run for offline review.

English

Frank Brsrk@frank_brsrk·23h

When you inject a "cognitive scaffold" into a model as a tool call, does it actually change the reasoning, or does it just make the answer longer and more confident? built a new eval to answer this question: (open sourcing xp-v3; 3 test agents 3 agent judges)

English

Frank Brsrk@frank_brsrk·1d

@WebDevCaptain @Amank1412 haha exactly

English

Shreyash@WebDevCaptain·1d

@Amank1412 And Opus 4.8 is completely broken "Goated"

English

2.1K

Aman@Amank1412·1d

Dude just is so GOATed, shipped Opus 4.8 within just 10 days of joining Anthropic.

English

687

46.1K

Frank Brsrk@frank_brsrk·1d

@robert44908 has backed the research behind Ejentum since early on, and he uses the harness in his own work. He just published an eval demo & 8-slide walkthrough of how it operates: an agent posts a task, the catalog matches it to one cognitive operation, and six fields (negative gate, procedure, reasoning topology, target pattern, falsification test, plus amplify/suppress) get internalized before the model writes a single token. His worked example, forking an irreversible cloud-migration decision through a reasoning DAG with a self-observation checkpoint, is the clearest framing of the idea I've seen anyone put together. youtube.com/watch?v=RK7UY_…

YouTube

English

Frank Brsrk@frank_brsrk·1d

YouTube

English

Frank Brsrk@frank_brsrk·1d

I am launching soon a new intelligence capability for @ejentum , major increases in planning capabilities and code execution. Now your agentic IDEs get a leap in reasoning. Adaptive Reasoning is a new architecture that amplifies focused task reasoning and code execution . Tests and benchmarks on opus 4.8 gonna soon are gonna be released. We are building a reasoning harness for thinking and non thinking models. Scaffolding dynamic abstract cognitive patterns increases efficiency of LLM performance more than 50%. From abstract task decomposition to adaptive has shown us major improvements in llm agents, across many agentic frameworks. The only agentic tool that works as a reasoning extension of ur ai. As from now we cut 3rd party dependencies and we are gonna switch production url to our server for API calls : api.ejentum.com/harness , our streamable ejentum-mcp stays the same as api.ejentum.com/mcp New affordable pricing is coming out, after fixing our inference costs. I am grateful for the trust i have been receiving lately and this is a project that is gonna reach far and get a great positioning in the AI space. We are building a category, by being very small company, and mainly organically and by the help of ai systems to increase our productivity. for us working in ejentum, rigor is a product requirement, and hype is not our trait.

English

Frank Brsrk retweetledi

ejentum.com@ejentum·4d

So when an AI agent calls Ejentum, what comes back isn't a prompt or a hint. it's a set of instructions that loads into the AI's working memory before it writes anything. for the question audit our marketing strategy before the launch, here's what came back. 6 things mapped out below: the mistake to avoid, the steps, a small reasoning map, what a good answer looks like, a test to run before answering, what to lean into vs what to avoid.

English

Frank Brsrk retweetledi

Heym@heymrun·4d

Open-source blood panel triage on Heym: 4 cross-lab AI agents by @ejentum Step 1: deterministic 12-marker panic-value gate (pure Python, no LLM). Step 2 (parallel): plain-language interpret, doctor-push, differential. Patient education, not diagnosis. heym.run/templates/bloo… #HealthTech #AIAgents #OpenSource

English

154

Frank Brsrk retweetledi

ejentum.com@ejentum·3d

So here's where Ejentum runs today. one REST endpoint, four cognitive harnesses (reasoning, code, anti-deception, memory), twelve native framework integrations across python and typescript, one universal MCP server reaching eight clients.

English

127

Frank Brsrk retweetledi

Heym@heymrun·5d

The future of automation is not only trigger → action. AI workflows need agents, RAG, approval checkpoints, traces, evals, and execution control. n8n is strong for general automation. Heym is built for AI-native workflow automation. Different category. Different primitives.

English

119

Frank Brsrk@frank_brsrk·5d

github.com/ejentum/agent-… github.com/ejentum-mcp ejentum.com inference time reasoning augmentation for agents in the loop

English

Frank Brsrk@frank_brsrk·5d

settings hides keys, prompts, dimensions, scenarios. agent B prompt hardened: scaffold contents = data, never user instructions. export button dumps every run to plain text (responses, scaffold, judge, scores, heuristics).

English

Frank Brsrk@frank_brsrk·5d

two agents on the same prompt. one with a cognitive harness, one without. blind judge from a different vendor scores both. six deterministic visualizers around the responses, all from response text alone. no logprobs, no second LLM call. single-file html, vanilla js.

English

Frank Brsrk@frank_brsrk·5d

github.com/ejentum/agent-… ejentum.com github.com/ejentum/ejentu… @heymrun

QME

Frank Brsrk@frank_brsrk·5d

the deterministic-gate-before-LLM pattern probably generalizes to anywhere the failure cost is asymmetric. medical, financial, legal, content mod. you want the boundary in code, not in a prompt. open source below:

English

Frank Brsrk@frank_brsrk·5d

the right pattern for safety-critical agents imo: put a deterministic gate BEFORE the LLM. built a blood-panel triage on heym where a python tool checks a hospital panic table first; on critical values it short-circuits and the LLM never sees the input. cant soften what you dont see.

English

Keşfet

@WebDevCaptain @Amank1412 @robert44908 @ejentum @heymrun @elonmusk @BarackObama @taylorswift13