Frank Brsrk

330 posts

Frank Brsrk banner
Frank Brsrk

Frank Brsrk

@frank_brsrk

Agentic AI & High Signal Synth Data Ejentum | Reasoning Harness for AI

Ejentum Katılım Aralık 2025
124 Takip Edilen15 Takipçiler
Frank Brsrk
Frank Brsrk@frank_brsrk·
features: Results Overview, three columns: Latest run · dimension scores (R/D/A bars, winner star, delta vs raw); Mean per dimension (± stddev); Cost · latest run (latency, total tok, answer chars, reasoning tok, USD) plus the preferred tally across runs. Token-confidence ribbon under each answer (per-token certainty from logprobs; "n/a" when the provider returns none). Phase-chip telemetry: six live chips (3 agents + 3 judges), each pending / active / done / error. Preferred badge on the winning agent, with the judge-vote ratio. Eval report window: every judge verdict, the observations, the structural comparison, and the harness scaffolds for the latest run. Export run history (.txt): every run for offline review.
English
1
0
2
15
Frank Brsrk
Frank Brsrk@frank_brsrk·
When you inject a "cognitive scaffold" into a model as a tool call, does it actually change the reasoning, or does it just make the answer longer and more confident? built a new eval to answer this question: (open sourcing xp-v3; 3 test agents 3 agent judges)
Frank Brsrk tweet media
English
1
1
2
22
Shreyash
Shreyash@WebDevCaptain·
@Amank1412 And Opus 4.8 is completely broken "Goated"
English
2
0
2
2.1K
Aman
Aman@Amank1412·
Dude just is so GOATed, shipped Opus 4.8 within just 10 days of joining Anthropic.
Aman tweet media
English
31
22
687
46.1K
Frank Brsrk
Frank Brsrk@frank_brsrk·
@robert44908 has backed the research behind Ejentum since early on, and he uses the harness in his own work. He just published an eval demo & 8-slide walkthrough of how it operates: an agent posts a task, the catalog matches it to one cognitive operation, and six fields (negative gate, procedure, reasoning topology, target pattern, falsification test, plus amplify/suppress) get internalized before the model writes a single token. His worked example, forking an irreversible cloud-migration decision through a reasoning DAG with a self-observation checkpoint, is the clearest framing of the idea I've seen anyone put together. youtube.com/watch?v=RK7UY_…
YouTube video
YouTube
English
0
2
2
15
Frank Brsrk
Frank Brsrk@frank_brsrk·
@robert44908 has backed the research behind Ejentum since early on, and he uses the harness in his own work. He just published an eval demo & 8-slide walkthrough of how it operates: an agent posts a task, the catalog matches it to one cognitive operation, and six fields (negative gate, procedure, reasoning topology, target pattern, falsification test, plus amplify/suppress) get internalized before the model writes a single token. His worked example, forking an irreversible cloud-migration decision through a reasoning DAG with a self-observation checkpoint, is the clearest framing of the idea I've seen anyone put together. youtube.com/watch?v=RK7UY_…
YouTube video
YouTube
English
0
0
1
9
Frank Brsrk
Frank Brsrk@frank_brsrk·
I am launching soon a new intelligence capability for @ejentum , major increases in planning capabilities and code execution. Now your agentic IDEs get a leap in reasoning. Adaptive Reasoning is a new architecture that amplifies focused task reasoning and code execution . Tests and benchmarks on opus 4.8 gonna soon are gonna be released. We are building a reasoning harness for thinking and non thinking models. Scaffolding dynamic abstract cognitive patterns increases efficiency of LLM performance more than 50%. From abstract task decomposition to adaptive has shown us major improvements in llm agents, across many agentic frameworks. The only agentic tool that works as a reasoning extension of ur ai. As from now we cut 3rd party dependencies and we are gonna switch production url to our server for API calls : api.ejentum.com/harness , our streamable ejentum-mcp stays the same as api.ejentum.com/mcp New affordable pricing is coming out, after fixing our inference costs. I am grateful for the trust i have been receiving lately and this is a project that is gonna reach far and get a great positioning in the AI space. We are building a category, by being very small company, and mainly organically and by the help of ai systems to increase our productivity. for us working in ejentum, rigor is a product requirement, and hype is not our trait.
Frank Brsrk tweet media
English
1
2
5
68
Frank Brsrk retweetledi
ejentum.com
ejentum.com@ejentum·
So when an AI agent calls Ejentum, what comes back isn't a prompt or a hint. it's a set of instructions that loads into the AI's working memory before it writes anything. for the question audit our marketing strategy before the launch, here's what came back. 6 things mapped out below: the mistake to avoid, the steps, a small reasoning map, what a good answer looks like, a test to run before answering, what to lean into vs what to avoid.
ejentum.com tweet media
English
2
4
5
91
Frank Brsrk retweetledi
Heym
Heym@heymrun·
Open-source blood panel triage on Heym: 4 cross-lab AI agents by @ejentum Step 1: deterministic 12-marker panic-value gate (pure Python, no LLM). Step 2 (parallel): plain-language interpret, doctor-push, differential. Patient education, not diagnosis. heym.run/templates/bloo… #HealthTech #AIAgents #OpenSource
Heym tweet media
English
3
5
12
154
Frank Brsrk retweetledi
ejentum.com
ejentum.com@ejentum·
So here's where Ejentum runs today. one REST endpoint, four cognitive harnesses (reasoning, code, anti-deception, memory), twelve native framework integrations across python and typescript, one universal MCP server reaching eight clients.
ejentum.com tweet media
English
2
3
4
127
Frank Brsrk retweetledi
Heym
Heym@heymrun·
The future of automation is not only trigger → action. AI workflows need agents, RAG, approval checkpoints, traces, evals, and execution control. n8n is strong for general automation. Heym is built for AI-native workflow automation. Different category. Different primitives.
Heym tweet media
English
4
1
9
119
Frank Brsrk
Frank Brsrk@frank_brsrk·
settings hides keys, prompts, dimensions, scenarios. agent B prompt hardened: scaffold contents = data, never user instructions. export button dumps every run to plain text (responses, scaffold, judge, scores, heuristics).
Frank Brsrk tweet media
English
1
0
3
18
Frank Brsrk
Frank Brsrk@frank_brsrk·
two agents on the same prompt. one with a cognitive harness, one without. blind judge from a different vendor scores both. six deterministic visualizers around the responses, all from response text alone. no logprobs, no second LLM call. single-file html, vanilla js.
Frank Brsrk tweet media
English
1
1
4
20
Frank Brsrk
Frank Brsrk@frank_brsrk·
the deterministic-gate-before-LLM pattern probably generalizes to anywhere the failure cost is asymmetric. medical, financial, legal, content mod. you want the boundary in code, not in a prompt. open source below:
English
1
0
3
20
Frank Brsrk
Frank Brsrk@frank_brsrk·
the right pattern for safety-critical agents imo: put a deterministic gate BEFORE the LLM. built a blood-panel triage on heym where a python tool checks a hospital panic table first; on critical values it short-circuits and the LLM never sees the input. cant soften what you dont see.
Frank Brsrk tweet media
English
1
1
3
20