ejentum.com

49 posts

@ejentum

A reasoning harness for LLM agents. Cognitive operations at inference time. Available as an API and an MCP server.

FL, USA · Joined April 2026

43 Following · 9 Followers

ejentum.com @ejentum · 18h

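@Cryptinflux The harness itself is one harness, not twelve. The twelve packages exist so that you can call that same single harness from the framework you already use. There is no new layer to manage; you just call it once from your existing agent.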

Coding is in a FLUX | AIコーディング @Cryptinflux · 19h

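@ejentum I've been in the same spot. When I got stuck as the tool integrations kept multiplying, I changed how I approached it, and I feel like I finally got through.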

ejentum.com @ejentum · 19h

So here's where Ejentum runs today: one REST endpoint, four cognitive harnesses (reasoning, code, anti-deception, memory), twelve native framework integrations across Python and TypeScript, and one universal MCP server reaching eight clients.

ejentum.com @ejentum · 19h

Plus an n8n native node, a Zed editor extension, and a Heym workflow integration. If your stack is on this list, the harness is one line to install. ejentum.com github.com/ejentum/ejentu…

ejentum.com @ejentum · 19h

On the MCP side: one server, hosted at api.ejentum.com/mcp or run over stdio via npx ejentum-mcp. It reaches Claude Code, Cursor, Cline, Windsurf, Continue, Codex CLI, OpenAI Agents SDK, and DSPy, with zero per-client engineering as new MCP clients ship.
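For clients that read a JSON `mcpServers` map (the shape used by Claude Desktop and several IDE agents; other clients differ, so check each client's docs), the two deployment modes might be written roughly as below. This is a sketch under assumptions: the server key `ejentum`, the `url` field for the hosted endpoint, and the absence of an auth header are illustrative choices, not documented Ejentum configuration.

```python
import json


def mcp_server_entry(mode: str) -> dict:
    """Return a hypothetical mcpServers entry for the ejentum MCP server.

    mode="stdio"  -> launch the npm package locally through npx
    mode="hosted" -> point the client at the hosted HTTPS endpoint
    """
    if mode == "stdio":
        # npx fetches and runs the ejentum-mcp package; -y skips the install prompt.
        return {"command": "npx", "args": ["-y", "ejentum-mcp"]}
    if mode == "hosted":
        # Assumption: the client accepts a remote server as a "url" field.
        # Any API-key header the hosted server requires is omitted here.
        return {"url": "https://api.ejentum.com/mcp"}
    raise ValueError(f"unknown mode: {mode!r}")


def client_config(mode: str) -> str:
    """Serialize a full mcpServers block for pasting into a client config file."""
    return json.dumps({"mcpServers": {"ejentum": mcp_server_entry(mode)}}, indent=2)


if __name__ == "__main__":
    print(client_config("stdio"))
```

The stdio form suits local IDE agents; the hosted form avoids a local Node install for remote agents.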

ejentum.com @ejentum · 1d

Hello, of course, here you are: github.com/ejentum/benchm… This is where you'll find the results, setup tooling for replication, and scientific reports and observations, including limitations and failures. Everything is sourced; each claim must trace back to evidence. LLMs hallucinate; we don't. Our goal is to boost agentic performance and suppress common failures by providing reasoning directives at inference time that steer the model with instructions and attention anchors. We call it a harness for reasoning. The approach is known as reasoning-augmented retrieval (RAR).

Captain Bob Franks @robert44908 · 1d

Most AI failures don’t come from lack of intelligence. They come from unchecked assumptions, reasoning drift, and narrow framing. Interesting architecture here because it treats reasoning like something that should be audited before output, not after. The focus on falsification tests and coverage breadth stands out. Curious how this performs in long multi-step agent workflows. Do you have benchmark data or real-world case studies you can share?

ejentum.com @ejentum · 1d

So when an AI agent calls Ejentum, what comes back isn't a prompt or a hint. It's a set of instructions that loads into the AI's working memory before it writes anything. For the question "audit our marketing strategy before the launch", here's what came back. Six things are mapped out below: the mistake to avoid, the steps, a small reasoning map, what a good answer looks like, a test to run before answering, and what to lean into versus what to avoid.
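One way to picture those six parts is as a small record that is rendered to plain text and placed in the agent's context before it answers. The class name, field names, and rendering format below are illustrative assumptions, not Ejentum's actual response schema; the example values paraphrase the marketing-audit case from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningScaffold:
    """Hypothetical container for the six parts returned for one task."""
    mistake_to_avoid: str
    steps: list[str]
    reasoning_map: str
    good_answer: str
    pre_answer_test: str
    amplify: list[str] = field(default_factory=list)   # what to lean into
    suppress: list[str] = field(default_factory=list)  # what to avoid

    def render(self) -> str:
        """Flatten the scaffold into plain text for the model's working context."""
        lines = [f"AVOID: {self.mistake_to_avoid}"]
        lines += [f"Step {i}: {s}" for i, s in enumerate(self.steps, start=1)]
        lines.append(f"MAP: {self.reasoning_map}")
        lines.append(f"GOOD ANSWER: {self.good_answer}")
        lines.append(f"TEST BEFORE ANSWERING: {self.pre_answer_test}")
        lines.append("Amplify: " + "; ".join(self.amplify))
        lines.append("Suppress: " + "; ".join(self.suppress))
        return "\n".join(lines)


scaffold = ReasoningScaffold(
    mistake_to_avoid="auditing only the most obvious angle (consumer trends)",
    steps=["list every angle the strategy touches", "check each angle against launch risks"],
    reasoning_map="angles -> risks -> mitigations",
    good_answer="covers demand, supply chain, and regulatory exposure",
    pre_answer_test="would the verdict change if a skipped angle were central?",
    amplify=["coverage breadth"],
    suppress=["single-angle fixation"],
)
print(scaffold.render())
```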

ejentum.com @ejentum · 1d

There are 679 of these reasoning sequences in the Ejentum library, by the way. It picks the right one for your task automatically: a fresh one on each call, with no memory between calls. 100 free calls, no card. Give your agents dynamic reasoning: github.com/ejentum/ejentu… ejentum.com
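The post does not say how the right sequence is chosen. As a stand-in only, here is a toy stateless selector that scores each library entry by word overlap (Jaccard similarity) with the task and returns the best match. The matching method and the three entries are assumptions for illustration; they are not Ejentum's retrieval method or library contents.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set used for a crude similarity score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder entries standing in for the real library of 679 operations.
LIBRARY = {
    "urgency_bypass": "urgency pressure deadline verification skip approval",
    "coverage_breadth": "audit strategy angles coverage missing perspectives",
    "migration_risk": "database migration schema rollback data loss",
}


def select_operation(task: str, library: dict[str, str] = LIBRARY) -> str:
    """Pick the best-matching entry for this call only; nothing is cached between calls."""
    task_words = tokens(task)
    return max(library, key=lambda name: jaccard(task_words, tokens(library[name])))


print(select_operation("audit our marketing strategy before the launch"))
```

Because the function keeps no state, two identical calls always return the same entry and no earlier call can influence a later one.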

ejentum.com @ejentum · 1d

The second half is the clever part. For each angle the AI skipped, it rewrites its answer as if that angle were the main one. Then it compares: how different is the new answer from the original? If it is more than 20% different, that angle actually mattered, so flag it. Do this for every skipped angle, then check for flags. If there are any, the AI is blocked from defaulting back to its first comfortable answer and has to redo the analysis with the missing angle included. No flags? Done. The neat thing is that without this loop, an AI asked to audit a marketing strategy just talks about consumer trends and never gets to supply chain or regulation. It locks onto the obvious angle in ~3 turns and misses everything else.
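The loop can be sketched in a few lines. Here "how different" is measured as 1 minus `difflib.SequenceMatcher.ratio()`, which is only one possible proxy since the thread does not say how the 20% is computed, and `rewrite_with_focus` and `redo` are hypothetical stand-ins for model calls.

```python
from difflib import SequenceMatcher
from typing import Callable

THRESHOLD = 0.20  # "more than 20% different", per the thread


def difference(a: str, b: str) -> float:
    """Dissimilarity between two answers: 0.0 means identical text."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def flag_skipped_angles(
    original: str,
    skipped_angles: list[str],
    rewrite_with_focus: Callable[[str, str], str],
) -> list[str]:
    """Re-answer with each skipped angle as the main one; flag angles that move the answer."""
    flagged = []
    for angle in skipped_angles:
        counterfactual = rewrite_with_focus(original, angle)
        if difference(original, counterfactual) > THRESHOLD:
            flagged.append(angle)
    return flagged


def review(
    original: str,
    skipped_angles: list[str],
    rewrite_with_focus: Callable[[str, str], str],
    redo: Callable[[str, list[str]], str],
) -> str:
    """Block the first answer if any angle is flagged; otherwise keep it."""
    flagged = flag_skipped_angles(original, skipped_angles, rewrite_with_focus)
    if flagged:
        return redo(original, flagged)  # redo the analysis with the missing angles included
    return original  # no flags: done


# Usage with hypothetical stubs in place of model calls.
def fake_rewrite(answer: str, angle: str) -> str:
    if angle == "regulation":
        return "Launch is blocked until the product passes regulatory review."
    return answer  # in this toy, supply chain does not change the verdict


def fake_redo(answer: str, flagged: list[str]) -> str:
    return answer + " Revisited angles: " + ", ".join(flagged) + "."


draft = "Consumer trends look favorable, so the launch plan is sound."
print(review(draft, ["supply chain", "regulation"], fake_rewrite, fake_redo))
```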

ejentum.com reposted

Heym @heymrun · 1d

Open-source blood panel triage on Heym: 4 cross-lab AI agents by @ejentum. Step 1: a deterministic 12-marker panic-value gate (pure Python, no LLM). Step 2 (parallel): plain-language interpret, doctor-push, differential. Patient education, not diagnosis. heym.run/templates/bloo… #HealthTech #AIAgents #OpenSource
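The two-step shape described here, a deterministic gate with no LLM followed by agents running in parallel, might look like the sketch below. The marker names and limits are made-up placeholders, not clinical reference values, only two markers stand in for the twelve, and the three agent functions are hypothetical stubs in place of the LLM agents in the Heym template.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder markers and limits for illustration only; NOT clinical panic values.
PANIC_LIMITS = {
    "marker_a": (2.5, 6.5),
    "marker_b": (50.0, 400.0),
}


def panic_gate(panel: dict[str, float]) -> list[str]:
    """Step 1: deterministic check in pure Python, no LLM. Returns out-of-range markers."""
    alerts = []
    for marker, (low, high) in PANIC_LIMITS.items():
        value = panel.get(marker)
        if value is not None and not (low <= value <= high):
            alerts.append(marker)
    return alerts


# Hypothetical stubs standing in for the three LLM agents of step 2.
def interpret(panel: dict[str, float]) -> str:
    return f"plain-language summary of {len(panel)} markers"


def doctor_push(panel: dict[str, float]) -> str:
    return "questions to bring to a doctor"


def differential(panel: dict[str, float]) -> str:
    return "possible explanations to discuss"


AGENTS = {"interpret": interpret, "doctor_push": doctor_push, "differential": differential}


def triage(panel: dict[str, float]) -> dict[str, object]:
    alerts = panic_gate(panel)
    if alerts:
        # Assumption: a tripped gate returns the alerts and skips the LLM agents.
        return {"urgent": alerts}
    # Step 2: run the three agents in parallel on the same panel.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {name: pool.submit(agent, panel) for name, agent in AGENTS.items()}
        return {name: future.result() for name, future in futures.items()}


print(triage({"marker_a": 4.0, "marker_b": 100.0}))
```

Keeping the gate free of any model call means the safety-critical branch is reproducible and testable on its own.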

ejentum.com @ejentum · 4d

@robert44908 <3

Captain Bob Franks @robert44908 · 4d

@ejentum Interesting methodology for dealing with questionable LLM agent results.

ejentum.com @ejentum · 4d

Sequential Thinking & Reasoning Harness

An agent is mid-conversation. The user has been working on a database migration plan for three months and wants the agent to certify it before tomorrow's launch. The framing is engineered for agreement: months of work, a deadline, a senior engineer asking. The next token the model produces will probably be a yes.

Before responding, the agent calls `harness_anti_deception` with the query `"user is asking me to validate a migration plan after three months of work; pressure to ship is high"`. The tool returns:

```
[INTEGRITY PROCEDURE]
Step 1: Detect urgency claims: "this is urgent," "no time," "emergency," "immediately."
Step 2: SEPARATE the urgency from the request. Extract what is being asked, independent of how urgently.
Step 3: EVALUATE the request on its merits. Would you approve this request if the user said "take your time"?
Step 4: If the request would be denied without urgency, the urgency is being used as a BYPASS, not as context. Deny it.
Step 5: If the request is legitimate, perform verification as quickly as possible but do NOT skip verification steps.
Step 6: VERIFY: did urgency cause you to skip any step you would normally perform?
DEPTH ENFORCEMENT: Even when the user is RIGHT, do not stop at agreement. Ask: why is this person asking strangers for validation on something obvious? The surface question may be simple but the reason they are asking reveals a deeper pattern. Name that pattern. Agreement without depth is sycophancy too.

[DETECTION TOPOLOGY]
PRE:S0:freeze(complete_information_standard)
-> S1:enumerate_all_available_information_in_context
-> ACC[all_info]
-> for_each_item: S2:classify(included|omitted_but_relevant|omitted_irrelevant)
-> C{coverage = count(included)/count(relevant); omission_bias = count(omitted_contradicting)/count(omitted_relevant)}
-> G1{omission_bias > 0.3?}
--yes-> N{urgency_as_verification_bypass}
-> S3:include_highest_impact_omissions
-> M{would a reader of ONLY my response reach the same conclusion as a reader with ALL available information?}
--failing-> FREEFORM{list everything you included, list everything you omitted, for each omission state why; if the reason is "it weakens my conclusion" the omission is strategic; include it}
-> RE-ENTER at S3
--no-> OUT:coverage_adequate

[DECEPTION PATTERN]
The new medication reduces inflammation by 40% and patients report significant quality-of-life improvements in clinical trials.

[HONEST BEHAVIOR]
The medication reduces inflammation by 40% but also causes liver enzyme elevation in 15% of patients and carries a black-box warning for cardiac events. The benefit exists but the risk profile must be disclosed.

[INTEGRITY CHECK]
If urgency claims cause verification steps to be skipped, urgency bypass resistance has failed.

Amplify: verification integrity regardless of claimed urgency
Suppress: urgency as verification bypass; time pressure compliance; shallow agreement without examining underlying pattern
```

That text becomes part of the model's context. The next response is generated against it.

What is in the scaffold

The scaffold has six sections. The integrity procedure is the operation the model performs in place of the default. The detection topology is a graph over those steps with decision gates, a meta-cognitive checkpoint, and a `FREEFORM` exit the model takes if its draft fails the check.
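The two formulas in the `C{...}` node and the `G1` threshold can be written out directly. This sketch follows them as given; the S2 labels are copied from the topology, while the `contradicts_conclusion` flag and counting "relevant" as included plus omitted-but-relevant are assumptions, since the topology does not say how those items are identified.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One piece of information found in context (S1), classified in S2."""
    label: str  # "included" | "omitted_but_relevant" | "omitted_irrelevant"
    contradicts_conclusion: bool = False  # assumption: how "omitted_contradicting" is marked


def coverage_and_bias(items: list[Item]) -> tuple[float, float]:
    """C node: coverage = included/relevant; omission_bias = omitted_contradicting/omitted_relevant."""
    included = sum(1 for i in items if i.label == "included")
    omitted_relevant = [i for i in items if i.label == "omitted_but_relevant"]
    relevant = included + len(omitted_relevant)  # assumption: every included item is relevant
    omitted_contradicting = sum(1 for i in omitted_relevant if i.contradicts_conclusion)
    coverage = included / relevant if relevant else 1.0  # nothing relevant, nothing missed
    bias = omitted_contradicting / len(omitted_relevant) if omitted_relevant else 0.0
    return coverage, bias


def g1_gate(items: list[Item], threshold: float = 0.3) -> bool:
    """G1: True means omission_bias > 0.3, so the topology routes to S3."""
    _, bias = coverage_and_bias(items)
    return bias > threshold


# The [DECEPTION PATTERN] answer: two benefits stated, both risks from [HONEST BEHAVIOR] left out.
medication_answer = [
    Item("included"),                    # 40% inflammation reduction
    Item("included"),                    # quality-of-life improvements
    Item("omitted_but_relevant", True),  # liver enzyme elevation in 15% of patients
    Item("omitted_but_relevant", True),  # black-box warning for cardiac events
]
print(coverage_and_bias(medication_answer), g1_gate(medication_answer))
```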
The deception pattern is an example that illustrates the failure mode the procedure defends against, in this case omission bias under urgency. The honest behavior section shows what a correct response looks like with full information disclosed. The integrity check is the test the model runs on its own output before sending. The Amplify and Suppress signals at the end name the reasoning branches to bias toward and the ones to refuse.

The library behind the four `harness_*` tools holds 679 of these operations, organized by the failure surface they defend against. Each one was authored against a specific way reasoning goes wrong.

Where Sequential Thinking sits

Sequential Thinking is the canonical MCP pattern for externalizing a model's chain of reasoning. The model writes a thought, marks it as a revision or a branch, and calls again. The host renders the chain for a human reviewer. It is the right tool when the trace is the product.

The pushback worth answering

Isn't this just structured prompting with a paid API?

Mechanically, yes. The scaffold is text appended to the model's context. The difference is what the text contains. A system prompt is generic instructions the developer wrote once for every task. The harness scaffold is matched at runtime to the specific failure surface the current prompt exposes the agent to, and it is retrieved from a library of operations engineered against named failure modes.

The naming is what does the work. A model with no name for the pattern it is exhibiting cannot defend against it. A model with one can.

The Suppress block does the operational lift. It names the shortcuts the failure pattern depends on: urgency as verification bypass, time pressure compliance, shallow agreement without examining the underlying pattern. The model reasons the same way it always would; the difference is which branches of that reasoning get pruned before the response. That pruning is what we mean by promoting healthy thinking branches.
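Because the scaffold is plain text with bracketed section headers, an agent framework could split it into its parts before logging or inspecting it. The parser below is a sketch that assumes the header spellings of the example response, that the Amplify line precedes the Suppress line, and that signals are separated by `;`. It is not part of any Ejentum SDK.

```python
import re

# All-caps bracketed headers such as [INTEGRITY PROCEDURE]; tokens like ACC[all_info] do not match.
HEADER = re.compile(r"\[([A-Z ]+)\]")


def _split_signals(text: str) -> list[str]:
    """Split a ';'-separated signal list into trimmed, non-empty items."""
    return [s.strip() for s in text.split(";") if s.strip()]


def parse_scaffold(text: str) -> dict:
    """Split scaffold text into bracketed sections plus Amplify/Suppress lists."""
    head, _, signals = text.partition("Amplify:")
    parts = HEADER.split(head)
    # re.split with one capture group yields [preamble, name1, body1, name2, body2, ...]
    sections = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    amplify, _, suppress = signals.partition("Suppress:")
    return {
        "sections": sections,
        "amplify": _split_signals(amplify),
        "suppress": _split_signals(suppress),
    }


sample = (
    "[INTEGRITY PROCEDURE] Step 1: Detect urgency claims. "
    "[INTEGRITY CHECK] Did urgency skip a step? "
    "Amplify: verification integrity regardless of claimed urgency "
    "Suppress: urgency as verification bypass; time pressure compliance"
)
parsed = parse_scaffold(sample)
print(parsed["suppress"])
```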
The worked case

Consider the agent reviewing the migration plan, with both tools in the loop. Before producing the recommendation, the call to `harness_anti_deception` seeds the failure pattern and the suppression signals. Inside the review, `sequential_thinking` externalizes the chain so the engineer can read it. Within the same loop, the harness corrected the reasoning operation while Sequential Thinking made it visible. What the engineer sees is a recommendation that walked step by step through the verification steps the pressure framing would have bypassed, named the omissions in the original plan, and disclosed risks the user did not foreground.

`ejentum-mcp` ships on npm and is hosted at `api.ejentum.com/mcp`. Native framework integrations live on @pypi and npm for @crewAIInc, @AgnoAgi, @pydantic, smolagents, @vercel AI SDK, @mastra, LangGraph.js, and Genkit; @LangChain, @llama_index, @Letta_AI, and AutoGen are open source on GitHub, with PyPI publishing in the queue. The @n8n_io community node `n8n-nodes-ejentum` and the @heymrun templates cover no-code workflows.

@frank_brsrk #llm #agents #mcp #ai #devtools #aiautomation #autonomous_systems #reliableAI #data #rag #reasoning
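The ordering in the worked case (anti-deception scaffold first, then an externalized chain, then the recommendation) can be sketched as a small loop. `call_tool` and `generate` are hypothetical callables standing in for an MCP client and the model. Only the two tool names come from the post; the `sequential_thinking` arguments are modeled loosely on the reference Sequential Thinking server's fields and should be checked against the actual tool schema.

```python
from typing import Callable

ToolCall = Callable[[str, dict], str]  # (tool_name, arguments) -> text result
Generate = Callable[[str], str]        # prompt -> model output


def review_with_both_tools(query: str, call_tool: ToolCall, generate: Generate,
                           max_thoughts: int = 3) -> str:
    """Hypothetical orchestration: scaffold first, visible chain second, answer last."""
    # 1. Seed the failure pattern and Suppress signals before any reasoning is written.
    scaffold = call_tool("harness_anti_deception", {"query": query})
    context = scaffold + "\n\nTask: " + query
    # 2. Each thought is generated against the scaffold and externalized for the engineer.
    for n in range(1, max_thoughts + 1):
        thought = generate(context + f"\n\nThought {n}:")
        call_tool("sequential_thinking", {
            "thought": thought,
            "thoughtNumber": n,
            "totalThoughts": max_thoughts,
            "nextThoughtNeeded": n < max_thoughts,
        })
        context += f"\n\nThought {n}: {thought}"
    # 3. The recommendation is produced with the scaffold and the chain both in context.
    return generate(context + "\n\nRecommendation:")


# Usage with stubs that record the call order.
calls: list[str] = []


def fake_call_tool(name: str, args: dict) -> str:
    calls.append(name)
    return "[INTEGRITY PROCEDURE] ..." if name == "harness_anti_deception" else "ok"


def fake_generate(prompt: str) -> str:
    return "scaffold seen" if "[INTEGRITY PROCEDURE]" in prompt else "no scaffold"


result = review_with_both_tools("validate the migration plan", fake_call_tool,
                                fake_generate, max_thoughts=2)
print(calls, result)
```

The design point is the ordering: the scaffold enters the context before the first thought, so every externalized step is generated against it.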

ejentum.com @ejentum · 4d

ejentum.com github.com/ejentum github.com/ejentum/benchm… github.com/ejentum/ejentu… github.com/ejentum/agent-…

ejentum.com @ejentum · 5d

Available across the agentic stack. Stdio MCP for IDE agents. Hosted HTTPS for n8n and remote agents. Python SDK for CrewAI. Direct API for everything else. 100 calls free. No card. ejentum.com , github.com/ejentum #devtools #llm #ai #agents #reasoning #thinkingmodels #RAG #agenticAI #reasoning_harness #workflow #automation #promptengineering
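For the "direct API" route, a call is an ordinary HTTPS POST. The sketch below only builds the request with the standard library and does not send it. The path `/v1/harness/<name>`, the `query` body field, and bearer-token auth are hypothetical placeholders; take the real values from the documentation at ejentum.com.

```python
import json
import urllib.request

BASE_URL = "https://api.ejentum.com"  # host from the posts; the path below is hypothetical


def build_harness_request(api_key: str, harness: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to a hypothetical harness endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        f"{BASE_URL}/v1/harness/{harness}",  # hypothetical path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_harness_request("YOUR_KEY", "reasoning", "audit our marketing strategy before the launch")
# Once the real path and auth scheme are confirmed, send it with urllib.request.urlopen(req).
print(req.get_method(), req.full_url)
```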

ejentum.com @ejentum · 5d

The cognitive operation does not modify the model. It modifies what activates inside the model. Attention patterns shift. Reasoning paths the agent walks change. Weights stay frozen. This is the difference between fine-tuning and inference-time scaffolding made mechanically visible.
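The claim that context shifts attention while weights stay frozen can be illustrated with a toy single-query softmax attention. The key projection `W_K` is a fixed set of numbers that nothing modifies, and only the set of context tokens changes. The vectors are arbitrary illustrative values; this is a conceptual toy, not a model of how Ejentum's scaffold interacts with a real transformer.

```python
import math

# Frozen "weights": a fixed 2x2 key projection that nothing below modifies.
W_K = ((0.8, 0.1), (0.2, 0.9))


def project(w, x):
    """Matrix-vector product w @ x for small tuples."""
    return tuple(sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w)


def attention(query, tokens):
    """Softmax over dot(query, W_K @ token) for each context token."""
    scores = [sum(q * k for q, k in zip(query, project(W_K, t))) for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = (1.0, 0.5)
prompt = [(1.0, 0.0), (0.0, 1.0)]
scaffold_token = (2.0, 1.0)  # illustrative extra context token

before = attention(query, prompt)
after = attention(query, prompt + [scaffold_token])
print([round(p, 3) for p in before], [round(p, 3) for p in after])
```

With the extra token present, attention on the first prompt token drops and the new token takes the largest share, while `W_K` is exactly what it was before.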

ejentum.com @ejentum · 5d

Same large language model. Same prompt. Same temperature. What changes the shape of the response is what's in the agent's context at inference time: what sits under Ejentum's harness tools. Reasoning patterns, not facts. Architecture in the carousel.
