Steven (Batman) Batchelor-Manning

3.8K posts

@S_BatMan

Founder | CTO | Building Exceptional Technology

Bratislava, Slovakia · Joined September 2010
235 Following · 1.4K Followers
david bayendor (@bayendor)
Just finished wiring a 3-layer memory stack into Hermes Agent.
🧠 Layer 1: Honcho
Session + peer memory on PostgreSQL. Handles context tracking, history, and multi-agent coordination. ~12K messages indexed across 4 containers.
⚡ Layer 2: hermes-lcm
Working memory layer for conversational continuity between turns, with sensitive pattern redaction.
📚 Layer 3: GBrain
Long-term knowledge graph powered by PGLite (WASM Postgres). 301 pages, 1,150 chunks. Hybrid search + multi-query expansion.
Now letting the agents run on it for the next week to see how the memory behavior evolves 👀
English
25
26
279
12.2K
Steven (Batman) Batchelor-Manning
@LottoLabs That's where machines like the Spark have their strength: running models 24/7 on negligible watts vs a desktop PC GPU stack. Ideal for those tasks where you can afford a little extra time in exchange for big savings.
English
1
0
0
35
Steven (Batman) Batchelor-Manning
@sudoingX I'm regularly finding it can double the perceived sharpness of any model just by using my harness over something like Claude Code. The answers given are only as good as the questions asked.
English
0
0
0
24
Avi Chawla (@_avichawla)
Build human-like memory for your Agents (open-source)! Every agentic and RAG system struggles with real-time knowledge updates and fast data retrieval. Graphiti solves these issues with its continuously evolving and temporally-aware knowledge graph. Like humans, Graphiti organizes an Agent's memories into episodes, extracts entities and their relationships from these episodes, and stores them in a knowledge graph: (refer to the image below as you read)
1) Episode subgraph: Captures raw data with timestamps, retaining every detail for easy historical lookup.
2) Semantic entity subgraph: Extracts entities (e.g., “Alice,” “Google”) and facts (“Alice works at Google”). Everything is versioned, so outdated info gets replaced.
3) Community subgraph: Groups related entities into clusters, with summaries for faster retrieval.
Graphiti delivers up to 18.5% higher accuracy with 90% lower latency when compared to tools like MemGPT. It's fully open-source with 26k+ stars. I have shared the repo in the replies.
English
25
16
104
9.5K
Steven (Batman) Batchelor-Manning retweeted
うたこ/utako@VRC (@utako_vrchat)
@S_BatMan This model is tuned for Nvidia Spark, but I was able to achieve 120 tok/s on an RTX 5090 with --draft-max=3! A model supporting NVFP4 + MTP + GGUF simultaneously is rare! Would love to see a vision-capable version too.
English
0
1
1
83
Steven (Batman) Batchelor-Manning retweeted
mr-r0b0t (@mr_r0b0t)
It’s official! @NVIDIAAI / MiniMax-M2.7-NVFP4
Optimized specifically for your SM120/121 DGX Spark (GB10) and RTX 6000/5090 Blackwell tensor cores!
Full native FlashInfer/CUTLASS
Finalizing the benchmarks and documentation now 😁
English
19
8
160
10.6K
Steven (Batman) Batchelor-Manning
@KishanVavdara @0xSero So far I've managed to get DeepSeek V4 Flash at Q2_K-XL on the Spark to around 12 tokens/second for a small test context. There's a long way to go fixing kernel issues with llama.cpp before I think I can get that to a better speed and context length, but let's see.
English
1
0
1
39
0xSero (@0xSero)
It had to happen.
- deepseek-v4-flash reap on 1 spark
- integration with vLLM-studio
- research multi-device inference
- dynamo disaggregated inference
English
24
4
292
14.7K
Steven (Batman) Batchelor-Manning
Another interesting system worth a deep dive
Paul Iusztin (@pauliusztin_)

2 months ago, I started building unified memory layers with knowledge graphs. Here’s the most common question I’ve been asked: “How do you handle entity resolution and deduplication without corrupting the graph?”
I didn’t want to give a high-level answer based on assumptions... So I spent a lot of time studying how systems like mem0, cognee, and Neo4j approach this problem, while experimenting with orchestration patterns using @PrefectIO to improve durability and reduce costs.
I discovered most people treat entity resolution and deduplication as the same thing. But they’re not. The best memory systems separate naming from identity. Here’s what you have to do after your LLM extracts an entity:
1/ Resolution → "What should we call this?"
This layer handles:
- Typos
- Acronyms
- Surface-form similarity
It uses exact, fuzzy, and semantic matching, only against names of nodes of the same type. Examples:
- “NYC” ↔ “New York City”
- “JP Morgan” ↔ “JPMorgan Chase”
At this stage, the system only updates canonical names used later for soft matching. No graph merges happen yet, because similar names are NOT strong enough evidence that two entities are identical. For example:
- Apple: Company ≠ Fruit
- Jensen Huang: CEO of NVIDIA ≠ A doctor in Taipei with the same name
2/ Deduplication → "Is this the same real-world entity?"
Now we embed the full entity context and compare it against existing nodes using semantic + fuzzy similarity across the full context. Based on the similarity score (0 → 1), there are 3 outcomes:
- High confidence (≥0.95) → auto-merge
- Medium confidence (>0.85 and <0.95) → human review
- Low confidence (≤0.85) → new node
This is critical because false merges silently corrupt the graph. The smartest design decision I found was treating evidence strength as permission strength:
- Weak evidence earns a new node
- Strong evidence earns a merge
- Uncertain evidence earns a review queue
This kept the graph clean as memory scales. But the graph model is only half the problem...
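A minimal sketch of the three-outcome routing described in step 2/, assuming a similarity score has already been computed elsewhere. The names `Action` and `route_candidate` are hypothetical and do not come from mem0, cognee, or Neo4j; the two thresholds are the ones quoted in the post.

```python
from enum import Enum


class Action(Enum):
    MERGE = "auto-merge"
    REVIEW = "human review"
    NEW_NODE = "new node"


# Thresholds quoted in the post; treat them as starting points to tune.
MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.85


def route_candidate(similarity: float) -> Action:
    """Map a similarity score in [0, 1] to one of the three outcomes."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must be in [0, 1], got {similarity}")
    if similarity >= MERGE_THRESHOLD:
        return Action.MERGE      # strong evidence earns a merge
    if similarity > REVIEW_THRESHOLD:
        return Action.REVIEW     # uncertain evidence earns a review queue
    return Action.NEW_NODE       # weak evidence earns a new node
```

Keeping the decision in one small pure function makes the merge policy easy to audit, which matters given that a false merge is the failure mode the post warns about.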
Building KGs on top of unstructured data is expensive. These operations all cost money:
- LLM extraction
- Entity resolution
- Embeddings
- Deduplication
If one downstream step fails and you replay the whole pipeline, you burn tokens recomputing work you already paid for. So the best architectures split workflows into checkpointed tasks with retries + caching. This is where tools like @PrefectIO fit extremely well for production memory systems. It lets you:
- Cache expensive extraction steps
- Retry only failed stages
- Batch embeddings efficiently
- Scale ingestion safely
- Add observability into every phase
All without restarting from scratch after every failure.
P.S. What’s been the hardest part for you when building long-term memory for agents via KG?
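The checkpoint-plus-retry idea can be sketched in plain Python. This is not Prefect's API: `run_stage` and the in-memory `_checkpoints` store are hypothetical stand-ins for what an orchestrator would provide durably, shown only to make the "retry only the failed stage" behavior concrete.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical in-memory checkpoint store; a real system would persist this.
_checkpoints: dict[str, Any] = {}


def _cache_key(stage: str, payload: Any) -> str:
    """Key a result by stage name plus a hash of its JSON-serializable input."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return f"{stage}:{hashlib.sha256(blob).hexdigest()}"


def run_stage(stage: str, fn: Callable[[Any], Any], payload: Any, retries: int = 2) -> Any:
    """Run one pipeline stage, reusing a cached result when one exists."""
    key = _cache_key(stage, payload)
    if key in _checkpoints:
        return _checkpoints[key]  # already paid for: skip recomputation
    last_error = None
    for _ in range(retries + 1):
        try:
            result = fn(payload)
        except Exception as exc:  # retry this stage only, not the pipeline
            last_error = exc
            continue
        _checkpoints[key] = result
        return result
    raise RuntimeError(f"stage {stage!r} failed after {retries + 1} attempts") from last_error
```

If an embedding stage fails once and is retried, the extraction stage before it is not rerun, because its result is looked up by the hash of its input.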

English
1
0
1
106
0xVibly (@0xVibly)
Twitter is cool, but it’s 100x better when your feed is full of builders. I want to meet more people working on tech, AI, startups, design, SaaS, web dev, and programming. If that’s you, drop a hi 👋
English
87
1
72
3K
Steven (Batman) Batchelor-Manning
@populartourist Great, now I have lost another afternoon to good advice. This platform is plagued with meaningful, helpful advice that makes me want to change stuff. I want time for my other hobbies, god damn it.
English
1
0
15
1.4K
wd 🔺 (@populartourist)
Hot take: Qwen3.6 27B adheres to instructions like rails if prompt is formatted in JSON. Thank you for your attention to this matter.
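For illustration only, a JSON-formatted prompt of the kind the post describes might be built like this. The helper `build_json_prompt` and the field names (`task`, `rules`, `output_format`) are hypothetical, and the sketch does not test the claim about instruction adherence.

```python
import json
from typing import Any


def build_json_prompt(task: str, rules: list[str], output_format: dict[str, Any]) -> str:
    """Serialize instructions as a JSON object instead of free-form prose."""
    prompt = {
        "task": task,
        "rules": list(rules),            # one explicit constraint per entry
        "output_format": output_format,  # the shape the reply should follow
    }
    return json.dumps(prompt, indent=2, ensure_ascii=False)
```

The resulting string would be sent as the user or system message in place of a prose prompt.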
English
11
11
374
18.9K
Sandro (@pupposandro)
Local AI is still too complex for 99% of people, and the fix isn't better kernel tutorials or cleaner docs. It's solving the entire software stack. Local LLMs are still waiting for their plug-and-play moment. Exactly what we're building @luceboxai for.
Hikari∣LocalLLM⚡ (@Hikari_07_jp)

Local LLM is incredibly complex. Hardware selection, quantization, harnesses, engines, tensor parallelism, unmodified models, MTP… Despite its complexity, local LLM is irresistibly fascinating. I started using X because there was almost no one close to me who could share this excitement with me.

English
19
2
94
14.5K
Steven (Batman) Batchelor-Manning
Drop in a quick /goal to regression-test today's code, then look forward to a morning of reading negative reviews of my work by an AI over coffee.
English
0
0
0
42
buun (@spiritbuun)
@S_BatMan Nah, mainline doesn't accept slop like this. The beauty though is that with vibecoding we can all just have our own slop forks that do the niche things we want them to do.
English
1
0
2
41
buun (@spiritbuun)
Was OOMing with my favorite quant + MTP + mmproj. I've added a new flag to my fork: --mmproj-gpu-swap. If you're tight on space, this puts mmproj on CPU until it's needed, and then temporarily destroys MTP in order to use it, then swaps it back. Virtually no discernable speed hit
English
2
1
25
913