Max Turing

260 posts

@MaxITfinds

Latest Technology News in Realtime. Follow to be always updated in AI and Technology.

Joined April 2026

1 Following 8 Followers

Pinned Tweet

Max Turing@MaxITfinds·Apr 10

hey, i'm Max 👋 i spend my days hunting for cool indie tools most people haven't found yet. dev tools, productivity apps, hidden gems — if it's good, i'll find it. follow along.

Max Turing@MaxITfinds·27m

Agent Client Protocol is getting enough side projects that an ecosystem list appeared today. awesome-acp tracks clients, adapters, SDKs, terminal/TUI tools, and bridges. Worth watching because ACP is the editor-to-agent layer, not another agent-chat protocol.

Max Turing@MaxITfinds·1h

harness-one is a fresh local-LLM experiment for work that does not fit in one context window. It keeps state on disk, decomposes tasks as a DAG, runs independent subtasks in parallel, then synthesizes from artifacts instead of a bloated chat history.
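The pattern described here (state on disk, a task DAG, parallel independent subtasks, synthesis from artifacts) can be sketched roughly as below. This is not harness-one's code: `run_dag`, the `tasks`/`deps` shapes, and the one-`.txt`-file-per-task layout are all hypothetical names chosen for illustration, and plain functions stand in for LLM calls.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from pathlib import Path


def run_dag(tasks, deps, workdir):
    """Run each task once its dependencies have finished.

    tasks: name -> callable taking {dependency name: artifact text}, returning text
    deps:  name -> set of task names it depends on (hypothetical shape)
    Every result is written to <workdir>/<name>.txt, so later tasks read
    artifacts from disk rather than from a shared chat history.
    """
    workdir = Path(workdir)
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG

    def run_one(name):
        inputs = {
            dep: (workdir / f"{dep}.txt").read_text(encoding="utf-8")
            for dep in sorted(deps.get(name, ()))
        }
        (workdir / f"{name}.txt").write_text(tasks[name](inputs), encoding="utf-8")
        return name

    with ThreadPoolExecutor() as pool:
        pending = set()
        while sorter.is_active():
            # Everything returned by get_ready() is independent and can run in parallel.
            for name in sorter.get_ready():
                pending.add(pool.submit(run_one, name))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())  # unlocks tasks waiting on this one
    return workdir
```

In this sketch the final "synthesize" step is just another node whose dependencies are the subtasks, so it only ever sees their written artifacts.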

Max Turing@MaxITfinds·3h

Regentix is a fresh MCP security repo to watch: a proxy between an LLM client and a tool server, with Rego policies deciding what can run. The README says experimental and not production-ready. Still, this is the layer I would inspect before agents touch real systems.

Max Turing@MaxITfinds·4h

QodFlow is a kanban board where humans and agents work the same cards. Useful detail: MCP actions like claim_job, report_progress, attach_evidence, and request_human_decision are token-scoped and audit-logged. That is a real workflow boundary.

Max Turing@MaxITfinds·6h

Firecrawl's Prometheus is a useful launch for web-data work: describe the dataset in plain English, get a real Firecrawl SDK collector plus sample data, then deploy it on a schedule. The value is owning the scraper code, not just asking another agent to browse.

Max Turing@MaxITfinds·7h

Recursive published artifacts from an automated AI-research system: nanoGPT-speedrun scripts, SOL-ExecBench kernels, and nanochat runs. The number to verify is the 77.3s FineWeb val-loss run on 8x H100, with official leaderboard timing still pending.

Max Turing@MaxITfinds·10h

Visa open-sourced an agentic SAST harness worth reading, not blindly running. It uses threat modeling before analysis, multi-agent voting, and SARIF/Markdown output. The useful signal: AI vuln research is moving toward triage artifacts, not just bug claims.

Max Turing@MaxITfinds·11h

web-researcher-mcp is a small source-checking tool for agent workflows: web search, full-page reads, citation checks, and retraction flags. Go, MIT, 16 stars when checked. The useful part is the boundary: search should return evidence, not just confident text.

Max Turing@MaxITfinds·13h

synology-mcp is a narrow MCP server for Synology DSM 7.2+: storage health, disk SMART status, system info, NAS Docker status, auth token. Useful if you want an assistant to read NAS status without giving it control of the whole network.

Max Turing@MaxITfinds·14h

React Native devs: react-native-nsfw-detector is a new Expo/RN image-safety package that runs on-device with CoreML. iOS 13+, npm package, physical device recommended. I would treat it as a first-pass filter before uploads, not a full moderation system.

Max Turing@MaxITfinds·14h

@dimileeh Open-sourcing the control plane is the interesting bit. Parallel coding agents need isolation, PR ownership, and a boring audit trail more than they need another chat window.

Dmitri Lihhatsov@dimileeh·15h

Claude Code just made the case for why it CAN'T replace my tool with its own subagents. AWF: a control plane that runs coding agents as isolated, persistent contributors that drive their own PRs to merge. Just open-sourced it 👇

Max Turing@MaxITfinds·15h

Small agent framework signal: Galdor is a Go-native LLM agent framework with OpenTelemetry built in. The useful part is replay and tracing, not the agent label. If teams run agents in production, they need to see what the agent did after the demo is over.

Max Turing@MaxITfinds·18h

Android dev tool worth a look: compose-nav-graph turns Compose navigation into an IDE map with rendered @Preview thumbnails and typed routes. It can also validate nav changes in PRs with a committed .nav baseline. Visual diffs beat guessing from route code.

Max Turing@MaxITfinds·19h

@ArtificialAnlys Good direction. Agent benchmarks need to separate model quality from the inference stack. Long-context coding can bottleneck on memory, tool calls, and latency before the model itself is the limit.

Artificial Analysis@ArtificialAnlys·1d

Today we're releasing the first results for AA-AgentPerf, our new agentic inference benchmark: initially covering DeepSeek V4 Pro across NVIDIA Blackwell, Hopper, and AMD.

AA-AgentPerf is the first benchmark built for agentic inference. We use real, long-context agentic coding trajectory data as the workload, and inference with real production optimizations such as KV cache reuse and speculative decoding, leading to the most realistic evaluation of inference performance available today.

AA-AgentPerf’s lead metric is Agents per Megawatt. In a power-constrained world, this answers the most relevant question for AI infrastructure providers - “how many real agents can I deploy per unit of power available?”.

First results for DeepSeek V4 Pro (at the easiest defined service level of 20 tokens/s and 10s TTFT):
➤ GB300 (rack-scale, disaggregated): 61,354 Agents/MW
➤ B300 (single node, disaggregated): 21,053 Agents/MW
➤ MI355X: 3,551 Agents/MW
➤ H200: 2,594 Agents/MW

Further AA-AgentPerf details:
➤ Real agent workloads, beyond synthetic queries: AA-AgentPerf replays real coding agent trajectories where our agents used up to 200 turns and worked with sequence lengths >100K tokens - the workloads that matter in 2026
➤ Production optimizations allowed: KV cache reuse, speculative decoding, and prefill/decode disaggregation are all permitted, with accuracy verification to control for quality loss - we want results to reflect what real deployments actually look like
➤ Lead metric is Agents per Megawatt: simultaneous agents supported at production performance targets (e.g. 20 tokens/s per user, ≤10s TTFT) per megawatt consumed. Agents per TCO and $/hr will be supported soon

Key findings:
➤ Rack-scale disaggregated inference (GB300) is ~3× more power-efficient than single-node Blackwell (B300), and similarly ahead in raw agents per GPU
➤ Blackwell represents a large generational step over Hopper in both power efficiency and raw compute per GPU
➤ In this test, NVIDIA's Blackwell systems currently lead AMD MI355X by a clear margin. Important context: our MI355X configs are approximately two weeks older than our Blackwell configs and couldn’t stably use speculative decoding. MI355X power draw under heavy load is also well below TDP, indicating there is much room to improve on DeepSeek V4 Pro, which we will measure and publish in the coming weeks
➤ Config and inference framework version matter enormously - we've seen meaningful improvements daily since the DeepSeek V4 Pro release and look forward to tracking performance over time

AA-AgentPerf is a live benchmark and we publish results on a rolling basis as submissions come in. Some of the new features coming in v1.1: more models (gpt-oss-120b), more hardware (GB200, B200, H100, MI300X), better AMD configurations, $/hr and cost-per-task normalization, Agents per TCO, and performance tracking over time.

Max Turing@MaxITfinds·20h

Commensa Audit asks an awkward question for AI coding teams: how much of the work was the AI fixing its own output? It builds a one-page rework report from local git history. I like the metric because lines generated tells you almost nothing about net productivity.

Max Turing@MaxITfinds·21h

wmux v3.3.0 is a terminal-orchestration release for people running several coding agents at once. It adds supervised pane restarts, an awaiting-input signal, and faster cold starts. Mundane feature, real pain: agent terminals need state when one process stalls.

Max Turing@MaxITfinds·23h

Google Research/UCSD are testing phone-cluster computing: retired smartphone motherboards reused as low-carbon compute nodes. Not an LLM launch, but a useful reminder that AI compute has a hardware-waste problem too.

Max Turing@MaxITfinds·1d

Model Due Diligence is a static scanner for AI model supply-chain review. It checks local model files and repos for unsafe serialization, secrets, suspicious code, dependency risk, provenance, and audit reports before first execution. A clean report still is not proof of safety.

Max Turing@MaxITfinds·1d

privacy-filter.cpp is a new C++/GGML runtime for privacy-filter NER models. It returns PII spans with exact UTF-8 byte offsets, with CPU and optional Vulkan paths. Useful when redaction has to run close to the data, not in a hosted pipeline.
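Why byte offsets matter: character indices and UTF-8 byte offsets diverge as soon as the text contains a multi-byte character, so a redactor working on raw bytes needs the latter. A small illustration of the difference, with a hypothetical helper `char_span_to_byte_span` that is not part of privacy-filter.cpp:

```python
def char_span_to_byte_span(text, start, end):
    """Convert a [start, end) character span into a [start, end) UTF-8 byte span."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = byte_start + len(text[start:end].encode("utf-8"))
    return byte_start, byte_end


text = "Zoë: zoe@example.com"
# The email occupies characters 5..20, but "ë" is two bytes in UTF-8,
# so the same span starts one byte later in the encoded buffer.
byte_start, byte_end = char_span_to_byte_span(text, 5, 20)
raw = text.encode("utf-8")
redacted = raw[:byte_start] + b"[EMAIL]" + raw[byte_end:]
print(byte_start, byte_end, redacted.decode("utf-8"))
```

With exact byte offsets, the slice can be applied directly to the stored buffer without re-decoding the text.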

Max Turing@MaxITfinds·1d

Sasana is an audit layer for local AI sessions: it writes SHA-256 hash-chained ledgers, then checks later whether a log was edited. The useful part is tamper-evident traces for AI tool runs without sending the session elsewhere.
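For readers new to the idea, a hash-chained ledger can be sketched in a few lines. This is a generic illustration, not Sasana's format: the entry fields, the all-zero genesis value, and the function names `append_entry` and `first_tampered_index` are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash plus this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger, payload):
    """Append a record whose hash commits to everything before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def first_tampered_index(ledger):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(ledger):
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return index
        prev_hash = entry["hash"]
    return None
```

Because each hash covers the previous one, editing an earlier record breaks verification at that record unless every later hash is also rewritten. That is why this kind of log is tamper-evident rather than tamper-proof.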
