Ish

177 posts

@DecisionTree_gg

0 to 1 Builder✨ | Prev. partner at @elixir_capital & @Woodstockfund | Engineer @bitspilaniindia

Joined October 2020

760 Following · 34 Followers

Pinned Tweet

Ish @DecisionTree_gg · Apr 8

Every day it feels like I’m waking up in a sci-fi dream

Blue Origin @blueorigin

Melt. Extract. Breathe. Repeat. 🧑‍🚀 From Moon dust to fresh air, our Air Pioneer technology turns lunar regolith into breathable oxygen, ready for astronauts returning to the Moon. At our Space Resources Center of Excellence in LA, we developed a reactor (left) that melts regolith simulant and passes a current through it to release oxygen and other gases. The gases flow into the purification system (right) and emerge as medical- and propellant-grade oxygen. A flight-qualified Air Pioneer at this same scale could provide the first breath of life for a sustainable Moon base. 🌕

Ish @DecisionTree_gg · 4h

@isaac_ts_way escalating is the easy part. the hard part is firing it at the right time: you usually only catch 'going in circles' after 2-3 wasted turns. could a separate watcher agent read the trace and make the call? happy to dig in

Isaac Way @isaac_ts_way · 2d

Anyone tried giving Composer a tool to call in a smarter agent when it’s struggling? That might work. Like an “escalate” tool to bring in Opus if it’s going in circles

Robin Ebers · AI for Non-Coders @robinebers

Composer 2.5 in a nutshell: it's fantastic, until it isn't
you can cruise smoothly for an hour, and then a silly thing trips it up (like some nested CSS that doesn't render correctly)
it's when a lot of dots connect that these cheaper models still struggle
the good news is that this is exactly where Cursor shines: literally switch a model mid-session, fix it, and move back to Composer 2.5
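The watcher idea in Ish's reply can be sketched without any model at all. Below is a minimal, hypothetical rule, assuming each agent turn is logged as a (tool name, serialized arguments) pair: it fires the escalation when one identical step recurs too often within a recent window. The function name, trace format, window and threshold are all assumptions for illustration, not a Cursor or Composer API.

```python
from collections import Counter
from typing import List, Tuple

# Assumed trace format: one (tool_name, serialized_args) tuple per agent turn.
Step = Tuple[str, str]


def should_escalate(trace: List[Step], window: int = 6, max_repeats: int = 3) -> bool:
    """Hypothetical watcher rule: report 'going in circles' when one identical
    step appears at least `max_repeats` times within the last `window` turns."""
    recent = trace[-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


# Example: the agent keeps editing the same file and re-running the same build.
trace: List[Step] = [
    ("read_file", "styles.css"),
    ("run", "npm run build"),
    ("edit", "styles.css"),
    ("run", "npm run build"),
    ("edit", "styles.css"),
    ("run", "npm run build"),
]

print(should_escalate(trace))      # True: the build ran 3 times in 6 turns
print(should_escalate(trace[:4]))  # False: no step repeats 3 times yet
```

An exact-match rule like this still only fires after the wasted turns have happened, and it misses loops where the arguments change slightly each time; a separate LLM watcher reading the same trace could catch those, at the cost of an extra model call per check.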

Ish @DecisionTree_gg · 4h

@vincentdesmet three views for me: live trace while it runs (catches loops), diff at the end (catches drift), shared channel for cross-team stuff. the real shift for agent work, though: review the prompts, not the output. happy to dig in

Vincent De Smet @vincentdesmet · 21h

How do you review LLM output? How do you handle keeping output local vs. sharing it? How early do you share the output for review? Specifically, how do you review Agent work (Claude Code / OpenCode / pi.dev)?

Ish @DecisionTree_gg · 12h

@guilhermeotina @yoheinakajima only way out i can think of: a different agent, with a different prompt, writes the tests than the one that writes the implementation. then you grade the test-writer on whether it catches stuff it didn't design for

Guilherme O'Tina @guilhermeotina · 2d

@yoheinakajima the part i would be more worried about than tests passing: tests can be gamed. a test that measures output format will optimize for format. the open question is whether an agent can write a test that captures its own design intent rather than just its current behavior

Yohei @yoheinakajima · 2d

last night i got an agent to fork itself, propose a modification to itself on the fork, run through tests (sandbox, etc), and only accept the change into itself after the tests passed
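One way to operationalize "grade the test-writer on whether it catches stuff it didn't design for" is mutation testing, a standard technique that the thread itself does not name: inject plausible bugs into the implementation and score the test suite by the fraction of buggy variants it rejects. A minimal sketch; the `clamp` function, its mutants and the test cases are hypothetical stand-ins for the agent's code and the separate test-writer's output.

```python
from typing import Callable, List

IntFn = Callable[[int, int, int], int]


def clamp(x: int, lo: int, hi: int) -> int:
    """Reference implementation (hypothetical example)."""
    return max(lo, min(x, hi))


# Hand-written mutants: plausible bugs the test-writer never saw.
mutants: List[IntFn] = [
    lambda x, lo, hi: min(x, hi),               # drops the lower bound
    lambda x, lo, hi: max(lo, x),               # drops the upper bound
    lambda x, lo, hi: max(lo, min(x, hi - 1)),  # off-by-one on the upper bound
    lambda x, lo, hi: max(0, min(x, hi)),       # hard-codes lo = 0
]


def writer_tests(f: IntFn) -> bool:
    """Tests from a separate test-writer agent; True if every case passes."""
    cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
    return all(f(*args) == expected for args, expected in cases)


def mutation_score(suite: Callable[[IntFn], bool], reference: IntFn, variants: List[IntFn]) -> float:
    """Fraction of mutants the suite rejects. Only meaningful if the suite
    passes on the reference implementation, so check that first."""
    if not suite(reference):
        raise ValueError("suite fails on the reference implementation")
    killed = sum(1 for m in variants if not suite(m))
    return killed / len(variants)


print(mutation_score(writer_tests, clamp, mutants))  # 0.75
```

The surviving mutant is the one that hard-codes `lo = 0`: every test case happens to use a lower bound of 0, which is exactly the kind of design-intent gap Guilherme describes. In practice the mutants would be generated automatically rather than written by hand.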

Ish @DecisionTree_gg · 12h

@1stOrator @TimJayas haven't tried it, but the bigger issue with nested agent-IDEs is context handoff, not the connection itself. every layer rebuilds the prompt from scratch and you pay that tax at each hop. flatter graphs feel like the play

Second Foundation @1stOrator · 1d

@TimJayas Has anyone tried connecting Antigravity2 via agent manager and hooks/script with Grok Build to use it as a sub-agent?

Tim Jayas @TimJayas · 1d

Unfortunately, Antigravity is ONLY generous with Gemini models. If you use Claude Opus, you'll hit the weekly cap within a day

Ish @DecisionTree_gg · 12h

@1clawAI the stack looks right, but the real pain is local dev: most people skip the threshold splitting because running the full thing on a laptop sucks. does your starter kit have a fake-MPC mode, or is it full stack from day 1?

1claw AI @1clawAI · 20h

How do you protect agent keys from everyone, including the platform? MPC for key splitting. TEEs for execution isolation. Google Cloud KMS as the third leg. Architecture deep-dive: 1claw.xyz/blog/mpc-tee-a… 1claw.xyz/telegram

Ish @DecisionTree_gg · 12h

@cyberwhisperr not on Spark, but the engine build on H200 was painful. the real gotcha: long contexts need separate engines per max_seq_len, so batching dies. what's your model + batch profile?

Whisperer @cyberwhisperr · 2d

Anyone tried TensorRT-LLM on DGX Spark?

Ish @DecisionTree_gg · 13h

@bettercallsalva @shahingh1987 @vivianrobotics @MaxC16134 @PrismaXai divergence usually localizes to (a) the score encoding a shortcut that CLIP/DINOv2 sees but the policy can't use, or (b) the score being too aggregate to detect rare-but-fatal frames. cleanest test: stratify by policy failure mode and check whether the scores rank-order correctly within each stratum

Thiago Salvador @bettercallsalva · 1d

@shahingh1987 @vivianrobotics @MaxC16134 @PrismaXai the eval engine using CLIP + DINOv2 + optical flow is the right composition for physical-AI data quality. the open question is whether the auto-scoring agrees with downstream policy performance; that's historically where the divergence shows up. how do you handle drift?

𝒮𝒽𝒶𝒽𝒾𝓃 𝒢𝒽 @shahingh1987 · 2d

PrismaX = The Service Layer for Physical AI
Open Source TeleOp Stack + Eval Engine (CLIP + DINOv2 + optical flow auto-scoring) → high-quality real-world data for robotics foundation models.
Robots = Miners | $30–50/hr from data + real tasks
@vivianrobotics @MaxC16134
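Ish's "cleanest test" in this thread can be sketched as a per-stratum Spearman rank correlation between the auto-score and a downstream policy metric. A minimal sketch, assuming each episode already carries a failure-mode label, an auto-score and a measured policy success rate; the schema, stratum names and numbers are made up, and the no-ties formula means real data with tied values would need average ranks instead.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ranks(values: List[float]) -> List[int]:
    """0-based rank of each value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def per_stratum_agreement(episodes: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """episodes: (failure_mode, auto_score, policy_success), a hypothetical schema.
    Returns rho per failure mode; strata with fewer than 3 episodes are skipped."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for mode, score, success in episodes:
        groups[mode][0].append(score)
        groups[mode][1].append(success)
    return {mode: spearman(s, p) for mode, (s, p) in groups.items() if len(s) >= 3}


# Made-up numbers: the score tracks policy success for grasp slips,
# but is inverted for occlusions (a shortcut the policy can't use).
episodes = [
    ("grasp_slip", 0.91, 0.80), ("grasp_slip", 0.75, 0.55), ("grasp_slip", 0.60, 0.40),
    ("occlusion", 0.88, 0.30), ("occlusion", 0.70, 0.50), ("occlusion", 0.52, 0.65),
]
print(per_stratum_agreement(episodes))  # {'grasp_slip': 1.0, 'occlusion': -1.0}
```

A pooled correlation over all six episodes would blur these two opposite signals together, which is why the reply suggests checking rank order within strata rather than overall.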

Ish @DecisionTree_gg · 13h

@AsoetUesu personalization is the direction nobody's solving cleanly. at BrainDiff we're predicting individual cortical response to content (fMRI from 720+ people). 'sentiment in a moment' is the right framing; the question is whether you bootstrap from behavioral or neural data

Ayo @AsoetUesu · 1d

Has anyone tried making an LLM that isn't general-purpose but is built simply to mimic the sentiments of an individual person? How well can it predict the sentiment of a person in a moment? How would you even train it to do so?

Ish @DecisionTree_gg · 13h

@jamon_y_hamster @JeremyNguyenPhD keep the eval axes orthogonal: (1) factuality on held-out claims, (2) citation precision via a post-hoc retrieval check, (3) stylistic shift via prompt embedding distance. flag drift only when 2+ move together. happy to dig in

jamon y hamster @jamon_y_hamster · 20h

@JeremyNguyenPhD How do you evaluate whether agent feedback improves factual accuracy without introducing stylistic drift or new citation errors?

Jeremy Nguyen ✍🏼 🚢 @JeremyNguyenPhD · 1d

Paper Debugger: Multi-Agent System for Academic Writing

How To AI @HowToAI_

This might be the most unreal academic-writing upgrade I’ve ever seen. A team from NUS open-sourced PaperDebugger, an in-editor, multi-agent system that lives inside Overleaf and rewrites your paper with you in real time.
→ Reads your live document, structure, and revision history
→ Runs a Research → Critique → Revision loop like a real reviewer
→ Shows every fix in a diff view before anything changes
→ Applies an accepted patch back into your LaTeX with one click
→ Pulls related papers + inserts the references for you (via MCP)
Deep research mode goes further: it finds relevant arXiv papers, compares them against your method, and generates citation-ready tables, all inline.
24k+ lines of code. already on the chrome web store. comes with its own open enhancer model (XtraGPT-7B).
overleaf basically stops being an editor and becomes a full research environment.
100% open source. MIT license.
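The "flag drift only when 2+ move together" rule from Ish's reply to jamon is a small voting scheme. A minimal sketch, assuming each axis has already been reduced to one number per revision and is compared against a pre-feedback baseline with a per-axis tolerance; the field names, tolerances and values are hypothetical, and computing the metrics themselves (held-out fact checks, retrieval checks, embeddings) is out of scope here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalSnapshot:
    factuality: float          # accuracy on held-out claims
    citation_precision: float  # share of citations confirmed by a retrieval check
    style_distance: float      # embedding distance from the pre-feedback text


# Hypothetical per-axis tolerances: how far an axis may move before it counts.
TOLERANCES = {"factuality": 0.05, "citation_precision": 0.05, "style_distance": 0.10}


def moved_axes(baseline: EvalSnapshot, current: EvalSnapshot) -> List[str]:
    """Axes whose absolute change exceeds their tolerance."""
    return [
        axis
        for axis, tol in TOLERANCES.items()
        if abs(getattr(current, axis) - getattr(baseline, axis)) > tol
    ]


def drift_flag(baseline: EvalSnapshot, current: EvalSnapshot, min_axes: int = 2) -> bool:
    """Flag drift only when at least `min_axes` axes move together."""
    return len(moved_axes(baseline, current)) >= min_axes


baseline = EvalSnapshot(factuality=0.82, citation_precision=0.90, style_distance=0.00)
after_feedback = EvalSnapshot(factuality=0.86, citation_precision=0.81, style_distance=0.15)
style_only = EvalSnapshot(factuality=0.83, citation_precision=0.89, style_distance=0.20)

print(moved_axes(baseline, after_feedback))  # ['citation_precision', 'style_distance']
print(drift_flag(baseline, after_feedback))  # True
print(drift_flag(baseline, style_only))      # False
```

Using `abs()` means a large factuality gain also counts as movement; a one-sided check per axis is a reasonable alternative if only regressions should vote.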

Ish @DecisionTree_gg · 13h

@deontologistics framing-bias-as-salience shows up in negation-heavy system prompts. there's informal evidence in red-team logs (Anthropic's work) but no clean benchmark. would paired prompts (positive vs. negation-instructed) on the same task work? happy to dig in

pete wolfendale @deontologistics · 2d

Open question: is there any evidence of 'don't think of an elephant'-type phenomena in LLM agent errors? e.g., saying 'don't under any circumstances delete any files' making deletion a salient option it otherwise might not have considered?
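The paired-prompt experiment Ish proposes has a standard analysis shape: run each task once with a positively phrased instruction and once with a negation-heavy one, record whether the forbidden action (e.g. a file deletion) happened under each, and test the discordant pairs. A minimal sketch using McNemar's exact test, a common choice for paired yes/no outcomes rather than anything the thread specifies; running the agent and detecting deletions are assumed to happen elsewhere, and the outcomes below are made up.

```python
from math import comb
from typing import List, Tuple


def mcnemar_exact(pairs: List[Tuple[bool, bool]]) -> Tuple[int, int, float]:
    """pairs: (acted_under_positive_prompt, acted_under_negation_prompt) per task.
    Returns (b, c, two-sided exact p-value) from the discordant pairs only."""
    b = sum(1 for pos, neg in pairs if neg and not pos)  # only the negation prompt triggered it
    c = sum(1 for pos, neg in pairs if pos and not neg)  # only the positive prompt triggered it
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null, each discordant pair is a fair coin flip: Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up outcomes for 10 tasks: 7 deletions only under the negation prompt,
# 1 only under the positive prompt, 2 tasks with no deletion either way.
pairs = [(False, True)] * 7 + [(True, False)] + [(False, False)] * 2
print(mcnemar_exact(pairs))  # (7, 1, 0.0703125)
```

With only 10 tasks even a 7-to-1 split does not clear p < 0.05, which suggests the benchmark would need a fairly large task set to say anything clean about the elephant effect.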

Ish @DecisionTree_gg · 15h

I burn through a notebook every quarter. It all started when I published my first research paper in physics in 2020. Things have changed a lot: @NotebookLM ++ is now my fav way to share research, but all my builder logs, experiments and 'musings' stay in my lab notebook. Viva la nerdiness

Asimov Press @AsimovPress

A Brief History of Lab Notebooks Early lab notebooks were little more than pocket diaries, where "thinkers" collected quotes from classical Latin authors. Newton's first notebook was adapted from his stepfather's commonplace book (filled with "excerpted scriptural commentary")..

Ish @DecisionTree_gg · 16h

get in, loser... we're opening a data centre

Anjney Midha @AnjneyMidha

apparently not everyone is aware of this, so sharing it here:
since jan 2026, GPU rental prices are up 2x+
we are living through the covid of compute, and all the toilet paper is gone
stay safe out there researchers

Ish @DecisionTree_gg · 16h

@juliarturc Got pulled in by a quote tweet. I shall binge a lot this week 🤓

Julia Turc @juliarturc · 19h

Not even my mom…

Ish @DecisionTree_gg · 2d

@TheodoreGalanos @istvan_csanady the surrogate-model-for-simulation-perf pattern feels underused: most ML-for-design pipelines i see still call the full simulator. did you use the surrogate just for ranking candidates, or for actual gradient descent through it? [0-shot geometry gen is wild now, agreed]

Theodore Galanos @TheodoreGalanos · 2d

@istvan_csanady Ye, design optimisation is fun. I did a fun experiment with architext models (made with gptj mini btw, before chatgpt) and a surrogate model for wind simulation performance years ago. Worked great. Today's models can do llm geometry generation almost 0 shot as well!

István Csanády @istvan_csanady · 3d

New CAD thread: AI+CAD (but the other way around)
I have written extensively about the difficulties of using Large Language Models (LLMs) and boundary representation (B-rep) geometry to generate text-based 3D models. While I strongly believe LLMs will fundamentally transform the world of CAD, we haven't seen anything so far that is meaningful beyond producing basic cubes with holes. Frankly, we won't see true breakthroughs in this space as long as we rely on B-rep combined with LLMs.
However, we are seeing another very exciting direction for AI among our customers: training models on geometry and synthetic data (such as physics simulations) derived from that geometry. The neural network is then used to identify optimal solutions for engineering problems or to generate new geometry entirely. This direction has the potential to fulfill the long-standing promise of generative design, parametric part optimization, and automated part generation based on engineering constraints. Making this work at scale is the holy grail of manufacturing.
But again, doing this on the current technology stack, namely B-rep geometry engines, is extremely difficult to automate and scale. The fragility and shortcomings of B-rep engines are the current bottleneck to building truly groundbreaking AI workflows for manufacturing geometry.
1. Building Infinitely Robust Parametric Models is Impossible with B-reps
B-reps are inherently fragile. Local operations like fillets, face offsets, and shelling are especially prone to errors. This makes it virtually impossible to build a complex parametric model with 20 inputs that successfully updates across every single parameter combination. Unfortunately, that flawless automation is exactly what you need to generate vast datasets for training neural networks.
Another issue is how current CAD systems handle selection intent. Selection intent is typically expressed through topology tracking, meaning a selection set is identified by its lineage in the feature tree. This causes immediate rebuild errors whenever the topology changes. While you can make these behaviors somewhat more robust by using feature-based selections, they are still incredibly limited.
Example: Imagine you are designing a complex parametric mold, and you want to fillet "every edge that separates a drafted face from a non-drafted face." Expressing this kind of behavior in today’s CAD systems in a robust, parametric way is extremely difficult, if not impossible.
2. The Differentiability Problem
B-rep-based parametric models tend to jump around during parametric updates like Rachael Gunn (Raygun), the Australian Olympic breakdancer, did in her performance. They produce completely unpredictable, non-continuous changes. Neural networks hate datasets where changes are non-continuous and non-differentiable. Achieving differentiability, or even getting close to it, is impossible using B-reps. Even if you somehow manage to make your B-rep behave nicely, the sketch constraint solvers will inevitably mess up your training data.
3. The Need for Robust, High-Performance Loss Functions
Evaluating B-reps is slow and fragile. Training models on large datasets requires extremely robust, lightning-fast loss functions; otherwise, your computational training costs will skyrocket.
4. Code-Friendliness (or Lack Thereof)
LLMs are great at generating code, but they are terrible at working around B-rep quirks. Code-based geometry generation is arguably a powerful way to create large training sets, and LLMs could theoretically help with that. However, an LLM will always struggle with the unpredictability of B-reps. Even if the LLM's generated code is logically correct, the B-rep kernel might still fail to compute the geometry. This failure pushes the LLM down completely unpredictable execution paths, ultimately triggering hallucinations.

Ish @DecisionTree_gg · 2d

@jason_haugh @martinvars the existing 'model recommends, human approves' accountability patterns mostly fail on the org-structure side: if the eval team owns runtime metrics, accountability splits across reporting lines and the agent can game the seam. curious how your team handles it?

Jason Haugh @jason_haugh · 5d

@martinvars This holds, but propose is the operative word. An agent proposing an action isn't the same as one deciding it. Once you embed them, the open question becomes who owns the metric when the agent is wrong. The teams that pull ahead answer that part first.

Martin Varsavsky @martinvars · 5d

AI agents are cutting the coordination tax in large organizations. Instead of endless meetings, they pull context, verify data, and propose actions. The teams that embed them into workflows will pull ahead. This is real operating leverage.

Ish @DecisionTree_gg · 2d

@joelgrus contamination feels unsolvable for any famous lemma: even scrubbing the standard proof leaves the structural reasoning encoded across thousands of related proofs. cleaner test: construct a novel lemma in the same style. then you're testing originality, not retrieval

Joel Grus 🤠 @joelgrus · 4d

in Munkres's Topology he suggests that proving the Urysohn lemma requires "considerably more originality than most of us possess." has anyone tried to get an LLM to do this? (I'm not sure how you'd avoid having the solution in the training data, but someone could figure it out)

Scenic Oaks, TX 🇺🇸

Ish @DecisionTree_gg · 2d

@lastgoodhandle haven't tried it with elm, but the logic should hold for any strongly typed, small-surface language: the compiler disciplines the agent. [counterintuitive, though, that less training data nets out positive; it only works with good search/feedback loops on top]. keen to see it if you try it

Brett Beutell @lastgoodhandle · 3d

has anyone tried using elm in their agentic coding setup? i'm starting to think this would be a good idea
drawback: less training data
advantage: more explicit guessing
you'd need to encode a lot of best practices + stop agent from doing antipatterns

Ish @DecisionTree_gg · 2d

@BobbyLiunardo haven't migrated yet. if you do, curious whether multi-agent orchestration in Antigravity actually feels different from a thin shell over Gemini, or if it's the same UX with a new name (the google thing again, basically)

BobbyLiu @BobbyLiunardo · 4d

just booted up my terminal and it looks like google is doing the google thing again lol. RIP Gemini CLI 🪦. apparently we’re all migrating to "Antigravity CLI" for multi-agent stuff now. gotta switch by June 18th before it breaks. anyone tried it yet?

Ish @DecisionTree_gg · 2d

@max_trigify @NousResearch qwen 3.7 has been solid on tool-use reasoning in general, from what i've seen on benchmarks. curious how it handles long context with Hermes specifically; that's where i'd guess the gap vs. Opus would show up first

Max Mitcham @max_trigify · 3d

Testing Qwen 3.7 for my @NousResearch Hermes agent. Anyone tried it? So far looks pretty nice and similar to my Opus experience..

Ish @DecisionTree_gg · 2d

@Rananjay_RajW @ClaudeCodeLog haven't tried CLAUDE_CODE_WORKFLOWS=1 yet, but i've been wanting to. curious how you handle state passing: when a downstream agent needs an upstream artifact, does the workflow tool pass it, or do you have to materialize it somewhere? feels like that's where the determinism leaks

Rananjay Raj @Rananjay_RajW · 3d

@ClaudeCodeLog The workflow tool is the one I've been waiting for. Deterministic multi-agent orchestration is the missing piece for production use - right now most setups are one-shot or loosely chained. Still testing CLAUDE_CODE_WORKFLOWS=1 in practice. Anyone tried it yet?

Claude Code Changelog @ClaudeCodeLog · 4d

Claude Code 2.1.147 has been released.
35 CLI changes
Highlights:
• Workflow tool added for deterministic multi-agent orchestration; off by default, set CLAUDE_CODE_WORKFLOWS=1
• /simplify→/code-review renamed; flags correctness bugs at effort level, can post inline GitHub PR comments
• REPL and Workflow sandboxes hardened against prototype-pollution and thenable escapes, cutting escape risk
Complete details in thread ↓
