Pinned Tweet
Sol Irvine
8.2K posts

Sol Irvine
@solirvine
mostly harmless • building https://t.co/5W0eJiJlRS and https://t.co/43ZMkKkMRa
kyoto, japan Joined May 2010
1.4K Following 922 Followers

For me, the revelation from Mike is how *thin* the underlying product is.
1. Chat = chat with your docs.
2. Project = folder-scoped chat.
3. Tabular Review = projects x prompts.
4. Workflow = prompts.
Credit to @willchen500 for packaging and framing it brilliantly.
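
One way to read item 3: a tabular review is the grid you get by running every prompt against every document in a project. A minimal sketch of that reading, assuming a hypothetical `ask(document, prompt)` function that stands in for the model call (it is not any product's actual API):

```python
from typing import Callable


def tabular_review(
    documents: dict[str, str],
    prompts: list[str],
    ask: Callable[[str, str], str],
) -> dict[str, dict[str, str]]:
    """Return one row per document and one column per prompt."""
    return {
        name: {prompt: ask(text, prompt) for prompt in prompts}
        for name, text in documents.items()
    }


# Hypothetical stand-in for a model call, so the sketch runs offline.
def fake_ask(document: str, prompt: str) -> str:
    return f"{prompt} -> {len(document)} chars reviewed"


table = tabular_review(
    {"nda.md": "Mutual NDA ...", "msa.md": "Master services agreement ..."},
    ["Governing law?", "Term?"],
    fake_ask,
)
```

Swapping `fake_ask` for a real model call is the only product-specific part, which is roughly the "thin" point being made.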

@T1000_V2 Works very well for my purposes, but it depends on the models you use, the prompts you give them, and the type of contract. Using GPT 5.4 mini on typical commercial contracts, I'm impressed.

My new app wargame.esq pits two agents against each other in a contract negotiation. Each agent reviews the contract. They assemble a shared issues list. Then they negotiate each point, showing their internal reasoning and back-and-forth in real time.
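
The "shared issues list" step can be pictured as merging each agent's independent review into one de-duplicated list. This is only a sketch under assumptions: the reviewer functions below are hypothetical placeholders for the model-backed agents, and the app's real merging logic is not described in the post.

```python
from typing import Callable

# A reviewer takes the contract text and returns the issues it spotted.
Reviewer = Callable[[str], list[str]]


def shared_issues(contract: str, reviewers: list[Reviewer]) -> list[str]:
    """Merge each agent's issues into one list, keeping first-seen order."""
    merged: list[str] = []
    for review in reviewers:
        for issue in review(contract):
            if issue not in merged:
                merged.append(issue)
    return merged


# Hypothetical reviewers standing in for the two agents.
def buyer_review(contract: str) -> list[str]:
    return ["liability cap", "termination"]


def seller_review(contract: str) -> list[str]:
    return ["payment terms", "liability cap"]


issues = shared_issues("...contract markdown...", [buyer_review, seller_review])
```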


@afterlanie The agents already adopt the side they represent and pursue its interests. Before the negotiation starts, you can steer them in terms of their approach, e.g., "be conciliatory, but concede anything that will be onerous or unreasonably limiting for us".

@solirvine have you experimented with each agent taking separate tactics to compare the outcome of the contract?

@aref_vc Most routine commercial contracts are < 100 pages. The app only negotiates one contract at a time, and uses a markdown extraction, so context windows aren't really an issue. Newer frontier models do a good job of caching, which keeps costs reasonable, too.

Love it. I'm wondering about performance and speed of execution, especially context coherence when taking in dozens, if not hundreds, of pages at once, including parsing capabilities and staying faithful to the guardrails meant to prevent hallucination. I've seen a lot of variance between frontier models, both commercial and open-source. At least when running this locally, you can see the trend and the requirements for handling a lot of edge cases, especially in fintech, legal, real estate law, etc. Those are areas where a mix of expert-type models would bring a bit more depth; if you choose to work with a more generalist model, things could be trickier.

@aref_vc I have another product augustus.esq that can evaluate the negotiated draft from a neutral perspective. I did consider integrating a neutral pass at the end, but never got around to it. I like the idea of establishing some persistence/reputation for the agent over time.

This is awesome with the council framework to push back, etc. What's the final line of judgment? Who basically confirms? Is that basically the end user, or is there an LLM as a judge on top of it?
I did run a few experiments recently on top of this. Initially, instead of a council, we do a bidding approach where the LLMs pick based on their confidence in solving or winning the negotiation. There is a reputation token and a budget loaded into that conversation. If you win the conversation or the negotiation, you get more points and more reputation, which helps you win more deals.
There is an LLM as a judge that comes in later to cover these. It could sit at the clause level or at the full agreement level. There are different angles to it, with pros and cons as to where it fits best and where it has the most impact.

@futureproof_amy To me, this tool is a more robust expression of the analysis that I do when confronted with a contract. Anticipate the issues and arguments, get a sense of a reasonable middle ground, etc.

@futureproof_amy "Law is not to be gamified."
To me, the AI-generated output is only useful as a benchmark for:
- Which issues are raised.
- The arguments in both directions.
- Which compromises are reached.
- The rationale for concessions, etc.

@cmiller11101 It was built as an internal tool initially. I'll release soon. Follow me or @wargame_esq and I'll post there.

@solirvine It seems this doesn't work yet? All I see is your landing page showing a brief text intro.


@BaricJohnpaul I built it for myself initially, but given the response I'll release something in the next week or so.

@horadrimsage I know you're kidding, but we did have to implement an (adjustable) cap on turns to protect against cycles/digressions. Depending on which models you use, it might cost you some fraction of a junior associate's hourly rate. ;)
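
A turn cap like the one mentioned here can be as simple as a bounded loop around the back-and-forth. A rough sketch, assuming hypothetical agents modeled as plain functions that return their next message or `None` to accept; the names and the default cap are illustrative, not the app's actual settings:

```python
from typing import Callable, Optional

# An agent reads the transcript so far and returns its next message,
# or None to signal that it accepts the current position.
Agent = Callable[[list[str]], Optional[str]]


def negotiate_point(
    issue: str, agent_a: Agent, agent_b: Agent, max_turns: int = 6
) -> tuple[list[str], bool]:
    """Alternate between two agents until one accepts or the cap is hit."""
    transcript = [f"Issue: {issue}"]
    agents = [agent_a, agent_b]
    for turn in range(max_turns):
        message = agents[turn % 2](transcript)
        if message is None:
            return transcript, True  # settled before the cap
        transcript.append(message)
    return transcript, False  # cap reached, point left open


# Hypothetical agents: one that never stops countering, one that accepts.
def stubborn(transcript: list[str]) -> Optional[str]:
    return "this is standard"


def agreeable(transcript: list[str]) -> Optional[str]:
    return None


capped, capped_settled = negotiate_point("indemnity", stubborn, stubborn, max_turns=4)
quick, quick_settled = negotiate_point("indemnity", stubborn, agreeable)
```

Because each turn is a paid model call, the cap also bounds the cost per issue.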

@solirvine Do you have a NY biglaw mode where the models just keep redlining with the comment "this is standard" and billing +2k/hr for weeks straight?

@flseeh There’s a short interview at the start. Which party(ies) do you represent? Whose draft is it? Open-ended inputs for: (1) your goals, constraints, etc. and (2) context about the counterparty—e.g., big company, inflexible, needs revenue, etc.

@solirvine Super interesting! How does the briefing of agents work? How would they know what’s my red line? Thanks for sharing your thoughts :)

@kourouklides 1. The transcript of the negotiation is a very good roadmap for what to expect in the real world. Not only which issues are likely to emerge, but also the arguments in both directions.
2. The redline is helpful for baselining a compromise.
3. Memo is a useful executive summary.

@solirvine What is the point of this?
What is the hypothesis and what are the (preliminary) results?
Are you just polluting the public square with more noise?

@gavinyerxa I haven’t seen that issue in my tests. I find the edits are targeted well with anything above Haiku/GPT5.4-mini. Where does this happen for you?

@solirvine very cool! Have you been able to get the agents to avoid the over editing problem (where instead of making a one word change they replace an entire paragraph)? Does two agents negotiating against each other help there?

@originalmagneto Using OpenAI or Anthropic via API. I view the API as just another SaaS with access to sensitive files. You?

@solirvine What’s the model? Is it using API? How is your stance on Cloud Act, GDPR and data residency and ZDR?

@andrewarruda Thanks, Andrew! Still early going for this app, but it's already much more useful than the standard "chat with your docs" template.