Sol Irvine

8.2K posts

@solirvine

mostly harmless • building https://t.co/5W0eJiJlRS and https://t.co/43ZMkKkMRa

kyoto, japan · Joined May 2010

1.4K Following · 922 Followers

Pinned Tweet

Sol Irvine@solirvine·Sep 14

東本願寺

Sol Irvine@solirvine·6h

Anthropic and OpenAI already built all these features for coders. Their only remaining obstacle is porting those features to the (horrible, awful, broken) file formats, docx and pdf.

Sol Irvine@solirvine·6h

What's still missing:
- Plan, refine, implement loops
- Delegation to task-oriented agents
- Inline human-in-the-loop approvals & clarifications
- Branching/forking, versioning
- Structured memory at the project/user/client/practice/firm levels
- QA reviews, red-teaming, linting

Sol Irvine@solirvine·6h

For me, the revelation from Mike is how *thin* the underlying product is.
1. Chat = chat with your docs.
2. Project = folder-scoped chat.
3. Tabular Review = projects x prompts.
4. Workflow = prompts.
Credit to @willchen500 for packaging and framing it brilliantly.

Sol Irvine@solirvine·7h

@T1000_V2 Works very well for my purposes, but it depends on the models you use, the prompts you give them, and the type of contract. Using GPT 5.4 mini on typical commercial contracts, I'm impressed.

T1000@T1000_V2·13h

@solirvine How does it perform?

Sol Irvine@solirvine·1d

My new app wargame.esq pits two agents against each other in a contract negotiation. Each agent reviews the contract. They assemble a shared issues list. Then they negotiate each point, showing their internal reasoning and back-and-forth in real time.
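
A minimal sketch of the flow described in this post, assuming nothing about how wargame.esq is actually built: two reviews are merged into a shared issues list, then each issue is negotiated turn by turn under a cap on turns. Every name here (`merge_issues`, `negotiate`, `make_stub_agent`, `max_turns`) is hypothetical, and the agents are plain stub functions standing in for LLM calls so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical agent interface: (issue, other side's last offer) -> this side's offer.
Agent = Callable[[str, Optional[str]], str]


@dataclass
class IssueResult:
    issue: str
    transcript: List[str] = field(default_factory=list)
    agreed: Optional[str] = None  # None means the turn cap was reached first


def merge_issues(a: List[str], b: List[str]) -> List[str]:
    """Build the shared issues list: union of both reviews, first-seen order."""
    seen, shared = set(), []
    for issue in a + b:
        if issue not in seen:
            seen.add(issue)
            shared.append(issue)
    return shared


def negotiate(issue: str, first: Agent, second: Agent, max_turns: int = 6) -> IssueResult:
    """Alternate offers until one side repeats the other's offer or turns run out."""
    result = IssueResult(issue)
    last_offer: Optional[str] = None
    sides = [("A", first), ("B", second)]
    for turn in range(max_turns):
        name, agent = sides[turn % 2]
        offer = agent(issue, last_offer)
        result.transcript.append(f"{name}: {offer}")
        if offer == last_offer:  # echoing the counterparty's offer counts as acceptance
            result.agreed = offer
            break
        last_offer = offer
    return result


def make_stub_agent(opening: str, concede_after: int) -> Agent:
    """Stand-in for an LLM agent: repeats its opening, then accepts the last offer."""
    calls: Dict[str, int] = {}

    def agent(issue: str, last_offer: Optional[str]) -> str:
        calls[issue] = calls.get(issue, 0) + 1
        if last_offer is not None and calls[issue] > concede_after:
            return last_offer
        return opening

    return agent


if __name__ == "__main__":
    shared = merge_issues(["liability cap", "term"], ["term", "IP ownership"])
    side_a = make_stub_agent("our standard position", concede_after=1)
    side_b = make_stub_agent("their standard position", concede_after=2)
    for issue in shared:
        outcome = negotiate(issue, side_a, side_b)
        print(issue, "->", outcome.agreed, outcome.transcript)
```

In a real system the stubs would be replaced by model calls that also return their reasoning; the cap on turns is the only safeguard against cycles in this sketch.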

Sol Irvine@solirvine·7h

@afterlanie The agents already adopt the side they represent and pursue its interests. Before the negotiation starts, you can steer them in terms of their approach, e.g., "be conciliatory, but concede anything that will be onerous or unreasonably limiting for us".

Lāniē@afterlanie·21h

@solirvine have you experimented with each agent taking separate tactics to compare the outcome of the contract?

Sol Irvine@solirvine·10h

@aref_vc Most routine commercial contracts are < 100 pages. The app only negotiates one contract at a time, and uses a markdown extraction, so context windows aren't really an issue. Newer frontier models do a good job of caching, which keeps costs reasonable, too.

❈Aref❈@aref_vc·11h

Love it. I'm wondering about performance and speed of execution, especially context coherence when taking in dozens, if not hundreds, of pages at once, including parsing capabilities and staying faithful to the control guardrails to remove any possible hallucination. I've seen a lot of variability between frontier models, both commercial and open-source. At least running this locally, you can see the trend and the requirements to make it handle a lot of H cases, especially if it's fintech, legal, or real estate legal, etc. Those are areas where a mix of expert-type models would bring a bit more depth, but if you choose to work with a more generalist model, things could be trickier.

Sol Irvine@solirvine·11h

@aref_vc I have another product augustus.esq that can evaluate the negotiated draft from a neutral perspective. I did consider integrating a neutral pass at the end, but never got around to it. I like the idea of establishing some persistence/reputation for the agent over time.

❈Aref❈@aref_vc·11h

This is awesome with the council framework to push back, etc. What's the final line of judgment? Who basically confirms? Is that basically the end user, or is there an LLM as a judge on top of it? I did run a few experiments recently on top of this. Initially, instead of a council, we do a bidding approach where the LLMs pick based on their confidence in solving or winning the negotiation. There is a reputation token and a budget loaded into that conversation. If you win the conversation or the negotiation, you get more points and more reputation, which helps you win more deals. There is an LLM as a judge that comes later on to cover these. It could be at the clause level, or at the full agreement level. There are different angles to it, with pros and cons in terms of where it's best fit and where the maximum impact surface is taking place.

Sol Irvine@solirvine·13h

@futureproof_amy To me, this tool is a more robust expression of the analysis that I do when confronted with a contract. Anticipate the issues and arguments, get a sense of a reasonable middle ground, etc.

Sol Irvine@solirvine·13h

@futureproof_amy "Law is not to be gamified." To me, the AI-generated output is only useful as a benchmark for:
- Which issues are raised.
- The arguments in both directions.
- Which compromises are reached.
- The rationale for concessions, etc.

Sol Irvine@solirvine·13h

@cmiller11101 It was built as an internal tool initially. I'll release soon. Follow me or @wargame_esq and I'll post there.

闵魁偲@cmiller11101·22h

@solirvine It seems this doesn't work yet? All I see is your landing page showing a brief text intro.

Sol Irvine@solirvine·13h

@L1AD Stay tuned. (Follow @wargame_esq)

Liad Shababo@L1AD·18h

@solirvine That's such a tease, let us play with it.

Sol Irvine@solirvine·13h

@p_dove Yes, we have a (very rudimentary) version of this.

Paloma A.@p_dove·21h

This is so interesting, would be especially helpful for a negotiation against a new/unknown counterparty. For known counterparties (or at least, people your colleagues have told you about) it'd be cool to be able to input their well-known bugaboos to see how it hashes out with some tuning.

Sol Irvine@solirvine·13h

@BaricJohnpaul I built it for myself initially, but given the response I'll release something in the next week or so.

JohnPaul Baric@BaricJohnpaul·16h

@solirvine Will you be open sourcing this?

Sol Irvine@solirvine·13h

@horadrimsage I know you're kidding, but we did have to implement an (adjustable) cap on turns to protect against cycles/digressions. Depending on which models you use, it might cost you some fraction of a junior associate's hourly rate. ;)

Drew Jenkins@horadrimsage·14h

@solirvine Do you have a NY biglaw mode where the models just keep redlining with the comment "this is standard" and billing +2k/hr for weeks straight?

Sol Irvine@solirvine·23h

@flseeh There’s a short interview at the start. Which party(ies) do you represent? Whose draft is it? Open-ended inputs for: (1) your goals, constraints, etc. and (2) context about the counterparty—e.g., big company, inflexible, needs revenue, etc.
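
As a rough illustration only, the interview answers listed above could be captured in a small structure and rendered into an agent briefing. The `Briefing` class, its field names, and the prompt wording are all assumptions made for this sketch; the post does not say how the app stores or uses these inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Briefing:
    """Hypothetical container for the start-of-session interview answers."""
    represented_parties: List[str]    # which party(ies) you represent
    draft_author: str                 # whose draft it is
    goals_and_constraints: str = ""   # open-ended input (1)
    counterparty_context: str = ""    # open-ended input (2)

    def to_prompt(self) -> str:
        """Render the answers as plain text; empty optional fields are skipped."""
        lines = [
            f"You represent: {', '.join(self.represented_parties)}.",
            f"The draft was prepared by: {self.draft_author}.",
        ]
        if self.goals_and_constraints:
            lines.append(f"Client goals and constraints: {self.goals_and_constraints}")
        if self.counterparty_context:
            lines.append(f"Counterparty context: {self.counterparty_context}")
        return "\n".join(lines)


if __name__ == "__main__":
    briefing = Briefing(
        represented_parties=["Customer"],
        draft_author="the counterparty",
        counterparty_context="big company, inflexible, needs revenue",
    )
    print(briefing.to_prompt())
```

Red lines would go in the free-text goals and constraints field here; a stricter design might give them a dedicated field so they can be checked against the final redline.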

Florian Seeh@flseeh·23h

@solirvine Super interesting! How does the briefing of agents work? How would they know what’s my red line? Thanks for sharing your thoughts :)

Sol Irvine@solirvine·1d

@kourouklides
1. The transcript of the negotiation is a very good roadmap for what to expect in the real world. Not only which issues are likely to emerge, but also the arguments in both directions.
2. The redline is helpful for baselining a compromise.
3. Memo is a useful executive summary.

Ioannis — AI/acc 🇨🇾 🏴‍☠️ 🥩@kourouklides·1d

@solirvine What is the point of this? What is the hypothesis and what are the (preliminary) results? Are you just polluting the public square with more noise?

Sol Irvine@solirvine·1d

@gavinyerxa I haven’t seen that issue in my tests. I find the edits are targeted well with anything above Haiku/GPT5.4-mini. Where does this happen for you?

gavin yerxa@gavinyerxa·1d

@solirvine very cool! Have you been able to get the agents to avoid the over editing problem (where instead of making a one word change they replace an entire paragraph)? Does two agents negotiating against each other help there?

Sol Irvine@solirvine·1d

@originalmagneto Using OpenAI or Anthropic via API. I view the API as just another SaaS with access to sensitive files. You?

Majo@originalmagneto·1d

@solirvine What’s the model? Is it using an API? What is your stance on the CLOUD Act, GDPR, data residency, and ZDR?

Sol Irvine@solirvine·1d

@andrewarruda Thanks, Andrew! Still early going for this app, but it's already much more useful than the standard "chat with your docs" template.

andrew arruda@andrewarruda·1d

@solirvine very cool work sol
