Elliot Vaucher

511 posts

@ElliotVaucher

❤️ Intelligence : artificial or not.

Opinions are my own. Joined November 2018

486 Following · 82 Followers

Pinned Tweet

Elliot Vaucher@ElliotVaucher·Jan 20

LLMs are thought-to-code transpilers

Elliot Vaucher@ElliotVaucher·2h

@solirvine Totally get your point. Mine was only to say that, in terms of OSS around LLMs, there were plenty of shells far more advanced than mikeoss for people to build upon in the direction of lawyers' needs (which you, as a lawyer, of course probably understand far better than me 🙂)

Sol Irvine@solirvine·2h

@ElliotVaucher Also, general web search is basically a flip of the switch on frontier APIs. More specialized search, which is probably more apt for legal users, might require API keys and more involved integration. That's what forks are for.

Elliot Vaucher@ElliotVaucher·21h

While I do respect the approach, and agree on a lot of what you are highlighting in the legaltech industry (the money raised is mostly nonsense), I think it’s a bit unrealistic to compare two weeks (could be an afternoon) of codex tokens thrown at a web app, that doesn’t even implement web search, to Linux now, isn’t it ? 🙃

WillC@willchen500

I was interviewed yesterday by The Australian Financial Review on Mike. The article notes the game-changing impact that Mike has had on the legaltech industry, less than a week after release. The article also features the first public comment from Harvey. Their spokesperson stated that "big proprietary platforms such as Harvey remained best placed to meet the intensive technology needs of law firms, including robust data security, around-the-clock support and access to a range of large language models." I agree on the security and service requirements of big law firms. But let me address the "proprietary" bit. A piece of software being private rather than open source does not equate to having a technological moat, nor does it mean that it is secure. The idea that private = secure and open source = insecure is a pretty widespread misconception in the legal industry. Some of the most secure and robust software in the world used by everyone, like Linux, is open source. Any modern piece of software is built upon the foundation laid by open source libraries, and that includes Harvey. It is precisely because of open source communities that we have these public goods. Article: This ‘game changer’ free app could blow up the $23b legal AI sector afr.com/companies/prof…

Elliot Vaucher@ElliotVaucher·2h

« Lawyers » in itself is too broad a category. A product that is not able to identify areas of law already shows its immaturity, IMO. But if, for the sake of the question, we considered « lawyers » as a consolidated entity, I would start with reliably parsing 10GB of unstructured, messy and versioned files, including non OCR-ised PDFs, to extract information with pinpoint accuracy, for instance ?

victor@stokebuilder·4h

What features does it obviously lack that lawyers need? I’m genuinely curious, you started this discussion questioning features and then citing non-purpose built open source projects as examples of products better suited for lawyers. Perhaps I might be deeply missing something profound about the legal customer needs.

Elliot Vaucher@ElliotVaucher·3h

Web search (robustly used, not randomly polluting context) is about refreshing LLM context with post-training data. For lawyers there is a ton of important data for day-to-day tasks that is accessible on the web (newspapers, case law, updates on target clients, commercial registries, and so on).

Sol Irvine@solirvine·3h

@ElliotVaucher Serious question: what place does a web search tool call have in a contract review/drafting platform? Personally, bringing the noise of web results into the mix is near the bottom of my list of missing features.

Elliot Vaucher@ElliotVaucher·4h

You’re making up a recursive argument. We either assume Harvey and Legora are good products and their product market fit is good and the whole mikeoss point is BS (cause we can’t reasonably say a chat interface with document upload justifies billions of revenue), or we agree with mikeoss they are overrated and we must then find real ways to bring added value to lawyers via real tangible features (which mikeoss currently obviously lacks). I believe option 2 to be correct. What are you even defending at this point ?

victor@stokebuilder·4h

@ElliotVaucher MikeOss is a clone of Harvey and Legora. Perhaps they also don’t have product market fit with lawyers? Because their features seemed to have amassed 9 figures in recurring revenue.

Elliot Vaucher@ElliotVaucher·5h

@matt_ambrogi @winstonweinberg @harvey Totally agree. I’ve created this PR to add « bring your own harness » logic: github.com/harveyai/harve… It’s inspired by the bankertoolbench benchmark, which has native harness plugin capabilities.

Matt Ambrogi@matt_ambrogi·18h

My deep-dive analysis of @harvey's new Legal Agent Benchmark:
Model Evaluation
- First: this is a *model*, not a harness benchmark. The harness is very simple. No special system prompt for legal. Standard bash, read, write, edit, glob, grep tools. A few skills for dealing with files.
- This is a tricky design decision. You want to isolate model evaluation. But if the harness sways too far from what you would actually use in production, eval results may not carry over. I think it is wise overall to simplify.
Tasks and Evaluation
- All tasks are one turn. No compaction or context engineering built into the harness. The simplicity of a single turn is arguably a feature as a starting point, even if in the real world users are likely to ask follow-ups and refinements.
- Evaluation criteria are very interesting and well designed. All judgement is put into detailed criteria sets per task. Effectively unit tests, e.g. "Pass if memo identifies inconsistent publication count, Fail if not". The judge itself is dumb. It takes the final input and criteria and returns pass/fail.
- A task only passes if all criteria pass. Makes sense for legal work. But there is post-run visibility to see that Task N passed 18/20 criteria, etc.
- Most notable here: the benchmark's quality is capped by the task criteria text. Poorly specified or missing criteria could tank the trustworthiness of the entire benchmark. Presumably they had heavy expert input on these criteria.
Environment Accuracy
- The benchmark is high-quality but small scale. This is a big area for improvement imo. But it's tremendously hard to build accurate synthetic legal matters at scale.
- Each task is based in a matter (court case). The matters have documents, emails, spreadsheets, and PowerPoints.
- Docs per matter: median 7, P95 14. This is much smaller than in the real world. Emails even worse. Total token size per matter: ~60k median, 120k P90. Again very small.
- That being said, the content is extremely high quality. This is actually much more important than total size anyway for this use case. After a threshold you get into harness, not model, evaluation.
- But there is a local maximum risk. This tests whether a model has strong built-in legal knowledge work capabilities. It does not test a model's ability to search and synthesize huge amounts of data, which is equally important in law.
Engineering tricks
- Everything is parallelized within reason (caps to avoid rate limits).
- Streaming utilized to prevent timeouts.
- Secure sandbox document parsing implementation.
- Overall very well designed. A few small things would be nice to add. For example, if the agent stops, the reason is not logged right now (context limit hit? timeout? failure?).
Utility
- The most practical application of LAB is for model evaluation on legal knowledge work.
- However, you could also repurpose this benchmark as a means of benchmarking different harnesses. One might keep the model constant and instead iterate on the harness to get an idea of what matters in legal. To make this really robust, it would be important to have some matters with real-world scale context.
Some things harness engineers might experiment with:
- Vectorize all documents and give the agent a semantic search tool
- Legal-specific system prompt
- Encouragement to use grep in parallel to search documents without reading the entire file into context
- Compare performance of embedding-based RAG vs just grep
- Pre-load short summaries of each doc in context
- Introducing subagent spawning to read docs in separate context
- Cross-reference resolution prompting or tool ("as defined in Section 3.2..")
- Code interpreter to handle xlsx files
But again, this is not meant to be a harness benchmark.
Overall this is a very high quality benchmark. It is much harder to get together a high quality environment of underlying data, tasks, and expected outputs in knowledge / legal work than it is for coding. The design decisions around judging are very smart. I think this will be enormously useful for the legal AI community.
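The judging scheme described in the post above (per-task criteria that each pass or fail, a task that passes only if every criterion passes, and post-run visibility such as "18/20") can be sketched in a few lines. This is only an illustration under stated assumptions: `CriterionResult`, `TaskScore`, `score_task` and `pass_rate` are hypothetical names invented here, not LAB's actual code or API, and the judge call that would produce each pass/fail verdict is left out entirely.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CriterionResult:
    """One rubric criterion and the judge's verdict (hypothetical type)."""
    description: str
    passed: bool


@dataclass(frozen=True)
class TaskScore:
    """Aggregate for one task; keeps the partial count for post-run review."""
    task_id: str
    passed_count: int
    total: int

    @property
    def task_passed(self) -> bool:
        # All-or-nothing: a single failed criterion fails the task.
        # A task with no criteria is treated as not passed (an assumption).
        return self.total > 0 and self.passed_count == self.total


def score_task(task_id: str, results: Sequence[CriterionResult]) -> TaskScore:
    """Collapse per-criterion verdicts into one task-level score."""
    passed_count = sum(1 for r in results if r.passed)
    return TaskScore(task_id, passed_count, len(results))


def pass_rate(scores: Sequence[TaskScore]) -> float:
    """Fraction of tasks where every criterion passed."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s.task_passed) / len(scores)


if __name__ == "__main__":
    # A task that meets 18 of its 20 criteria still fails overall.
    verdicts = [CriterionResult(f"criterion {i}", i < 18) for i in range(20)]
    score = score_task("task-n", verdicts)
    print(f"{score.task_id}: {score.passed_count}/{score.total}, "
          f"passed={score.task_passed}")
```

Keeping `passed_count` next to the strict `task_passed` flag is what allows both views the post mentions: the headline number stays all-or-nothing, while the partial count shows how close a failed task came.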

Gabe Pereyra@gabepereyra

x.com/i/article/2051…

Elliot Vaucher@ElliotVaucher·5h

@stokebuilder I understand what you’re saying, but again, how do you define product market fit in this context ? What are the features that align Mikeoss to the lawyers’ market ? It’s a chat and a tabular view. My question is not rhetorical.

victor@stokebuilder·5h

@ElliotVaucher Do any of them have product market fit with lawyers? This isn’t a gotcha question. If your argument is there are other more powerful open source LLM projects, sure, congrats, you win the argument.

Elliot Vaucher@ElliotVaucher·5h

What does « for legal » mean for you ? It’s just routes over LLM providers and tooling that allow for context management. The « community » could make a better legal competitor to Harvey in days by creating what is commonly called « workflows » (just a bunch of text files) over #goose, for instance. What do you mean by « for legal » ?

victor@stokebuilder·5h

@ElliotVaucher Are any of these for legal?

Elliot Vaucher@ElliotVaucher·6h

@stokebuilder There are already tons of open source projects with a similar « chat UI » approach that are far more advanced than Mikeoss, offering more advanced features. #Anythingllm, #goose, #5ire. Seems strange to me that we would want to take a step back.

victor@stokebuilder·13h

Let’s see in 2 months, 2 quarters, 2 years. To put in perspective, there’s already hundreds of forks with updates to the main branch that address many of the pitfalls of the original repo. The community pushes this one forward in a decentralized manner, and the inherent benefits take time to diffuse.

Elliot Vaucher@ElliotVaucher·15h

@ecommerceshares If the AI race means giving millions to Jude Law so he can put his face on a wrapper while our industry is still busy making the precision parts that allow Nvidia to produce their GPUs, then yeah, we’re out of the race.

Wasteland Capital@ecommerceshares·1d

Why has Europe decided to completely stay out of the artificial intelligence race? Serious question. They’re not even trying. I really don’t get it.

Elliot Vaucher@ElliotVaucher·15h

Thanks for your honest and interesting reply. I’ve never had the chance to actually test Harvey or Legora. I would have assumed they had worked on improving specific long-running agentic tasks by means of custom scaffolding and enhancements on existing harnesses. But you seem to know their platforms far better than me. Good luck with your next steps. I’m looking forward to seeing your next moves 🙂

WillC@willchen500·16h

@ElliotVaucher Just making the broader point. Yes, it’s far from Linux. Mike is a proof of concept. It is vibe coded and simple, just like Harvey and Legora. If you try both, you’ll instantly recognise them as vibe coded apps, and they don’t work very well either.

Elliot Vaucher retweeted

Alp ICT@Alpict·1d

💡 AI: a strategic & organizational challenge 🎯 Since the first mainstream #LLM tools emerged during 2025, many have experienced that eureka moment: the one where the machine is no longer just an enticing showcase but a genuinely operational tool, a revealer of business opportunities, a supercharged intern, sometimes even close to a young prodigy... The most surprising thing about this tipping point, and probably one of the aspects that does the most to make the technology credible, is that it keeps repeating. With #ClaudeCode or #DeepSeek most recently, this dynamic illustrates an adoption reality that is also overturning the standards by which digital technologies used to be integrated into organizations. It is precisely this reality, and the strategy that rests on it, among other topics, that @ElliotVaucher, CEO of ogram | intelligence, came to talk to us about in one of our latest #TechTalk episodes, available in full on the Alp ICT YouTube channel: youtu.be/3o3t74YX3j4 #SwissInnovation #SwissTech #TransformationDigitale #booba

Elliot Vaucher@ElliotVaucher·1d

@winstonweinberg Amazing. Thanks for the effort. Are you planning on allowing a « bring your own harness » logic ? Currently the agent logic is quite constrained. 🙏

Winston Weinberg@winstonweinberg·1d

Excited to announce the open-source release of our Legal Agent Benchmark (LAB). LAB provides a framework for evaluating legal agents’ ability to operate over synthetic client matters and convert partner-level instructions into completed work product. Our initial release covers 1,200 tasks across 24 practice areas validated by 75,000 expert rubric criteria. Benchmark results are coming soon, built in partnership with labs, cloud and chip companies, academics, and the legal tech community. We’d love to hear from anyone interested in contributing!

Gabe Pereyra@gabepereyra

x.com/i/article/2051…

Elliot Vaucher@ElliotVaucher·2d

We had orange-pilled. Now we’re probably gonna have codex-pilled.

Elliot Vaucher retweeted

okazakitomohiro@oo_kk_aa·2d

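At the invitation of Yuichi Ito, creator of Knyacki!, I am taking part as one of the artists in a stop-motion exhibition. I started out from outside the stop-motion field and have approached stop-motion from a design perspective, so I’m delighted to be working for the first time alongside people at the very heart of the stop-motion world. I’m exhibiting, among other things, footage material from my matchstick shoots, now in their sixth year.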

Elliot Vaucher@ElliotVaucher·2d

@rough__sea You mean, from codex

Ryan Dahl@rough__sea·2d

i expect almost all software is about to be rewritten from scratch

Elliot Vaucher@ElliotVaucher·3d

« [Jim] Simons would understand what you’re building. He might even find it interesting » Thank you Claude. Now I’m delusional.

Elliot Vaucher retweeted

jason liu@jxnlco·3d

Open

Elliot Vaucher@ElliotVaucher·3d

The output of LLMs is not deterministic. The output of a computer controlled by an LLM is.

Elliot Vaucher@ElliotVaucher·4d

Interesting indeed. Something else I’ve identified in the last 2-3 years of building AI solutions for lawyers here in Switzerland is that people often underestimate the diversity of work lawyers are specialised in, from one expertise to another. It is in fact a profession where specialisation is key. This is where most AI legal tech platforms fail. They can’t go deep enough to be taken seriously.

Brian Burns@brian_a_burns·4d

Great essay on the nuances of GTM in legaltech by @heyitsalexsu: "It is genuinely difficult to identify a company that took a traditional SaaS GTM motion and won in core legal work." alexofftherecord.com/p/credibility-…
