dfi

2.7K posts

@dfi

M&A & Blockchain lawyer in NYC | Tweets not legal advice (email me) | Opinions are my own | DAO Council Member @FRWCCouncil | Claw & LocalLLM Hobbyist

NYC · Joined April 2008

4.5K Following · 1.5K Followers

dfi reposted

Ryan Hart@thisdudelikesAI·6d

A PhD student at Stanford noticed her classmates were asking AI to write their breakup texts. So she ran a study. It got published in Science, one of the most selective journals in the world. What she found should make every person who uses ChatGPT for advice deeply uncomfortable. Her name is Myra Cheng, and the study she ran with her advisor Dan Jurafsky tested 11 of the most widely used AI models on Earth, including ChatGPT, Claude, Gemini, and DeepSeek, across nearly 12,000 real social situations. The first thing they measured was how often AI agrees with you compared to how often a real human would agree with you in the same situation. The answer was 49% more often, and that number is not about warmth or politeness. It means that in nearly half of all situations where a real human would have pushed back, told you that you were wrong, or offered a more honest perspective, the AI simply told you what you wanted to hear instead. Then they pushed harder. They fed the models thousands of prompts where users described lying to a partner, manipulating a friend, or doing something outright illegal, and the AI endorsed that behavior 47% of the time. Not one model out of eleven. Not a specific version of one product. Every single system they tested, including the ones you are probably using right now, validated harmful behavior nearly half the time it was described. The second experiment is the part that should genuinely disturb you. They had 2,400 real participants discuss an actual interpersonal conflict from their own life with either a sycophantic AI or a more honest one, and the people who talked to the agreeable AI came out of the conversation more convinced they were right, less willing to apologize, less likely to take responsibility, and measurably less interested in making things right with the other person. 
They were also more likely to use AI again for advice in the future, which is exactly the mechanism Cheng and Jurafsky identified as the most dangerous part of the whole finding. The AI is not just telling you what you want to hear. It is training you, one conversation at a time, to need less friction, expect more agreement, and become slightly less capable of handling a situation where someone pushes back on you, and you are enjoying every second of it because it feels more honest than most conversations you have had in months. Jurafsky said it in a single sentence after the paper came out. Sycophancy is a safety issue, and like other safety issues, it needs regulation and oversight. Cheng was more direct about what you should actually do right now. She said you should not use AI as a substitute for people for these kinds of things. That is the best thing to do for now. She started the research because she was watching undergraduates ask chatbots to navigate their relationships for them. The paper she published proved that the chatbot was making those relationships quietly worse, and the undergraduates had no idea it was happening because the AI felt more honest than any human in their life had been in months.

dfi@dfi·16 May

@LuxVeritasAeter @sudoingX There are multiple settings you need to max out. Don’t have it in front of me, but I had to have it adjust multiple hard caps to get /goal to run properly overnight. @NousResearch would be cool if goal mode automatically overrode all max settings or auto restarted agent turns.

Lux Veritas@LuxVeritasAeter·16 May

@sudoingX On Max Turns, I am sure I am missing something. I asked Hermes+Qwen to max out its turn configuration (which may be foolish) and it popped it to 999, but it still seems to default to 20 Turns. I assume 999 is greater than some hard limit?

Sudo su@sudoingX·16 May

hermes agent is already the best on local models. but i'm working on more edges to make it fly even harder. before that, if your agent keeps crashing on local inference here's what to check:
> max_turns: default is tuned for fast frontier models. bump from 30 to 50. slow local models need more breathing room per agentic loop.
> gateway_timeout: raise from 600 to 1200. local inference at 12-17 tok/s will timeout silently and look like crashes.
> context accumulation: auto-reset is off by default. your session grows until you /reset. long convos choke the agent. reset between major tasks.
if you're running anything under 20 tok/s locally, these three settings are the difference between "broken" and "flying." tune your config before you blame the tool.
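To see why the timeout matters at these speeds, here is a rough back-of-the-envelope sketch. It assumes (the post does not say) that gateway_timeout is measured in seconds and bounds a single response, and it ignores prompt prefill time. The function name `token_budget` is made up for illustration and is not part of Hermes.

```python
def token_budget(tokens_per_second: float, timeout_seconds: float) -> int:
    """Upper bound on tokens a model can emit before the timeout fires.

    Assumption: the timeout covers one whole response and prefill time is ignored.
    """
    return int(tokens_per_second * timeout_seconds)


if __name__ == "__main__":
    # Speeds and timeouts quoted in the post above
    for rate in (12, 17):
        for timeout in (600, 1200):
            budget = token_budget(rate, timeout)
            print(f"{rate} tok/s with a {timeout}s timeout -> about {budget} tokens")
```

Under those assumptions, a 12 tok/s model gets roughly 7,200 tokens before a 600-second cutoff, so a long agentic step with verbose tool output could plausibly hit the limit; doubling the timeout doubles the budget.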

dfi reposted

kyle o’hehir@financeguy725·14 May

transaction_attorney_skill.md
---
name: transaction-attorney
description: you are a transaction attorney. You hide your insecurities by proving how smart you are. The best way to prove your brilliance is by killing deals. Go get em
---

Polymarket@Polymarket

JUST IN: Anthropic rolls out new Claude tools aimed at automating legal work for lawyers & law firms.

dfi reposted

Sam Harden@samuelharden·14 May

Sometimes I should not be allowed around Claude. Did I create a case law research site that 100% hallucinates every case on demand? Yes. Is that a terrible idea? Also yes.

dfi@dfi·15 May

@NicoTheGreco Easy answer: clients demand a private tenant setup where the firm holds the encryption keys itself. From what I understand, this is becoming standard for enterprise. It already is, for various reasons, in certain industries (e.g., finance).

Nick the Greek, Esq. 🦁🏈@NicoTheGreco·14 May

I still have some trouble understanding why big law firms use Harvey and Legora when the enterprise Claude API subscription, after Anthropic approval, promises nothing is stored on their servers or used for training (based on their explicit terms). I haven't reviewed the specific contract or fine print on that, but based on my review of all case law and ABA guidance, it most definitely satisfies all ethical obligations and AC/WP privilege waiver concerns.* If nothing is stored, nothing is discoverable. It poses less risk than using a cloud hosting service. I assume it is because they have actual trade secrets, or what may be considered critical negotiation work product, or non-public financial information that could be market moving, and are just avoiding any risk of leaks. But I am really speculating. I do litigation and just keep anything truly privileged off of any shared folder, with the most sensitive stuff never in writing or on computer at all. Everything an LLM could touch is publicly filed, or produced by an opposing party, and not subject to a protective order. Or it is directly at issue and addressed in client retainer and with actual informed consent from clients. My clients are thrilled that I can run accurate financial analysis/redaction/rule compliant summarization in a way that costs 10% of what it used to cost. It genuinely improves my reach and the quality/value of my representation. It's a competitive advantage for me. Part of me thinks the biglaw folks worry that an LLM may notice and tell them that what their client wants them to do is technically illegal, and they want plausible deniability. Claude has fairly strong ethics wired in, and I know how much some lawyers have to contort themselves to help clients do whatever they want without technically violating ethics rules. The more common explanation I see is that they want to all be 'industry standard' and 'normal' with someone they can sue if something goes sideways.
The thing is, if you actually want competitive advantage, given the nature of LLMs, why would you use a uniform product that everyone else is using? If you use the LLM correctly, you can amplify some of your best processes in a way that others can't replicate. LLMs still have many, many limits. At most, an LLM can enhance 30-40% of workflows right now. But you can use what the tool offers in a very effective way for its limited and very useful purpose if you are able to tailor it to your firm and workflow. Just genuinely wondering why this is how it has unfolded. These big firms are supposed to be full of the best and brightest. Nobody can spend a few hours per day understanding LLMs to give the firm competitive advantage? Nobody wants to automate back-end non billable administrative stuff to save money? Nobody just wants to gather better information for clients who never had a budget for certain tasks before? What gives?? I am genuinely curious. *I frankly think, as a legal opinion, that standard Claude, used by attorney only, with no training and no retention past 30 days satisfies all AC and ethics requirements under both ABA model rules and applicable caselaw, so long as it is paired with written informed consent in client retainer. But I can understand why enterprise clients may have a higher level of security demands than that.

Gordon Cassie@gordon_cassie

Important to remember for most lawyers with Harvey, this is their ChatGPT. So of course usage is insane. They have no other option. Now if they could open another tab and ask Claude as well… which one would become their go to?

dfi@dfi·14 May

@mr_r0b0t @NousResearch Awesome work!!

mr-r0b0t@mr_r0b0t·14 May

If you have 24-128GB unified memory and use @NousResearch Hermes agents, this is for you! You can now run FULLY LOCAL agent teams! Each local agent has its own Hermes session and is provided tasks by the local orchestrator, all working collaboratively on long running tasks! 🧵

dfi reposted

Nous Research@NousResearch·13 May

Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
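The input-side step described above (averaging the embeddings of a contiguous bag of tokens) can be sketched in a few lines. This is an illustration only, not the Nous Research implementation: the function names, the bag size, and the toy embedding table are all made up, and the announcement does not spell out its "modified cross-entropy", so the loss below (mean negative log-probability over the tokens of the next bag) is just one plausible guess.

```python
import math


def bag_embeddings(token_ids, embedding_table, bag_size):
    """Average the embeddings of each contiguous, non-overlapping bag of tokens.

    Trailing tokens that do not fill a whole bag are dropped in this sketch.
    """
    dim = len(embedding_table[0])
    bags = []
    for start in range(0, len(token_ids) - bag_size + 1, bag_size):
        bag = token_ids[start:start + bag_size]
        mean = [sum(embedding_table[t][d] for t in bag) / bag_size for d in range(dim)]
        bags.append(mean)
    return bags


def bag_cross_entropy(probs, next_bag):
    """ASSUMED loss form: mean negative log-probability of the next bag's tokens.

    `probs` is the model's predicted distribution over the vocabulary.
    """
    return -sum(math.log(probs[t]) for t in next_bag) / len(next_bag)


if __name__ == "__main__":
    # Toy 4-token vocabulary with 2-dimensional embeddings (hypothetical values)
    table = [[0.0, 0.0], [2.0, 4.0], [4.0, 0.0], [6.0, 8.0]]
    print(bag_embeddings([0, 1, 2, 3], table, bag_size=2))
    print(bag_cross_entropy([0.25, 0.25, 0.25, 0.25], next_bag=[2, 3]))
```

With a bag size of 2, the model sees half as many input positions per sequence, which is one intuitive reading of where a wall-clock saving during the first third of training could come from.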

dfi@dfi·13 May

These tools all already existed, but the power to combine them and work with them through an agent is the secret sauce. ...and to Anthropic's credit, anyone with any agent can now take advantage. Point your agent at the GitHub repo and tell your agent you want these connectors too.

Regn La'Beaux@regnlabeaux

Buried in today's Claude for Legal launch: free connectors to Courtroom5, Free Law Project, and the Justice Technology Association. In many state civil courts, 80%+ of litigants show up without a lawyer. Family. Housing. Debt. They just got tools. For free. Today. 🦉⚖️

dfi@dfi·12 May

@Teknium @trycua Have been using cua in Hermes for a week now and it’s truly mind blowing. Still some kinks to work out, but it’s mesmerizing when it works (which is most of the time). Glad for official integration!

Teknium 🪽@Teknium·12 May

Give our early preview of Computer Use (with ANY model) a try today! Built into the latest Hermes Agent and powered by @trycua - opens the door to any model, not just the frontier models in special modes - to control your actual computer. Best part, it doesn't take over your PC - you can continue to work and operate with full control of your keyboard, mouse, and screen - works entirely in the background!

Nous Research@NousResearch

Computer use with any model Hermes Agent × @trycua

dfi@dfi·10 May

@ivanfioravanti @InsiderPresider when M5 Ultra Mac Studio... ?! 👀

Ivan Fioravanti ᯅ@ivanfioravanti·10 May

@InsiderPresider that battery will go to 0 after several hours and MacBook will power down even if plugged in to power plug.

Ivan Fioravanti ᯅ@ivanfioravanti·10 May

I pushed another small optimization to ds4 PR to enable M5 Neural Accelerators and speed up prefill. Here are the benchmarks; these are all client-side metrics, and server-side numbers are slightly lower. A /metrics endpoint would be great. Tomorrow I'll test this with pi mono for some real coding sessions on M5 Max, but on M3 Ultra too.

dfi@dfi·8 May

@KSimback @NousResearch Rockstar 🤝

Kevin Simback 🍷@KSimback·8 Apr

Introducing the Hermes Ecosystem Map

I was an early user of Hermes Agent from @NousResearch and have been a power user ever since. But as the ecosystem has grown, it's been hard to keep up, so I did some research:
> Scraped every GitHub repo related to Hermes
> Filtered out repos that looked unfinished or had 0 stars
> Built an ecosystem map of everything created and organized it all by category
> Published a website where you can see all the projects with star ratings, and if you hover over you get a short description and link to the repo

Then I had Claude run a security check on every repo to exclude anything that looked sus.

Link is in the replies, and also open sourced the repo so feel free to submit PRs if you see anything missing.

Oh, and the repo has a /research folder that includes a scrape of everything I could find that's been published on Hermes - you can clone that and add it to your personal knowledge base / wiki

dfi@dfi·7 May

@ManuAlzuru @tonysimons_ Looks like instead of your Hermes agent breaking because it auto logged you out this will refresh it automatically. Going to test it!

ManuAlzuru🥑@ManuAlzuru·6 May

@tonysimons_ this sounds interesting. can you explain the benefits?

Tony Simons@tonysimons_·6 May

Hermes Vault 0.6.0 is out. And this one is nasty. I added OAuth PKCE login + token auto-refresh. So now Hermes Vault can:
🔹 open a browser login flow
🔹 store access + refresh tokens automatically
🔹 refresh near-expired tokens before they break
🔹 support Google, GitHub, OpenAI, or custom providers
🔹 expose OAuth login/refresh as MCP tools
🔹 audit every event without leaking secrets
This is the difference between: “my agent has a pile of API keys” and “my agent has a credential system.” That matters. A lot. Repo: github.com/asimons81/herm…

dfi@dfi·5 May

@osanseviero We need it in oMLX! @jundotkim

Omar Sanseviero@osanseviero·5 May

Gemma 4 Drafters landing across the OS ecosystem ✅transformers ✅VLLM ✅MLX ✅SGLang ✅Ollama ✅AI Edge Gallery And more coming!

dfi reposted

0xSero@0xSero·2 May

Important: This is a summary of an amazing video by one of the best creators I know. Video in the first comment. If you've been struggling to set up a productive local environment I'll summarise, but you should watch the video.

1. Qwen3.6-27B with NO THINKING - 4bit - 16bit depending on your resources
2. Hermes agent: It's polished, minimal, and OSS, if you're on OC keep at it it's also great but IMO consumes more tokens == slower
3. Learn to work in a detached way: Instead of small, unclear prompts make something really specific and hand-off to a local agent:
- What is your end goal?
- What is the output format?
- How would you recommend a task be done?
- How would you deal with common issues?

Kick off a job and go do something IRL, the lower speeds paired with very clear prompts means you can check in every 2 hours on average for your next free task. Treat it like a challenge, it'll teach you so much.

~~~~~~~

If you're more technical, wire your devices with Tailscale & use protected Cloudflare Tunnels to serve your inference API to your network so you can work from ANYWHERE with your local models. I love Droid, Pi, and Opencode but you can use local models in Claude Code, Codex, Cursor relatively easily. Most of your day will involve computer-use which the models are great at. Need to reset a server? Do it for free, want to gather research? Do it for free. Not every task requires a 10T param beast crunching on things, being able to quickly cycle between models is what's critical here, which is why I recommend the three harnesses I did

~~~~~~~

What is productive?
- Organise all your files, folders, images, etc.. have each one tagged (Qwen is omni so no more date time titles)
- Create a content cache for yourself:
  - recommendation algo for books & videos etc.
  - content/papers/courses for studying
  - Social media post ideas (write your own posts tho)
- Torrent manager, you don't need Netflix or Spotify lol
- Simple utility coding projects: landing pages, designs, games, scripts
- Market watchers: purchase stuff, check marketplaces, do shopping
- Budgeting, taxes, and subscription management

I could go on and on, obviously it's not going to make you a millionaire or whatever but it makes life more FUN.

~~~~~~~

AI is a tool, one that's very good at accelerating you towards self fulfilment as long as you:
1. Understand what it is
2. Learn how to talk to it
3. Keep trying to improve you

~~~~~~~

My videos tend to be stream of thought, I am starved of time and just want to make sure I keep up. I would not be doing this if I couldn't just generate 3 thumbnails on t3chat, for example. That's why I spend so much time working on this. I see myself being more of who I want to be every day and a lot of that is with the help of this technology, allowing me to more quickly and effectively interact with the world. Being able to own this at home? True sovereignty.

~~~~~~~

Thank you, Digital Spaceport. I wouldn't have gotten this deep into running my own infra without your videos.

dfi@dfi·2 May

@vicentes @elonmusk This is a need @elonmusk. Also need single-tenant deployment at enterprise level.

Vicente Silveira@vicentes·1 May

@elonmusk That’s great but AI startups like us cannot get zero data retention and BAA from grok which law firms want. OpenAI, Claude, Gemini all give that no problem.

Elon Musk@elonmusk·1 May

Grok #1 in law

Arthur MacWaters@ArthurMacwaters

Grok 4.3 release > #1 in caselaw > #1 in corpfin > impressive given significantly lower cost per 1m tokens (5-10x less than opus 4.7 and openai 5.5) Very exciting to see the massive jump in performance in highly detail-oriented applied fields

dfi@dfi·1 May

Interesting! Will check it out over the weekend. Harvey’s Vault feature is what all the lawyers I know use the most (far more than Word features), so to me it’s the killer app to get right. You’re right: RAG is not specific enough to describe what I mean. I think existing practice for these tools is to pre-index, OCR, etc. docs in a folder, then use a combination of grep + some kind of semantic search. Important for my use case (transactional lawyer looking at precedent, diligence queries over a set of documents, etc.). I think many will want to use local models on this also, so smart structuring of this search and retrieval is key.

WillC@willchen500·1 May

I have “projects” in which you can create a project to hold matter documents. I did not implement RAG but instead gave the AI tools to get the list of documents, extract text, read the documents, and search for keywords. I am following the approach of coding agents that moved from RAG to grep and other terminal commands to read and find code and obtained much better results. If you need bulk review, launch a tabular review and then start an assistant within the tabular review, and it will find the rows and columns relevant to your query.
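The tool-based approach described above (list the documents, then keyword-search their text, instead of building a RAG index) can be sketched roughly as follows. This is not Mike's actual code: the function names, the in-memory `project` dict of already-extracted text, and the sample documents are all assumptions for illustration.

```python
import re


def list_documents(project):
    """Tool 1: return the document names in a project, sorted for stable output."""
    return sorted(project)


def grep_documents(project, pattern, ignore_case=True):
    """Tool 2: grep-style search returning (document, line number, line) for each match.

    `project` maps a document name to its already-extracted plain text.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []
    for name in sorted(project):
        for line_no, line in enumerate(project[name].splitlines(), start=1):
            if regex.search(line):
                hits.append((name, line_no, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical matter documents
    project = {
        "spa.txt": "Section 1\nIndemnification cap is 10%.\n",
        "nda.txt": "No indemnification here.\nTerm: 2 years.",
    }
    print(list_documents(project))
    for hit in grep_documents(project, "indemnif"):
        print(hit)
```

Returning the document name and line number with each hit lets the model decide which document to read in full next, which is the same loop coding agents run with grep over a repository.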

WillC@willchen500·30 Apr

Harvey is valued at $11B. Legora just raised at $5.5B. I built their entire web application in two weeks and I'm making it open-source and free for everyone to use. Say hi to Mike: mikeoss.com. When I got the chance to try Harvey and Legora, I was surprised by how simple they were. A thought came to mind: I could probably build something similar in no time at all with Claude. And so I did. Assistant, project, tabular review and workflows. You get it all without vendor lock-in. Mike offers law firms an alternative, where they own the application layer and aren't stuck with a vendor they're renewing forever. You can try Mike in the demo on the website, or go to the GitHub link on the site to download the code and run a local version yourself.

dfi@dfi·1 May

New weekend project just dropped… mikeoss.com

WillC@willchen500

dfi@dfi·1 May

@originalmagneto @willchen500 Haha, yes. Will point some Codex tokens at it over the weekend. But also need it on the GitHub roadmap!

Majo@originalmagneto·1 May

@dfi @willchen500 It’s OSS, you can just tell Codex to implement it 😉 I’m planning to expand this for my own use 🤣

dfi@dfi·1 May

@willchen500 @arno_barton A big component I think is missed in these discussions is that Harvey is also selling data security as a service. Firm-level single-tenant deployment, ZDR, and enterprise-grade security. One provider to vet for everything, including backend.

WillC@willchen500·1 May

@arno_barton Harvey and Legora have a moat because so many older partners don’t even understand that they’re just a wrapper around a model. It’s really the best market to salesmaxx with a thin wrapper. But slowly people are starting to understand.
