5Sas

2 posts

@5vsas

Joined August 2025
9 Following · 0 Followers
5Sas reposted
Joël Niklaus @joelniklaus
Excited to announce our first project from the HuggingLegal community: the Gemini-3-Benchmarkathon!

Gemini-3 achieved the top spot on most major benchmarks last week, but how well does it know the law? Unfortunately, most model providers don't evaluate on law-specific benchmarks. So while we have a good idea of how good new models are at coding, we are pretty much in the dark about their lawyering abilities. This is why we ran a vibe-check on six diverse datasets from Greek bar exams over Indian law questions to Swiss university law exams among others.

So what do the vibes say?

AA-Omniscience: 6/10 – high competence but unreliable
LegalBench: 9.5/10 – almost perfect answers
GreekBarBench: 9/10 – impressive long-context reasoning without hallucinations
IndianLawQA: 8.5/10 – The task feels precise, clean, and insightful, revealing how reliably Gemini-3 handles high-precision statutory queries (including new BNSS 2023 codes) while most models typically hallucinate in this domain.
WilfulMisconduct: 8/10 – strong logic, missed binding precedent
LEXam: 7/10 – competent but overly confident

Gemini-3 is extremely strong at many legal tasks and often performs at or above the level of very good human lawyers. However, it still makes serious mistakes: it often answers confidently even when it does not know the right legal rule or fact, instead of saying “I don’t know.” Because of this overconfidence and some failures on complex reasoning and precedents, it cannot safely replace human lawyers and still needs expert oversight.

This is a great community effort, thanks for the collaboration Robert Scholz, @5vsas, Ernest Beta, @odychlapanis, @adhipba, Matteo Bürgler, Sophie Franco, Chu Fei Luo, @samdahan06! Find the link to the article below.
5Sas @5vsas
Theory: moral maximalism is just minimalism scaled for the masses. Since abuses (e.g., forced prostitution) are hard to separate from consensual cases, society bans the whole practice. Maximalism is a collective shortcut to protect against harm.