
Excited to announce our first project from the HuggingLegal community: the Gemini-3-Benchmarkathon!
Gemini-3 achieved the top spot on most major benchmarks last week, but how well does it know the law? Unfortunately, most model providers don't evaluate on law-specific benchmarks. So while we have a good idea of how good new models are at coding, we are pretty much in the dark about their lawyering abilities.
This is why we ran a vibe-check on six diverse datasets, ranging from Greek bar exams and Indian law questions to Swiss university law exams, among others. So what do the vibes say?
AA-Omniscience: 6/10 – high competence but unreliable
LegalBench: 9.5/10 – almost perfect answers
GreekBarBench: 9/10 – impressive long-context reasoning without hallucinations
IndianLawQA: 8.5/10 – precise, reliable answers to high-precision statutory queries (including the new BNSS 2023 codes), a domain where most models typically hallucinate
WilfulMisconduct: 8/10 – strong logic, missed binding precedent
LEXam: 7/10 – competent but overly confident
Gemini-3 is extremely strong at many legal tasks and often performs at or above the level of very good human lawyers. However, it still makes serious mistakes: it often answers confidently even when it does not know the right legal rule or fact, instead of saying “I don’t know.” Because of this overconfidence and some failures on complex reasoning and precedents, it cannot safely replace human lawyers and still needs expert oversight.
This is a great community effort. Thanks for the collaboration, Robert Scholz, @5vsas, Ernest Beta, @odychlapanis, @adhipba, Matteo Bürgler, Sophie Franco, Chu Fei Luo, @samdahan06!
Find the link to the article below.

