ellamind

47 posts


@ellamindAI

Building elluminate. AI Evaluations, simplified. Also: Data Sovereignty, Privacy, Performance & some research. Come & work with us.

Bremen, Germany · Joined January 2024
12 Following · 167 Followers
Pinned Tweet
ellamind @ellamindAI
AI evaluations are broken. Generic benchmarks tell you nothing. Manual QA doesn't scale. And existing tools are either too academic or simplify in the wrong places. That's why we built elluminate - evals that actually work for real product teams.
Replies 1 · Reposts 5 · Likes 11 · Views 4.1K
ellamind retweeted
Max Idahl @maxidahl
@fujikanaeda In case you are interested in speedrunning a German version in collab with @ellamindAI, hit me up. We can take care of the locale work and also have some B200 compute to spare.
Replies 0 · Reposts 1 · Likes 1 · Views 32
ellamind retweeted
OpenEuroLLM @OpenEuroLLM
Experimenting with model-based annotation for better data selection? A candidate to consider is propella-1, a multi-property annotator partially funded by #OpenEuroLLM which is fully open-source. 🔓Code, annotations and paper available! arxiv.org/pdf/2602.12414
ellamind @ellamindAI

We released propella-1, a small model for advanced pre-training data annotation 🙃. Work led by @maxidahl within the @OpenEuroLLM project. Link to model + annotations for important pre-training datasets below 👇

Replies 0 · Reposts 1 · Likes 4 · Views 160
ellamind retweeted
Jan P. Harries @jphme
M 2.5 by @MiniMaxAI_ is currently the most popular open-weights model on @OpenRouter, but it is also heavily censored. Inspecting the CoTs reveals deliberate lying, which can also be problematic in other areas, as @AnthropicAI's research has shown. Some examples attached 👇
[3 images attached]
Replies 2 · Reposts 1 · Likes 1 · Views 106
ellamind @ellamindAI
Our @TheBitFlipper built an in-house benchmark for coding agents, based on real PRs from our codebase. As expected from our vibes (and other benchmarks), Opus takes the crown 🥇 - GPT-5.2 results still outstanding though 👀
Damian Barabonkov @iamdamianb

Public benchmarks are easy to game. I built swellubench to validate real features and bug fixes from a production platform at @ellamindAI. It evaluates models on private, real-world coding tasks to measure true performance and cut through benchmark maxing noise. Methodology in 🧵

Replies 1 · Reposts 0 · Likes 1 · Views 85
ellamind @ellamindAI
Machine-translated data beats native-language data? 🤔 As part of @OpenEuroLLM, we produced >5 trillion tokens of multilingual pretraining data for low-resource languages at >3M tps on LEONARDO (CINECA). Findings presented at @BSC_CNS. Work led by @maxidahl; release coming soon 🙂.
[3 images attached]
Barcelona, Spain 🇪🇸
Replies 0 · Reposts 2 · Likes 4 · Views 206
Wolfram Ravenwolf @WolframRvnwlf
The wolf steps off the boat after six intense months at @ellamindAI, where we took our eval platform elluminate from pilot to GA. Thanks to the entire team for the great teamwork and community. I'm leaving the company, but we're parting on the best of terms and will stay in touch. What's next for me? Stay tuned! 🐺✨
Replies 2 · Reposts 0 · Likes 10 · Views 552
ellamind retweeted
Jan P. Harries @jphme
This is just a small vibe check (more is currently not possible due to rate limits) - but in the German geo eval I built on stage yesterday evening, @Alibaba_Qwen 3-Max doesn't look competitive with other top models and also falls far behind e.g. R1 or GLM 4.5. 😕 @ellamindAI
[image attached]
Replies 1 · Reposts 2 · Likes 6 · Views 1.9K
ellamind @ellamindAI
Building AI products? You need real evaluations. Let's talk. elluminate.de
Replies 1 · Reposts 1 · Likes 4 · Views 136
ellamind @ellamindAI
The result? Teams ship faster with confidence. Product managers can actually trust their metrics. And developers spend time building, not firefighting. Whether you're a developer tired of vibe-checking, a PM who needs reliable metrics, or a domain expert who knows what "good" looks like, elluminate speaks your language.
Replies 1 · Reposts 1 · Likes 2 · Views 149
ellamind @ellamindAI
Our co-founder's project #LeoLM was highlighted by @bmftr_bund. Today, we're continuing what started as a student's side project with @OpenEuroLLM (and more to come). If you want to work on open-source AI, multilingual applications, and AI evaluations as well - we're hiring! 🙂
Björn Plüster @bjoern_pl

Nearly two years after release my project LeoLM is being used as a strong justification for the expansion of federal compute funding in Germany. Goes to show how much impact open-source projects can have. Hell yeah @bmftr_bund - thanks for making projects like this possible! 🚀

Replies 1 · Reposts 2 · Likes 2 · Views 536
ellamind retweeted
Jan P. Harries @jphme
GPT-5 is worse than GPT-4o 😳 ... at least for some writing tasks in German (and probably in other languages too...) 👇
[image attached]
Replies 5 · Reposts 10 · Likes 48 · Views 6.9K