malteos

241 posts

@XYOU

Berlin, Germany · Joined June 2009
1.5K Following · 752 Followers
malteos
malteos@XYOU·
@RishiBommasani @percyliang The analogy for cloud vs local would be restaurant vs takeout. At the restaurant you better behave otherwise you get kicked out. At home you eat your food however you want.
English
0
0
0
21
rishi
rishi@RishiBommasani·
I like the analogy. Notably in the restaurant world, only one of these even is afforded the word open. Option 3 is an "open kitchen" restaurant. (I don't think all such restaurants would appreciate the customer shouting at the chef but let's put that aside)
Though maybe there is some mismatch in the analogy:
- Option 1 is just "you get the food". Analogue is "you get the model". This probably collapses open weight with everything less open than it since we don't distinguish weights vs API in food as far as I can imagine, and certainly there is no local vs. cloud distinction for food
- Option 2 is "you get the food and recipe". I think this is a bit of a mismatch with open source since recipe is transparency (i.e. information about how to build) but not the actual ingredients themselves (whereas you might/do have the dataset in some stronger sense with open-source).
But, worth noting in both cases that you are not given the cooking infrastructure or compute infrastructure to consume the ingredients and produce the food.
One other subtlety is open kitchen restaurants are not fully open due to constraints: chefs do prepwork so that the cook time in front of the diner is reasonable length (e.g. omakase restaurant needs to prepare rice in advance). That's fine because the customer doesn't need 100% open and to see every gory detail, but not fine for researchers.
English
2
0
4
1.7K
Percy Liang
Percy Liang@percyliang·
I find myself repeatedly explaining the difference between open-weight (DeepSeek), open-source (Olmo), open-development (Marin). Let's see if this restaurant analogy helps:
- Open-weight: food is made behind closed doors, server brings you the dish
- Open-source: food is made behind closed doors, server brings you the dish and the recipe
- Open-development: you see the chef make the dish in the kitchen (and can shout suggestions while it's cooking)!
English
40
92
914
75.9K
malteos
malteos@XYOU·
@MatthewBerman Sure about this? Given the current reproducibility crisis in ML research, I doubt that humans would achieve a much higher replication score.
English
0
0
0
16
Matthew Berman
Matthew Berman@MatthewBerman·
Which model won? Turns out Claude 3.5 Sonnet leads the pack, achieving a ~21% replication score on PaperBench! This is impressive, but it shows there's still a gap compared to human PhD-level experts.
English
2
2
49
4.3K
Matthew Berman
Matthew Berman@MatthewBerman·
.@OpenAI dropped a new research paper showing AI agents are now capable of replicating cutting-edge AI research papers from scratch. This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself. Here’s what they learned: 🧵
English
37
149
1.3K
190.2K
malteos
malteos@XYOU·
4/ In academia, the work is very different. PhD students or even undergraduates are the ones doing most of the actual research work. But as a PhD student, you need to decide whether you prioritize the project work over your own PhD work (papers and thesis).
English
0
0
0
201
malteos
malteos@XYOU·
3/ LLMs and other foundation models are no longer research artifacts but products. Frontier models are developed by dedicated teams of 100+ people specialized across the whole stack (from low-level hardware optimization through data to ML and UX topics).
English
1
0
0
232
Yifei Hu
Yifei Hu@hu_yifei·
I am currently working on an end-to-end OCR pipeline for research papers. Open Research Assistant needs a high-quality OCR pipeline to work properly, so I really have to solve the OCR problem before making more progress in the OpenRA project. Good news: paper OCR will be solved soon.
English
8
1
121
17.9K
malteos
malteos@XYOU·
@gui_penedo @pjox13 That’s even better. I will share the data with you as soon as it’s ready!
English
0
0
2
21
Guilherme Penedo
Guilherme Penedo@gui_penedo·
@XYOU @pjox13 We're happy to run a training under the same conditions, but you can find details on the model setup (we haven't posted the exact training script yet) and the exact eval code in our blog post
English
1
0
2
43
Guilherme Penedo
Guilherme Penedo@gui_penedo·
We keep getting new pretraining datasets 🔥 Congratulations to the Matrix team for such a strong dataset!
English
1
10
70
18.1K
malteos
malteos@XYOU·
@gui_penedo @pjox13 We will release a filtered version of Colossal OSCAR soon. Are your training and evaluation scripts available somewhere? I would love to do the comparison with that version.
English
1
0
1
34
malteos
malteos@XYOU·
@mark_cummins For Germany, we have ~50B tokens of court decisions, but those are only the publicly available ones, which represent ~1% of all court decisions. However, you won't need all of them for LLM training due to the high duplicate ratio. @mlissner might have the US numbers.
English
1
0
1
43
Mark Cummins
Mark Cummins@mark_cummins·
@XYOU One other thing I forgot to include was court documents. Seems like you might know about that. Do you have any data on how many publicly accessible court documents exist?
English
2
0
0
293
malteos
malteos@XYOU·
@gui_penedo Awesome work. Will the remaining models also be released? And from your experience what model and data size do you need to see a significant difference in performance?
English
0
0
0
505
Guilherme Penedo
Guilherme Penedo@gui_penedo·
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
English
39
323
1.6K
607.6K
(((ل()(ل() 'yoav))))👾
"15T tokens collected from publicly available sources". what does "publicly available source" even mean?
English
22
5
87
26.6K
OcciGlot
OcciGlot@occiglot·
We have some great new evaluation results to share that were provided by the community. The German Occiglot model is the best in class on ScandEval. scandeval.com/german-nlg/ And our Spanish model achieves SOTA results in lexical word understanding.
English
2
4
21
1.9K
Zengyi Qin
Zengyi Qin@qinzytech·
Training LLMs can be much cheaper than previously thought. 0.1 million USD is sufficient for training LLaMA2-level LLMs🤯 While @OpenAI and @Meta use billions of dollars to train theirs, you can also train yours with much less money. Introducing our open-source project JetMoE: research.myshell.ai/jetmoe A thread 🧵
English
34
165
879
246.7K
malteos
malteos@XYOU·
@BramVanroy @VSC_HPC If your cluster uses Slurm, you can catch the kill signal and save a checkpoint before the job is terminated. See this script for an example. Lines 14 and 293-300 do the magic. gist.github.com/malteos/71635c…
English
0
0
2
102
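The idea in the reply above (catch the scheduler's kill signal, then checkpoint) can be sketched in Python. This is not the linked gist, which is an sbatch script; it is a minimal illustration that assumes the training process itself receives SIGTERM, which Slurm normally sends before SIGKILL (a different signal and lead time can be requested with `sbatch --signal`). The names `StopFlag`, `save_checkpoint`, `load_checkpoint`, and `train`, and the JSON checkpoint format, are hypothetical.

```python
import json
import os
import signal


class StopFlag:
    """Records that a termination signal arrived (hypothetical helper)."""

    def __init__(self, signals=(signal.SIGTERM,)):
        self.requested = False
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: only set a flag and let the loop react.
        self.requested = True


def save_checkpoint(state, path):
    # Write to a temporary file first so a kill mid-write cannot
    # leave a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)


def train(total_steps, ckpt_path, stop, on_step=None):
    """Toy loop: resumes from ckpt_path and checkpoints when asked to stop."""
    state = load_checkpoint(ckpt_path) if os.path.exists(ckpt_path) else {"step": 0}
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one real optimizer step
        if on_step is not None:
            on_step(state["step"])
        if stop.requested:
            save_checkpoint(state, ckpt_path)
            return state
    save_checkpoint(state, ckpt_path)
    return state
```

Checking the flag once per step keeps the handler safe to run at any point; the step time then has to fit inside the grace period the scheduler allows before the hard kill.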
malteos
malteos@XYOU·
@SebastianB929 OpenGPT-X is an official government-funded research project. Occiglot is a loose group of individuals from different organizations without any formal ties. We call it a research collective. You may also call it simply a Discord server. And yes, the website needs to be improved.
English
0
0
2
41
SebastianBoo
SebastianBoo@SebastianB929·
@XYOU Is Occiglot a research project like OpenGPT-X or how can I get a better understanding of it? The project page is a bit confusing :)
English
1
0
0
36
malteos
malteos@XYOU·
@ZedDou1 @occiglot As mentioned in the readme, we suspect that this is due to the benchmarks being machine translated from English and based on English prompts.
English
0
0
1
76
Jordan
Jordan@ZedDou1·
@occiglot Nice to see multilinguality more and more addressed, great work! I do have a question though, how would you explain the gap in the evals in the 5 languages between your models (base and instruct) and the Mistral models which are mostly English? 🤔
English
1
0
0
65
OcciGlot
OcciGlot@occiglot·
Today, we are announcing Occiglot! A large-scale collaborative research collective focusing on open-source European LLMs. We invite anybody working on multilingual datasets, benchmarks, or models to get in touch/join our discord. occiglot.github.io/occiglot/posts…
English
7
47
180
31.8K
malteos
malteos@XYOU·
@BramVanroy Have you tried tensor parallelism on the embedding layer? If I remember correctly, BLOOM used this with its large vocab. @StasBekman
English
1
0
0
143
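For readers unfamiliar with the suggestion above, here is a rough pure-Python sketch of tensor parallelism applied to an embedding table, assuming the common vocabulary-parallel scheme: each device holds a contiguous slice of the vocabulary rows, returns zeros for token ids outside its slice, and the partial results are summed across devices. The function names are hypothetical, the "devices" are plain lists, and whether BLOOM used exactly this scheme is not confirmed by the thread.

```python
def shard_embedding(table, num_shards):
    """Split a vocab x dim table into contiguous row shards.

    Returns a list of (first_row_index, rows) pairs, one per shard.
    """
    per_shard = (len(table) + num_shards - 1) // num_shards
    return [
        (i * per_shard, table[i * per_shard:(i + 1) * per_shard])
        for i in range(num_shards)
    ]


def local_lookup(shard, token_ids, dim):
    """Look up tokens on one shard; tokens it does not own give zero vectors."""
    start, rows = shard
    out = []
    for token in token_ids:
        local = token - start
        if 0 <= local < len(rows):
            out.append(list(rows[local]))
        else:
            out.append([0.0] * dim)
    return out


def all_reduce_sum(partials):
    """Element-wise sum over shards (stand-in for a real all-reduce)."""
    return [
        [sum(values) for values in zip(*rows_across_shards)]
        for rows_across_shards in zip(*partials)
    ]


def parallel_embedding(table, token_ids, num_shards):
    dim = len(table[0])
    shards = shard_embedding(table, num_shards)
    partials = [local_lookup(shard, token_ids, dim) for shard in shards]
    return all_reduce_sum(partials)
```

Since exactly one shard owns each token id, the sum reproduces the ordinary lookup while each device stores only about 1/num_shards of the table, which is the appeal for a large vocabulary.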
malteos
malteos@XYOU·
@BramVanroy @ph_singer There is a high correlation between the weights of Mistral and Mixtral. So this seems pretty likely.
English
0
0
0
29
malteos
malteos@XYOU·
@robertomasymas @burkov Check out "progressive growing". People did something similar already with BERT models.
English
0
0
2
99
Roberto Tomás C 🍉
Roberto Tomás C 🍉@robertomasymas·
@burkov “SOLAR-10.7B incorporates the innovative Upstage Depth Up-Scaling. We then integrated Mistral 7B weights into the upscaled layers, and finally, continued pre-training for the entire model.” Honest question: how do you start with pretrained weights from a model of different size?
English
3
0
8
2.8K
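One possible answer to the question above, sketched under assumptions: depth up-scaling is usually described as stacking two copies of the pretrained layer stack, dropping the last few layers from the first copy and the first few from the second, so every layer of the deeper model starts from pretrained weights. The function name and the numbers in the usage note (32 layers, 8 dropped, 48 resulting) are illustrative assumptions and are not stated in the thread.

```python
import copy


def depth_up_scale(layers, drop):
    """Build a deeper layer stack from a pretrained one (hypothetical helper).

    layers: list of per-layer weights from the base model
    drop:   number of layers removed at the seam from each copy
    """
    if not 0 <= drop < len(layers):
        raise ValueError("drop must be in [0, number of layers)")
    top = copy.deepcopy(layers[:len(layers) - drop])  # first copy, tail removed
    bottom = copy.deepcopy(layers[drop:])             # second copy, head removed
    return top + bottom
```

For example, `depth_up_scale(list(range(32)), 8)` yields a 48-entry stack made of layers 0-23 followed by layers 8-31. Continued pre-training, as mentioned in the quoted description, is then what smooths over the seam between the two copies.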