Pietro Mascheroni

122 posts

@Pi_Mas

Post Doc @ gBDS, Boehringer Ingelheim | Predictive modeling | Machine Learning | AI in Medicine. Views are my own.

Post Doc, Boehringer Ingelheim · Joined March 2012
403 Following · 110 Followers
Philippe A. Robert@PRobertImmodels·
… and yet, I have more citations than the guy who got 13 papers in his short postdoc …
Philippe A. Robert@PRobertImmodels·
My first first-author research article came 2 years after my 4-year PhD. The real PhD project is unpublishable. Yet I got summa cum laude and published cool stuff, low to high impact, as first, middle, and co-last author! I completed interesting projects with cool people. No regrets!
Stefano Zucca@StefanoZucca5

Wasn't sure to share it, but thought it might help someone. How I published 0 first-author papers in my 3yrs #postdoc -changed research topic -moved to a foreign country -#mentalhealth breakdown -existential crisis on my #academic career -#fellowship rejections -COVID

Pietro Mascheroni retweeted
Mistral AI@MistralAI·
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%https://t.co/OdtBUsbeV5%3A1337%2Fannounce
Pietro Mascheroni@Pi_Mas·
@emollick Well, we are just using language models as reasoning engines... what could go wrong? 😅
Ethan Mollick@emollick·
You cannot eliminate LLM hallucinations by simply telling the model not to hallucinate. Hallucination rates are dropping over time and with model size, but there is no prompting solution that eliminates hallucinations, contrary to a lot of advice I see shared on this site. (Errors in red)
Ethan Mollick tweet media
Pietro Mascheroni@Pi_Mas·
@emollick I am with @ylecun here, if I understand him correctly. We need a paradigm shift in terms of architecture: current LLM architectures are doomed to commit errors, with no easy way out, no matter how much we optimize the pipelines.
Ethan Mollick@emollick·
Here is why I am so doubtful about the "talk-to-your-data" use of AI. This is Google NotebookLM, a cool tool that lets you use AI on data sources. Even though the document search retrieves the right information (it is Google, after all), the LLM answer has subtle hallucinations.
Ethan Mollick tweet media
Pietro Mascheroni retweeted
Bindu Reddy@bindureddy·
Good Quality Data, Not Compute, Is LLM Gold

The GPU shortage has somewhat eased up, and a number of companies, including Amazon, Google, and Microsoft, are trying to compete with Nvidia with their own LLM-friendly chips.

We are also seeing a big trend towards smaller models being as performant as large models. Mistral's 7B outperforms 13B models, and Llama-2 70B can be tuned to match GPT-3.5 (180B) performance.

If I had to make a bet, I would say an MoE (mixture of experts) architecture of instruct-tuned open-source models, where you have an ensemble of models, each an "expert" at a particular type of task, can potentially achieve GPT-4-like performance. If each of the models is < 100B in size, they would also be accessible for research and mass-market use (i.e., serve-able by the GPU poor). Serving super-large models is not only expensive but also cumbersome.

Given that < 100B models such as Llama-2 have shown promise, I suspect the GPU crunch will soon be over, both because multiple players are coming into the market and because models are getting more efficient.

The next thing to consider is data: how much more performance can we get out of these LLMs by expanding their training datasets? Can OpenAI go into an infinite loop and successively train more and more powerful and performant models?

LLMs plateau in performance once they've "used up" the information in their training set. This fundamentally means that we can run out of "training data". In traditional machine learning, this is reflected in the fact that model performance doesn't improve even if you train with larger and larger datasets, as long as the sample you are training on is robust and reflective of the underlying data distribution.

Some believe we have already saturated the available training data. This would mean that GPT-5, 6, and 7 will look more like GPT-4, unless we develop some new techniques. Having said that, there is a lot of data being created constantly by humans, and now by LLMs :).

So in some sense we may never run out of new data for LLM training. Assuming we continue to have new data to train LLMs, do we need bigger and bigger LLMs as we generate more and more training data? Will an LLM's reasoning skills improve because it's being trained on bigger and bigger datasets, or is it just that its knowledge base improves?

The Chinchilla scaling law, proposed by DeepMind, addresses LLM size vs. data and challenges traditional scaling laws by advocating for a ~20:1 ratio of training tokens to parameters, instead of the conventional ~1:1 ratio, optimizing the balance between model size, training data, and compute budget. It posits that many models were "massively oversized and massively undertrained", suggesting a pathway towards more efficiently trained, cost-effective large language models. The authors recommend that an LLM of 70 billion parameters be trained on 1,400 billion (1.4 trillion) tokens for data-optimal training. For comparison, Llama-2 70B is trained on 2T tokens! So more data doesn't necessarily mean bigger LLMs.

The next question is around data quality. Again, as in standard-issue ML, focusing on data quality over quantity can vastly improve LLM performance. Gunasekar et al. showed this in their paper "Textbooks Are All You Need": the team trained a small model of just 1.3B parameters on high-quality data consisting of filtered code, textbooks, and GPT-3.5-generated data. High-quality data dramatically improves the learning efficiency of language models for code, as it provides clear, self-contained, and instructive examples.

Fine-tuning LLMs yields excellent results, but only with high-quality domain-specific data. The majority of the effort is spent on curating the data and continually refining it based on the performance of the LLM. Data matters, but high-quality data matters more.

When it comes to specific enterprise AI tasks, an open-source model fine-tuned on a high-quality dataset has performance equivalent to GPT-4. So the real challenge is high-quality data. Enterprises have a lot of high-quality training data for specific uses, for example a Q/A model over their knowledge base: a customer support team, by answering customer queries, has basically curated a large supervised training dataset.

Another source of high-quality data is human preference data. Humans can review LLM responses and provide feedback, which can then be used to further refine a model. Consumer services like ChatGPT are collecting a lot of human feedback through their chat interfaces.

High-quality data can in turn be generated by LLMs. "Self-instruct" is a method where you generate training data for an LLM from the LLM itself. Alternatively, you can use one LLM (say GPT-4) to generate a high-quality training dataset to fine-tune another LLM. Two instruction-tuned LLaMA models were compared, one fine-tuned on data generated by GPT-4 and the other on data generated by GPT-3. The model fine-tuned on GPT-4-generated data performed substantially better on the "Helpfulness" criterion, showcasing the utility of high-quality LLM-generated data for fine-tuning.

High-quality small datasets can be very useful for fine-tunes, and you can also employ traditional human labelling techniques to generate those datasets. In summary, you can fine-tune a small LLM (< 100B) for a particular task if you have a high-quality dataset. Good general-purpose LLMs, however, are a different story and may still require a lot of data and training. And it's not clear that just throwing more data at the problem will dramatically improve LLM reasoning.

It's also very likely that auto-regressive LLMs, while immensely useful for many business and consumer use cases, aren't going to get us to AGI simply by "brute-forcing it". One more reason for the Doomers to simmer down and let the builders build :)
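The Chinchilla arithmetic in the thread above (a ~20:1 token-to-parameter ratio, so a 70B-parameter model wants ~1.4T training tokens) can be sanity-checked with a minimal sketch. The helper name and its default ratio are illustrative assumptions for this example, not part of any library:

```python
# Minimal sketch of the Chinchilla data-optimal token budget discussed above.
# The ~20:1 tokens-per-parameter ratio is the approximate figure implied by
# the 70B -> 1.4T recommendation; `optimal_tokens` is a hypothetical helper.

def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Return the data-optimal number of training tokens for a model size."""
    return n_params * tokens_per_param

# A 70B-parameter model comes out at 1.4e12 (~1.4T) tokens, matching the
# figure in the thread; Llama-2 70B's 2T tokens is past this optimum.
print(f"{optimal_tokens(70e9):.2e}")  # prints 1.40e+12
```

Under this rule of thumb, growing the dataset argues for training a given model longer rather than for a strictly bigger model, which is the thread's point that more data doesn't necessarily mean bigger LLMs.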
Bindu Reddy tweet media
Spencer Greenberg 🔍@SpencrGreenberg·
We got this extremely interesting set of responses from an applicant for a software engineer job opening. Spot anything odd about it?
Spencer Greenberg 🔍 tweet media
Pietro Mascheroni retweeted
Curious Refuge@CuriousRefuge·
What if Wes Anderson directed The Lord of the Rings? We asked the community which video they want to see next and Lord of the Rings took the cake… or should we say Elven bread. We hope you enjoy this Midjourney to Middle-Earth. #LordOfTheRings #WesAnderson #MovieTrailer #LOTR
Pietro Mascheroni retweeted
Terrible Maps@TerribleMaps·
Terrible Maps tweet media
Massimo@Rainmaker1973·
When throwing paint turns into a masterpiece. Paul Kenton is a contemporary artist, acclaimed for his cityscape paintings which capture the unique energy of cities across the world [source, more videos: buff.ly/3Hfu47a] twitter.com/SnarkkTank/sta…
Pietro Mascheroni retweeted
M3s (B. Hatzikirou)@M3sBiomath·
I am very proud to share the excellent work of my talented PhD student @Lito_MathBio_ on a theory-driven treatment against chronic Staphylococcus infections. Here, the proposed treatment is supported and validated by murine experiments. sciencedirect.com/science/articl…
Pietro Mascheroni retweeted
Papers with Code@paperswithcode·
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org
Pietro Mascheroni retweeted
Bindu Reddy@bindureddy·
By far, the most accurate representation of machine learning pipelines in the real world 🍿 😀
Pietro Mascheroni retweeted
DG MEME 🇪🇺@meme_ec·
Let's give it a tragic twist!
DG MEME 🇪🇺 tweet media
Pietro Mascheroni retweeted
Russell Crowe@russellcrowe·
Taking the kids to see my old office
Russell Crowe tweet media
Pietro Mascheroni retweeted
Burel Goodin@PainBurel·
Sometimes it’s not you, it’s them 🙄😂😭 #academia
Burel Goodin tweet media
Burel Goodin tweet media