Pietro Mascheroni

122 posts

@Pi_Mas

Post Doc @ gBDS, Boehringer Ingelheim | Predictive modeling | Machine Learning | AI in Medicine. Views are my own.

Post Doc, Boehringer Ingelheim · Joined March 2012
403 Following · 110 Followers
Philippe A. Robert@PRobertImmodels·
… and yet, I have more citations than the guy who got 13 papers in his short postdoc …
Philippe A. Robert@PRobertImmodels·
My first first-author research article came 2 years after my 4-year PhD. The real PhD project is unpublishable. Yet I got summa cum laude and published cool stuff, low to high impact, as first, middle, and co-last author! I completed interesting projects with cool people. No regrets!
Stefano Zucca@StefanoZucca5

Wasn't sure to share it, but thought it might help someone. How I published 0 first-author papers in my 3yrs #postdoc -changed research topic -moved to a foreign country -#mentalhealth breakdown -existential crisis on my #academic career -#fellowship rejections -COVID

Pietro Mascheroni retweeted
Mistral AI@MistralAI·
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%https://t.co/OdtBUsbeV5%3A1337%2Fannounce
Pietro Mascheroni@Pi_Mas·
@emollick Well, we are just using language models as reasoning engines... what could go wrong? 😅
Ethan Mollick@emollick·
You cannot eliminate LLM hallucinations by simply telling the model not to hallucinate. Hallucination rates are dropping over time and with model size, but there is no prompting solution that eliminates hallucinations, contrary to a lot of advice I see shared on this site. (Errors in red)
Ethan Mollick tweet media
Pietro Mascheroni@Pi_Mas·
@emollick I am with @ylecun here, if I understand him correctly. We need a paradigm shift in terms of architecture: current LLM architectures are doomed to commit errors, with no easy way out, no matter how much we optimize the pipelines.
Ethan Mollick@emollick·
Here is why I am so doubtful about the "talk-to-your-data" use of AI. This is Google NotebookLM, a cool tool that lets you use AI on data sources. Even though the document search retrieves the right information (it is Google, after all), the LLM answer has subtle hallucinations.
Ethan Mollick tweet media
Pietro Mascheroni retweeted
Bindu Reddy@bindureddy·
Good Quality Data, Not Compute, Is LLM Gold

The GPU shortage has somewhat eased up, and a number of companies, including Amazon, Google, and Microsoft, are trying to compete with Nvidia with their own LLM-friendly chips.

We are also seeing a big trend towards smaller models being as performant as large models. Mistral's 7B outperforms 13B models, and Llama-2 70B can be tuned to match GPT-3.5 (180B) performance.

If I had to make a bet, I would say an MoE (mixture of experts) architecture of instruct-tuned open-source models, where you have an ensemble of models, each an "expert" at a particular type of task, can potentially achieve GPT-4-like performance. If each of the models is < 100B in size, they would also be accessible for research and mass-market use (i.e., serve-able by the GPU poor). Serving super-large models is not only expensive but also cumbersome.

Given that < 100B models such as Llama-2 have shown promise, I suspect the GPU crunch will soon be over, both because multiple players are coming into the market and because models are getting more efficient.

The next thing to consider is data: how much more performance can we get out of these LLMs by expanding their training datasets? Can OpenAI go into an infinite loop and successively train more and more powerful and performant models?

LLMs plateau in performance once they've "used up" the information in their training set. This fundamentally means that we can run out of "training data". In traditional machine learning, this is reflected in the fact that model performance doesn't improve even if you train with larger and larger datasets, as long as the sample you are training on is robust and reflective of the underlying data distribution.

Some believe we have already saturated the available training data. This would mean that GPT-5, 6, and 7 will look more like GPT-4, unless we develop some new techniques. Having said that, there is a lot of data being created constantly by humans, and now by LLMs :).

So in some sense we may never run out of new data for LLM training. Assuming we continue to have new data to train LLMs, do we need bigger and bigger LLMs as we generate more and more training data? Will an LLM's reasoning skills improve because it's being trained on bigger and bigger datasets, or is it just that its knowledge base improves?

The Chinchilla scaling law, proposed by DeepMind, addresses LLM size vs. data and challenges traditional scaling laws by advocating for a ~20:1 ratio of training tokens to parameters, instead of the conventional ~1:1 ratio, optimizing the balance between model size, training data, and compute budget. It posits that many models were "massively oversized and massively undertrained", suggesting a pathway towards more efficiently trained, cost-effective large language models. The authors recommend that an LLM of 70 billion parameters be trained on 1,400 billion (1.4 trillion) tokens for data-optimal training. For comparison, Llama-2 70B is trained on 2T tokens! So more data doesn't necessarily mean bigger LLMs.

The next question is around data quality. Again, as in standard-issue ML, focusing on data quality over quantity can vastly improve LLM performance. Gunasekar et al. showed this in their paper "Textbooks Are All You Need": the team trained a small model of just 1.3B parameters on high-quality data consisting of filtered code, textbooks, and GPT-3.5-generated data. High-quality data dramatically improves the learning efficiency of language models for code, as it provides clear, self-contained, and instructive examples.

Fine-tuning LLMs yields excellent results, but only with high-quality domain-specific data. The majority of the effort is spent on curating the data and continually refining it based on the performance of the LLM. Data matters, but high-quality data matters more.

When it comes to specific enterprise AI tasks, an open-source model fine-tuned on a high-quality dataset has performance equivalent to GPT-4. So the real challenge is high-quality data. Enterprises have a lot of high-quality training data for specific uses, for example a Q/A model over their knowledge base: a customer support team, by answering customer queries, has basically curated a large supervised training dataset.

Another source of high-quality data is human preference data. Humans can review LLM responses and provide feedback, which can then be used to further refine a model. Consumer services like ChatGPT are collecting a lot of human feedback through their chat interfaces.

High-quality data can in turn be generated by LLMs. "Self-instruct" is a method where you generate training data for an LLM from the LLM itself. Alternatively, you can use one LLM (say GPT-4) to generate a high-quality training dataset to fine-tune another LLM. Two instruction-tuned LLaMA models were compared, one fine-tuned on data generated by GPT-4 and the other on data generated by GPT-3. The model fine-tuned on GPT-4-generated data performed substantially better on the "Helpfulness" criterion, showcasing the utility of high-quality LLM-generated data for fine-tuning.

High-quality small datasets can be very useful for fine-tunes, and you can also employ traditional human labelling techniques to generate those datasets. In summary, you can fine-tune a small LLM (< 100B) for a particular task if you have a high-quality dataset. Good general-purpose LLMs, however, are a different story and may still require a lot of data and training. And it's not clear that just throwing more data at the problem will dramatically improve LLM reasoning.

It's also very likely that auto-regressive LLMs, while immensely useful for many business and consumer use cases, aren't going to get us to AGI simply by "brute-forcing it". One more reason for the Doomers to simmer down and let the builders build :)
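The Chinchilla arithmetic in the thread above (a ~20:1 token-to-parameter ratio, so a 70B-parameter model wants ~1.4T training tokens) can be sanity-checked with a minimal sketch. The helper name and its default ratio are illustrative assumptions for this example, not part of any library:

```python
# Minimal sketch of the Chinchilla data-optimal token budget discussed above.
# The ~20:1 tokens-per-parameter ratio is the approximate figure implied by
# the 70B -> 1.4T recommendation; `optimal_tokens` is a hypothetical helper.

def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Return the data-optimal number of training tokens for a model size."""
    return n_params * tokens_per_param

# A 70B-parameter model comes out at 1.4e12 (~1.4T) tokens, matching the
# figure in the thread; Llama-2 70B's 2T tokens is past this optimum.
print(f"{optimal_tokens(70e9):.2e}")  # prints 1.40e+12
```

Under this rule of thumb, growing the dataset argues for training a given model longer rather than for a strictly bigger model, which is the thread's point that more data doesn't necessarily mean bigger LLMs.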
Bindu Reddy tweet media
Spencer Greenberg 🔍@SpencrGreenberg·
We got this extremely interesting set of responses from an applicant for a software engineer job opening. Spot anything odd about it?
Spencer Greenberg 🔍 tweet media
Pietro Mascheroni retweeted
Curious Refuge@CuriousRefuge·
What if Wes Anderson directed The Lord of the Rings? We asked the community which video they want to see next and Lord of the Rings took the cake… or should we say Elven bread. We hope you enjoy this Midjourney to Middle-Earth. #LordOfTheRings #WesAnderson #MovieTrailer #LOTR
Pietro Mascheroni retweeted
Terrible Maps@TerribleMaps·
Terrible Maps tweet media
Massimo@Rainmaker1973·
When throwing paint turns into a masterpiece. Paul Kenton is a contemporary artist, acclaimed for his cityscape paintings which capture the unique energy of cities across the world [source, more videos: buff.ly/3Hfu47a] twitter.com/SnarkkTank/sta…
Pietro Mascheroni retweeted
M3s (B. Hatzikirou)@M3sBiomath·
I am very proud to share the excellent work of my talented PhD student @Lito_MathBio_ on a theory-driven treatment against chronic Staphylococcus infections. Here, the proposed treatment is supported and validated by murine experiments. sciencedirect.com/science/articl…
Pietro Mascheroni retweeted
Papers with Code@paperswithcode·
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org
Pietro Mascheroni retweeted
Bindu Reddy@bindureddy·
By far, the most accurate representation of machine learning pipelines in the real world 🍿 😀
Pietro Mascheroni retweeted
DG MEME 🇪🇺@meme_ec·
Let's give it a tragic twist!
DG MEME 🇪🇺 tweet media
Pietro Mascheroni retweeted
Russell Crowe@russellcrowe·
Taking the kids to see my old office
Russell Crowe tweet media
Pietro Mascheroni retweeted
Burel Goodin@PainBurel·
Sometimes it’s not you, it’s them 🙄😂😭 #academia
Burel Goodin tweet media
Burel Goodin tweet media