SujeevanRatnasingham

242 posts

@DNAdiversity

Principal on @LifeScanApp; Director of Informatics at CBG, University of Guelph, and Director of BOLD (https://t.co/2jK1DcJNYM)

Canada · Joined August 2008
63 Following · 321 Followers
Karl Richard (@karlfilho)
@svpino I got this: it’s the Positive Predictive Value, and you have a 0.98% chance of being sick if this test comes back positive. I try to teach this to my medical students, but apparently, either the math is too complex, or I’m a terrible teacher. Or both! 😂🤣
Santiago (@svpino)
99% of the people you know will answer this incorrectly. We need to start teaching statistics in middle school. Question: You go to the doctor and get tested for a disease that only 1 in 10,000 people get. The test is 99% effective in detecting both sick and healthy people. Your test comes back positive. Are you sick?
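The 0.98% figure given in the reply can be checked with Bayes' rule. The sketch below assumes that "99% effective in detecting both sick and healthy people" means a sensitivity and a specificity of 99% each; the function name `positive_predictive_value` is illustrative and not from the thread.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(sick | positive test) using Bayes' rule."""
    # Fraction of the population that is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the population that is healthy AND tests positive.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed reading of the question: 1 in 10,000 prevalence,
# 99% sensitivity and 99% specificity.
ppv = positive_predictive_value(
    prevalence=Fraction(1, 10_000),
    sensitivity=Fraction(99, 100),
    specificity=Fraction(99, 100),
)

print(ppv)                    # exact value as a fraction
print(f"{float(ppv):.2%}")    # same value as a percentage
```

Under these assumptions, false positives from the large healthy population outnumber true positives by roughly 100 to 1, which is why a positive result still leaves the chance of being sick at about 1%.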
SujeevanRatnasingham retweeted
Prof Lennart Nacke, PhD (@acagamic)
10 research gap types and how to bridge them
DominikBuchner (@buchner_dominik)
@DNAdiversity Is there already a timeline for this? Are we talking weeks, months, or years?
DominikBuchner (@buchner_dominik)
BOLD... are you serious?! We will need an alternative soon!
SujeevanRatnasingham (@DNAdiversity)
@buchner_dominik I fully agree. Over 2.5M IPs request data services from BOLD annually. It has needed a significant software and hardware upgrade for some time. Recent funding has allowed that to happen. This outage is temporary and is part of the upgrade to BOLD5.
DominikBuchner (@buchner_dominik)
Don’t get me wrong, their work for the community has been fantastic. But limiting the API to 3 requests per minute, and now this… It’s a serious potential danger for the widespread use of genetic methods.
SujeevanRatnasingham (@DNAdiversity)
Didi was a bright young star whose light was extinguished early. She was a north star for many young women in South Africa. She was a beautiful person and a dedicated scientist. I will miss her. mg.co.za/news/2024-07-1…
SujeevanRatnasingham retweeted
ACDB lab (@acdblab)
Prof. Michelle Van der Bank has been awarded the SA Academy of Science and Culture’s Medal of Honour for contribution to science in South Africa. Read more: shorturl.at/2g0zq
SujeevanRatnasingham retweeted
Naiara (@NaiaraLopezRojo)
Extremely happy to share this article!!!!! Within the project @DRYvER_H2020 we analysed the GHG emissions in European drying river networks. Drying had a legacy effect on both CO2 and CH4, and riverbeds represented >50% of total annual C emissions in 3 of the 6 case studies.
Grenoble, France 🇫🇷
SujeevanRatnasingham retweeted
Anish Kirtane (@DNAsaur_)
My PhD thesis is due in a couple of weeks. With so many things to take care of, I have adopted more effective to-do lists to manage my tasks and I am never going back! Here are my tips for best practices to improve output (1/n) #phdlife #AcademicChatter #phdvoice
Jane Lubchenco (@JaneLubchenco46)
I am excited to announce the National Aquatic eDNA Strategy. This strategy will accelerate the progress of fast, low-cost, and effective technologies for studying life in the ocean and how it’s changing. 🌊🧬 whitehouse.gov/ostp/news-upda…
SujeevanRatnasingham retweeted
Andrew Ng (@AndrewYNg)
Inexpensive token generation and agentic workflows for large language models (LLMs) open up intriguing new possibilities for training LLMs on synthetic data. Pretraining an LLM on its own directly generated responses to prompts doesn't help. But if an agentic workflow implemented with the LLM results in higher quality output than the LLM can generate directly, then training on that output becomes potentially useful.

Just as humans can learn from their own thinking, perhaps LLMs can, too. For example, imagine a math student who is learning to write mathematical proofs. By solving a few problems, even without external input, they can reflect on what does and doesn’t work and, through practice, learn how to more quickly generate good proofs.

Broadly, LLM training involves (i) pretraining (learning from unlabeled text data to predict the next word) followed by (ii) instruction fine-tuning (learning to follow instructions) and (iii) RLHF/DPO tuning to align the LLM’s output to human values. Step (i) requires many orders of magnitude more data than the other steps. For example, Llama 3 was pretrained on over 15 trillion tokens, and LLM developers are still hungry for more data. Where can we get more text to train on?

Many developers train smaller models directly on the output of larger models, so a smaller model learns to mimic a larger model’s behavior on a particular task. However, an LLM can’t learn much by training on data it generated directly, just like a supervised learning algorithm can’t learn from trying to predict labels it generated by itself. Indeed, training a model repeatedly on the output of an earlier version of itself can result in model collapse.

However, an LLM wrapped in an agentic workflow may produce higher-quality output than it can generate directly. In this case, the LLM’s higher-quality output might be useful as pretraining data for the LLM itself.

Efforts like these have precedents:
- When using reinforcement learning to play a game like chess, a model might learn a function that evaluates board positions. If we apply game tree search along with a low-accuracy evaluation function, the model can come up with more accurate evaluations. Then we can train that evaluation function to mimic these more accurate values.
- In the alignment step, Anthropic’s constitutional AI method uses RLAIF (RL from AI Feedback) to judge the quality of LLM outputs, substituting feedback generated by an AI model for human feedback.

A significant barrier to using LLMs prompted via agentic workflows to produce their own training data is the cost of generating tokens. Say we want to generate 1 trillion tokens to extend a pre-existing training dataset. Currently, at publicly announced prices, generating 1 trillion tokens using GPT-4-turbo ($30 per million output tokens), Claude 3 Opus ($75), Gemini 1.5 Pro ($21), and Llama-3-70B on Groq ($0.79) would cost, respectively, $30M, $75M, $21M and $790K. Of course, an agentic workflow that uses a design pattern like Reflection would require generating more than one token per token that we would use as training data. But budgets for training cutting-edge LLMs easily surpass $100M, so spending a few million dollars more for data to boost performance is quite feasible.

That’s why I believe agentic workflows will open up intriguing new opportunities for high-quality synthetic data generation.

[Original text: deeplearning.ai/the-batch/issu… ]
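The cost figures quoted in the post follow from simple arithmetic, sketched below. The prices are the per-million-output-token figures stated in the post; the function `generation_cost_usd` and its `tokens_generated_per_kept_token` parameter are hypothetical, added only to illustrate the post's remark that a Reflection-style workflow generates more than one token per token kept as training data.

```python
# Prices per million output tokens, as quoted in the post (USD).
PRICE_PER_MILLION_USD = {
    "GPT-4-turbo": 30.00,
    "Claude 3 Opus": 75.00,
    "Gemini 1.5 Pro": 21.00,
    "Llama-3-70B on Groq": 0.79,
}

TARGET_TOKENS = 1_000_000_000_000  # 1 trillion training tokens


def generation_cost_usd(kept_tokens, price_per_million,
                        tokens_generated_per_kept_token=1.0):
    """Cost of generating enough tokens to keep `kept_tokens` of them.

    `tokens_generated_per_kept_token` is a hypothetical overhead factor
    for agentic workflows that discard intermediate output.
    """
    generated = kept_tokens * tokens_generated_per_kept_token
    return generated / 1_000_000 * price_per_million


costs = {
    model: generation_cost_usd(TARGET_TOKENS, price)
    for model, price in PRICE_PER_MILLION_USD.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:,.0f}")
```

Passing an overhead factor greater than 1 (for example, `tokens_generated_per_kept_token=3.0`, an arbitrary illustrative value) scales each cost linearly.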
SujeevanRatnasingham (@DNAdiversity)
Privileged to be surrounded by so many luminaries at the Franklin Institute’s Committee on Science and the Arts Dinner. @CBG_UofG @iBOLConsortium
Philadelphia, PA 🇺🇸
SujeevanRatnasingham retweeted
University of Guelph
Today, the #UofG community honoured the groundbreaking achievement of evolutionary biologist, Dr. Paul Hebert, for the pioneering work he and the team at the Centre of Biodiversity Genomics have accomplished in DNA barcoding to catalogue life on Earth.