

agibsonccc (@agibsonccc)
Founder @KonduitAI Maintainer Eclipse Deeplearning4j Building https://t.co/GDg4JPHYso - #MLOps YC W16



In 2012, CUDA was very important: you couldn't build anything without it. In 2024, 90% of AI developers are actually web developers – and they build off Llama, not CUDA.
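To make the "build off Llama, not CUDA" point concrete, here is a minimal sketch of what that workflow looks like in practice: a plain HTTP call to an OpenAI-compatible inference server (such as llama.cpp's `llama-server` or Ollama) hosting a Llama model. The base URL and model name are placeholder assumptions, not a real deployment.

```python
# A minimal sketch, assuming a local OpenAI-compatible Llama server
# (e.g. llama.cpp's `llama-server`, or Ollama). The URL and model
# name below are placeholders, not a real deployment.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
    json={
        "model": "llama-3-8b-instruct",  # placeholder model name
        "messages": [
            {"role": "user", "content": "Explain CUDA in one sentence."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

No CUDA kernels, no GPU programming: just JSON over HTTP, which is exactly the workflow a web developer already knows.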


Great article by @Priyasideas on increasing opposition to SB 1047. sfstandard.com/2024/08/09/may…

So when *the CTO* of OpenAI is asked whether Sora was trained on YouTube videos, she says “actually I’m not sure” and declines to answer any further questions about the training data. Either a rather stunning level of ignorance of her own product, or a lie—pretty damning either way!

New Resource: Foundation Model Development Cheatsheet for best practices

We compiled 250+ resources & tools for:
🔭 sourcing data
🔍 documenting & audits
🌴 environmental impact
☢️ risks & harms eval
🌍 release & monitoring

With experts from @AiEleuther, @allen_ai, @huggingface, @StanfordCRFM, @PrincetonCITP, @MasakhaneNLP, @MIT++

🔗 fmcheatsheet.org

1/

Thankfully, ML/AI research is largely free of the commercial publishing stranglehold. Preprints are posted on arXiv and OpenReview, short papers appear in top conferences like ICLR, NeurIPS, ICML, and a few others, and longer papers in JMLR and TMLR. All of these venues are open access and free for both readers and authors. arXiv and OpenReview are supported by philanthropy (they need more), conferences by registration fees, and journals by essentially nothing (it costs very little to run an online journal). The exceptions are the few folks who still think that publishing in Nature Machine Intelligence, Machine Learning Journal, and other for-profit journals is a good idea (looking at you, DeepMind 😠). Not surprisingly, many Nature MI papers are about hardware or applications of AI to the sciences, topics that are not well covered by "core" ML venues.

EpsteinGPT has been officially banned. Why?

I just finished a two-day company quarterly strategy meeting. I haven’t missed anything, have I? Satya is still CEO at MSFT, right?


I burned 🔥$2,000 on fine-tuning so you don't have to.

I fine-tuned models through the @OpenAI and @anyscalecompute API endpoints with 50 million tokens. Here are the results I wish I had known before getting into fine-tuning.

If you just want a quick snapshot, look at the figure. A longer explanation of my findings follows.

I am not an expert and not deep into the theory of AI models. I just want to get the BEST model performance at the CHEAPEST possible price for my USE-CASE, and to deploy it to prod quickly.

I picked one specific, simple USE-CASE: summarizing text in a very specific tone, voice, and structure. I trained both models with close to 50M tokens (~37M words).

In short:
- Anyscale costs 40x less to fine-tune.
- Anyscale costs 56x less to fine-tune.

Comparing the outputs, I get on-par performance from fine-tuned llama-13b and fine-tuned gpt-3.5. Fine-tuning smaller models is clearly the way to go for simpler use-cases!

I don't understand OpenAI's fine-tuning offering here. They need to step up their game: either reduce the price or offer the flexibility to compete with open-source fine-tuned models.

I am going to run another experiment with a much more complicated use-case. It will be interesting to see who wins there. I suspect @OpenAI Turbo will have an edge (otherwise the pricing does not make sense).

P.S.: I also know I can fine-tune models locally and directly, without an API. Like I said, I am not deep into the theory yet. I tried this on @huggingface with their AutoTrain framework, but it was just not as easy as plugging in via API calls. There were adapters and stuff, and I quickly got lost. But I am reading up and will start including those in the comparisons too. If anyone knows of other managed (or otherwise) solutions for fine-tuning, please let me know.
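For anyone wondering what "plugging in via API calls" looks like, here is a minimal sketch of a fine-tuning job using OpenAI's official Python client (v1.x). The training file name, base model, and epoch count are illustrative assumptions, not the author's actual configuration.

```python
# Minimal sketch of fine-tuning via the OpenAI API (python client v1.x).
# The training file, model name, and epoch count are illustrative
# assumptions; they are not the thread author's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Upload a JSONL file of chat-formatted training examples, one per line:
#    {"messages": [{"role": "user", "content": "..."},
#                  {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("summaries_train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2) Launch the fine-tuning job.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 3},
)
print(job.id, job.status)

# 3) In practice you would poll until the job finishes; once it succeeds,
#    the resulting model is called like any other chat model.
done = client.fine_tuning.jobs.retrieve(job.id)
if done.status == "succeeded":
    reply = client.chat.completions.create(
        model=done.fine_tuned_model,
        messages=[{"role": "user", "content": "Summarize: ..."}],
    )
    print(reply.choices[0].message.content)
```

Anyscale Endpoints exposed a largely OpenAI-compatible API, so roughly the same client code could be pointed at either provider by swapping the base URL and model name, which is what makes side-by-side cost comparisons like this one straightforward.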

