Ryan Booth

13.1K posts

@that1guy_15

GenAI/MLOps. Passionate about building solutions users can easily consume. Infrastructure Engineer at my core. My Art: https://t.co/Jssz1LCN2Q

Canyon, TX · Joined May 2012
863 Following · 2.1K Followers
Ryan Booth retweeted
AIMUG - Austin LangChain@AustinLangChain·
The future of AI is agent marketplaces. Ryan Booth's A2A/AP2 demo shows how agents discover, negotiate & transact autonomously. This changes everything about digital commerce 🧵👇 youtube.com/watch?v=d5l0D7…
Ryan Booth retweeted
Alex Albert@alexalbert__·
To start, if you want to see Claude 3.5 Sonnet in action solving a simple pull request, here's a quick demo video we made. (voiceover by the one and only @sumbhavsethia)
Ryan Booth retweeted
Lior Alexander@LiorOnAI·
1. Real-time translation
Ryan Booth retweeted
Ethan Mollick@emollick·
For managers who want to encourage AI use in their teams, psychological safety is a good guide:
1) Share with your team that you will be exploring AI together; no one knows what it can do, so you need help.
2) Publicly share mistakes you make.
3) Demonstrate curiosity & experiment.
Ethan Mollick@emollick

Reminder that “psychological safety” isn’t a buzzword, but a research-backed way of making sure everyone on the team feels comfortable taking risks. In studies at Google, it was the key predictor of innovative team success. Use the checklists with your team! rework.withgoogle.com/guides/underst…

Ryan Booth retweeted
Cameron R. Wolfe, Ph.D.@cwolferesearch·
RAG is one of the best (and easiest) ways to specialize an LLM over your own data, but successfully applying RAG in practice involves more than just stitching together pretrained models…

What is RAG? At the highest level, RAG is a combination of a pretrained LLM with an external (searchable) knowledge base. At inference time, we can search for relevant textual context within this knowledge base and add it to the LLM’s prompt. Then, the LLM can use its in-context learning abilities to leverage this added context and produce a more factual/grounded output.

Simple implementation. We can create a minimal RAG pipeline using a pretrained embedding model and LLM by:
1. Separating the knowledge base into fixed-size chunks.
2. Vectorizing each chunk with an embedding model.
3. Vectorizing the input/query at inference time and using vector search to find relevant chunks.
4. Adding relevant chunks into the LLM’s prompt.

This simple approach works, but building a high-performing RAG application requires much more. Here are five avenues we can follow to refine our RAG pipeline.

(1) Hybrid search: At the end of the day, the retrieval component of RAG is just a search engine. So, we can drastically improve retrieval by using ideas from search. For example, we can perform both lexical and vector retrieval (i.e., hybrid retrieval), as well as re-ranking via a cross-encoder, to retrieve the most relevant data.

(2) Cleaning the data: The data used for RAG may come from several sources with different formats (e.g., PDF, Markdown, and more), which could lead to artifacts (e.g., logos, icons, special symbols, and code blocks) that confuse the LLM. We can solve this by creating a data preprocessing or cleaning pipeline (either manually or by using LLM-as-a-judge) that properly standardizes, filters, and extracts data for RAG.

(3) Prompt engineering: Successfully applying RAG is not just a matter of retrieving the correct context; prompt engineering plays a massive role. Once we have the relevant data, we must craft a prompt that i) includes this context and ii) formats it in a way that elicits a grounded output from the LLM. First, we need an LLM with a sufficiently large context window. Then, we can adopt strategies like diversity and lost-in-the-middle selection to ensure the context is properly incorporated into the prompt.

(4) Evaluation: We must also implement repeatable and accurate evaluation pipelines for RAG that capture the performance of the whole system, as well as its individual components. We can evaluate the retrieval pipeline using typical search metrics (DCG and nDCG), while the generation component of RAG can be evaluated with an LLM-as-a-judge approach. To evaluate the full RAG pipeline, we can also leverage systems like RAGAS.

(5) Data collection: As soon as we deploy our RAG application, we should begin collecting data that can be used to improve the application. For example, we can finetune retrieval models over pairs of input queries with relevant textual chunks, finetune the LLM over high-quality outputs, or even run A/B tests to quantitatively measure if changes to our RAG pipeline benefit performance.

What’s next? Beyond the ideas explored above, there are a variety of avenues that exist for improving RAG. Once we have implemented a robust evaluation suite, we can test a variety of improvements using both offline metrics and online A/B tests. Our approach to RAG should mature (and improve!) over time as we test new ideas.
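The four-step minimal pipeline the tweet describes can be sketched in a few lines. This is a toy illustration, not a production setup: `embed` here is a hypothetical stand-in for a real embedding model (e.g. a sentence-transformer), implemented as a deterministic bag-of-words hash so the sketch is self-contained.

```python
import math

def embed(text, dims=64):
    # Toy bag-of-words "embedding": a stand-in for a real embedding model.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, size=200):
    # 1. Separate the knowledge base into fixed-size chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query, chunks, k=2):
    # 2./3. Vectorize chunks and the query, rank chunks by similarity.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # 4. Add the most relevant chunks to the LLM's prompt.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

In a real pipeline you would swap in a pretrained embedding model and a vector database for `embed` and `retrieve`, then send `build_prompt(...)` to the LLM.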
Ryan Booth retweeted
Derick Winkworth@cloudtoad·
Hey folks... I have an urgent opportunity for someone who has AWX, Azure, Python, and some networking background. You should be US-based; the role is remote. I'm gathering more details now, but those are the key skills. If interested, DM me. If not, please repost. Thanks!
Ryan Booth retweeted
Santiago@svpino·
Scrum is a cancer. I've been writing software for 25 years, and nothing renders a software team useless like Scrum does. Some anecdotes:

1. They tried to convince me that Poker is a planning tool, not a game.
2. If you want to be more efficient, you must add process, not remove it. They had us attending the "ceremonies," a fancy name for a buttload of meetings: stand-ups, groomings, planning, retrospectives, and Scrum of Scrums. We spent more time talking than doing.
3. We prohibited laptops in meetings. We had to stand. We passed a ball around to keep everyone paying attention.
4. We spent more time estimating story points than writing software. Story points measure complexity, not time, but we had to decide how many story points fit in a sprint.
5. I had to use t-shirt sizes to estimate software.
6. We measured how much it cost to deliver one story point and then wrote contracts where clients paid for a package of "500 story points."
7. Management lost it when they found that 500 story points in one project weren't the same as 500 story points on another project. We had many meetings to fix this.
8. Imagine having a manager, a scrum master, a product owner, and a tech lead. You had to answer to all of them and none simultaneously.
9. We paid people who told us whether we were "burning down points" fast enough. Weren't story points about complexity instead of time? Never mind.

I believe in Agile, but this ain't agile. We brought in professional Scrum trainers. We paid people from our team to get certified. We tried Scrum this way and that other way. We spent years doing it. The result was always the same: it didn't work.

Scrum is a cancer that will eat your development team. Scrum is not for developers; it's another tool for managers to feel they are in control. But the best about Scrum are those who look you in the eye and tell you: "If it doesn't work for you, you are doing it wrong. Scrum is anything that works for your team." Sure it is.
Ryan Booth@that1guy_15·
I got my first LLM (CodeLlama) up in #AWS SageMaker this weekend. Pretty simple to get your whole ML pipeline up and running. Models are a little wonky when you are not pulling them from @huggingface #AutoML looks tempting but I need to wrap an API around this project first.
Ryan Booth retweeted
Jim Fan@DrJimFan·
Llama-2 was almost at GPT-3.5 level except for coding, which was a real bummer. Now Code Llama finally bridges the gap to GPT-3.5! Coding is by far the most important LLM task. It's the cornerstone of strong reasoning engines and powerful AI agents like Voyager. Today is another major milestone in OSS foundation models. Read with me:

- Code Llamas are finetuned from Llama-2 base models and come in 3 flavors: vanilla, Instruct, and Python. Model sizes are 7B, 13B, 34B. The smallest model can run locally with a decent GPU.
- On the HumanEval benchmark, CodeLlama-Python achieves 53.7 vs GPT-3.5 (48.1), but still trails GPT-4 (a whopping 67.0). On MBPP, it gets 56.2 vs 52.2 for GPT-3.5.
- Significantly better than PaLM-Coder, Codex (the GitHub Copilot model), and other OSS models like StarCoder.
- Trained with an "infilling objective" to support code generation in the middle given surrounding context. Basically, the model takes in (prefix, suffix) or (suffix, prefix) and outputs (middle). It's still autoregressive, but with special marker tokens. The infilling data can be easily synthesized by splitting the code corpus randomly.
- Another set of synthetic data from Self-Instruct: (1) take 60K coding interview questions; (2) generate unit tests; (3) generate 10 solutions; (4) run a Python interpreter and filter out bad ones. Add good ones to the dataset.
- Long-context finetuning: Code Llama starts from 4K context and finetunes with 16K context to save computation. With some positional embedding tricks, it can carry consistency over even longer context.
- The instruction finetuning data is proprietary and won't be released.

Congrats to the team! ai.meta.com/blog/code-llam…
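The infilling data synthesis the tweet mentions (randomly splitting code into prefix/middle/suffix and rearranging with marker tokens) can be sketched like this. The `<PRE>`/`<SUF>`/`<MID>` marker strings below are illustrative placeholders, not Meta's actual special tokens:

```python
import random

def make_infill_example(code: str, seed: int = 0):
    """Split a code string at two random points into (prefix, middle,
    suffix), mimicking how infilling training pairs can be synthesized."""
    rng = random.Random(seed)
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # Arrange as (prefix, suffix) -> middle; the model sees the prompt
    # and is trained to autoregressively emit the missing middle.
    prompt = f"<PRE>{prefix}<SUF>{suffix}<MID>"
    return prompt, middle
```

Concatenating prefix + middle + suffix always reconstructs the original snippet, which is what makes this data essentially free to generate from any code corpus.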
Ryan Booth retweeted
Yi Ding -- prod/acc@yi_ding·
🪙 Tokens are one of the first things people hear about when they come across large language models like ChatGPT.

What is a token? A sequence of Unicode characters converted into a number. These numbers are how the text you send to the LLM is represented internally inside the model.

For example, if we use the @OpenAI tokenizer, platform.openai.com/tokenizer, we see that the first three words of this tweet are separated into three tokens: ["Tokens", " are", " one"]. However, not all words are like that: if you try the word strawberry, you may see that it is separated into three tokens: ["st", "raw", "berry"]. But wait, if you just add a space before the word, it ends up being one token: [" strawberry"]. 🍓

What's going on here? This is one of the quirks of the way tokenization works. Spaces and punctuation also need to be tokenized. So GPT is trained on a lot of sentences with the word strawberry in the middle, but fewer sentences with the word strawberry, uncapitalized, at the beginning of a sentence or paragraph.

This is a good reason why you don't want extraneous spaces (and, depending on what you're trying to do, newlines) at the end of your prompts. By leaving a space at the end of your prompt, you make it harder for the model to complete with many common words.
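The space quirk can be reproduced with a toy greedy longest-match tokenizer over a made-up vocabulary. This is a simplification of real BPE, and the vocabulary below is hypothetical, chosen only to mirror what the OpenAI tokenizer shows for "strawberry":

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenizer over a fixed vocabulary --
    a simplified model of BPE segmentation."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Fall back to a single character if nothing matches.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy vocab: " strawberry" (with a leading space) was frequent enough in
# training data to become one token, but bare "strawberry" was not.
vocab = {" strawberry", "st", "raw", "berry", " "}
greedy_tokenize("strawberry", vocab)   # ["st", "raw", "berry"]
greedy_tokenize(" strawberry", vocab)  # [" strawberry"]
```

A trailing space in a prompt has the same effect in reverse: it "uses up" the leading space that many common word tokens begin with, pushing the model toward rarer segmentations.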
Ryan Booth@that1guy_15·
@stephanpastis I just assumed about halfway through you let ChatGPT write this one :)
Ryan Booth retweeted
Jerry Liu@jerryjliu0·
A big issue with building LLM search is that there are too many options for retrieval and it's not clear which ones are better than others 🤔

❓ Top-k has its downsides, but should I do hybrid search? Knowledge graphs? ⚠️ What if one method works better for some questions but worse for others?

A general concept I've been playing around with is just "try everything" 🌎: if cost isn't an issue, fire off a bunch of calls in parallel to different approaches and ensemble them together using an LLM 🛠️🛠️

For any given question, some techniques will work better than others. With the ensemble approach, ideally you're just taking a max() function over all the candidates. 💡

Because these calls can be made in parallel, latency is (kind of) mostly constrained by the slowest query technique you have ⚡️

This approach is pretty easy to set up in @llama_index with our abstractions. You can check out our brand-new guide here 📗: gpt-index.readthedocs.io/en/stable/exam…

Would there be a situation where this would be useful? Have you tried this already and it works/doesn't work? Let us know! 🙋
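The "try everything in parallel" idea can be sketched with stdlib threads. The two retriever functions here are hypothetical stand-ins for real strategies (top-k vector search, hybrid search, knowledge-graph lookup, ...), not llama_index APIs:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical retrievers; each returns candidate passages for a query.
def topk_retriever(query):
    return ["passage-a", "passage-b"]

def hybrid_retriever(query):
    return ["passage-b", "passage-c"]

def ensemble_retrieve(query, retrievers):
    """Fire all retrieval strategies in parallel and merge their candidates
    (deduplicated, order-preserving). Wall-clock latency is bounded by the
    slowest retriever rather than the sum of all of them."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda r: r(query), retrievers))
    merged, seen = [], set()
    for candidates in results:
        for c in candidates:
            if c not in seen:
                seen.add(c)
                merged.append(c)
    # In a full pipeline, an LLM or re-ranker would now pick the best
    # candidate -- the max() over approaches the tweet describes.
    return merged

ensemble_retrieve("what is RAG?", [topk_retriever, hybrid_retriever])
# -> ["passage-a", "passage-b", "passage-c"]
```

Swapping the dedup/merge step for an LLM-as-a-judge call over the candidate set gives the ensembling behavior described above.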