WallStOwnIt retweetledi

This week, Google announced a doubling of Gemini Pro 1.5's input context window from 1 million to 2 million tokens, and OpenAI released GPT-4o, which generates tokens 2x faster and 50% cheaper than GPT-4 Turbo and natively accepts and generates multimodal tokens. I view these developments as the latest in an 18-month trend. Given the improvements we've seen, best practices for developers have changed as well.
Since the launch of ChatGPT in Nov 2022, with key milestones that include the releases of GPT-4, Gemini 1.5 Pro, Claude 3 Opus, and Llama 3-70B, many model providers have improved their capabilities in two important ways: (i) reasoning, which allows LLMs to think through complex concepts and and follow complex instructions; and (ii) longer input context windows.
The reasoning capability of GPT-4 and other advanced models makes them quite good at interpreting complex prompts with detailed instructions. Many people are used to dashing off a quick, 1- to 2-sentence query to an LLM. In contrast, when building applications, I see sophisticated teams frequently writing prompts that might be 1 to 2 pages long (my teams call them “mega-prompts”) that provide complex instructions to specify in detail how we’d like an LLM to perform a task. I still see teams not going far enough in terms of writing detailed instructions. For an example of a moderately lengthy prompt, take a look at Claude 3’s system prompt. It’s detailed and gives clear guidance on how Claude should behave.
This is a very different style of prompting than we typically use with LLMs’ web user interfaces, where we might dash off a quick query and, if the response is unsatisfactory, clarify what we want through repeated conversational turns with the chatbot.
Further, the increasing length of input context windows has added another technique to the developer’s toolkit. GPT-3 kicked off a lot of research on few-shot in-context learning. For example, if you’re using an LLM for text classification, you might give a handful — say 1 to 5 examples — of text snippets and their class labels, so that it can use those examples to generalize to additional texts. However, with longer input context windows — GPT-4o accepts 128,000 input tokens, Claude 3 Opus 200,000 tokens, and Gemini 1.5 Pro 1 million tokens (2 million just announced in a limited preview) — LLMs aren’t limited to a handful of examples. With many-shot learning, developers can give dozens, even hundreds of examples in the prompt, and this works better than few-shot learning.
When building complex workflows, I see developers getting good results with this process:
- Write quick, simple prompts and see how it does.
- Based on where the output falls short, flesh out the prompt iteratively. This often leads to a longer, more detailed, prompt, perhaps even a mega-prompt.
- If that’s still insufficient, consider few-shot or many-shot learning (if applicable) or, less frequently, fine-tuning.
- If that still doesn’t yield the results you need, break down the task into subtasks and apply an agentic workflow.
I hope a process like this will help you build applications more easily. If you’re interested in taking a deeper dive into prompting strategies, I recommend Microsoft's Medprompt paper (Nori et al., 2023), which lays out a complex set of prompting strategies that can lead to very good results.
[Original text (with links): deeplearning.ai/the-batch/issu… ]
English

