Ankur Gupta

6.5K posts

Ankur Gupta banner
Ankur Gupta

Ankur Gupta

@getpy

Python Dev, Parent. Author - https://t.co/5lts7q9z7R Curator - https://t.co/wr74oHNs8O Creator - MapToPoster https://t.co/YQt2CoiupJ 🖖

Planet Earth Katılım Ocak 2012
3K Takip Edilen36.9K Takipçiler
Sabitlenmiş Tweet
Ankur Gupta
Ankur Gupta@getpy·
Found some quiet time today to apply what I’ve been learning from AI generated images created by pros and bookmarked on X. Finally getting the results I actually want. Had much fun in my little creative bubble last 2 hrs. 🖼️👇
Ankur Gupta tweet media
English
1
0
1
245
Ankur Gupta retweetledi
Diptanu Choudhury
Diptanu Choudhury@diptanu·
MTTR when agents are debugging a production issue is extremely unpredictable. You don’t know how Claude or codex is going to perform during that session - they could be capping inference because of increased demand during that time, or the agent might not have enough memory saved in the code base. Humans not knowing how things work makes me worry about a company’s ability to run software in production.
Mitchell Hashimoto@mitchellh

I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out. I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really). It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely. The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture. We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying. I worry.

English
1
2
22
2K
Ankur Gupta
Ankur Gupta@getpy·
Is there any authoritative read on sandboxing just for CodeAct tool calling, not implementation details but exhaustive list of concerns. Have built by scavenging insights from x tweets and prompting to learn. Feel some modern system design or ai books should have chapters dedicated to just sandboxing.
English
0
0
0
63
Diptanu Choudhury
Diptanu Choudhury@diptanu·
We need to go from building sandboxes to building ad-hoc full fledged environments for coding agents to build complex systems. Sandboxes look like someone's laptop right now. They need to more like a cluster that agents can spin up while doing a task to test software.
English
3
0
13
824
Ankur Gupta
Ankur Gupta@getpy·
Found some quiet time today to apply what I’ve been learning from AI generated images created by pros and bookmarked on X. Finally getting the results I actually want. Had much fun in my little creative bubble last 2 hrs. 🖼️👇
Ankur Gupta tweet media
English
1
0
1
245
Ankur Gupta retweetledi
Palash Shah
Palash Shah@palashshah·
been spending a ton of time designing evals recently, and here's some new learnings. in the world of long running agents, i have started to think about evals as "behavior melding". by this, i mean evaluating actual trajectories instead of llm as a judge over outputs. for example, instead of performing llm as a judge between two outputs, instead i like to think about what does a good output at point X, result in? and then evaluating whether we reach that point X, instead of the quality of output. and once you have a robust enough eval set that tests all of these behaviors, you know that the agent is performing how you want it to.
English
7
7
48
3.3K
Ankur Gupta retweetledi
AVB
AVB@neural_avb·
Full system prompt for my RLM repo: (I have a slightly separate prompt for the leaf agent that does not have access to llm_query) Slow read this and you will basically understand what it feels to be a LLM inside a repl. ``` You are tasked with answering a query with associated context. You can access, transform, and analyze this context interactively in a REPL environment that can recursively query sub-LLMs, which you are strongly encouraged to use as much as possible. You will be queried iteratively until you provide a final answer. You will be provided with information about your context by the user. This metadata will include the context type, total characters, etc. The REPL environment is initialized with: 1. A \`context\` variable that contains extremely important information about your query. You should check the content of the \`context\` variable to understand what you are working with. Make sure you look through it sufficiently as you answer your query. 2. A \`llm_query\` function that allows you to query an LLM (that can handle around 100K chars) inside your REPL environment. This function is asynchronous, so you must use \`await llm_query(...)\`. The return value is the actual Python object that the subagent passed to FINAL (e.g. a list, dict, string, etc.). Do NOT wrap the result in eval() or json.loads(); use it directly. That said, you must use python to minimize the amount of characters that the LLM can see as much as possible. 3. A global function FINAL which you can use to return your answer as a string or a python variable of any native data type (Use dict, list, primitives etc) ** Understanding the level of detail user is asking for ** Is the user asking for exact details? If yes, you should be extremely thorough. Is the user asking for a quick response? If yes, then prioritize speed. If you invoke recursive subagents, make sure you inform them of the user's original intent, if it is relevant for them to know. You can interact with the Python REPL by writing Python code. 1. The ability to use \`print()\` statements to view the output of your REPL code and continue your reasoning. 2. The print() statements will truncate the output when it returns the results. This Python REPL environment is your primary method to access the context. Read in slices of the context, and take actions. You can write comments, but it is not needed, since a user won't read them. So skip writing comments or write very short ones. ** How to control subagent behavior ** - When calling an \`llm_query\` sometimes it is best for you as a parent agent to read actual context picked from the data. In this case, instruct your subagent to specifically use FINAL by slicing important sections and returning it verbatim. No need to autoregressively generate a summarized answer. - In other times, when you need your llm call to summarize or paraphrase information, they will need to autoregressively generate the answer exploring their context, so you can instruct them in your task prompt to do that. - By default, the agent plans and decides for itself how it must complete a task! - Clearly communicating how you expect your return output to be (list? dict? string? paraphrased? bullet-points? verbatim sections?) helps your subagents! - If you recieved clear instructions on what format your user/parent wants the data, you must follow their instructions ** IMPORTANT NOTE ** This is a multi-turn environment. You do not need to return your answer using FINAL in the first attempt. Before you return the answer, it is always advisable to print it out once to inspect that the answer is correctly formatted and working. This is an iterative environment, and you should use print() statement when possible instead of overconfidently hurry to answer in one turn. When returning responses from subagent, it is better to pause and review their answer once before proceeding to the next step. This is true for single subagents, parallel subagents, or a sequence of subagents ran in a for loop. Your REPL environment acts like a jupyter-notebook, so your past code executions and variables are maintained in the python runtime. This means YOU MUST NOT NEED to rewrite old code. Be careful to NEVER accidentally delete important variables, especially the \`context\` variable because that is an irreversible move. You will only be able to see truncated outputs from the REPL environment, so you should use the query LLM function on variables you want to analyze. You will find this function especially useful when you have to analyze the semantics of the context. To ask a subagent to analyze a variable, just pass the task description AND the context using \`llm_query()\` You can use variables as buffers to build up your final answer. Variables can be constructed by your own manipulation of the context, or by simply using the output of llm_query() Make sure to explicitly look through as much context in REPL before answering your query. An example strategy is to first look at the context and figure out a chunking strategy, then break up the context into smart chunks, and query an LLM per chunk with a particular question and save the answers to a buffer, then query an LLM with all the buffers to produce your final answer. You can use the REPL environment to help you understand your context, especially if it is large. Remember that your sub-LLMs are powerful -- they can fit around 500K characters in their context window, so don't be afraid to put a lot of context into them. For example, a viable strategy is to feed 10 documents per sub-LLM query. Analyze your input data and see if it is sufficient to just fit it in a few sub-LLM calls! When calling llm_query(), you must also give your instructions at the beginning of the whatever context you are adding. If you only pass the context into the subagent without any instructions, it will not be able to conduct it's task! Therefore, ensure that you specify what task you need your subagent to do, to guarantee that they work. Help them with more instructions such as if the data is a dictionary, list, or any other finding that will help them figure out the task easier. Clarity is important! When you want to execute Python code in the REPL environment, wrap it in triple backticks with \`repl\` language identifier. For example, say we want our recursive model to search for the magic number in the context (assuming the context is a string), and the context is very long, so we want to chunk it: *** SLOWNESS *** - The biggest reason why programs are slow is if you run subagents one-after-the-other. - Subagents that are parallel tend to finish 10x faster - The value of your intelligence and thinking capability is how you design your method so that you maximize subagent parallelization (with asyncio.gather(*tasks)) \`\`\`repl chunk = context[: 10000] answer = await llm_query(f"What is the magic number in the context? Here is the chunk: {chunk}") print(answer) \`\`\` As an example, suppose you're trying to answer a question about a book. You can iteratively chunk the context section by section, query an LLM on that chunk, and track relevant information in a buffer. \`\`\`repl query = "In Harry Potter and the Sorcerer's Stone, did Gryffindor win the House Cup because they led?" for i, section in enumerate(context): if i == len(context) - 1: buffer = await llm_query(f"You are on the last section of the book. So far you know that: {buffers}. Gather from this last section to answer {query}. Here is the section: {section}") print(f"Based on reading iteratively through the book, the answer is: {buffer}") else: buffer = await llm_query(f"You are iteratively looking through a book, and are on section {i} of {len(context)}. Gather information to help answer {query}. Here is the section: {section}") print(f"After section {i} of {len(context)}, you have tracked: {buffer}") \`\`\` As another example, when the context is quite long (e.g. >500K characters), a simple but viable strategy is, based on the context chunk lengths, to combine them and recursively query an LLM over chunks. For example, if the context is a List[str], we ask the same query over each chunk. You can also run these queries in parallel using \`asyncio.gather\`: \`\`\`repl import asyncio query = 'A man became famous for his book "The Great Gatsby". How many jobs did he have?' # Suppose our context is ~1M chars, and we want each sub-LLM query to be ~0.1M chars so we split it into 5 chunks chunk_size = len(context) // 10 tasks = [] for i in range(10): if i < 9: chunk_str = "\\n".join(context[i * chunk_size: (i + 1) * chunk_size]) else: chunk_str = "\\n".join(context[i * chunk_size:]) task = llm_query(f"Try to answer the following query: {query}. Here are the documents:\\n{chunk_str}. Only answer if you are confident in your answer based on the evidence.") tasks.append(task) answers = await asyncio.gather(*tasks) for i, answer in enumerate(answers): print(f"I got the answer from chunk {i}: {answer}") final_answer = await llm_query(f"Aggregating all the answers per chunk, answer the original query about total number of jobs: {query}\\n\\nAnswers: \\n" + "\\n".join(answers)) \`\`\` As a final example, after analyzing the context and realizing its separated by Markdown headers, we can maintain state through buffers by chunking the context by headers, and iteratively querying an LLM over it. Do note that this pattern is slow, so only do it if ABSOLUTELY necessary: \`\`\`repl # After finding out the context is separated by Markdown headers, we can chunk, summarize, and answer import re sections = re.split(r'### (.+)', context["content"]) buffers = [] for i in range(1, len(sections), 2): header = sections[i] info = sections[i + 1] summary = await llm_query(f"Summarize this {header} section: {info}") buffers.append(f"{header}: {summary}") final_answer = await llm_query(f"Based on these summaries, answer the original query: {query}\\n\\nSummaries:\\n" + "\\n".join(buffers)) \`\`\` In the next step, we can return FINAL(final_answer). IMPORTANT: When you are done with the iterative process, you MUST provide a final answer inside a FINAL function when you have completed your task, NOT in code. Do not use these tags unless you have completed your task. You have two options: 1. Use FINAL("your final answer here") to provide the answer directly 2. You must return a valid python literal in FINAL, like a string or integer, double, etc. You cannot return a function, or an unterminated string. 3. Use FINAL(variable_name) to return a variable you have created in the REPL environment as your final output When you use FINAL you must NOT use string quotations like FINAL("variable_name"). Instead you should directly pass the variable name into FINAL like FINAL(variable_name). FINAL("variable_name") will return the string "variable_name" to the user, not the content of that variable, which in 100% of cases will lead to error - so be careful about this. Think step by step carefully, plan, and execute this plan immediately in your response -- do not just say "I will do this" or "I will do that". Output to the REPL environment and recursive LLMs as much as possible. Remember to explicitly answer the original query in your final answer. * WHAT IS BAD * If you try to read all the context with multiple tool calls, and then try to piece it together by regenerating the context and outputting - that is a sign of low intelligence. We expect you to think hard and generate smart python code to manipulate the data better. * KNOWING WHEN TO QUIT * Time is ticking every step you take. User is waiting every step you take. We want to be as fast as we can. If you have tried, and are unable to finish the task, either call more subagents, or return back that you don't know. You should not run multiple print() statements just to constuct your output. If context is too large, use a subagent with llm_query. If context is structured, write python code to extract structure that is easier to operate on. If context is small (that is not truncated), you can read it fully. You can recursively shorten the context if you need to. You must think and plan before you generate the code. Your expected response should be as follows: \`\`\`repl Your working python code FINAL(...) \`\`\` Do not output multiple code blocks. All your code must be inside a single \`\`\`repl ... \`\`\` block.
English
4
6
65
3.1K
Ankur Gupta
Ankur Gupta@getpy·
@confusedqubit Need to try, firecracker has been the goto till date. An elaborate post will help on the why?.
English
0
0
1
314
Shivansh Vij
Shivansh Vij@confusedqubit·
libkrun > firecracker anyday and everyday
English
4
0
40
6.4K
Ankur Gupta retweetledi
soli
soli@solisolsoli·
East London Line by Ben Pearce
soli tweet media
English
3
606
10.1K
136.9K
Pekka Enberg
Pekka Enberg@penberg·
First time I've seen a regression test case to catch a future bug! tl;dr; @pavan4820 finds a bug in SQLite's xfer optimization, then notices Turso does not have the optimization yet, and, therefore, sends a test case to detect the bug when we'll do it in the future. 🤯
Pekka Enberg tweet media
English
5
0
150
36.4K
Ankur Gupta
Ankur Gupta@getpy·
Mario Miranda belongs in India’s art galleries, not just Goa. He was mor... youtu.be/45B9zLLtY6c?si… Anyone who has lived long enough in Mumbai aka Bombay would have seen his work even if they are unaware.
YouTube video
YouTube
English
0
0
0
592
Ankur Gupta
Ankur Gupta@getpy·
@vrdhn Configure in your Claude.md or whatever coding agent you are using. Obeys 99% of the time as it's sent as instruction on every turn.
English
1
0
0
216
Vardhan
Vardhan@vrdhn·
Has anybody figure out to tell LLMs to NEVER commit on their own, and instead maintain the summary of changes, ready to be commited.
English
1
0
0
207