
Everyone’s talking about “AI agents,” but here’s how I actually built one that talks to any dataset, even the messy CSVs from data.gov.in.
Here is the whole story of how you can do it, plus some useful lessons I took from it.
I built a system, submitted to Bharat Digital, that ingests CSVs from data.gov.in. The raw data is messy and inconsistently structured, so each file is loaded into a pandas DataFrame, cleaned, and then saved to a database.
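The clean-then-store step could look roughly like the sketch below. It is not the code from my project: to keep it dependency-free it swaps pandas for the standard-library `csv` module and uses SQLite as the database, and the names `clean_header`, `clean_value`, `load_csv`, and the `MISSING` marker set are made up for illustration.

```python
import csv
import io
import re
import sqlite3

# Assumed set of "missing value" markers; real files may need more entries.
MISSING = {"", "na", "n/a", "-", "--", "null"}

def clean_header(name):
    """Turn a messy header like ' Rainfall (mm) ' into 'rainfall_mm'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower()).strip("_")
    return name or "col"

def clean_value(raw):
    """Strip whitespace, map missing markers to None, parse numbers when possible."""
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    try:
        return float(text.replace(",", ""))  # handles values like "3,055"
    except ValueError:
        return text

def load_csv(conn, table, csv_text):
    """Clean one CSV and store it as a SQLite table; returns the number of rows stored."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [clean_header(h) for h in next(reader)]
    width = len(headers)
    rows = []
    for row in reader:
        if not any(v.strip() for v in row):
            continue  # skip blank lines and rows that are entirely empty
        row = (row + [""] * width)[:width]  # pad or trim ragged rows
        rows.append([clean_value(v) for v in row])
    cols = ", ".join(f'"{h}"' for h in headers)
    marks = ", ".join("?" for _ in headers)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()
    return len(rows)
```

With pandas, the same idea is usually a `read_csv`, a few cleaning steps on the DataFrame, and a `to_sql` call. Either way, the point is that the LLM only ever sees clean, consistently named tables.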
The database is connected to an SQL LLM agent running the Qwen2.5-Coder 7B model. It can write expert-level SQL queries across tens to hundreds of tables and handles complex questions that need table joins and other advanced operations.
When we ask a question, the rows returned by the SQL LLM chain are passed as context to the main conversation system, which uses the LLaMA 3.2 3B model as the chat interface for end users.
So, it’s a flow where we can chat with any amount of data, even unstructured data.
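As a rough sketch of that flow, assuming SQLite as the database: `sql_model` and `chat_model` below are hypothetical callables (prompt string in, text out) standing in for Qwen2.5-Coder 7B and LLaMA 3.2 3B, and the prompts and the `answer` helper are illustrative, not the ones from my project.

```python
import sqlite3

def describe_schema(conn):
    """Collect the CREATE TABLE statements so the SQL model knows what it can query."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

def answer(question, conn, sql_model, chat_model, max_rows=20):
    """Two-stage flow: the SQL model writes a query, the chat model answers from its rows."""
    sql_prompt = (
        "Write one SQLite query that answers the question.\n"
        f"Schema:\n{describe_schema(conn)}\n"
        f"Question: {question}\nSQL:"
    )
    sql = sql_model(sql_prompt).strip().rstrip(";")
    # Very simple guard: only read-only statements are executed.
    if not sql.lower().startswith(("select", "with")):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchmany(max_rows)  # cap the context so it fits a small model
    context = "\n".join(str(dict(zip(columns, row))) for row in rows) or "(no rows)"
    chat_prompt = (
        f"Data retrieved from the database:\n{context}\n\n"
        f"Answer the user's question using only this data.\nQuestion: {question}"
    )
    return chat_model(chat_prompt)
```

The chat model never touches the database. It only sees the handful of rows the SQL step pulled out, which is why a 3B model is enough for that side.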
For example, if I ask, “Give me the state with the highest rainfall between 2000–2008, and also give the corresponding highest-growth food crop in that region,” it can generate a complex table-joining query, extract the data, and use that data as context for the main conversation.
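To give a feel for the kind of query that question needs, here is a toy version. The table names, columns, and all the numbers are made up for illustration (the real data.gov.in tables look different), and it reads "highest growth crop" as "highest total production", which is an assumption.

```python
import sqlite3

# Step 1 (CTE): find the state with the most total rainfall in 2000-2008.
# Step 2: join to crop data and pick that state's top crop in the same years.
QUERY = """
WITH wettest AS (
    SELECT state, SUM(rainfall_mm) AS total_rainfall_mm
    FROM rainfall
    WHERE year BETWEEN 2000 AND 2008
    GROUP BY state
    ORDER BY total_rainfall_mm DESC
    LIMIT 1
)
SELECT w.state, w.total_rainfall_mm, c.crop,
       SUM(c.production_tonnes) AS total_production_tonnes
FROM wettest AS w
JOIN crop_production AS c ON c.state = w.state
WHERE c.year BETWEEN 2000 AND 2008
GROUP BY w.state, w.total_rainfall_mm, c.crop
ORDER BY total_production_tonnes DESC
LIMIT 1
"""

def build_toy_db():
    """Create an in-memory database with two small, invented tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rainfall (state TEXT, year INTEGER, rainfall_mm REAL);
        CREATE TABLE crop_production (
            state TEXT, year INTEGER, crop TEXT, production_tonnes REAL
        );
    """)
    conn.executemany("INSERT INTO rainfall VALUES (?, ?, ?)", [
        ("State A", 2001, 1200.0), ("State A", 2005, 1300.0),
        ("State B", 2001, 2500.0), ("State B", 2005, 2700.0),
        ("State B", 2012, 9000.0),  # outside the year range, so it is ignored
    ])
    conn.executemany("INSERT INTO crop_production VALUES (?, ?, ?, ?)", [
        ("State A", 2001, "wheat", 500.0),
        ("State B", 2001, "rice", 800.0),
        ("State B", 2005, "rice", 900.0),
        ("State B", 2005, "coconut", 1000.0),
    ])
    conn.commit()
    return conn

if __name__ == "__main__":
    print(build_toy_db().execute(QUERY).fetchone())
```

This is the shape of query the SQL model has to produce on its own: filter by year, aggregate, rank, then join to a second table.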
My learnings:
- Qwen2.5-Coder 7B was the best mid-sized reasoning model I tried for writing SQL.
- LLaMA 3.2 3B is a decent small-parameter model for conversation.
- Don’t use a vector database for complex unstructured dataframes with thousands or millions of rows. It won’t retrieve reliably at that scale.
- The 7B and 3B models are both workable and give decent results.
Here is a video of the demonstration: youtube.com/watch?v=HYfXqB…
