@beaconnbin go with LLM engineering (agents/LangGraph/evals). MLOps is crowded, backend gets commoditized. agent orchestration is where the unsolved problems are - nobody's figured out reliable multi-step reasoning at scale yet. that's where the leverage is
Senior ML/AI folks, need advice.
I’ve done core ML and built RAG/LangChain/LLM apps. What’s the smartest next step?
MLOps + pipelines, LLM engineering (agents/LangGraph/evals), or backend/data engineering to build real production AI systems?
Twitter is cool.
But it’s 10x better when you connect with people who like building and scaling GenAI systems.
If you’re into LLMs, GenAI, Distributed Systems, or backend,
say hi.
@abhi1thakur Can someone share a real example of building a coding-focused AI agent in a workplace setting? Also, do tools like CrewAI actually hold up in production, and what does the typical pipeline look like? Just want a perspective
Man, I don't think I should dream of working on the latest tech even before release and being involved in its making
I flunk simple things (leetcode, explaining concepts) at interviews man
People believe in me and I flunk. Why?
I don't know, it happens just at the places I desire to work one day, disappointing them and me
I think I should just do what is assigned at my job and stick to it, since I just can't seem to make it anywhere
Guys, do you know how Uber sets the fare price every second?
How Google Ads Smart Bidding works?
They use a hybrid modeling (time-series + ML prediction) stack.
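A toy sketch of that hybrid idea (every name and number here is illustrative, not Uber's or Google's actual stack): a time-series component forecasts recent demand, and an ML-style correction adjusts the price using live features.

```python
def ts_baseline(history, alpha=0.3):
    """Time-series part: exponential smoothing over recent demand."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def ml_correction(features, weights):
    """Stand-in for a learned model: a linear score over live features."""
    return sum(w * f for w, f in zip(weights, features))

def hybrid_fare(history, features, weights, base_fare=5.0):
    """Fare = base fare scaled by an ML surge factor, plus a demand term."""
    demand = ts_baseline(history)
    surge = 1.0 + max(0.0, ml_correction(features, weights))
    return base_fare * surge + 0.1 * demand
```

In real systems the linear score would be a trained model and the baseline a proper forecaster, but the split is the same: slow-moving trend from the time series, fast reaction from the ML features.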
Covering this in my upcoming Substack this Sunday, 10 AM.
Subscribe: kmeanskaran.substack.com
How to evaluate an ML/LLM model in 2025?
Precision, recall, and F1 score are crucial parts of ML evaluation. In the past, they were practically the only way to judge a model. Kaggle competitions later made them famous as the path to medals, but in 2025 no app or algorithm is perfect, and ML evaluation has evolved.
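For anyone rusty on what those three metrics actually compute, here's a from-scratch version for binary labels (you'd normally reach for sklearn, but the definitions fit in a few lines):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary-classification metrics from scratch (labels are 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted 1s, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual 1s, how many we caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```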
Here’s the reality: data is growing insanely fast. You often need to retrain your model every couple of months because old patterns quickly become irrelevant. A single “best” model with 90% metrics can become outdated fast.
What should you do?
- If your model has a decent score (e.g., ~80% accuracy or precision/recall, which is actually great in production), push it to production.
- Deploy it and run A/B tests; this is the only real way to validate performance against live data.
- Retrain regularly based on new data and feedback.
- Evaluate the model on business impact.
In 2025, a robust ML pipeline matters far more than a single accuracy metric. Researchers are building strong foundational models for general tasks, but real-world success depends on your system design.
The ML pipeline and system design are now mandatory.
Keep learning ;)
So I've been training a 50M-param GPT. One epoch should take ~3 hours, but mine takes ~12 hours.
I've added mixed precision (bfloat16) via autocast, plus torch.compile.
Not using flash attention or DDP; those are constraints. ChatGPT and Claude can't help me anymore, and I can't show you guys the code for now.
What could I be forgetting?
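For what it's worth: once bf16 and torch.compile are in place, a common culprit is the input pipeline rather than the model. A framework-agnostic timing harness (all names here are mine, nothing from the original code) can show where the hours actually go:

```python
import time

def profile_split(batches, step_fn, n_batches=50):
    """Split wall time between fetching batches and running training steps."""
    data_t = compute_t = 0.0
    it = iter(batches)
    for _ in range(n_batches):
        t0 = time.perf_counter()
        batch = next(it)      # data loading / host-side work
        t1 = time.perf_counter()
        step_fn(batch)        # forward/backward/optimizer step
        t2 = time.perf_counter()
        data_t += t1 - t0
        compute_t += t2 - t1
    return data_t, compute_t
```

If data time dominates, look at DataLoader workers, pinned memory, and tokenization; if compute dominates, usual suspects include a non-fused optimizer, tiny batch sizes, and per-step host-device syncs like calling `.item()` on the loss every iteration. (With CUDA, wrap the step in a synchronize before reading the clock, or the compute timing will lie.)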
FCK it. Here's all the sauce.
After shipping 100+ apps with @Lovable — I made the ULTIMATE Design Cheat Sheet.
Every prompt.
Every design system pattern.
Every cloud config + infra setup.
Every component standard + best practice we actually use to achieve world-class UI.
All in one doc.
Follow + comment "Cheat Sheet" and I'll DM it to you.
I'm currently looking for an internship or entry-level role, Backend + GenAI focused. I’ve worked on a bunch of amazing projects, some listed in the thread 👇
If you’re hiring or know someone who is, my DMs are open!
Please like/RT to help me reach the right people 🙏
Wrapped up implementing L1 regularization from scratch and diving into logistic regression! 🚀
With mid-sems coming up, gotta take a short break from my ML journey this week 😊
#MachineLearning #AI
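In that from-scratch spirit: one standard way to implement an L1 step is the proximal (soft-thresholding) update, as in ISTA. A minimal sketch, not necessarily how the post's version did it:

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def l1_prox_step(w, grad, lr=0.1, lam=0.01):
    """Gradient step on the smooth loss, then L1 shrinkage on each weight."""
    return [soft_threshold(wi - lr * gi, lr * lam) for wi, gi in zip(w, grad)]
```

The shrinkage is what actually drives weights to exactly zero, which is the whole point of L1 over L2.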
The important thing is to just read 1 or 2 research papers on the weekend, and after 10 to 12 months...
you'll have a very good understanding of your area of interest.