Pratik Patel

1.9K posts

@pratikpatel

Data Scientist in Life Sciences and Healthcare domains. Developing intelligent systems and processes using deep learning.

Joined March 2008
1.1K Following · 271 Followers
Pinned Tweet
Pratik Patel @pratikpatel
People are underestimating the importance of this news. Whatever we have seen until now is the same old robots-are-going-to-kill-us reporting. This is different. It's a step closer to general-purpose AI. It is not a question of sentience, but of solving all of humanity's problems. twitter.com/TheStalwart/st…
Joe Weisenthal @TheStalwart

1997: World's best human chess player gets destroyed by computer. 2017: World's best chess computer gets destroyed by Artificial Intelligence program that had only learned about chess a few hours earlier. twitter.com/bramcohen/stat…

1
0
5
0
Pratik Patel retweeted
Andrej Karpathy @karpathy
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
1.7K
2.4K
31.3K
3.5M
Pratik Patel retweeted
Andrew Rousso @AndrewRousso
the forbidden way to order food
181
1.8K
18.5K
580.8K
Pratik Patel @pratikpatel
@IndiGo6E What’s up with your baggage services at Ahmedabad airport?

*Overworked Staff*
They don’t have time to answer phone calls. A delayed-bag receipt that should take 2 minutes to generate took 15 minutes per customer because the staff were getting calls continuously. The staff really want to help, but they are burdened by a thoughtless process (or the lack of one).

*Convoluted Process*
Delayed baggage seems to be a very common occurrence, judging by the number of passengers I saw in line at your baggage service desk. You should have a streamlined process that reduces the burden on your staff.

*Delivery Vendor*
They want to finish the delivery ASAP, which means they call the customer at 2 AM to see if they can deliver. I understand they’d like to get this done, but it should be the customer’s choice whether they want the delivery at all costs or at a reasonable hour.

This post is only meant to highlight a company’s failure, not individuals’. Everyone I encountered was very courteous and always wanted to do the right thing. @MoCA_GoI @DGCAIndia
0
1
0
67
Pratik Patel retweeted
Andrej Karpathy @karpathy
Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
687
3.4K
24.2K
5.8M
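The cost figures in the nanochat post above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes a flat hourly price for the 8XH100 node and linear scaling of cost with training time, neither of which the post states, and the function names are made up for this example. The 12-hour and 24-hour dollar estimates are extrapolations, not numbers from the post.

```python
# Approximate figures quoted in the post: (cost in USD, training hours).
SPEEDRUN = (100.0, 4.0)       # "~$100 ... ~4 hours on an 8XH100 node"
LARGER_RUN = (1000.0, 41.6)   # "~$1000 (~41.6 hours of training)"


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Hourly node price implied by one (cost, duration) pair."""
    return cost_usd / hours


def estimated_cost(hours: float, hourly_rate: float) -> float:
    """Cost of a run, assuming (hypothetically) a flat hourly price."""
    return hours * hourly_rate


rate_speedrun = implied_hourly_rate(*SPEEDRUN)
rate_larger = implied_hourly_rate(*LARGER_RUN)

# Extrapolated, not stated in the post: cost of the ~12 hour and 24 hour runs.
cost_12h = estimated_cost(12.0, rate_larger)
cost_24h = estimated_cost(24.0, rate_larger)

if __name__ == "__main__":
    print(f"implied rate (speedrun):   ${rate_speedrun:.2f}/hour")
    print(f"implied rate (larger run): ${rate_larger:.2f}/hour")
    print(f"estimated cost, 12 hours:  ${cost_12h:.0f}")
    print(f"estimated cost, 24 hours:  ${cost_24h:.0f}")
```

Both quoted price points imply roughly the same hourly rate, which suggests the rounded numbers in the post are mutually consistent under the flat-price assumption.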
Pratik Patel retweeted
Simon Willison @simonw
I got GPT-5 with ChatGPT code interpreter to hunt down the US Census numbers used by the recent Apollo AI adoption rate chart and then recreate that chart from a screenshot and the raw data using matplotlib in Python simonwillison.net/2025/Sep/9/apo…
9
19
292
102.3K
Pratik Patel retweeted
lisatomic @lisatomic5
the problem with chatGPT being a sycophant is downstream of ppls' default "thinking" mode being geared toward positive evaluations from other people
2
1
17
936
Pratik Patel retweeted
Tim Urban @waitbutwhy
This is why it's important to surround yourself with people who can see the real you. If not, you might find yourself morphing your personality to match people's (incorrect) model of you, subconsciously hiding your real self away.
taoki@justalexoki

this is fucking me up

109
368
3.6K
296.3K
Pratik Patel retweeted
Pearl @ppearlman
The election is over. If your side won, congrats. If your side lost, you’ll get ‘em next go round. Now it’s time to get back to making life better for you & your people. Back to basics… Move your body daily. Feed it real food. Prioritize rest. Love your people. Let’s go!!
25
46
362
58.3K
Pratik Patel retweeted
FreeBSD Frau @freebsdfrau
In 2002, I was working for a nation-wide retailer. About 6 months after building a new store, my car broke down. I lived so far away from the store that I was only able to get two rides home from different colleagues. I ultimately took to walking 23 miles to work (each way) 🧵
18
67
769
135.2K
Pratik Patel retweeted
Sebastian Raschka @rasbt
I ran hundreds if not thousands of LoRA & QLoRA experiments to finetune open-source LLMs, and here’s what I learned:

1. Despite the inherent randomness of LLM training (or when training models on GPUs in general), the outcomes remain remarkably consistent across multiple runs.
2. QLoRA presents a trade-off that might be worthwhile if you're constrained by GPU memory. It offers 33% memory savings at the cost of a 33% increase in runtime.
3. When finetuning LLMs, the choice of optimizer shouldn't be a major concern. While SGD on its own is suboptimal, there's minimal variation in outcomes whether you employ AdamW, SGD with a scheduler, or AdamW with a scheduler.
4. While Adam is often labeled a memory-intensive optimizer due to its introduction of two new parameters for every model parameter, this doesn't significantly affect the peak memory demands of the LLM. This is because the majority of the memory is allocated for large matrix multiplications rather than retaining extra parameters.
5. For static datasets, iterating multiple times as done in multi-epoch training might not be beneficial. It often deteriorates the results, probably due to overfitting.
6. If you're incorporating LoRA, ensure it's applied across all layers, not just to the Key and Value matrices, to maximize model performance.
7. Adjusting the LoRA rank is essential, and so is selecting an apt alpha value. A good heuristic is setting alpha at twice the rank's value.
8. 7B models can be finetuned efficiently within a few hours on a single GPU possessing 14 Gb of RAM.

With a static dataset, optimizing an LLM to excel across all benchmark tasks is unattainable. Addressing this requires diverse data sources, or perhaps LoRA might not be the ideal tool.
Lightning AI ⚡️@LightningAI

After hundreds of experiments, @rasbt has figured out how to get the most out of LoRA finetuning 👉 lightning.ai/pages/communit… #LLMs #GenAI #DeepLearning

27
218
1.2K
367.4K
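Two of the LoRA takeaways above (the alpha heuristic in point 7 and the QLoRA trade-off in point 2) lend themselves to a small worked example. This is a minimal sketch in plain Python, not code from the post: the function names are hypothetical, the 4096 x 4096 layer size and the 20 GB / 2 hour baseline are purely illustrative assumptions, and the alpha / rank scaling follows the standard LoRA formulation rather than anything stated in the thread.

```python
from typing import Optional, Tuple


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(rank: int, alpha: Optional[float] = None) -> float:
    """Scaling applied to the low-rank update (alpha / rank).

    If alpha is not given, use the heuristic from the post: alpha = 2 * rank.
    """
    if alpha is None:
        alpha = 2 * rank
    return alpha / rank


def qlora_estimate(lora_memory_gb: float, lora_runtime_h: float) -> Tuple[float, float]:
    """Apply the post's rough figures: 33% less memory, 33% more runtime."""
    return lora_memory_gb * (1 - 0.33), lora_runtime_h * 1.33


if __name__ == "__main__":
    d = 4096  # illustrative hidden size, an assumption for this example
    rank = 8
    adapter = lora_trainable_params(d, d, rank)
    full = d * d
    print(f"adapter params: {adapter} ({adapter / full:.4%} of the full matrix)")
    print(f"scaling with alpha = 2 * rank: {lora_scaling(rank)}")

    # Hypothetical LoRA baseline of 20 GB and 2 hours.
    mem, hours = qlora_estimate(20.0, 2.0)
    print(f"QLoRA estimate: {mem:.1f} GB, {hours:.2f} hours")
```

With alpha fixed at twice the rank, the scaling factor stays at 2 regardless of the rank chosen, which is one way to read why the heuristic is convenient when sweeping ranks.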
Pratik Patel retweeted
Nirant @NirantK
Alibaba releases QwenLM-7B and Chat variant

Self-reported results:
Beats Llama2 by a mile on math (GSM8K), code (HumanEval), Question Answering and QA (MMLU) 🤯
~2x better than GPT4 on a variant of tool selection (e.g. AutoGPT): 8.5% False Positive Rate, compared to GPT4's 15%

Commercially licensed if below 100M users
huggingface.co/Qwen
4
17
64
12K
Pratik Patel retweeted
Austen Allred @Austen
It’s remarkable how people like @joerogan and @lexfridman are building absolute media empires by simply listening to people for once instead of cutting them off and consistently trying to shape a cohesive narrative
57
259
4.3K
0
Pratik Patel @pratikpatel
@caseyliss Took me some time to find this tweet, but I think I nailed how it’ll be implemented 5 years ago.
1
0
0
0
Pratik Patel retweeted
Ana Navarro-Cárdenas @ananavarro
If you’re against abortion, don’t get one. If you’re against contraception, don’t take any. If you’re against same-sex relationships, don’t have one. If you’re against same-sex marriage, don’t marry someone of same gender. Do not impose your beliefs & religion on all Americans.
15.8K
133.9K
576.1K
0
Pratik Patel retweeted
Alec Stapp @AlecStapp
Fix your labor shortage with this one weird trick
691
28.2K
178.3K
0
Pratik Patel retweeted
Inspired by Iceland @iceland
Some said an open-world experience this immersive wasn’t possible. But it’s already here. And you don’t even need silly VR headsets. Introducing, ✨Icelandverse✨ #icelandverse
738
8.5K
31.4K
0
Pratik Patel retweeted
Marques Brownlee @MKBHD
Great now do the whole system
244
7.9K
56.1K
0
Pratik Patel @pratikpatel
@eightsleep What is going on with your order delays? I ordered almost 6 weeks ago and was originally told it’d be delivered in 3-4 weeks. Now I’m being told it’ll ship on April 15th. This is unacceptable. Every time there is a different excuse.
Westford, MA 🇺🇸
1
0
0
0