Shrutimoy Das

43 posts


@shrutimoy

PhD Student at IIT Gandhinagar

Joined June 2021
349 Following · 81 Followers
Shrutimoy Das reposted
Ash Jogalekar @curiouswavefn
"For five hundred years, the idea was the prize. The theory. The hypothesis. The flash of insight a physicist chased for twenty years in a lab before it landed. That was the bottleneck. That was what tenure rewarded. That was what Nobel committees were looking for." This vastly underestimates the complexity of what an 'idea' entails. An idea is not simply something a scientist dreams up on a lazy afternoon; that's the *raw material* for an idea. A good idea is one that comes with a plan for execution. No tenure committee, let alone the Nobel committee, will reward flashes of insight; they will only reward flashes of insights that come with a plan and actual validation.
Dustin @r0ck3t23

Terence Tao is the greatest living mathematician. Fields Medal at 31. Solved problems that had been open for a century. Widely regarded as the sharpest analytical mind alive. And he just told you the thing your entire career is built on is now worthless.

Tao: “AI has basically driven the cost of idea generation down to almost zero.”

For five hundred years, the idea was the prize. The theory. The hypothesis. The flash of insight a physicist chased for twenty years in a lab before it landed. That was the bottleneck. That was what tenure rewarded. That was what Nobel committees were looking for. Gone.

A model can generate a thousand candidate theories for a scientific problem in an afternoon. Not noise. Not garbage. Plausible, structured, publishable-grade hypotheses. A thousand of them. Before dinner. The idea used to be the scarcest resource in any room. Now it is the cheapest.

But Tao went somewhere most people are not ready to follow.

Tao: “Verification, validation, and assessing what ideas actually move the subject forward… that’s not something we know how to do at scale.”

Sit with that. We automated creation. We did not automate truth. We can produce ten thousand explanations for a phenomenon. We cannot tell you which ones are real. That is not a gap. That is a chasm. And it is the most important unsolved problem on Earth right now.

Tao: “Human reviewers… they’re already being overwhelmed actually.”

The entire scientific apparatus was built for a world where a single paper took months to produce. Peer review. Journal boards. Consensus forged over years of replication and debate. That infrastructure was never designed for what just hit it. Journals are flooded. Reviewers are buried. The filters that separated signal from noise for decades were engineered for human-speed output. They are now absorbing machine-speed volume. And they are cracking under it.

Tao compared it to the internet. The internet drove the cost of communication to zero. That did not produce clarity. It produced an ocean of noise with islands of signal buried somewhere inside. AI just did the same thing to knowledge itself. Infinite generation. Zero verification.

The person who can produce ideas has never mattered less. The person who can prove which ideas are true has never mattered more. That is the inversion nobody is processing.

Every company, every lab, every institution is racing to generate more. Faster models. Bigger outputs. More theories. More code. More content. Nobody is building the system that tells you which of those outputs are actually correct. And that is the only system that matters.

Whoever solves verification at scale does not win a market. They become the filter that all of science, all of engineering, all of human discovery flows through.

The bottleneck of the last five hundred years was producing the answer. The bottleneck of the next fifty is knowing whether the answer is real. And right now, according to the greatest mathematician alive, we do not know how to do that at the speed the machines demand.

That is not a research problem. That is the race beneath the race. And almost nobody has entered it.

3 replies · 7 reposts · 78 likes · 12.3K views
Shrutimoy Das reposted
Maaz @mmaaz_98
I built a GPU-accelerated linear programming solver in PyTorch that scales to 100k+ variables and constraints -- and is competitive with state-of-the-art solvers. The entire implementation is only ~350 lines (excl. docs / logging) and is meant to be as simple as possible.
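The repo itself isn't quoted here, but as a rough sense of why an LP solver can be both short and GPU-friendly: first-order methods like PDHG (the engine behind solvers such as PDLP) reduce everything to matrix-vector products and clamps. Below is a minimal sketch under those assumptions; function names and stepsizes are illustrative, and this is not Maaz's implementation.

```python
import torch

def lp_pdhg(c, A, b, iters=20000, device="cpu"):
    """Minimize c @ x subject to A @ x <= b and x >= 0 using the
    primal-dual hybrid gradient (PDHG) method. Each iteration is two
    matrix-vector products plus two projections, which is why this
    style of solver parallelizes so well on a GPU."""
    c, A, b = (t.to(device) for t in (c, A, b))
    m, n = A.shape
    x = torch.zeros(n, device=device)   # primal variables
    y = torch.zeros(m, device=device)   # dual variables for A @ x <= b
    # Convergence requires tau * sigma * ||A||_2^2 <= 1.
    nrm = torch.linalg.matrix_norm(A, ord=2)
    tau = sigma = 0.9 / nrm
    for _ in range(iters):
        x_new = torch.clamp(x - tau * (c + A.T @ y), min=0.0)  # primal step, project onto x >= 0
        x_bar = 2.0 * x_new - x                                # extrapolation
        y = torch.clamp(y + sigma * (A @ x_bar - b), min=0.0)  # dual step, project onto y >= 0
        x = x_new
    return x

# Toy check: min -x1 - x2  s.t.  x1 + x2 <= 1, x >= 0  (optimal value -1).
c = torch.tensor([-1.0, -1.0])
A = torch.tensor([[1.0, 1.0]])
b = torch.tensor([1.0])
print(lp_pdhg(c, A, b))  # approaches a point on the facet x1 + x2 = 1
```

Production solvers layer presolve, restarts, and adaptive stepsizes on top of this core loop; the sketch only shows why the per-iteration work maps cleanly onto GPU hardware.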
26 replies · 78 reposts · 905 likes · 62.7K views
Shrutimoy Das reposted
Fermat's Library @fermatslibrary
Computer scientist Edsger Dijkstra on the frustration of debugging a program you wrote yourself
34 replies · 347 reposts · 2.3K likes · 229.5K views
Shrutimoy Das reposted
Prof. Anima Anandkumar @AnimaAnandkumar
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training.

Training LLMs from scratch currently requires huge computational resources with large-memory GPUs. While there has been significant progress in reducing memory requirements during fine-tuning (e.g., LoRA), those methods do not apply to pre-training LLMs. We design methods that overcome this obstacle and provide significant memory reduction throughout training LLMs.

Training LLMs often requires preconditioned optimization algorithms such as Adam to achieve rapid convergence. These algorithms accumulate extensive gradient statistics, proportional to the model's parameter size, making the storage of these optimizer states the primary memory constraint during training.

Instead of focusing just on engineering and systems efforts to reduce memory consumption, we went back to fundamentals. We looked at the slow-changing low-rank structure of the gradient matrix during training, and we introduce a novel approach that leverages the low-rank nature of gradients via Gradient Low-Rank Projection (GaLore). Instead of expressing the weight matrix as low rank, which leads to a big performance degradation during pretraining, we express the gradient matrix as low rank, with no performance degradation and significantly reduced memory requirements. @jiawzhao @BeidiChen @tydsh
AK @_akhaliq

GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank
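To make the projection concrete, here is a minimal sketch of one low-rank-projected update in the spirit of GaLore, for a single weight matrix. It is an illustration, not the authors' code: plain momentum stands in for Adam's statistics, and the paper's subspace-refresh schedule and scaling details are omitted.

```python
import torch

def lowrank_projected_step(W, state, lr=1e-3, rank=8, beta=0.9, refresh=False):
    """Gradient low-rank projection in the GaLore spirit: optimizer
    statistics live in a (rank x n) space instead of (m x n)."""
    G = W.grad                                   # full gradient, shape (m, n)
    if refresh or "P" not in state:
        # Projector onto the gradient's top singular directions. Because the
        # gradient's subspace changes slowly during training, this SVD can be
        # recomputed only occasionally rather than at every step.
        U, _, _ = torch.linalg.svd(G, full_matrices=False)
        state["P"] = U[:, :rank]                 # (m, rank)
        state["M"] = torch.zeros(rank, G.shape[1], device=G.device)
    P = state["P"]
    R = P.T @ G                                  # projected gradient, (rank, n): small!
    state["M"] = beta * state["M"] + (1 - beta) * R  # momentum kept low-rank
    W.data -= lr * (P @ state["M"])              # project the update back, apply
    return state

# Usage: one step on a toy weight matrix.
W = torch.randn(64, 32, requires_grad=True)
W.square().sum().backward()                      # any loss; fills W.grad
state = lowrank_projected_step(W, {}, refresh=True)
```

The memory saving comes from state["M"] being rank x n rather than m x n; with Adam, both the first- and second-moment buffers shrink the same way, which is where the reported optimizer-state reduction comes from.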

47 replies · 364 reposts · 2.2K likes · 407.6K views
Shrutimoy Das reposted
MIT CSAIL @MIT_CSAIL
“The best learners are the people who push through the discomfort of being objectively bad at something.” — Tommy Collison
3 replies · 88 reposts · 471 likes · 54.6K views
Shrutimoy Das reposted
Ben Grimmer @prof_grimmer
The new strangest results of my career (with Kevin Shu and Alex Wang). Gradient descent can accelerate (in big-O!) by just periodically taking longer steps. No momentum needed to beat O(1/T) in smooth convex opt! Paper: arxiv.org/abs/2309.09961 [1/3]
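As a toy illustration of the mechanism only: gradient descent where every few iterations one step deliberately exceeds the classic (0, 2/L) range. The simple periodic rule below is an assumption for demonstration; the provably accelerating stepsize patterns are the ones constructed in the paper, not this one.

```python
import torch

def gd_periodic_long_steps(f, x, base_lr, long_lr, period=7, iters=200):
    """Gradient descent with an occasional long step: base_lr is a classic
    safe stepsize, long_lr deliberately exceeds 2/L. The paper's schedules
    choose the long steps far more carefully to get acceleration."""
    for t in range(1, iters + 1):
        x = x.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(f(x), x)       # gradient of the smooth objective
        lr = long_lr if t % period == 0 else base_lr
        x = x - lr * g
    return x.detach()

# Smooth convex toy: quadratic with L = 2, so the classic range is lr in (0, 1).
Q = torch.diag(torch.tensor([2.0, 0.5]))
f = lambda x: 0.5 * x @ (Q @ x)
x = gd_periodic_long_steps(f, torch.tensor([3.0, -4.0]), base_lr=0.5, long_lr=1.5)
print(f(x))  # still converges toward 0 despite the "unsafe" long steps
```

The point of the demo is only that individual steps outside (0, 2/L) need not break convergence when the surrounding short steps compensate; the big-O acceleration claim rests on the specific schedules in arxiv.org/abs/2309.09961.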
11 replies · 89 reposts · 665 likes · 94.2K views
Shrutimoy Das reposted
elvis @omarsar0
LLMs as Optimizers

This is a really neat idea. This new paper from Google DeepMind proposes an approach where the optimization problem is described in natural language. An LLM is then instructed to iteratively generate new solutions based on the defined problem and previously found solutions.

It was first tested on linear regression and the traveling salesman problem. LLMs with simple prompting match or surpass hand-designed heuristic algorithms, which shows good potential for using LLMs as optimizers.

The idea is then applied to prompt optimization, which aims to maximize task accuracy on tasks like math word problem solving. The first piece of the proposed meta-prompt takes in previously generated prompts along with their corresponding training accuracies. The second piece includes the optimization problem description, with samples from a training set representing the task. At each optimization step, the goal is to generate new prompts that increase test accuracy based on the trajectory of previously generated prompts. A sketch of this loop follows below.

The optimized prompts outperform human-designed prompts on GSM8K and Big-Bench Hard, sometimes by over 50%! For math word problem solving, one of the most effective instructions found begins with "Take a deep breath and work on this problem step-by-step". arxiv.org/abs/2309.03409
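In code, the loop described above looks roughly like the sketch below. Here `ask_llm` and `evaluate_prompt` are hypothetical stand-ins for an LLM API call and a training-accuracy scorer, and the meta-prompt wording is illustrative, not the paper's exact template.

```python
def ask_llm(meta_prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., an API request)."""
    raise NotImplementedError

def evaluate_prompt(prompt: str) -> float:
    """Stand-in: run the task (e.g., GSM8K problems) with this instruction
    prepended and return accuracy on a slice of training examples."""
    raise NotImplementedError

def optimize_prompt(seed: str, steps: int = 20, keep: int = 10) -> str:
    """OPRO-style loop: feed past (prompt, accuracy) pairs back to an LLM
    and ask it to propose a better instruction."""
    scored = [(evaluate_prompt(seed), seed)]
    for _ in range(steps):
        # Part 1 of the meta-prompt: the optimization trajectory so far,
        # best-scoring prompts last so the upward trend is visible.
        history = "\n".join(f"text: {p}\nscore: {s:.1%}"
                            for s, p in sorted(scored)[-keep:])
        # Part 2: the task description (the paper also includes exemplars).
        meta_prompt = (
            "Previous instructions and their training accuracies:\n"
            f"{history}\n"
            "Write a new instruction that achieves a higher accuracy."
        )
        candidate = ask_llm(meta_prompt)
        scored.append((evaluate_prompt(candidate), candidate))
    return max(scored)[1]   # best instruction found
```

Plugging in any chat model for ask_llm and a small evaluation set for evaluate_prompt is enough to reproduce the basic dynamic: the LLM sees its own scored history and climbs it like a black-box optimizer.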
22 replies · 376 reposts · 1.7K likes · 300.4K views
Shrutimoy Das reposted
Shubhendu Trivedi @_onionesque
This probably keeps getting shared here all the time, but it's worth resharing: An excellent set of lectures on high dimensional probability and concentration inequalities by Roman Vershynin. These complement his great book well. math.uci.edu/~rvershyn/teac…
1 reply · 24 reposts · 120 likes · 16.6K views
Shrutimoy Das reposted
Ben Grimmer @prof_grimmer
I've proven the strangest result of my career... The classic idea that gradient descent's rate is best with constant stepsizes 1/L is wrong. The idea that we need stepsizes in (0,2/L) for convergence is wrong. Periodic long steps are better, provably. arxiv.org/abs/2307.06324
67 replies · 564 reposts · 3.7K likes · 681.2K views
Shrutimoy Das reposted
Divy Thakkar @divy93t
Research Week with Google - officially a wrap! Extremely energising to be with students and see their research curiosity! Till next time! Special thanks to our amazing speakers, ACs, organisers, and Program Chairs!
7 replies · 28 reposts · 186 likes · 34.1K views