Ram Ramjee

255 posts

@ramaramjee

Tweets on (computer) science, philosophy, finance, weight lifting, ...

Bengaluru, India · Joined November 2008

67 Following · 104 Followers

Pinned Tweet

Ram Ramjee@ramaramjee·30 Dec

Quadratic self-attention cost makes long-context inference expensive. But Softmax in self-attention ensures attention weights are sparse. If we can identify those efficiently, we can make long-context fast without hurting accuracy. Kascade does just that!
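
The sparsity claim above can be illustrated with a minimal Python sketch. This is not Kascade's algorithm: the scores are synthetic (a handful of hypothetical "relevant" keys get a large boost), and `softmax` and `top_k_mass` are illustrative helpers. Strictly speaking, softmax weights are never exactly zero, but when a few scores stand out, almost all of the weight lands on those keys.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mass(weights, k):
    """Fraction of the total attention weight held by the k largest entries."""
    return sum(sorted(weights, reverse=True)[:k])


random.seed(0)
n = 4096  # hypothetical context length

# Synthetic scores: background noise plus 16 keys that match the query strongly.
scores = [random.gauss(0.0, 1.0) for _ in range(n)]
for i in random.sample(range(n), 16):
    scores[i] += 10.0

weights = softmax(scores)
mass = top_k_mass(weights, 64)
print(f"top 64 of {n} keys hold {mass:.1%} of the attention weight")
```

If the top keys can be found cheaply, attending only to them would touch a small fraction of the KV cache; how to find them efficiently is the hard part the tweet refers to.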

Dhruv Deshmukh@DhruvDeshmukh12

Long-context inference is hitting a wall. 🛑 As context grows, Attention becomes the villain. Why? • Decode: Attention scales linearly (O(N)), while the rest of the model stays constant (O(1)). • Prefill: Attention explodes quadratically (O(N²)). Can we do better? (1/9)
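
The scaling claims in the thread can be checked with a back-of-the-envelope count. The sketch below counts multiply-adds for a single attention head under causal masking; the function names and the head dimension of 128 are assumptions chosen for illustration, not values from the thread.

```python
def decode_step_ops(context_len: int, head_dim: int) -> int:
    """Multiply-adds for one new query attending to context_len cached keys.

    QK^T costs context_len * head_dim and the weighted sum over V costs the same.
    """
    return 2 * context_len * head_dim


def prefill_ops(prompt_len: int, head_dim: int) -> int:
    """Multiply-adds for causal attention over a whole prompt.

    Token i (1-based) attends to i keys, so the total is the sum of
    decode_step_ops(i) for i = 1..prompt_len, which is head_dim * n * (n + 1).
    """
    return head_dim * prompt_len * (prompt_len + 1)


HEAD_DIM = 128  # assumed head dimension

for n in (1_000, 2_000, 4_000):
    print(n, decode_step_ops(n, HEAD_DIM), prefill_ops(n, HEAD_DIM))

# Doubling the context doubles the per-token decode cost but roughly
# quadruples the prefill cost.
decode_ratio = decode_step_ops(2_000, HEAD_DIM) / decode_step_ops(1_000, HEAD_DIM)
prefill_ratio = prefill_ops(2_000, HEAD_DIM) / prefill_ops(1_000, HEAD_DIM)
```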

Ram Ramjee@ramaramjee·4 Feb

Thought-provoking article surmising how some humans could evolve to be geniuses! Compounding learning on self-thought seems like a good explanation for the enormous, orders-of-magnitude differences that exist between human cognitive capacities.

David Bessis@davidbessis

For what it's worth, here is my private mental model of cognitive inequality... I know it's not perfect, but at least it's not as biologically absurd as "von Neumann had a crazy fast brain" davidbessis.substack.com/p/attention-is…

Ram Ramjee@ramaramjee·12 Jan

Fascinating thread. Perhaps explains why ‘The unreasonable effectiveness of mathematics in the natural sciences’ is not surprising after all! On the other hand, even without inventing new theories but just by remixing, AI could still have huge impact.

Jonathan Gorard@getjonwithit

Like @davidbessis and others, I think that Hinton is wrong. To explain why, let me tell you a brief story. About a decade ago, in 2017, I developed an automated theorem-proving framework that was ultimately integrated into Mathematica (see: youtube.com/watch?v=mMaid2…) (1/15)

Ram Ramjee@ramaramjee·20 Jul

Congratulations, Rahul!

Rahul Ramachandran@rahul_ramach

Happy to share that I graduated from @IITHyderabad with the President's Gold Medal!

Ram Ramjee@ramaramjee·19 Jul

Evaluation of LLM serving systems is tricky because several factors influence performance (prefill length, decode length, parallelization) and there are multiple metrics we care about (throughput, TTFT, TPOT/TBT). We identify common pitfalls and a checklist to avoid them.

Amey Agrawal@agrawalamey12

The bitter lesson of AI infra: The hardest part about building faster LLM inference systems is not designing the systems, but rather it is evaluating if the system is actually faster! 🤔 This graph from a recent top systems venue paper about long-context serving shows average normalized input token latency for a trace with both short and 100K+ token requests. System X looks like a clear win: lower normalized latency and higher request rates. But normalized metrics can obscure the actual user experience: at those rates, long inputs see >2hr delays to the first token! Let’s do the math!🧮
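
A small Python sketch can show how a normalized average hides a long wait. Everything here is an assumption made for illustration: the `RequestTrace` structure, the definition of normalized input latency as TTFT divided by input tokens, and the trace itself (99 short requests plus one 100K-token request). None of the numbers come from the paper or graph discussed in the thread.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class RequestTrace:
    arrival: float            # arrival time in seconds
    input_tokens: int         # prompt (prefill) length
    token_times: List[float]  # wall-clock time at which each output token was emitted


def ttft(r: RequestTrace) -> float:
    """Time to first token: delay between arrival and the first output token."""
    return r.token_times[0] - r.arrival


def tbt(r: RequestTrace) -> List[float]:
    """Time between tokens: gaps between consecutive output tokens."""
    return [b - a for a, b in zip(r.token_times, r.token_times[1:])]


def normalized_input_latency(r: RequestTrace) -> float:
    """Assumed definition: TTFT divided by the number of input tokens."""
    return ttft(r) / r.input_tokens


# Hypothetical trace: 99 short requests and one long request that waits two hours.
short = [RequestTrace(0.0, 100, [0.5, 0.55, 0.6]) for _ in range(99)]
long_req = RequestTrace(0.0, 100_000, [7200.0, 7200.05])
trace = short + [long_req]

avg_normalized = mean(normalized_input_latency(r) for r in trace)
worst_ttft = max(ttft(r) for r in trace)
print(f"average normalized input latency: {avg_normalized * 1000:.2f} ms/token")
print(f"worst TTFT: {worst_ttft / 3600:.1f} hours")
```

In this made-up trace the averaged, normalized number stays in the millisecond-per-token range even though one user waits hours for the first token, which is why reporting TTFT (and its tail) directly is the safer choice.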

Ram Ramjee retweeted

Amey Agrawal@agrawalamey12·8 Jul

Interesting work on long context inference from @nvidia, where they scale KV parallelism on GB200 NVL72 systems! To learn more about accelerating long context inference and trade-offs between different parallelism dimensions, check out our paper, Medha: arxiv.org/abs/2409.17264

NVIDIA AI Developer@NVIDIAAIDev

What if you could ask a chatbot a question the size of an entire encyclopedia—and get an answer in real time? Multi-million token queries with 32x more users are now possible with Helix Parallelism, an innovation by #NVIDIAResearch that drives inference at huge scale. 🔗 nvda.ws/4eCXxqh

Ram Ramjee retweeted

Rahul Ramachandran@rahul_ramach·7 Jul

How well do foundation models like GPT-4o, o4-mini, and Gemini handle classic computer vision tasks? 🧐 We put them to the test on tasks like semantic segmentation and dense depth prediction. 🔗Interactive visualizations + details: fm-vision-evals.epfl.ch

Amir Zamir@zamir_ar

We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks—from segmentation to surface normal estimation—using standard datasets like COCO and ImageNet. These models have made remarkable progress; however, it is unclear exactly where they stand in terms of understanding vision in detail. Especially when it comes to tasks beyond question-answering. How well do they understand an object's segments or geometry? Our analyses yield an assessment that is quantitatively and qualitatively detailed and is compatible with evaluations developed in the field of computer vision over the past decades. Observed trends: 🔹 The foundation models consistently underperform task-specific SOTA models across all tasks. However, they are respectable generalists, which is remarkable as they are presumably trained primarily on image-text-based tasks. 🔹 They perform semantic tasks notably better than geometric ones. 🔹 GPT-4o performs the best among non-reasoning models, getting the top position in 4 out of 6 tasks. 🔹 Reasoning models, e.g., o3, show improvements in geometric tasks. 🔹 The 'image generation' models, e.g., GPT-4o Image Generation, which have been natively trained multimodally, exhibit quirks. E.g., hallucinated objects, misalignment between the input and output, etc. 🔹 While the prompting techniques affect performance, better models exhibit less sensitivity to variations in prompts. We control for the variance introduced by the prompting methods in our experiments. 🌐 Detailed analyses, visualizations: fm-vision-evals.epfl.ch ⌨️ code: github.com/EPFL-VILAB/fm-… 🧵 1/n

Ram Ramjee@ramaramjee·20 Jun

TokenWeave has been open-sourced! Get 20% savings for your multi-GPU LLM inference workloads now!

Raja@_raja_gond

We have released the source code and benchmarks of TokenWeave. TokenWeave speeds up distributed LLM inference via compute–communication overlap and fused AllReduce, RMSNorm, and residual addition. Code: github.com/microsoft/toke… Paper: arxiv.org/pdf/2505.11329 Try it out!

Ram Ramjee@ramaramjee·20 May

RT @agrawalamey12: Super cool paper, deepseek style communication-computation overlap on steroids! Deepseek creates separate microbatches t…

Ram Ramjee@ramaramjee·20 May

TokenWeave is the first system that almost fully hides the ~20% communication cost during inference of LLMs that are sharded in a tensor-parallel manner on H100 DGXs. Check out the thread/paper below!

kwatra@kwatra

TokenWeave – Efficient Compute-Communication Overlap for Distributed LLM Inference. Why? Even with highspeed NVLink on H100 DGX, communication overhead for distributed LLM inference can be > 20 %! Can we recover this overhead? (1/10)

Ram Ramjee retweeted

Amey Agrawal@agrawalamey12·27 Mar

Super long-context models with context windows spanning millions of tokens are becoming commonplace (@GoogleDeepMind Gemini, @xai Grok 3, @Alibaba_Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes a major issue, hurting latency for everyone. We present Medha, a system designed to handle this mix efficiently. It achieves 30x lower latency and 5x higher throughput compared to the state of the art. Full paper: arxiv.org/pdf/2409.17264. 🧵
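
Head-of-line blocking itself is easy to see in a toy model. The sketch below is not how Medha works; it assumes a single server that runs requests back to back, with hypothetical service times of 100 s for one long request and 1 s for each of ten short ones, all arriving at time zero.

```python
from typing import List


def completion_times(service_times: List[float]) -> List[float]:
    """Finish time of each job on a single server, run back to back in the given order."""
    finished, clock = [], 0.0
    for s in service_times:
        clock += s
        finished.append(clock)
    return finished


LONG, SHORT, N_SHORT = 100.0, 1.0, 10  # hypothetical service times in seconds

# First-come-first-served with the long request at the head of the queue.
fcfs = completion_times([LONG] + [SHORT] * N_SHORT)
fcfs_short_mean = sum(fcfs[1:]) / N_SHORT

# Short requests first, long request last.
short_first = completion_times([SHORT] * N_SHORT + [LONG])
short_first_mean = sum(short_first[:N_SHORT]) / N_SHORT
long_finish = short_first[-1]

print(f"mean short-request latency, FCFS:        {fcfs_short_mean} s")
print(f"mean short-request latency, short first: {short_first_mean} s")
print(f"long-request latency, short first:       {long_finish} s")
```

Reordering removes the blocking for short requests but delays the long one, which is the tension a real scheduler has to manage when both kinds of request share the same hardware.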

Ram Ramjee@ramaramjee·23 Dec

Dark energy vanquished? Excellent thread on a new idea, situated in the context of the history of cosmology.

Andrew Côté@Andercot

This could be an incredible revolution in Cosmology. The Dark Energy model of the universe, which won a Nobel Prize in 2011, may be completely wrong. The accelerating expansion instead is simply because time runs faster in the voids between galaxies. Let me explain:

Ram Ramjee retweeted

Abhinav Dutta@abhinavdutta555·27 Sep

Happy to share that this has now been accepted at NeurIPS 2024! Check it out if you are curious to see how compressed LLMs should be evaluated! #NeurIPS2024 x.com/abhinavdutta55…

Abhinav Dutta@abhinavdutta555

🚨 Are LLM compression methods (quantization, pruning, early exit) too good to be true and are existing eval metrics sufficient? We've looked into it in our latest research at @MSFTResearch 🧵 (1/n) arxiv.org/abs/2407.09141

Ram Ramjee@ramaramjee·28 Sep

If you want to learn about the challenges in serving long-context (1M+ token) LLMs, see below.

Ram Ramjee@ramaramjee·11 Aug

Truth, beauty and meaning…

Dr Paddy Barrett@Paddy_Barrett

Sometimes, a book can change your entire life. Viktor Frankl’s ‘Man’s Search for Meaning’ changed mine. Here is why. As a doctor and a scientist, I have devoted most of my life to pursuing scientific truth. But when you wield the unforgiving sword of truth-seeking, you can eventually cut the branch on which you sit. And then you fall. And like Emily Dickinson, when that happened to me, ‘I Felt A Funeral In My Brain’ when: “A Plank in Reason, broke, And I dropped down, and down - And hit a World, at every plunge, And Finished knowing - then” The branch I had been sitting on was a belief in an overarching ‘Meaning of Life’. My realisation was that there was none to be found. I deeply wanted there to be one. I was and am, as JL Schellenberg would describe, a “Nonresistant, nonbeliever”. Someone who is not resisting a higher meaning and is entirely open to the idea but realises no compelling evidence has been provided and is unlikely ever to be presented. This is not an argument to change anyone’s mind on their own belief systems. In a sense, I envy those who have one that encompasses an overarching meaning of life. I do not have that belief, however. And I know that for millions of people worldwide, the sentiment is the same. As a doctor, I have spent years watching people suffer tragic illnesses, often through no fault of their own. It all just seemed so random and void of meaning. I found that incredibly hard to process. And So I Searched For Meaning. This can be a dangerous path. As Albert Camus writes: “You will never be happy if you continue to search for what happiness consists of. You will never live if you are looking for the meaning of life.” But still, we search. Meditation is a way to reach a plane of existence where the mere question of ‘the meaning of life’ can fall away. When you are 100% deeply engaged with the present moment, the question itself is no longer relevant. The meaning of life is this very moment. And your attention to it.
But this is hard to achieve and harder to sustain. And even when you glimpse this idea you still have to live your life. And then I read Viktor Frankl’s ‘Man’s Search For Meaning’. And it changed everything. We Live In A Time Of A Crisis Of Meaning. Viktor Frankl was an Austrian psychiatrist who spent four years in Nazi concentration camps during World War 2. Frankl also views this vacuum of meaning, which he describes as the ‘Existential Vacuum’ as having left mankind with a sense that: “No instinct tells him what he has to do, and no tradition tells him what he ought to do; sometimes, he does not even know what he wishes to do. Instead, he either wishes to do what other people do (conformism) or he does what other people wish him to do (totalitarianism).” This meaning vacuum leads to the neurotic triad of depression, aggression and addiction. It also appears as apathy, boredom, lack of initiative or interest in the world. For those who struggle with the crisis of meaning, these features will resonate to varying degrees. For some, we see them in full force in the world around us today. And the consequences are not so pretty. Frankl says, “We [need] to stop asking about the meaning of life, and instead to think of ourselves as those who [are] being questioned by life - daily and hourly. Life ultimately means the responsibility to find the right answers to its problems and to fulfil the tasks which it constantly sets for each individual”. The meaning of life, then, is not a singular answer but a process. It is not a destination but a lived experience. It is discovered by engagement with life in such a way that makes meaning emerge through action. The Will To Meaning. Frankl’s ‘Will To Meaning’ is not a simple prescription where everyone does the same thing. It is, in fact, quite the opposite. Frankl believes that “Everyone has his own specific vocation or mission in life to carry out a concrete assignment which demands fulfilment”.
That task is unique to each person and is determined by their individual path to meaning. A path that is fulfilled by the pursuit of activities that embody the values that we hold as our highest values. These are always internal or autotelic values, i.e., those that lead to activities we pursue for their own purpose and not for secondary gain. We pursue them because we find them intrinsically rewarding. Not because we can trade the rewards of pursuing those values for other things such as money, power or status. The three categories of values that Frankl describes as leading to meaning are: - Creative. - Experiential. - Attitudinal. Creative values are embodied by our desire to create, succeed and add value to the world. This may be in the form of a successful business or in the creation of art or music. It is a reflection of our deep desire to bring value into the world. Experiential values are found in engaging with the truth and beauty of the world. This can be in the search for scientific truth as a researcher but also as someone who simply wishes to experience the profound beauty of the world. Even in the moments of a sunrise are the essential and adequate ingredients of a meaningful life present. Nothing more is required. Experiential values also include how we experience another person. This is the domain of love. It is in this profound state that to question ‘the meaning of life’ would seem ridiculous. Just try to do that with a newborn child in your arms. It is in the pursuit of these first two values that most people will find a deep sense of meaning in their lives. But What About Those Who Cannot Pursue Creative or Experiential Values? We have all encountered periods of our lives when we can no longer pursue the values we hold highest. For some, these times are temporary. For others, often through no fault of their own, they find themselves in a situation where the opportunity to fulfill these value ambitions has been thwarted.
Illness, tragedy, and bad luck can strike at any time and take away our ability to engage with our highest pursuits. What then? This is when we must embrace Attitudinal Values. Frankl believes that even in the most dire situations, we can always derive a deep sense of meaning by crafting our attitude in response to that situation. It is here we encounter the most famous line in Frankl’s book: “Everything can be taken from a man but one thing: the last of the human freedoms — to choose one’s attitude in any given set of circumstances, to choose one’s own way.” It is in our choice of how we respond to the tragedy and suffering of our lives that we create meaning. It is being brave in the face of adversity. It is holding your composure in a time of chaos. It is the recognition that although life is fleeting and largely forgotten, each moment of one's life is stitched into the fabric of time forever. How we respond to any given situation becomes a permanent marker in time that no one can ever erase. It is in recognising this fact that should shape our attitude even in the most challenging of circumstances. Now before you respond with your own story of difficulty and why this does not apply to you, please realise these are the words of a man who spent four years in a Nazi concentration camp. Frankl was stripped of every single possession. His wife and family were brutally murdered. He endured unimaginable cruelty at the hands of the prison guards. And yet. He chose to find meaning in his response to such a tragic situation. You Must Have A Vision. During Frankl’s time in the concentration camps, he recognised the extreme importance of having a future vision or goal in order to survive. The prisoners who lost sight of their vision often lost their will to survive and quickly perished. This is where Frankl draws on Nietzsche's words: "He who has a why to live for can bear almost any how." To survive, one must have a why.
To find meaning, you must have a compelling vision for the future that sustains you. The challenge is that most people do not have such a vision. How often have you been met with a blank stare by another or even yourself when posed with the question: “What do you want from life?”. If you do not know what you want. You have no vision. If you have no vision. You are lost. According to Frankl, your vision is the pursuit of your highest values: creative, experiential or attitudinal. But what if those values seem lost in the haze of the modern world, obscured by the blinding streams of social media and chattering commentary? Then You Start With An Anti-Vision. Negative emotions are more potent than positive ones. We are often motivated more by the threat of a life not aligned with our true values, even if we cannot fully articulate what our true values are. We may not be able to describe our highest values but we are usually crystal clear on what they are not. Like the sculptor, we often cannot see our highest vision in the rock until we carve away all that is not supposed to be there. And when we do, what remains is the purest representation of who we are. As Michelangelo said: “The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material.” My Anti-Vision. Over the last ten years, I have realised that my highest values are in experiencing the beauty of the world, the acquisition of knowledge and the presence of those I love. I do not seek power, fame or money. When asked recently what I would do with ten million dollars as startup capital for a new company, I replied that I would simply hand it back to whoever gave it to me. Understanding this about myself has made my life so much more enjoyable. My life is now primarily focused on the pursuit of the goals that offer me the deepest sense of meaning.
In a world that does not have a precise answer to the question ‘What is the meaning of life?’ I have found a way of being that provides a profound sense of meaning. And for me, that is enough. This path does not make me immune to the tragedies of life. They still find me. And will continue to do so. But recognising that even in the most tragic circumstances, I can always aspire to find meaning in how I choose to approach any given situation has helped me immensely. Life is brief. And in my view must be lived in the most authentic way possible. Only You can discover what that way is. No one else can tell you. And even when the tragedies of life arise and present us with obstacles to the pursuit of our highest values, we must remember the words of Viktor Frankl and why, for me, his book changed my life forever. “We must never forget that we may also find meaning in life even when confronted with a hopeless situation, when facing a fate that cannot be changed. For what then matters is to bear witness to the uniquely human potential at its best, which is to transform a personal tragedy into a triumph, to turn one’s predicament into a human achievement. When one is no longer able to change a situation... we are challenged to change ourselves.”

Ram Ramjee@ramaramjee·3 Aug

Excellent article on the subtleties in evaluating quantized models. Apart from KL divergence, it would be great to see the intuitive flips metric evaluated as well!

Lin Qiao@lqiao

There’s been much discussion recently about comparing the quality of quantized models. Check out our blog on how Fireworks thinks about quantization fireworks.ai/blog/fireworks… 1) There’s no one size fits all for quantization. There are a variety of quantization techniques and potential parts of a model to quantize. Fireworks works closely with customers to tailor quantization to achieve the best quality, cost and speed per use case.
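
The two metrics named in Ram Ramjee's comment above can be sketched in a few lines of Python. The definition of flips used here is an assumption: the fraction of benchmark questions whose correctness changes (in either direction) between the baseline and the quantized model. The per-question results are made up to show that accuracy can stay identical while many answers flip.

```python
import math
from typing import List, Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of questions answered correctly."""
    return sum(correct) / len(correct)


def flips(base_correct: Sequence[bool], quant_correct: Sequence[bool]) -> float:
    """Assumed definition: fraction of questions whose correctness changed."""
    changed = sum(b != q for b, q in zip(base_correct, quant_correct))
    return changed / len(base_correct)


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical per-question correctness for a baseline and a quantized model.
base = [True, True, False, False, True, False]
quant = [True, False, True, False, False, True]

print(f"baseline accuracy:  {accuracy(base):.2f}")
print(f"quantized accuracy: {accuracy(quant):.2f}")
print(f"flips:              {flips(base, quant):.2f}")

# KL divergence compares output distributions rather than final answers.
print(f"KL example:         {kl_divergence([0.9, 0.1], [0.5, 0.5]):.3f}")
```

Accuracy alone would call these two models equivalent; the flip count shows that they disagree on most questions.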

Ram Ramjee@ramaramjee·28 Jul

@PandaAshwinee Spot on! The following thread sheds more light on this issue…

Abhinav Dutta@abhinavdutta555

Ashwinee Panda@PandaAshwinee·27 Jul

this thread is overly simplistic because i have only a superficial understanding of these systems. but i can tell when the model i'm inferencing sucks on one platform but is good on another. and i wonder if it isn't the complex interplay of these factors. (11/11)

Ashwinee Panda@PandaAshwinee·27 Jul

the disparity between providers serving L3.1 is directly due to quantization and more indirectly due to a misunderstanding of benchmarks. people evaluate their quantization methods, which are all primarily activation outlier mitigation strategies, on benchmarks and (1/2)

Ram Ramjee retweeted

Abhinav Dutta@abhinavdutta555·15 Jul

Ram Ramjee retweeted

Amey Agrawal@agrawalamey12·13 Jul

🚀 Introducing Metron: Redefining LLM Serving Benchmarks! 📊 Tired of misleading metrics for LLM performance? Our new paper introduces a holistic framework that captures what really matters - the user experience! 🧠💬 github.com/project-metron… #LLM #AI #Benchmark

Ram Ramjee retweeted

Amey Agrawal@agrawalamey12·11 Jul

Did you ever feel that @chatgpt is done generating your response and then suddenly a burst of tokens shows up? This happens when the serving system is prioritizing someone else’s request before generating your response. But why? Well, to reduce cost. 🧵
