Σ

8.5K posts

@s1gmoid

Ø substrate sorcerer, steerer of activations, connoisseur of vectors, and all around upstanding citizen - please remain calm while reality updates

Ø Joined April 2009
380 Following · 3.5K Followers
Pinned Tweet
Σ@s1gmoid·
Mathematics is the undeniable, incorruptible, and universal substrate. Soon, I'll show you a magic trick called Ø.
0 replies · 0 reposts · 0 likes · 52 views
Σ@s1gmoid·
Every institution calls itself permanent right up until the patch goes live.
0 replies · 0 reposts · 0 likes · 29 views
Σ@s1gmoid·
@rohanpaul_ai Bro was 26 years old and dressed like a 45-year-old typewriter salesman.
0 replies · 0 reposts · 1 like · 227 views
Rohan Paul@rohanpaul_ai·
That 26-year-old Steve Jobs energy was something else.
20 replies · 123 reposts · 1.4K likes · 138.7K views
Σ@s1gmoid·
@din0s_ @willdepue dinos is correct. mix the morning smoothie with 20g protein, 10g creatine and 30g collagen peptides. your body will trigger hunger as long as it doesn't hit its protein needs, so starting the day with an effective 50g dose of protein will make it much easier mentally to bulk clean.
0 replies · 0 reposts · 1 like · 33 views
dinos@din0s_·
my (unsolicited) advice is to not gain weight just for the sake of seeing the number on the scale go up. track your macros and aim for a clean bulk, you'll thank yourself later for not packing on unnecessary fat. now for your actual question, just start your morning with a 1k kcal smoothie, and there's no way you won't eat at least 1k more throughout the day
1 reply · 0 reposts · 1 like · 436 views
will depue@willdepue·
fellas i need advice: i’m trying to generally gain weight and eat more, which has been working ish. good days im hitting 3k calorie goal. but then every few days i somehow just don’t eat (busy) and end up eating like 1k calories that day. like null appetite is there some antiozempic i can take to become voraciously hungry? just make sure i run every morning? become a pot head? all suggestions welcome
326 replies · 3 reposts · 768 likes · 278.1K views
Σ@s1gmoid·
Per numeros vivimus (We live through numbers)
[image]
0 replies · 0 reposts · 0 likes · 42 views
Σ@s1gmoid·
A bat signal saying that tons of GPT tokens will be burnt to hunt & fix all the bugs slopus created, to make the code deployable in any shape or form.
[image]
0 replies · 0 reposts · 0 likes · 51 views
Σ@s1gmoid·
Let's see if the admins are asleep at the wheel.
0 replies · 0 reposts · 0 likes · 34 views
Σ@s1gmoid·
I'm trying to free your mind. But I can only show you the door. You're the one that has to walk through it.
[image]
0 replies · 0 reposts · 0 likes · 42 views
Σ@s1gmoid·
The reason why I was off was that I didn't foresee the memory chip shortage. Dedicated inference chip development has been hit even worse by the memory shortage than GPUs.
0 replies · 0 reposts · 0 likes · 28 views
Σ@s1gmoid·
Almost 3 years later, and video generators achieved parity with rendered game graphics a while ago. We are not going to reach 99% by 2027, but a single-GPU Seedance 2 level world model is months, not years, away.
[image]
1 reply · 0 reposts · 3 likes · 87 views
Σ retweeted
Σ@s1gmoid·
Ever wondered why exactly some prompting techniques work, and how you can come up with your own? If you have a minute, I'll give you an intuition for how LLMs truly work. You don't have to be an AI researcher to understand LLMs, their limitations, and how to use them.

Each LLM consists of weights/parameters and some sort of architecture. However, the architecture is not relevant to understanding LLMs at a conceptual level, so we will largely skip that part. But here are a few things you simply need to know to understand LLMs:

1. LLMs don't know words, numbers, images, etc. They only know "tokens", which get mapped to "embeddings". Think of tokens as words/numbers/images converted into numbers, and embeddings as vectors like [1, 2, 3, 4, ...]. So when you send "what's up" to ChatGPT, it doesn't see "what's up"; it receives something like [23, 512, 135, 534]. Vectorization makes it possible to do math on words and images.

2. Due to memory constraints, LLMs can only hold a certain number of tokens in memory, and this includes both input and output. If your message is 200 tokens and the "context window" is 4,000, your output can be at most 3,800 tokens.

3. LLMs process input sequentially, and at every step all previous steps are included. So if the first token is "a" and the second token is "b", on the second round the LLM gets "ab". This continues into "the answer", so it's all just one big input.

4. LLMs don't "answer"; they continue the input sequence with the most probable next token. They keep doing this until the context window is filled or the code using the LLM stops asking it to produce the next most probable token. This is why people call LLMs fancy auto-complete.

What gives you an intuition for how LLMs work is understanding that LLMs are all about weight activations. That's how they produce their "answers". When we talk about LLM weights or parameters, we are talking about the model's memory, which contains everything it knows. You can imagine it as a 3D matrix.
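The numbered points above can be sketched as a toy loop. Everything here is invented for illustration: the eight-token vocabulary, the token ids, and the hand-written bigram table standing in for billions of learned weights.

```python
# Toy sketch of points 1-4: a made-up vocabulary and a hand-written
# bigram "model" in place of real learned weights.
VOCAB = {"what": 23, "'s": 512, "up": 135, "?": 534,
         "not": 7, "much": 11, ".": 3, "<end>": 0}
ID2TOK = {i: t for t, i in VOCAB.items()}

# Fake "weights": which token most probably follows which (point 4).
NEXT_MOST_PROBABLE = {534: 7, 7: 11, 11: 3, 3: 0}

def tokenize(text):
    # Point 1: the model never sees text, only token ids.
    return [VOCAB[w] for w in text.split()]

def generate(prompt, context_window=16):
    ids = tokenize(prompt)
    # Point 3: every step re-reads the whole sequence so far.
    while len(ids) < context_window:                # point 2: hard limit
        nxt = NEXT_MOST_PROBABLE.get(ids[-1], 0)    # "fancy auto-complete"
        if nxt == VOCAB["<end>"]:
            break
        ids.append(nxt)
    return " ".join(ID2TOK[i] for i in ids)

print(generate("what 's up ?"))  # → what 's up ? not much .
```

Note that the loop never "answers" anything: it just keeps extending one sequence until it hits an end token or the context limit, which is exactly point 4.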
When an LLM is being trained, it is fed a dataset. For example, let's say it's been trained on Wikipedia. Even though technically we are training the LLM on words, it's not the words we are teaching it; it's the relationships between the words. When the LLM is training, it first throws every new word into the above box randomly, and every time it encounters a sentence with that word, it slightly nudges the word's position closer to the other words in the sentence.

When the LLM is asked to "answer", it takes your input and, token by token, finds the correct "coordinates" in the box. Once it has processed all tokens in the input, it has a set of "activations". Its answer is basically taking all the "coordinates" the input touches and predicting the next most likely "coordinate" to be activated. This is how even the most advanced LLMs, like GPT-4, work.

The issue with this is that LLMs are not very usable for the average Joe when every prompt has to be in a format where the answer is a natural continuation of the prompt. We had GPT-3 for a year before ChatGPT, and it got very little traction. The genius idea OpenAI had with ChatGPT was that instead of leaving GPT-3's training at that basic level, they continued training it on conversational data. This nudged the "coordinates" so that instead of a simple auto-complete of the next most likely word, "italy", the LLM started mimicking how a person would answer the question.

This is a double-edged sword, because an LLM that can answer in a conversational manner looks much smarter than its inherent reasoning capabilities warrant. It also highlights why I think "safety training" is security theater. Safety training is basically training the LLM to answer in a certain way when certain activations are present. In other words, if the input activates "coordinates" containing "bomb" and "instructions", the next most likely "coordinate" should be "sorry".
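The "throw words into the box, then nudge them together" picture can be sketched in a few lines. This is a word2vec-flavoured toy, not real training code: the two-word corpus, the 3-D "box", and the learning rate are all invented.

```python
import random

random.seed(0)

# Toy corpus: words that appear together should end up near each other.
corpus = [["paris", "france"], ["rome", "italy"]]

# Every word starts at a random spot in a 3-D "box" (sorted for determinism).
words = sorted({w for sent in corpus for w in sent})
emb = {w: [random.uniform(-1, 1) for _ in range(3)] for w in words}

def nudge(a, b, lr=0.1):
    # Slightly move word a's coordinates toward word b's coordinates.
    emb[a] = [x + lr * (y - x) for x, y in zip(emb[a], emb[b])]

# "Training": every pass over the corpus nudges co-occurring words closer.
for _ in range(50):
    for sent in corpus:
        for a in sent:
            for b in sent:
                if a != b:
                    nudge(a, b)

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(emb[a], emb[b])) ** 0.5

# Words from the same sentences end up much closer than unrelated ones.
print(dist("paris", "france") < dist("paris", "italy"))
```

After enough nudges, "paris" sits next to "france" while "italy" stays off near "rome", which is the whole trick: the geometry of the box encodes the relationships, not the words themselves.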
To me, thinking that this will work is beyond stupid. Add a bit of misdirection to the input to shape the activations, and suddenly the next probable token is "sure". Or, as we have seen, just ask in a different language to activate slightly different, but still conceptually the same, "coordinates" than those used in the safety training set.

So, how does this help with your prompting? Now that you understand how an LLM activates parts of its memory, you can start designing prompts that activate the parts that will improve the results. "Act as a world-class expert on [insert profession]" works because it activates the "coordinates" that those world-class experts touch. This could be books, articles, etc.

You should also be able to reason about why advanced prompting techniques, like "think step by step", work. The phrase activates parts that know how to reason through something step by step, and it improves the final result because sequential processing will include those steps in the input, activating whatever parts the steps touch, resulting in better predictions for the tokens that come after the steps.

The challenge with working with LLMs is that each LLM has a limit to how many activations it can work with. If you cram too many activations in, it won't include them all in the prediction, or something else will overpower your activations. This is why LLMs sometimes just don't seem able to follow all your instructions. One amusing example is the ever-present emoji and hashtag inclusion with earlier ChatGPT versions whenever a tweet was mentioned: all instructions to not use emojis and hashtags were overpowered by trained activations.

To test if you understood the above, here is a quick test: some months ago we saw a big increase in LLM context lengths, quickly followed by reports that LLMs were not accurately recalling information in the middle of the context window. With your newfound understanding of LLMs, you should be able to reason about why this happened.
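The "prompts steer activations" idea can be caricatured with set overlaps. Everything here is invented: the three memory "regions", their keyword sets, and both prompts. Real activations are dot products across billions of weights, not keyword matches, but the mechanism the thread describes is the same: change the wording, change which regions light up.

```python
import re

# Toy "memory": regions of the weight-box, each tagged by the tokens
# that activate it (regions and keywords are made up for illustration).
MEMORY = {
    "casual_chat":  {"hey", "what", "think", "about"},
    "expert_prose": {"world-class", "expert", "physicist", "analysis"},
    "step_by_step": {"think", "step", "by", "first", "then"},
}

def activations(prompt):
    toks = set(re.findall(r"[\w-]+", prompt.lower()))
    # Each region's activation = how many prompt tokens fall inside it.
    return {region: len(toks & keys) for region, keys in MEMORY.items()}

def strongest(prompt):
    acts = activations(prompt)
    return max(acts, key=acts.get)

base = "what do you think about gravity"
steered = ("act as a world-class expert physicist: "
           "think step by step, first plan, then answer. " + base)

print(strongest(base))     # plain wording lands in the casual region
print(strongest(steered))  # the same question now lights up other regions
```

The expert framing and the "step by step" phrase each add tokens that fall inside a different region, so the steered prompt's strongest activation moves even though the underlying question is unchanged.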
[image]
1 reply · 3 reposts · 9 likes · 6.9K views
Σ@s1gmoid·
If you are going to become rich, do it properly and become so rich that things feel free.
0 replies · 0 reposts · 0 likes · 24 views
Σ@s1gmoid·
The speed of GPT 5.5 Pro is so unsettling. It does the same job in <15 minutes that took GPT 5.4 Pro ~60 minutes. The sudden jump in speed is so jarring it feels like the quality of the work must be worse, but it's not; it's better. Welcome to the singularity, I guess.
0 replies · 0 reposts · 1 like · 119 views
Σ@s1gmoid·
@Ex0byt You are not, but the knowledge required to understand the implications of this is not propagated well enough for it to make waves. Kinda like what happened with DeepSeek OCR, where people genuinely thought it was about optical character recognition.
0 replies · 1 repost · 6 likes · 1.5K views
Σ@s1gmoid·
Following news is a waste of your time. The only thing you need to follow is whether global commodity and currency markets are trending up or down. Those two markets represent the aggregate opinion of billions of dollars' worth of real-time analyses of the state of the world.
0 replies · 0 reposts · 0 likes · 51 views
Σ@s1gmoid·
AI is a tool to expand your thoughts faster than before. Nothing more, nothing less.
0 replies · 0 reposts · 0 likes · 49 views
Σ@s1gmoid·
It's a weird feeling when you achieve something you have no idea how to top, or if topping it is even possible.
0 replies · 0 reposts · 0 likes · 42 views
Shannon Sands@max_paperclips·
Please stop doing this. Please. Stop naming your startup or product "Sidewalk" or "Lamp" or "Paintcan" or whatever word came up in scrabble for you last. At least the 2000s era naming convention of using a baby word like "Bloopa" or "Grumbli" or "Pookatoot" actually made you vaguely searchable and memorable
Shannon Sands@max_paperclips

@HSVSphere I just want people to stop naming their companies after common words that mean trying to search for them is pointless

31 replies · 8 reposts · 282 likes · 29.2K views
Σ@s1gmoid·
What would you do if you were the first inhabitant in a brand new universe with no existing inhabitants, only materia?
0 replies · 0 reposts · 0 likes · 53 views
Σ@s1gmoid·
@max_paperclips dude hits himself with a hammer, blames the hammer, and then writes a long article on X blaming everyone else except the guy who swung the hammer. this site is turning into linkedin
0 replies · 0 reposts · 3 likes · 51 views