Richard Sutton

424 posts

@RichardSSutton

Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award

Edmonton, Alberta, Canada · Joined October 2010

59 Following · 60.6K Followers

Pinned Tweet

Richard Sutton@RichardSSutton·20 Jul

AI researchers seek to understand intelligence well enough to create beings of greater intelligence than current humans. Reaching this profound intellectual milestone will enrich our economies and challenge our societal institutions. It will be unprecedented and transformational, but also a continuation of trends that are thousands of years old. People have always created tools and been changed by them; this is what humans do. The next big step is to understand ourselves. This is a quest grand and glorious, and quintessentially human.

Richard Sutton retweeted

sorina@robot_in_space2·3d

Our Open Ant is open-source! Have fun with it! 😊 github.com/Openmind-Resea…

Richard Sutton@RichardSSutton·4d

If you are interested, you can learn a bit more about me from this video portrait from the Heidelberg Laureates Forum: youtu.be/jRPR6lx-iuw?si…

Richard Sutton retweeted

Joseph Modayil@JosephModayil·2 May

A recent paper answered a question I had for over twenty years: how does a brain organize the sense of smell? This mouse study shows a 1 dimensional spatial code gives a brain map for ~1000 different smell sensors. This raises so many more questions. cell.com/cell/fulltext/…

Richard Sutton retweeted

Furkan Gözükara@FurkanGozukara·2 May

Amazing. No words to describe this tune. Emotional and to the point. New Iranian LEGO movie : We Share the Same Pain 💔 😭🥹 via Brick Beat Battalion

Richard Sutton@RichardSSutton·2 May

I am definitely going to this...

RL in Big Worlds@rlc_bigworlds

We are proud to have an amazing line-up of speakers! They will present their works, which incorporate the constraint that the world is bigger than the agent and impossible to anticipate, observe, or model perfectly. We are also looking forward to the panel discussion!

Richard Sutton@RichardSSutton·2 May

RL Ethics has a Predictive Semantics

I would like to try to explain the view of ethics and values that arises from my research in reinforcement learning in simple, layman’s terms that are accessible to all. Reinforcement learning agents seek to maximize their reward over time, where reward is essentially pleasure minus pain. This is not quite hedonism, because the maximization takes into account all the consequences, long-term as well as short. A reinforcement learning agent might endure pain to get a larger pleasure later, or forgo an immediate pleasure if it stored up a later, greater pain.

Formally, reward is a number at each time step, and the reinforcement learning agent seeks to maximize value: the sum of the rewards at future time steps. (This could be defined precisely with some math.) The assignment of rewards to time steps is a free choice that defines the agent’s goal; different agents could have different rewards, and there is no basis (yet) for preferring one set of rewards over another. Value though is a different matter. Given a world and a way of generating rewards, the true values at each time step are fully determined. The rewards are primary, dependent on nothing else, whereas the values are secondary, following from the rewards (and the dynamics of the environment). In decision making, the agent should make the choice that leads to highest immediate value, not highest immediate reward.

If rewards are arbitrary, values follow from the rewards, and correct behavior follows from the values, then all seems straightforward. What about all the complexities and controversies of ethics? Some of these are still present, arising because the values, though well defined, are initially unknown and can be difficult to calculate or learn. If the agent has knowledge of the world, then it may be able to calculate the values, but to do so exactly generally requires too much knowledge, computation, and memory. In practice, in new situations the calculation must be done partly at decision time, and cannot be done to completion without slowing down action selection too much. In the absence of knowledge and computation, but given a generous allocation of memory and time, the agent can alternatively learn the values, again approximately. It is common for the agent to store an approximation to the values of the world’s states, and then to gradually improve these approximations (these predictions of subsequent rewards) through further experience.

The stored approximate values are immediately available estimates of the desirability of situations; they are directly analogous to our intuitive sense of good and bad. They are ready for immediate use, but may only be rough approximations to the true values. They may be made more accurate with calculation (if the agent has knowledge) or learning (with more experience).

This completes the explication of the value system of the individual. Next we will go on to consider the value systems of groups. But the individual forms an essential foundation that is never replaced, so let’s dwell on it a moment longer by reviewing its stark tenets: Each agent wants to get pleasure (reward) from the world. Pleasure is built in to the agent and obvious when it happens, but when it will happen depends on the world and must be learned or calculated, and the world is too complex for either of these methods to yield answers that are completely correct. That is, every state of the world has a real, objective value (the amount of pleasure that will follow it), but estimates of its value are subjective. Forming better value estimates is a major cognitive task. They are a key intermediate step towards getting more pleasure from the world. Agents work on this all the time. It determines what they do.

If an agent lived alone, then this would be the end of our discussion of values and ethics. But people are not solo agents. People’s worlds are composed, in part, of other people, and this has many impacts on their attempts to estimate value and obtain reward. They live within groups of agents with whom they interact frequently and who are major determinants of their success in obtaining reward. And thus, to achieve our reward, each of us must take into account, as best we are able, the rewards and values of those around us. …

The most important insight is that it's alright, and perhaps obligatory, for the ultimate value to be hedonic (based on reward), as long as it is not "selfish" (disregarding the impact on others). The ultimate meaning of something being good, or right, or ethical, or moral, is that it will probably have a good outcome for the individual. Whether it will or not is extraordinarily difficult to calculate, so instead we use heuristics: approximations using features of a situation. The mistake is to think that those features are definitional rather than approximately predictive. The real definitional meaning of good is that it turns out well for us on average.
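
Editor's illustration (not part of the original post): the distinction drawn above between reward and value, and between true values and learned estimates, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the four-situation toy world, the names `TRANSITIONS`, `true_value`, `learn_values`, and `best_choice`, the undiscounted sum of rewards, and the TD(0)-style update with a fixed step size. It is a minimal sketch of the idea, not the author's formulation.

```python
# Hypothetical toy world: each situation has one reward and one successor.
# TRANSITIONS[state] = (reward, next_state); None marks the end of the episode.
TRANSITIONS = {
    "indulge": (1.0, "hangover"),   # immediate pleasure ...
    "hangover": (-3.0, None),       # ... followed by a larger pain
    "endure": (-1.0, "payoff"),     # immediate pain ...
    "payoff": (5.0, None),          # ... followed by a larger pleasure
}


def true_value(state):
    """Exact value: the sum of all rewards that follow from `state`."""
    total = 0.0
    while state is not None:
        reward, state = TRANSITIONS[state]
        total += reward
    return total


def learn_values(episodes=200, step_size=0.1):
    """Estimate the values from repeated experience (TD(0)-style update)."""
    estimates = {s: 0.0 for s in TRANSITIONS}
    for _ in range(episodes):
        for start in ("indulge", "endure"):
            state = start
            while state is not None:
                reward, nxt = TRANSITIONS[state]
                # The estimate of the next situation stands in for the
                # rewards not yet seen; it is 0 once the episode has ended.
                next_estimate = estimates[nxt] if nxt is not None else 0.0
                target = reward + next_estimate
                estimates[state] += step_size * (target - estimates[state])
                state = nxt
    return estimates


def best_choice(values):
    """Choose by value (long-term outcome), not by immediate reward."""
    return max(("indulge", "endure"), key=lambda s: values[s])


if __name__ == "__main__":
    learned = learn_values()
    for s in TRANSITIONS:
        print(s, "true:", true_value(s), "learned:", round(learned[s], 3))
    print("choice:", best_choice(learned))
```

In this toy world "indulge" has the higher immediate reward (+1 against -1) but the lower true value (-2 against 4), so an agent that chooses by value picks "endure". The learned estimates start at zero and approach the true values only gradually, which mirrors the post's point that stored values are rough, improvable predictions of subsequent reward.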

Richard Sutton retweeted

Khurram Javed@kjaved_·27 Apr

It is good to have a well-funded place that understands that intelligence is not about distilling human knowledge and skills into neural networks. Most of the current AI systems, while amazing, outsource discovery of knowledge to humans. Systems that learn from human knowledge, in the limit, will be able to know and do everything that humans can do currently. These systems are immensely valuable, but they will be continually superseded by humans who can both use these systems and learn from their own experience. Systems that learn from their own experience, in the limit, will be able to do things that no human can do now and no human would even be able to do. The latter systems would be vastly more powerful.

Ineffable Intelligence@IneffableLabs

Introducing Ineffable Intelligence. Led by David Silver, we're assembling the best engineers and researchers in the world to make first contact with superintelligence. We’ll be solving the hardest problems in AI on the way. Come join us. ineffable.ai

Richard Sutton retweeted

Khurram Javed@kjaved_·20 Apr

The naive way to look at the future of AI is to look at numbers like network parameter count, transistor density, and benchmarks, and reason about the future solely from these numbers. This way prevails because it doesn't require thinking about the details of computer architectures and algorithms. It leads to reductionist arguments like "any marginal compute given to an adversary would be devastating because is dangerous." The reality, as Jensen points out multiple times, is more complex. Unless there is an unprecedented breakthrough in chip manufacturing, the path to better AI is through better learning algorithms and architectures. This has been the trend for the last five years and will likely continue to be so. I know from my own work that you can get 100x to 1000x gains in computational efficiency by using better learning algorithms. Jensen also correctly argues that an ASIC that runs a specific model efficiently is not a replacement for the CUDA stack. I have been working with non-traditional algorithms over the past few years (sparse event-driven neural networks), and the specialized ASICs (including Nvidia's Tensor Cores) are largely useless. CUDA cores, on the other hand, are extremely good at running these algorithms even though they were not designed for them. In a world where CUDA didn't exist, CPUs would be the best computing platform for discovering better algorithms, not TPUs. You cannot achieve human-like learning at human-like energy consumption simply by improving chip manufacturing. The right algorithms running on the right chip made by the 7 nm process would vastly outperform the current algorithms running on the best chips made by the 2 nm process.

Dwarkesh Patel@dwarkesh_sp

The Jensen Huang episode. 0:00:00 – Is Nvidia’s biggest moat its grip on scarce supply chains? 0:16:25 – Will TPUs break Nvidia’s hold on AI compute? 0:41:06 – Why doesn’t Nvidia become a hyperscaler? 0:57:36 – Should we be selling AI chips to China? 1:35:06 – Why doesn’t Nvidia make multiple different chip architectures? Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!

Richard Sutton retweeted

James Melville 🚜@JamesMelville·14 Apr

Lebanon is being destroyed. 1.2 million people, including 400,000 children, forced out of their homes by bombardment. Towns and villages flattened. Innocent civilians killed. Parents grieving over their lost children. There is no justification for this.

Richard Sutton retweeted

Bernie Sanders@BernieSanders·14 Apr

This week, I will be forcing a vote to block nearly $500 million in bombs and bulldozers to Israel. Enough is enough. U.S. taxpayers must not keep funding the Netanyahu government’s mass killing and displacement of civilians in Gaza, Iran and Lebanon.

Richard Sutton retweeted

Runas Dos Lunas@DosRunas·15 Apr

🔴 In a gesture of historic firmness that resonates amid the existential chaos of the conflict, Italian Prime Minister Giorgia Meloni has raised her voice from the UN rostrum with a clarity that cuts through the air. “I accuse Israel of having crossed the red line. I condemn without reservation the massacre of Palestinian civilians, and I announce that Italy will support European sanctions against the Israeli state,” she declared with determination. This statement marks a brave and necessary shift in the position of a government that, until now, had maintained close ties with Tel Aviv. Meloni not only questions the proportionality of the military operations but openly denounces the violation of the most elementary humanitarian norms, that barbarity which turns entire neighborhoods into tombs and childhood dreams into rubble. She speaks of a “strage,” an unacceptable slaughter that can no longer be ignored under the cloak of “self-defense” when the price is paid in the blood of innocents. Italy's decision to back restrictive measures at the European level, together with the suspension of the automatic renewal of the defense cooperation agreement, reveals a stance that puts human dignity above comfortable strategic alliances. It is an act of ethical coherence in a world where complicit silence has become all too frequent. Women with spines of steel, yes. Because it takes a backbone forged in the hermeneutics of others' pain, in the phenomenology of a suffering that can no longer be normalized, to look power in the face and say: “This far and no further.”
In these times of imperial darkness, where international law folds like paper before brute force, gestures like this one from Meloni remind us that true sovereignty lies not in obedience to hegemonic blocs but in the uncompromising defense of life, of justice, and of that deep humanism that makes us, even in the midst of horror, keep believing in the possibility of a less cruel world. May this voice inspire more leaders, whether of the left or the right, to break the silence, and may the song for peace in Palestine not fade. Because every murdered civilian is an interrupted symphony, and all of humanity deserves that we sing it again in full. 🔥

Richard Sutton@RichardSSutton·13 Apr

George Carlin was a prescient genius. youtu.be/B5xGPU1QWok?si…

Richard Sutton@RichardSSutton·11 Apr

Big world => runtime learning

RL in Big Worlds@rlc_bigworlds

RL in Big Worlds is a workshop at @RL_Conference about ideas that enable agents to achieve goals in environments vastly more complex than themselves. This requires giving agents the ability to learn continually and use approximate value functions, models and policies effectively.

Richard Sutton retweeted

François Chollet@fchollet·7 Apr

One thing about DL researchers that has always been surprising to me is that a lot of them have never been exposed to forms of learning other than fitting the parameters of a curve via gradient descent, and are even unable to conceive that there might exist other options.

Richard Sutton retweeted

Bernie Sanders@BernieSanders·6 Apr

While the world focuses on the destruction in Iran, we must not ignore what Israel is doing in Lebanon. 1,461 have been killed. 4,430 have been injured. 1.2 million have been displaced. Israel now occupies 14% of Lebanon. Enough is enough. No more US military aid to Israel.

Richard Sutton retweeted

Ray Dalio@RayDalio·7 Apr

x.com/i/article/2041…

Richard Sutton retweeted

Massimo@Rainmaker1973·7 Apr

This might be one of the most detailed Moon images ever captured: 1000 frames stacked using a Nikon Z8 and a Takahashi TSA-120 telescope, producing a stunning 40MP masterpiece.