Bassel Mabsout

288 posts

@bmabsout

Joined March 2012
172 Following · 62 Followers
Bassel Mabsout retweeted
Pulkit Agrawal @pulkitology·
Eka means unity: "one" in Sanskrit and "first" in Finnish. We're building intelligence for the physical world in its native language: forces.

Until now, robotics faced a tradeoff: generality or speed. The real world requires both. Robotics also faced a data problem. Our Vision-Force-Action (VFA) model, the first of its kind, breaks the generality-speed tradeoff and the data barrier. It's a new foundation uniting performance, generality, and safety for putting capable robots in everyone's hands.

Today, I am excited to share our journey of pushing robots beyond human limits. Today, dexterity becomes scalable. Today, I welcome you to the Era of Eka.

Co-founded with @haarnoja, and so thrilled and grateful to be working with a dream team at @EkaRobotics. Learn more: ekarobotics.com
Bassel Mabsout @bmabsout·
@aramh @asincole Will take a look at the talk! But is the wrapper issue not solved by having functions work on objects that are coercible to the datatype we want to work with? Or do you think coerce is not the right answer here?
Aram Hăvărneanu @aramh·
Because newtypes don't work when the old type is expected. You have to deal with what I call "the wrapper problem": as a programmer you have to juggle these wrappers of wrappers and put them together after taking them apart, instead of just writing code. I spoke at length about this problem in my appearance on youtube.com/watch?v=AfbwP9…
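The friction Aram describes, and the coercion escape hatch Bassel asks about, can be sketched in Python (an illustrative analogy of mine, not either author's example; `UserId`, `Meters`, and the helper functions are made-up names): an erased `typing.NewType` lets base-type functions accept the wrapped value for free, while a real runtime wrapper forces the take-apart/put-together dance.

```python
from dataclasses import dataclass
from typing import NewType

# Erased wrapper: a UserId *is* an int at runtime, so functions written
# against int accept it with no unwrapping -- coercion comes for free.
UserId = NewType("UserId", int)

def double(n: int) -> int:
    return n * 2

print(double(UserId(21)))  # 42

# Runtime wrapper: reusing plain numeric functions now means juggling --
# taking the wrapper apart and putting it back together at every call.
@dataclass(frozen=True)
class Meters:
    value: float

def add_meters(a: Meters, b: Meters) -> Meters:
    return Meters(a.value + b.value)  # unwrap, combine, rewrap

print(add_meters(Meters(1.5), Meters(2.5)))  # Meters(value=4.0)
```

The erased variant is roughly what Haskell's coerce gives you at zero cost; the runtime variant is the "wrappers of wrappers" case.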
Bassel Mabsout @bmabsout·
@aramh @asincole What's your opinion, then, on newtypes + newtype deriving and DerivingVia? Since the name of a datatype becomes the name of its unique instances, why isn't creating newtypes the right way to do modularization?
Aram Hăvărneanu @aramh·
Traits in Rust and type classes in Haskell are canonical: a type (or several) can only implement a trait (or class) in one way. This is global, hence anti-modular. In Lean and Agda, type classes or instance arguments are not global. You can have multiple, named implementations, and resolution happens in a scope, not globally. I prefer this much more, but I don't like it either, because the resolution mechanism is too difficult to predict and control. In OCaml, with modules, the best part is that everything is explicit (but that is also the worst part).
Bassel Mabsout @bmabsout·
@tritlo @ppavel24 What's the problem with letting it run until some configurable recursion limit? Even Java's silly type system is undecidable. If I mostly write types that can be inferred by Hindley-Milner, then it shouldn't hit those limits, right?
Matti Palli 🧙‍♂️
@ppavel24 inference in full dependent types is undecidable! They’re too expressive, so you run into the halting problem
Bassel Mabsout @bmabsout·
@HSVSphere Please tell me you're avoiding the million notions of overriding that Nix has. Also, it sounds like Nickel has similar goals in general; do you know how the languages compare?
HSVSphere @HSVSphere·
The ideals of Nix are almost perfect. The implementation sucks, and it's only really usable if you know why it is the way it is (and why it's badly implemented). That's why I'm working on Cab, and I'm taking my time thinking everything through before committing to an implementation.

It's not exactly a build language either. It doesn't specialize in any concept such as derivations, units, resources, or whatever. It only lets you compose expressions with contexts, the stuff that makes Nix magical in the first place.

What are expression contexts? They're when a subset of an expression implicitly carries the whole expression with it, as a context. It's how you don't explicitly specify what a Nix derivation depends on: it just works!

The problem with Nix, ignoring all the QoL stuff that's missing [1], is that Nix contexts can *only* be used for derivations. They're not generic. derivationStrict is a builtin function; you cannot emulate it in the language itself. That prevents you from using contextful expressions for other things, such as process management, or resource (like Terraform) management. Literally anything that forms a graph cannot be done cleanly! This is why a generic contextful-expression language is required.

Cab will fix this, and that won't be the only thing it will fix. Cab has structural types enforced with a super novel system that works super well for a dynamic build-system language. It has patterns, rather than hard-coding for identifiers. I'd argue that its type system is going to be more powerful than TypeScript's, since there is no runtime-comptime difference (it's all the same; there's no IO either, so it's simple).

Anyway, stay tuned for an MVP. I also plan on supporting using the Nixpkgs package set with the system I'm going to build on top of Cab (the Cull Build System), using the efforts the Ekala project has been going through, eventually.

However, I won't announce it until I have a working LSP, a unified documentation system that's flexible, a properly designed, hack-free "project" abstraction for the Cull Build System, and a fast runtime overall (bye bye, NixOS module system and home-manager eval times).

[1] Flakes are a bad solution to the purity problem, nix-* commands shouldn't exist, FODs are hard to use, nixConfig can pwn you, too many tunables, too little separation of concerns between the distro, the module system, Nix itself, and the nix daemon, literal oddities in Nix, much much much more.
Dillon Mulroy @dillon_mulroy (quoted):
yup, decided. i'm ditching nix. i still think that there is no better solution, but i only use a small portion of it and when it breaks it's too much work to justify
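HSVSphere's "expression contexts" can be sketched generically in Python (a toy analogy of my own, not Cab or Nix's actual implementation; `Ctx` and `artifact` are invented names): a value implicitly carries the set of artifacts it references, so composing values composes the dependency graph without ever declaring an edge.

```python
from dataclasses import dataclass, field

# A "contextful" string: the text plus the set of artifacts it mentions.
@dataclass(frozen=True)
class Ctx:
    text: str
    deps: frozenset = field(default_factory=frozenset)

    def __add__(self, other: "Ctx") -> "Ctx":
        # Composition unions contexts -- dependencies propagate implicitly.
        return Ctx(self.text + other.text, self.deps | other.deps)

def artifact(name: str) -> Ctx:
    # Referencing an artifact puts it into the context of the result.
    return Ctx(f"/store/{name}", frozenset({name}))

script = Ctx("exec ") + artifact("python") + Ctx(" ") + artifact("app.py")
print(script.deps)  # the graph edges fell out of ordinary composition
```

Nix restricts this mechanism to strings feeding derivations; the point of making it generic is that the same propagation works for any graph-shaped problem (processes, cloud resources, and so on).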

Bassel Mabsout @bmabsout·
@keenanisalive I can see how the cooling schedule captures the essence of simulated annealing, but it's supposed to be a gradientless metaheuristic. Is there some more general definition of simulated annealing that can be considered the global optimizer in conjunction with some local optimizer?
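One way to frame the question, as a minimal sketch (my own toy, with made-up functions and hyperparameters): the general form of simulated annealing is a Metropolis acceptance rule over an arbitrary neighborhood proposal, with the cooling schedule interpolating from global exploration to greedy local search, and no gradients anywhere.

```python
import math
import random

def anneal(f, x0, neighbor, t0=2.0, cooling=0.999, steps=5000, seed=0):
    """Gradient-free simulated annealing: always accept improvements,
    accept worsening moves with probability exp(-delta / T), and let a
    geometric cooling schedule shrink T from global to local search."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = f(y)
        delta = fy - fx
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # the cooling schedule
    return best, fbest

# A bumpy 1-D objective; only function values are available, no gradients.
f = lambda x: (x - 1) ** 2 + 2 * math.sin(5 * x)
best, fbest = anneal(f, x0=5.0, neighbor=lambda x, r: x + r.gauss(0, 0.5))
```

Replacing the single `neighbor` proposal with a full local optimizer recovers the "global wrapper around a local optimizer" reading of the question; basin hopping is the standard name for that variant.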
Keenan Crane @keenanisalive·
Mental map of Markov Chain Monte Carlo (MCMC) algorithms, and analogous machine learning (ML) algorithms [dashed = especially loose analogy]. Grey boxes are basic tools, and each arrow is annotated with the "delta" between algorithms.
Bassel Mabsout @bmabsout·
@n1mas_ @jsuarez Traditionally true, but with REDQ and CrossQ, and soon AQS ;), this is changing. I can get a hopper hopping within 20,000 samples, which is a far cry from the usual 1e6 samples this takes. Joseph's work makes iterating on the method faster, which is the bottleneck in such research.
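The sample-efficiency trick behind REDQ-style methods can be caricatured in a few lines (a toy of my own, nothing like the real algorithms): raise the update-to-data ratio, i.e. reuse each expensive environment sample for many cheap updates from a replay buffer.

```python
import random

# Toy caricature of the update-to-data (UTD) idea behind sample-efficient
# RL methods such as REDQ/CrossQ: reuse every environment sample for many
# cheap updates. The "critic" here is just a scalar mean estimate.
def train(utd_ratio: int, env_steps: int = 20, lr: float = 0.05, seed: int = 0):
    rng = random.Random(seed)
    q = 0.0          # scalar "critic": running estimate of E[reward]
    replay = []      # replay buffer of observed rewards
    for _ in range(env_steps):
        reward = rng.gauss(1.0, 0.1)   # one expensive environment interaction
        replay.append(reward)
        for _ in range(utd_ratio):     # utd_ratio updates per interaction
            r = rng.choice(replay)
            q += lr * (r - q)          # TD-style update toward the sample
    return q

# Same 20 environment samples; higher UTD extracts far more learning.
print(train(utd_ratio=1), train(utd_ratio=20))
```

The real methods need ensembles or normalization tricks to keep high UTD stable; this sketch only shows why reusing samples moves the sample-count bottleneck.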
Joseph Suarez 🐡 @jsuarez·
RL is useless... except if you want super-human perf on games, control, LLMs, chip design, rideshare matching, 5G, and more! It's also an area where you can make major progress with very few resources. Join PufferAI's open-source efforts at discord.gg/puffer or DM me!
Simo Ryu @cloneofsimo·
Am I the only one who finds it so weird that a lot of successful RL is based on evolutionary + gradient hybrids, but so much of deep learning optimization is purely greedy? Can anyone pinpoint exactly why this is, and how we can leverage more zeroth-order algorithms? btw we have stuff like this that might be a middle ground: arxiv.org/abs/1907.08610. @kellerjordan0 apparently had a good shot with this; irreplaceable with other optimizers in practice on his fastest CIFAR trainer. Food for thought, man. Everything points to AdamW being messed up, so why are we stuck with this shit? (btw our lab doesn't use an evolutionary hybrid, just PPO)
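The arXiv link Simo mentions (1907.08610) is the Lookahead optimizer; its mechanism can be sketched in a scalar toy (my own heavy simplification, not the paper's code): any inner optimizer takes k fast greedy steps, then "slow" weights interpolate part of the way toward where the fast weights ended, a mild middle ground between pure greedy descent and an outer search loop.

```python
# Scalar sketch of a Lookahead-style optimizer (after arXiv:1907.08610,
# heavily simplified): k fast inner SGD steps, then the slow weights move
# a fraction alpha toward the fast weights.
def lookahead_minimize(grad, w0, inner_lr=0.1, alpha=0.5, k=5, outer_steps=50):
    slow = w0
    for _ in range(outer_steps):
        fast = slow
        for _ in range(k):                 # k greedy SGD steps (fast weights)
            fast -= inner_lr * grad(fast)
        slow += alpha * (fast - slow)      # slow-weight interpolation
    return slow

# Minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w = lookahead_minimize(grad=lambda w: 2 * (w - 3), w0=0.0)
print(w)  # converges to ~3.0
```

The slow/fast split is what gives it a faint population flavor: the slow weights average over short greedy trajectories instead of following any single one.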
Bassel Mabsout @bmabsout·
@emil_priver @ryanwinchester Wait, what? What happens if you're merging two histories with different notes? Does it do a note merge conflict, or does it just take the one you're merging from, or something?
Emil Privér @emil_priver·
@ryanwinchester So 1 reason to use this is that you can update a note without rewriting history
Emil Privér @emil_priver·
Did you know that you could add notes to your git commits?
Emil Privér tweet media

Bassel Mabsout @bmabsout·
@VictorTaelin @Marc_Compere I love your approach. For scaling to harder problems, have you thought about how to know which parts of the function space to "focus" on? Gradient descent gets to know which part of the function is more important to change; I feel this is important for efficient search!
Taelin @VictorTaelin·
The whole point of gradient descent is that it is a fast way to find functions. That's all. The problem is that, to use it, we must accept the limitations of the underlying architecture.

Attention is nothing but a terrible programming language, where the only primitive is querying a neural dictionary. Imagine implementing a website using a Python where the only structure is a neural dictionary. It would be... hard. And that's the language GPTs have to work with! So, while GD is great at finding functions, it finds them in a crappy programming language that is, in turn, limited in many ways. My hypothesis is that all the well-known limitations of current AI models are inherited from this lack of expressivity of attention.

Now, what if we could search functions in a real programming language, as fast as GD finds them under attention? IF that was the case, then transformers would be entirely obsolete, and we could train a model capable of doing all the things that current LLMs are notoriously bad at.

Just to be clear, I'm not claiming that what I posted is that. But it SEEMS to be the optimal way to find unknown functions (in a very deep theoretical sense), and, because of that, I suspect it could play a role in a different AI architecture that isn't restricted or bound by the limitations of attention. And that could result in more competent models, in the sense they'd be able to comfortably do things that GPTs struggle with.
Taelin @VictorTaelin·
THE ALGORITHM IS COMPLETE 🥹

Finding XOR-XNOR:
Haskell: 2.8s
HVM: 0.0085s

Based on the following tests:
f(00100011) = 1011
f(10111001) = 0100

Solving for 'f' by search, we find:
xor_xnor (0:0:xs) = 0 : 1 : xor_xnor xs
xor_xnor (0:1:xs) = 1 : 0 : xor_xnor xs
xor_xnor (1:0:xs) = 1 : 0 : xor_xnor xs
xor_xnor (1:1:xs) = 0 : 1 : xor_xnor xs

My best Haskell searcher, using the Omega Monad, takes 47m guesses. Meanwhile, the HVM searcher, using SUP Nodes, takes just 1.7m interactions, or 0.03 interactions per guess (!!!). This sounds too good to be true, so, before getting too excited, keep in mind *it is very very likely I'm doing something dumb*. As such, I request validation. FP nerds: prove me wrong? (pls)

I've published the Haskell code (and the full story, for those interested) below. Am I missing something? Is there some obvious way to optimize this Haskell search without changing the algorithm? If so, I'd love to hear it. Better embarrassed than pursuing the wrong idea 😅

Gist: gist.github.com/VictorTaelin/7…
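For a feel of what "solving for f by search" means, here is a brute-force toy of my own (with none of HVM's SUP-node sharing), reading each of Taelin's examples as mapping input bit pairs to single output bits and enumerating all 16 possible truth tables:

```python
from itertools import product

# Brute-force program search over the smallest possible space: every
# function from a bit pair to one output bit (2^4 = 16 truth tables),
# keeping those consistent with the two input/output examples.
examples = [("00100011", "1011"), ("10111001", "0100")]

def apply_table(table, bits):
    pairs = [(int(bits[i]), int(bits[i + 1])) for i in range(0, len(bits), 2)]
    return "".join(str(table[p]) for p in pairs)

solutions = []
for outs in product((0, 1), repeat=4):
    table = dict(zip([(0, 0), (0, 1), (1, 0), (1, 1)], outs))
    if all(apply_table(table, i) == o for i, o in examples):
        solutions.append(table)

print(solutions)  # the unique survivor is XNOR: 1 exactly when bits agree
```

The interesting part of Taelin's claim is not the enumeration itself but the cost per guess: naive search pays full price for every candidate, while the HVM approach shares work across candidates.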
Taelin @VictorTaelin·
@bmabsout that's the dream! but the tech is not there yet. I don't think we'll reach a huge audience anytime soon, but we must start somewhere. running high-level langs on GPUs is *hard*. with this release, we finally have something stable enough to be used. it is the very first step!
Taelin @VictorTaelin·
so, when we get HOC's @ back, I'll make a proper post, but this might take 72h. for now, I'll nonchalantly announce that higherorderco dot com is up. years of research to put python inside gpus, and, there it is. or something kinda like it. I'll rest for now. see you soon 🥳
Bassel Mabsout @bmabsout·
@AndreaVicere @getjonwithit I think you're confusing ball and sphere: "sphere" usually refers to the surface itself, while "ball" refers to the space inside a sphere. So the boundary of a 3D ball is a 2D spherical surface.
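The standard definitions, for reference (my notation):

```latex
B^{n} = \{\, x \in \mathbb{R}^{n} : \lVert x \rVert \le 1 \,\}
\qquad
S^{n-1} = \partial B^{n} = \{\, x \in \mathbb{R}^{n} : \lVert x \rVert = 1 \,\}
```

So the boundary of the 3-ball $B^3$ is the 2-sphere $S^2$, which is Bassel's point.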
Jonathan Gorard @getjonwithit·
"The boundary of a boundary is always empty." A huge amount of (classical) physics, including much of general relativity and electromagnetism, can be deduced directly from this simple mathematical fact. Yet, on the surface, it doesn't seem to have much to do with physics. (1/10)
Jonathan Gorard tweet media

Bassel Mabsout @bmabsout·
@VictorTaelin I do believe in evolution + grad descent though; I think their combination is more powerful than either alone
Taelin @VictorTaelin·
@bmabsout It is not efficient if the space of effective programs looks like this
Taelin tweet media

Taelin @VictorTaelin·
Guys I must be actually insane because I non-ironically think HOC will have AGI before OAI, and for reasons that seem so obvious to me? As in, if you throw a rock up, the rock will fall down. If you simulate evolution selecting for intelligence, with mass compute... 🧐
Bassel Mabsout @bmabsout·
@VictorTaelin True, that's what the loss space looks like if you're looking for one effective program, but not if you have 10,000 programs and you're finding a function that makes programs that slightly improve on the total effectiveness
Bassel Mabsout @bmabsout·
@VictorTaelin Gradient descent is very efficient though: it allows us to at least locally attribute how every variable in the system contributes to solving the task at hand. I can't just know which lambda term to change to produce a better output. Also, memetic algorithms are evolution + grad descent
Taelin @VictorTaelin·
just to be clear, I'm not saying I'm better than anyone; just that it makes so much logical sense that evolving intelligence by iterating small λ-terms would be much more efficient than the slow-gradient-descent-over-colossal-matrices approach. I just wonder where this is wrong
Bassel Mabsout @bmabsout·
@ereb0s_labs Might be the case that eventually GPT-2 will converge to a lower loss, but it sounds like you've got a paper on your hands! It might also be a function of hyperparameter tuning, since you've probably trained your proposed model multiple times on this dataset as you iterated
ereb0s @ereb0s_labs·
I'm not sure I have enough reach with this account to get a proper response. I need help understanding if I can trust the results I'm seeing for a custom LLM model I've been researching.
Yellow - GPT2
Purple - custom model
More info below.
ereb0s tweet media

Bassel Mabsout @bmabsout·
@cs_kaplan @akivaw @jrdnfrd @keenanisalive Unless you allow for an infinite number of tiles, I don't see how a finite subsection can ever encode an infinite coordinate. Isn't it guaranteed that the number of possible configurations of a finite number of tiles in a finitely sized region is finite?
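Bassel's counting argument in one line (a toy upper bound of my own; real tilings have geometric constraints that only shrink the count): with a finite number of tile states per cell and a finite region, a patch admits finitely many configurations, so a bounded patch can distinguish only finitely many locations.

```python
# Upper bound on distinct local patterns: each of `cells` positions holds
# one of `tile_states` choices (tile type or orientation), so a finite
# patch has at most tile_states ** cells configurations -- finite, hence
# unable to encode an unbounded coordinate by itself.
def max_configurations(tile_states: int, cells: int) -> int:
    return tile_states ** cells

# Illustrative numbers (my assumption): a single tile shape placed in one
# of 12 orientations (6 rotations x 2 reflections), over a 9-cell patch.
print(max_configurations(tile_states=12, cells=9))
```

The aperiodic-tiling claim is weaker than "encoding an infinite coordinate": larger and larger patches reveal more and more positional information, even though any fixed-size patch reveals only a finite amount.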
Keenan Crane @keenanisalive·
Pretty awesome discovery: a single shape that tiles the infinite plane without repetition. If you're staring straight down at a checkerboard, there's no way to tell where you are: every part looks the same. But here, the relative arrangement of tiles encodes your location.