Shayan Mohanty

1.7K posts

Shayan Mohanty

@shayanjm

Chief Data & AI Officer @Thoughtworks. Previously @watchfulio (Acq. TWKS) @Facebook.

San Francisco · Joined July 2011
1.1K Following · 2.6K Followers
Shayan Mohanty retweeted
Thoughtworks
Thoughtworks@thoughtworks·
We’re excited to welcome @shayanjm as our new Chief Data and AI Officer. A seasoned AI leader from Watchful, Facebook & Los Alamos Lab, Shayan will also lead global Data & AI service line, helping clients turn AI ambition into production-ready solutions: ter.li/exboac
[image]
0
4
2
1.3K
martin_casado
martin_casado@martin_casado·
Hey infra folks. We're standing up a new Discord server to discuss CS infra. If you want an invite DM me (reply and I'll follow). thanks!
949
32
970
171.9K
Shayan Mohanty retweeted
Crémieux
Crémieux@cremieuxrecueil·
A new paper in Nature found that you cannot, in fact, train AIs on AI-generated data and expect them to continue improving. What happens is actually that the model collapses and ends up producing nonsense.
[image]
633
2.3K
16K
1.9M
Shayan Mohanty retweeted
François Chollet
François Chollet@fchollet·
If you want to demonstrate that it is impossible to do X (here, X="spot the bot"), you shouldn't ask random people to do X, you need to ask people who actually know how to do X (experts) and see how the *best* of them perform. Otherwise you get results like "22% of our judges cannot tell the difference between ELIZA and a human"
Cameron Jones@camrobjones

People judged GPT-4 to be human 54% of the time, compared to 22% for ELIZA and 67% for humans. The implication is that people are at chance in determining that GPT-4 is an AI, even though the study is powerful enough to detect differences from 50% accuracy.

20
40
339
113.3K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@semil Nduja is a criminally underrated topping. Also plz drop the recipe for the cocktail, it sounds delicious
0
0
0
47
Semil
Semil@semil·
49ers game snacks & cocktail from yesterday - made chorizo filled bacon-wrapped jalapeño poppers; crostini for nduja spread (someone gifted to me, never had this before - wow); mandarin-honey-sake cocktail.
[4 images]
2
1
14
3.1K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop Think about how many times vision evolved organically in nature. How many times has language evolved organically?
0
0
1
41
dr. jack morris
dr. jack morris@jxmnop·
An amazing mystery of machine learning right now is that state-of-the-art vision models are ~2B parameters (8 gigabytes) while our best text models are ~200B parameters (800 gb) why could this be? philosophically, are images inherently less complicated than text? (no right?)
348
107
1.5K
435.2K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop On second thought - a less cop-out way might be through spectral decomposition. That way you're focusing more on aligning the geometry of the spaces, rather than learning an entirely new space.
1
0
1
47
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop maybe throw some adversarial training at it? If you have a dump of mixed embeddings, train a generator that tries to align the spaces, and a discriminator to predict if it's an original or aligned. Feels like a cop-out answer but would probably work maybe?
1
0
2
521
dr. jack morris
dr. jack morris@jxmnop·
As an exercise in open science, gonna tweet the research problem I'm stuck on: I want to align two text embedding spaces in an unsupervised way.

The motivation is that in my previous vec2text work, we have to know the embedding model and be able to query it. This is fine in today's world where most people use OpenAI ada embeddings, but when people move on to a better model, my inversion models won't work anymore. So I want to take embeddings from an *unknown* embedder and map them somehow to a space I know, like the OpenAI embedding space, then decode them.

Sounds hard, right? It definitely is. But my crazy idea is that all text embedding models are learning something very similar, embeddings lie on a low-dimensional manifold, and so given enough samples we should be able to align them. This is supported by some past research on unsupervised bilingual word embedding alignment (which works really well!) and also this fascinating line of research on "relative representations," where representing embeddings by their distances to known anchor points makes embeddings compatible between different spaces.

So I learned there's this whole class of problems called "optimal transport" that's exactly this: the mathematical study of how to find the optimal mapping between two vector spaces. Sounds perfect, right? Sadly it doesn't work very well, at least out of the box. Given a thousand paired samples from two different embedding models A and B, the Sinkhorn algorithm can get about 1% accuracy (10x above random). Gromov-Wasserstein, which tries to preserve cosine similarity, can do a little bit better. If I use embeddings from two models from the same family, I can get 20%.

I tried using relative representations. This requires 100 or so paired anchor points from both embedders, which is also a bottleneck. But using 100-dim relative representations, Sinkhorn gets 70% accuracy with no hparam tuning, which is pretty good. But no one has figured out how to find anchor points without any supervision yet (although I think it's probably possible).

Also, a supervised linear mapping between the two embedding spaces works super well, can get 90%+ accuracy, and I can invert the remapped embeddings with pretty good BLEU score, but that's cheating too (also the true mapping is certainly nonlinear). Both these algorithms again require paired samples, which is unrealistic.

I want to be able to invert a random database of text embeddings without any paired samples. With enough entries I think it should be possible, just like we can infer an arbitrary substitution cipher if we have enough encrypted data.

Anyway, that's my progress so far! I am now extremely stuck. If you have any ideas please message me or reply to the thread.
51
22
314
71.8K
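The "relative representations" idea from the thread above can be sketched in a few lines of numpy. This is a toy, not the thread's actual setup: an orthogonal rotation stands in for the "unknown embedder," and the anchors are simply the first rows of each space (the paired-anchor supervision the thread calls a bottleneck). Because cosine similarity is rotation-invariant, describing each point by its similarities to the anchors makes the two spaces directly comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: space B is space A under an unknown rotation, a stand-in
# for "two embedding models that learned similar geometry".
n, d = 200, 32
A = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # unknown orthogonal map
B = A @ Q

# Relative representations: describe each point by its cosine
# similarity to a small set of paired anchor points (here, the first
# k rows of each space -- hypothetical anchors for illustration).
k = 20

def rel(X, anchors):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

RA = rel(A, A[:k])
RB = rel(B, B[:k])

# Nearest-neighbour matching in relative-representation space.
D = np.linalg.norm(RA[:, None, :] - RB[None, :, :], axis=2)
pred = np.argmin(D, axis=1)
acc = np.mean(pred == np.arange(n))
print(f"matching accuracy: {acc:.2f}")
```

In this toy the accuracy is perfect, because an exactly orthogonal map preserves cosine similarity exactly; real embedding models only approximately share geometry, which is why the thread reports 70% rather than 100%.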
dr. jack morris
dr. jack morris@jxmnop·
Fun research story about how we jailbroke the ChatGPT API.

So every time you run inference with a language model like GPT-whatever, the model outputs a full probability distribution over its entire vocabulary (~50,000 tokens). But when you use their API, OpenAI hides all this info from you and just returns the top token -- or at best, the top 5 probabilities. We needed the full vector (all 50,000 numbers!) for our research, so we developed a clever algorithm for recovering it by making many API calls.

Important to know is that the API supports a parameter called "logit bias" which lets you upweight or downweight the probability of certain tokens. Our insight was that we could run a binary search on the logit bias for each token to find the exact value that makes that token most likely, yielding the relative probability for that token. To get a full next-token probability vector, we run 50,000 binary searches (it's actually not as expensive as you'd think) -- shout out to @justintchiu for coming up with this and implementing it efficiently!

And there's a bonus level: in the setting where OpenAI gives us the top-5 logprobs (available for some models), there's a much more efficient algorithm with a pretty elegant solution. In this setting, to get the probability for a certain token, you just add a really large fixed logit bias to it. Given its new probability (which OpenAI will give you, since that token will be in the top 5 now), you can solve for its original probability in closed form. Since in this setting OpenAI provides probabilities for the top 5 tokens in a single API call, and we only have to run one call per token, this new method lets you get the full vector in 50,000/5 = 10,000 queries.

Funnily enough, after we posted the code for the binary search algorithm, we got an email from fellow researcher @mattf1n with the math for the top-5 algorithm. And he followed it up with a pull request. Nice guy!

If you thought this was interesting:
- want to run the algorithm yourself? check out the code here: github.com/justinchiu/ope…
- want to read about it? see Section 5 of our paper, Language Model Inversion: arxiv.org/abs/2311.13647
15
63
591
124.3K
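The closed-form step in the thread above is just softmax algebra, and it is easy to verify numerically. The sketch below simulates the hidden distribution locally (no real API involved); `api_prob_with_bias` is a hypothetical stand-in for reading a biased token's probability out of the top-5 logprobs. If p' is the probability observed after adding logit bias b, the original probability is p = p' / (e^b · (1 - p') + p').

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "hidden" next-token distribution the API won't show us in full.
V = 1000
logits = rng.normal(size=V)
p_true = np.exp(logits) / np.exp(logits).sum()

def api_prob_with_bias(token, bias):
    """Simulates the API: the token's probability after a logit bias is
    applied (in the real setting you'd read this from top-5 logprobs)."""
    biased = logits.copy()
    biased[token] += bias
    return np.exp(biased[token]) / np.exp(biased).sum()

# Add a large fixed bias, observe the new probability, invert in
# closed form: p = p' / (e^b * (1 - p') + p')
b = 10.0
token = 123
p_obs = api_prob_with_bias(token, b)
p_recovered = p_obs / (np.exp(b) * (1.0 - p_obs) + p_obs)

print(p_true[token], p_recovered)
```

The two printed values agree to floating-point precision, since the inversion formula is exact; with a real API the error budget would instead come from the precision of the returned logprobs.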
Shayan Mohanty
Shayan Mohanty@shayanjm·
@aashaysanghvi_ Right now, you have to deal with a ton of non-determinism. Over time, a lot of it will be abstracted away in layers. Lots of research on this topic -- we've tried to quantify and categorize the non-determinism here: watchful.io/blog/decoding-…
0
0
1
55
Aashay Sanghvi
Aashay Sanghvi@aashaysanghvi_·
Key variable for AI product builders: do you accept the role of non-determinism and build with it as a given? Or is the system eventually scoped and validated in a way where that's not the case?
5
0
8
2.7K
BURKOV
BURKOV@burkov·
GPT-4 is officially annoying. You ask it to generate 100 entities. It generates 10 and says "I generated only 10. Now you can continue by yourself in the same way." You change the prompt by adding "I will not accept fewer than 100 entities." It generates 20 and says: "I stopped after 20 because generating 100 such entities would be extensive and time-consuming." What the hell, machine?
499
217
4.9K
1.5M
Shayan Mohanty
Shayan Mohanty@shayanjm·
@visarga Hmm, what input did you use? Seems like it was somehow malformed for the OpenAI API.
0
0
0
10
Shayan Mohanty
Shayan Mohanty@shayanjm·
🚀 New year, new research drop: outperforming GPT-4 in synthetic text generation! Our new approach combines geometry and latent space magic to produce data that is faithful to the real stuff, and orders of magnitude cheaper to produce. Check out the full paper here: watchful.io/blog/navigatin… #NLP #AI #GenAI
6
19
98
31.5K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@ShumingHu Try playing a bit with the cone angle & height. We try our best to estimate an optimal set of params but it sometimes is a bit off the mark. We did try this on sarcastic onion titles which is a similar problem, so it _should_ generate something reasonable if params are tuned
0
0
1
1K
Shuming Hu
Shuming Hu@ShumingHu·
@shayanjm Maybe it's limited to the embedding, it wasn't able to get these are dad jokes unlike chatgpt.
[2 images]
1
0
2
563
Shayan Mohanty
Shayan Mohanty@shayanjm·
@ocolegro Try grabbing a few rows from a dataset -- or playing with the angle + cone height to get it to provide more varied outputs. Chances are it's overfitting really hard because you only gave 1 input, rather than several.
0
0
2
542
Owen Colegrove
Owen Colegrove@ocolegro·
@shayanjm Sounds really cool, what sample text should I use? I tried “what is the meaning of life?”, and it just repeated that back to me 10x times
1
0
1
708
Shayan Mohanty retweeted
dr. jack morris
dr. jack morris@jxmnop·
this was a neat little read. the idea is to do data augmentation with embeddings. they randomly sample around an embedding and then decode with vec2text. there is a trick to randomly sampling while not leaving the embedding manifold; they try to sample within an embedding "cone"
Shayan Mohanty@shayanjm

🚀 New year, new research drop: outperforming GPT-4 in synthetic text generation! Our new approach combines geometry and latent space magic to produce data that is faithful to the real stuff, and orders of magnitude cheaper to produce. Check out the full paper here: watchful.io/blog/navigatin… #NLP #AI #GenAI

4
11
125
25.1K
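The "embedding cone" sampling described above can be sketched geometrically: pick a random direction orthogonal to the embedding, then rotate the embedding toward it by an angle no larger than the cone's half-angle. This is only an illustration of the idea; the paper's actual parameterization (its cone angle and height parameters) may differ, and `max_angle_rad` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_in_cone(e, max_angle_rad, n_samples, rng):
    """Sample unit vectors whose angle to embedding `e` is at most
    `max_angle_rad` -- a sketch of sampling inside an embedding cone."""
    e = e / np.linalg.norm(e)
    d = e.shape[0]
    out = []
    for _ in range(n_samples):
        # Random direction orthogonal to e.
        v = rng.normal(size=d)
        v -= (v @ e) * e
        v /= np.linalg.norm(v)
        # Rotate e toward v by an angle drawn from within the cone.
        theta = rng.uniform(0.0, max_angle_rad)
        out.append(np.cos(theta) * e + np.sin(theta) * v)
    return np.stack(out)

e = rng.normal(size=64)
samples = sample_in_cone(e, max_angle_rad=0.2, n_samples=10, rng=rng)

# Every sample stays inside the cone: cosine to e is at least cos(0.2).
cosines = samples @ (e / np.linalg.norm(e))
print(cosines.min())
```

Each sampled vector would then be decoded back to text with something like vec2text; the cone keeps the perturbations close enough to the original embedding to stay near the data manifold.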
Shayan Mohanty
Shayan Mohanty@shayanjm·
@thomasahle Generally yeah - the problem is when you prompt you can't really anticipate how the merge is going to work
0
0
1
28
Thomas Ahle
Thomas Ahle@thomasahle·
@shayanjm Fun approach to merging! Does it work better than just giving both inputs to a different LLM and asking it politely to please merge?
1
0
1
132
Shayan Mohanty retweeted
Thomas Ahle
Thomas Ahle@thomasahle·
Did anybody try using genetic programming to improve LLM agents' prompts? You let a bunch of them run with somewhat different prompts/rules/guidelines, then combine the best pairs to form the next generation. You could also just make mutations (asexual reproduction), which gives you more or less the "take a deep breath" paper (arxiv.org/pdf/2309.03409…)
18
7
59
18.4K
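The select/crossover/mutate loop the tweet proposes can be sketched with plain strings. Everything here is a toy: the fitness function is a hypothetical stand-in (in practice you would score each prompt by running the agent on a task suite), and the mutation pool and crossover rule are illustrative choices, not anything from the linked paper.

```python
import random

random.seed(0)

# Candidate phrases that mutation can prepend to a prompt.
MUTATIONS = [
    "Think step by step. ",
    "Take a deep breath. ",
    "Be concise. ",
    "Double-check your answer. ",
]

def score(prompt):
    # Hypothetical fitness: count which phrases the prompt contains,
    # so the loop has something measurable to optimize.
    return sum(p in prompt for p in MUTATIONS)

def mutate(prompt):
    # Asexual reproduction: prepend one random phrase.
    return random.choice(MUTATIONS) + prompt

def crossover(a, b):
    # "Combine the best pairs": splice the first half of one prompt
    # onto the second half of the other.
    return a[: len(a) // 2] + b[len(b) // 2 :]

population = ["Answer the question."] * 8
for generation in range(5):
    # Select the fittest half as parents, breed and mutate the rest.
    parents = sorted(population, key=score, reverse=True)[:4]
    children = [crossover(random.choice(parents), random.choice(parents))
                for _ in range(4)]
    population = parents + [mutate(c) for c in children]

best = max(population, key=score)
print(best, score(best))
```

With an LLM in the loop, `mutate` and `crossover` would themselves be prompts to a model ("rewrite this instruction", "merge these two instructions"), and `score` would be task accuracy over a held-out set.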