Shayan Mohanty

1.7K posts

Shayan Mohanty

@shayanjm

Chief Data & AI Officer @Thoughtworks. Previously @watchfulio (Acq. TWKS) @Facebook.

San Francisco · Joined July 2011
1.1K Following · 2.6K Followers
Shayan Mohanty retweeted
Thoughtworks
Thoughtworks@thoughtworks·
We’re excited to welcome @shayanjm as our new Chief Data and AI Officer. A seasoned AI leader from Watchful, Facebook & Los Alamos Lab, Shayan will also lead global Data & AI service line, helping clients turn AI ambition into production-ready solutions: ter.li/exboac
[image]
0
4
2
1.3K
martin_casado
martin_casado@martin_casado·
Hey infra folks. We're standing up a new Discord server to discuss CS infra. If you want an invite DM me (reply and I'll follow). thanks!
949
32
970
171.9K
Shayan Mohanty retweeted
Crémieux
Crémieux@cremieuxrecueil·
A new paper in Nature found that you cannot, in fact, train AIs on AI-generated data and expect them to continue improving. What happens is actually that the model collapses and ends up producing nonsense.
[image]
633
2.3K
16K
1.9M
Shayan Mohanty retweeted
François Chollet
François Chollet@fchollet·
If you want to demonstrate that it is impossible to do X (here, X="spot the bot"), you shouldn't ask random people to do X, you need to ask people who actually know how to do X (experts) and see how the *best* of them perform. Otherwise you get results like "22% of our judges cannot tell the difference between ELIZA and a human"
Cameron Jones@camrobjones

People judged GPT-4 to be human 54% of the time, compared to 22% for ELIZA and 67% for humans. The implication is that people are at chance in determining that GPT-4 is an AI, even though the study is powerful enough to detect differences from 50% accuracy.

20
40
339
113.3K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@semil Nduja is a criminally underrated topping. Also plz drop the recipe for the cocktail, it sounds delicious
0
0
0
47
Semil
Semil@semil·
49ers game snacks & cocktail from yesterday - made chorizo filled bacon-wrapped jalapeño poppers; crostini for nduja spread (someone gifted to me, never had this before - wow); mandarin-honey-sake cocktail.
[4 images]
2
1
14
3.1K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop Think about how many times vision evolved organically in nature. How many times has language evolved organically?
0
0
1
41
dr. jack morris
dr. jack morris@jxmnop·
An amazing mystery of machine learning right now is that state-of-the-art vision models are ~2B parameters (8 gigabytes) while our best text models are ~200B parameters (800 gb) why could this be? philosophically, are images inherently less complicated than text? (no right?)
348
107
1.5K
435.2K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop On second thought - a less cop-out way might be through spectral decomposition. That way you're focusing more on aligning the geometry of the spaces, rather than learning an entirely new space.
1
0
1
47
Shayan Mohanty
Shayan Mohanty@shayanjm·
@jxmnop maybe throw some adversarial training at it? If you have a dump of mixed embeddings, train a generator that tries to align the spaces, and a discriminator to predict if it's an original or aligned. Feels like a cop-out answer but would probably work maybe?
1
0
2
521
dr. jack morris
dr. jack morris@jxmnop·
As an exercise in open science, gonna tweet the research problem I'm stuck on: I want to align two text embedding spaces in an unsupervised way.

The motivation is that in my previous vec2text work, we have to know the embedding model and be able to query it. This is fine in today's world where most people use OpenAI ada embeddings, but when people move on to a better model, my inversion models won't work anymore. So I want to take embeddings from an *unknown* embedder and map them somehow to a space I know, like the OpenAI embedding space, then decode them.

Sounds hard, right? It definitely is. But my crazy idea is that all text embedding models are learning something very similar, embeddings lie on a low-dimensional manifold, and so given enough samples we should be able to align them. This is supported by some past research on unsupervised bilingual word embedding alignment (which works really well!) and also this fascinating line of research on "relative representations," where representing embeddings by their distances to known anchor points makes embeddings compatible between different spaces.

So I learned there's this whole class of problems called "optimal transport" that's exactly this: the mathematical study of how to find the optimal mapping between two vector spaces. Sounds perfect, right? Sadly it doesn't work very well, at least out of the box. Given a thousand paired samples from two different embedding models A and B, the Sinkhorn algorithm can get about 1% accuracy (10x above random). Gromov-Wasserstein, which tries to preserve cosine similarity, can do a little bit better. If I use embeddings from two models from the same family, I can get 20%.

I tried using relative representations. This requires 100 or so paired anchor points from both embedders, which is also a bottleneck. But using 100-dim relative representations, Sinkhorn gets 70% accuracy with no hparam tuning, which is pretty good. But no one has figured out how to find anchor points without any supervision yet (although I think it's probably possible).

Also, a supervised linear mapping between the two embedding spaces works super well, can get 90%+ accuracy, and I can invert the remapped embeddings with pretty good BLEU score, but that's cheating too (also the true mapping is certainly nonlinear). Both these algorithms again require paired samples, which is unrealistic.

I want to be able to invert a random database of text embeddings without any paired samples. With enough entries I think it should be possible, just like we can infer an arbitrary substitution cipher if we have enough encrypted data.

Anyway, that's my progress so far! I am now extremely stuck. If you have any ideas please message me or reply to the thread.
51
22
314
71.8K
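The "relative representations" idea from the thread above can be sketched in a few lines of numpy. This is a toy, not the thread's actual setup: an orthogonal rotation stands in for the "unknown embedder," and the anchors are simply the first rows of each space (the paired-anchor supervision the thread calls a bottleneck). Because cosine similarity is rotation-invariant, describing each point by its similarities to the anchors makes the two spaces directly comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: space B is space A under an unknown rotation, a stand-in
# for "two embedding models that learned similar geometry".
n, d = 200, 32
A = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # unknown orthogonal map
B = A @ Q

# Relative representations: describe each point by its cosine
# similarity to a small set of paired anchor points (here, the first
# k rows of each space -- hypothetical anchors for illustration).
k = 20

def rel(X, anchors):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

RA = rel(A, A[:k])
RB = rel(B, B[:k])

# Nearest-neighbour matching in relative-representation space.
D = np.linalg.norm(RA[:, None, :] - RB[None, :, :], axis=2)
pred = np.argmin(D, axis=1)
acc = np.mean(pred == np.arange(n))
print(f"matching accuracy: {acc:.2f}")
```

In this toy the accuracy is perfect, because an exactly orthogonal map preserves cosine similarity exactly; real embedding models only approximately share geometry, which is why the thread reports 70% rather than 100%.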
dr. jack morris
dr. jack morris@jxmnop·
Fun research story about how we jailbroke the ChatGPT API.

So every time you run inference with a language model like GPT-whatever, the model outputs a full probability distribution over its entire vocabulary (~50,000 tokens). But when you use their API, OpenAI hides all this info from you and just returns the top token -- or at best, the top 5 probabilities. We needed the full vector (all 50,000 numbers!) for our research, so we developed a clever algorithm for recovering it by making many API calls.

Important to know is that the API supports a parameter called "logit bias" which lets you upweight or downweight the probability of certain tokens. Our insight was that we could run a binary search on the logit bias for each token to find the exact value that makes that token most likely, yielding the relative probability for that token. To get a full next-token probability vector, we run 50,000 binary searches (it's actually not as expensive as you'd think) -- shout out to @justintchiu for coming up with this and implementing it efficiently!

And there's a bonus level: in the setting where OpenAI gives us the top-5 logprobs (available for some models), there's a much more efficient algorithm with a pretty elegant solution. In this setting, to get the probability for a certain token, you just add a really large fixed logit bias to it. Given its new probability (which OpenAI will give you, since that token will be in the top 5 now), you can solve for its original probability in closed form. Since in this setting OpenAI provides probabilities for the top 5 tokens in a single API call, and we only have to run one call per token, this new method lets you get the full vector in 50,000/5 = 10,000 queries.

Funnily enough, after we posted the code for the binary search algorithm, we got an email from fellow researcher @mattf1n with the math for the top-5 algorithm. And he followed it up with a pull request. Nice guy!

If you thought this was interesting:
- want to run the algorithm yourself? check out the code here: github.com/justinchiu/ope…
- want to read about it? see Section 5 of our paper, Language Model Inversion: arxiv.org/abs/2311.13647
15
63
591
124.3K
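The closed-form step in the thread above is just softmax algebra, and it is easy to verify numerically. The sketch below simulates the hidden distribution locally (no real API involved); `api_prob_with_bias` is a hypothetical stand-in for reading a biased token's probability out of the top-5 logprobs. If p' is the probability observed after adding logit bias b, the original probability is p = p' / (e^b · (1 - p') + p').

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "hidden" next-token distribution the API won't show us in full.
V = 1000
logits = rng.normal(size=V)
p_true = np.exp(logits) / np.exp(logits).sum()

def api_prob_with_bias(token, bias):
    """Simulates the API: the token's probability after a logit bias is
    applied (in the real setting you'd read this from top-5 logprobs)."""
    biased = logits.copy()
    biased[token] += bias
    return np.exp(biased[token]) / np.exp(biased).sum()

# Add a large fixed bias, observe the new probability, invert in
# closed form: p = p' / (e^b * (1 - p') + p')
b = 10.0
token = 123
p_obs = api_prob_with_bias(token, b)
p_recovered = p_obs / (np.exp(b) * (1.0 - p_obs) + p_obs)

print(p_true[token], p_recovered)
```

The two printed values agree to floating-point precision, since the inversion formula is exact; with a real API the error budget would instead come from the precision of the returned logprobs.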
Shayan Mohanty
Shayan Mohanty@shayanjm·
@aashaysanghvi_ Right now, you have to deal with a ton of non-determinism. Over time, a lot of it will be abstracted away in layers. Lots of research on this topic -- we've tried to quantify and categorize the non-determinism here: watchful.io/blog/decoding-…
0
0
1
55
Aashay Sanghvi
Aashay Sanghvi@aashaysanghvi_·
Key variable for AI product builders: do you accept the role of non-determinism and build with it as a given? Or is the system eventually scoped and validated in a way where that's not the case?
5
0
8
2.7K
BURKOV
BURKOV@burkov·
GPT-4 is officially annoying. You ask it to generate 100 entities. It generates 10 and says "I generated only 10. Now you can continue by yourself in the same way." You change the prompt by adding "I will not accept fewer than 100 entities." It generates 20 and says: "I stopped after 20 because generating 100 such entities would be extensive and time-consuming." What the hell, machine?
499
217
4.9K
1.5M
Shayan Mohanty
Shayan Mohanty@shayanjm·
@visarga Hmm, what input did you use? Seems like it was somehow malformed for the OpenAI API.
0
0
0
10
Shayan Mohanty
Shayan Mohanty@shayanjm·
🚀 New year, new research drop: outperforming GPT-4 in synthetic text generation! Our new approach combines geometry and latent space magic to produce data that is faithful to the real stuff, and orders of magnitude cheaper to produce. Check out the full paper here: watchful.io/blog/navigatin… #NLP #AI #GenAI
6
19
98
31.5K
Shayan Mohanty
Shayan Mohanty@shayanjm·
@ShumingHu Try playing a bit with the cone angle & height. We try our best to estimate an optimal set of params but it sometimes is a bit off the mark. We did try this on sarcastic onion titles which is a similar problem, so it _should_ generate something reasonable if params are tuned
0
0
1
1K
Shuming Hu
Shuming Hu@ShumingHu·
@shayanjm Maybe it's limited to the embedding, it wasn't able to get these are dad jokes unlike chatgpt.
[2 images]
1
0
2
563
Shayan Mohanty
Shayan Mohanty@shayanjm·
@ocolegro Try grabbing a few rows from a dataset -- or playing with the angle + cone height to get it to provide more varied outputs. Chances are it's overfitting really hard because you only gave 1 input, rather than several.
0
0
2
542
Owen Colegrove
Owen Colegrove@ocolegro·
@shayanjm Sounds really cool, what sample text should I use? I tried “what is the meaning of life?”, and it just repeated that back to me 10x times
1
0
1
708
Shayan Mohanty retweeted
dr. jack morris
dr. jack morris@jxmnop·
this was a neat little read. the idea is to do data augmentation with embeddings. they randomly sample around an embedding and then decode with vec2text. there is a trick to randomly sampling while not leaving the embedding manifold; they try to sample within an embedding "cone"
Shayan Mohanty@shayanjm

🚀 New year, new research drop: outperforming GPT-4 in synthetic text generation! Our new approach combines geometry and latent space magic to produce data that is faithful to the real stuff, and orders of magnitude cheaper to produce. Check out the full paper here: watchful.io/blog/navigatin… #NLP #AI #GenAI

4
11
125
25.1K
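The "embedding cone" sampling described above can be sketched geometrically: pick a random direction orthogonal to the embedding, then rotate the embedding toward it by an angle no larger than the cone's half-angle. This is only an illustration of the idea; the paper's actual parameterization (its cone angle and height parameters) may differ, and `max_angle_rad` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_in_cone(e, max_angle_rad, n_samples, rng):
    """Sample unit vectors whose angle to embedding `e` is at most
    `max_angle_rad` -- a sketch of sampling inside an embedding cone."""
    e = e / np.linalg.norm(e)
    d = e.shape[0]
    out = []
    for _ in range(n_samples):
        # Random direction orthogonal to e.
        v = rng.normal(size=d)
        v -= (v @ e) * e
        v /= np.linalg.norm(v)
        # Rotate e toward v by an angle drawn from within the cone.
        theta = rng.uniform(0.0, max_angle_rad)
        out.append(np.cos(theta) * e + np.sin(theta) * v)
    return np.stack(out)

e = rng.normal(size=64)
samples = sample_in_cone(e, max_angle_rad=0.2, n_samples=10, rng=rng)

# Every sample stays inside the cone: cosine to e is at least cos(0.2).
cosines = samples @ (e / np.linalg.norm(e))
print(cosines.min())
```

Each sampled vector would then be decoded back to text with something like vec2text; the cone keeps the perturbations close enough to the original embedding to stay near the data manifold.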
Shayan Mohanty
Shayan Mohanty@shayanjm·
@thomasahle Generally yeah - the problem is when you prompt you can't really anticipate how the merge is going to work
0
0
1
28
Thomas Ahle
Thomas Ahle@thomasahle·
@shayanjm Fun approach to merging! Does it work better than just giving both inputs to a different LLM and asking it politely to please merge?
1
0
1
132
Shayan Mohanty retweeted
Thomas Ahle
Thomas Ahle@thomasahle·
Did anybody try using genetic programming to improve LLM agents' prompts? You let a bunch of them run with somewhat different prompts/rules/guidelines, then combine the best pairs to form the next generation. You could also just make mutations (asexual reproduction), which gives you more or less the "take a deep breath" paper (arxiv.org/pdf/2309.03409…)
18
7
59
18.4K
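The select/crossover/mutate loop the tweet proposes can be sketched with plain strings. Everything here is a toy: the fitness function is a hypothetical stand-in (in practice you would score each prompt by running the agent on a task suite), and the mutation pool and crossover rule are illustrative choices, not anything from the linked paper.

```python
import random

random.seed(0)

# Candidate phrases that mutation can prepend to a prompt.
MUTATIONS = [
    "Think step by step. ",
    "Take a deep breath. ",
    "Be concise. ",
    "Double-check your answer. ",
]

def score(prompt):
    # Hypothetical fitness: count which phrases the prompt contains,
    # so the loop has something measurable to optimize.
    return sum(p in prompt for p in MUTATIONS)

def mutate(prompt):
    # Asexual reproduction: prepend one random phrase.
    return random.choice(MUTATIONS) + prompt

def crossover(a, b):
    # "Combine the best pairs": splice the first half of one prompt
    # onto the second half of the other.
    return a[: len(a) // 2] + b[len(b) // 2 :]

population = ["Answer the question."] * 8
for generation in range(5):
    # Select the fittest half as parents, breed and mutate the rest.
    parents = sorted(population, key=score, reverse=True)[:4]
    children = [crossover(random.choice(parents), random.choice(parents))
                for _ in range(4)]
    population = parents + [mutate(c) for c in children]

best = max(population, key=score)
print(best, score(best))
```

With an LLM in the loop, `mutate` and `crossover` would themselves be prompts to a model ("rewrite this instruction", "merge these two instructions"), and `score` would be task accuracy over a held-out set.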