Dev (@DevvMandal) - Twitter Profile

Pinned Tweet

Dev@DevvMandal·13 Apr

Today we're launching the most advanced computer-use dataset in the world. 1,000+ hours of screen recordings along with mouse/keyboard inputs + annotations. Sourced from experts across coding, design, browser-use, research and more. Link in the comments :) @markov__ai

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

27

41

374

45.3K

Dev@DevvMandal·6d

@localhostIND hell yeah

English

0

2

253

LocalHost India@localhostIND·6d

Finding the right soundtrack for a scene is still weirdly hard. Adithya is trying to fix that. Introducing taan, it watches your video and generates music around how the scene actually feels.

English

22

24

175

18.5K

Dev@DevvMandal·6 May

@interface4AGI sahil my goatt

English

2

0

6

316

sahil@_sahildhull·6 May

the input interface has been the same for decades. with ai, software can now reason and act on your behalf but the interface is the bottleneck. why do i have to check sushi on 10 restaurants across 3 apps? why can't i just do it with a flick of a finger?! the world's about to get a new interface @agi_interfaces

English

93

75

267

76.7K

Dev retweeted

Corbin Rosset@corby_rosset·21 Apr

How do you tell if a computer use agent actually succeeded? It’s really two questions: did it execute well (process), and did the user actually get what they asked for (outcome)? Introducing the Universal Verifier 🧵

English

3

14

31

3K

Dev@DevvMandal·14 Apr

@alokbishoyi97 GOAT

English

1

0

1

491

Alok Bishoyi@alokbishoyi97·12 Apr

for those of you who are autoresearch-pilled, or have been meaning to get into autoresearch but don't know how: I shipped evo today, an open-source Claude Code plugin that optimizes code through experiments.
you hand it a codebase. it finds a benchmark, runs the baseline, then fires off parallel agents to try to beat it. kept if better, discarded if worse.
inspired by @karpathy's autoresearch, but with structure on top:
- tree search over greedy hill-climb: multiple forks from any committed node
- N parallel agents in git worktrees
- shared failure traces so agents don't repeat each other's mistakes
- regression gates

English

49

71

1.4K

179.8K
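
The keep-if-better loop described in the evo post above can be sketched in a few lines. This is a toy illustration under stated assumptions, not evo's actual implementation: `Node`, `tree_search`, `benchmark`, and `propose` are hypothetical names, candidates are plain floats instead of code changes, and the forks run sequentially rather than as parallel agents in git worktrees.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One committed candidate in the search tree (hypothetical structure)."""
    params: float
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    baseline: float,
    benchmark: Callable[[float], float],
    propose: Callable[[float, int], float],
    rounds: int,
    forks_per_round: int,
) -> Node:
    """Fork from several committed nodes per round; keep a child only if it beats its parent."""
    root = Node(baseline, benchmark(baseline))
    committed = [root]
    for _ in range(rounds):
        # Unlike greedy hill-climbing, fork from the top few committed nodes,
        # not only from the single best one.
        parents = sorted(committed, key=lambda n: n.score, reverse=True)[:forks_per_round]
        for fork_index, parent in enumerate(parents):
            candidate = propose(parent.params, fork_index)
            score = benchmark(candidate)
            if score > parent.score:  # kept if better, discarded if worse
                child = Node(candidate, score, parent)
                parent.children.append(child)
                committed.append(child)
    return max(committed, key=lambda n: n.score)


# Toy benchmark (assumption): higher is better, optimum at x = 3.
def toy_benchmark(x: float) -> float:
    return -((x - 3.0) ** 2)


# Toy proposal rule (assumption): even forks step up by 1.0, odd forks step down by 0.5.
def toy_propose(x: float, fork_index: int) -> float:
    return x + (1.0 if fork_index % 2 == 0 else -0.5)


best = tree_search(0.0, toy_benchmark, toy_propose, rounds=5, forks_per_round=2)
print(best.params)  # 3.0
```

Because every committed node stays in the tree, a later round can still branch from an older node if the current best turns out to be a dead end.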

Dev@DevvMandal·13 Apr

Data samples at markovstudios.com/data_samples

English

2

0

8

1.5K

Dev@DevvMandal·13 Apr

Today we're launching the most advanced computer-use dataset in the world. 1,000+ hours of screen recordings along with mouse/keyboard inputs + annotations. Sourced from experts across coding, design, browser-use, research and more. Link in the comments :) @markov__ai

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

27

41

374

45.3K

Dev@DevvMandal·12 Apr

Tomorrow, we're launching the world's most advanced computer-use dataset. Stay tuned :) @markov__ai

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

16

28

412

41.6K

Dev@DevvMandal·11 Apr

@KunalKSavita firee

English

0

1

98

Goonal@KunalKSavita·11 Apr

dropping soon

English

5

0

24

785

Dev@DevvMandal·11 Apr

@mike64_t this is one of the coolest things i've ever come across

English

0

2

105

mike64_t@mike64_t·11 Apr

I think I can finally report some success training a quite accurate IDM capable of recovering keystrokes from Minecraft gameplay, even in quite PvP-heavy situations. At this point the model not only knows what keys are pressed to the extent reasonably discernible, it also knows how fast it is moving in 3D space at all times, even when knockback is mixing with the self-move impulse.
Now, recovering keystrokes from normal external capture footage is just about impossible. E.g. W/A/S/D does exactly nothing during partial tick frames and jumping mid-air is equally useless, so asking the model to recover key-down states is inherently unreasonable. Mouse deltas are also completely arbitrary units, as game mouse sensitivity introduces an arbitrary scale factor into the equation. The only good option is to think carefully about your model-environment contract, and only record "logical actions", not raw keystrokes.
So here are a few unfortunate lessons I had to learn in roughly this order.
- Choose good units. (bad: mouse deltas, good: delta radians [yes, you will need game-internal state])
- Capture from inside the main game loop and read the game fbo to get consistent frame-action pairing. Doing post-mortem pairing is hopeless.
- Carefully define when you think keystrokes actually have an effect. (jump only works on ground, when flying or in water etc.) More subtle: The key may already be down, but no tick has happened yet to actually use the value. Hence: ignore
- Separate gamestate into "fast and slow-moving" components. E.g. movement is likely tick based, camera rotation is very likely updated every frame in essentially every game ever.
- Think about your frame-action correspondence contract. (How old is the frame in relation to the inputs you capture? Will double or triple buffering affect you?) Think about the game loop timeline, where you are sampling, how old the data you are reading is, and where the ticks are happening around you.
Language models used to simply not have a model-environment contract, but even now with the model "living" in a designated harness, the contract still boils down to formatting and tool implementation intrinsics. While also important, it is still quite a bit more obvious because the violations are in some way, shape or form reflected as text you can actually see.
- ffmpeg dropping frames cumulatively screws the model the further you get into the sequence because your targets are now shifted. If you can't encode the video in real-time, too bad.
- Sodium has a frames-in-flight system different from vanilla Minecraft, which will also offset your targets from your frames. (there goes that data...)
- Models are susceptible to latency. If there is too big of a delay between action and on-screen reflection, your performance degrades.
At this point I realize ~100 hours of gameplay is essentially no longer usable as a dataset. You can train on this data, but all you'll get is a mushy mess.
However, some good news:
- Making the model predict physics gamestate scalars helps the model generalize. For instantaneous events like jump, it's unreasonable to ask the model to emit a short burst of jump=true at exactly the right time; however, if you also predict your current y-velocity, the model has a supervision signal for the "latent" from which that on-ground jump becomes apparent. Recovering x/z motion is also somewhat easier than unmixing it into plausible keystrokes for inertia-heavy player controller logic.
- Regressing physics gamestate scalars also seems to make your dataset "bigger". While pure keystroke classification will overfit quickly, predicting exact physics gamestate scalars forces the model to generalize more and you can tolerate far more epochs before validation loss starts to stall out. This is the only reason why it was bearable to dump 100h+ of dataset hours and replace it with ~3 hours of gameplay after the 4th revision of the file format (yeah...) and somehow still have better performance.
Now, you might be asking, "isn't this brittle?" and the answer is yesn't. Frame-action correspondence matters for training, but not so much during inference. So as long as you are sampling in roughly the same interval as your training data, you aren't violating any hard contract per se. Somewhere around the frames ticks are happening, and during training you capture various tick-capture offset relations per random chance, so nothing is too obviously wrong here.
HOWEVER, you will get screwed by gui scale, shaders, resource packs, "shit that recording is 1920x1040 because somebody doesn't know fullscreen exists" and other unfortunate edge cases of reality. But I suppose this is the role of dataset size.
If all those "contract violations" that a YouTube video has compared to the training data are addressed, I think this is a way to turn YouTube into a labeled dataset. I could never shake the feeling that VPT is a sound idea in practice, while never having been properly executed, and I think one reason why it hasn't is because that label bootstrapping part is just a pain in the butt to get right. Now, what the player is doing is of course not the only label you can extract from video, but it has to be one of the targets predicted during pretraining to "align" the pretraining objective.
Some notes on the video here: the colored dots on the analog visualizer are the ground truth, while the gray dot is the model prediction. Green means correct prediction, red means incorrect prediction at that frame. Model P(key) reports how wrong the prediction is from green (0.0) to red (1.0). You will also notice that during periods of rapid slowdown, left and right actions become close to irrecoverable, because there is just that little motion. And some jump actions are not predicted correctly because I got the detection condition for jump events wrong... (duh)
LMB/RMB for other than sustained events (like item-consume and block break) also seem to be hopelessly irrecoverable for now. Swing was supposed to do the same thing as motion y did for jump, but it's too well behaved as an increasing counter. Maybe partial-tick interpolated values work better (v5 file format then... ugh..)

English

17

15

268

18.7K
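
The frame-drop point in mike64_t's post above can be made concrete with a small sketch. It assumes a hypothetical capture in which every frame carries its original capture index, and it compares naive index-based pairing (the k-th surviving frame gets the k-th action) against pairing by the recorded index. The function names and the toy data are illustrative only and are not taken from the author's tooling.

```python
from typing import List, Tuple


def pair_by_index(kept_frame_ids: List[int], actions: List[str]) -> List[Tuple[int, str]]:
    """Naive post-hoc pairing: the k-th surviving frame is labeled with the k-th action."""
    return [(fid, actions[k]) for k, fid in enumerate(kept_frame_ids)]


def pair_by_id(kept_frame_ids: List[int], actions: List[str]) -> List[Tuple[int, str]]:
    """Pairing by the capture index recorded with each frame; unaffected by drops."""
    return [(fid, actions[fid]) for fid in kept_frame_ids]


def label_offsets(kept_frame_ids: List[int]) -> List[int]:
    """Number of steps each surviving frame's label is off under index pairing.

    A frame with capture index f that lands at position k in the encoded
    video is labeled with action k instead of action f, an error of f - k.
    """
    return [fid - k for k, fid in enumerate(kept_frame_ids)]


# Hypothetical capture: 10 frames recorded, frames 3 and 7 dropped by the encoder.
kept = [0, 1, 2, 4, 5, 6, 8, 9]
actions = [f"a{i}" for i in range(10)]

print(label_offsets(kept))              # [0, 0, 0, 1, 1, 1, 2, 2]
print(pair_by_index(kept, actions)[3])  # (4, 'a3'): frame 4 gets the wrong label
print(pair_by_id(kept, actions)[3])     # (4, 'a4'): label matches the frame
```

The offset never shrinks: each additional dropped frame shifts every later target by one more step, which is why the damage grows the further into the sequence you go.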

Dev@DevvMandal·11 Apr

@KunalKSavita Don toliver next to claude code is diabolical

English

0

1

120

Goonal@KunalKSavita·11 Apr

over the last few days, i’ve been working on something really cool. dropping tomorrow :)

English

6

0

25

978

Dev@DevvMandal·8 Apr

@eddybuild SO FIRE

English

0

487

Eddy Xu@eddybuild·8 Apr

introducing Egocentric-1M. the largest egocentric video dataset in the world, and our next step in building the internet for physical AI.

Eddy Xu@eddybuild

today, we’re open sourcing the largest egocentric dataset in history.
- 10,000 hours
- 2,153 factory workers
- 1,080,000,000 frames
the era of data scaling in robotics is here. (thread)

English

115

89

1.5K

322.9K

Dev@DevvMandal·3 Apr

Had a great time chatting with Supreeth from @Analyticsindiam about Markov and our vision for the future! analyticsindiamag.com/ai-features/me…

English

12

2

76

4.4K

Dev@DevvMandal·3 Apr

No one does storytelling as awesome as Vikrant 🫡

Vikrant Patankar@vikpat

nobody asked me to turn a $25M Series A announcement into a musical i did it anyway and here's how you win in 2026

English

1

0

26

2.7K

Dev@DevvMandal·28 Mar

computer-use is the natural first application of training ai from video
first - there's huge economic value in automating enterprise workflows
second - large scale demonstrations already exist on the internet
third - the constraints are favourable for RL
- action space is only mouse and keyboard
- whole state is captured on the screen
- feedback loops are tight, any action you/the agent makes, there's an immediate change in the ui
- (mostly) verifiable rewards
we have so much cooking at @markov__ai :)

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

5

8

81

8.2K

Dev@DevvMandal·26 Mar

@TusharGoswamy @SarvamAI fire

English

0

3

144

Tushar Goswamy@TusharGoswamy·26 Mar

The last 2 months have been exciting in AI developments and for my personal journey of contributing to @SarvamAI 's mission to build sovereign AI for India. Sharing some interesting updates below: (1/6)

English

7

3

96

4.8K

Dev@DevvMandal·22 Mar

@Finstor85 @markov__ai dm :)

0

734

Ameya@Finstor85·22 Mar

@DevvMandal @markov__ai Great work Dev!! This is phenomenal. I have some questions. What's best way to reach out?

English

1

0

1

957

Dev@DevvMandal·12 Mar

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

91

198

1.8K

457.5K

Dev@DevvMandal·21 Mar

Now at 100k+ downloads on HuggingFace!

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

3

13

111

10.3K

Dev@DevvMandal·18 Mar

cal.com/dev-markov/30m…

ZXX

0

5

644

Dev@DevvMandal·18 Mar

Next week, we’re launching the largest computer-use data collection platform to automate the next level of white collar work. Book a meeting with me (link below) for a sample of the data :)

Dev@DevvMandal

Today, we're launching the world's largest open-source dataset of computer-use recordings. 10,000+ hours across Salesforce, Blender, Photoshop and more, to automate the next level of white-collar work. Link in the comments :) @markov__ai

English

3

5

55

6.3K