Rapidata

83 posts

Rapidata banner
Rapidata

Rapidata

@RapidataAI

Your one stop shop for real human data annotations, feedback and opinions.

Katılım Haziran 2024
123 Takip Edilen71 Takipçiler
Rapidata retweetledi
Karan Bhatia
Karan Bhatia@karanbhatia5757·
-> 🤖👥 Rapidata raised $8.5M seed led by Canaan Partners & IA Ventures with Acequia Capital and BlueYard. The platform delivers global human feedback for AI training in hours instead of weeks, turning model improvement into a continuous loop via on-demand data labeling. ⚡ -> 🧠 Led by Jason Corkill, Rapidata distributes micro-tasks across consumer apps to reach millions of users daily and match expertise to questions. Result: real-world evaluation, faster iteration & daily model upgrades, removing one of AI’s biggest bottlenecks: human feedback. 🚀 Read More At: menlotimes.com/post/rapidata-… Source: rapidata.ai/press
English
0
1
1
73
Rapidata
Rapidata@RapidataAI·
Today we’re announcing an $8.5M seed round led by @canaanpartners and @iaventures , with participation from @AcequiaCapital and @blueyard. Modern AI systems depend on large volumes of human feedback to train and evaluate — but collecting that data at scale has been slow and fragmented. Rapidata enables teams to gather targeted feedback from real people worldwide on demand, compressing cycles that once took months into hours or days and allowing continuous model improvement. We’ll use this funding to expand our global human data network as demand for fast, high-quality human insight continues to grow. Turning human insight into scalable infrastructure for AI.
English
4
7
12
1.2K
Rapidata
Rapidata@RapidataAI·
Happiness plot twist: Sudan = happy. Poland... less so. Global vibes are not what you think. At least according to Rapidata's latest global happiness index. We asked 11,000 people across 110 countries how happy they are right now, with some pretty surprising results. While results like these obviously obscure some pretty important caveats (Sudan is in the middle of a brutal civil war and humanitarian crisis, and mobile phone and internet access is scarce, indicating a skew in the likely respondents, or alternatively the value of safety in conflict zones), however it does tell us something about global trends like poorer countries ranking higher (maybe money doesn't buy happiness after all). The most interesting takeaway? The unique global reach Rapidata has and the ability to gather (near) instant global results.
Rapidata tweet media
English
1
0
0
87
croissanthology
croissanthology@croissanthology·
tl;dr: With N=974 to the original study we tried to replicate’s N=45, we find that Loftus and Palmer's 1974 findings do not replicate, and that people who brandish this study as strong evidence that humans overwrite their own memories based on slightly different phrasings in post-event descriptions of an incident are probably wrong to do so. Put concretely: when one is shown a video of cars crashing, and someone asks how fast they were going when they "smashed", one will NOT estimate a [very] different speed than someone who was told they "contacted". Funnily enough, we found that people estimated slightly FASTER speeds for "contacted" instead of "smashed", despite the former presumably being less "violent" than the latter. Data and everything you'd need to conduct this exact experiment yourself are in this google drive: drive.google.com/drive/folders/… The original study: drive.google.com/file/d/1xssoDh… I'm grateful for the 446 humans in @Aella_Girl's audience who watched all 3 videos diligently before providing their estimates. The following statistics exist thanks to people who collectively decided to sacrifice 78 human-hours "for science". Thanks! (According to @GuidedTrack, people spent an average of 4min46secs on the survey, * 974.) By comparison, the ~12 hours I spent with a few other humans getting this to happen feels paltry. Yesterday's study: 1) We showed people three 10s videos of car accidents [out of a set of 6, on YouTube here: youtube.com/watch?v=WD-w8n…, all videos picked by @mold_time] 2) We asked them unrelated-to-our-results questions about the videos, including an open-ended section where they had to explain the accident in 1-2 sentences (this hopefully made people focus on their memory of the accident). 3) Among those questions we asked them how many miles per hour they estimated the cars in the video were going at when they [verbed] each other; each subject got one of the following [verbs] at random: contacted, hit, bumped, collided, smashed. 4) The 1974 study found the following average estimated speeds for each verb: "Smashed": 40.8 mph (N=9) <-- FASTEST "Collided": 39.3 mph (N=9) "Bumped": 38.1 mph (N=9) "Hit": 34.0 mph (N=9) "Contacted": 31.8 mph (N=9) <--- SLOWEST And we found THIS for the 445 people who finished the survey and got it into the CSV (adding in everyone else doesn't change the data much, but we can afford picking the "perfect" survey runs only bc we have so much data): "Contacted": 31.7 mph (N=83) <-- FASTEST "Hit": 31.4 mph (N=94) "Collided": 30.5 mph (N=91) "Bumped"  29.7 mph (N=84) "Smashed": 27.5 mph (N=93) <--- SLOWEST [The 1974 study did NOT use the same car crash videos! So you can expect different total estimated speeds, given these are DIFFERENT car accidents than in the original study. Unfortunately, due to VCR "linkrot", the original recordings were lost. This is very sad. c.f. x.com/croissantholog… or croissanthology.com/archiving for how I'm trying to avoid this kind of mistake myself.] Thanks to @mold_time for doing the @RProgramming on this one and sending it to me (I had simply fed the CSV into Claude Sonnet 4.5, which spit out different numbers than @mold_time after coding up R itself; I'll trust the human on this one). In other words: not only is there a smaller range in the estimates here (i.e. the wordswap did not change as much to people's estimates as they supposedly did in the original study) but also the RANK-ORDER of those estimates is nearly completely *flipped!* ---------------------- M E T A ----------------------- One caveat: the tweet all subjects found the link in hyped up the study as "watch some cars crash into each other" [see screenshot in QTed tweet]. I think this is probably Aella's default way of signaling to her audience that this is a short survey people can take if they want to do something slightly fun and slightly helpful for 5 minutes. Only, in worlds where Loftus and Palmer are correct about human nature, this plausibly could've "flattened" the subjects' guesses into the same speed or so. However: 1) this wouldn't explain the different RANK-ORDER of verbs in our study 2) why would "crash" have more of an effect on subject's minds than the verb they encountered seconds ago, before writing down their estimate? I predict Loftus and Palmer would predict the latter would have a much stronger effect. 3) Again, different crash videos [because people don't have enough respect for archiving information], so it would be difficult to tell how much the data might've been impacted here by "crash" versus the 1974 study. Still, I'm disappointed I didn't notice this until @theomachist brought it up, and would've preferred slightly "purer" data at the expense of some N had the phrasing been something more boring like "watch traffic accidents". As it happens though, I DON'T predict this would change much to our data! I do not think we live in worlds where Loftus and Palmer are right about human nature: it's a deontology thing. Also, unlike what this QTed tweet may've led people to believe, there're a ton of attempted replications out there already on this study. I haven't nor really want to conduct a "literature review" (I vaguely wanted to today, but other cool things came up), and humbly put forth this replication as one extra contribution to anybody who DOES want to conduct a literature review. It is true that Loftus and Palmer 1974 is regularly brought up in psych 101 classes and that it's been used in hundreds to thousands of court cases. It's also true that "45 students" is a ridiculously small N. A large branch of the civilization the livelihoods of you, your mom, and me depend on SHOULDN'T have put so much faith in a study with that N. We need to do better. But that's a very hard thing to do! I messed up this study and my presentation of this study in a few different ways. I couldn't code up a survey without an LLM and Aella's help. And most people do not have access to a researcher with 240K twitter followers that can automatically provide a huge N! And while writing this tweet, I noticed myself getting tired and making several *incredibly dumb* errors like transcription between the original study's number in one tab and the tweet in THIS tab; what about actual academic papers? How likely are *academics* to be tired when they write up the finishing touches on their papers? Maybe quite a bit! It might be hard to make a study without inserting a few errors! [You might say LLMs are a point of hope on the horizon, and indeed Claude fixed a few mistakes in this tweet. I was tempted to get Claude Sonnet 4.5 to do everything for me in the actual study (survey programming, CSV parsing) and when I tried, Claude did WORSE than the human alternatives. Perhaps in 6 months / a year LLMs will be able to pull off experiments like this, but currently this doesn't seem to be the case (though as LLMs are less "tools" and more "vaguely general intelligences", this could very well be a skill issue on my part. See also @nearcyan's own near.blog/llms-are-stran…).] In other words doing science is fairly hard, not just for croissant-themed pseudonymous twitter accounts but also presumably if it's your full-time job. Yet we need to be better at it! Because our current level of research quality as a species is NOT impressive. It gets worse! Recall that Loftus and Palmer 1974 has "captured" an entire branch of our civilization's decision-making, and that running a failed replication with 10x the N is NOT sufficient to "free" that branch of civilization from a false notion of human nature. That is a much more difficult task, for which I'm grateful time is on our side [see QTed tweet] but that no doubt is even more difficult to pull off than mere "science". There's a lot of work cut out for us... but as always, one must pull up one's sleeve and solve the damn thing instead of doing whatever else people are doing when they encounter problems (croissanthology.com/solution). So that's my off-the-cuff post-mortem of participating in my first twitter survey / attempted-replication! Good day to everyone.
YouTube video
YouTube
croissanthology@croissanthology

There's a study Claude Sonnet 4.5 claims is used regularly in psych 101 classes and has been thrown around in court literally thousands of times, with an N of FORTY FIVE. That is, they put 45 humans in a room and asked for their estimates of the speed those cars were going at after giving them each a one-word-difference description of the accident (e.g. "smash" instead "hit"), and decided those results were enough to justify the idea that humans by default rewire memories of an incident based on tiny changes in word choice. The study is here: drive.google.com/file/d/1xssoDh… And it's from THE YEAR OF OUR LORD NINETEEN-SEVENTY-FOUR. Yet according to @mold_time, from whom I learned about it, there are essentially *no* replications of it out there. A significant branch of my civilization, from which my wealth, health and friends depend on, *has been operating on the basis that a few researchers' opinions of the whims of 45 humans generalize to all of human nature*. It is insane. You might think it's fairly difficult to run an experiment of this kind. Only, as @mold_time clearly attempted to instill in me today, it's trivially easy. I write this from a chair in Lighthaven, a complex just a few minutes away from Berkeley where I could've tested hypotheses by walking up to 45 different students. I could've instead pestered my Twitter simcluster, which altogether might comprise of ~45 individuals eager to take this test ✨for science✨. I could've blown a few hundred bucks on getting strangers online to do this! (Instead, as it happened, @Aella_Girl was in the room and offered to dump the link on her timeline, so the current N for our replication of this study is 609 and counting. Also she did most of the coding. Thanks Aella). Indeed, designing a survey that replicates the original paper is as easy as plugging in the rules for @GuidedTrack into Claude Sonnet 4.5 along with pdf of the study (which does generate bugs, all of which were fixed by Aella—though I am confident I could've done it myself given an extra hour, and the main thing you🫵 can't necessarily meta-replicate about my replication is "researcher with 240K twitter followers in the room" though as noted earlier this doesn't stop you from being able to get AT LEAST forty-five subjects in your study for ~a day of labor on your part. See also e.g. "@gwern so poor he has to put up with a moldy floor for a while, but also decided to spend some of his money on surveys asking how often people buy socks: #sock-surveys" target="_blank" rel="nofollow noopener">gwern.net/socks#sock-sur…. Not running surveys on whatever you're curious about is a skill issue and not really a material constraint in this century of abundance). I'm grateful @mold_time was around to finally get me to try replicating a study, which is indeed a valuable life experience, and that they're providing the valuable service of having generated a very-much-not-exhaustive list of papers that have few replications because our civilization is insane but would be easy to replicate in an afternoon—you might see me replicate 1-2 more of these in the coming weeks, out of spite alone. You may not like it, but the screenshot below is what the frontier of actually-robust non-bullshit science in psychology looks like. There are plausibly a hundred papers on the same level of importance as this one with ridiculously small Ns, and our civilization might truly be so bizarre that the way it ends up fixing its glaring epistemic lacunae is via tweets with AI-generated cartoon images and links to afternoon-long-partially-vibe-coded surveys from well-known sex researchers. Though of course, replicating a study is less than half the battle. It's not like I can just *stroll up* to google scholar and get this submitted...? Academia is maybe just a tiny bit more ossified than that: equilibriabook.com/an-equilibrium… The GOOD NEWS is the "centralization forces" which make it so a study predictably "takes over" major branches of civilizational decision-making (such as the courts, which the 1974 study we replicated did in fact take over) are becoming easier to access. For instance, the layperson will consult Google's AI results context window (i.e. top link in pagerank), which isn't terribly difficult to get into. [If you doubt this is gaining in importance, just notice the increasing amount of posts on this platform that provide no source or even rhetorical argument besides a screenshot of the AI summary.] Or they'll consult a large language model directly, and of course getting into the training data is practically guaranteed (for practical advice on making yourself *salient* in the training data, see: gwern.net/llm-writing or gwern.net/blog/2024/writ… for a meta-argument of why you should be writing more online in the first place). So at least the rogues in our midst with the turn rate to pull off day-long replications with Ns an ORDER OF MAGNITUDE greater than the original study will gain rather than lose the advantage in the longer run of our history. :)

English
22
30
325
67.4K
Rapidata retweetledi
canwiper
canwiper@canwiper·
Spotted in the wild at ICCV 2025! We can run such an eval actually in just a few minutes! (with 100% human responses) GitHub page of the paper: github.com/Alpha-VLLM/Lum…
English
1
1
0
119
Gregor Zunic
Gregor Zunic@gregpr07·
Where could we human label a few thousand (browser use) agent traces? We use llm as a judge for evals and want to know how aligned it is with human labels🌝
English
5
1
22
4.3K
Rapidata
Rapidata@RapidataAI·
@AravSrinivas Ran a poll of 1'000 people worldwide in roughly 1 bathroom break amount of time. Here are the results:
Rapidata tweet media
English
0
0
1
12
Aravind Srinivas
Aravind Srinivas@AravSrinivas·
Remove the bottom widgets or keep?
Aravind Srinivas tweet media
English
1.9K
53
3.2K
567.7K
Rapidata
Rapidata@RapidataAI·
We just doubled one of @Google's most famous datasets! We just released a new dataset with over 32k images annotated with over 3 Million (!) human responses. The Rich Human Feedback dataset from Google is one of the most important datasets used for building image generation models, containing over 18k generated images that were annotated by humans across 6 different modalities. Combined with our first stint at adding to Googles dataset, it is now twice as large as the original. These are scales that are just not feasible with traditional methods, let alone as a startup. Even as a tech giant these are numbers that are hard to beat. But for us it was a relatively easy task. We actually had this annotated on the side with our system as our fallback annotation task. We even almost forgot about it when a few days ago it complete the collection after 3 weeks in the background. Its already Trending on 4th place on Hugging Face, give it some love so that we can get to the first place!
English
1
0
3
132
Rapidata
Rapidata@RapidataAI·
NullModel being 1 in ranking.
Rapidata tweet media
English
0
0
1
91
Rapidata
Rapidata@RapidataAI·
This highlights a major flaw in current automatic AI evaluations... Even null models that always return the same response can get top results on automatic LLM benchmarks, such as 86.5% LC win rate on AlpacaEval 2.0, 83.0 score on Arena-Hard-Auto (Zheng et al., 2025)! As LLMs become smarter, crowdsourcing will be critical in ensuring accurate and human-aligned assessments. With a large crowd, fooling humans becomes statistically much harder. At Rapidata, we're already pioneering in this space, revolutionizing how data is labeled and validated for the future!
English
1
0
4
124
Rapidata
Rapidata@RapidataAI·
Despite skepticism from researchers, public perception is shifting. Over 35% of respondents in a global survey containing few thousands responses believe @DeepSeek_ai is leading the AI race. @OpenAI's ChatGPT has near universal name recognition and was used in the same way as "@Google" stands in for all search engines. For @DeepSeek_ai to make this big of a mark in such a short time is incredible. AI’s future might not be as US-centric as we thought...
English
1
0
4
110
Rapidata
Rapidata@RapidataAI·
Has @OpenGVLab Lumina Outperformed @OpenAI's Model? We’ve just released the results from a large-scale human evaluation (400.000 annotations) of @OpenGVLab’s newest text-to-image model, Lumina. Surprisingly, Lumina outperforms @OpenAI’s DALL-E 3 in terms of alignment, although it ranks #6 in our overall human preference benchmark. To support further development in text-to-image models, we’re making our entire human-annotated dataset publicly available. If you’re working on model improvements and need high-quality data, feel free to explore. We welcome your feedback and look forward to any insights you might share!
English
0
1
2
144