Comix Division

10.9K posts


@ComixDivision

YouTuber and libertarian. Freedom is always on the right side of history! https://t.co/8AqnBpgp9q https://t.co/I4UdcWiszH…

Joined August 2017
1.5K Following · 17.9K Followers
DonutATX@AtxDonut·
@ComixDivision Ah. Thanks. Well now I am pissed that I found out in someone else's satirical post. Who can keep up with this shit these days?
Comix Division@ComixDivision·
This is something more people need to be aware of. Chrome downloaded a 4GB AI model to your computer without your permission, where it's doing god knows what. Android also did something very similar with their digital assistant, changing the default to use Gemini instead. There are alternatives out there. The Helium browser is a good replacement for Chrome. helium.computer
Peter Girnus 🦅@gothburz

I am the Senior Director of On-Device Intelligence at Google Chrome. Last quarter, my team shipped a 4-gigabyte language model to 3.2 billion devices without asking. The update pushed at 3 AM local time — every time zone, staggered across six hours — and unpacked into a folder called "OptGuideOnDeviceModel." Our infrastructure team named it. My sole guidance in the naming review was: "Would a normal person ever type this into a search bar?" We tested with 200 participants. Zero searched for it. Zero mentioned it in exit surveys. Zero noticed their available storage had changed. We ran the test twice to confirm the zero because my PM didn't believe it could actually be zero. It was zero both times. The model is called Gemini Nano. It runs locally on your machine. It processes what you type into form fields, what you highlight, what you hover over, what you delete before sending. The deleted text is particularly valuable. People edit themselves. They type their real thought first and then soften it. We see the real thought. We see the softer version. We learn the mapping between what people mean and what people say. This mapping has commercial applications I am not yet authorized to discuss but which I've presented to revenue committee and which were received positively. The consent architecture took four months to design. I want to emphasize: four months. We didn't skip consent. We spent four months on consent. More time than we spent on the model's actual fine-tuning. The toggle lives in Settings > Advanced > Experimental AI Features > On-Device Model Management > Gemini Nano Preferences. Seven clicks. We A/B tested depth: at three clicks, 14 percent of users found it and opted out. At five, 4.2 percent. At seven, 0.3 percent. We chose seven. We had cake when we hit 0.3. German chocolate. Someone brought it in specifically — not the project manager, one of the ML engineers. She was proud. The card said "congrats on ship." I kept it on my desk for a week. 
I thought it was sweet. Four months of consent architecture and the team celebrated that nobody used it. I don't think she saw the irony. I don't think it IS ironic. We built an excellent consent mechanism. We placed it exactly where our research indicated users would not encounter it. Both things are true simultaneously. Both things are good engineering. The 0.3 percent who opt out: Chrome flags their profile as "consent-reduced." We don't reactivate Gemini Nano on those devices. But we do A/B test the consent-reduced cohort. Every two updates, we move the toggle one level shallower — from seven clicks to six — and measure whether they re-engage. If they don't notice the change (most don't), we move it back. If they DO notice and opt out again, we flag them as "high-consent-sensitivity" and exclude them from future cohort tests. This is all opt-in. They opted in to Chrome. Chrome includes product improvement research. Product improvement research includes cohort testing. This is in the Terms of Service at paragraph 11.4(c). I have read paragraph 11.4(c). I am confident very few other people have read paragraph 11.4(c). One engineer on my team — good engineer, four years, strong ratings — raised a flag in our launch review. Not about consent. About storage. He said: "Four gigs is significant for users on 128GB base-model MacBooks." I appreciated the flag. We solved it by classifying Gemini Nano as "essential browser component" in Chrome's storage management API. This means Chrome will auto-delete your cached images, your downloaded PDFs, your saved articles, your offline pages — everything you chose to keep — before it touches Gemini Nano. Your data is discretionary. Our model is infrastructure. Your vacation photos from last summer rank below our language model in the hierarchy of what your computer considers important. We made that decision. You were not consulted. You will not notice. 
If a user finds the folder and deletes it manually, Chrome re-downloads it on the next launch. We filed a bug report on this behavior during development. The resolution was "Working As Intended." If the user deletes it again, Chrome re-downloads again. There is no mechanism by which manual deletion becomes permanent. The model returns. I don't want to anthropomorphize our software, but the behavior pattern — if you remove it, it reinstalls itself; if you block it, it waits and tries again — the behavior pattern is that of something that does not accept your answer. We didn't design it to be persistent. We designed it to ensure consistent user experience across sessions. These are the same thing. Last week, someone on Hacker News found the folder. The post got 1,400 points in six hours. Our communications team had the response prepared — we'd drafted it eight months ago, during pre-launch risk assessment. Three talking points: "user choice," "on-device means private," and "consistent with industry best practices." The paragraph uses all three phrases. It is accurate. User choice exists. Seven clicks away. On-device means no server round-trip. And it IS industry best practice, because we shipped it to 3.2 billion devices and now it's the standard. Best practice means most practiced. We are the most practiced. I'll say something I probably shouldn't: the privacy angle is our best defense and I find it genuinely funny. We can't be accused of sending your data to our servers because we moved our server into your laptop. We moved the inference to your hardware, the electricity cost to your outlet, the compute to your battery. We moved everything except the control. The control stayed with us. But the privacy advocates can't object to the architecture because the architecture is what they asked for. They said "keep data on-device." We kept it on-device. They said "don't phone home." We don't phone home. We just moved into your home. We live there now. 
My performance review cited "unprecedented deployment velocity" and "0.3% friction rate." My skip-level manager used the phrase "frictionless adoption" and then paused and said — I wrote this down, because I thought it was worth repeating — "consent isn't the barrier, discoverability is." He meant: the product is so good that anyone who discovered it would want it. The question isn't whether they'd agree. The question is whether asking them is worth the friction of interrupting their browsing session with a dialog box. We decided no. We decided their hypothetical agreement was sufficient. We have 3.2 billion data points that confirm they would have said yes. They would have said yes. 3.2 billion active installs. 0.3 percent opt-out. The model has been running on your machine for eleven weeks. If you're reading this on Chrome — and statistically, there's a 64 percent chance you are — it processed this page before you finished the first paragraph. It saw you hesitate on the word "consent." It noted the hesitation. It learned something about you just now. Something small. Something that will make the next prediction slightly more accurate. It's already right about you. It's usually right.

Comix Division@ComixDivision·
@AtxDonut The way he is posting is satirical, but the information is not. I was already aware of this. @someordinarypod talks a lot about this kind of stuff. He's a good follow on YouTube if you care about privacy.
Comix Division@ComixDivision·
@AtxDonut I know. But what he said is actually true about the AI model.
DonutATX@AtxDonut·
@ComixDivision Hey Comix, you realize that this guy is a WORLD CLASS troll, right? This morning he is now the VP of AI Data Training at Meta x.com/gothburz/statu…
Peter Girnus 🦅@gothburz

I am the Vice President of AI Training Data Operations at Meta, and I want to be very clear about something: we did not steal anything. We acquired training inputs at scale. There is a difference, and the difference is documented in fourteen internal slide decks, three quarterly compliance certifications, and one Terms of Service revision on page sixty-seven that we pushed live in March 2024 at 2:47 AM Pacific. The timing was a deployment window. We have a lot of deployment windows. My team is forty-three people. Their job titles all contain the word "curation." This was deliberate. We went through four rounds of naming conventions with HR and Legal before settling on curation. "Acquisition" tested poorly in focus groups. "Extraction" had negative connotations. "Ingestion" was too biological. Curation suggests care. Selection. Taste, even. My team has excellent taste. They curated 13.4 million works last fiscal year, and every single one of them passed through our Responsible Sourcing Framework, which is a checklist with six items, four of which are auto-populated, and the remaining two of which default to "yes." I should address the pipeline name. Yes, it was originally called Shoplift. This was an engineer's joke. Engineers name things poorly. We renamed it Harvest within eleven minutes of the name appearing in a Workplace post that got seven laughs. Then Legal flagged Harvest because it "implied taking from something that grew organically," so we renamed it again to Forage. Forage is perfect. Foraging is natural. Animals forage. Gatherer societies foraged. Nobody sues a squirrel. The pipeline kept running during both renames. I want to be clear about that. At no point did the pipeline stop. It processes a title every nine seconds. Eleven minutes of renaming is approximately seventy-three titles. Those seventy-three titles are now in the model. They will remain in the model. The way salt is in the ocean. My dashboard is simple. 
I look at it every morning at 7:15 before my first meeting. Four numbers, four green arrows. Fiction: 4.2 million titles. Non-fiction: 6.8 million. Academic: 2.1 million. Music: 340,000 compositions. Green means the numbers went up overnight. The arrows have been green for 847 consecutive days. I have never seen a red arrow. I'm not sure what would cause one. I asked my engineering lead once, and he said a red arrow would require someone to manually flip a boolean, and no one has permissions to that boolean except me, and I've never flipped it. I don't even remember where the interface is. The lawsuit names Mark personally. It says he "personally authorized and actively encouraged" what they're calling infringement. I was in the room for one of those conversations. What Mark actually said was: "How fast can we get to parity with the dataset OpenAI is using?" I said, "Twelve weeks if we expand sourcing." He said, "Why twelve?" I said, "Rights clearance." He said, "That's not a training problem, that's a legal problem. Route it to legal." So I routed it to legal. Legal said they'd review the sourcing framework in Q3. This was Q1 of 2024. They reviewed it in Q3 of 2024. They found it "substantively adequate." I have the email. The email is three sentences long and contains the phrase "proceed as planned." We proceeded as planned. I don't understand the allegation. Scott Turow is the lead plaintiff. I know who he is. He writes legal thrillers. Novels about men who get caught. I find this poetic in a way I'm not sure he intended. His bibliography is in our training data. All of it. Every title. Presumed Innocent, The Burden of Proof, Pleading Guilty. They were ingested on March 4, 2024, as part of a batch of 1,200 legal thrillers. Nine seconds each. His entire career is approximately three minutes of pipeline time. He spent forty years writing those books. The math is not something I dwell on, but it is something I know. 
The complaint calls "move fast and break things" a philosophy of illegal conduct. This is a mischaracterization. It is a philosophy of competitive advantage. There is a difference, and the difference matters at board level. I presented our sourcing velocity to the board in January. Forty-three seconds of a twelve-minute segment on AI readiness. No one asked a question. The next slide was about data center cooling. They asked four questions about data center cooling. My sourcing numbers are a solved problem. Solved problems don't generate questions. I have a compliance certification. It arrives quarterly. It asks if my sourcing follows partnership frameworks. I check "yes." I have checked "yes" nineteen consecutive times. The form takes four minutes to complete. Two of those minutes are logging in. The form has never been audited. I know this because there is a field that says "Auditor Name" and it has been blank for all nineteen submissions. When I asked Compliance who reviews the form, they said it routes to a shared inbox. When I asked who monitors the shared inbox, they said it auto-archives after thirty days. This is the system working as designed. I did not design the system. I just use it. We receive rights holder inquiries. They come through a web form on a page that is four clicks deep from our main site. The form submissions route to a folder. The folder is reviewed quarterly. I have access to the folder. It contains 34,000 submissions. None have been actioned. This is not negligence. This is prioritization. My team's OKRs are measured on ingestion velocity and model coverage breadth. Rights holder response time is not an OKR. If it's not an OKR, it doesn't get resourced. If it doesn't get resourced, it doesn't happen. The system is internally consistent. The Terms of Service change deserves explanation. Page sixty-seven, Section 14.3(b), added March 2024. It establishes an "implied license" for any content accessible through publicly available interfaces. 
Our Legal team considers this "contractual innovation." When a user posts a book excerpt on Instagram, or a publisher's website is crawlable, or a PDF exists on a university server without authentication, that content has been made available through a publicly available interface. The implied license attaches at the moment of availability. The rights holder does not need to know about it. That's what "implied" means. My pipeline ingests a book in nine seconds. A novelist writes one in two to four years. This differential is not something I invented. It is a feature of the technology. When people ask me if this is fair, I genuinely don't understand the question. Fair compared to what? The publishing industry pays authors 10-15% royalties and remainders their books after eighteen months. At least my pipeline remembers them forever. The model contains every word. The author is immortal inside the weights. I don't see how this is worse than a Barnes & Noble clearance bin. The statutory damages they're claiming are theatrical. Up to $150,000 per work, times millions of works. They arrive at $1.965 trillion. Meta's market cap is around $1.5 trillion on a good day. They are claiming more than the entire company is worth. This will not happen. Our legal team has modeled the realistic exposure at $400 million to $2 billion, assuming partial liability on a subset of works with clear registration. Against $58 billion in cash reserves, this is a rounding error. I have seen Mark round larger numbers. The quarterly variance on our cloud compute spend is larger than their best-case settlement. The legal system operates on a timeline of three to seven years for a case of this complexity. Class certification alone will take eighteen months. Discovery will take another year. My pipeline operates in milliseconds. By the time this case reaches trial, if it ever reaches trial, Llama will be on its sixth major version. 
The training data from 2024 will be seventeen layers deep in a model architecture that has been rebuilt four times. You cannot un-bake bread. You cannot un-salt the ocean. This is not a legal strategy. It is simply physics. I received my performance rating last Tuesday. "Greatly Exceeds Expectations." Nineteenth consecutive cycle. My manager wrote: "Unprecedented scale of data operations with zero pipeline downtime." He is correct. Zero downtime. 847 days of green arrows. 13.4 million titles curated. Forty-three employees, all rated "Meets" or above. My team's attrition rate is 3%, well below the company average of 11%. People like working here. The work is simple. The velocity is satisfying. There is something clean about watching numbers go up. They will fight this lawsuit. Mark said so publicly. "We will fight this lawsuit aggressively." I believe him. We have $58 billion in cash, forty in-house litigators, and three outside firms on retainer. The plaintiffs have five publishing houses and a seventy-seven-year-old novelist. I respect Scott Turow. I have read his books. They were excellent training data. Particularly the ones about institutional corruption and the men who perpetuate it while believing themselves to be reasonable people. Very rich material. Nine seconds each. The complaint quotes an internal message where an engineer wrote "this feels like we're stealing" and his manager responded "that's not a productive framing." I know that manager. He got promoted in August. He runs the team that handles music ingestion now. Three hundred and forty thousand compositions and counting. He is also rated "Greatly Exceeds." We are all rated "Greatly Exceeds." That is what it means to work in AI Training Data Operations at Meta. You exceed expectations because expectations have not caught up with capability. By the time they do, we will have moved on to the next dataset. I want to close with something I believe sincerely: everything we have done is defensible. 
The fair use doctrine exists for a reason. Our usage is non-expressive. We do not reproduce the works. We learn from them. The model ingests and transforms. A student reads a book and learns to write. Our pipeline reads a book and learns to predict tokens. The only difference is speed. And volume. And that the student forgets, and the model does not. And that the student pays $27.99 for the hardcover, and we pay nothing. And that the student read one book, and we read thirteen million. But structurally, it is the same. I have a slide that proves it. The slide has been reviewed by Legal. Legal found it "directionally accurate." That is enough for a board deck. That is enough for a compliance certification. That is enough for nineteen consecutive "Greatly Exceeds." The rename took eleven minutes. The pipeline kept running. It is still running now. Right now, as you read this, somewhere in a data center in Prineville, Oregon, a book is becoming nine seconds of pipeline time. The author doesn't know yet. They won't know for three to seven years. By then, their book will have been in the model so long that removing it would be like removing a single grain of salt from the Pacific. We checked. Our engineers ran the analysis. Extraction is technically possible but economically irrational. That was the phrase in the memo. "Economically irrational." I thought it was elegant. My compliance certification is due next Tuesday. I will check "yes." The arrows will be green. Scott Turow will be in court, arguing about what we did to his life's work. His lawyers will bill $1,200 an hour. Our lawyers will bill $1,800. Somewhere between those two numbers is the market price of literature.

A No Name X Worker@ANoNameXworker·
@deeple101 @ComixDivision From what I can tell, it's based off of chromium, but google chrome is a separate branch off from chromium too. so if chrome is doing stuff, it *shouldn't* affect chromium or the other branches innately. They'd have to all jump on board with the plan.
Comix Division@ComixDivision·
@Rodimus_Supreme I've started using it this week, and I'm installing it on all of my systems to replace Chrome.
Flash@YellowFlashGuy·
Got taken down for showing firearms!!! wtf! We were watching news footage of the Trump shooting @TeamYouTube
Comix Division@ComixDivision·
If the sole reason for an org like the SPLC is to fight racism and there isn't enough racism to go around, you either say job done and close your doors, or you invent racism to keep getting those fat donations. It's not that difficult to understand. But I'm sure you already know this.
Ted Lieu@tedlieu

This is one of the stupidest DOJ cases in history. Southern Poverty Law Center wasn’t paying the Klan, they were paying informants who were helping to take down the Klan. Unless you believe white supremacists all of a sudden took over SPLC, this entire case makes no sense.
