Ben Cerchio

27 posts

Ben Cerchio

Ben Cerchio

@ben_cerchio

Founder @Secludy (privacy-guaranteed synthetic data for training AI models and vendor evals)

Katılım Eylül 2024
123 Takip Edilen34 Takipçiler
Ben Cerchio
Ben Cerchio@ben_cerchio·
We're proud to be launching @Secludy today with $4M in seed funding. @ImpressionVC led the round. @LAUNCH and The Syndicate, a venture firm and angel investing group led by @Jason Calacanis, also joined, along with Wedbush Ventures, @PrecursorVC, @HustleFundVC, @scriptcapital, Mana Ventures, Chispa VC, and an amazing group of angel investors. Banks and fintechs are sitting on incredibly valuable data but can't use it to train AI models. Privacy laws, contracts, and cross-border rules keep it locked down. So my co-founder @Dr_Mingze_He and I got to work. Secludy unlocks your most sensitive data while keeping the utility intact. You can train custom models and do vendor evaluations without ever putting your sensitive data at risk or giving up AI model performance. Moving on to other regulated industries next... Also, special thank you to @sososazesh who wrote our first check and to all of our other existing investors. Read more in the comments.
Ben Cerchio tweet media
English
2
5
8
1.7K
Ben Cerchio retweetledi
Jacek Golebiowski
Jacek Golebiowski@j_golebiowski·
Which SLM to fine-tune? Here's the cheat sheet from our 15-model benchmark: Max accuracy → Qwen3-8B Half the params, same quality → Qwen3-4B Under 2B params → Qwen3-0.6B or Llama-3.2-1B Max ROI from fine-tuning → LFM2-350M Edge / IoT → LFM2-350M
Jacek Golebiowski tweet media
English
2
5
29
1.3K
Ben Cerchio
Ben Cerchio@ben_cerchio·
Amazon spent millions on their SB ad for a feel-good story Minutes after airing, “cancel Ring camera” hit the top of its Google Trends chart worldwide AI products live and die on trust. Privacy should be the default
Ben Cerchio tweet media
English
0
0
0
84
Ben Cerchio retweetledi
Ben Cerchio retweetledi
Hamel Husain
Hamel Husain@HamelHusain·
To what extent can you automate or delegate evals? Is there a way to make it "not your problem?" 😅 Part 1 of 1: You should absolutely automate parts of it as long as a human is in the loop. Many people are a bit too aggressive here, so you have to be careful, some guidelines below:
Hamel Husain tweet media
English
2
6
59
11.5K
Ben Cerchio retweetledi
Ming He
Ming He@Dr_Mingze_He·
I spend more and more time (sometimes > 2hrs) to discuss with coding agent[s] on the revisions of a super detailed project implementation plan before any agenttic work. The results turn great. 1. I got better and deeper understanding of potential pitfalls I failed to see. [1/n]
English
1
1
2
45
Ben Cerchio
Ben Cerchio@ben_cerchio·
The Coinbase data breach announced today is a warning shot. Imagine this: what if the target wasn’t just raw data, but an AI model trained on it? Customer support teams are starting to use fine-tuned AI models trained on sensitive customer data to automate responses. That means bribing an insider to input specific prompts could trigger leakage of private data through model outputs (even if the agent doesn't have access to the raw data). The answer is not to stop using AI. It is to use privacy-preserving techniques like synthetic data and differential privacy to eliminate PII before training ever begins. Otherwise, you are building tools that are one prompt away from leaking your most sensitive user data.
Coinbase 🛡️@coinbase

Cyber criminals bribed and recruited rogue overseas support agents to pull personal data on <1% of Coinbase MTUs. No passwords, private keys, or funds were exposed. Prime accounts are untouched. We will reimburse impacted customers. More here: coinbase.com/blog/protectin…

English
1
1
3
187
Ben Cerchio
Ben Cerchio@ben_cerchio·
Should open-weights be the new norm for models trained on user data? Or should we push harder for full transparency even with the risks and all? of course there are competitive factors and other legal risks at play too such as copyright
English
0
0
0
24
Ben Cerchio
Ben Cerchio@ben_cerchio·
Even with privacy-preserving techniques in place, training at this scale means a single gap can lead to real-world consequences if PII or sensitive info gets leaked
English
2
0
1
35
Ben Cerchio
Ben Cerchio@ben_cerchio·
Is Llama 4 trained on personal user data? Short answer is yes. And that’s exactly why my POV is that “open-weights” is the right — even if unpopular — move. Should a model trained on real user content be fully open-sourced (training data and all)?
Ben Cerchio tweet media
English
1
0
1
69
Ben Cerchio
Ben Cerchio@ben_cerchio·
The EDPB released Opinion 28/2024 on AI models under GDPR, and it's mind-blowing... Most of the industry missed it since it came out over the holidays. Privacy teams are now scrambling. This has massive implications for any companies launching AI products in the EU. I spent last weekend analyzing every page, and here's what jumped out at me: 👉 Regulators are rejecting the idea that AI models somehow "anonymize" personal data through training 👉 They're expecting documentation evidencing regular PII leakage testing 👉 There's now a clear preference for synthetic data and privacy-enhancing tech 👉 If you're deploying someone else's model, you're still on the hook for compliance Stay tuned for my full breakdown that I'll be publishing as an article tomorrow! We've also developed a step-by-step compliance framework report that helps teams meet these requirements without slowing AI innovation. We usually only share our comprehensive guidance with partners, but this is important, so send me a message if you'd like to be added to the distribution list!
English
1
0
0
38
Ben Cerchio
Ben Cerchio@ben_cerchio·
Fine-tuning models for function calling is critical for agents that need to be reliable and accurate. Prompting can only get you so far
English
0
0
0
42
Ben Cerchio retweetledi
Rohan Paul
Rohan Paul@rohanpaul_ai·
LLMs leak up to 27.5% of sensitive training data PII (Personally Identifiable Information like emails, SSNs, VINs, Bitcoin wallets). @Secludy makes it easy to generate privacy-guaranteed synthetic data that is a near replica of the original unstructured dataset but better. How? They utilize privacy-protected LLMs by adding carefully controlled noise to the model weights before generating synthetic data for AI model fine-tuning/evaluations. They just released a technical report that demonstrates their approach. 🧵1/n LLMs are known to memorize and expose sensitive information, even when trained on masked unstructured datasets they can still retain and regurgitate personal data which is a major privacy risk.
Rohan Paul tweet media
English
4
6
37
3.9K
Ben Cerchio retweetledi
Secludy AI
Secludy AI@Secludy·
We just launched our PII Leakage Testing Tool on AWS Marketplace! We put data masking to the test. A thread 🧵👇
English
2
1
1
80
Ben Cerchio
Ben Cerchio@ben_cerchio·
reminder that redacting model training data doesn't make it anonymous
English
0
0
0
81
Ben Cerchio
Ben Cerchio@ben_cerchio·
Training AI models on sensitive medical data poses big risks, as this TechCrunch article highlights. The same goes for unstructured text. Privacy-preserving techniques like training on differentially private synthetic data can ensure utility while protecting privacy 🛡️ techcrunch.com/2024/11/19/psa…
English
0
1
2
181