Ben Cerchio

27 posts

Ben Cerchio

@ben_cerchio

Founder @Secludy (privacy-guaranteed synthetic data for training AI models and vendor evals)

Katılım Eylül 2024

123 Takip Edilen34 Takipçiler

Ben Cerchio@ben_cerchio·14 May

globenewswire.com/news-release/2…

ZXX

Ben Cerchio@ben_cerchio·14 May

We're proud to be launching @Secludy today with $4M in seed funding. @ImpressionVC led the round. @LAUNCH and The Syndicate, a venture firm and angel investing group led by @Jason Calacanis, also joined, along with Wedbush Ventures, @PrecursorVC, @HustleFundVC, @scriptcapital, Mana Ventures, Chispa VC, and an amazing group of angel investors. Banks and fintechs are sitting on incredibly valuable data but can't use it to train AI models. Privacy laws, contracts, and cross-border rules keep it locked down. So my co-founder @Dr_Mingze_He and I got to work. Secludy unlocks your most sensitive data while keeping the utility intact. You can train custom models and do vendor evaluations without ever putting your sensitive data at risk or giving up AI model performance. Moving on to other regulated industries next... Also, special thank you to @sososazesh who wrote our first check and to all of our other existing investors. Read more in the comments.

English

1.7K

Ben Cerchio retweetledi

Jacek Golebiowski@j_golebiowski·18 Mar

Which SLM to fine-tune? Here's the cheat sheet from our 15-model benchmark: Max accuracy → Qwen3-8B Half the params, same quality → Qwen3-4B Under 2B params → Qwen3-0.6B or Llama-3.2-1B Max ROI from fine-tuning → LFM2-350M Edge / IoT → LFM2-350M

English

1.3K

Ben Cerchio@ben_cerchio·2 Mar

Using differential privacy to create realistic synthetic data is the way to safely unlock your proprietary data for training AI models

Rohan Paul@rohanpaul_ai

Larry Ellison on the AI moat: AI is commoditizing because models use the same public internet data. The true competitive edge isn't the model itself anymore, but access to exclusive, proprietary datasets. That is the only moat left.

English

Ben Cerchio@ben_cerchio·12 Şub

Amazon spent millions on their SB ad for a feel-good story Minutes after airing, “cancel Ring camera” hit the top of its Google Trends chart worldwide AI products live and die on trust. Privacy should be the default

English

Ben Cerchio retweetledi

Mira Murati@miramurati·29 Eyl

Today on Connectionism: establishing the conditions under which LoRA matches full fine-tuning performance, with new experimental results and a grounding in information theory

Thinking Machines@thinkymachines

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/

English

127

1.7K

203.8K

Ben Cerchio retweetledi

Hamel Husain@HamelHusain·26 Tem

To what extent can you automate or delegate evals? Is there a way to make it "not your problem?" 😅 Part 1 of 1: You should absolutely automate parts of it as long as a human is in the loop. Many people are a bit too aggressive here, so you have to be careful, some guidelines below:

English

11.5K

Ben Cerchio retweetledi

Ming He@Dr_Mingze_He·26 Tem

I spend more and more time (sometimes > 2hrs) to discuss with coding agent[s] on the revisions of a super detailed project implementation plan before any agenttic work. The results turn great. 1. I got better and deeper understanding of potential pitfalls I failed to see. [1/n]

English

Ben Cerchio@ben_cerchio·15 May

The Coinbase data breach announced today is a warning shot. Imagine this: what if the target wasn’t just raw data, but an AI model trained on it? Customer support teams are starting to use fine-tuned AI models trained on sensitive customer data to automate responses. That means bribing an insider to input specific prompts could trigger leakage of private data through model outputs (even if the agent doesn't have access to the raw data). The answer is not to stop using AI. It is to use privacy-preserving techniques like synthetic data and differential privacy to eliminate PII before training ever begins. Otherwise, you are building tools that are one prompt away from leaking your most sensitive user data.

Coinbase 🛡️@coinbase

Cyber criminals bribed and recruited rogue overseas support agents to pull personal data on <1% of Coinbase MTUs. No passwords, private keys, or funds were exposed. Prime accounts are untouched. We will reimburse impacted customers. More here: coinbase.com/blog/protectin…

English

187

Ben Cerchio@ben_cerchio·6 Nis

Should open-weights be the new norm for models trained on user data? Or should we push harder for full transparency even with the risks and all? of course there are competitive factors and other legal risks at play too such as copyright

English

Ben Cerchio@ben_cerchio·6 Nis

Even with privacy-preserving techniques in place, training at this scale means a single gap can lead to real-world consequences if PII or sensitive info gets leaked

English

Ben Cerchio@ben_cerchio·6 Nis

Is Llama 4 trained on personal user data? Short answer is yes. And that’s exactly why my POV is that “open-weights” is the right — even if unpopular — move. Should a model trained on real user content be fully open-sourced (training data and all)?

English

Ben Cerchio@ben_cerchio·28 Şub

The EDPB released Opinion 28/2024 on AI models under GDPR, and it's mind-blowing... Most of the industry missed it since it came out over the holidays. Privacy teams are now scrambling. This has massive implications for any companies launching AI products in the EU. I spent last weekend analyzing every page, and here's what jumped out at me: 👉 Regulators are rejecting the idea that AI models somehow "anonymize" personal data through training 👉 They're expecting documentation evidencing regular PII leakage testing 👉 There's now a clear preference for synthetic data and privacy-enhancing tech 👉 If you're deploying someone else's model, you're still on the hook for compliance Stay tuned for my full breakdown that I'll be publishing as an article tomorrow! We've also developed a step-by-step compliance framework report that helps teams meet these requirements without slowing AI innovation. We usually only share our comprehensive guidance with partners, but this is important, so send me a message if you'd like to be added to the distribution list!

English

Ben Cerchio@ben_cerchio·22 Şub

Fine-tuning models for function calling is critical for agents that need to be reliable and accurate. Prompting can only get you so far

English

Ben Cerchio@ben_cerchio·19 Şub

@rohanpaul_ai @Secludy Enjoyed collaborating on this post!

English

Ben Cerchio retweetledi

Rohan Paul@rohanpaul_ai·19 Şub

LLMs leak up to 27.5% of sensitive training data PII (Personally Identifiable Information like emails, SSNs, VINs, Bitcoin wallets). @Secludy makes it easy to generate privacy-guaranteed synthetic data that is a near replica of the original unstructured dataset but better. How? They utilize privacy-protected LLMs by adding carefully controlled noise to the model weights before generating synthetic data for AI model fine-tuning/evaluations. They just released a technical report that demonstrates their approach. 🧵1/n LLMs are known to memorize and expose sensitive information, even when trained on masked unstructured datasets they can still retain and regurgitate personal data which is a major privacy risk.

English

3.9K

Ben Cerchio retweetledi

Secludy AI@Secludy·19 Şub

We just launched our PII Leakage Testing Tool on AWS Marketplace! We put data masking to the test. A thread 🧵👇

English

Ben Cerchio@ben_cerchio·3 Oca

reminder that redacting model training data doesn't make it anonymous

English

Ben Cerchio@ben_cerchio·22 Kas

Training AI models on sensitive medical data poses big risks, as this TechCrunch article highlights. The same goes for unstructured text. Privacy-preserving techniques like training on differentially private synthetic data can ensure utility while protecting privacy 🛡️ techcrunch.com/2024/11/19/psa…

English

181

Keşfet

@Secludy @ImpressionVC @LAUNCH @Jason @PrecursorVC @HustleFundVC @scriptcapital @Dr_Mingze_He