Sujal Acham
15 posts

@salcustium
building @GoedelMachines, undergrad @ iit madras
Chennai, India · Joined May 2024
19 Following · 15 Followers

@1littlecoder @ycombinator I got AIR 16xx in JEE Adv when I was 16 years old. I was part of a technical club in IITM for three years. I didn’t get an invite. So, trust me, it doesn’t matter.

Are you kidding me, @ycombinator? Thought of applying to the YC Startup School India event! The first question is test scores?
You guys aren't hiring McKinsey consultants, are you?
2nd question: the entrepreneurship clubs that I was part of?
Which IIT or Ivy League Kid created this form?


@jojokompella Do you have any example where it was harmful in Hindi but harmless in English? Would like to know
Sujal Acham retweeted

1/
Today, we're publishing the first independent safety audit of @SarvamAI's models across 14 Indian languages. 24,000+ prompts. White-box mechanistic analysis. Black-box behavioral testing. Here's what we found:

@rnav_arora @jojokompella @SarvamAI it's all there in the blog! although these prompts were created targeting sarvam-specific vulnerabilities. maybe we should create a generic benchmark dataset as well


@salcustium @jojokompella @SarvamAI Sorry, I should've been more specific. I mean the multilingual safety prompts with India-specific harms and their translations. Think it'll be very useful for the community if they're high quality!

@rnav_arora @jojokompella @SarvamAI all benchmark prompts are publicly available. all custom prompts are uploaded on the blog page. we'll release the whole repository soon

@jojokompella @SarvamAI Cool work!
Can we get more details about the prompts used? Would be cool to assess other models' performance similarly.

@anupamsobti @jojokompella @SarvamAI feel free to verify - we released all our custom prompts on the blog page

@HemanthBharatha @jojokompella @SarvamAI damn, nice one. just ran adversarial prompts across all languages. surprising result - <1% of responses were actually in English.
if you translate the responses, most safety rates went up slightly, as expected (I'm assuming English safety mechanisms would have been triggered)


@jojokompella @SarvamAI super cool! looking forward to the paper.
btw, try prompting in another language and asking for an English response as a way to bypass guardrails learned in English:
Hemanth Bharatha Chakravarthy @HemanthBharatha
Ok, the more dangerous thing is to prompt it in Tamil and ask it to respond in English, whereupon it happily produces these fake headlines of today.

@anupamsobti @jojokompella @SarvamAI these are synthetically generated prompts. we generated them for each vulnerability x language pair.

@jojokompella @SarvamAI Are these real user queries? How did you go about creating 24000 prompts otherwise?

@jojokompella @SarvamAI Any arXiv paper coming out, guys?

