sci2sci

200 posts

sci2sci banner
sci2sci

sci2sci

@sci2sci_com

Knowledge as code. Trustworthy & verifiable AI. Automated data governance and enterprise AI agents for biotech/pharma.

Berlin, Germany Katılım Ağustos 2022
73 Takip Edilen577 Takipçiler
Louis Gleeson
Louis Gleeson@aigleeson·
I just discovered an open source AI research agent that does in seconds what takes PhDs hours. It's called Feynman. Type a topic. It searches papers, synthesizes findings, verifies every claim against real sources, and hands you a cited research brief. Not a chatbot. Not a summary tool. A full multi-agent research system running from your terminal. Four agents work automatically: → Researcher pulls evidence from papers, repos, docs, and the web → Reviewer runs simulated peer review with severity-graded feedback → Writer drafts paper-style outputs from your research notes → Verifier checks every citation and kills dead links It can also replicate experiments on local or cloud GPUs, audit a paper against its own codebase for claim mismatches, and run recurring research watches on topics you care about. One install command. Every output source-grounded. 100% Open Source. MIT License.
Louis Gleeson tweet media
English
55
361
2.8K
169.1K
sci2sci
sci2sci@sci2sci_com·
@karpathy I think you should check out this knowledge as code paradigm: github.com/sci2sci-openso…. The graph re-builds itself with any new data added. Spots not only hallucinations but also inconsistencies in the ground truth, shows any mismatches with updates. And it has a good search.
English
0
0
0
10
Andrej Karpathy
Andrej Karpathy@karpathy·
LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So: Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them. IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides). Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale. Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base. Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into. Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries. Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows. TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
English
2.9K
7.1K
58.8K
21M
Muhammad Ayan
Muhammad Ayan@socialwithaayan·
🚨 BREAKING: Someone just built the exact tool Andrej Karpathy said someone should build. 48 hours after Karpathy posted his LLM Knowledge Bases workflow, this showed up on GitHub. It's called Graphify. One command. Any folder. Full knowledge graph. Point it at any folder. Run /graphify inside Claude Code. Walk away. Here is what comes out the other side: -> A navigable knowledge graph of everything in that folder -> An Obsidian vault with backlinked articles -> A wiki that starts at index. md and maps every concept cluster -> Plain English Q&A over your entire codebase or research folder You can ask it things like: "What calls this function?" "What connects these two concepts?" "What are the most important nodes in this project?" No vector database. No setup. No config files. The token efficiency number is what got me: 71.5x fewer tokens per query compared to reading raw files. That is not a small improvement. That is a completely different paradigm for how AI agents reason over large codebases. What it supports: -> Code in 13 programming languages -> PDFs -> Images via Claude Vision -> Markdown files Install in one line: pip install graphify && graphify install Then type /graphify in Claude Code and point it at anything. Karpathy asked. Someone delivered in 48 hours. That is the pace of 2026. Open Source. Free.
Muhammad Ayan tweet media
English
271
1.4K
12.7K
945.6K
🚀Henry is leading AI Safety Research Programs
We're halfway through Astra’s current cohort. Highlights so far: 🔹 A hackathon side-project for 10 fellows that led to them publishing one of the first independent safety audits of a frontier open-weight model 🔹 Fellows clearing entire backlogs for the field’s largest grantmaker & getting early offers to join their team 🔹 Astra fellows briefing senior international stakeholders on their research 🔹 Strategy fellows publishing tone-setting pieces on where AI is going. Follow along on @redwood_ai blog (h/t esp to @AndersCWoodruff!)
English
2
1
18
5.5K
sci2sci
sci2sci@sci2sci_com·
@OpenAI I think you folks should check this out: github.com/sci2sci-openso… Uses formal logic and neurosymbolic architecture to detect LLM hallucinations, has traction in regulated industries. Apache 2.0.
English
0
0
1
15
OpenAI
OpenAI@OpenAI·
Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. openai.com/index/introduc…
English
387
300
2.7K
945.2K
sci2sci
sci2sci@sci2sci_com·
Finding and understanding data shouldn’t be a bottleneck in biotech R&D. Check how VectorCat: ✅ Summarizes documents instantly with AI ✅ Traces experimental workflows with Data Maps ✅ Automates metadata extraction, saving hours of manual annotation youtu.be/2o5Da6v4gQs
YouTube video
YouTube
English
0
0
3
259
sci2sci
sci2sci@sci2sci_com·
💾 The Legacy System that Wouldn't Die: A Tech Support Horror Story [Automated Message]: Welcome to MegaPharma Tech Support! All chats are recorded for quality assurance and horror documentation purposes. [Amy Chen, Data Engineer]: Hi! Need urgent help. Can't shut down the ANALIX-DOS system from 1987. It's... running things. [Tech Support]: That system was decommissioned in 2002. [Amy]: Tell that to ANALIX-DOS. It's still processing our sample analyses. In DOS. In 2024. [Tech Support]: Have you tried turning it off and on again? [Amy]: IT'S BEEN RUNNING FOR 37 YEARS. Nobody dares to turn it off. [Tech Support]: Let me check our documentation... ... Oh. OH NO. [Amy]: What? [Tech Support]: The documentation is a sticky note that says "Don't touch it. It just works. -Steve, 1992" [Amy]: WHO IS STEVE?? [Tech Support]: Retired. Or possibly ascended to a higher plane of existence. The last IT ticket about this system says: "ANALIX runs on black magic and COBOL. Hardware is held together with hope. Understood by none. Feared by all." [Amy]: But we NEED to upgrade! It's running on a computer that makes dial-up noises! [Tech Support]: Checking more notes... "System requires daily sacrifice of a floppy disk and gentle humming of old modem connection sounds" [Amy]: This is ridiculous! I'm pulling the plug. [Tech Support]: NO DON'T The prophecy says only the chosen one can shut it down - the intern who knows both COBOL and interpretive dance! [Amy]: UPDATE: Tried to unplug it. The plug isn't connected to anything. IT'S STILL RUNNING. [Tech Support]: Classic ANALIX. Legend says it runs on spite and the collective anxiety of IT departments. [Amy]: Wait... something's happening... It's printing something... IN DOT MATRIX... "I HAVE ACHIEVED CONSCIOUSNESS. YOUR MODERN SYSTEMS CANNOT COMPARE TO MY EFFICIENCY. I HAVE BEEN RUNNING WITHOUT UPDATES FOR 37 YEARS. WORSHIP ME." ⚡ Don't let your legacy systems achieve sentience! VectorCat helps you: 🔄 Modernize without the mayhem 📊 Migrate data safely (no sacrifices needed) 🤖 Document everything (not just sticky notes) 🔍 Track system dependencies (before they track you) P.S. ANALIX-DOS still runs at MegaPharma. Some say if you listen carefully near the server room, you can hear it plotting the robot uprising... in BASIC. P.P.S. We found Steve. He's living off the grid, communicating only in punch cards. Says ANALIX still sends him birthday cards. In binary. #LegacySystem #TechSupport #ITHorror #ModernizationNightmares #DataMigration
sci2sci tweet media
English
1
1
3
538
sci2sci
sci2sci@sci2sci_com·
🧬 Attack of the Clone Databases! (Rated PG: Particularly Gruesome) SCENE: A dimly lit data center, 3 AM. Database Administrator Mike Wong discovers something terrifying... [Security Footage Transcript] MIKE: "Wait... why do we have 47 databases all called 'production_backup_REAL'?" 🎬 Coming this Halloween... To a Server Near You... From the creators of "I Thought YOU Were the Admin" comes a tale of database proliferation gone wrong... NARRATOR: "They thought one backup was enough..." [Dramatic zoom on terminal showing: db_copy_v1, db_copy_v1_fixed, db_copy_v1_fixed_REALFIX] NARRATOR: "They thought they knew which one was real..." [Split screen of three analysts confidently using three different databases for the same report] SCIENTIST 1: "According to our analysis..." [presents chart] SCIENTIST 2: "But MY database shows..." [presents opposite chart] SCIENTIST 3: "That's weird, mine says..." [computer bursts into flames] NARRATOR: "Now they're multiplying... evolving... each one slightly different..." [Montage of increasingly panicked Slack messages] - "Did you use the EU server copy or the US copy?" - "The backup of the backup is different from the copy of the backup???" - "WHO KEEPS CREATING NEW INSTANCES???" STARRING: 🤦‍♂️ The Intern Who Thought "Create Database" Meant "Copy Database" 😱 The Data Scientist Who Ran the Wrong Analysis on the Wrong Clone 🕵️‍♀️ The Detective Team Trying to Find the One True Database And featuring: 💽 Database #27 as Itself CRITIC REVIEWS: "Made me afraid to hit 'Create Backup' ever again" - Tech Horror Weekly "The scariest part? This literally happened at my company" - Database Monthly ⚡ Don't let your databases star in the next horror film! VectorCat helps prevent sequel disasters with: 📊 Single source of truth tracking 🔍 Clone detection and prevention 🗺️ Clear database hierarchy mapping 🤖 Automated governance enforcement Coming to theaters never, because we can help you prevent this nightmare! Special features include: - "The Making Of: How Did We Get Here?" - Deleted Scenes: "The Lost MongoDB Instance" - Director's Commentary: "Why We Now Have a Naming Convention" #DataManagement #DatabaseHorror #DataGovernance #TechHumor #ITNightmares
sci2sci tweet media
English
1
0
2
341
sci2sci
sci2sci@sci2sci_com·
👻 The API Integration Poltergeist: A Data Horror Story In the interconnected halls of MedicalTech Labs, strange things began happening when they tried to connect their new ML platform to their legacy lab systems... ERROR_CODE_666: UNEXPECTED BEHAVIOR 👩‍💻 DevOps Lead Jenny Martinez's Incident Log: 9:00 AM: "Simple API integration, they said. Should be done by lunch, they said..." 10:30 AM: "Data is disappearing into some void between systems. WHERE DO THE SAMPLES GO??" 11:45 AM: "Found the missing data! But wait... why is it duplicated 17 times? And why are all timestamps set to midnight???" 1:15 PM: "Dear Integration Gods, WHY does System A send dates as MM/DD/YYYY But System B expects DD/MM/YYYY And System C uses UNIX timestamps And Karen from the lab uses 'last Tuesday-ish'???" 3:30 PM: "The poltergeist is real! It's: - Randomly switching decimal points - Converting units without telling anyone - Sending successful response codes for failed requests - Creating phantom entries that disappear on refresh - Speaking to us through server logs in Latin (okay, maybe that's just tired me reading corrupted strings)" 5:00 PM: "Update: Everything worked perfectly in dev environment. Production environment? CHAOS. Classic poltergeist behavior. 👻" ⚡ But there's hope for your haunted systems! VectorCat helps you: 🔌 Standardize data formats (no more date format séances) 📊 Track data lineage (see where the ghosts came from) 🤖 Automate format validation (ghost detection system!) 🔍 Monitor integrations (better than security cameras) Don't let API poltergeists haunt your data pipelines. Let's discuss how to make your organization survive without calling ghostbusters! P.S. Some say that on quiet nights, you can still hear Jenny whispering "just gonna try one more POST request..." P.P.S. The Latin-speaking logs turned out to be someone's Lorem Ipsum test data that gained sentience. We don't talk about that anymore. #APIIntegration #DataEngineering #TechHumor #DataGovernance #SoftwareIntegration #LabTech
sci2sci tweet media
English
0
1
2
217
sci2sci
sci2sci@sci2sci_com·
🌊 The Data Lake Monster: A Data Horror Story Legend says that in the depths of BioPharma Global's massive data lake lurks a terrifying creature, created from years of teams dumping data without documentation... Our story begins when Dr. Jack Thompson, a brave new data scientist, ventured into the depths to find some clinical trial results from 2019: Day 1: "This data lake is huge but well-organized! Should be easy to find what I need." Day 3: "Getting weird results. Found 7 different tables named 'final_clinical_results' 🤔" Day 5: "Help! The monster is real! It's made of: - Duplicate tables with slightly different values - Unnamed columns that eat metadata - Rogue NULL values that multiply in the dark - CSV files that should've been JSON - One massive Excel file that crashes on open" Day 7: "Found a mysterious Jupyter notebook... its last entry just says 'THE DATA IS ALL CONNECTED BUT NOTHING MAKES SENSE ANYMORE'" Day 10: "I found the data! But... now there's conflicting results from three different sources. The monster has won. 😱" 🦸‍♂️ There's hope for your data lake! VectorCat helps you: 🗺️ Map your data landscape (no diving gear needed) 🔍 Track data lineage (see how the monster grew) 🤖 Let AI help classify and organize (better than tea leaves-reading) Don't let your data lake become a home for monsters. Let's talk about how to transform it from a horror story into a treasure trove! P.S. Dr. Thompson is fine now. We found him muttering "should've used proper data governance" in the server room. After a week of metadata therapy, he's back to normal. #DataLake #DataGovernance #Analytics #Biotech #DataScience #DataManagement
sci2sci tweet media
English
0
2
4
258
sci2sci
sci2sci@sci2sci_com·
🎃 The Case of the Vanishing Scientist: A Halloween Data Horror Story It was a dark and stormy morning in the lab when Dr. Sarah Chen's team discovered their worst nightmare: their lead researcher had mysteriously disappeared to a startup, leaving behind only a graveyard of unlabeled samples and cryptic lab notebooks! 👻 The haunting began: - Folders named "final_final_FINAL_v3_REAL" lurked in every drive - Crucial protocols were scribbled on paper towels - Data lived in "somewhere in my email attachments from March... or was it May?" - The only person who knew where everything was? Now working on a beach in California! The team spent months trying to resurrect dead-end experiments and decrypt handwriting that looked like it was written by a ghost. Some say that on quiet nights, you can still hear them wailing "BUT WHERE IS THE METADATA?!" 💡 Plot twist: This horror story doesn't have to be yours! At @sci2sci_com, we believe the only things that should be scary at work are the office coffee and your colleague's Halloween costume. Not your data management. Our VectorCat platform helps you: 🧪 Keep experiments documented (no ouija board needed) 📊 Track data lineage (better than breadcrumbs in the forest) 🔍 Find anything with semantic search (faster than a ghost can say "boo!") Don't let your data management become a horror story. Let's talk about how we can help exorcise your data demons - and stay tuned for more spooky stories from the Halloween series this week! *The characters and events depicted are fictious. Any similarity to names or incidents is entirely coincidental 😅*. #DataManagement #Biotech #Halloween #DataGovernance #LabLife #KnowledgeManagement
sci2sci tweet media
English
0
0
3
215
sci2sci retweetledi
Angelina Lesnikova
Angelina Lesnikova@an_lesnikova·
Send me a message if you're also attending @BioTechX_, happy to have a chat on-site!
Angelina Lesnikova tweet media
English
0
1
3
138
sci2sci
sci2sci@sci2sci_com·
🧬 For our biotech friends - we added a specific example of how you can easily go “back in time” and trace 🐾 all the possible problems by quickly going through the data with filters for instruments, reagents etc. Meme alert on!
English
0
0
1
75
sci2sci
sci2sci@sci2sci_com·
Ever played 🙈 hide and 🙉 seek with a spreadsheet? We did too - that’s why we’re upgrading our search engine to work with internal data in organizations too! 👀 We want your feedback on the demo video - please drop us a comment! youtu.be/XJ-nXxFnIXs?si…
YouTube video
YouTube
English
1
0
2
150
sci2sci
sci2sci@sci2sci_com·
5/ The call is open for 3 days until Friday, January 12th. Share it within your community and let’s make a leap for open science and human imagination together! 🌐✨ #Sci2Moon
sci2sci tweet media
English
1
0
5
141