Broncio Aguilar-Sanjuan

259 posts


@BroncioS

Oxford, England · Joined January 2019
193 Following · 60 Followers
Broncio Aguilar-Sanjuan retweeted
Sheppard Lab @sheppard_lab
Lab update after a long time! We are growing and doing more wonderful science across many fields. But we are equally (maybe more) chill outside the lab; pic from a Thai dinner outing last week.
[image attached]
1 reply · 1 repost · 13 likes · 425 views
Broncio Aguilar-Sanjuan retweeted
Demis Hassabis @demishassabis
The UK is an amazing place for science & innovation. Thrilled to deepen our partnership with the UK Government to turbocharge scientific discovery with AI - giving scientists here priority access to our frontier models like AlphaEvolve, AI Co-Scientist, AlphaGenome, WeatherNext & more. We’re also building our first automated lab here in the UK for materials science!
103 replies · 212 reposts · 2.1K likes · 183.4K views
Sam Altman @sama
Our new AI-first web browser, ChatGPT Atlas, is here for macOS. Please send feedback! Availability on other platforms to follow.
3.3K replies · 1.6K reposts · 22.9K likes · 2.9M views
Broncio Aguilar-Sanjuan retweeted
Brian Roemmele @BrianRoemmele
BOOOOOOOM! CHINA DEEPSEEK DOES IT AGAIN! An entire encyclopedia compressed into a single, high-resolution image! A mind-blowing breakthrough.

DeepSeek has unleashed DeepSeek-OCR, an electrifying 3-billion-parameter vision-language model that obliterates the boundaries between text and vision with jaw-dropping optical compression. This isn't just an OCR upgrade; it's a seismic paradigm shift in how machines perceive and conquer data.

DeepSeek-OCR crushes long documents into vision tokens with a staggering 97% decoding precision at a 10x compression ratio. That's thousands of textual tokens distilled into a mere 100 vision tokens per page, outmuscling GOT-OCR2.0 (256 tokens) and MinerU2.0 (6,000 tokens) with up to 60x fewer tokens on OmniDocBench. It's like compressing an entire encyclopedia into a single, high-definition snapshot: mind-boggling efficiency at its peak!

At the core of this insanity is the DeepEncoder, a turbocharged fusion of the SAM (Segment Anything Model) and CLIP (Contrastive Language-Image Pretraining) backbones, supercharged by a 16x convolutional compressor. This maintains high-resolution perception while slashing activation memory, transforming thousands of image patches into a lean 100-200 vision tokens.

Get ready for the multi-resolution "Gundam" mode, scaling from 512x512 to a monstrous 1280x1280 pixels! It blends local tiles with a global view, tackling invoices, blueprints, and newspapers with zero retraining. It's a shape-shifting computational marvel, mirroring the human eye's dynamic focus with pixel-perfect precision!

The training data? Supplied by the Chinese government for free and not available to any US company. You understand now why I have said the US needs a Manhattan Project for AI training data? Do you hear me now? Oh, still no? I'll continue. Over 30 million PDF pages across 100 languages, spiked with 10 million natural-scene OCR samples, 10 million charts, 5 million chemical formulas, and 1 million geometry problems!

This model doesn't just read; it devours scientific diagrams and equations, turning raw data into multidimensional knowledge. Throughput? Prepare to be floored: over 200,000 pages per day on a single NVIDIA A100 GPU! This scalability is a game-changer, turning LLM data generation into a firehose of innovation and democratizing access to terabytes of insight for every AI pioneer out there.

This optical compression is the holy grail for LLM long-context woes. Imagine a million-token document shrunk into a 100,000-token visual map: DeepSeek-OCR reimagines context as a perceptual playground, paving the way for a GPT-5 that processes documents like a supercharged visual cortex!

The two-stage architecture is pure engineering poetry: the DeepEncoder generates tokens, while a Mixture-of-Experts decoder spits out structured Markdown with multilingual flair. It's a universal translator for the visual-textual multiverse, optimized for global domination!

Benchmarks? DeepSeek-OCR obliterates GOT-OCR2.0 and MinerU2.0, holding 60% accuracy even at 20x compression! This opens a portal to applications once thought impossible, pushing the boundaries of computational physics into uncharted territory! Live document analysis, streaming OCR for accessibility, and real-time translation with visual context are now economically viable thanks to this compression breakthrough. It's a real-time revolution, ready to transform our digital ecosystem!

This paper is a blueprint for the future, proving text can be visually compressed 10x for long-term memory and reasoning. It's a clarion call for a new AI era where perception trumps text, and models like GPT-5 see documents in a single, glorious glance.

I am experimenting with this now on 1870-1970 offline data that I have digitized. But be ready for a revolution! More soon.

[1] github.com/deepseek-ai/De…
[image attached]
342 replies · 1.4K reposts · 7.5K likes · 1.8M views
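The token arithmetic behind the tweet's headline numbers can be sketched in a few lines. All figures (roughly 1,000 text tokens compressed into 100 vision tokens per page, versus 256 for GOT-OCR2.0 and 6,000 for MinerU2.0) come straight from the tweet; the function name is illustrative, and nothing here calls the actual model.

```python
# Illustrative arithmetic for the optical-compression claims above.
# Numbers are taken from the tweet; this does not run DeepSeek-OCR.

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many textual tokens each vision token stands in for."""
    return text_tokens / vision_tokens

# ~1,000 text tokens per page distilled into ~100 vision tokens -> 10x
print(compression_ratio(1000, 100))  # 10.0

# Per-page vision-token budgets cited for each system:
budgets = {"DeepSeek-OCR": 100, "GOT-OCR2.0": 256, "MinerU2.0": 6000}

# MinerU2.0 spends 60x more tokens per page than DeepSeek-OCR:
print(budgets["MinerU2.0"] / budgets["DeepSeek-OCR"])  # 60.0
```

Under this framing, the "million-token document shrunk into a 100,000-token visual map" claim is just the same 10x ratio applied at corpus scale.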
Broncio Aguilar-Sanjuan retweeted
Ankit Singhal @notankitsinghal
Introducing Odyssey—the largest and most performant protein language model ever created. Odyssey enables scientists and researchers to generate and edit proteins, the workhorses of all life on this planet, towards specific functional ends—scaled to over 102 billion parameters. We did it all with just a core team of 6 and an order of magnitude less funding than our next largest competitor. Here's how it works 🧵 (1/6)
75 replies · 180 reposts · 1.4K likes · 181.1K views
Broncio Aguilar-Sanjuan retweeted
Nextflow @nextflowio
It’s time for the #NextflowSummit BCN speaker series! 🎉 Let's welcome our first speaker Ziad Al Bkhetan, Product Manager at @AusBiocommons 🎤 He’ll present "The Australian ProteinFold Service: Interactive Prediction & Visualisation of Protein Structure" summit.nextflow.io/2024/barcelona…
[image attached]
1 reply · 7 reposts · 16 likes · 4.9K views
Broncio Aguilar-Sanjuan @BroncioS
AI rules, software bug breaks it lol medium.com/@ba13026/amusing-five-word-stories-about-a-world-where-ai-dominates-the-world-f119d41d2cc2
0 replies · 0 reposts · 0 likes · 52 views
Broncio Aguilar-Sanjuan retweeted
Oxford Protein Informatics Group (OPIG)
OPIG postdoc @mijr12 contributed computational analyses to benchmark a new biotechnology: a combination of droplet microfluidics & FACS to isolate pathogen-specific, antibody-secreting B cells that lack B cell receptors. Research led by @hollfelderlab, published @NatureBiotech.
Nature Biotechnology @NatureBiotech

Rapid discovery of monoclonal antibodies by microfluidics-enabled FACS of single pathogen-specific antibody-secreting cells go.nature.com/4dHLaHB

0 replies · 4 reposts · 18 likes · 8.9K views
Broncio Aguilar-Sanjuan retweeted
Oxford Protein Informatics Group (OPIG)
Small Molecule tool update: We added a new chemical sanity check to PoseBusters (github.com/maabuu/posebus…): "InChI convertibility". If the molecule can be converted to a standard InChI and back, the test passes… (1/2)
2 replies · 6 reposts · 23 likes · 2K views
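The round-trip check described in the tweet above can be sketched with RDKit. This is not PoseBusters' actual implementation, just an illustration of the idea under the assumption that "convertibility" means the molecule survives a trip to standard InChI and back; the function name `inchi_convertible` is hypothetical.

```python
# Sketch of an "InChI convertibility" sanity check (illustrative only,
# not PoseBusters' code). A molecule passes if it can be written as a
# standard InChI string and parsed back into a molecule object.
from rdkit import Chem


def inchi_convertible(mol: Chem.Mol) -> bool:
    inchi = Chem.MolToInchi(mol)            # molecule -> standard InChI
    if not inchi:
        return False                        # not expressible as InChI
    return Chem.MolFromInchi(inchi) is not None  # InChI -> molecule


print(inchi_convertible(Chem.MolFromSmiles("CCO")))  # ethanol: True
```

A round trip like this catches structures that look fine as a connection table but cannot be expressed in (or recovered from) the standard InChI layers.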
Broncio Aguilar-Sanjuan retweeted
Oxford Protein Informatics Group (OPIG)
Our Therapeutic Structural Antibody Database has been updated with the latest clinical trial data and the antibody/nanobody therapies in WHO Proposed INN List 131. With this update, Thera-SAbDab has reached the 1000+ milestone (1063 therapeutics) Explore: opig.stats.ox.ac.uk/webapps/sabdab…
1 reply · 7 reposts · 25 likes · 1.7K views