Ben Hitz

1.2K posts

@bonscotthoughts

ENCODE DCC * IGVF DACC (he/him)

Joined July 2010

407 Following · 124 Followers

Pinned Tweet

Ben Hitz@bonscotthoughts·21 May

This is an infrastructure project. Join us gopetition.com/petitions/a-un… @anshulkundaje @ontowonka 16/16

Ben Hitz@bonscotthoughts·2 Nov

@anshulkundaje @sinabooeshaghi no, colored pencils

Anshul Kundaje@anshulkundaje·2 Nov

@sinabooeshaghi You mean we need to go Brrrr on 1000 H100s right?

291

sina@sinabooeshaghi·1 Nov

Precision medicine, like precision engineering, requires precise tools. To achieve Formula One-level performance with medicine, we must bring engineering principles to biology. #EngineerBiology.

5.4K

Ben Hitz@bonscotthoughts·26 Sep

@jmschreiber91 n

100

Jacob Schreiber@jmschreiber91·26 Sep

All of my time management skills have disappeared trying to write my first grant. Should I get a skeleton of all this text done? No, I *need to* decide if it's where proteins bind "in" the genome or "on" the genome.

3.3K

Ben Hitz@bonscotthoughts·25 Sep

@JD_Buenrostro Even snowflakes can be classified: its.caltech.edu/~atomic/snowcr…

150

Jason Buenrostro@JD_Buenrostro·25 Sep

“Every cell is a special snowflake” retweet if you agree! #teamsnowflake nature.com/articles/d4158…

112

10.1K

Ben Hitz@bonscotthoughts·25 Sep

@arjunrajlab Still it's clear that a "type" is an ensemble of states. Types should probably be mapped to known microscopic evidence; that is mostly how they are thought of anyway. It starts to get fuzzy when you consider sorting cells on surface markers.

Ben Hitz@bonscotthoughts·25 Sep

@arjunrajlab This isn't a useful definition of state because it's not precise. How many tpm difference is a different state? It is not a useful definition for type because cells have multiple functions (and function itself is slippery) and they generally are not found in all combinations

372

Arjun Raj@arjunrajlab·25 Sep

Since this hand-wringing continues: A cell state is a list of all molecular constituents and their parameters (one crude approximation could be the transcriptome). A cell type is the set of all cellular states that can perform a particular function (you define the function).

Arjun Raj@arjunrajlab

@arnavm1 Cell type and state are actually pretty easy to define. Time for a blog post.

127

52.9K

Ben Hitz@bonscotthoughts·12 Sep

@lpachter Maybe you shouldn't read articles from profit focused paper mills like ...checks notes... Nature

3.4K

Lior Pachter@lpachter·12 Sep

I really don't understand what is going on. How are journals and reviewers ok with this kind of junk? The many organizations that funded this work should be outraged. And sorry for the patients whose data is trashed in this way. Scientists really ought to have some self respect!

Lior Pachter@lpachter

In this UMAP, there are arrows linking nothing to nothing (see panel d). Gornisht mit gornisht, as they say. Also drawing curves on top of a UMAP built from x-ray images by repurposing RNA velocity software that already didn't make sense is next level!

132

77.1K

Ben Hitz@bonscotthoughts·12 Sep

@danorlovsky7 What app are you using?

171

Dan Orlovsky@danorlovsky7·11 Sep

Marshmallows

247

2.6K

261.5K

Ben Hitz retweeted

Ming "Tommy" Tang@tangming2005·27 Aug

1/ One of the most common mistakes in bioinformatics is the off-by-one error. If you are not careful, you will make mistakes. Even experienced bioinformaticians are not aware of it and the mistake prevails in many bioinformatics tools. A thread 🧵

174

27.1K

Ben Hitz@bonscotthoughts·24 Jul

@Aella_Girl @snowanddrugs2 is this because Republican-coded laws can be avoided more easily if you just have the resources to do so? Which is why I consider dem-coded laws more egalitarian.

Aella@Aella_Girl·23 Jul

@snowanddrugs2 Let's just say I'm safely breaking Republican-coded laws but I'm too scared to break democrat-coded laws

9.8K

Aella@Aella_Girl·23 Jul

I lean very slightly towards the Republican side, but only because it's easier to evade restrictive laws on personal/social liberty than it is economic liberty. It's easier to order drugs from overseas than it is to pay less of your own money in taxes

200

2.5K

559.5K

Ben Hitz@bonscotthoughts·1 Jul

@akhivae How many nations did Franco's Spain invade?

akhivae@akhivae·30 Jun

Fascism is rising almost nowhere in the West. They want fewer immigrants and an end to post-2010 progressive politics. There's very little desire to invade neighboring nations for lebensraum or start goose-stepping in military parades.

Arnesa Buljušmić-Kustura@arnesa_kustura

I understand that Americans are freaking out and many are thinking about moving out of the States and I would genuinely recommend they do not do that. Fascism is on the rise literally everywhere, esp in Europe. You will not be saved by just moving away.

131

409

6.2K

413.2K

Ben Hitz@bonscotthoughts·18 Jun

@Aella_Girl actual geneticists study the specific effect of human genetic variation on a huge variety of traits, diseases and phenotypes; see, e.g.: ebi.ac.uk/gwas/

195

Aella@Aella_Girl·18 Jun

I suspect the people against HBD are against it because of fear that the general population would use it as a weapon for bad things, not because it's unsound or an inherently evil worldview to hold

145

611

267.7K

Ben Hitz@bonscotthoughts·16 Jun

@DrChrisCombs Honestly thought this was an orange cat wearing a very fancy drone hat

Chris Combs (iterative design enjoyer)@DrChrisCombs·15 Jun

If Elon was pumping a mission to Titan you guys would be losing your freaking minds Keep telling me about how NASA doesn't innovate

1.1K

62.9K

Ben Hitz@bonscotthoughts·13 Jun

@GBBranstetter That's not fair. We aren't all tranq addicts

304

Gillian Branstetter@GBBranstetter·13 Jun

"All trans people are welcome to our ruthlessly-segregated, unlivably-expensive, militarized playground for tranq-addled fascists"

763

35K

Ben Hitz@bonscotthoughts·30 May

@pranamanam Aren't all these methods essentially bad? Better than random sure, but...

158

Pranam Chatterjee@pranamanam·29 May

Very interesting study that shows AlphaFold3 captures a relatively global effect of mutations on PPIs by learning a smoother energy landscape, but doesn't seem to be as atomically fine-grained as standard MD. Could still be good for generating synthetic mutant datasets (especially when the AF3 code is open-sourced)? 🤔 Paper: biorxiv.org/content/10.110… Results: github.com/luwei0917/Alph…

4.9K

Ben Hitz retweeted

Vince Buffalo@vsbuffalo·15 May

One funny story about this: I spent hours creating a figure in my book explaining 0 versus 1-based indexing and closed versus right-open intervals. The illustrator thought I made a careless error in starting one from 0 and the other from 1, and changed them to match 😱

James Pitt@Sahelanth

@vsbuffalo @jgschraiber One of the many terrifying things I learned from your Bioinformatics Data Skills book!

4.7K

Ben Hitz retweeted

Anshul Kundaje@anshulkundaje·10 May

Dear big consortia, It is never too late to be brave and use all that visibility you have to make a strong statement that academia will not be held hostage by glam journals & their shiny JIFs 1/

29.3K

Ben Hitz@bonscotthoughts·9 May

@zenahitz @lastpositivist what am I your google monkey? en.wikipedia.org/wiki/One_Crew_…

Zena Hitz@zenahitz·9 May

@lastpositivist the one that's a parody of heists is wonderful. Which one is that @bonscotthoughts ?

279

Liam Bright@lastpositivist·8 May

I haven't watched the last series of Rick'n'Morty. Suppose (counter-factually) my IQ were high enough to appreciate the show's brand of humour. Is it a good series that is worth watching?

13.7K

Ben Hitz@bonscotthoughts·8 May

@Caroline_Bartma SPI1 actually official gene name.

Ben Hitz retweeted

sina@sinabooeshaghi·18 Apr

Our paper "A machine readable specification for genomics assays" is now published in Bioinformatics, @OUPBioinfo. In short, we present a lightweight file format and command-line tool to document the structure of sequencing reads. Coauthored with @XiChenUoM and @lpachter.

Paper: doi.org/10.1093/bioinf…
Code: github.com/pachterlab/seq…

What is in my sequencing reads?

Sequencing machines produce text files, called FASTQs, that contain reads or sequences of DNA molecules. Assay developers and data generators deeply understand the contents of their reads; they know the location and presence of biological and synthetic constructs like cellular barcodes. Collaborators, reproducers, and other scientists may not, despite their sometimes obscure addition to supplementary material.

Take for example the @10xGenomics Multiome assay. The 10x Genomics documentation [1] spells out the read structure for each modality: RNA reads contain a synthetic 16bp barcode, a 12bp "randomly" generated unique molecular barcode (UMI), as well as cDNA that was captured via polyA capture. The ATAC reads consist of genomic DNA and a 16bp cellular barcode. However, the 10x Genomics website explains that the ATAC portion of the 10x Multiome data contains a little-known 8bp constant sequence spacer that precedes the 16bp cell barcode. So saying that you have "10x Multiome" reads is a necessary but not sufficient condition to know the contents of your FASTQ reads.

The reason is that the FASTQ read structure depends on both the assay and the sequencing machine/recipe used; a sequencing library produced from one single-cell assay can yield different read structures depending on these parameters.

Take the 10x Multiome assay. The ATAC 24bp barcode + spacer is usually sequenced as the i5 index read. Since the NextSeq 500/550 does not support a 24bp i5 read, the user must specify "dark cycles" (10x details the impact of this [2]) to skip the 8bp spacer. This yields a 16bp cell barcode in the i5 FASTQ file. If, however, the 10x Multiome ATAC library was not sequenced with dark cycles, then the i5 FASTQ file will contain a 24bp spacer + barcode.

I was originally unaware of the 8bp spacer and the use of dark cycles in Multiome library sequencing. But as I was recently looking at ATAC reads I realized the impact it had on my count matrices; I had been extracting half of the cell barcode and all of the spacer. This meant I was performing barcode error correction and was UMI collapsing the mostly similar cell barcodes, producing few counts.

This decoupling of read structure between the sequencing machine and the assay places a high priority on documenting read structure in a sequencer- and assay-specific manner so that preprocessing tools can accurately extract and process relevant sequenced elements.

A machine-readable specification

I was inspired by @XiChenUoM's efforts (which started while in @teichlab) in documenting sequencing reads of assays and I came up with an idea to document read structure in a machine- and human-readable specification. The specification is called seqspec [3].

The specification details the structure of a YAML file that allows users to specify and annotate the types of sequences that are contained in their FASTQ data. seqspec uses a nested representation of "Regions" and "Reads" that allows users to annotate groups of sequenced elements and map sequencing reads to sequencing primers. This enables, for example, all of the elements contained in Read 1 of a FASTQ file, such as the barcode and UMI in the 10x RNA assay, to be annotated as belonging to Read 1.

The spec also comes with an accompanying seqspec command line tool which gives users who annotate their sequencing assays many benefits:

1. Reproducibility and verifiability of the assay structure
2. Positional extraction of relevant features
3. Visualization of the sequencing structure

The seqspec command line tool makes it straightforward to extract the positional index of sequenced elements. The barcodes in the 10x Multiome dataset could have easily been identified as starting 8bp into the reads with the seqspec index command.

The tool also makes it straightforward to visualize the structure of your sequencing reads. seqspec print can produce a publication-ready figure of your read structure.

Most importantly, seqspec makes it easy for others to reanalyze data for which a seqspec exists, bringing about verifiability of analysis results.

seqspec adoption

seqspec aims to make genomics processing correct and reproducible. seqspec was recently adopted as the first standard in the @IGVFConsortium and we anticipate the publication of terabytes of sequencing data alongside their seqspec read annotations.

I personally believe seqspec will be transformative for reproducibility and analysis efforts, in particular for those undertaken by consortia. I hope that public databases (like the @NCBI SRA/GEO and DDBJ) will test out seqspec and look to adopt it as a standard for data submission.

seqspec is freely available, open source, open to contributions, useable, and well documented. Please take a look at the GitHub repo and try it out! We welcome feedback.

[1] 10xgenomics.com/support/single…
[2] kb.10xgenomics.com/hc/en-us/artic…
[3] github.com/pachterlab/seq…
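The spacer problem described in the thread above can be illustrated with a minimal Python sketch. It assumes only what the thread states: an 8bp spacer followed by a 16bp cell barcode in the i5 read, or a bare 16bp barcode when dark cycles were used. The function names and example sequences are hypothetical, chosen for illustration; this is not the seqspec API, and the placeholder spacer is not the real 10x spacer sequence.

```python
SPACER_LEN = 8    # spacer length, as stated in the thread
BARCODE_LEN = 16  # cell barcode length, as stated in the thread

# Made-up sequences for illustration only (not real 10x sequences).
EXAMPLE_SPACER = "ACGTACGT"
EXAMPLE_BARCODE = "AAAACCCCGGGGTTTT"


def extract_cell_barcode(i5_read: str) -> str:
    """Return the 16bp cell barcode from an i5 index read.

    Hypothetical helper: handles both layouts described in the thread.
    """
    if len(i5_read) == BARCODE_LEN:
        # Sequenced with dark cycles: the spacer was skipped already.
        return i5_read
    if len(i5_read) == SPACER_LEN + BARCODE_LEN:
        # Sequenced without dark cycles: drop the leading spacer.
        return i5_read[SPACER_LEN:]
    raise ValueError(f"unexpected i5 read length: {len(i5_read)}")


def naive_first_16(i5_read: str) -> str:
    """The mistake from the thread: take the first 16 bases regardless."""
    return i5_read[:BARCODE_LEN]


if __name__ == "__main__":
    read_24bp = EXAMPLE_SPACER + EXAMPLE_BARCODE
    print("naive  :", naive_first_16(read_24bp))
    print("correct:", extract_cell_barcode(read_24bp))
```

On a 24bp read the naive slice returns the whole spacer plus only the first half of the barcode, which is the failure mode the thread describes. A read-structure annotation makes the offset explicit instead of leaving it to a hard-coded constant like the one used here.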

131

18.5K

Ben Hitz@bonscotthoughts·28 Mar

@RyanMarino As someone with a colonoscopy scheduled Friday I wonder the same thing

144
