Ben Hitz

1.2K posts

@bonscotthoughts

ENCODE DCC * IGVF DACC (he/him)

Joined July 2010
407 Following · 124 Followers
sina (@sinabooeshaghi)
Precision medicine, like precision engineering, requires precise tools. To achieve Formula One-level performance with medicine, we must bring engineering principles to biology. #EngineerBiology.
Jacob Schreiber (@jmschreiber91)
All of my time management skills have disappeared trying to write my first grant. Should I get a skeleton of all this text done? No, I *need to* decide if it's where proteins bind "in" the genome or "on" the genome.
Ben Hitz (@bonscotthoughts)
@arjunrajlab Still it's clear that a "type" is an ensemble of states. Types should probably be mapped to known microscopic evidence; that is mostly how they are thought of anyway. It starts to get fuzzy when you consider sorting cells on surface markers.
Ben Hitz (@bonscotthoughts)
@arjunrajlab This isn't a useful definition of state because it's not precise. How many TPM of difference makes a different state? It is not a useful definition for type because cells have multiple functions (and function itself is slippery) and they generally are not found in all combinations.
Arjun Raj (@arjunrajlab)
Since this hand-wringing continues: A cell state is a list of all molecular constituents and their parameters (one crude approximation could be the transcriptome). A cell type is the set of all cellular states that can perform a particular function (you define the function).
Quoting Arjun Raj (@arjunrajlab):
@arnavm1 Cell type and state are actually pretty easy to define. Time for a blog post.

Ben Hitz (@bonscotthoughts)
@lpachter Maybe you shouldn't read articles from profit-focused paper mills like ...checks notes... Nature
Lior Pachter (@lpachter)
I really don't understand what is going on. How are journals and reviewers ok with this kind of junk? The many organizations that funded this work should be outraged. And sorry for the patients whose data is trashed in this way. Scientists really ought to have some self respect!
Quoting Lior Pachter (@lpachter):
In this UMAP, there are arrows linking nothing to nothing (see panel d). Gornisht mit gornisht ("nothing with nothing"), as they say. Also, drawing curves on top of a UMAP built from x-ray images by repurposing RNA velocity software that already didn't make sense is next level!

Dan Orlovsky (@danorlovsky7)
Marshmallows
Ben Hitz retweeted
Ming "Tommy" Tang (@tangming2005)
1/ One of the most common mistakes in bioinformatics is the off-by-one error. If you are not careful, you will make it. Even experienced bioinformaticians are not aware of it, and the mistake persists in many bioinformatics tools. A thread 🧵
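The thread itself is not reproduced in this timeline. As a minimal sketch of the classic case (an illustration, not taken from the thread), assume intervals stored BED-style as 0-based, half-open [start, end) and a tool that expects 1-based, closed [start, end] coordinates, as VCF and GFF use. Only the start moves; shifting both ends is the off-by-one. The function names are hypothetical.

```python
from typing import Tuple

def bed_to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """0-based half-open [start0, end0) -> 1-based closed [start1, end1]."""
    # The first base sits at 0-based index start0, i.e. 1-based position start0 + 1.
    # The exclusive 0-based end already equals the inclusive 1-based end.
    return start0 + 1, end0

def one_based_to_bed(start1: int, end1: int) -> Tuple[int, int]:
    """1-based closed [start1, end1] -> 0-based half-open [start1 - 1, end1)."""
    return start1 - 1, end1

def buggy_bed_to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    # A typical off-by-one: shifting BOTH ends adds an extra base at the end.
    return start0 + 1, end0 + 1

seq = "ACGTTGCA"                 # toy sequence
start0, end0 = 2, 5              # BED-style interval covering "GTT"
start1, end1 = bed_to_one_based(start0, end0)

print(seq[start0:end0])          # GTT (Python slicing is 0-based, half-open)
print(seq[start1 - 1:end1])      # GTT (1-based closed, converted back for slicing)

s, e = buggy_bed_to_one_based(start0, end0)
print(seq[s - 1:e])              # GTTG: one base too many
```

The round trip through one_based_to_bed recovers the original BED interval, which makes a cheap sanity check when moving data between formats.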
Ben Hitz (@bonscotthoughts)
@Aella_Girl @snowanddrugs2 Is this because Republican-coded laws are easier to avoid if you just have the resources to do so? That's why I consider Dem-coded laws more egalitarian.
Aella (@Aella_Girl)
@snowanddrugs2 Let's just say I'm safely breaking Republican-coded laws but I'm too scared to break democrat-coded laws
Aella (@Aella_Girl)
I lean very slightly towards the Republican side, but only because it's easier to evade restrictive laws on personal/social liberty than it is economic liberty. It's easier to order drugs from overseas than it is to pay less of your own money in taxes
Ben Hitz (@bonscotthoughts)
@akhivae How many nations did Franco's Spain invade?
akhivae (@akhivae)
Fascism is rising almost nowhere in the West. They want fewer immigrants and an end to post-2010 progressive politics. There's very little desire to invade neighboring nations for lebensraum or start goose-stepping in military parades.
Quoting Arnesa Buljušmić-Kustura (@arnesa_kustura):
I understand that Americans are freaking out and many are thinking about moving out of the States and I would genuinely recommend they do not do that. Fascism is on the rise literally everywhere, esp in Europe. You will not be saved by just moving away.

Ben Hitz (@bonscotthoughts)
@Aella_Girl actual geneticists study the specific effect of human genetic variation on a huge variety of traits, diseases and phenotypes; see, e.g., ebi.ac.uk/gwas/
Aella (@Aella_Girl)
I suspect the people against HBD are against it because of fear that the general population would use it as a weapon for bad things, not because it's unsound or an inherently evil worldview to hold
Ben Hitz (@bonscotthoughts)
@DrChrisCombs Honestly thought this was an orange cat wearing a very fancy drone hat
Chris Combs (iterative design enjoyer)
If Elon was pumping a mission to Titan you guys would be losing your freaking minds. Keep telling me about how NASA doesn't innovate
Gillian Branstetter (@GBBranstetter)
"All trans people are welcome to our ruthlessly-segregated, unlivably-expensive, militarized playground for tranq-addled fascists"
Ben Hitz (@bonscotthoughts)
@pranamanam Aren't all these methods essentially bad? Better than random sure, but...
Pranam Chatterjee (@pranamanam)
Very interesting study that shows AlphaFold3 captures a relatively global effect of mutations on PPIs by learning a smoother energy landscape, but doesn't seem to be as atomically fine-grained as standard MD. Could still be good for generating synthetic mutant datasets (especially when the AF3 code is open-sourced)? 🤔 Paper: biorxiv.org/content/10.110… Results: github.com/luwei0917/Alph…
Ben Hitz retweeted
Vince Buffalo (@vsbuffalo)
One funny story about this: I spent hours creating a figure in my book explaining 0 versus 1-based indexing and closed versus right-open intervals. The illustrator thought I made a careless error in starting one from 0 and the other from 1, and changed them to match 😱
Quoting James Pitt (@Sahelanth):
@vsbuffalo @jgschraiber One of the many terrifying things I learned from your Bioinformatics Data Skills book!

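The distinction the illustrator erased matters because length and overlap are computed differently under the two conventions. A minimal sketch (illustrative only, not the book's figure or code; function names are hypothetical): the same five bases are [0, 5) in 0-based half-open form and [1, 5] in 1-based closed form, and using one convention's overlap test on the other's coordinates gives the wrong answer for intervals that share a single base.

```python
from typing import Tuple

Interval = Tuple[int, int]

def length_half_open(iv: Interval) -> int:
    # [start, end): the end position is excluded
    return iv[1] - iv[0]

def length_closed(iv: Interval) -> int:
    # [start, end]: both endpoints are included
    return iv[1] - iv[0] + 1

def overlaps_half_open(a: Interval, b: Interval) -> bool:
    # Touching intervals such as [0, 5) and [5, 10) share no position.
    return a[0] < b[1] and b[0] < a[1]

def overlaps_closed(a: Interval, b: Interval) -> bool:
    # [1, 5] and [5, 10] share position 5, so touching counts as overlap.
    return a[0] <= b[1] and b[0] <= a[1]

# The same five bases in each convention
print(length_half_open((0, 5)), length_closed((1, 5)))   # 5 5

# Adjacent, non-overlapping features in 0-based half-open form
print(overlaps_half_open((0, 5), (5, 10)))               # False

# Features sharing exactly one base (position 5) in 1-based closed form
print(overlaps_closed((1, 5), (5, 10)))                  # True
print(overlaps_half_open((1, 5), (5, 10)))               # False: wrong test for closed intervals
```

One reason half-open intervals are popular is visible here: adjacent intervals chain end-to-start without any +1 or -1 adjustments.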
Ben Hitz retweeted
Anshul Kundaje (@anshulkundaje)
Dear big consortia, It is never too late to be brave and use all that visibility you have to make a strong statement that academia will not be held hostage by glam journals & their shiny JIFs 1/
Liam Bright (@lastpositivist)
I haven't watched the last series of Rick'n'Morty. Suppose (counter-factually) my IQ were high enough to appreciate the show's brand of humour. Is it a good series that is worth watching?
Ben Hitz retweeted
sina (@sinabooeshaghi)
Our paper "A machine readable specification for genomics assays" is now published in Bioinformatics, @OUPBioinfo. In short, we present a lightweight file format and command-line tool to document the structure of sequencing reads. Coauthored with @XiChenUoM and @lpachter.

Paper: doi.org/10.1093/bioinf…
Code: github.com/pachterlab/seq…

What is in my sequencing reads?

Sequencing machines produce text files, called FASTQs, that contain reads, or sequences of DNA molecules. Assay developers and data generators deeply understand the contents of their reads; they know the location and presence of biological and synthetic constructs like cellular barcodes. Collaborators, reproducers, and other scientists may not, even though this information is sometimes tucked away in supplementary material.

Take for example the @10xGenomics Multiome assay. The 10x Genomics documentation [1] spells out the read structure for each modality: RNA reads contain a synthetic 16bp barcode, a 12bp "randomly" generated unique molecular identifier (UMI), and cDNA that was captured via polyA capture. The ATAC reads consist of genomic DNA and a 16bp cellular barcode. However, the 10x Genomics website explains that the ATAC portion of the 10x Multiome data contains a little-known 8bp constant spacer sequence that precedes the 16bp cell barcode. So saying that you have "10x Multiome" reads is a necessary but not sufficient condition for knowing the contents of your FASTQ reads.

This is because the FASTQ read structure depends on both the assay and the sequencing machine/recipe used; a sequencing library produced from one single-cell assay can yield different read structures depending on these parameters. Take the 10x Multiome assay again. The 24bp ATAC barcode + spacer is usually sequenced as the i5 index read. Since the NextSeq 500/550 does not support a 24bp i5 read, the user must specify "dark cycles" (10x details the impact of this [2]) to skip the 8bp spacer. This yields a 16bp cell barcode in the i5 FASTQ file. If, however, the 10x Multiome ATAC library was not sequenced with dark cycles, then the i5 FASTQ file will contain the 24bp spacer + barcode.

I was originally unaware of the 8bp spacer and the use of dark cycles in Multiome library sequencing. But while recently looking at ATAC reads, I realized the impact it had on my count matrices: I had been extracting half of the cell barcode and all of the spacer. This meant I was performing barcode error correction and UMI collapsing on these mostly similar "barcodes" (all sharing the same constant spacer), which produced few counts.

This decoupling of read structure between the sequencing machine and the assay makes it a high priority to document read structure in a sequencer- and assay-specific manner, so that preprocessing tools can accurately extract and process the relevant sequenced elements.

A machine-readable specification

I was inspired by @XiChenUoM's efforts (which started while in @teichlab) to document the sequencing reads of assays, and I came up with the idea of documenting read structure in a machine- and human-readable specification. The specification is called seqspec [3]. It details the structure of a YAML file that allows users to specify and annotate the types of sequences contained in their FASTQ data. seqspec uses a nested representation of "Regions" and "Reads" that allows users to annotate groups of sequenced elements and map sequencing reads to sequencing primers. This enables, for example, all of the elements contained in Read 1 of a FASTQ file, such as the barcode and UMI in the 10x RNA assay, to be annotated as belonging to Read 1.

The spec also comes with an accompanying seqspec command-line tool, which gives users who annotate their sequencing assays many benefits:
1. Reproducibility and verifiability of the assay structure
2. Positional extraction of relevant features
3. Visualization of the sequencing structure

The seqspec command-line tool makes it straightforward to extract the positional index of sequenced elements. The barcodes in the 10x Multiome dataset could easily have been identified as starting 8bp into the reads with the seqspec index command. The tool also makes it straightforward to visualize the structure of your sequencing reads: seqspec print can produce publication-ready figures of your read structure. Most importantly, seqspec makes it easy for others to reanalyze data for which a seqspec exists, making analysis results verifiable.

seqspec adoption

seqspec aims to make genomics processing correct and reproducible. seqspec was recently adopted as the first standard in the @IGVFConsortium, and we anticipate the publication of terabytes of sequencing data alongside their seqspec read annotations. I personally believe seqspec will be transformative for reproducibility and analysis efforts, in particular for those undertaken by consortia. I hope that public databases (like the @NCBI SRA/GEO and DDBJ) will test out seqspec and look to adopt it as a standard for data submission.

seqspec is freely available, open source, open to contributions, usable, and well documented. Please take a look at the GitHub repo and try it out! We welcome feedback.

[1] 10xgenomics.com/support/single…
[2] kb.10xgenomics.com/hc/en-us/artic…
[3] github.com/pachterlab/seq…
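The positional extraction described in the thread, and the spacer bug it would have caught, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not seqspec's code, its YAML schema, or the output format of seqspec index: each read layout is a hypothetical ordered list of (region, length) pairs using the lengths given in the thread (8bp spacer, 16bp cell barcode, 12bp UMI), the sequences are placeholders, the RNA Read 1 is assumed to hold exactly barcode + UMI, and region_index and extract are made-up helper names.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified read layouts: an ordered list of (region, length in bp).
# This is NOT the seqspec YAML schema; lengths follow the thread's description.
Layout = List[Tuple[str, int]]

ATAC_I5_NO_DARK_CYCLES: Layout = [("spacer", 8), ("cell_barcode", 16)]  # 24bp i5 read
ATAC_I5_DARK_CYCLES: Layout = [("cell_barcode", 16)]                    # spacer skipped
RNA_READ1: Layout = [("cell_barcode", 16), ("umi", 12)]                 # assumes a 28bp Read 1

def region_index(layout: Layout) -> Dict[str, Tuple[int, int]]:
    """Map each region to its 0-based, half-open [start, end) position in the read."""
    index: Dict[str, Tuple[int, int]] = {}
    pos = 0
    for name, length in layout:
        index[name] = (pos, pos + length)
        pos += length
    return index

def extract(read: str, layout: Layout, region: str) -> str:
    """Slice one region out of a read, refusing reads that don't match the layout."""
    expected = sum(length for _, length in layout)
    if len(read) != expected:
        raise ValueError(f"read is {len(read)}bp but the layout describes {expected}bp")
    start, end = region_index(layout)[region]
    return read[start:end]

# Placeholder sequences for illustration only (not real 10x sequences)
spacer = "GGGGGGGG"
barcode = "ACGTACGTTGCATGCA"
umi = "TTTTCCCCAAAA"

i5_read = spacer + barcode                   # sequenced WITHOUT dark cycles: 24bp
print(region_index(ATAC_I5_NO_DARK_CYCLES))  # {'spacer': (0, 8), 'cell_barcode': (8, 24)}
print(extract(i5_read, ATAC_I5_NO_DARK_CYCLES, "cell_barcode") == barcode)  # True

rna_read1 = barcode + umi
print(extract(rna_read1, RNA_READ1, "umi"))  # TTTTCCCCAAAA

# The mistake described in the thread: assuming a 16bp barcode at the start of the
# read keeps the whole spacer and only the first half of the real barcode.
print(i5_read[:16])                          # GGGGGGGGACGTACGT

# With an explicit layout, a mismatched one is caught by the length check instead.
try:
    extract(i5_read, ATAC_I5_DARK_CYCLES, "cell_barcode")
except ValueError as err:
    print(err)                               # read is 24bp but the layout describes 16bp
```

The point of the design is that the layout, not the extraction code, carries the sequencer- and assay-specific knowledge, so the same extraction logic works for libraries sequenced with or without dark cycles as long as the correct layout is recorded alongside the data.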
Ben Hitz (@bonscotthoughts)
@RyanMarino As someone with a colonoscopy scheduled Friday I wonder the same thing