Alex Dobin

148 posts

@a_dobin

Director of Bioinformatics @ArcInstitute. Formerly: ENCODE; PI @CSHL. Developer of STAR.

Palo Alto, CA · Joined August 2012
125 Following · 767 Followers
Alex Dobin@a_dobin·
@fulop_dan Indeed, strandedness of the libraries does not (presently) affect alignments. The --soloStrand option is necessary for assigning reads to genes in single-cell gene expression.
Dan Fulop@fulop_dan·
Never mind, and sorry to bug you. I see now that a general answer about strandedness and the lack of a need for additional parameters is contained here: groups.google.com/g/rna-star/c/o…
Dan Fulop@fulop_dan·
@a_dobin The --readStrand switch is gone in the STAR short read aligner. Is it now replaced by --soloStrand even for bulk RNAseq? #Bioinformatics
Alex Dobin@a_dobin·
@APredeus Indeed, we were using the "abridged" 10X annotations that exclude small non-coding RNA and pseudogenes. We checked it for the full Gencode 37 annotations, and the results were very similar.
Alex Predeus 🇺🇦@APredeus·
@a_dobin Quick question: it seems like you've applied the same GTF filtering as cellRanger does (removing about 30k noncoding RNA genes) for all tested tools; is this correct? Would you expect the accuracy to change a lot if the full reference is used?
Alex Dobin@a_dobin·
@alexwstockinger Supertranscripts should work if you can make a set of Supertranscript sequences and a GTF describing spliced/unspliced transcripts with respect to transcripts, and give them to the STAR genome generation step.
Alex Stockinger@alexwstockinger·
@a_dobin So a simple gene/transcript map is the way to go? Ad supertranscripts: to my understanding, cellranger IS splice-aware, right? And so is STARsolo? What am I missing here?
Alex Stockinger@alexwstockinger·
Just two months after Kallisto was shown to often outperform STAR in mapping #scRNAseq data (biorxiv.org/content/10.110…), STAR strikes back by integrating multi-mapping data. Happy to see these tools improving, maybe @10xGenomics could consider integrating them into cellranger?
Alex Dobin@a_dobin

STARsolo preprint is out on bioRxiv: biorxiv.org/content/10.110… STAR release 2.7.9a: github.com/alexdobin/STAR… The major new feature is quantification of multi-gene (multi-mapping) reads/UMIs, which are necessary to detect expression from overlapping genes and paralogs. 1/5

Alex Dobin@a_dobin·
@nomad421 Interesting approach, and very impressive accuracy improvement! And incredibly quick turn-around time!
𝕐@nomad421·
@a_dobin : you may be interested in the approach suggested here; we'd be happy to have your thoughts / feedback (we haven't gotten around to looking at the simulated data yet)!
𝕐@nomad421·
Can the prediction of "expression for thousands of non-expressed genes" arising in certain approaches for #scRNAseq quantification (biorxiv.org/content/10.110…) be ameliorated while retaining their computational benefits? It seems possible; a short thread! 1/10
[Image attached]
Alex Dobin@a_dobin·
@alexwstockinger The SuperTranscripts are very cool - but they would require spliced alignments. We were actually looking into that at some point but did not get far. The redundancy is not a problem, as long as redundant transcripts are assigned to the same gene.
Alex Dobin@a_dobin·
@alexwstockinger This is a good point: for species without genome assembly, mapping to the transcriptome is the only option. You can do it with STARsolo by generating the genome index from transcript sequences instead of chromosomes. 3/3
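The transcriptome-as-genome idea in the tweet above can be sketched as a single STAR index-building command. This is a minimal sketch, not a tested recipe: the helper name `star_genome_generate_cmd` and the file and directory names are hypothetical, and only the standard STAR options `--runMode genomeGenerate`, `--runThreadN`, `--genomeDir`, and `--genomeFastaFiles` are used. The code only assembles the argument list; it does not run STAR.

```python
def star_genome_generate_cmd(transcripts_fasta, genome_dir, threads=4):
    """Build the argv list for a STAR index over transcript sequences.

    The FASTA is assumed to hold one record per transcript, so each
    transcript plays the role that a chromosome plays in a normal index.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", transcripts_fasta,
    ]


# Hypothetical input and output names, for illustration only.
cmd = star_genome_generate_cmd("transcripts.fa", "star_transcriptome_index")
print(" ".join(cmd))
# To execute (requires STAR on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

For references with many short sequences, the STAR manual describes further index parameters (for example `--genomeChrBinNbits`) that may need tuning; they are left out of this sketch.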
Alex Dobin@a_dobin·
@alexwstockinger Using simulations, we show the differences are due to Kallisto's lower accuracy, which is caused by the pseudoalignment-to-transcriptome algorithm. It forces intronic reads (abundant in single-cell data) to map to spurious genes. 2/3
Alex Dobin@a_dobin·
@bdeonovic @BMirauta @biomonika @lpachter Sure, no disagreement here. I was thinking about a specific data type, scRNA-seq gene/cell counts: mostly 0s, many 1s, and fewer >=2 elements. But maybe Lior has something else on his mind, and I am being paranoid. twitter.com/a_dobin/status…
Alex Dobin@a_dobin

@hypercompetent @lpachter It’s getting late on the East coast, and still no blog from Lior, so I will make my presumptuous guess. I think Lior is trying to puzzle out why Kallisto to CellRanger correlation is lower in our Fig.4C biorxiv.org/content/10.110… vs. their Fig.2D nature.com/articles/s4158… 1/3

Benjamin Deonovic@bdeonovic·
@a_dobin @BMirauta @biomonika @lpachter If the model is x[i] = 1-y[i] for i < k and x[i]=y[i] for i>=k then given k the association between x and y is perfect. My point above is that it is important what the underlying model is and the underlying model should inform what measures of association you use
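The model in the tweet above can be made concrete with a small sketch. The vector `y`, the value `k = 4`, and the helper names `pearson` and `flip_before` are illustrative assumptions, not taken from the thread. The point is that `x` is fully determined by `y` once `k` is known, even though the Pearson correlation between the two is zero for this particular choice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def flip_before(y, k):
    """Apply the model: x[i] = 1 - y[i] for i < k, x[i] = y[i] otherwise."""
    return [1 - v if i < k else v for i, v in enumerate(y)]


y = [0, 1, 0, 1, 0, 1, 0, 1]  # illustrative binary vector
k = 4                         # illustrative change point
x = flip_before(y, k)

print(pearson(x, y))           # 0.0
print(flip_before(x, k) == y)  # True: knowing k recovers y exactly from x
```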
Lior Pachter@lpachter·
This is a subtweet (until I get around to writing the blog post).
[Image attached]
𝕐@nomad421·
@a_dobin @p_bourguet One can also use salmon with the transcriptome-projected alignments from STAR; it is quite fast. It's a great pairing (and, as a bonus, you don't have to disallow indels in the projected alignments).
Alex Dobin@a_dobin·
@BMirauta @bdeonovic @biomonika @lpachter And correlation coefficient does not have to be higher than the proportion of equal elements. An even simpler toy example: x=[0 0 1 1] y=[0 1 0 1] corr(x,y)=0 (obviously) while 50% of the elements agree.
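The toy example in the tweet above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `pearson` and `agreement` are illustrative and not from the thread.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def agreement(x, y):
    """Fraction of positions at which the two sequences are equal."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]

print(pearson(x, y))    # 0.0
print(agreement(x, y))  # 0.5
```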
Bogdan Mirauta@BMirauta·
@bdeonovic @biomonika @lpachter I fully agree Pearson is not the best correlation coef in many cases (outliers notably) and that even the concept of correlation is not the most appropriate sometimes. But, on this data I do not agree it is misleading. The r2 of 0.36 indicates the right prediction accuracy.
Alex Dobin@a_dobin·
@p_bourguet Right, there are a few features in STARsolo that would be good to have for bulk (e.g., counting only reads that are concordant with transcripts). They are high on my TODO list. Though for multimappers, quantifying with RSEM is still a better (albeit slower) option.
Pierre Bourguet@p_bourguet·
@a_dobin Glad to see that multimappers are getting some love! Did you implement the multimapper quantification options only for single cell or also with the bulk version?
Alex Dobin@a_dobin·
@hypercompetent @lpachter The answer to “why Kallisto to CellRanger correlation is lower in our calculation” is simple. We used Spearman correlation, while they used Pearson. Pearson correlation, of course, can be inflated by various artifacts and is not a good choice for RNA-seq data. 3/3
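One way Pearson correlation can be inflated is a single extreme value. The sketch below uses made-up numbers (not data from either paper) in which four low values are perfectly anti-ordered and one shared large value dominates the Pearson coefficient. Spearman correlation is computed here as the Pearson correlation of average ranks; the helper names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up counts: the first four values disagree in order,
# while one large shared value acts as an outlier.
x = [1, 2, 3, 4, 100]
y = [4, 3, 2, 1, 100]

print(round(pearson(x, y), 4))  # 0.9987
print(spearman(x, y))           # 0.0
```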
Alex Dobin@a_dobin·
@hypercompetent @lpachter I am still not sure what’s the point of Lior’s toy example. Should we not use correlation as a metric at all? Then why was it used in Kallisto paper? 2/3
Alex Rosenberg@dna_rosenberg·
Looks really interesting. It’s amazing to me the impact @a_dobin has had on the field, especially RNA-seq and scRNA-seq. I’ve been using STAR for years now and we rely heavily on it in @parsebio’s single cell analysis pipeline. A truly incredible tool
Alex Dobin@a_dobin

STARsolo preprint is out on bioRxiv: biorxiv.org/content/10.110… STAR release 2.7.9a: github.com/alexdobin/STAR… The major new feature is quantification of multi-gene (multi-mapping) reads/UMIs, which are necessary to detect expression from overlapping genes and paralogs. 1/5
