Working with what you’ve got

T cell receptor (TCR) profiling is something our lab is very familiar with. Until now, however, we had always obtained this information by performing or analyzing specialized TCR-seq experiments. As part of my project analyzing TCGA data, we wanted to see whether we could obtain TCR profiles directly from the RNA-seq data. In theory, since the tissue sample used for RNA-seq is heterogeneous and contains infiltrating T cells, their TCR transcripts should have been captured in the RNA-seq libraries. We tested an existing tool, MiTCR, on some RNA-seq files and found that we could recover TCR sequences, though with a high false-positive rate. After rigorous optimization using negative control datasets from various cell lines and positive control datasets from in silico recombined TCR sequences, we were able to reliably extract the most abundant TCR sequences from RNA-seq. We applied this analysis to over 7,000 TCGA samples (6,738 tumours and 604 control tissues) and found that abundant TCRs found in multiple tumours were more likely to be public TCRs (those recognizing viral antigens) than TCRs unique to a single tumour. While this didn’t reveal new TCR-pMHC interactions, it does open the door to analyzing tens of thousands of existing RNA-seq datasets that were not created with immunological questions in mind.
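The comparison between TCRs seen in multiple tumours and TCRs unique to a single tumour can be illustrated with a minimal sketch. This is not the pipeline from the paper: the function name, the sample IDs, and the CDR3 strings below are all hypothetical, and it assumes the TCR calls have already been extracted and filtered into (sample, CDR3) pairs.

```python
from collections import defaultdict


def split_shared_private(calls):
    """Split CDR3 sequences into those seen in more than one sample and the rest.

    `calls` is an iterable of (sample_id, cdr3) pairs. This input layout is
    an assumption made for illustration only.
    """
    samples_by_cdr3 = defaultdict(set)
    for sample_id, cdr3 in calls:
        samples_by_cdr3[cdr3].add(sample_id)

    # A sequence counts as "shared" if it appears in at least two distinct samples.
    shared = {cdr3 for cdr3, samples in samples_by_cdr3.items() if len(samples) > 1}
    private = set(samples_by_cdr3) - shared
    return shared, private


# Made-up example data: sample names and CDR3 strings are placeholders.
calls = [
    ("tumour_A", "CASSLGQAYEQYF"),
    ("tumour_B", "CASSLGQAYEQYF"),
    ("tumour_A", "CASSPDRGNTEAFF"),
    ("tumour_C", "CASSIRSSYEQYF"),
]

shared, private = split_shared_private(calls)
print("shared:", sorted(shared))
print("private:", sorted(private))
```

The shared set would then be the natural candidate list to check against known public TCRs.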

Link: http://genomemedicine.biomedcentral.com/articles/10.1186/s13073-015-0248-x

Abstract: Deep sequencing of recombined T cell receptor (TCR) genes and transcripts has provided a view of T cell repertoire diversity at an unprecedented resolution. Beyond profiling peripheral blood, analysis of tissue-resident T cells provides further insight into immune-related diseases. We describe the extraction of TCR sequence information directly from RNA-sequencing data from 6738 tumor and 604 control tissues, with a typical yield of 1 TCR per 10 million reads. This method circumvents the need for PCR amplification of the TCR template and provides TCR information in the context of global gene expression, allowing integrated analysis of extensive RNA-sequencing data resources.
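To give a feel for the yield quoted in the abstract (about 1 TCR per 10 million reads), here is a back-of-the-envelope sketch. The function name and the library sizes are hypothetical, and the calculation simply scales the typical yield linearly with read depth, which is an assumption rather than a result from the paper.

```python
# Typical yield reported in the abstract: roughly 1 TCR per 10 million reads.
READS_PER_TCR = 10_000_000


def expected_tcr_count(total_reads, reads_per_tcr=READS_PER_TCR):
    """Rough expected number of TCR sequences, assuming yield scales linearly with depth."""
    return total_reads / reads_per_tcr


# Hypothetical library sizes, chosen only for illustration.
for reads in (20_000_000, 50_000_000, 150_000_000):
    print(f"{reads:,} reads -> about {expected_tcr_count(reads):.0f} TCR sequences")
```

At these depths only a handful of sequences per sample would be expected, which fits with the method recovering the most abundant TCRs and not a full repertoire.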