Defining T cell clonality from RNA-seq

In a recent collaboration with Greg Hapgood and Kerry Savage, we have shown that shallow RNA-seq data is sufficient to identify cell populations that contain T cell clonal expansions.

This work extends our efforts to extract T cell receptor sequences from RNA-seq. In our initial paper, we attempted to find T cell receptor sequences using RNA-seq on solid tumors. The T cells were a rare cell type, and thus the T cell receptor transcripts were also rare, contributing to the overall low yield. In the present study, we are looking at RNA-seq on sorted T cell populations, enabling us to greatly increase our yield.

Using only shallow RNA-seq (~80 samples pooled on a single HiSeq lane), we were clearly able to distinguish control samples, which contained a diverse polyclonal population of T cells, from aberrant samples. In the aberrant samples, the cell surface markers showed an aberrant immunophenotype and all cells appeared to share an identical T cell receptor, signifying that they are clonally expanded and likely malignant. Perhaps most interesting is the subset of cases that appeared immunophenotypically normal yet clearly contained a clonally expanded population of cells by T cell receptor analysis, demonstrating the increased sensitivity of RNA-seq over flow cytometry.
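To make the polyclonal versus clonal contrast concrete, here is a minimal sketch of how one might summarize a sample once TCR sequences have been extracted. It is not the analysis from the study: the function name `clonality_summary`, the made-up CDR3 strings, and the choice of metrics (fraction of reads in the top clonotype, and one minus normalized Shannon entropy) are all assumptions for illustration.

```python
import math
from collections import Counter


def clonality_summary(cdr3_reads):
    """Summarize how dominated a sample is by a single TCR clonotype.

    cdr3_reads: one CDR3 sequence per TCR-derived read (hypothetical input).
    Returns the top clonotype fraction and a clonality score in [0, 1],
    where 0 means perfectly even and 1 means a single clonotype.
    """
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no TCR reads in sample")

    freqs = [n / total for n in counts.values()]
    top_fraction = max(freqs)

    if len(freqs) == 1:
        clonality = 1.0  # only one clonotype observed
    else:
        entropy = -sum(p * math.log(p) for p in freqs)
        clonality = 1.0 - entropy / math.log(len(freqs))

    return {
        "n_clonotypes": len(freqs),
        "top_fraction": top_fraction,
        "clonality": clonality,
    }


# Made-up example data: ten different clonotypes seen once each,
# versus one clonotype accounting for 19 of 20 reads.
polyclonal = [f"CASS{chr(65 + i)}TF" for i in range(10)]
expanded = ["CASSLGQAYEQYF"] * 19 + ["CASSPTGELFF"]

print(clonality_summary(polyclonal))
print(clonality_summary(expanded))
```

With shallow sequencing the number of TCR reads per sample is small, so any cutoff on these values would need to be calibrated against known polyclonal controls rather than taken from this sketch.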

Read more about it here:

Working with what you’ve got

T cell receptor (TCR) profiling is something our lab is very familiar with. However, we had always performed or analyzed specialized TCR-seq experiments to obtain this information. As part of my project analyzing TCGA data, we wanted to see if we could obtain TCR profiles directly from the RNA-seq data. In theory, since the tissue sample used for RNA-seq is heterogeneous and contains infiltrating T cells, their TCR transcripts should have been captured in the RNA-seq libraries. We tested an existing tool, MiTCR, on some RNA-seq files and were able to recover TCR sequences, though with a high false-positive rate. After rigorous optimization using negative control datasets from various cell lines and positive control datasets from in silico recombined TCR sequences, we were able to reliably extract the most abundant TCR sequences from RNA-seq. We applied this analysis to over 7,000 TCGA samples (tumours and control tissues) and found that abundant TCRs found in multiple tumours were more likely to be public TCRs (those recognizing viral antigens) than TCRs unique to a single tumour. Although this didn’t reveal new TCR-pMHC interactions, it does open the door to analyzing tens of thousands of existing RNA-seq datasets which were not created with immunological questions in mind.


Abstract: Deep sequencing of recombined T cell receptor (TCR) genes and transcripts has provided a view of T cell repertoire diversity at an unprecedented resolution. Beyond profiling peripheral blood, analysis of tissue-resident T cells provides further insight into immune-related diseases. We describe the extraction of TCR sequence information directly from RNA-sequencing data from 6738 tumor and 604 control tissues, with a typical yield of 1 TCR per 10 million reads. This method circumvents the need for PCR amplification of the TCR template and provides TCR information in the context of global gene expression, allowing integrated analysis of extensive RNA-sequencing data resources.

Everybody remembers their first

My first paper! And first author too!

I joined this project started by Rene Warren and Rob Holt in the summer of 2013. Rene had built a lovely pipeline to process TCGA data, and I came on between my undergrad and grad degrees as a summer student to play with and analyze the resulting data. After filtering the processed data to maximize the chance of predicting truly immunogenic mutations (high expression of presenting HLA, high expression of mutated gene, strong binding of mutated peptide to MHC), we found a striking association between the number of predicted immunogenic mutations in a tumour and the level of T cell infiltrate in that tumour. It was already known that patients with tumours having higher levels of T cell infiltrate had better overall survival, so this result supported the idea that these infiltrating T cells were recognizing tumour neoantigens. Despite these tumours having high numbers of immunogenic mutations and high numbers of T cells, we also showed that they had high expression of PDCD1 and CTLA4, which suggests that these T cells may be inhibited. This would explain why these patients, despite having good infiltrate and T cell targets, still had cancer.
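The three filtering criteria above can be sketched roughly as follows. This is only an illustration of the idea, not Rene's pipeline: the `CandidateEpitope` record, the function names, the gene labels, and every threshold value are hypothetical. The 500 nM IC50 default is a commonly used cutoff for predicted MHC binders, and the expression cutoffs are placeholders in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class CandidateEpitope:
    gene: str               # mutated gene (placeholder labels below)
    hla_allele: str         # presenting HLA allele
    gene_expression: float  # expression of the mutated gene (arbitrary units)
    hla_expression: float   # expression of the presenting HLA gene
    ic50_nm: float          # predicted peptide-MHC binding affinity in nM


def is_predicted_immunogenic(c, min_gene_expr=1.0, min_hla_expr=1.0,
                             max_ic50_nm=500.0):
    """Apply all three filters; every threshold here is an assumed value."""
    return (
        c.gene_expression >= min_gene_expr
        and c.hla_expression >= min_hla_expr
        and c.ic50_nm <= max_ic50_nm  # lower IC50 means stronger binding
    )


def count_predicted_immunogenic(candidates, **thresholds):
    """Number of candidates in one tumour that pass every filter."""
    return sum(is_predicted_immunogenic(c, **thresholds) for c in candidates)


# Made-up candidates for a single tumour.
tumour = [
    CandidateEpitope("GENE_A", "HLA-A*02:01", 12.0, 150.0, 35.0),   # passes
    CandidateEpitope("GENE_B", "HLA-A*02:01", 0.2, 150.0, 20.0),    # gene not expressed
    CandidateEpitope("GENE_C", "HLA-A*02:01", 8.0, 150.0, 4200.0),  # weak binder
]

print(count_predicted_immunogenic(tumour))
```

The per-tumour count produced this way is the kind of quantity that could then be compared against a T cell infiltrate measure such as CD8A expression.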


Abstract: Somatic missense mutations can initiate tumorigenesis and, conversely, anti-tumor cytotoxic T cell (CTL) responses. Tumor genome analysis has revealed extreme heterogeneity among tumor missense mutation profiles, but their relevance to tumor immunology and patient outcomes has awaited comprehensive evaluation. Here, for 515 patients from six tumor sites, we used RNA-seq data from The Cancer Genome Atlas to identify mutations that are predicted to be immunogenic in that they yielded mutational epitopes presented by the MHC proteins encoded by each patient’s autologous HLA-A alleles. Mutational epitopes were associated with increased patient survival. Moreover, the corresponding tumors had higher CTL content, inferred from CD8A gene expression, and elevated expression of the CTL exhaustion markers PDCD1 and CTLA4. Mutational epitopes were very scarce in tumors without evidence of CTL infiltration. These findings suggest that the abundance of predicted immunogenic mutations may be useful for identifying patients likely to benefit from checkpoint blockade and related immunotherapies.