Adaptive Immunity & Cancer Neoantigens: a primer

Here I attempt to explain the basics of adaptive immunity and cancer neoantigens in a generally accessible way. Not being formally trained in immunology, I have striven to cover just the key concepts required for understanding the basic principles behind cancer neoantigens.

Adaptive Immunity

The adaptive immune system evolved as a method of detecting and destroying foreign pathogens, such as viruses and bacteria. One of the key principles behind the adaptive immune system is that no prior knowledge of these foreign invaders is required; as a system it is prepared to handle any potential foreign entity.

One of the key players in the adaptive immune system is the T cell, named for its maturation in the thymus. T cells have receptors on their cell surface called T cell receptors (TCRs), which are what recognize foreign proteins. Nearly all cells in the human body carry identical copies of your DNA (well, aside from sperm and egg cells, but we will ignore those). T cells are a notable exception. During T cell development, the genes in the TCR locus get rearranged, or mixed up, creating diversity in this region. In addition to cutting and pasting genes around, nucleotides get randomly deleted and added, further increasing the diversity. All this mixing up results in a TCR which is essentially unique to that specific cell. Importantly, any T cell which would react to a self-protein (something non-foreign) is destroyed during development. This results in each person having a repertoire of over 1,000,000 different T cells, each with a different TCR, moving through their body, constantly looking for foreign proteins.
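To make the "cut, paste, trim and add" idea concrete, here is a toy simulation. It is only a sketch: the segment names and sequences are made up and far shorter than real gene segments, the V, D and J labels follow the usual textbook naming for the TCR beta chain, and the trimming and insertion limits are arbitrary assumptions.

```python
import random

# Hypothetical, drastically shortened gene segments; real loci contain many
# more (and much longer) segments. Names and sequences are illustrative only.
V_SEGMENTS = {"V1": "TGTGCCAGCAGT", "V2": "TGTGCCAGCTCA"}
D_SEGMENTS = {"D1": "GGGACAGGG", "D2": "GGGACTAGC"}
J_SEGMENTS = {"J1": "AATGAGCAGTTC", "J2": "TACGAGCAGTAC"}


def trim(seq, rng, max_trim=3, from_end=True):
    """Randomly delete up to max_trim nucleotides from one end of a segment."""
    n = rng.randint(0, max_trim)
    if n == 0:
        return seq
    return seq[:-n] if from_end else seq[n:]


def random_insert(rng, max_len=4):
    """Random nucleotides added at a junction between two segments."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def rearrange(rng):
    """Pick one V, D and J segment, trim the joining ends, add random bases."""
    v = trim(rng.choice(list(V_SEGMENTS.values())), rng, from_end=True)
    d = rng.choice(list(D_SEGMENTS.values()))
    d = trim(trim(d, rng, from_end=False), rng, from_end=True)
    j = trim(rng.choice(list(J_SEGMENTS.values())), rng, from_end=False)
    return v + random_insert(rng) + d + random_insert(rng) + j


rng = random.Random(0)  # fixed seed so the run is repeatable
receptors = {rearrange(rng) for _ in range(1000)}
print(f"{len(receptors)} distinct sequences from 1000 simulated rearrangements")
```

Even with only two choices per segment (eight possible combinations), the random trimming and insertion at the junctions produce far more than eight distinct sequences, which is the point of the mechanism.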

What is it that T cells actually recognize? In every cell of the body, genes in the DNA are expressed as proteins (chains of amino acids, typically hundreds to thousands of amino acids) inside the cell. Proteins do not last forever; there is an equilibrium that exists between proteins being created and degraded in the cell, so all proteins will eventually be broken down into short peptide fragments (shorter chains of amino acids, typically tens of amino acids). These peptides are transported into the endoplasmic reticulum where they are shortened some more, and may bind to waiting MHC molecules. MHC molecules have the job of presenting these short peptides (at this point 8-11 amino acids in length) to T cells. Once a peptide has bound to an MHC molecule, the peptide-MHC complex moves to the surface of the cell. Since, in this example, the peptide is derived from a self-protein, T cells will survey the peptide-MHC complex, but none will recognize it as foreign, and the presenting cell gets left alone.
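As a rough computational analogy for the 8-11 amino acid peptides described above, the sketch below lists every contiguous window of those lengths in a protein. The sequence is a made-up 22-residue example, and real processing in the cell is not an exhaustive sliding window; this only shows the space of peptides that could in principle be presented.

```python
def candidate_peptides(protein, min_len=8, max_len=11):
    """Return every contiguous peptide of min_len..max_len residues."""
    peptides = []
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            peptides.append(protein[start:start + length])
    return peptides


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, one letter per amino acid
peptides = candidate_peptides(protein)
print(len(peptides), "candidate peptides, for example", peptides[0])
```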

A quick note on MHC molecules: these are encoded by the most polymorphic regions of the genome (the HLA locus), resulting in thousands of different MHC molecules existing across the human population. Each individual will have at most six of these different variants (two each of HLA-A, HLA-B, and HLA-C, the class I genes relevant here), and each variant can present only a subset of all peptides.

Returning to our cell example, if this cell is infected by a virus, this cell will now contain foreign proteins. Like self-proteins, these proteins will be broken down, transported into the endoplasmic reticulum, and may bind to an MHC molecule. When this foreign peptide-MHC complex is transported to the cell surface, one of the many T cells in the repertoire will recognize this peptide as foreign, and will initiate killing of the presenting cell.

After the T cell has killed the virally infected cell, it will replicate itself, making more cells with that same TCR and creating a larger attack force of T cells able to hunt down other cells that have been infected by this same type of virus.

Cancer Neoantigens

Cancer is a disease of the genome. It is characterized by changes to a cell’s DNA. Some of these changes are “silent” – they do not effect changes in proteins. However, many are “non-silent” – they cause a change in the protein sequence of self-proteins.

A subset of these mutations will be present in the broken down peptides that bind to the MHC molecules. These will be presented on the surface of the cell, and, due to the mutation, have the potential to be identified as foreign by T cells.
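A short sketch of what "present in the broken down peptides" means in practice: for a single amino acid substitution, only the peptides whose window spans the changed position carry the mutation. The protein sequence, the position and the substituted residue below are all hypothetical.

```python
def mutant_peptides(protein, position, new_residue, min_len=8, max_len=11):
    """Peptides of min_len..max_len residues that span a substituted position.

    position is 0-based; new_residue replaces the residue at that position.
    """
    mutated = protein[:position] + new_residue + protein[position + 1:]
    found = []
    for length in range(min_len, max_len + 1):
        first = max(0, position - length + 1)            # earliest window covering the site
        last = min(position, len(mutated) - length)      # latest window covering the site
        for start in range(first, last + 1):
            found.append(mutated[start:start + length])
    return found


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical self-protein
neo = mutant_peptides(protein, position=10, new_residue="L")  # Q -> L, made up
print(len(neo), "mutation-spanning peptides, for example", neo[0])
```

Only a fraction of these would actually bind one of a person's MHC variants, which is why binding prediction is usually the next step.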

Importantly, these mutations are only present on cancer cells, so the immune response driven by the T cells will be specific for these cancer cells and should leave the rest of the normal cells alone.

This is the basis for many cancer immunotherapies, including:

  • Checkpoint blockade – helping existing T cells perform their attack
  • Cancer vaccines – vaccinating with the mutant peptides to “show” the T cells what to look for
  • Autologous T cell therapies – isolating T cells from a patient and selectively replicating the one(s) that recognize the cancer mutations before re-administering them into the same patient

Cancers can have many hundreds of mutations, so it becomes challenging to identify which subset of those mutations would make the best targets for these types of therapies. This is an active field of research, and one I am involved in.

An overly complex, yet elegant, solution to apartment building enterphone limitations

Sometimes, I over-engineer things. I get it from my father.

My girlfriend and I recently moved into an apartment building. Oddly, while the enterphone system is advanced enough to allow calling to cellphones (rather than landlines wired into the building), it only allows a single number per apartment. Ninety percent of the time this is just fine, but if one of us is unreachable (at work during the day, travelling, etc.), the other loses the convenience of buzzing a visitor into the building.

I was determined to find a solution!

I came across a service called Twilio, which allows interactions with a phone number to be directed to a webserver and handled as you specify using their API. A lightbulb went on in my head – this could work perfectly!

I set up an account, acquired a local phone number, and got to testing. The Twilio service allows you to set up and run simple code (in TwiML, a variant of XML) directly from their console, but for more complex solutions you must point the inbound calls to a web server. I decided to finally give Amazon Web Services (AWS) a try. A free account and some brief setup, and I was in business!

When a visitor enters our buzzer number, the system makes an outbound call to the number we provided and allows us to talk to the front door and dial a key to unlock the door. My first strategy was to have this outbound call caught by our webserver, and have it ask the visitor who they wanted to contact (“Press 1 for me, 2 for her”). Unfortunately, after testing we learned that the enterphone does not allow further key presses once the call has been made.

Strategy number 2 was to take advantage of Twilio’s speech recognition functionality. Instead of having visitors press a key, they just needed to say either of our names. Unfortunately, the speaker on the enterphone is very quiet, and with loud traffic nearby, the visitor would be unlikely to hear the instructions, and Twilio had a hard time understanding what was being said. A more passive solution was required.

Finally I settled on using the Conference feature. When a visitor buzzes us, the call is put into a conference room on hold:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Dial><Conference startConferenceOnEnter="false" waitMethod="GET" waitUrl="ring_loop_compressed.mp3">Buzzer conference</Conference></Dial>
</Response>

The webserver then initiates two calls, one to me and one to my girlfriend. When one of us picks up, we are served the following:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather action="/accepted" method="POST" numDigits="1" timeout="5">
        <Say>Ding dong. Press 1.</Say>
    </Gather>
</Response>

Pressing 1 accepts the call, ends the parallel outgoing call to the other person (to avoid empty voicemails or dead air), and patches you in to the conference. We can then proceed as usual, finding out who is there and letting them in.
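For anyone wanting to reproduce the flow, here is a minimal sketch of the server-side logic using only Python's standard library. It is not the code running on my server: the function names, the call identifiers and the idea of tracking outbound calls in a dictionary are assumptions for illustration, and actually placing or ending calls would go through Twilio's REST API, which is left out here.

```python
import xml.etree.ElementTree as ET

ROOM = "Buzzer conference"  # conference name used in the TwiML above


def visitor_twiml(wait_url="ring_loop_compressed.mp3"):
    """TwiML for the enterphone's call: wait in the conference, hearing a ring."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(dial, "Conference", {
        "startConferenceOnEnter": "false",  # the visitor alone does not start it
        "waitMethod": "GET",
        "waitUrl": wait_url,
    })
    conference.text = ROOM
    return ET.tostring(response, encoding="unicode")


def accepted_twiml():
    """TwiML for the resident who pressed 1: join and start the conference."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(dial, "Conference", {"startConferenceOnEnter": "true"})
    conference.text = ROOM
    return ET.tostring(response, encoding="unicode")


def calls_to_cancel(outbound_calls, accepted_by):
    """IDs of the parallel calls to end once one resident has accepted.

    outbound_calls maps a (hypothetical) call ID to the person being called.
    """
    return [call_id for call_id, person in outbound_calls.items() if person != accepted_by]


outbound = {"call-1": "me", "call-2": "her"}  # placeholder IDs, not real call SIDs
print(visitor_twiml())
print(calls_to_cancel(outbound, accepted_by="me"))
```

Building the XML with `ElementTree` rather than string concatenation keeps the TwiML well-formed, which matters because a malformed response simply drops the call.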

The icing on the cake? The “on hold” music is the sound of a phone ringing, so they don’t even know anything fancy is going on!

And it all happens very quickly; there is about 1 ring’s worth of delay compared to having the enterphone dial our number directly.

There you have it – an over-engineered solution to a simple, yet frustrating problem. And in case you were wondering, my dad is proud.

Defining T cell clonality from RNA-seq

In a recent collaboration with Greg Hapgood and Kerry Savage, we have shown that shallow RNA-seq data is sufficient to identify cell populations that contain T cell clonal expansions.

This work extends our efforts to extract T cell receptor sequences from RNA-seq. In our initial paper, we attempted to find T cell receptor sequences using RNA-seq on solid tumors. The T cells were a rare cell type, and thus the T cell receptor transcripts were also rare, contributing to the overall low yield. In the present study, we are looking at RNA-seq on sorted T cell populations, enabling us to greatly increase our yield.

Using only shallow RNA-seq (~80 samples pooled on a single HiSeq lane) we were clearly able to distinguish control samples (containing a diverse polyclonal population of T cells) from aberrant samples (an aberrant immunophenotype was observed in the cell surface markers, and all cells appeared to share the identical T cell receptor, signifying they are clonally expanded and likely malignant). Perhaps most interesting is the subset of cases which appeared immunophenotypically normal, yet clearly contained a clonally expanded population of cells by T cell receptor analysis, demonstrating the increased sensitivity of RNA-seq over flow cytometry.
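To give a feel for how "polyclonal" and "clonally expanded" can be told apart from read counts, here is a small sketch. The two measures used (fraction of reads from the top clone, and one minus normalized Shannon entropy) are common generic choices and are an assumption on my part for illustration, not necessarily the statistics used in the study; the clone labels and counts are invented.

```python
import math


def top_clone_fraction(tcr_counts):
    """Fraction of all TCR reads that belong to the most abundant sequence."""
    return max(tcr_counts.values()) / sum(tcr_counts.values())


def clonality(tcr_counts):
    """1 - normalized Shannon entropy: 0 = perfectly even, 1 = a single clone."""
    counts = [c for c in tcr_counts.values() if c > 0]
    if not counts:
        raise ValueError("no TCR reads to score")
    if len(counts) == 1:
        return 1.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


# Invented read counts per TCR sequence for two samples.
polyclonal = {f"clone_{i}": 1 for i in range(20)}
expanded = {"clone_0": 95, **{f"clone_{i}": 1 for i in range(1, 6)}}

for name, sample in [("polyclonal", polyclonal), ("expanded", expanded)]:
    print(name, round(top_clone_fraction(sample), 2), round(clonality(sample), 2))
```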

Read more about it here:

Cell Symposia: Technology. Biology. Data Science.

I’m currently on my way back to Vancouver after a few days in Berkeley, California. I was here for a conference on data science. This was the first more “data science-y” conference I have attended, in contrast to the usual cancer immunology conferences.

The talks were fantastic, and very interesting. Even if the techniques were not being applied to the same biological problem I work on, the techniques themselves were interesting to learn about.

I gave a talk on our work predicting neoantigens and extracting TCRs from TCGA tumour sequence data. I’ve uploaded the talk for any interested parties; it can be found here: scottbrown_tbds_presentation_web_161010

Looking forward to more conferences like this one!

Fantastic review article on computational genomics for tumour-immune interactions

I came across this review article today, published earlier this month in Nature Reviews Genetics: Hackl et al. Computational genomics tools for dissecting tumour-immune cell interactions.

It provides a current state of the field of computational biology in cancer immunology, and is a great primer on my PhD thesis!

Working with what you’ve got

T cell receptor (TCR) profiling is something our lab is very familiar with. However, we had always performed or analyzed specialized TCR-seq experiments to obtain this information. As part of my project analyzing TCGA data, we wanted to see if we could obtain TCR profiles directly from the RNA-seq data. In theory, since the tissue sample used for RNA-seq is heterogeneous, and contains infiltrating T cells, their TCR transcripts should have been captured in the RNA-seq libraries. We tested an existing tool, MiTCR, on some RNA-seq files and found that we were able to find TCR sequences, though there was a high false-positive rate. After some rigorous optimization procedures using negative control datasets from various cell lines, and positive control datasets from in silico recombined TCR sequences, we were able to reliably extract the most abundant TCR sequences from RNA-seq. We applied this analysis to over 7,000 TCGA tumours and found that abundant TCRs found in multiple tumours were more likely to be public TCRs (those recognizing viral antigens) than TCRs unique to a single tumour. While this didn’t reveal new TCR-pMHC interactions, it does open the door to analyzing tens of thousands of existing RNA-seq datasets which were not created with immunological questions in mind.
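The "found in multiple tumours" comparison boils down to counting, for each TCR sequence, how many tumours it appears in. A minimal sketch of that bookkeeping follows; the tumour IDs and CDR3 sequences are invented, and the threshold of two tumours is an assumption for illustration rather than the cutoff used in the paper.

```python
from collections import defaultdict


def tumours_per_tcr(tcrs_by_tumour):
    """Map each TCR (CDR3) sequence to the number of tumours it was found in."""
    seen = defaultdict(set)
    for tumour, cdr3s in tcrs_by_tumour.items():
        for cdr3 in cdr3s:
            seen[cdr3].add(tumour)
    return {cdr3: len(tumours) for cdr3, tumours in seen.items()}


def shared_tcrs(tcrs_by_tumour, min_tumours=2):
    """TCR sequences seen in at least min_tumours different tumours."""
    counts = tumours_per_tcr(tcrs_by_tumour)
    return sorted(cdr3 for cdr3, n in counts.items() if n >= min_tumours)


# Invented example: one sequence recurs across tumours, the rest are private.
tcrs_by_tumour = {
    "tumour_A": ["CASSLGQGAYEQYF", "CASSPTGGNTEAFF"],
    "tumour_B": ["CASSLGQGAYEQYF"],
    "tumour_C": ["CASSQDRGSYEQYF", "CASSLGQGAYEQYF"],
}
print(shared_tcrs(tcrs_by_tumour))
```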


Abstract: Deep sequencing of recombined T cell receptor (TCR) genes and transcripts has provided a view of T cell repertoire diversity at an unprecedented resolution. Beyond profiling peripheral blood, analysis of tissue-resident T cells provides further insight into immune-related diseases. We describe the extraction of TCR sequence information directly from RNA-sequencing data from 6738 tumor and 604 control tissues, with a typical yield of 1 TCR per 10 million reads. This method circumvents the need for PCR amplification of the TCR template and provides TCR information in the context of global gene expression, allowing integrated analysis of extensive RNA-sequencing data resources.

Everybody remembers their first

My first paper! And first author too!

I joined this project started by Rene Warren and Rob Holt in the summer of 2013. Rene had built a lovely pipeline to process TCGA data, and I came on between my undergrad and grad degrees as a summer student to play with and analyze the resulting data. After filtering the processed data to maximize the chance of predicting truly immunogenic mutations (high expression of presenting HLA, high expression of mutated gene, strong binding of mutated peptide to MHC), we found a striking association between the number of predicted immunogenic mutations in a tumour, and the level of T cell infiltrate in a tumour. It was already known that patients with tumours having higher levels of T cell infiltrate had better overall survival, so this result supported the idea that these infiltrating T cells were recognizing tumour neoantigens. Despite these tumours having high numbers of immunogenic mutations, and high numbers of T cells, we also showed that they had high expression of PDCD1 and CTLA-4, which suggests that these T cells may be inhibited. This would explain why these patients, despite having good infiltrate and T cell targets, still had cancer.
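The three filters mentioned above can be pictured as a simple predicate applied to each candidate mutation, followed by a per-tumour tally. The sketch below is illustrative only: the field names, the expression thresholds and the example records are made up, and the 500 nM binding cutoff is a commonly used convention that I am assuming here, not a statement of what the pipeline used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    tumour: str
    peptide: str
    hla_expression: float   # expression of the presenting HLA allele (arbitrary units)
    gene_expression: float  # expression of the mutated gene (arbitrary units)
    ic50_nm: float          # predicted peptide-MHC binding; lower means stronger


def is_predicted_immunogenic(c, min_hla=10.0, min_gene=10.0, max_ic50_nm=500.0):
    """Apply all three filters; every threshold here is a placeholder."""
    return (c.hla_expression >= min_hla
            and c.gene_expression >= min_gene
            and c.ic50_nm <= max_ic50_nm)


def immunogenic_counts(candidates):
    """Number of candidates passing the filters, per tumour."""
    return Counter(c.tumour for c in candidates if is_predicted_immunogenic(c))


candidates = [
    Candidate("T1", "KQRLISFVK", 50.0, 30.0, 40.0),
    Candidate("T1", "AYIAKQRLI", 50.0, 2.0, 40.0),     # mutated gene barely expressed
    Candidate("T2", "SFVKSHFSL", 50.0, 30.0, 5000.0),  # predicted weak binder
    Candidate("T2", "TAYIAKLRQ", 80.0, 25.0, 120.0),
]
counts = immunogenic_counts(candidates)
print(dict(counts))
```

The per-tumour counts are what would then be compared against a measure of T cell infiltrate.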


Abstract: Somatic missense mutations can initiate tumorigenesis and, conversely, anti-tumor cytotoxic T cell (CTL) responses. Tumor genome analysis has revealed extreme heterogeneity among tumor missense mutation profiles, but their relevance to tumor immunology and patient outcomes has awaited comprehensive evaluation. Here, for 515 patients from six tumor sites, we used RNA-seq data from The Cancer Genome Atlas to identify mutations that are predicted to be immunogenic in that they yielded mutational epitopes presented by the MHC proteins encoded by each patient’s autologous HLA-A alleles. Mutational epitopes were associated with increased patient survival. Moreover, the corresponding tumors had higher CTL content, inferred from CD8A gene expression, and elevated expression of the CTL exhaustion markers PDCD1 and CTLA4. Mutational epitopes were very scarce in tumors without evidence of CTL infiltration. These findings suggest that the abundance of predicted immunogenic mutations may be useful for identifying patients likely to benefit from checkpoint blockade and related immunotherapies.