Getting Started

Download & Installation

Installation for both the Python and R packages is performed in the usual manner.

To install pRESTO and Change-O from PyPI:

> pip3 install presto changeo --user

To install Alakazam, SHazaM, TIgGER and SCOPer from CRAN:

> R
> install.packages(c("alakazam", "shazam", "tigger", "scoper"))

Alternatively, a complete installation of the Immcantation framework and its dependencies is available as a Docker container. Installation of the container is described in Docker Container Installation and Overview and basic usage is described in Using the Container.

Overview of B Cell Repertoire Analysis

Yaari and Kleinstein. Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Medicine. 7, 121 (2015). doi:10.1186/s13073-015-0243-2

Immcantation Tutorials

Each tool in the framework has its own documentation site, with detailed usage information and examples. A good starting point to familiarize yourself with the framework is to follow one of the tutorials listed here.

Introductory Webinar and Jupyter Notebook

For a detailed usage example of each Immcantation tool, see the Jupyter notebook from our introductory webinar in the repository. If you don’t want to execute the Jupyter notebook yourself, you can explore a website version of it here. This webinar covers:

  • V(D)J gene annotation and novel polymorphism detection

  • Inference of B cell clonal relationships

  • Diversity analysis

  • Mutational load profiling

  • Modeling of somatic hypermutation (SHM) targeting

  • Quantification of selection pressure

Single-cell Analysis

For information on how to process 10x Genomics data to be analyzed with Immcantation, we offer an introductory tutorial for new users:

10x Genomics V(D)J Sequence Analysis Tutorial

Overview

This tutorial is a basic walkthrough for defining B cell clonal families and building B cell lineage trees using 10x Genomics BCR sequencing data. It is intended for users without prior experience with Immcantation. If you are familiar with Immcantation, then this page may be more useful.

Knowledge of basic command line usage is assumed. Please check out the individual documentation sites for the functions detailed in this tutorial before using them on your own data. For simplicity, this tutorial will use the Immcantation Docker image which contains all necessary software. It is also possible to install the packages being used separately (see pRESTO, Change-O, and Alakazam).

Please contact us if you have any questions.

Getting started

First, download and unzip the example data. It represents the Ig V(D)J sequences from CD19+ B cells isolated from PBMCs of a healthy human donor. It is based on data provided by 10x Genomics under a Creative Commons Attribution license and processed with their Cell Ranger pipeline.

Second, install Docker (if you don’t have it already) and download the Immcantation Docker image. For some operating systems, it may be necessary to use super-user privileges (sudo), and/or to have Docker Desktop running before entering the following commands.

In a terminal, enter:

# download the current Immcantation Docker image (may take a few minutes)
docker pull immcantation/suite:4.2.0

Within the terminal, use the cd command to move to the directory where you’ve placed the example data. Then mount the current directory into the Docker container:

# Linux/Mac OS X
docker run -it --workdir /data -v $(pwd):/data:z immcantation/suite:4.2.0 bash

# Windows
docker run -it --workdir /data -v %cd%:/data:z immcantation/suite:4.2.0 bash

After running the previous command, you’ll now be in the mounted /data folder inside the container. To check that everything is properly configured, enter the following commands:

BuildTrees.py --version
# should return BuildTrees.py: 1.0.0 2020.05.01

ls
# should show filtered_contig_annotations.csv and filtered_contig.fasta, possibly others

If the first command doesn’t return the expected output, you probably aren’t inside the right (or any) Docker container. If the second doesn’t return the expected output, you may not be running the Docker image from the correct directory. Exit the container by typing exit, then navigate to the proper directory and rerun the command above to enter the container again.
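The file check above can also be scripted. The following is a minimal sketch in Python (an assumption, since the tutorial itself only uses ls) that reports which of the two expected Cell Ranger files are missing from a directory. The function name missing_inputs is hypothetical.

```python
from pathlib import Path

# File names taken from the tutorial's example data.
EXPECTED_FILES = ("filtered_contig_annotations.csv", "filtered_contig.fasta")


def missing_inputs(directory="."):
    """Return the expected example files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# An empty list means both input files were found in the current directory.
print(missing_inputs("."))
```

If the list is not empty, you are most likely in the wrong directory or the data was not unzipped there.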

Assign V, D, and J genes and define clonal groups

Most of the processing for 10x Genomics data can be handled by the changeo-10x script supplied in the Docker container. This script automatically assigns new V(D)J annotations and infers clonal relationships.

To run this script on the example dataset, enter the following command in the Docker container (the \ just indicates a new line for visual clarity):

# Run 10x Genomics processing script
changeo-10x -s filtered_contig.fasta -a filtered_contig_annotations.csv -o . \
    -g human -t ig -x 0.1

The -o option refers to the output directory of the processing. The -s and -a options refer to the sequence and sequence annotation file outputs from Cell Ranger, respectively. The -g option indicates the species and the -t option indicates the type of receptor. The -x option specifies the junction distance threshold used for assigning sequences to clonal clusters.

This script will create the following files (in addition to filtered_contig_annotations.csv and filtered_contig.fasta):

  • filtered_contig_db-pass.tsv

  • filtered_contig_heavy_productive-F.tsv

  • filtered_contig_heavy_germ-pass.tsv

  • filtered_contig_igblast.fmt7

  • filtered_contig_light_productive-F.tsv

  • filtered_contig_light_productive-T.tsv

  • filtered_contig_threshold-plot.pdf

  • temp_files.tar.gz

It will also create a /logs directory containing:

  • clone.log

  • germline.log

  • pipeline-10x.err

  • pipeline-10x.log

For a full listing of script options, see the 10x Genomics V(D)J annotation pipeline. It is also important to note that this pipeline uses the standard IMGT reference database of human alleles. To infer novel alleles and subject-specific genotypes, which would result in more accurate assignments, see TIgGER.

Define clonal groups manually

Clonal groups are B cells that descend from a common naive B cell ancestor. To group sequences into inferred clonal groups, we cluster BCR sequences that have the same heavy chain V and J genes and same junction length. We next cluster sequences with similar junction regions, using either a defined sequence distance cutoff, or an adaptive threshold (SCOPer). When available, we can also split clonal groups that have differing light chain V and J genes.
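The two-step grouping described above can be sketched as follows. This is a simplified Python illustration, not the DefineClones.py or SCOPer implementation. It assumes that the full v_call and j_call strings identify the gene (the real tools resolve alleles and ambiguous calls), that distance is a length-normalized Hamming distance, and that sequences are linked by single-linkage clustering. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def normalized_hamming(a, b):
    """Hamming distance between equal-length strings, divided by their length."""
    if len(a) != len(b):
        raise ValueError("junctions must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(records, threshold):
    """Return one integer clone label per record.

    `records` is a list of dicts with keys "v_call", "j_call" and "junction".
    """
    # Step 1: partition by V gene, J gene and junction length.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = (rec["v_call"], rec["j_call"], len(rec["junction"]))
        groups[key].append(idx)

    labels = [None] * len(records)
    next_label = 1
    # Step 2: within each partition, link junctions at distance <= threshold.
    for members in groups.values():
        unassigned = set(members)
        while unassigned:
            seed = unassigned.pop()
            cluster, frontier = {seed}, [seed]
            while frontier:
                current = frontier.pop()
                linked = {
                    other for other in unassigned
                    if normalized_hamming(records[current]["junction"],
                                          records[other]["junction"]) <= threshold
                }
                unassigned -= linked
                cluster |= linked
                frontier.extend(linked)
            for idx in cluster:
                labels[idx] = next_label
            next_label += 1
    return labels


# Toy data (made up for illustration).
records = [
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGAGAGAT"},
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGAGAGAC"},
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTACCTCTGGG"},
    {"v_call": "IGHV3-23", "j_call": "IGHJ4", "junction": "TGTGCGAGAGAT"},
]
print(assign_clones(records, threshold=0.1))
```

With a threshold of 0.1, the first two records differ at 1 of 12 positions and share a clone label, while the third (distant junction) and fourth (different V gene) each receive their own label.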

In the previous section, we used a predefined clonal clustering threshold of 0.1 using the -x option in the changeo-10x script. This is not appropriate for all datasets. The current best practice is to find the appropriate threshold for a given dataset, which can be done automatically in the changeo-10x script by specifying -x auto. However, using -x auto to assign clones doesn’t always work (e.g. if there weren’t enough clones to generate a bimodal distance-to-nearest plot). If this command fails, there are other options for manually defining clones from the file filtered_contig_heavy_productive-T.tsv. If changeo-10x ran successfully above, this file will be in temp_files.tar.gz. Otherwise it will be in the current working directory.

The first is by inspecting a plot of sequence distances. This is supplied in the file filtered_contig_threshold-plot.pdf. You can then define clones manually using the chosen threshold (e.g. 0.09):

# define heavy chain clones
DefineClones.py -d filtered_contig_heavy_productive-T.tsv --act set --model ham \
    --norm len --dist 0.09 --outname filtered_contig_heavy

If the sequence distance plot is not bimodal, it may be more appropriate to instead use SCOPer to assign clones using an adaptive threshold. To be able to directly copy and paste the commands provided in this tutorial, be sure to rename the SCOPer output file to filtered_contig_heavy_clone-pass.tsv (to match the output of DefineClones.py).

Once we have defined clonal groups using heavy chains, we can split these groups based on whether or not they have differing light chain V and J genes:

# split heavy chain clones with different light chains
light_cluster.py -d filtered_contig_heavy_clone-pass.tsv -e filtered_contig_light_productive-T.tsv \
    -o filtered_contig_heavy_clone-light.tsv
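To illustrate the idea behind the light chain split, here is a deliberately simplified Python sketch. It is not the light_cluster.py implementation. It assumes that heavy and light chain rows can be joined on a cell_id column, that each cell has at most one light chain, and that subgroups are named by appending a counter to the heavy chain clone_id. The function name and the labeling scheme are hypothetical.

```python
def split_by_light_chain(heavy_rows, light_rows):
    """Subdivide heavy chain clones by the light chain V and J gene of each cell.

    heavy_rows: dicts with "cell_id" and "clone_id".
    light_rows: dicts with "cell_id", "v_call" and "j_call".
    Returns a dict mapping cell_id to a label of the form "<clone_id>_<n>".
    Cells without a light chain are placed in subgroup 0 of their clone.
    """
    light_by_cell = {row["cell_id"]: (row["v_call"], row["j_call"])
                     for row in light_rows}
    subgroup_index = {}  # (clone_id, light V/J pair) -> subgroup number
    counters = {}        # clone_id -> number of subgroups seen so far
    result = {}
    for row in heavy_rows:
        clone = row["clone_id"]
        pair = light_by_cell.get(row["cell_id"])
        if pair is None:
            result[row["cell_id"]] = f"{clone}_0"
            continue
        key = (clone, pair)
        if key not in subgroup_index:
            counters[clone] = counters.get(clone, 0) + 1
            subgroup_index[key] = counters[clone]
        result[row["cell_id"]] = f"{clone}_{subgroup_index[key]}"
    return result


# Toy data (made up for illustration).
heavy = [
    {"cell_id": "c1", "clone_id": "1"},
    {"cell_id": "c2", "clone_id": "1"},
    {"cell_id": "c3", "clone_id": "1"},
    {"cell_id": "c4", "clone_id": "2"},
]
light = [
    {"cell_id": "c1", "v_call": "IGKV1-5", "j_call": "IGKJ1"},
    {"cell_id": "c2", "v_call": "IGKV1-5", "j_call": "IGKJ1"},
    {"cell_id": "c3", "v_call": "IGLV2-14", "j_call": "IGLJ2"},
]
print(split_by_light_chain(heavy, light))
```

Here cells c1 and c2 stay together because they share a light chain V and J gene, while c3 is split into its own subgroup of clone 1.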

We can also reconstruct the heavy chain germline V and J genes (using the output file from the previous command):

# reconstruct heavy chain germline V and J sequences
CreateGermlines.py -d filtered_contig_heavy_clone-light.tsv -g dmask --cloned \
    -r /usr/local/share/germlines/imgt/human/vdj/imgt_human_IGHV.fasta \
    /usr/local/share/germlines/imgt/human/vdj/imgt_human_IGHD.fasta \
    /usr/local/share/germlines/imgt/human/vdj/imgt_human_IGHJ.fasta \
    --outname filtered_contig_heavy

This results in the file filtered_contig_heavy_germ-pass.tsv which contains heavy chain sequence information derived from filtered_contig_heavy_clone-light.tsv with an additional column clone_id specifying the clonal group of the sequence.

Build lineage trees

Lineage trees represent the series of shared and unshared mutations leading from a clone’s germline sequence to the observed sequence data. There are multiple ways of building and visualizing these trees. Currently the simplest way within Immcantation is to use Alakazam, which builds maximum parsimony trees using PHYLIP. Alternatively, you can use IgPhyML, which builds maximum likelihood trees with B cell specific models. Here we use IgPhyML.

To run IgPhyML from within the Docker container, use the BuildTrees.py script:

BuildTrees.py -d filtered_contig_heavy_germ-pass.tsv --minseq 3 --clean all \
    --igphyml --collapse --nproc 2 --asr 0.9

This will remove clones with fewer than 3 unique sequences (--minseq 3), run IgPhyML (--igphyml) parallelized across 2 cores (--nproc 2) and collapse identical sequences (--collapse). It will also reconstruct the maximum likelihood intermediate sequences for each node (--asr 0.9). The number following --asr controls the amount of reported model uncertainty (range from 0-1, 0.9 recommended). --clean all deletes all intermediate files from this operation. This is a computationally intensive task and may take a few minutes.
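To preview which clones are likely to survive the --minseq 3 filter, you can count distinct sequences per clone before running the command. The sketch below is a rough Python approximation under stated assumptions: it reads an AIRR-format TSV and counts distinct sequence_alignment values per clone_id. BuildTrees.py applies its own sequence cleaning and collapsing, so its counts may differ. The function name is hypothetical.

```python
import csv
from collections import defaultdict


def clones_passing_minseq(tsv_handle, minseq=3):
    """Return the sorted clone IDs that have at least `minseq` distinct sequences."""
    unique_seqs = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        unique_seqs[row["clone_id"]].add(row["sequence_alignment"])
    return sorted(cid for cid, seqs in unique_seqs.items() if len(seqs) >= minseq)
```

A usage example, assuming the germline file from the previous section is in the current directory:

```python
with open("filtered_contig_heavy_germ-pass.tsv", newline="") as handle:
    print(clones_passing_minseq(handle, minseq=3))
```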

The following commands in this section are meant to be entered into an R session. Open R within the Docker container using the command R. Once inside the R session, load the appropriate libraries and read in the data:

library(alakazam)
library(ape)
library(dplyr)

# read in the data
db <- readIgphyml("filtered_contig_heavy_germ-pass_igphyml-pass.tab", format="phylo",
      branches="mutations")

Once built, we can visualize these trees using the R package ape. Here, we only visualize the largest tree using the default parameters. However, there are many other ways to plot lineage trees, as detailed in Alakazam’s lineage vignette. Enter the following into the R session to save the largest tree as a PNG image:

png("graph.png", width=8, height=6, units="in", res=300)
plot(db$trees[[1]],show.node.label=TRUE)
add.scale.bar(length=5)
dev.off()

Lineage tree of example clone 1.

The internal nodes of this tree represent inferred intermediate sequences, while the edge lengths represent the expected number of heavy chain mutations between the nodes (see scale bar to left). If you prefer more graph-based trees, these are also detailed in Alakazam’s lineage vignette.

The reconstructed intermediate sequences for each node shown in the tree are available in the file filtered_contig_heavy_germ-pass_igphyml-pass_hlp_asr.fasta. Each possible codon has a certain probability of occurring at each site in the sequence. The number following --asr in BuildTrees.py specifies the probability interval desired for each site. For instance, if --asr 0.8 is used and the relative probability of codon ATG is 0.5 and that of ATA is 0.4, IgPhyML would return ATR. The R is the IUPAC ambiguous nucleotide for A and G. These characters represent ambiguity in the reconstruction, and are particularly common in the CDR3 region:

>0_7
CAGGTGCAGCTGGTGCAATCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGACTTCTGGATACACCTTCASTGACTATGGTGTGAACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACGCCTACACCGGGAACCCAACGTATGCCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCCGCACGGCATATCTGCAGATCAGCAGCCTGAAGGCTGAGGACACTGCCGTGTATTACTGTGCGATTATCCATGATAGTAGTACYTGGAGTCCTTTTGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCAGNN
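The ATR example above can be reproduced with a short sketch. This Python code only illustrates the behaviour described in the text and is not IgPhyML's implementation. It assumes that codons are added in order of decreasing probability until their cumulative probability reaches the --asr value, and that the selected codons are then merged position by position into IUPAC codes. The function name is hypothetical.

```python
# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def ambiguous_codon(codon_probs, interval):
    """Merge the most probable codons covering `interval` into one IUPAC codon."""
    ranked = sorted(codon_probs.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for codon, prob in ranked:
        chosen.append(codon)
        total += prob
        if total >= interval:
            break
    # Collapse the bases observed at each of the three codon positions.
    return "".join(IUPAC[frozenset(c[pos] for c in chosen)] for pos in range(3))


# ATG alone covers 0.5, so ATA is added to reach 0.9 >= 0.8.
print(ambiguous_codon({"ATG": 0.5, "ATA": 0.4, "CTG": 0.1}, 0.8))  # prints ATR
```

Lowering the interval to 0.5 in this sketch returns the unambiguous codon ATG, which shows why higher --asr values produce more ambiguous characters.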

Merge Cell Ranger annotations

As detailed in the Change-O reference, it is also possible to directly merge Change-O data tables with annotation information from the Cell Ranger pipeline.

Other Immcantation Training Resources

Other training material for using Immcantation is available, such as the slides and example data from our introductory webinar series. The webinar is available as a Jupyter notebook and an interactive website.

Vignettes

Detailed usage documentation and tutorials for each individual tool in Immcantation are provided in the main documentation pages for each tool. The following list of shortcuts covers common analyses. Note that each link leaves the Immcantation portal page.

Data Standards

Immcantation supports both the original Change-O standard and the new Adaptive Immune Receptor Repertoire (AIRR) standard developed by the AIRR Community (AIRR-C). Both standards use tab-delimited file formats with sets of specific predefined column names.

Change-O Standard

The Change-O format is the original data format developed to enable the integration of multiple tools in the Immcantation framework. It is described in detail (along with the corresponding AIRR-C Standard equivalents) in the Change-O package documentation.

Gupta NT*, Vander Heiden JA*, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 2015; doi: 10.1093/bioinformatics/btv359

AIRR Community Standard

The default file format for all functions in Immcantation is the AIRR-C format as of release 4.0.0. To learn more about this format (including the valid field names and their expected values), visit the AIRR-C Rearrangement Schema documentation. The Change-O package documentation contains a table with mappings between both standards. Some of the most frequently used translations are:

AIRR                  Change-O
sequence_id           SEQUENCE_ID
sequence              SEQUENCE_INPUT
sequence_alignment    SEQUENCE_IMGT
productive            FUNCTIONAL
v_call                V_CALL
d_call                D_CALL
j_call                J_CALL
junction_length       JUNCTION_LENGTH
junction              JUNCTION
germline_alignment    GERMLINE_IMGT
clone_id              CLONE
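The translations listed above can be captured in a small lookup table. The following Python sketch is an illustration under stated assumptions: it only renames column headers and does not convert any values, so it is not a substitute for the conversion script in the Change-O package. The names AIRR_TO_CHANGEO and translate_header are hypothetical.

```python
# Column name translations taken from the table above.
AIRR_TO_CHANGEO = {
    "sequence_id": "SEQUENCE_ID",
    "sequence": "SEQUENCE_INPUT",
    "sequence_alignment": "SEQUENCE_IMGT",
    "productive": "FUNCTIONAL",
    "v_call": "V_CALL",
    "d_call": "D_CALL",
    "j_call": "J_CALL",
    "junction_length": "JUNCTION_LENGTH",
    "junction": "JUNCTION",
    "germline_alignment": "GERMLINE_IMGT",
    "clone_id": "CLONE",
}
CHANGEO_TO_AIRR = {changeo: airr for airr, changeo in AIRR_TO_CHANGEO.items()}


def translate_header(columns, mapping):
    """Rename the columns found in `mapping`; leave all others unchanged."""
    return [mapping.get(name, name) for name in columns]


print(translate_header(["SEQUENCE_ID", "V_CALL", "CLONE", "SAMPLE"], CHANGEO_TO_AIRR))
```

Columns that are not in the table, such as SAMPLE in this example, are passed through untouched so that custom annotations are not lost.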

Vander Heiden et al. AIRR Community Standardized Representations for Annotated Immune Repertoires. Frontiers in Immunology. 9, 2206 (2018). doi:10.3389/fimmu.2018.02206

Potential Workflow Changes

Release 4.0.0 introduces two main changes that can potentially break existing Immcantation workflows. In this section, we explain these changes, give solutions and provide an example to show how to update workflows in order to properly work with release 4.0.0.

The first change is the adoption of the AIRR Standard as the default format expected by the tools (note that Change-O is still available as an option). The default values in all functions and pipelines have been adjusted to use this standard. Users upgrading to 4.0.0 may find that workflows that relied upon default values now fail. The solution is to review the workflow and specify the correct values for the data format being used.

The second change that can break workflows is that all outputs now use lowercase column names for style consistency with the AIRR Standard format. This means that user workflows that expect columns to be in uppercase will now break. The solution is to update the code to use the current lowercase values.

The following R-based example demonstrates how to fix broken workflows as a result of these two changes:

> library(alakazam)
> library(shazam)

# alakazam provides an example dataset in Change-O format
> db <- ExampleDbChangeo

# Inspect the column names
> colnames(db)
[1] "SEQUENCE_ID"          "SEQUENCE_IMGT"        "GERMLINE_IMGT_D_MASK"
[4] "V_CALL"               "V_CALL_GENOTYPED"     "D_CALL"
[7] "J_CALL"               "JUNCTION"             "JUNCTION_LENGTH"
[10] "NP1_LENGTH"           "NP2_LENGTH"           "SAMPLE"
[13] "ISOTYPE"              "DUPCOUNT"             "CLONE"

# CHANGE 1: default values follow the AIRR Standard specification
> db <- distToNearest(db)
Error in distToNearest(db) : The column junction was not found

# As of release 4.0.0, the `distToNearest` command above doesn't work if the input data
# is in Change-O format because the default values are now AIRR Standard values:
#    sequenceColumn="junction"
#    vCallColumn="v_call"
#    jCallColumn="j_call"
# These values don't match the column names in `db` as previously seen, so the command doesn't work

# The solution is to specify the actual column names:
> db <- distToNearest(db, sequenceColumn="JUNCTION",
                        vCallColumn="V_CALL",
                        jCallColumn="J_CALL")
> colnames(db)
[1] "SEQUENCE_ID"          "SEQUENCE_IMGT"        "GERMLINE_IMGT_D_MASK"
[4] "V_CALL"               "V_CALL_GENOTYPED"     "D_CALL"
[7] "J_CALL"               "JUNCTION"             "JUNCTION_LENGTH"
[10] "NP1_LENGTH"           "NP2_LENGTH"           "SAMPLE"
[13] "ISOTYPE"              "DUPCOUNT"             "CLONE"
[16] "dist_nearest"

# CHANGE 2: outputs are generated using lower case
> threshold <- findThreshold(db$DIST_NEAREST)
Error in h.ucv.default(unique(distances), 4) :
  argument 'x' must be numeric and need at least 3 data points
In addition: Warning message:
Unknown or uninitialized column: 'DIST_NEAREST'.

# In previous releases, `distToNearest` added the column `DIST_NEAREST` to `db`.
# As of release 4.0.0, it adds `dist_nearest`, so the command above
# doesn't work, because `db` doesn't have a column named `DIST_NEAREST`

# The solution is to update the function call to use the correct name:
> threshold <- findThreshold(db$dist_nearest)

Convert between Change-O and AIRR-C format

The default file format is the AIRR-C format. However, the Change-O package provides the ConvertDb.py script to convert files between the AIRR-C format and the legacy Change-O standard. For example, to convert a file named sample1_airr.tsv in AIRR-C format to Change-O format, run:

> ConvertDb.py changeo -d sample1_airr.tsv -o sample1_changeo.tab

The output file sample1_changeo.tab is in Change-O format.

In a similar way, you can use ConvertDb.py to convert a Change-O file to an AIRR-C file:

> ConvertDb.py airr -d sample1_changeo.tab -o sample1_airr.tsv

Contact & Cite

Contact Information

If you have questions, you can email the Immcantation Group.

For additional computational immunology software from the Kleinstein Lab, see our website.

Authors

Jason A. Vander Heiden, Namita T. Gupta, Mohamed Uduman, Daniel Gadala-Maria, Susanna Marquez, Julian Zhou, Ruoyi Jiang, Ang Cui, Nima Nouri, Kenneth Hoehn, Edel Aron, Hailong Meng, Chris R. Bolen, Gur Yaari, Steven H. Kleinstein

How to Cite

To cite the pRESTO software package in publications, please use:

Vander Heiden JA*, Yaari G*, Uduman M, Stern JNH, O’Connor KC, Hafler DA, Vigneault F, Kleinstein SH. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930-2 (2014). doi:10.1093/bioinformatics/btu138

To cite the Change-O, Alakazam, SHazaM and TIgGER software packages in publications, please use:

Gupta NT*, Vander Heiden JA*, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356-8 (2015). doi:10.1093/bioinformatics/btv359

Additional citations for specific methods within Alakazam, SHazaM and TIgGER may be determined using the citation() function within R.

Contributing

We welcome contributions to all components of the Immcantation framework through pull requests to the relevant Bitbucket repository.

All Immcantation core software packages are under the free and open-source license AGPL-3. Other core elements, including, but not limited to, documentation and tutorials, are under CC BY-SA 4.0. Contributed packages are subject to their licenses.

For details on documentation, coding style, and other conventions see the CONTRIBUTING.md file on Bitbucket.

Docker Container Installation and Overview

We have provided a complete installation of the Immcantation framework, its dependencies, accessory scripts, and IgBLAST in a Docker image. The image also includes both the IgBLAST and IMGT reference germline sets, as well as several template pipeline scripts. The image is available on Docker Hub at:

immcantation/suite

Images are versioned through tags, with images containing official releases denoted by meta-version numbers (x.y.z). The devel tag denotes the latest development (unstable) builds.

Getting the Container

Requires an installation of Docker 1.9+ or Singularity 2.3+.

Docker

# Pull release version 4.2.0
docker pull immcantation/suite:4.2.0

# Pull the latest development build
docker pull immcantation/suite:devel

Our containers are Linux-based, so if you are using a Windows computer, please make sure that you are using Linux containers and not Windows containers (this can be changed in Docker Desktop and won’t affect your existing containers).

Singularity

# Pull release version 4.2.0
IMAGE="immcantation_suite-4.2.0.sif"
singularity build $IMAGE docker://immcantation/suite:4.2.0

The instructions to use containers from Docker Hub with Singularity can be slightly different for different versions of Singularity. If the command shown above doesn’t work for you, please visit Singularity Documentation and look for the specific command for your Singularity version under Build a container.

What’s in the Container

Accessory Scripts

The following accessory scripts are found in /usr/local/bin:

fastq2fasta.py

Simple FASTQ to FASTA conversion.

fetch_phix.sh

Downloads the PhiX174 reference genome.

fetch_igblastdb.sh

Downloads the IgBLAST reference database.

fetch_imgtdb.sh

Downloads the IMGT reference database.

imgt2igblast.sh

Imports the IMGT reference database into IgBLAST.

imgt2cellranger.py

Converts the IMGT fasta germline reference files to the input required by cellranger-mkvdjref.

Data

/usr/local/share/germlines/imgt/IMGT.yaml

Information about the downloaded IMGT reference sequences.

/usr/local/share/germlines/imgt/<species>/vdj

Directory containing IMGT-gapped V(D)J reference sequences in FASTA format.

/usr/local/share/igblast

IgBLAST data directory.

/usr/local/share/igblast/fasta

Directory containing ungapped IMGT reference sequences with IGH/IGK/IGL and TRA/TRB/TRG/TRD combined into single FASTA files, respectively.

/usr/local/share/protocols

Directory containing primer, template switch and internal constant region sequences for various experimental protocols in FASTA format.

Using the Container

Invoking a shell inside the container

To invoke a shell session inside the container:

# Docker command
docker run -it immcantation/suite:4.2.0 bash

# Singularity command
singularity shell immcantation_suite-4.2.0.sif

Sharing files with the container

Sharing files between the host operating system and the container requires you to bind a directory on the host to one of the container’s mount points using the -v argument for docker or the -B argument for singularity. There are four available mount points defined in the container:

/data
/scratch
/software
/oasis

For example, to invoke a shell session inside the container with $HOME/project mounted to /data:

# Docker command
docker run -it -v $HOME/project:/data:z immcantation/suite:4.2.0 bash

# Singularity command
singularity shell -B $HOME/project:/data immcantation_suite-4.2.0.sif

Note, the :z in the -v argument of the docker command is essential.
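When the container is launched from a script, the bind mount can be assembled programmatically. The following Python sketch is an assumption-laden illustration (the function name docker_run_args is hypothetical). It builds the argument list for the docker command shown above and rejects mount points that are not among the four defined in the container.

```python
# Mount points defined in the container, as listed above.
MOUNT_POINTS = ("/data", "/scratch", "/software", "/oasis")


def docker_run_args(host_dir, mount_point="/data",
                    image="immcantation/suite:4.2.0", command=("bash",)):
    """Build an argument list for an interactive `docker run` with one bind mount."""
    if mount_point not in MOUNT_POINTS:
        raise ValueError(f"unknown mount point: {mount_point}")
    # The :z suffix is kept because the text above calls it essential.
    volume = f"{host_dir}:{mount_point}:z"
    return ["docker", "run", "-it", "-v", volume, image, *command]


print(" ".join(docker_run_args("/home/user/project")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when the host path contains spaces.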

Executing a specific command

After invoking an interactive session inside the container, commands can be executed in the container shell as they would be executed in the host shell.

Alternatively, it is possible to execute a specific command directly inside the container without starting an interactive session. The next example demonstrates how to execute versions report with $HOME/project mounted to /data:

# Docker command
docker run -v $HOME/project:/data:z immcantation/suite:4.2.0 versions report

# Singularity command
singularity exec -B $HOME/project:/data immcantation_suite-4.2.0.sif versions report

In this case, we are executing the versions report command which will inspect the installed software versions and print them to standard output.

There is an analogous builds report command to display the build date and changesets used during the image build. This is particularly relevant if you are using the immcantation/suite:devel development builds.

Pipeline Templates

You can always run your own pipeline scripts through the container, but the container also includes a set of predefined pipeline scripts that can be run as is or extended to your needs. Each pipeline script has a -h argument which will explain its use. The available pipelines are:

  • preprocess-phix

  • presto-abseq

  • presto-clontech

  • changeo-10x

  • changeo-igblast

  • tigger-genotype

  • shazam-threshold

  • changeo-clone

All template pipeline scripts can be found in /usr/local/bin.

PhiX cleaning pipeline

Removes reads from a sequence file that align against the PhiX174 reference genome.

Usage: preprocess-phix [OPTIONS]
-s

FASTQ sequence file.

-r

Directory containing phiX174 reference db. Defaults to /usr/local/share/phix.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the input filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-h

This message.

Example: preprocess-phix

# Arguments
DATA_DIR=~/project
READS=/data/raw/sample.fastq
OUT_DIR=/data/presto/sample
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    preprocess-phix -s $READS -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    preprocess-phix -s $READS -o $OUT_DIR -p $NPROC

Note

The PhiX cleaning pipeline will convert the sequence headers to the pRESTO format. Thus, if the nophix output file is provided as input to the presto-abseq pipeline script you must pass the argument -x presto to presto-abseq, which will tell the script that the input headers are in pRESTO format (rather than the Illumina format).

NEB ImmunoSeq protocol preprocessing pipeline

A start to finish pRESTO processing script for ImmunoSeq data. An example for human BCR processing is shown below. Primer sequences are available from the Immcantation repository under protocols/AbSeq or inside the container under /usr/local/share/protocols/AbSeq. Mouse primers are not supplied. TCR V gene references can be specified with the flag -r /usr/local/share/igblast/fasta/imgt_human_tr_v.fasta.

Usage: presto-abseq [OPTIONS]
-1

Read 1 FASTQ sequence file (sequence beginning with the C-region or J-segment).

-2

Read 2 FASTQ sequence file (sequence beginning with the leader or V-segment).

-j

Read 1 FASTA primer sequences. Defaults to /usr/local/share/protocols/AbSeq/AbSeq_R1_Human_IG_Primers.fasta.

-v

Read 2 FASTA primer or template switch sequences. Defaults to /usr/local/share/protocols/AbSeq/AbSeq_R2_TS.fasta.

-c

C-region FASTA sequences for the C-region internal to the primer. If unspecified internal C-region alignment is not performed.

-r

V-segment reference file. Defaults to /usr/local/share/igblast/fasta/imgt_human_ig_v.fasta.

-y

YAML file providing description fields for report generation.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the read 1 filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-x

The mate-pair coordinate format of the raw data. Defaults to illumina.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-h

This message.

One of the requirements for generating the report at the end of the pRESTO pipeline is a YAML file containing information about the data and processing. Valid fields are shown in the example sample.yaml below, although no fields are strictly required:

sample.yaml

title: "pRESTO Report: CD27+ B cells from subject HD1"
author: "Your Name"
version: "0.5.4"
description: "Memory B cells (CD27+)."
sample: "HD1"
run: "ABC123"
date: "Today"
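If you process many samples, the YAML file can be generated rather than written by hand. The sketch below is a minimal Python example that uses only the standard library. It assumes every value is a plain string and relies on JSON string quoting, which is also valid for double-quoted YAML scalars. The function name write_report_yaml is hypothetical, and the field values are copied from the example above.

```python
import json


def write_report_yaml(path, fields):
    """Write `fields` as one quoted `key: "value"` line per entry."""
    with open(path, "w", encoding="utf-8") as handle:
        for key, value in fields.items():
            # json.dumps adds double quotes and escapes special characters.
            handle.write(f"{key}: {json.dumps(str(value))}\n")


write_report_yaml("sample.yaml", {
    "title": "pRESTO Report: CD27+ B cells from subject HD1",
    "author": "Your Name",
    "version": "0.5.4",
    "description": "Memory B cells (CD27+).",
    "sample": "HD1",
    "run": "ABC123",
    "date": "Today",
})
```

Quoting every value keeps titles that contain a colon, like the one above, from being misread as nested keys.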

Example: presto-abseq

# Arguments
DATA_DIR=~/project
READS_R1=/data/raw/sample_R1.fastq
READS_R2=/data/raw/sample_R2.fastq
YAML=/data/sample.yaml
SAMPLE_NAME=sample
OUT_DIR=/data/presto/sample
NPROC=4

# Docker command
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    presto-abseq -1 $READS_R1 -2 $READS_R2 -y $YAML \
    -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    presto-abseq -1 $READS_R1 -2 $READS_R2 -y $YAML \
    -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

Takara Bio (Clontech) SMARTer protocol preprocessing pipeline

A start to finish pRESTO processing script for Takara Bio / Clontech SMARTer kit data. C-regions are assigned using the universal C-region primer sequences, which are available from the Immcantation repository under protocols/Universal or inside the container under /usr/local/share/protocols/Universal.

Usage: presto-clontech [OPTIONS]
-1

Read 1 FASTQ sequence file. Sequence beginning with the C-region.

-2

Read 2 FASTQ sequence file. Sequence beginning with the leader.

-j

C-region reference sequences (reverse complemented). Defaults to /usr/local/share/protocols/Universal/Mouse_IG_CRegion_RC.fasta.

-r

V-segment reference file. Defaults to /usr/local/share/igblast/fasta/imgt_mouse_ig_v.fasta.

-y

YAML file providing description fields for report generation.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the read 1 filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-x

The mate-pair coordinate format of the raw data. Defaults to illumina.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-h

This message.

Example: presto-clontech

# Arguments
DATA_DIR=~/project
READS_R1=/data/raw/sample_R1.fastq
READS_R2=/data/raw/sample_R2.fastq
CREGION=/usr/local/share/protocols/Universal/Human_IG_CRegion_RC.fasta
VREF=/usr/local/share/igblast/fasta/imgt_human_ig_v.fasta
SAMPLE_NAME=sample
OUT_DIR=/data/presto/sample
NPROC=4

# Docker command
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    presto-clontech -1 $READS_R1 -2 $READS_R2 -j $CREGION -r $VREF \
    -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    presto-clontech -1 $READS_R1 -2 $READS_R2 -j $CREGION -r $VREF \
    -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

10x Genomics V(D)J annotation pipeline

Assigns new annotations and infers clonal relationships to 10x Genomics single-cell V(D)J data output by Cell Ranger.

Usage: changeo-10x [OPTIONS]
-s

FASTA or FASTQ sequence file.

-a

10x Genomics cellranger-vdj contig annotation CSV file. Must correspond to the FASTA/FASTQ input file (all, filtered or consensus).

-r

Directory containing IMGT-gapped reference germlines. Defaults to /usr/local/share/germlines/imgt/[species name]/vdj.

-g

Species name. One of human, mouse, rabbit, rat, or rhesus_monkey. Defaults to human.

-t

Receptor type. One of ig or tr. Defaults to ig.

-x

Distance threshold for clonal assignment. Specify “auto” for automatic detection. If unspecified, clonal assignment is not performed.

-m

Distance model for clonal assignment. Defaults to the nucleotide Hamming distance model (ham).

-b

IgBLAST IGDATA directory, which contains the IgBLAST database, optional_file and auxillary_data directories. Defaults to /usr/local/share/igblast.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the sequence filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-f

Output format. One of changeo or airr. Defaults to airr.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-i

Specify to allow partial alignments.

-z

Specify to disable cleaning and compression of temporary files.

-h

This message.

Example: changeo-10x

# Arguments
DATA_DIR=~/project
READS=/data/raw/sample_filtered_contig.fasta
ANNOTATIONS=/data/raw/sample_filtered_contig_annotations.csv
SAMPLE_NAME=sample
OUT_DIR=/data/changeo/sample
DIST=auto
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    changeo-10x -s $READS -a $ANNOTATIONS -x $DIST -n $SAMPLE_NAME \
    -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    changeo-10x -s $READS -a $ANNOTATIONS -x $DIST -n $SAMPLE_NAME \
    -o $OUT_DIR -p $NPROC
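
The species, receptor type and output format options are not exercised in the example above. The following sketch assembles a changeo-10x call for mouse T cell receptor data in the legacy Change-O format, using only the flags listed in the usage text. The function name and file paths are illustrative assumptions, and the command is printed for review rather than executed:

```shell
# Arguments (file names are illustrative; paths refer to the container)
DATA_DIR=$HOME/project
READS=/data/raw/sample_filtered_contig.fasta
ANNOTATIONS=/data/raw/sample_filtered_contig_annotations.csv
SPECIES=mouse     # -g: human, mouse, rabbit, rat, or rhesus_monkey
RECEPTOR=tr       # -t: ig or tr
FORMAT=changeo    # -f: changeo or airr

# Hypothetical helper: build the docker command and print it.
# -x is left out, so clonal assignment is not performed.
changeo_10x_cmd() {
    set -- docker run -v "$DATA_DIR:/data:z" immcantation/suite:4.2.0 \
        changeo-10x -s "$READS" -a "$ANNOTATIONS" \
        -g "$SPECIES" -t "$RECEPTOR" -f "$FORMAT" \
        -n sample -o /data/changeo/sample -p 4
    printf '%s\n' "$*"
}

changeo_10x_cmd
```

Once the printed command looks right, it can be copied into a terminal to run the pipeline.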

IgBLAST annotation pipeline

Performs V(D)J alignment using IgBLAST and post-processes the output into the Change-O data standard.

Usage: changeo-igblast [OPTIONS]
-s

FASTA or FASTQ sequence file.

-r

Directory containing IMGT-gapped reference germlines. Defaults to /usr/local/share/germlines/imgt/[species name]/vdj.

-g

Species name. One of human, mouse, rabbit, rat, or rhesus_monkey. Defaults to human.

-t

Receptor type. One of ig or tr. Defaults to ig.

-b

IgBLAST IGDATA directory, which contains the IgBLAST database, optional_file and auxillary_data directories. Defaults to /usr/local/share/igblast.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the sequence filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-f

Output format. One of airr or changeo. Defaults to airr.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-k

Specify to filter the output to only productive/functional sequences.

-i

Specify to allow partial alignments.

-z

Specify to disable cleaning and compression of temporary files.

-h

This message.

Example: changeo-igblast

# Arguments
DATA_DIR=~/project
READS=/data/presto/sample/sample-final_collapse-unique_atleast-2.fastq
SAMPLE_NAME=sample
OUT_DIR=/data/changeo/sample
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    changeo-igblast -s $READS -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    changeo-igblast -s $READS -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC
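
The -k and -i switches take no value, so they are simply present or absent on the command line. As a sketch under that reading of the usage text, the hypothetical helper below appends them conditionally and prints the resulting command. The variable names and the choice of mouse are assumptions made for illustration:

```shell
# Arguments (paths are illustrative and refer to locations inside the container)
DATA_DIR=$HOME/project
READS=/data/presto/sample/sample-final_collapse-unique_atleast-2.fastq
SPECIES=mouse          # -g: human, mouse, rabbit, rat, or rhesus_monkey
PRODUCTIVE_ONLY=yes    # yes adds -k (keep only productive/functional sequences)
ALLOW_PARTIAL=no       # yes adds -i (allow partial alignments)

# Hypothetical helper: assemble the changeo-igblast call and print it
igblast_cmd() {
    set -- changeo-igblast -s "$READS" -g "$SPECIES" \
        -n sample -o /data/changeo/sample -p 4
    if [ "$PRODUCTIVE_ONLY" = yes ]; then set -- "$@" -k; fi
    if [ "$ALLOW_PARTIAL" = yes ]; then set -- "$@" -i; fi
    set -- docker run -v "$DATA_DIR:/data:z" immcantation/suite:4.2.0 "$@"
    printf '%s\n' "$*"
}

igblast_cmd
```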

Genotyping pipeline

Infers V segment genotypes using TIgGER.

Usage: tigger-genotype [options]
-d DB, --db=DB

Change-O formatted TSV (TAB) file.

-r REF, --ref=REF

FASTA file containing IMGT-gapped V segment reference germlines. Defaults to /usr/local/share/germlines/imgt/human/vdj/imgt_human_IGHV.fasta.

-v VFIELD, --vfield=VFIELD

Name of the output field containing genotyped V assignments. Defaults to V_CALL_GENOTYPED.

-x MINSEQ, --minseq=MINSEQ

Minimum number of sequences in the mutation/coordinate range. Samples with insufficient sequences will be excluded. Defaults to 50.

-y MINGERM, --mingerm=MINGERM

Minimum number of sequences required to analyze a germline allele. Defaults to 200.

-n NAME, --name=NAME

Sample name or run identifier which will be used as the output file prefix. Defaults to a truncated version of the input filename.

-o OUTDIR, --outdir=OUTDIR

Output directory. Will be created if it does not exist. Defaults to the current working directory.

-f FORMAT, --format=FORMAT

File format. One of ‘airr’ (default) or ‘changeo’.

-p NPROC, --nproc=NPROC

Number of subprocesses for multiprocessing tools. Defaults to the available processing units.

-h, --help

Show this help message and exit

Example: tigger-genotype

# Arguments
DATA_DIR=~/project
DB=/data/changeo/sample/sample_db-pass.tab
SAMPLE_NAME=sample
OUT_DIR=/data/changeo/sample
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    tigger-genotype -d $DB -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    tigger-genotype -d $DB -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC
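
The --minseq and --mingerm cutoffs default to 50 and 200 and can be overridden on the command line. The sketch below prints a tigger-genotype call with both values lowered. The numbers are placeholders chosen for illustration, not recommendations, and the helper name is hypothetical:

```shell
# Arguments (paths are illustrative and refer to locations inside the container)
DATA_DIR=$HOME/project
DB=/data/changeo/sample/sample_db-pass.tab
FORMAT=changeo    # --format: matches a Change-O TAB input; use airr for AIRR TSV
MINSEQ=25         # --minseq: default is 50
MINGERM=100       # --mingerm: default is 200

# Hypothetical helper: assemble the tigger-genotype call and print it
tigger_cmd() {
    set -- docker run -v "$DATA_DIR:/data:z" immcantation/suite:4.2.0 \
        tigger-genotype --db="$DB" --format="$FORMAT" \
        --minseq="$MINSEQ" --mingerm="$MINGERM" \
        --name=sample --outdir=/data/changeo/sample --nproc=4
    printf '%s\n' "$*"
}

tigger_cmd
```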

Clonal threshold inference pipeline

Performs automated detection of the clonal assignment threshold.

Usage: shazam-threshold [options]
-d DB, --db=DB

Tabulated data file, in Change-O (TAB) or AIRR format (TSV).

-m METHOD, --method=METHOD

Threshold inference method to use. One of gmm, density, or none. If none, the distance-to-nearest distribution is plotted without threshold detection. Defaults to density.

-n NAME, --name=NAME

Sample name or run identifier which will be used as the output file prefix. Defaults to a truncated version of the input filename.

-o OUTDIR, --outdir=OUTDIR

Output directory. Will be created if it does not exist. Defaults to the current working directory.

-f FORMAT, --format=FORMAT

File format. One of ‘airr’ (default) or ‘changeo’.

-p NPROC, --nproc=NPROC

Number of subprocesses for multiprocessing tools. Defaults to the available processing units.

--model=MODEL

Mixture model to use with the gmm method. One of gamma-gamma, gamma-norm, norm-norm or norm-gamma. Defaults to gamma-gamma.

--subsample=SUBSAMPLE

Number of distances to downsample the data to before threshold calculation. By default, subsampling is not performed.

--repeats=REPEATS

Number of times to recalculate. Defaults to 1.

-h, --help

Show this help message and exit

Example: shazam-threshold

# Arguments
DATA_DIR=~/project
DB=/data/changeo/sample/sample_genotyped.tab
SAMPLE_NAME=sample
OUT_DIR=/data/changeo/sample
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    shazam-threshold -d $DB -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    shazam-threshold -d $DB -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC
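
The --model option only applies when the gmm method is selected. The sketch below reflects that dependency by adding --model only for gmm, and it also downsamples the distances with --subsample. The helper name and the chosen values are assumptions for illustration:

```shell
# Arguments (paths are illustrative and refer to locations inside the container)
DATA_DIR=$HOME/project
DB=/data/changeo/sample/sample_genotyped.tab
FORMAT=changeo      # --format: matches a Change-O TAB input; use airr for AIRR TSV
METHOD=gmm          # --method: gmm, density, or none
MODEL=gamma-norm    # --model: only used with the gmm method
SUBSAMPLE=15000     # --subsample: number of distances to keep

# Hypothetical helper: assemble the shazam-threshold call and print it
threshold_cmd() {
    set -- shazam-threshold --db="$DB" --format="$FORMAT" \
        --method="$METHOD" --subsample="$SUBSAMPLE" \
        --name=sample --outdir=/data/changeo/sample --nproc=4
    if [ "$METHOD" = gmm ]; then set -- "$@" --model="$MODEL"; fi
    set -- docker run -v "$DATA_DIR:/data:z" immcantation/suite:4.2.0 "$@"
    printf '%s\n' "$*"
}

threshold_cmd
```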

Clonal assignment pipeline

Assigns Ig sequences into clonally related lineages and builds full germline sequences.

Usage: changeo-clone [OPTIONS]
-d

Change-O formatted TSV (TAB) file.

-x

Distance threshold for clonal assignment.

-m

Distance model for clonal assignment. Defaults to the nucleotide Hamming distance model (ham).

-r

Directory containing IMGT-gapped reference germlines. Defaults to /usr/local/share/germlines/imgt/human/vdj.

-n

Sample identifier which will be used as the output file prefix. Defaults to a truncated version of the input filename.

-o

Output directory. Will be created if it does not exist. Defaults to a directory matching the sample identifier in the current working directory.

-f

Output format. One of airr (default) or changeo.

-p

Number of subprocesses for multiprocessing tools. Defaults to the available cores.

-a

Specify to clone the full data set. By default the data will be filtered to only productive/functional sequences.

-z

Specify to disable cleaning and compression of temporary files.

-h

This message.

Example: changeo-clone

# Arguments
DATA_DIR=~/project
DB=/data/changeo/sample/sample_genotyped.tab
DIST=0.15
SAMPLE_NAME=sample
OUT_DIR=/data/changeo/sample
NPROC=4

# Run pipeline in docker image
docker run -v $DATA_DIR:/data:z immcantation/suite:4.2.0 \
    changeo-clone -d $DB -x $DIST -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC

# Singularity command
singularity exec -B $DATA_DIR:/data immcantation_suite-4.2.0.sif \
    changeo-clone -d $DB -x $DIST -n $SAMPLE_NAME -o $OUT_DIR -p $NPROC
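
The input file names in the examples suggest how the annotation, genotyping, threshold and cloning pipelines chain together. The sketch below prints the four commands in that order. The intermediate file names are copied from the examples above and may differ between versions and output formats, the helper names are hypothetical, and DIST is a placeholder to be replaced with the threshold reported by shazam-threshold:

```shell
# Shared settings (illustrative)
DATA_DIR=$HOME/project
IMAGE=immcantation/suite:4.2.0
SAMPLE=sample
OUT=/data/changeo/$SAMPLE
FORMAT=changeo    # -f: legacy Change-O format, matching the .tab names below
DIST=0.15         # placeholder; take the value from the shazam-threshold output

# Hypothetical helper: prefix a pipeline command with the docker call and print it
in_container() {
    printf 'docker run -v %s:/data:z %s %s\n' "$DATA_DIR" "$IMAGE" "$*"
}

# Print the four stages in the order implied by their input files
workflow() {
    in_container changeo-igblast -s "/data/presto/$SAMPLE/$SAMPLE-final_collapse-unique_atleast-2.fastq" \
        -n "$SAMPLE" -o "$OUT" -f "$FORMAT"
    in_container tigger-genotype -d "$OUT/${SAMPLE}_db-pass.tab" \
        -n "$SAMPLE" -o "$OUT" -f "$FORMAT"
    in_container shazam-threshold -d "$OUT/${SAMPLE}_genotyped.tab" \
        -n "$SAMPLE" -o "$OUT" -f "$FORMAT"
    in_container changeo-clone -d "$OUT/${SAMPLE}_genotyped.tab" -x "$DIST" \
        -n "$SAMPLE" -o "$OUT" -f "$FORMAT"
}

workflow
```

Printing the commands first makes it easy to check each input path against the files actually produced by the previous stage before running anything.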

Release Notes

Version 4.2.0: June 21, 2021

Version Updates:

  • presto 0.6.2

  • changeo 1.1.0

  • alakazam 1.1.0

  • airr-py 1.3.1

  • igblast 1.17.1

Pipeline Changes:

  • Added support for rat, rabbit and rhesus macaque to changeo-10x and changeo-igblast.

  • Added the -z argument to changeo-10x, changeo-igblast, and changeo-clone to allow compression and cleaning of temporary intermediate files to be disabled.

  • Updated changeo-igblast to use the new IgBLAST wrapper in changeo (AssignGenes).

Image Changes:

  • Updated base image to Fedora 33.

  • Fixed a Biopython v1.77 incompatibility in fastq2fasta.py.

  • Updated fetch_igblastdb.sh for new file locations and disabled download of old internal_data and optional_file directories by default.

  • Added support for rat, rabbit and rhesus macaque to fetch_imgtdb.sh.

  • Added download of artificially spliced V exon and leader sequences to fetch_imgtdb.sh. Sequences are downloaded into the leader_vexon subdirectory.

  • Added imgt2cellranger.py script which converts the IMGT reference germline header format into the input format required by cellranger mkvdjref.

Version 4.1.0: August 12, 2020

Version Updates:

  • presto 0.6.1

  • alakazam 1.0.2

  • shazam 1.0.2

  • scoper 1.1.0

  • rabhit 0.1.5

Pipeline Changes:

  • Fixed a clonal clustering threshold detection warning causing early exit of changeo-10x in some cases.

Image Changes:

  • Fixed a Biopython v1.77 incompatibility in clean_imgtdb.py.

  • Updated IgBLAST installation procedure for new structure of internal_data, optional_file, and database directories.

Version 4.0.0: June 1, 2020

General:

  • License changed to AGPL-3 for scripts, core packages, and other software code. Non-software content remains unchanged under the CC BY-SA 4.0 license.

  • Updated base image to Fedora 31.

Version Updates:

  • presto 0.6.0

  • changeo 1.0.0

  • alakazam 1.0.1

  • shazam 1.0.0

  • tigger 1.0.0

  • scoper 1.0.1

  • prestor 0.0.6

  • igphyml 1.1.3

  • igblast 1.16.0

  • airr-py 1.3.0

  • airr-r 1.3.0

Pipeline Changes:

  • Changed the default output format of all pipeline scripts to the AIRR Rearrangement standard. The legacy Change-O format is still supported by specifying -f changeo.

  • Added report generation and the -y argument specifying the report yaml config file to presto-clontech.

  • Changed name of the console logs in presto-clontech to pipeline-presto.log and pipeline-presto.err (was pipeline.log and pipeline.err).

  • Added --minseq and --mingerm arguments to tigger-genotype to control sequence and allele exclusion criteria.

  • The changeo-10x pipeline no longer automatically archives the db-pass file in the temp_files.tar.gz tarball.

Version 3.1.0: December 16, 2019

Version Updates:

  • shazam 0.2.2

Version 3.0.0: August 29, 2019

Version Updates:

  • alakazam 0.3.0

  • presto 0.5.13

  • scoper 0.2.0

  • shazam 0.2.1

  • tigger 0.4.0

  • igphyml 1.0.6

  • igblast 1.14.0

  • blast 2.9.0

  • vsearch 2.13.6

  • cd-hit 4.8.1

Pipeline Changes:

  • Added the -f argument to multiple pipelines to toggle output between the Change-O standard (changeo) and the AIRR Rearrangement standard (airr).

  • Added the -m argument to changeo-clone to specify the distance model used for cloning.

  • Renamed the productive filter argument from -f to -k in changeo-igblast.

  • Added a method option of none to shazam-threshold to provide a dummy mode that simply plots the distance-to-nearest distribution without threshold detection.

  • Added --minseq and --mingerm arguments to tigger-genotype to allow specification of novel allele detection cutoffs.

Image Changes:

  • Added the RAbHIT R package.

  • Added the changeo-10x pipeline to process 10X Genomics V(D)J data.

  • Added the presto-clontech pipeline to preprocess data from the Takara Bio / Clontech SMARTer kit.

  • Added some universal C-region reference sequences to /usr/local/share/protocols.

  • Added the pipelines report command to show a description of available pipeline commands.

  • Fixed a dependency version issue that prevented tbl2asn from running.

  • Fixed Mac OS compatibility in fetch_imgtdb.

Version 2.7.0: February 1, 2019

Version Updates:

  • presto 0.5.11

  • changeo 0.4.5

  • shazam 0.1.11

  • blast 2.8.1

Version 2.6.0: December 9, 2018

Version Updates:

  • igblast 1.12.0

Pipeline Changes:

  • Added -i argument to changeo-igblast to allow retention of partial alignments.

Image Changes:

  • Base system changed to Fedora 29.

  • Moved setup of R package build environment to base image.

Version 2.5.0: November 1, 2018

Version Updates:

  • igblast 1.11.0

  • muscle 3.8.425

  • vsearch 2.9.1

Image Changes:

  • Added error checking to versions report command.

Version 2.4.0: October 27, 2018

Version Updates:

  • changeo 0.4.4

Version 2.3.0: October 21, 2018

Version Updates:

  • presto 0.5.10

  • changeo 0.4.3

  • tigger 0.3.1

Image Changes:

  • Added scoper R package.

  • Added IgPhyML.

  • Removed strict Rcpp version requirement (was fixed at 0.12.16).

  • Added libGL and libGLU to base image.

Version 2.2.0: October 5, 2018

Version Updates:

  • tigger 0.3.0

  • airr python library 1.2.1

Pipeline Changes:

  • Fixed compression error messages in changeo-igblast and changeo-clone.

  • Removed support for tigger versions below 0.3.0 from tigger-genotype.

Image Changes:

  • Adjusted version/changeset detection and output in the versions report and builds report commands.

Version 2.1.0: September 20, 2018

Version Updates:

  • alakazam 0.2.11

  • shazam 0.1.10

  • prestor 0.0.5

  • vsearch 2.8.4

  • BLAST 2.7.1

  • IgBLAST 1.10.0

Pipeline Changes:

  • Subsampling is no longer performed by default in shazam-threshold.

Version 2.0.0: September 8, 2018

Version Updates:

  • pRESTO 0.5.9

  • Change-O 0.4.2

  • airr 1.2.0

Image Changes:

  • Added tbl2asn.

Pipeline Changes:

  • Changed behavior of subsampling argument to shazam-threshold to subsample distances after nearest-neighbor distance calculation rather than rows before distance calculation.

Version 1.10.2: July 3, 2018

Pipeline Changes:

  • Added data set subsampling to shazam-threshold with a default value of 15000 records.

  • Added -f argument to changeo-igblast to allow optional filtering of non-productive/non-functional sequences.

  • Added -a argument to changeo-clone to allow retention of non-productive/non-functional sequences during cloning.

  • Added -v argument to tigger-genotype to allow specification of the V genotyped column name.

Version 1.10.1: July 1, 2018

Pipeline Changes:

  • Fixed a bug wherein changeo-igblast and changeo-clone were not working with an unspecified output directory (-o argument).

  • Updated CPU core detection in tigger-genotype and shazam-threshold for compatibility with new R package versions.

Accessory Script Changes:

  • Fixed fetch_imgtdb.sh creating empty mouse IGKC and IGLC files.

Image Changes:

  • Changed default CRAN mirror setting.

Version 1.10.0: May 23, 2018

Version Updates:

  • IgBLAST 1.9.0

Pipeline Changes:

  • Changed the default threshold detection method in shazam-threshold to the smoothed density estimate with subsampling to 15000 sequences.

  • Fixed a bug wherein changeo-igblast was not reading the -b argument.

Image Changes:

  • Added RDI R package.

  • Added CD-HIT.

  • Added AIRR python and R reference libraries.

  • Added git, BLAS, and LAPACK to base image.

Version 1.9.0: April 22, 2018

Version Updates:

  • alakazam 0.2.10

  • shazam 0.1.9

Pipeline Changes:

  • Added -l <model> argument to shazam-threshold to allow specification of the mixture model distributions to shazam::findThreshold.

Image Changes:

  • Set Rcpp version for R package builds to 0.12.16 (from 0.12.12).

Version 1.8.0: March 22, 2018

Version Updates:

  • alakazam 0.2.9

  • changeo 0.3.12

  • presto 0.5.7

Pipeline Changes:

  • Removed an intermediate file and the ParseHeaders-rename step in presto-abseq.

  • Modified tigger-genotype to work with upcoming release of tigger v0.2.12.

  • Fixed parsing of output directory argument (-o) in preprocess-phix and changeo-clone.

Image Changes:

  • Added sudo access for the magus (default) user.

Version 1.7.0: February 6, 2018

Version Updates:

  • changeo 0.3.11

Version 1.6.0: January 29, 2018

Version Updates:

  • prestor 0.0.4

Version 1.5.0: January 17, 2018

Version Updates:

  • presto 0.5.6

Version 1.4.0: December 29, 2017

Version Updates:

  • presto 0.5.5

  • phylip 3.697

Pipeline Changes:

  • Fixed a bug in presto-abseq preventing relative file paths from working with the -r argument.

  • changeo-igblast no longer terminates upon IgBLAST warnings.

Accessory Script Changes:

  • Fixed an output directory bug in fastq2fasta.py.

Image Changes:

  • Added Stern, Yaari and Vander Heiden, et al 2014 primer sets.

Version 1.3.0: October 17, 2017

Version Updates:

  • changeo 0.3.9

Pipeline Changes:

  • Fixed a bug in presto-abseq preventing relative file paths from working with the -r argument.

Version 1.2.0: October 05, 2017

Version Updates:

  • changeo 0.3.8

Version 1.1.0: September 22, 2017

Version Updates:

  • alakazam 0.2.8

  • tigger 0.2.11

  • prestor 0.0.3

Image Changes:

  • Added preprocess-phix script that removes PhiX reads.

  • Added fetch_phix.sh script that downloads the PhiX174 genome.

  • Added builds script to record and report image build date and package changesets.

  • Added -x <coordinate system> argument to presto-abseq.

  • Forced install of Rcpp to be fixed at version 0.12.12.

  • Added /oasis mount point.

Version 1.0.0: August 08, 2017

  • Initial meta-versioned image.

prestoR

The presto report package (prestoR) is an R package for generating quality control plots from pRESTO log tables.

Example Report

Download & Installation

prestor is currently not available from CRAN and must be installed directly from the Bitbucket repository. First, clone the repository:

https://bitbucket.org/kleinstein/prestor

Then build using the following R commands from the package root:

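# Install the build tools
install.packages(c("devtools", "roxygen2"))
library(devtools)
# Install dependencies, build the documentation, and install the package
install_deps(dependencies=TRUE)
document()
install()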

Alternatively, you can install directly from the Bitbucket repository, but this will not build the documentation:

library(devtools)
install_bitbucket("javh/prototype-prestor@default")

Documentation

For an index of available functions see:

help(package="prestor")

For some common tasks, see the following help pages:

Function             Description

buildReport          Generate a presto pipeline report
loadConsoleLog       Parse console output from a pRESTO pipeline
loadLogTable         Parse tabled log output from pRESTO tools
pdfReport            R Markdown to PDF format for pRESTO reports
plotAlignSets        Plot AlignSets log table
plotAssemblePairs    Plot AssemblePairs log table
plotBuildConsensus   Plot BuildConsensus log table
plotConsoleLog       Plot console output from a pRESTO pipeline
plotFilterSeq        Plot FilterSeq log table
plotMaskPrimers      Plot MaskPrimers log table
plotParseHeaders     Plot ParseHeaders log table
report_abseq3        Generate a report for an AbSeq V3 pRESTO pipeline script

Welcome to the Immcantation Portal!

Advances in high-throughput sequencing technologies now allow for large-scale characterization of B cell receptor (BCR) and T cell receptor (TCR) repertoires. The high germline and somatic diversity of the adaptive immune receptor repertoire (AIRR) presents challenges for biologically meaningful analysis, requiring the development of specialized computational methods.

The Immcantation framework provides a start-to-finish analytical ecosystem for high-throughput AIRR-seq datasets. Beginning from raw reads, Python and R packages are provided for pre-processing, population structure determination, and repertoire analysis.

Core Packages

pRESTO

  • Quality control

  • Read assembly

  • UMI processing

  • Error profiling

Change-O

  • V(D)J reference alignment standardization

  • Clonal clustering

  • Germline reconstruction

  • Conversion and annotation

Alakazam

  • Clonal lineage reconstruction

  • Lineage topology analysis

  • Repertoire diversity

  • V(D)J gene usage

  • Physicochemical property analysis

SHazaM

  • Mutation profiling

  • Selection pressure quantification

  • Empirical SHM models

  • Chimera detection

  • Clonal clustering threshold tuning

TIgGER

  • Novel polymorphism detection

  • Genotyping

SCOPer

  • Spectral clonal clustering methods

prestoR

  • pRESTO report generation

Contributed Packages

RDI

  • Repertoire Dissimilarity Index

RAbHIT

  • Determination of V-D-J haplotypes

IgPhyML

  • Clonal lineage tree construction

  • Mutation/selection hypothesis testing

sumrep

  • Generate repertoire summary statistics

  • Visualize and compare repertoire summaries