Biologic drugs are becoming increasingly important therapeutics for the treatment of various diseases, including cancer, infectious diseases, and inflammatory diseases. Classical antibody scaffolds and structures are being challenged by smaller but equally potent molecules, which offer several benefits over classical immunoglobulins.
Camelids produce heavy-chain-only antibodies that lack the light chain, so the antigen-binding function is carried entirely by the heavy chain. This has made them remarkably interesting from a therapeutic perspective. The variable domain on its own, separated from the constant region, is known as a nanobody. A thorough review of nanobodies and their therapeutic relevance is beyond the scope of this application note, but it is nicely outlined in the review paper by Steeland et al.
Figure 1. Adapted from Deonarain et al.
Advanced bioinformatics and sequence analysis have become an integral part of antibody drug discovery and development, and they are being challenged by ever-increasing amounts of sequence and functional data.
Pipe | bio offers an integrated, comprehensive, yet easy-to-use cloud-based sequence analysis platform and advanced data repository that can easily be configured to meet various annotation requirements for both antibodies and non-antibody scaffolds.
Biopanning, repertoire analysis, and immunization campaigns are all part of the toolbox for finding valuable antibodies and other biologics. In this application note, we show how we used the Pipe | bio platform to analyse high-throughput next-generation sequencing (NGS) data of nanobodies (single-domain antibodies) and how to find enriched clones in a biopanning experiment in alpaca (Lama pacos) by Miyazaki et al.
- Merge of paired-end data
- High-level overview with charts
- Clustering on individual samples
- Multi-sample comparison
- Slice and dice the data
- Recovering clones
- Alignments and further analyses
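To make the first step concrete, here is a minimal sketch of what merging paired-end data involves, assuming both reads come from the same amplicon and overlap at their 3' ends. It is not the Pipe | bio implementation; the function names and the thresholds `min_overlap` and `max_mismatches` are hypothetical, and real mergers also use Phred quality scores to resolve disagreements in the overlap.

```python
from typing import Optional

# Maps each base to its complement; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatches: int = 1) -> Optional[str]:
    """Merge a read pair into one sequence (hypothetical helper).

    read2 is sequenced from the opposite strand, so it is reverse-complemented
    first. We then look for the longest overlap between the 3' end of read1 and
    the 5' end of the reverse-complemented read2 with at most `max_mismatches`
    mismatches. Returns None if no overlap of at least `min_overlap` is found.
    """
    r2 = reverse_complement(read2)
    stop = max(min_overlap, 1)  # never test a zero-length overlap
    # Try the longest possible overlap first, then shrink it.
    for overlap in range(min(len(read1), len(r2)), stop - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            # Keep read1's bases in the overlap; a real merger would pick the
            # base with the higher Phred quality score instead.
            return read1 + r2[overlap:]
    return None


# Usage: reads from both ends of a 24 bp amplicon that overlap by 8 bp.
amplicon = "AAAACCCCGGGGTTTTACGTACGT"
read1 = amplicon[:16]
read2 = reverse_complement(amplicon[8:])
print(merge_pair(read1, read2, min_overlap=5))  # AAAACCCCGGGGTTTTACGTACGT
```

Searching from the longest overlap down favours the true overlap over short chance matches; this sketch does not handle reads that extend past the end of the amplicon.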
For this application note, we have retrieved two datasets from the European Nucleotide Archive.
Paper: doi: 10.1093/jb/mvv038
This is the dataset used throughout the application note. For validation, we also used sequences deposited in GenBank under accession numbers AB926001-AB926011.
Total read count: 1,372,580
Paper: doi: 10.1371/journal.pone.0161801
This dataset serves as an illustration of “historical” sequences, standing in for internal company sequences, patent sequences, public repertoire data, etc. It has been analysed and stored in the sequence store but is otherwise not used in this application note.
Total read count: 4,851,448
Before running the annotation pipeline, the required scaffold must be configured once. Examples of scaffolds include IgG, scFv, nanobody, and non-antibody scaffolds; below we show a simplified example of a nanobody scaffold. As part of the scaffold configuration, it is possible to specify liabilities such as disallowed frameshifts, stop codons, and glycosylation sites, and how these should be reported in the tabular output.
Multiple scaffolds can be configured, enabling teams to work on different molecules within the same platform.
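As an illustration of what such a one-time configuration might capture, the sketch below models a scaffold as a small immutable record and keeps several scaffolds side by side in a registry. This is an assumption-laden sketch, not Pipe | bio's configuration schema: `ScaffoldConfig`, `register`, `SCAFFOLDS`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ScaffoldConfig:
    """Hypothetical scaffold definition; field names are illustrative only."""
    name: str
    regions: Tuple[str, ...]                # regions in the order they occur
    regions_of_interest: Tuple[str, ...]    # regions to report and cluster on
    disallowed_liabilities: Tuple[str, ...] = ()
    allow_stop_codons: bool = False
    allow_frameshifts: bool = False
    min_phred_quality: int = 0              # 0 means no per-base quality check


# A registry lets different teams keep different scaffolds side by side.
SCAFFOLDS: Dict[str, ScaffoldConfig] = {}


def register(scaffold: ScaffoldConfig) -> ScaffoldConfig:
    """Add a scaffold to the registry, refusing duplicate names."""
    if scaffold.name in SCAFFOLDS:
        raise ValueError(f"scaffold {scaffold.name!r} already exists")
    SCAFFOLDS[scaffold.name] = scaffold
    return scaffold


register(ScaffoldConfig(
    name="Nanobody (none)",
    regions=("FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"),
    regions_of_interest=("CDR3",),
    disallowed_liabilities=("deamidation", "n_glycosylation"),
))
register(ScaffoldConfig(
    name="scFv (none)",
    regions=("VH", "Linker", "VL"),
    regions_of_interest=("VH", "VL"),
))
```

Freezing the record mirrors the idea that a scaffold is configured once and then reused unchanged across analyses.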
Figure 2. Schematic view of the nanobody scaffold. In this example, “(none)” is simply part of the scaffold name and tells the user that no per-base Phred quality score checks are applied. Liabilities such as deamidation and glycosylation sites are not allowed.
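Liabilities like these are typically detected as short sequence motifs in the translated protein. The sketch below assumes two commonly used motif rules, the N-linked glycosylation sequon N-X-S/T (where X is any residue except proline) and asparagine followed by glycine or serine as a deamidation hotspot; the motif set and the names `LIABILITY_MOTIFS`, `find_liabilities`, and `passes_scaffold` are hypothetical and not necessarily the rules Pipe | bio applies.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical motif set; adjust to your own liability rules.
LIABILITY_MOTIFS: Dict[str, str] = {
    # N-linked glycosylation sequon: N, any residue except P, then S or T.
    "n_glycosylation": r"N[^P][ST]",
    # Common deamidation hotspots: asparagine followed by glycine or serine.
    "deamidation": r"N[GS]",
}


def find_liabilities(protein: str) -> List[Tuple[str, int, str]]:
    """Return (liability, 1-based position, matched motif) for every hit."""
    hits = []
    for name, motif in LIABILITY_MOTIFS.items():
        # A zero-width lookahead reports overlapping motifs as well.
        for m in re.finditer(f"(?=({motif}))", protein):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def passes_scaffold(protein: str, disallowed: Tuple[str, ...]) -> bool:
    """True if the sequence contains none of the disallowed liabilities."""
    return not any(name in disallowed for name, _, _ in find_liabilities(protein))


print(find_liabilities("QVNGSLNPTANLT"))
# [('deamidation', 3, 'NG'), ('n_glycosylation', 3, 'NGS'), ('n_glycosylation', 11, 'NLT')]
```

Note that "NPT" at position 7 is not reported, because proline in the middle position blocks the glycosylation sequon.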
The Pipe | bio platform is very flexible; here we run a relatively simple workflow:
- Import sequences
- Merge paired-end sequences
- Annotate sequences according to scaffold configuration
- Plot various findings
- Cluster regions of interest
- Find enriched clones
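The last two steps can be sketched under simplifying assumptions: reads are clustered by exact identity of their CDR3 amino-acid sequence, and enrichment is the ratio of a cluster's frequency after panning to its frequency before, with a pseudocount so that clusters absent before panning do not cause division by zero. Similarity-based clustering and other enrichment scores are equally valid choices; the function names, the pseudocount, and the toy CDR3 strings are hypothetical and do not describe the platform's own method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def cluster_by_cdr3(cdr3s: Iterable[str]) -> Counter:
    """Group reads into clusters of identical CDR3 sequences (exact match)."""
    return Counter(cdr3s)


def enrichment(before: Counter, after: Counter,
               pseudocount: float = 1.0) -> Dict[str, float]:
    """Frequency after panning divided by frequency before, per cluster."""
    clusters = sorted(set(before) | set(after))
    n = len(clusters)
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for cdr3 in clusters:
        # Counter returns 0 for missing keys, so absent clusters get only the pseudocount.
        f_before = (before[cdr3] + pseudocount) / (total_before + pseudocount * n)
        f_after = (after[cdr3] + pseudocount) / (total_after + pseudocount * n)
        scores[cdr3] = f_after / f_before
    return scores


def top_enriched(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k clusters with the highest enrichment score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Usage with toy data: CCC dominates after panning.
round0 = cluster_by_cdr3(["AAA", "AAA", "BBB", "CCC"])
round2 = cluster_by_cdr3(["AAA", "CCC", "CCC", "CCC", "CCC", "CCC"])
print(top_enriched(enrichment(round0, round2), k=1))
```

Ranking by a frequency ratio rather than raw counts prevents clones that were simply abundant from the start from masking clones that were selected during panning.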
We also look up sequences that are shared with our database of “historical” sequences.
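A minimal sketch of such a lookup, assuming the historical store can be reduced to (record ID, CDR3) pairs and that a match means an identical CDR3: an index from each CDR3 to the records containing it makes each query a single dictionary lookup. The record IDs and the names `build_index` and `shared_sequences` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each CDR3 sequence to the set of record IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, cdr3 in records:
        index[cdr3].add(record_id)
    return dict(index)


def shared_sequences(query_cdr3s: Iterable[str],
                     index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return the queries found in the historical index, with matching record IDs."""
    return {cdr3: sorted(index[cdr3]) for cdr3 in query_cdr3s if cdr3 in index}


# Hypothetical historical records and enriched clones.
historical = [("hist_001", "AAA"), ("hist_002", "BBB"), ("hist_003", "AAA")]
index = build_index(historical)
print(shared_sequences(["AAA", "CCC"], index))  # {'AAA': ['hist_001', 'hist_003']}
```

Exact matching is the simplest criterion; a similarity threshold (for example, allowing one mismatch between CDR3s of equal length) would also catch close relatives.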