Nanobodies - A potent alternative

Introduction

Biologic drugs are becoming increasingly important as therapeutics for a range of diseases, including cancer and infectious and inflammatory diseases. Classical antibody scaffolds and structures are being challenged by smaller but equally potent molecules, which have several benefits over classical immunoglobulins.

Camelid heavy-chain antibodies have been shown to lack the light chain, carrying their function on the heavy chain alone, which has made them remarkably interesting from a therapeutic perspective. The variable domain on its own, separated from the constant region, is called a nanobody. A thorough review of nanobodies and their therapeutic relevance is beyond the scope of this application note, but it is nicely outlined in the review paper by Steeland et al.


Figure 1. Adapted from Deonarain et al. 


Advanced bioinformatics and sequence analysis have become an integral part of antibody drug discovery and development, and they are being challenged by ever-increasing amounts of sequence and functional data.

Pipe | bio offers an integrated, comprehensive yet easy-to-use cloud-based sequence analysis platform and advanced data repository, which can easily be configured to cope with various annotation requirements for both antibodies and non-antibody scaffolds.

Biopanning, repertoire analysis, and immunization campaigns are all part of the toolbox for finding valuable antibodies and other biologics. In this application note we show how we used the Pipe | bio platform to analyse high-throughput (NGS) sequencing data of nanobodies (single-domain antibodies) and to find enriched clones in a biopanning experiment on alpaca (Lama pacos) described by Miyazaki et al.


Here we reproduce key analysis components from the Miyazaki et al. paper within an hour in the Pipe | bio software. In total we found 11 of the 12 clones identified in that paper. This blog post follows eight steps:
  • Merge of paired-end data
  • Annotation
  • High-level overview with charts
  • Clustering on individual samples
  • Multi-sample comparison
  • Slice and dice the data
  • Recovering clones
  • Alignments and further analyses

Data

For this application note, we have retrieved two datasets from the European Nucleotide Archive. 

Bioproject: https://www.ebi.ac.uk/ena/browser/view/PRJDB2382

Paper: doi: 10.1093/jb/mvv038

This is the dataset used throughout the application note. We also used data deposited in GenBank with accession numbers AB926001-11 for validation.

Total read count: 1,372,580


Bioproject: https://www.ebi.ac.uk/ena/browser/view/PRJNA321369

Paper: doi: 10.1371/journal.pone.0161801 

This dataset is used as an illustration of “historical” sequences to represent internal company sequences, patent sequences, public repertoire data etc. This dataset has been analysed and put into the sequence store but is otherwise not used in the application note.

Total read count: 4,851,448


Scaffold configuration

Before running the annotation pipeline, the required scaffold is configured once. A scaffold can be, for example, IgG, scFv, nanobody or a non-antibody format; below we show a simplified example of a nanobody scaffold. As part of the scaffold configuration it is possible to specify liabilities, disallowed frameshifts, stop codons, glycosylation sites, etc., and how these should be reported in a tabular output.

Multiple scaffolds can be configured, enabling teams to work on different molecules in the same platform.

Figure 2. Schematic view of the nanobody scaffold. In this example, “(none)” is simply part of the scaffold name and tells the user that there are no per-base Phred quality score checks. Liabilities such as deamidation and glycosylation sites are not allowed.
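To make the idea of liability rules more concrete, here is a minimal sketch of how such checks could be expressed on a translated sequence. It is not the Pipe | bio rule engine: the pattern set, the names `LIABILITY_PATTERNS` and `find_liabilities`, and the choice of NG/NS as deamidation hotspots are assumptions made for illustration.

```python
import re

# Illustrative liability patterns (assumptions, not the Pipe | bio rule set).
# Lookaheads are used so that overlapping motifs are all reported.
LIABILITY_PATTERNS = {
    "n_glycosylation": re.compile(r"(?=(N[^P][ST]))"),  # N-X-S/T sequon, X is not P
    "deamidation": re.compile(r"(?=(N[GS]))"),          # commonly cited NG / NS hotspots
    "stop_codon": re.compile(r"(?=(\*))"),              # '*' marks a translated stop
}


def find_liabilities(protein: str) -> list[tuple[str, int, str]]:
    """Return (liability name, 1-based position, matched motif) for each hit."""
    hits = []
    for name, pattern in LIABILITY_PATTERNS.items():
        for match in pattern.finditer(protein.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    # Sort by position so the report reads left to right along the sequence.
    return sorted(hits, key=lambda hit: hit[1])
```

Each hit could then become one row of a tabular report, and a scaffold that disallows a liability would simply flag any sequence with a hit of that type.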


Analysis pipeline

The Pipe | bio platform is very flexible and here we perform a relatively simple workflow. 

  • Import sequences
  • Merge paired-end sequences
  • Annotate sequences according to scaffold configuration
  • Plot various findings
  • Cluster regions of interest
  • Find enriched clones

We also look up common sequences in our database of “historical” sequences.  
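Finding enriched clones, the last step in the workflow listed above, comes down to comparing how often each clone occurs before and after a panning round. The sketch below is one simple way to compute this, under assumptions of our own: clones are identified by an exact string (for example a CDR3 sequence), and a pseudocount keeps the ratio finite for clones missing from one sample. The function name `enrichment` is hypothetical.

```python
from collections import Counter


def enrichment(before: list[str], after: list[str],
               pseudocount: float = 1.0) -> dict[str, float]:
    """Fold change in relative frequency of each clone from `before` to `after`."""
    counts_before, counts_after = Counter(before), Counter(after)
    clones = set(counts_before) | set(counts_after)
    # Pseudocounts keep the ratio finite for clones absent from one round.
    total_before = len(before) + pseudocount * len(clones)
    total_after = len(after) + pseudocount * len(clones)
    return {
        clone: ((counts_after[clone] + pseudocount) / total_after)
        / ((counts_before[clone] + pseudocount) / total_before)
        for clone in clones
    }
```

Sorting the result by value, largest first, gives a ranked list of candidate clones; a value above 1 means the clone became relatively more frequent after panning.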

Non-antibody scaffolds as therapeutics - application note

Introduction

Biologic drugs are becoming increasingly important as therapeutics for a range of diseases, including cancer and infectious and inflammatory diseases. Classical antibody scaffolds and structures are being challenged by smaller but equally potent molecules, which have a number of benefits over the large, bulky IgG molecule. Non-antibody scaffolds are attractive as therapeutic drugs, which explains the strong interest in them (Frejd and Kim). Many different and interesting scaffolds exist, but here we focus on only a few of them.

Figure 1. From Vazquez-Lombardi et al.


Traditionally, there has been very little interest in developing general software tools to cope with these non-antibody scaffolds, and companies and academic research groups have often analysed data by hand or used internally developed software. Analysis of high-throughput (NGS) sequencing data of these scaffolds has been a very challenging task.

Pipe | bio offers a very easy-to-use, cloud-based sequence repository and bioinformatics platform, which can easily be configured to fit various annotation requirements for both antibodies and non-antibody scaffolds.

In this application note we have primarily focused on the analysis of affibodies, but the platform can easily be configured for other scaffolds such as knottins, bicyclic peptides and DARPins.


Data and configuration

We used the first 2 million affibody sequences from ERR3474167.fastq, downloaded from the European Nucleotide Archive (BioProject https://www.ebi.ac.uk/ena/browser/view/PRJEB33942). The sample was sequenced on the Illumina MiSeq platform.
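Taking only the first part of a large FASTQ file can be done without loading the whole file into memory. The following is a minimal sketch, assuming the usual four-line FASTQ layout with unwrapped sequence lines; the function name `first_fastq_records` is our own.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_fastq_records(lines: Iterable[str], limit: int) -> Iterator[tuple[str, str, str]]:
    """Yield (read id, sequence, quality) for the first `limit` FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # A FASTQ record is four lines: '@id', sequence, '+', quality.
    # Zipping the same iterator four times groups the lines into records.
    records = zip(stripped, stripped, stripped, stripped)
    for header, sequence, _plus, quality in islice(records, limit):
        yield header[1:], sequence, quality
```

With a local copy of the file, usage would look like `with open("ERR3474167.fastq") as handle: reads = list(first_fastq_records(handle, 2_000_000))`.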


Scaffold configuration

Before running the annotation pipeline, the required scaffold is configured once. A scaffold can be, for example, IgG, scFv, nanobody or a non-antibody scaffold; below we show a simplified example of an affibody scaffold. As part of the scaffold configuration it is also possible to specify liabilities, disallowed frameshifts, stop codons, etc., and how these should be reported in a tabular output.

Multiple scaffolds can be configured allowing for different configurations. 

Figure 2. Schematic view of the affibody scaffold. Scaffolds and associated rules can be customized in Pipe | bio to meet your organization's needs.
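One way to picture a scaffold rule is as a pair of constant flanks that bracket the region of interest, plus checks on what is found between them. The sketch below illustrates that idea only; the flank sequences, the error messages and the function name `annotate_region` are hypothetical and do not reflect the actual Pipe | bio configuration format.

```python
import re

# Hypothetical constant flanks; a real scaffold definition would supply its own.
UPSTREAM_FLANK = "VDNKF"
DOWNSTREAM_FLANK = "PSQSA"
REGION_PATTERN = re.compile(
    f"{UPSTREAM_FLANK}(?P<region>[A-Z*]+?){DOWNSTREAM_FLANK}"
)


def annotate_region(protein: str) -> dict:
    """Locate the region between the flanks and flag simple scaffold violations."""
    match = REGION_PATTERN.search(protein)
    if match is None:
        return {"region": None, "errors": ["scaffold flanks not found"]}
    region = match.group("region")
    # A '*' in the translated region means a stop codon, which the scaffold disallows.
    errors = ["stop codon in region"] if "*" in region else []
    return {"region": region, "start": match.start("region") + 1, "errors": errors}
```

The returned dictionary maps naturally onto one row of a tabular report: the extracted region, its 1-based start position, and any rule violations.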


Analysis pipeline

The Pipe | bio platform has a large toolbox for analysing data, and which tools to use depends on the biological application. Here we show a simple workflow in which we imported sequence data, annotated regions of interest, plotted different charts and clustered on the region of interest.


Annotation results 

The initial output of the annotation pipeline is a result document that shows tabular information on the results, aligned with the sequences in a graphical view. This enables the user to easily filter and visually inspect the data in great detail. The annotation results are accompanied by graphs showing a breakdown of identified liabilities and overall summary statistics.

 

Figure 3. Pie chart showing summary of the annotation pipeline and individual identified errors as a tabular representation. Annotated sequences are shown in the lower half of the screen with both tabular information as well as graphical sequence view.  


Charts

Various charts can be plotted for visual inspection and to support your analyses. All charts and analyses can be performed per annotated region or on the full sequence. All charts are interactive: clicking a region of a chart applies the corresponding filter to the result table, for both tabular and sequence data. For example, for synthetic scaffolds and affinity maturation it is very valuable to be able to click interesting codons in a codon usage plot or a certain sequence length in a length distribution chart.

A number of different charts are supported, and others can be added on request:

  • Codon usage
  • Length distribution
  • Sequence logo
  • Amino acid heatmap
  • And more



Figure 4. Make chart dialog

Below is an example of codon usage which is great for library QC. All cells are clickable and will then apply a filter to the result table. 

Figure 5. Codon usage plot
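The data behind a codon usage plot can be thought of as a table of codon counts per codon position. Here is a minimal sketch of how such a table might be computed, assuming the input DNA sequences are already in frame and start at the same position; `codon_usage` is a name chosen for this example.

```python
from collections import Counter


def codon_usage(sequences: list[str]) -> list[Counter]:
    """Count codons at each codon position across in-frame DNA sequences."""
    usage: list[Counter] = []
    for dna in sequences:
        # Step through three bases at a time, ignoring a trailing partial codon.
        for position, start in enumerate(range(0, len(dna) - len(dna) % 3, 3)):
            if position == len(usage):
                usage.append(Counter())
            usage[position][dna[start:start + 3].upper()] += 1
    return usage
```

For library QC, the counts at each randomized position can be compared with the codon mix that the library design was meant to produce.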


Clustering

Reducing data complexity by clustering is a great way to get a condensed overview of the data and reduce data redundancy. 

The user will be able to “slice and dice” and have different views on clustered data. In the following screenshots we only look at the overview of the clusters, but it is also possible to expand the content and look into more details of the individual sub-clusters. 

From the 2 million annotated sequences, clustering at 85% identity gives 4651 clusters in total. The largest cluster has 328,492 sequences, comprising 108,498 unique sequences. No sequence in that cluster occurs more than 255 times, indicating very high diversity.
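To show what identity-based clustering means in practice, here is a minimal greedy sketch. It is not the algorithm used by Pipe | bio, which is not described here. It assumes that identity is measured position by position between sequences of equal length (no alignment), and that the most abundant sequences act as cluster centroids; `identity` and `greedy_cluster` are hypothetical names.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; sequences of unequal length score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences: list[str], threshold: float = 0.85) -> dict[str, list[str]]:
    """Assign each unique sequence to the first centroid within `threshold` identity."""
    clusters: dict[str, list[str]] = {}
    # Most abundant sequences are visited first, so they become the centroids.
    for sequence, _count in Counter(sequences).most_common():
        for centroid in clusters:
            if identity(sequence, centroid) >= threshold:
                clusters[centroid].append(sequence)
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            clusters[sequence] = [sequence]
    return clusters
```

The result maps each centroid to the unique sequences in its cluster, which corresponds to the distinction made above between the total and the unique sequence counts of a cluster.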

Figure 6. Clustering view sorted with the largest clusters at the top. The top pane shows an amino acid bar chart in which it is very easy to identify four variable positions.

For the largest cluster, the bar chart makes it easy to see that there is high variability at positions 10, 18, 28 and 35.
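Variable positions like these can also be found numerically. A common measure is the Shannon entropy of the residue distribution at each position; the sketch below uses it under the assumption that all sequences in the cluster are aligned and of equal length. The threshold `min_entropy` and both function names are choices made for this example.

```python
import math
from collections import Counter


def position_entropy(sequences: list[str]) -> list[float]:
    """Shannon entropy (bits) of the residue distribution at each aligned position."""
    entropies = []
    for column in zip(*sequences):  # zip truncates to the shortest sequence
        counts = Counter(column)
        total = len(column)
        entropies.append(-sum(n / total * math.log2(n / total) for n in counts.values()))
    return entropies


def variable_positions(sequences: list[str], min_entropy: float = 1.0) -> list[int]:
    """1-based positions whose entropy reaches `min_entropy`."""
    return [i for i, h in enumerate(position_entropy(sequences), start=1) if h >= min_entropy]
```

A fully conserved position has an entropy of 0 bits, while a position using all 20 amino acids equally would reach about 4.32 bits.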


Cherry pick sequences

Use the sequence cart to cherry pick interesting sequences and clones and store them for later use or download them directly. 

Figure 7. Right click in any document to add interesting sequences to the cart.


Sequence Store

After cherry picking, it may be interesting to query the Sequence Store, which is a repository of all the sequences you have analysed before. That way you can very quickly see whether you have analysed identical sequences before and in which documents they are found.

The Sequence Store can also be used, for example, to store patent sequences and other data from public sources. It is then quick and easy to look up whether the sequences you are currently analysing have already been found in the public domain.

Figure 8. Sequence Store showing antibody CDR-H3 sequences and labels. 
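Conceptually, an exact-match lookup of this kind is an index from sequence to the documents that contain it. The toy in-memory version below is only meant to illustrate that idea; the class `SequenceIndex` and its methods are hypothetical and say nothing about how the Sequence Store is actually implemented.

```python
from collections import defaultdict


class SequenceIndex:
    """Toy in-memory index from exact sequence to the documents containing it."""

    def __init__(self) -> None:
        self._documents: dict[str, set[str]] = defaultdict(set)

    def add(self, document: str, sequences: list[str]) -> None:
        """Register every sequence of `document` in the index."""
        for sequence in sequences:
            self._documents[sequence.upper()].add(document)

    def lookup(self, sequences: list[str]) -> dict[str, list[str]]:
        """Return only the query sequences already seen, with their source documents."""
        return {
            sequence: sorted(self._documents[sequence.upper()])
            for sequence in sequences
            if sequence.upper() in self._documents
        }
```

For example, after adding a "patents" document and a "campaign_1" document, looking up a list of cherry-picked CDR-H3 sequences returns the ones seen before together with the documents they came from.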


Other functionalities

The platform offers much that is not described here, and more is being added all the time:
  • API for integration with other systems
  • Merge paired-end NGS data
  • Screen immune repertoires to extract variants having potential in-vitro maturation sites and residues
  • Compare multiple samples, e.g. for enrichment, panning or to improve potency
  • Subtract one sample from another
  • Reporting
  • Labeling of sequences
  • And a lot more


References

Vazquez-Lombardi, R., Phan, T. G., Zimmermann, C., Lowe, D., Jermutus, L., & Christ, D. (2015). Challenges and opportunities for non-antibody scaffold drugs. Drug Discovery Today, 20(10), 1271–1283. https://doi.org/10.1016/j.drudis.2015.09.004

Frejd, F., & Kim, K. (2017). Affibody molecules as engineered protein drugs. Experimental & Molecular Medicine, 49, e306. https://doi.org/10.1038/emm.2017.35