CodonCode Corporation
Better Software for DNA Sequencing

How to Align Sequences Pairwise with CodonCode Aligner

This guide explains how to compare nucleotide sequences in CodonCode Aligner using pairwise alignments. It also shows how to use dot plots and other visual tools to verify and interpret local and global pairwise alignments, including how to inspect mismatches, gaps, and sequence quality.

What Is Pairwise Sequence Alignment?

Pairwise sequence alignment is the process of comparing two DNA, RNA, or protein sequences to identify regions of similarity. These similarities can reveal evolutionary relationships, functional conservation, or sequencing errors. By aligning two sequences, scientists can detect insertions, deletions, and substitutions that help interpret experimental results or validate assemblies.

What Sequences Can Be Aligned?

Pairwise sequence alignment compares two sequences to reveal regions of similarity and difference. For an alignment to be meaningful, however, the sequences need to share some degree of relatedness. Pairwise sequence alignment works best for sequences that are similar enough to have conserved regions: for example, two versions of the same gene, homologous genes from related species, or overlapping sequencing reads from the same DNA region. If the sequences are too different, the alignment may show many gaps or no clear matches.

How To Do a Pairwise Alignment in CodonCode Aligner

Once you have selected two related sequences, you can align them directly in CodonCode Aligner to visualize similarities and differences. The software provides simple features to perform the alignment, display mismatches, and explore quality information at each position. In the following steps, you will learn how to create a pairwise alignment and interpret the results.

Example data download: pairwise-alignment.zip

Note: To use this dataset, unpack the downloaded ZIP file, and open the "pairwise-alignment.ccap" project.

To create a pairwise alignment in CodonCode Aligner, select the sequences to align, and then click on the Assemble button in the project view toolbar, or choose Contig → Assemble from the menu.

Assemble button in the project view toolbar

You do not need to worry about sequence orientation. To avoid problems from sequences that are in the wrong orientation, CodonCode Aligner examines the orientation of all sequences before starting the alignment, and reverse-complements sequences as needed.
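
The idea behind an orientation check can be illustrated with a short Python sketch. This is not CodonCode Aligner's actual implementation; it is a simplified illustration that assumes orientation can be guessed by counting short words shared with the other sequence. The function names (reverse_complement, shared_words, orient) and the word size of 8 are assumptions made for this example.

```python
# Illustrative sketch only: guess orientation by counting shared words.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def shared_words(a: str, b: str, word_size: int = 8) -> int:
    """Count distinct words of length word_size that occur in both sequences."""
    words_a = {a[i:i + word_size] for i in range(len(a) - word_size + 1)}
    words_b = {b[i:i + word_size] for i in range(len(b) - word_size + 1)}
    return len(words_a & words_b)

def orient(reference: str, query: str, word_size: int = 8):
    """Return (sequence, was_flipped), choosing the orientation of query
    that shares more words with reference."""
    flipped = reverse_complement(query)
    if shared_words(reference, flipped, word_size) > shared_words(reference, query, word_size):
        return flipped, True
    return query, False

# Made-up example: a "reverse read" covering the end of the reference.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
reverse_read = reverse_complement(reference[5:])
oriented, was_flipped = orient(reference, reverse_read)
print(was_flipped, oriented)
```

In this sketch, the read is flipped only when the reverse complement shares more words with the other sequence than the original orientation does.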

When the alignment is completed, the resulting contig can be found in the project view. Double-click on the contig to open the contig view window:

Pairwise alignment for 2 related sequences in the Apidae family.

The contig view shows an overview of the aligned sequences on top, which also highlights the differences. The lower section shows the aligned bases and allows manual editing of the alignment.

Viewing Alignment Differences

After aligning two sequences, CodonCode Aligner makes it easy to see where they differ. Differences such as mismatched bases, insertions, and deletions are highlighted directly in the alignment view, allowing you to quickly spot variations between the sequences.
You can scroll through the alignment to inspect each position, zoom in on specific regions, or use built-in tools to navigate from one difference to the next. These features help you confirm sequence similarity, check for possible errors, and better understand how the two sequences compare overall.

There are several ways to find and inspect the differences in your alignments:

Contig Overview showing differences

Contig Overview

The contig overview shows the aligned sequences and highlights differences using base colors. Zoom in and use mouse overs for details of the differences. Click on a position in the overview to navigate to this position in any view.

Difference table for pairwise alignment

Difference Table

See just the differences and their positions. The difference table can be shown at the top of the contig view instead of the overview. Apply filters, such as excluding N's, and use the table to navigate to positions of interest.
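
To make the idea of a difference table concrete, here is a minimal Python sketch that lists the differing columns of two already-aligned sequences. It is not how CodonCode Aligner builds its table; the function name difference_table, the exclude_n option, and the example sequences are assumptions for illustration. Gaps are assumed to be written as "-".

```python
def difference_table(top: str, bottom: str, exclude_n: bool = False):
    """List (position, top_base, bottom_base, kind) for each differing column.

    Both inputs must be aligned strings of equal length, with '-' for gaps.
    Positions are 1-based alignment columns.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    rows = []
    for position, (a, b) in enumerate(zip(top.upper(), bottom.upper()), start=1):
        if a == b:
            continue
        if exclude_n and "N" in (a, b):
            continue  # filter: ignore differences that involve an N
        if a == "-":
            kind = "insertion"      # base present only in the bottom sequence
        elif b == "-":
            kind = "deletion"       # base present only in the top sequence
        else:
            kind = "substitution"
        rows.append((position, a, b, kind))
    return rows

# Made-up aligned pair for illustration.
top    = "ACGTTGCA-GTNA"
bottom = "ACCTTG-AAGTCA"
for row in difference_table(top, bottom, exclude_n=True):
    print(row)
```

With the N filter turned on, the column where one sequence has an N is skipped, which mirrors the purpose of the "exclude N's" filter described above.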

Mutation detection showing amino acid changes

Mutation Detection

Run mutation detection to get a list of the mutations and their amino acid effects. Mutations are highlighted in the alignment and mouse overs show relevant information.
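
What an "amino acid effect" means can be shown with a small Python sketch. It is a simplified illustration, not CodonCode Aligner's mutation detection: it assumes two ungapped coding sequences that start in frame, uses the standard genetic code, and the function name codon_effects and the example sequences are made up for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_effects(reference: str, sample: str):
    """Compare two in-frame, ungapped coding sequences codon by codon.

    Returns (codon_number, ref_codon, sample_codon, ref_aa, sample_aa, kind)
    for every codon that differs. Unknown codons translate to 'X'.
    """
    effects = []
    usable = min(len(reference), len(sample)) // 3 * 3
    for start in range(0, usable, 3):
        ref_codon = reference[start:start + 3].upper()
        alt_codon = sample[start:start + 3].upper()
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon, "X")
        alt_aa = CODON_TABLE.get(alt_codon, "X")
        kind = "silent" if ref_aa == alt_aa else "amino acid change"
        effects.append((start // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return effects

reference = "ATGGCTGAAAAA"  # translates to M A E K
sample    = "ATGGCCGATAAA"  # translates to M A D K
for effect in codon_effects(reference, sample):
    print(effect)
```

In this example, the change in codon 2 is silent, while the change in codon 3 replaces E with D.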

Mask matching bases to easily spot differences

Mask Matches

Use the feature Mask Bases Matching Consensus to easily spot and focus in on differences in the contig view. This feature can be found in the View menu.
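
The effect of masking can be sketched in a few lines of Python. This is only an illustration of the concept; the function name mask_matching and the use of "." as the mask character are assumptions, and the display in CodonCode Aligner may look different.

```python
def mask_matching(consensus: str, read: str, mask_char: str = ".") -> str:
    """Replace every base in read that matches the consensus with mask_char.

    Both strings are assumed to be aligned and of equal length.
    """
    return "".join(
        mask_char if r.upper() == c.upper() else r
        for c, r in zip(consensus, read)
    )

consensus = "ACGTTGCA"
read      = "ACCTTG-A"
print(consensus)
print(mask_matching(consensus, read))
```

Only the substituted base and the gap remain visible in the masked line, so differences stand out at a glance.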

Navigate to regions of interest

Feature Navigation

Define your regions of interest (features) in CodonCode Aligner's preferences, then use the Previous Feature and Next Feature buttons to quickly navigate from one defined feature to the next.

View chromatograms to verify differences

View Traces

Look at the chromatograms to verify differences. You can also use the base qualities to get an idea of the accuracy of the base call. Double-click on a position in the contig to see the traces at this location.
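
As a rough guide to reading base qualities, the following Python sketch converts quality scores to estimated error probabilities. It assumes the qualities are Phred-style scores, where the error probability is 10^(-Q/10); the function names, the threshold of 20, and the example values are hypothetical.

```python
def error_probability(quality: float) -> float:
    """Convert a Phred-style quality score to an estimated error probability."""
    return 10 ** (-quality / 10)

def low_quality_differences(positions, qualities, threshold=20):
    """Return the 1-based difference positions whose base quality is below threshold."""
    return [p for p in positions if qualities[p - 1] < threshold]

qualities = [40, 38, 12, 35, 9, 41]   # hypothetical per-base qualities
differences = [3, 4]                  # hypothetical difference positions

for q in (10, 20, 30):
    print(q, error_probability(q))
print(low_quality_differences(differences, qualities))
```

Under this assumption, a difference at a low-quality position is more likely to be a base-calling error and is a good candidate for checking against the trace.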

Using colors to see differences between bases

Base Colors

Use base colors and sequence translations to spot differences. To see bases and translations at the same time, you can, for example, view bases with a translation-colored background. Background colors can be set in the Base Color settings of CodonCode Aligner.

Visualizing Pairwise Alignments with Dot Plots

Dot plots provide an intuitive way to visualize the relationship between two sequences. Instead of viewing each base pair alignment in text form, a dot plot represents sequence similarity as a grid, with one sequence plotted along the x-axis and the other along the y-axis. Wherever the sequences match, dots appear — forming diagonal lines for regions of high similarity.

In CodonCode Aligner, dot plots are a quick way to assess how well two sequences align. Continuous diagonal lines indicate strong similarity, while breaks, shifts, or parallel lines can reveal insertions, deletions, repeats, or inversions. By examining the overall pattern, you can immediately see whether your sequences are related and whether there are any large-scale differences worth exploring in more detail.

To display a dot plot for selected sequences, choose Dot Plot from the Tools menu:

Dot Plot of two sequences for related species

The bottom of the dot plot window shows the aligned bases for the base positions selected in the plot above. Matching bases are shown with a light blue background. The selected position is highlighted by blue crosshairs in the dot plot. The crosshairs can be moved by clicking in the plot with the mouse or by using the arrow keys.

At the top of the dot plot window, you can change the word size, the zoom, and whether the reverse complement of the vertical sequence should be included in the plot. The word size is the word length used when finding matching positions. Increase the word size to get rid of unwanted noise, and reduce it to see more matches. Generally, it also makes sense to use a larger word size for longer sequences. You can zoom in and out with the + and - buttons at the top. The current zoom level is displayed in pixels per base.
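
The role of the word size can be illustrated with a minimal Python sketch of word-based dot plot matching. This is a generic illustration, not CodonCode Aligner's code; the function name dot_plot_matches and the example sequence are assumptions.

```python
from collections import defaultdict

def dot_plot_matches(horizontal: str, vertical: str, word_size: int = 4):
    """Return sorted (x, y) start positions (0-based) where a word of length
    word_size in the horizontal sequence equals a word in the vertical one."""
    index = defaultdict(list)
    for y in range(len(vertical) - word_size + 1):
        index[vertical[y:y + word_size]].append(y)
    matches = []
    for x in range(len(horizontal) - word_size + 1):
        for y in index.get(horizontal[x:x + word_size], []):
            matches.append((x, y))
    return sorted(matches)

# A short sequence with an internal repeat, compared against itself.
seq = "ACGTACGT"
print(dot_plot_matches(seq, seq, word_size=4))
print(dot_plot_matches(seq, seq, word_size=5))
```

With a word size of 4, the repeated word ACGT produces off-diagonal dots in addition to the main diagonal. With a word size of 5, only the main diagonal remains, which is the noise reduction described above.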

A dot plot can be shown for one or two samples, but you can also generate and show several dot plots at once by selecting multiple samples. You will be given the option to choose which of the selected samples should be displayed horizontally and which vertically. Each horizontal sample will be compared to each vertical sample. This allows you, for example, to compare dot plots for several partial sequences to a reference sequence.

Align Forward with Reverse Sequence

Pairwise sequence alignment can also be used to align overlapping forward and reverse reads. CodonCode Aligner automatically flips (reverse complements) sequences if needed for an alignment. Sequence orientation can be seen in the contig view for each alignment. Here is an example of a pairwise alignment of the forward and reverse reads for the same specimen in our example project:

Pairwise alignment of a forward and reverse read for the same specimen.

CodonCode Aligner automatically reverse complemented one of the two reads to create the alignment. Forward sequences are shown in blue in the contig overview, reverse sequences in orange.

FAQ

What are the two main advantages of sequence alignments in CodonCode Aligner versus other software?
  • CodonCode Aligner will automatically flip sequences to achieve the best alignment.
  • You can directly align contigs, keeping the connection to the underlying chromatograms or text sequences.

How can I tell if my sequences are similar enough to align?

You can use a dot plot or a quick pairwise alignment to check similarity. Closely related sequences produce a clear diagonal line in a dot plot or align with few gaps. If the plot shows only scattered dots or the alignment contains long gaps, the sequences may not be homologous or may need to be trimmed first.

What if my sequences do not align?

If the alignment fails, you can find out why from the information area at the bottom of the project view. Clicking on this area opens a status history dialog that explains why the alignment failed. Once you know the reason, you can use this information to solve the problem (for example, by decreasing the word length or adjusting the gap penalty).

Why does my alignment contain many gaps or mismatches?

Large numbers of gaps or mismatches usually mean the sequences include non-overlapping regions. Try trimming poor-quality ends, checking for large stretches of ambiguous bases, adjusting the alignment settings, or confirming that both sequences are similar enough (e.g. come from the same gene or region) before aligning again.

Can I reverse complement a sequence before aligning?

Yes. CodonCode Aligner can automatically detect and reverse complement sequences when needed, so it does not matter which orientation sequences are in before the alignment. You can also reverse complement a sequence manually using the Edit → Reverse Complement command.

Can I save or export the results of a pairwise alignment?

Yes. You can save your alignment as part of a CodonCode Aligner project, export it as NEXUS/PAUP, Phylip, ACE, FASTA, FASTQ, GenBank, or EMBL, or print images for use in reports and presentations.

What algorithms can I use in CodonCode Aligner for pairwise sequence alignments?
  • End to end alignments: With this algorithm, alignments always include the entire sequences. It is therefore important that samples have been end clipped (and possibly also vector trimmed).
  • Local alignments: With this algorithm, the start and the end of sequences are not necessarily included in the alignment; the alignment stops when the alignment score would no longer improve. This can be due to too many discrepancies in low quality sequence near the ends, or due to unremoved vector sequences.
  • Large gap alignments: This algorithm allows for large gaps in between alignments, without penalizing the large gaps. The large gap algorithm can be useful when analyzing samples with large insertions or deletions.
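
The difference between end to end and local alignments can be illustrated with a short Python sketch that computes only the best alignment score. It is a textbook-style dynamic programming example (in the spirit of Needleman-Wunsch and Smith-Waterman), not CodonCode Aligner's algorithms; the scoring values, the function name alignment_score, and the example sequences are assumptions. The large gap algorithm is not covered by this sketch.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score with a linear gap penalty.

    local=False scores an end to end alignment of the entire sequences.
    local=True lets the alignment start and stop anywhere, so poorly
    matching ends are left out instead of being penalized.
    """
    previous = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        current = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, previous[j] + gap, current[j - 1] + gap)
            if local:
                score = max(score, 0)     # a local alignment may restart here
                best = max(best, score)
            current.append(score)
        previous = current
    return best if local else previous[-1]

# Made-up reads: a shared core flanked by unrelated ends
# (standing in for low quality or untrimmed vector sequence).
core = "ACGTACGTGA"
read_1 = "TTTTT" + core + "CCCCC"
read_2 = "GGGGG" + core + "AAAAA"
global_score = alignment_score(read_1, read_2, local=False)
local_score = alignment_score(read_1, read_2, local=True)
print(global_score, local_score)
```

In this example, the unrelated ends pull the end to end score down, while the local score reflects only the matching core. This is why end clipping matters more for end to end alignments.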

Where can I find help for the alignment settings?

Our fine tuning sequence assembly page contains detailed descriptions of all algorithms and preference options. Note that for regular pairwise sequence alignments (that is, when you are not using a reference sequence), you need to change the Assembly settings, not the Alignment settings.

What is the difference between alignment and assembly in CodonCode Aligner?

In CodonCode Aligner, the term "alignment" is generally used for alignments that either use a reference sequence or run other programs such as Muscle or Clustal. All other alignments, which use CodonCode Aligner's built-in algorithms, are considered "assemblies" and are created using the various "Assemble" commands in CodonCode Aligner.