This guide explains how to compare nucleotide sequences in CodonCode Aligner using pairwise alignments. It also covers dot plots and other visual tools for verifying and interpreting results, and for inspecting mismatches, gaps, and sequence quality in local and global pairwise alignments.
Pairwise sequence alignment is the process of comparing two DNA, RNA, or protein sequences to identify regions of similarity. These similarities can reveal evolutionary relationships, functional conservation, or sequencing errors. By aligning two sequences, scientists can detect insertions, deletions, and substitutions that help interpret experimental results or validate assemblies.
Pairwise sequence alignment compares two sequences to reveal regions of similarity and difference.
But for an alignment to be meaningful, the sequences need to share some degree of relatedness. Pairwise sequence alignment works best for sequences that are similar enough to have conserved regions — for example, two versions of the same gene, homologous genes from related species, or overlapping sequencing reads from the same DNA region. If the sequences are too different, the alignment may show many gaps or no clear matches.
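To make the idea of aligning two related sequences concrete, here is a minimal sketch of a global pairwise alignment using the textbook Needleman-Wunsch method. It is an illustration only and not a description of CodonCode Aligner's own algorithm; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Textbook Needleman-Wunsch global alignment (illustrative scoring values).

    Returns (aligned_a, aligned_b, score); gaps are written as '-'.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print("score:", total)
```

With these scoring values, two sequences that differ by a single missing base align with one gap, while unrelated sequences produce many gaps and mismatches and a low score, which mirrors the behavior described above.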
Once you have selected two related sequences, you can align them directly in CodonCode Aligner to visualize similarities and differences. The software provides simple features to perform the alignment, display mismatches, and explore quality information at each position. In the following steps, you will learn how to create a pairwise alignment and interpret the results.
Note: To use this dataset, unpack the downloaded ZIP file, and open the "pairwise-alignment.ccap" project.
To create a pairwise alignment in CodonCode Aligner, select the sequences to align, and then click on the Assemble button in the project view toolbar, or choose Contig → Assemble from the menu.
You do not need to worry about sequence orientation. To avoid problems from sequences that are in the wrong orientation, CodonCode Aligner examines the orientation of all sequences before starting the alignment, and reverse-complements sequences as needed.
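One simple way to picture an orientation check is to count short words shared with a reference in both orientations and keep the orientation with more hits. The sketch below assumes this word-counting heuristic for illustration; it is not taken from CodonCode Aligner, and the function names and the default word length k are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def words(seq, k):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, query, k=8):
    """Return (sequence, flipped): query in the orientation that shares more
    k-letter words with reference. Illustrative heuristic only."""
    ref_words = words(reference.upper(), k)
    flipped_query = reverse_complement(query)
    forward_hits = len(ref_words & words(query.upper(), k))
    reverse_hits = len(ref_words & words(flipped_query.upper(), k))
    if reverse_hits > forward_hits:
        return flipped_query, True
    return query, False


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
    read = reverse_complement(reference[5:25])  # simulate a reverse read
    oriented, flipped = orient_like(reference, read, k=6)
    print(oriented, flipped)
```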
When the alignment is completed, the resulting contig can be found in the project view. Double-click on the contig to open the contig view window:
The contig view shows an overview of the aligned sequences on top, which also highlights the differences. The lower section shows the aligned bases and allows manual editing of the alignment.
After aligning two sequences, CodonCode Aligner makes it easy to see where they differ. Differences such as mismatched bases, insertions, and deletions are highlighted directly in the alignment view, allowing you to quickly spot variations between the sequences.
You can scroll through the alignment to inspect each position, zoom in on specific regions, or use built-in tools to navigate from one difference to the next. These features help you confirm sequence similarity, check for possible errors, and better understand how the two sequences compare overall.
There are several ways to find and inspect the differences in your alignments:
The contig overview shows the aligned sequences and highlights differences using base colors. Zoom in and use mouse overs for details of the differences. Click on a position in the overview to navigate to this position in any view.
See just the differences and their positions. The difference table can be shown at the top of the contig view instead of the overview. Apply filters, such as excluding N's, and use the table to navigate to positions of interest.
Run mutation detection to get a list of the mutations and their amino acid effects. Mutations are highlighted in the alignment and mouse overs show relevant information.
Use the feature Mask Bases Matching Consensus to easily spot and focus in on differences in the contig view. This feature can be found in the View menu.
Define your regions of interest (features) in CodonCode Aligner's preferences, then use the Previous Feature and Next Feature buttons to quickly navigate from one defined feature to the next.
Look at the chromatograms to verify differences. You can also use the base qualities to get an idea of the accuracy of the base call. Double-click on a position in the contig to see the traces at this location.
Use base colors and sequence translations to spot differences. To see bases and translations at the same time, you can, for example, view bases with a translation-colored background. Background colors can be set in the Base Color settings of CodonCode Aligner.
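The idea behind a difference table with an "exclude N's" filter can be sketched in a few lines. The example below assumes two already aligned sequences of equal length with '-' marking gaps; the function name, the row layout, and the labels are hypothetical and do not reflect how CodonCode Aligner builds its table internally.

```python
def difference_table(aligned_a, aligned_b, exclude_n=True):
    """List positions where two aligned sequences differ.

    Both inputs must have the same length, with '-' marking gaps.
    Returns rows of (position, base_a, base_b, kind) with 1-based positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    rows = []
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    for position, (x, y) in enumerate(pairs, start=1):
        if x == y:
            continue  # identical column, not a difference
        if exclude_n and "N" in (x, y):
            continue  # filter: skip columns involving an ambiguous N
        if x == "-":
            kind = "gap in first"
        elif y == "-":
            kind = "gap in second"
        else:
            kind = "mismatch"
        rows.append((position, x, y, kind))
    return rows


if __name__ == "__main__":
    for row in difference_table("ACGTNAC-T", "ACCTAACGT"):
        print(row)
```

Passing exclude_n=False keeps columns that contain N, which is useful when ambiguous base calls are themselves of interest.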
Dot plots provide an intuitive way to visualize the relationship between two sequences. Instead of viewing each base pair alignment in text form, a dot plot represents sequence similarity as a grid, with one sequence plotted along the x-axis and the other along the y-axis. Wherever the sequences match, dots appear — forming diagonal lines for regions of high similarity.
In CodonCode Aligner, dot plots are a quick way to assess how well two sequences align. Continuous diagonal lines indicate strong similarity, while breaks, shifts, or parallel lines can reveal insertions, deletions, repeats, or inversions. By examining the overall pattern, you can immediately see whether your sequences are related and whether there are any large-scale differences worth exploring in more detail.
To display a dot plot for selected sequences, choose Dot Plot from the Tools menu:
The bottom of a dot plot window shows the aligned bases for the base positions selected in the plot above. Matching bases are shown with a light blue background. The selected position is highlighted by blue crosshairs in the dot plot. The crosshairs can be set by clicking on the plot with the mouse or by using the arrow keys.
At the top of the dot plot window, you can change the word size and the zoom, and choose whether the reverse complement of the vertical sequence should be included in the plot. The word size is the word length used when finding matching positions. Increase the word size to get rid of unwanted noise, and reduce it to see more matches. Generally, it also makes sense to use a larger word size for longer sequences. You can zoom in and out with the + and - buttons at the top. The current zoom level is displayed in pixels per base.
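The effect of the word size can be illustrated with a small sketch that computes the dot positions for a plot: a dot is placed wherever a word of the chosen length occurs in both sequences. This is a generic exact-word-match approach assumed for illustration, not CodonCode Aligner's implementation, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def dot_plot_matches(horizontal, vertical, word_size=4):
    """Return (x, y) positions (0-based) where a word of length word_size
    starting at x in horizontal equals the word starting at y in vertical."""
    # Index every word of the horizontal sequence by its start positions.
    index = defaultdict(list)
    for x in range(len(horizontal) - word_size + 1):
        index[horizontal[x:x + word_size]].append(x)

    # Look up each word of the vertical sequence in the index.
    dots = []
    for y in range(len(vertical) - word_size + 1):
        for x in index.get(vertical[y:y + word_size], ()):
            dots.append((x, y))
    return dots


if __name__ == "__main__":
    seq_h = "ACGTACGT"
    seq_v = "ACGT"
    print(len(dot_plot_matches(seq_h, seq_v, word_size=2)), "dots at word size 2")
    print(len(dot_plot_matches(seq_h, seq_v, word_size=4)), "dots at word size 4")
```

In this toy example the repeat in the horizontal sequence shows up as two parallel diagonals, and raising the word size reduces the number of dots, which is the noise-reduction effect described above.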
A dot plot can be shown for one or two samples, but you can also generate and show several dot plots at once by selecting multiple samples. You will be given the option to choose which of the selected samples should be displayed horizontally and which vertically. Each horizontal sample will be compared to each vertical sample. This allows you, for example, to compare dot plots for several partial sequences to a reference sequence.
Pairwise sequence alignment can also be used to align overlapping forward and reverse reads. CodonCode Aligner automatically flips (reverse complements) sequences if needed for an alignment. Sequence orientation can be seen in the contig view for each alignment. Here is an example of a pairwise alignment of the forward and reverse reads for the same specimen in our example project:
CodonCode Aligner automatically reverse complemented one of the two reads to create the alignment. Forward sequences are shown in blue in the contig overview, reverse sequences in orange.
You can use a dot plot or a quick pairwise alignment to check similarity. Closely related sequences produce a clear diagonal line in a dot plot or align with few gaps. If the plot shows only scattered dots or the alignment contains long gaps, the sequences may not be homologous or may need to be trimmed first.
If the alignment fails, the information area at the bottom of the project view tells you more. Clicking on this area opens a status history dialog that explains why the alignment failed. Once you know the cause, you can use this information to solve the problem (for example, by decreasing the word length or adjusting the gap penalty).
Large numbers of gaps or mismatches usually mean the sequences include non-overlapping regions. Try trimming poor-quality ends, checking for large stretches of ambiguous bases, adjusting the alignment settings, or confirming that both sequences are similar enough (e.g. come from the same gene or region) before aligning again.
Yes. CodonCode Aligner can automatically detect and reverse complement sequences when needed, so it does not matter which orientation sequences are in before the alignment. You can also reverse complement a sequence manually using the Edit → Reverse Complement command.
Yes. You can save your alignment as part of a CodonCode Aligner project, export it as NEXUS/PAUP, Phylip, ACE, FASTA, FASTQ, GenBank, or EMBL, or print images for use in reports and presentations.
Our fine tuning sequence assembly page contains detailed descriptions of all algorithms and preference options. Note that you want to change the Assembly settings (not the Alignment settings) for regular pairwise sequence alignments (if you are not using a reference sequence).
In CodonCode Aligner, the term "alignment" is generally used for alignments that either use a reference sequence or run other programs such as Muscle or Clustal. All other alignments, which use CodonCode Aligner's built-in algorithms, are considered "assemblies" and are created using the various "Assemble" commands in CodonCode Aligner.