CodonCode Corporation
Better Software for DNA Sequencing

How CodonCode Aligner Detects Mutations in Sanger Sequences

CodonCode Aligner uses several heuristics and trace-based features to identify heterozygous point mutations in aligned Sanger sequencing data. The algorithm focuses on identifying secondary peaks and peak height patterns that indicate potential mutations.

Mutation Detection Workflow

When searching for heterozygous point mutations, CodonCode Aligner performs the following steps:

  1. Low-quality regions are excluded.
    Each sequence is scanned for low-quality regions at its start and end. Bases with quality below the threshold defined in the mutation detection preferences are marked with a dataNeeded tag and ignored.
  2. Regions with heterozygous indels are excluded.
    Aligner skips over any region marked with a heterozygoteIndel tag. Each tag is expected to extend to the start or end of the sample; a tag that does neither is treated as if it extended to the end. This step is skipped if "Look for heterozygous indels" is disabled in the preferences.
  3. Traces are scanned base by base.
    For each position in the consensus, Aligner examines all aligned traces. It looks for secondary peaks and intensity drops in both sequencing directions. For text sequences (no trace), only base differences from the consensus are considered.
  4. Noise filtering is applied.
    When a potential heterozygous base is found, Aligner compares that position across all samples to distinguish real secondary peaks from random noise. This noise filter can be turned off in the preferences.
  5. Classification is based on peak heights for primary and secondary peaks.
    Aligner classifies bases as homozygous or heterozygous based on secondary peak height and reduction in primary peak intensity. In some cases, a base may be marked heterozygous even without a clear drop in intensity—for example, if all traces at that position appear heterozygous.
  6. Tags are added to all samples at variant positions.
    If any sample at a given consensus position is classified as heterozygous, tags are added to all aligned samples, unless the option "Add tags only to mutated bases" is enabled in the preferences.
  7. Amino-acid level effects of mutations are noted in the tag comments.
    For both homozygous and heterozygous mutations, the amino-acid level effect is described in the tag comments, using the coding sequence annotation of the reference sequence. If no coding sequence annotation is available, the translation is assumed to start at the first base of the consensus sequence.
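
Step 7 can be illustrated with a small Python sketch. It is not CodonCode Aligner's code: the function name describe_effect, its parameters, and the wording of the returned descriptions are hypothetical. It assumes an uppercase reference, the standard genetic code, a single uninterrupted coding region, and IUPAC two-base codes for heterozygous calls. Leaving cds_start at 0 mirrors the fallback of translating from the first base of the consensus.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC codes for a mixture of two bases (heterozygous calls).
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def describe_effect(reference, position, called_base, cds_start=0):
    """Describe the amino-acid effect of a base call (hypothetical helper).

    reference   -- uppercase reference/consensus sequence
    position    -- 0-based index of the variant base in the reference
    called_base -- A/C/G/T for a homozygous call, or an IUPAC two-base code
    cds_start   -- 0-based index of the first coding base (0 = no annotation)
    """
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies before the coding sequence")
    codon_index = offset // 3
    codon_start = cds_start + codon_index * 3
    ref_codon = reference[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("incomplete codon at the end of the reference")

    ref_aa = CODON_TABLE[ref_codon]
    within = position - codon_start  # 0, 1, or 2
    effects = []
    for allele in IUPAC_PAIRS.get(called_base, called_base):
        if allele == reference[position]:
            continue  # this allele matches the reference, nothing to report
        mutant_codon = ref_codon[:within] + allele + ref_codon[within + 1:]
        mutant_aa = CODON_TABLE[mutant_codon]
        if mutant_aa == ref_aa:
            kind = "silent"
        elif mutant_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        effects.append(f"{ref_aa}{codon_index + 1}{mutant_aa} ({kind})")
    return effects
```

For the reference ATGGAAGTT (Met, Glu, Val), a heterozygous A/G call (IUPAC code R) at 0-based position 4 changes the second codon from GAA to GGA on one allele, so describe_effect("ATGGAAGTT", 4, "R") returns ["E2G (missense)"].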

The sensitivity of mutation detection can be adjusted using the mutation detection preferences. However, keep in mind that CodonCode Aligner may still miss some heterozygous mutations or classify bases incorrectly—especially in low-quality regions or in traces with inconsistent peak shapes.

Algorithm Background

CodonCode Aligner follows the general approach originally described by Dr. Nickerson and colleagues in the PolyPhred algorithm for detecting heterozygous mutations in Sanger sequencing traces.

A key insight from their work was that combining secondary peak height with a reduction in the primary peak height can significantly improve detection accuracy. In their study, this approach yielded accuracies of 90% to over 99%, depending on the sequencing chemistry, which is substantially better than methods that rely only on secondary peaks.
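
The idea of combining both signals can be sketched in Python as follows. This is a simplified illustration, not the PolyPhred or CodonCode Aligner implementation: the function name classify_base, the two threshold values, and the use of the median primary peak height of other samples as the baseline are all assumptions made for the example.

```python
from statistics import median


def classify_base(primary, secondary, other_primaries,
                  min_secondary_ratio=0.25, min_primary_drop=0.30):
    """Classify one trace position from its peak heights (hypothetical helper).

    primary         -- height of the called (primary) peak in this sample
    secondary       -- height of the largest secondary peak in this sample
    other_primaries -- primary peak heights at the same position in other
                       samples, used as the expected homozygous height
    Both thresholds are illustrative values, not Aligner defaults.
    """
    if primary <= 0 or not other_primaries:
        return "undetermined"

    # Signal 1: how large the secondary peak is relative to the primary peak.
    secondary_ratio = secondary / primary

    # Signal 2: how far the primary peak has dropped below the typical height
    # seen in the other samples at this position.
    expected = median(other_primaries)
    primary_drop = 1 - primary / expected if expected > 0 else 0.0

    has_secondary = secondary_ratio >= min_secondary_ratio
    has_drop = primary_drop >= min_primary_drop

    if has_secondary and has_drop:
        return "heterozygous"
    if has_secondary:
        # A secondary peak without a drop in the primary peak is weaker evidence.
        return "possible heterozygous"
    return "homozygous"
```

With other samples showing primary peaks near 1000, a sample with a primary peak of 500 and a secondary peak of 450 is classified as "heterozygous", while a sample with a primary peak of 1000 and a secondary peak of 400 is only "possible heterozygous" because the primary peak has not dropped.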

While CodonCode Aligner uses a similar principle for heterozygous SNP detection, actual performance may vary depending on sequence quality, base calling, and project-specific factors.

Limitations of Mutation Detection

CodonCode Aligner's mutation detection is designed for research use and is optimized for Sanger sequencing of PCR-amplified templates. While it is accurate in many common use cases, there are important limitations to be aware of:

Technical and Experimental Constraints

Classification Errors

Tool Behavior and Settings

All mutation calls should be reviewed manually by a qualified scientist. This is especially important in critical regions, or when working near the limits of trace quality. Analysis must consider the possibility of both false-positive and false-negative errors.