CodonCode Corporation
Better Software for DNA Sequencing

How to Analyze Heterozygous Indels in Sanger Sequences

This guide explains how to analyze heterozygous insertions and deletions (indels) in Sanger sequences using CodonCode Aligner.

Introduction

In mutation screening and clinical research, heterozygous indels are often particularly important due to their potential to cause frameshifts and loss of function. The screenshot below shows a typical Sanger sequencing trace with a heterozygous insertion. Overlapping peaks appear where one allele contains an insertion, which causes the two sequences to become misaligned, leading to mixed peaks downstream.

Sanger sequence trace showing a heterozygous insertion

Note that ambiguity base calls (e.g. K, S, Y, W, R) often appear starting at the indel site, and the sequence quality may drop sharply, as indicated here by the green background.
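
To make the pattern concrete, here is a minimal Python sketch that lists the standard IUPAC codes for two-base mixtures and looks for the first position where such codes start to cluster. It is an illustration only and not how CodonCode Aligner detects indels; the function name first_mixed_run and its window and min_mixed parameters are hypothetical.

```python
# Standard IUPAC codes for a mixture of exactly two bases.
IUPAC_TWO_BASE = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
    "S": {"C", "G"},
    "W": {"A", "T"},
}


def first_mixed_run(sequence, window=10, min_mixed=4):
    """Return the 0-based index where mixed base calls start to cluster.

    Hypothetical heuristic: report the first two-base ambiguity code that
    starts a stretch of `window` bases containing at least `min_mixed`
    such codes (the starting base included). Returns None otherwise.
    """
    seq = sequence.upper()
    for i, base in enumerate(seq):
        if base not in IUPAC_TWO_BASE:
            continue
        mixed = sum(1 for b in seq[i:i + window] if b in IUPAC_TWO_BASE)
        if mixed >= min_mixed:
            return i
    return None


# Clean sequence for 10 bases, then mostly mixed calls.
print(first_mixed_run("ACGTACGTACKSYWRMACKS"))  # 10
# A single isolated ambiguity code is not reported.
print(first_mixed_run("ACGTRACGTACGTACGT"))     # None
```

An isolated ambiguity code is more likely a heterozygous point mutation or a base-calling artifact, which is why the sketch requires several mixed calls close together.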

Finding Heterozygous Indels

CodonCode Aligner can detect potential heterozygous insertions and deletions automatically using the Find Heterozygous Indels function in the Sample menu. To use it, select one or more traces and choose Find Heterozygous Indels from the Sample menu.

Aligner will analyze the selected traces for patterns that are typical of heterozygous indels. When a putative indel is found, Aligner adds a heterozygoteIndel tag starting at the inferred indel position and extending to the end of the sequence. The results are displayed in a summary dialog and, if any indels were found, a separate report window:

Result dialog showing heterozygous indel detection results

For each trace, Aligner provides two separate estimates of the indel size. The first estimate is based on analyzing the peaks in the traces directly. The second estimate is based on analyzing mixed base calls in the sequence. Often, both methods give the same result, as shown here for a one-base indel.
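
The idea behind an estimate from mixed base calls can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Aligner's actual algorithm: it assumes the mixed calls and the wild type sequence share the same coordinates, and the names estimate_shift and simulate_mixed_calls as well as the max_size parameter are hypothetical. For each candidate shift, it checks how often the mixed call at a position equals the combination of the wild type base and the wild type base at the shifted position.

```python
# IUPAC codes for single bases and two-base mixtures.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def simulate_mixed_calls(allele_a, allele_b):
    """Toy helper: combine two alleles base by base into IUPAC calls."""
    return "".join(CODE_FOR[frozenset((a, b))] for a, b in zip(allele_a, allele_b))


def estimate_shift(wild_type, mixed_calls, indel_start, max_size=10):
    """Return (shift, score) for the shift that best explains the mixed calls.

    A negative shift suggests an insertion in the second allele, a positive
    shift a deletion; abs(shift) is the estimated indel size. The score is
    the fraction of compared positions that the shift explains.
    """
    n = min(len(wild_type), len(mixed_calls))
    best_shift, best_score = None, -1.0
    for shift in range(-max_size, max_size + 1):
        if shift == 0:
            continue
        matches = total = 0
        # Skip the first max_size positions, which may hold inserted bases.
        for i in range(indel_start + max_size, n):
            j = i + shift
            if not 0 <= j < len(wild_type):
                continue
            observed = set(IUPAC.get(mixed_calls[i], ""))
            expected = {wild_type[i], wild_type[j]}
            total += 1
            if observed == expected:
                matches += 1
        score = matches / total if total else 0.0
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


wild_type = "ACGTTGCAAGCTTAGGCTAACGTCCATGGATCCGATTACA"
mutant = wild_type[:10] + "T" + wild_type[10:]  # one-base insertion at index 10
mixed = simulate_mixed_calls(wild_type, mutant)
print(estimate_shift(wild_type, mixed, indel_start=10, max_size=3))  # (-1, 1.0)
```

On real data the score would rarely reach 1.0, since offset double peaks and low-quality regions produce wrong or missing ambiguity calls.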

For best results, use traces that have not been end clipped and that include base-specific quality scores. You can re-run base calling if needed to restore untrimmed sequences. Indel detection may fail if the indel is near the start or end of a sequence, or if sequence quality is very low.

Analyzing Indels

To analyze a heterozygous indel in CodonCode Aligner, the sequence must have a heterozygoteIndel tag. These tags can be added automatically using the Find Heterozygous Indels function, or manually.

To begin the analysis, select a contig that contains a sequence with a heterozygoteIndel tag, then choose Process Indels from the Contig menu.

CodonCode Aligner uses two complementary methods to model the sequence change. The first method, "trace subtraction", requires traces for a wild type sequence without a heterozygous indel. This wild type sequence is then subtracted from the mutated sequence, after scaling and stretching as necessary, to reveal the mutated allele.
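
The principle of trace subtraction can be illustrated with a small Python sketch. It is a toy model under strong assumptions and not Aligner's implementation: each trace is a dictionary with one intensity list per channel, both traces are already aligned point by point (so no stretching is needed), and the wild type allele contributes half of the mixed signal. All function names and the allele_fraction parameter are hypothetical.

```python
CHANNELS = "ACGT"


def fit_scale(sample, wild_type, end):
    """Least-squares factor mapping wild type intensities onto the sample,
    fitted over trace points [0, end) of all four channels."""
    num = sum(sample[c][i] * wild_type[c][i] for c in CHANNELS for i in range(end))
    den = sum(wild_type[c][i] ** 2 for c in CHANNELS for i in range(end))
    return num / den if den else 0.0


def subtract_wild_type(sample, wild_type, indel_start, allele_fraction=0.5):
    """Subtract the scaled wild type trace from the heterozygous sample.

    The scale is fitted upstream of the indel, where both alleles agree.
    Negative results are clipped to zero.
    """
    weight = fit_scale(sample, wild_type, indel_start) * allele_fraction
    return {
        c: [max(0.0, s - weight * w) for s, w in zip(sample[c], wild_type[c])]
        for c in CHANNELS
    }


def toy_trace(sequence, height):
    """Toy trace with one point per base and a single peak per point."""
    return {c: [height if b == c else 0.0 for b in sequence] for c in CHANNELS}


def add_traces(first, second):
    """Point-by-point sum of two traces."""
    return {c: [x + y for x, y in zip(first[c], second[c])] for c in CHANNELS}


def call_bases(trace):
    """Call the strongest channel at each trace point."""
    length = len(trace["A"])
    return "".join(max(CHANNELS, key=lambda c: trace[c][i]) for i in range(length))


wild_seq = "GATCAGTCCA"
mutant_seq = "GATCTAGTCC"  # one-base T insertion at index 4, same length
sample = add_traces(toy_trace(wild_seq, 40.0), toy_trace(mutant_seq, 40.0))
subtracted = subtract_wild_type(sample, toy_trace(wild_seq, 100.0), indel_start=4)
print(call_bases(subtracted))  # GATCTAGTCC
```

Real traces have broad, overlapping peaks with varying heights and spacing, which is why scaling and stretching are needed before the subtraction gives clean peaks.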

The screenshot below shows the result of the trace subtraction method. From top to bottom, the traces represent the wild type sequence, the heterozygous sample, and the subtracted result. The clean, evenly spaced peaks in the subtracted trace indicate a successful subtraction.

Trace view showing wild type, heterozygous indel, and subtraction result

The second method that CodonCode Aligner uses to obtain the sequence of the mutated allele is text-based. It first generates mixed base calls at all positions in the indel region where two peaks are present, and then removes the base that corresponds to the wild type sequence. Unlike the trace subtraction method, this method works even without wild type traces, as long as an accurate reference or consensus sequence is available.
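
The removal step of the text-based method can be sketched in a few lines of Python. This is an illustration of the idea rather than Aligner's code; it assumes the mixed calls and the wild type sequence are already aligned position by position, and the function name remove_wild_type is hypothetical.

```python
# IUPAC codes for single bases and two-base mixtures.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
}


def remove_wild_type(mixed_calls, wild_type):
    """Infer the second allele by removing the wild type base from each call.

    Positions that cannot be resolved (unknown codes, or mixtures that do
    not contain the wild type base) are reported as N.
    """
    allele = []
    for call, wt_base in zip(mixed_calls.upper(), wild_type.upper()):
        bases = set(IUPAC.get(call, ""))
        remaining = bases - {wt_base}
        if bases == {wt_base}:
            allele.append(wt_base)          # unmixed call, both alleles agree
        elif len(remaining) == 1:
            allele.append(remaining.pop())  # the base left after removal
        else:
            allele.append("N")              # ambiguous or inconsistent call
    return "".join(allele)


# Wild type GATCAGTCCA mixed with an allele carrying a one-base T insertion.
print(remove_wild_type("GATCWRKYCM", "GATCAGTCCA"))  # GATCTAGTCC
```

Because the result depends entirely on the mixed base calls, any wrong ambiguity call in the indel region carries over directly into the inferred allele.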

The screenshot below shows a contig view after processing. It shows the alignment of the wild type sequence, the heterozygous indel sample where the bases were replaced using the text-based method, and the artificial sample that was created by trace subtraction.

Contig view showing results of heterozygous indel processing

The alignment shows that both the original indel sample and the artificial subtracted sample have a one-base T insertion. Note that the sequence of the "hetero_indel" sample, which was created using the text-based method, had two discrepancies after the insertion (AC instead of CA). If you look at the traces in this region, you can see the cause of this error: the double peaks were offset by almost half a peak width, which caused errors in the base and ambiguity calling.

Splitting Heterozygous Indels

CodonCode Aligner can create separate traces for each allele by splitting a sequence with a heterozygous indel tag into two pseudoallele sequences. This can be useful when no suitable wild type reference is available, or when you want to double-check results from other indel analysis methods.

To split a sequence, select one or more samples with a heterozygoteIndel tag, then choose Split Heterozygous Indels from the Sample menu.

For each selected sequence, Aligner creates two new samples in the Unassembled Samples folder. These pseudoalleles represent a longer and a shorter version of the sequence, starting at the indel site.

The example below shows the result of splitting a heterozygous indel. The longer allele includes a two-base insertion compared to the shorter allele. Both traces begin at the indel site and continue independently, allowing visual inspection of the inferred alleles.

Trace view showing split pseudoalleles from a heterozygous indel

Splitting generally works well for small indels (up to about 25 bases) in regions where the trace peaks are clearly separated. For longer indels or poor-quality traces, the resulting sequences may contain missing or extra peaks, double peaks, or irregular spacing.

Note that the pseudoalleles are intended primarily for estimating indel size and manual verification. Differences observed downstream of the indel may not reflect real biological variation, and real differences may be assigned to the wrong allele.

Reviewing Results

After analyzing a heterozygous indel using Find Heterozygous Indels, Process Indels, or Split Heterozygous Indels, it's important to compare the indel size estimates. All three functions report an estimated size, either directly (for Find) or based on alignment (for Process and Split). Differences between estimates are a red flag and should be reviewed carefully.

To verify the indel size visually, you can simplify the trace view by hiding some of the traces. Click in the lower-left corner of a trace view panel to toggle its visibility. The screenshot below shows the clickable area for three traces, each highlighted with a red circle.

Trace view showing where to click to hide or show individual traces

In the example above, the single-base T insertion is clearly visible in the hetero indel trace and the subtracted trace. After the insertion, single peaks in the wild type sequence are split into double peaks in the mutated sequence, since the template from one of the two chromosomes is now one base longer. The subtracted trace, which represents the pseudoallele with the insertion, shows only the second peak from the double peaks in the mutated sample.

Pressing the shift key and/or the alt/option key while clicking to hide traces lets you hide multiple traces.

Tips and Limitations

CodonCode Aligner provides multiple tools for analyzing heterozygous indels, but it's important to understand their intended use and limitations.

Careful visual review of the results is always strongly recommended, and it is especially important when size estimates from different methods disagree.