How to Analyze Heterozygous Indels in Sanger Sequences
This guide explains how to analyze heterozygous insertions and deletions (indels) in Sanger sequences using CodonCode Aligner.
Introduction
In mutation screening and clinical research, heterozygous indels are often particularly important because they can cause frameshifts and loss of function. The screenshot below shows a typical Sanger sequencing trace with a heterozygous insertion. Because one allele contains extra bases, the two alleles are out of register from the insertion site onward, so the trace shows overlapping (mixed) peaks downstream of it.
Note that ambiguity base calls (e.g. K, M, R, S, W, Y) often appear starting at the indel site, and that sequence quality may drop sharply there, as indicated here by the green background.
Finding Heterozygous Indels
CodonCode Aligner can detect potential heterozygous insertions and deletions automatically. To use this feature, select one or more traces and choose Find Heterozygous Indels from the Sample menu.
Aligner will analyze the selected traces for patterns that are typical of heterozygous indels. When a putative indel is found, Aligner adds a heterozygoteIndel tag starting at the inferred indel position and extending to the end of the sequence. The results are displayed in a summary dialog and, if any indels were found, in a separate report window.
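As a rough illustration of the kind of pattern involved (this is not Aligner's detection algorithm), the Python sketch below scans a string of base calls for the first position where two-base ambiguity codes become frequent, which roughly corresponds to where a heterozygoteIndel tag would start. The function name, window size, and threshold are hypothetical values chosen for illustration.

```python
from typing import Optional

# Two-base IUPAC ambiguity codes: R=A/G, Y=C/T, S=C/G, W=A/T, K=G/T, M=A/C.
TWO_BASE_CODES = set("RYSWKM")


def find_mixed_onset(calls: str, window: int = 10, min_fraction: float = 0.5) -> Optional[int]:
    """Return the first index at which a run of mixed base calls begins.

    A position qualifies if it is itself an ambiguity code and at least
    `min_fraction` of the `window` calls starting there are ambiguity codes.
    Returns None if no position qualifies. `window` and `min_fraction` are
    hypothetical illustration values, not CodonCode Aligner settings.
    """
    calls = calls.upper()
    for start in range(len(calls) - window + 1):
        chunk = calls[start:start + window]
        if chunk[0] not in TWO_BASE_CODES:
            continue  # the onset itself must be a mixed call
        mixed = sum(1 for base in chunk if base in TWO_BASE_CODES)
        if mixed / window >= min_fraction:
            return start
    return None
```

For example, `find_mixed_onset("ACGTACGTAC" + "RYSWKMRYSW")` returns 10, while an isolated ambiguity code in otherwise clean calls is ignored. A real detector also uses the trace peaks and quality values, which this sketch does not look at.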
For each trace, Aligner provides two separate estimates of the indel size. The first estimate is based on analyzing the peaks in the traces directly. The second estimate is based on analyzing mixed base calls in the sequence. Often, both methods give the same result, as shown here for a one-base indel.
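To show how an estimate based on mixed base calls can work in principle, the sketch below tries each candidate shift and checks whether the calls downstream of the indel match the pair {reference base, base read by the shifted allele}. It assumes a reference sequence that shares coordinates with the sample up to the indel site, and that one allele matches the reference. The function and its parameters are hypothetical; this is not Aligner's own implementation.

```python
from typing import Optional, Tuple

# Bases represented by each IUPAC code (single bases plus two-base ambiguity codes).
BASES_FOR = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def estimate_indel_size(reference: str, calls: str, onset: int,
                        max_size: int = 25, min_positions: int = 10
                        ) -> Tuple[Optional[int], float]:
    """Estimate a heterozygous indel size from mixed base calls.

    Returns (size, score): size > 0 is an insertion, size < 0 a deletion,
    and score is the fraction of compared positions that were consistent.
    """
    reference, calls = reference.upper(), calls.upper()
    best_shift, best_score = None, -1.0
    for shift in range(-max_size, max_size + 1):
        if shift == 0:
            continue
        matches = total = 0
        for i in range(onset, min(len(calls), len(reference))):
            j = i + shift  # reference position read by the other allele
            observed = BASES_FOR.get(calls[i])
            if not 0 <= j < len(reference) or observed is None:
                continue  # out of range, or an N / unsupported code
            total += 1
            matches += observed == {reference[i], reference[j]}
        # Require enough compared positions so short overlaps cannot win by chance.
        if total >= min_positions and matches / total > best_score:
            best_shift, best_score = shift, matches / total
    # A deletion of k bases makes the other allele read k bases ahead (shift +k);
    # an insertion makes it read k bases behind (shift -k).
    size = -best_shift if best_shift is not None else None
    return size, max(best_score, 0.0)
```

The returned size is positive for an insertion and negative for a deletion. In repetitive sequence several shifts can fit equally well, which is one reason to compare this kind of estimate with a peak-based one.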
For best results, use traces that have not been end-clipped and that include base-specific quality scores. You can re-run base calling if needed to restore untrimmed sequences. Indel detection may fail if the indel is near the start or end of a sequence, or if sequence quality is very low.
Analyzing Indels
To analyze a heterozygous indel in CodonCode Aligner, the sequence must have a heterozygoteIndel tag. These tags can be added automatically using the Find Heterozygous Indels function, or manually.
To begin the analysis, select a contig that contains a sequence with a heterozygoteIndel tag, then choose Process Indels from the Contig menu.
CodonCode Aligner uses two complementary methods to model the sequence change. The first method, "trace subtraction", requires traces for a wild type sequence without a heterozygous indel. The wild type trace is scaled and stretched as necessary and then subtracted from the trace of the mutated sample to reveal the mutated allele. The second, text-based method works from the base calls and ambiguity codes of the sample and replaces its bases to represent the mutated allele.
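The idea behind trace subtraction can be sketched for a single fluorescence channel as follows. This is a simplified illustration, not Aligner's implementation: it assumes the wild type and heterozygous traces cover the same region, uses uniform linear stretching in place of whatever local alignment Aligner performs, and fits the scale factor by least squares on the data points upstream of the indel, where both alleles agree. The function names are hypothetical.

```python
from typing import List, Sequence


def resample(signal: Sequence[float], new_length: int) -> List[float]:
    """Linearly interpolate `signal` onto `new_length` evenly spaced points."""
    if new_length < 1 or not signal:
        raise ValueError("need a non-empty signal and new_length >= 1")
    if new_length == 1 or len(signal) == 1:
        return [float(signal[0])] * new_length
    step = (len(signal) - 1) / (new_length - 1)
    out = []
    for k in range(new_length):
        x = k * step
        i = min(int(x), len(signal) - 2)  # left neighbour, kept inside the array
        frac = x - i
        out.append(signal[i] * (1.0 - frac) + signal[i + 1] * frac)
    return out


def subtract_wild_type(het: Sequence[float], wt: Sequence[float],
                       indel_start: int) -> List[float]:
    """Estimate one channel of the mutated allele by trace subtraction.

    `indel_start` is the first data point of the indel region. Upstream of it
    both alleles agree, so the heterozygous signal should be a multiple of the
    wild type signal; that multiple is fitted by least squares.
    """
    wt_stretched = resample(wt, len(het))  # uniform "stretching" (simplification)
    up_het, up_wt = het[:indel_start], wt_stretched[:indel_start]
    denom = sum(w * w for w in up_wt)
    if denom == 0:
        raise ValueError("no wild type signal upstream of the indel")
    scale = sum(h * w for h, w in zip(up_het, up_wt)) / denom
    # Each allele contributes roughly half of the scaled signal, so half of the
    # scaled wild type trace is removed; negative residuals are clipped to zero.
    return [max(0.0, h - 0.5 * scale * w) for h, w in zip(het, wt_stretched)]
```

In practice the same subtraction would be applied to each of the four channels (A, C, G, T), and clean, evenly spaced residual peaks suggest that the scaling and stretching were adequate.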
The screenshot below shows the result of the trace subtraction method. From top to bottom, the traces represent the wild type sequence, the heterozygous sample, and the subtracted result. The clean, evenly spaced peaks in the subtracted trace indicate a successful subtraction.
The screenshot below shows a contig view after processing. It shows the alignment of the wild type sequence; the heterozygous indel sample, in which the bases were replaced using the text-based method; and the artificial sample that was created by trace subtraction.
The alignment shows that both the original indel sample and the artificial subtracted sample have a one-base T insertion. Note that the sequence of the "hetero_indel" sample, which was created using the text-based method, had two discrepancies after the insertion (AC instead of CA). The traces in this region show the cause of these errors: the double peaks here were offset by almost half a peak width, which led to incorrect base and ambiguity calls.
Splitting Heterozygous Indels
CodonCode Aligner can create separate traces for each allele by splitting a sequence with a heterozygous indel tag into two pseudoallele sequences. This can be useful when no suitable wild type reference is available, or when you want to double-check results from other indel analysis methods.
To split a sequence, select one or more samples with a heterozygoteIndel tag, then choose Split Heterozygous Indels from the Sample menu.
For each selected sequence, Aligner creates two new samples in the Unassembled Samples folder. These pseudoalleles represent a longer and a shorter version of the sequence, starting at the indel site.
The example below shows the result of splitting a heterozygous indel. The longer allele includes a two-base insertion compared to the shorter allele. Both traces begin at the indel site and continue independently, allowing visual inspection of the inferred alleles.
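The following sketch shows one way mixed base calls can, in principle, be separated into two pseudoalleles once the indel size is known. It is a constraint-propagation illustration under stated assumptions (a single indel, clean two-base ambiguity calls, calls starting at the indel site), not the algorithm used by Split Heterozygous Indels; the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Bases represented by each IUPAC code (single bases plus two-base ambiguity codes).
BASES_FOR = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def split_pseudoalleles(mixed_calls: str, indel_size: int) -> Tuple[str, str]:
    """Split calls downstream of an indel site into long and short alleles.

    Model: if L is the longer allele from the indel site onward, the shorter
    allele is L shifted by `indel_size`, so each call should equal
    {L[j], L[j + indel_size]}. A single-base call fixes both positions; a
    two-base call fixes one position once the other is known. Positions that
    cannot be resolved are returned as 'N'; conflicting calls are skipped.
    """
    k, n = indel_size, len(mixed_calls)
    if k < 1:
        raise ValueError("indel_size must be a positive number of bases")
    long_allele: List[Optional[str]] = [None] * (n + k)
    calls = mixed_calls.upper()
    changed = True
    while changed:  # propagate constraints until nothing new can be inferred
        changed = False
        for j, code in enumerate(calls):
            bases = BASES_FOR.get(code)
            if bases is None:
                continue  # N or unsupported code: no constraint here
            for known, unknown in ((j, j + k), (j + k, j)):
                if long_allele[unknown] is not None:
                    continue
                if len(bases) == 1:
                    long_allele[unknown] = next(iter(bases))
                    changed = True
                elif long_allele[known] in bases:
                    long_allele[unknown] = next(iter(bases - {long_allele[known]}))
                    changed = True
    filled = "".join(b if b is not None else "N" for b in long_allele)
    return filled[:n], filled[k:k + n]
```

Here `split_pseudoalleles("GWSTMYRSRKMY", 2)` returns `("GAGTCTACGGAT", "GTCTACGGATCC")`; the first two bases of the longer allele (GA) are the inferred insertion. Because each resolved base is used to infer the next one along its chain, a single miscalled base can propagate downstream, which is consistent with the caution that downstream differences in pseudoalleles may be artifacts.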
Splitting generally works well for small indels (up to about 25 bases) in regions where the trace peaks are clearly separated. For longer indels or poor-quality traces, the resulting sequences may contain missing or extra peaks, double peaks, or irregular spacing.
Note that the pseudoalleles are intended primarily for estimating indel size and manual verification. Differences observed downstream of the indel may not reflect real biological variation, and real differences may be assigned to the wrong allele.
Reviewing Results
After analyzing a heterozygous indel using Find Heterozygous Indels, Process Indels, or Split Heterozygous Indels, it's important to compare the indel size estimates. All three functions report an estimated size, either directly (Find) or based on the alignment (Process and Split). Differences between estimates are a red flag and should be reviewed carefully.
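When comparing estimates, it can help to derive the alignment-based size in a consistent way. The sketch below counts gap columns in one pairwise alignment (assuming a single indel event) and flags disagreement among several estimates. The function names, the sign convention (positive for insertions), and the estimate labels in the example are hypothetical.

```python
from typing import Dict, Optional


def indel_size_from_alignment(aligned_ref: str, aligned_sample: str, gap: str = "-") -> int:
    """Net indel size from a pair of aligned rows.

    Positive values mean bases inserted in the sample, negative values mean
    bases deleted from it. Assumes a single indel event.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned rows must have the same length")
    inserted = sum(1 for r, s in zip(aligned_ref, aligned_sample) if r == gap and s != gap)
    deleted = sum(1 for r, s in zip(aligned_ref, aligned_sample) if s == gap and r != gap)
    return inserted - deleted


def estimates_agree(estimates: Dict[str, Optional[int]]) -> bool:
    """True if all reported (non-None) size estimates are identical."""
    reported = {size for size in estimates.values() if size is not None}
    return len(reported) <= 1
```

For example, `indel_size_from_alignment("ACG-TAC", "ACGTTAC")` gives 1, and `estimates_agree({"find_peaks": 1, "split": 2})` returns False, which would be the cue for a closer visual review.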
To verify the indel size visually, you can simplify the trace view by hiding some of the traces. Click in the lower-left corner of a trace view panel to toggle its visibility. The screenshot below shows the clickable area for three traces, each highlighted with a red circle.
In the example above, the single-base T insertion is clearly visible in the hetero indel trace and the subtracted trace. After the insertion, single peaks in the wild type sequence are split into double peaks in the mutated sequence, since the template from one of the two chromosomes is now one base longer. The subtracted trace, which represents the pseudoallele with the insertion, shows only the second peak of each double peak in the mutated sample.
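The double-peak pattern follows directly from mixing the two alleles position by position. The sketch below computes the ambiguity code expected at each position of a 50:50 mixture of two allele sequences, which can be handy for checking what a given insertion should look like in the base calls. It ignores peak spacing and height differences, and the function name is hypothetical.

```python
# Bases represented by each IUPAC code (single bases plus two-base ambiguity codes).
BASES_FOR = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}
# Reverse lookup: set of bases -> IUPAC code.
CODE_FOR = {frozenset(bases): code for code, bases in BASES_FOR.items()}


def expected_mixed_calls(allele_a: str, allele_b: str) -> str:
    """Return the IUPAC call expected at each position of a 50:50 allele mixture."""
    n = min(len(allele_a), len(allele_b))
    return "".join(
        CODE_FOR[frozenset((allele_a[i].upper(), allele_b[i].upper()))]
        for i in range(n)
    )
```

`expected_mixed_calls("ACGTACGT", "ACGTTACGT")` returns `"ACGTWMSK"`: because the inserted T extends an existing T, the two alleles only diverge after that T, so the first mixed call (W) appears at index 4.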
Holding down Shift and/or Alt/Option while clicking lets you hide or show multiple traces at once:
- Click: Hide or show the selected trace
- Alt / Option + Click: Hide or show that trace in all samples
- Shift + Click: Hide all other traces in the current sample
- Shift + Alt / Option + Click: Hide all other traces in all samples
Tips and Limitations
CodonCode Aligner provides multiple tools for analyzing heterozygous indels, but it's important to understand their intended use and limitations.
- Use for sizing only: The primary purpose of these tools is to estimate the size of the heterozygous indel. Variations observed beyond the indel site, especially in split pseudoalleles, may be artifacts or may be assigned to the wrong allele.
- Size limits: All methods except trace subtraction are limited to indels of approximately 25 bases or fewer. Only the trace subtraction method can be used for larger indels.
- Single event assumption: All analysis methods assume that there is exactly one indel event in the sequence and that the other allele is identical to the reference, consensus, or wild type sequence.
- Common reasons for failure:
  - Poor-quality sequences
  - Indels very close to the start or end of a sequence
  - Multiple indels in the same region
  - Multiple point mutations close to the indel site
Careful visual review of the results is always strongly recommended, and is especially important when size estimates from different methods disagree.
Related Resources
📚 Learning Center: Using CodonCode Aligner
🏔️ Overview: Heterozygous Indel Analysis
🎬 Video Tutorial: Heterozygous Indel Analysis