The PHRED - PHRAP Package
The programs Phred, Phrap, Cross_match, Swat, and Consed were developed by Dr. Phil Green and co-workers at the University of Washington in Seattle. CodonCode Corporation has acquired the distribution rights for Phrap, Phred, Cross_match, Swat, and Consed. (Academic users can still obtain the Phred-Phrap source code directly from the authors, free of charge, for restricted academic use.)
Since Phred and Phrap were developed for easy integration into automated data processing pipelines, the programs do not offer a graphical user interface. If you are interested in running Phred and Phrap from a graphical user interface on Windows or Mac OS X, we suggest that you try CodonCode Aligner.
This page gives a brief description of Phred, Cross_match, and Phrap. For additional information about the Phred - Phrap package for Windows, Mac OS X, and Unix, please visit www.phrap.com.
Phred is a base-calling program for DNA sequence traces. The program was developed by Drs. Phil Green and Brent Ewing, and is copyrighted by the University of Washington. It is widely used by the largest academic and commercial sequencing laboratories. Two major reasons why Phred is used by leading sequencers are:
- Better base-calling accuracy. Phred achieved 40-50% lower error rates than ABI software on large test data sets (Ewing, Hillier, Wendl & Green (1998), Genome Research 8: 175-185).
- Error probabilities for each base call. The highly accurate error probabilities that Phred calculates for each base enable increased automation of the sequencing process, for example:
  - Effective quality control immediately after sequence production.
  - Quantitative benchmarking of different sequencing methods and protocol changes.
  - Identification of repeat sequences during assembly.
  - More accurate consensus sequences.
  - Automatic identification of areas that require "finishing" efforts.
  - Drastically lower false positive error rates in mutation detection.
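To make the per-base error probabilities concrete: Phred quality values are conventionally defined as Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. The short Python sketch below converts between the two and estimates the expected number of errors in a read. The function names are illustrative only and are not part of the Phred software.

```python
import math

def error_probability(quality):
    """Convert a Phred quality value Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)

def quality_from_probability(p):
    """Convert an error probability P into a Phred quality value Q = -10*log10(P)."""
    return -10 * math.log10(p)

def expected_errors(qualities):
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)

if __name__ == "__main__":
    # Hypothetical quality values for a five-base stretch of a read.
    read_qualities = [10, 20, 30, 40, 40]
    for q in read_qualities:
        print(f"Q{q}: P(error) = {error_probability(q):.4f}")
    print(f"Expected errors: {expected_errors(read_qualities):.4f}")
```

Under this definition, a quality of 20 corresponds to one expected error in 100 base calls, and a quality of 30 to one in 1000, which is why simple thresholds on these values can be used for quality control right after sequence production.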
For more information about Phred, please visit www.phrap.com/phred/.
Phrap, a leading program for DNA sequence assembly, was developed by Dr. Phil Green and is copyrighted by the University of Washington. Compared to other sequence assembly programs, PHRAP offers:
- Fast assemblies. Assemblies of cosmid- to BAC-sized projects with several hundred to two thousand reads typically take only minutes to complete on high-powered workstations or personal computers.
- Accurate consensus sequences. PHRAP uses PHRED's quality scores to determine highly accurate consensus sequences. PHRAP examines all individual sequences at a given position, and generally uses the highest quality sequence to build the consensus - similar to the way scientists would correct consensus sequences during "contig editing". Compared to the simple majority rule used in older sequence assembly programs, PHRAP's approach can give significantly more accurate consensus sequences, especially in regions of low coverage or regions of systematic errors like compressions.
- Consensus quality estimates. PHRAP uses the quality information of individual sequences to estimate the quality of the consensus sequence. In addition, PHRAP uses available information about sequencing chemistry (dye terminator or dye primer) and confirmation by "other strand" reads in estimating the consensus quality. This often allows scientists to ignore random errors and to focus finishing efforts exclusively on regions where the data quality is insufficient. Consensus quality estimates can also be very helpful in mutation detection by DNA sequencing (see Rieder, Taylor, Tobe & Nickerson (1998), Nucleic Acids Research 26: 967-973).
- Ability to assemble very large projects. PHRAP has been used routinely to assemble bacterial genomes sequenced by the "shotgun" approach, where each project contained tens of thousands of reads. Smaller bacterial genomes (2 million bases or less) could often be assembled in less than three hours.
- Improved identification and handling of repeats. PHRAP uses quality scores to estimate whether discrepancies between two overlapping sequences are more likely to arise from random errors, or from different copies of a repeated sequence. For repeats with 95 to 98% identity (like human Alu sequences) and high quality sequence data, this typically yields correct assemblies.
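The difference between a majority-rule consensus and a quality-driven consensus can be shown with a toy example. The Python sketch below is a deliberately simplified, per-column illustration and is not PHRAP's actual algorithm; the function names and the example data are hypothetical.

```python
from collections import Counter

def majority_consensus(column):
    """Pick the base that occurs most often in the column, ignoring quality."""
    counts = Counter(base for base, _quality in column)
    return counts.most_common(1)[0][0]

def highest_quality_consensus(column):
    """Pick the base from the read with the highest quality value at this position."""
    best_base, _best_quality = max(column, key=lambda call: call[1])
    return best_base

if __name__ == "__main__":
    # One alignment column: (base call, quality value) from three overlapping reads.
    # Two low-quality reads say "A"; one high-quality read says "G".
    column = [("A", 8), ("A", 10), ("G", 45)]
    print("Majority rule:  ", majority_consensus(column))
    print("Highest quality:", highest_quality_consensus(column))
```

In this column the majority rule follows the two low-quality "A" calls, while the quality-driven choice follows the single high-quality "G" call. This is the kind of situation, such as a compression affecting several reads in the same way, where using quality information can give a more accurate consensus.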
While PHRAP is clearly one of the best, if not the best, sequence assembly programs currently available, it is also very complex. For very large projects, it is sometimes difficult to understand PHRAP's behaviour, or to determine how to use PHRAP most effectively. CodonCode Corporation offers consulting services for existing PHRAP users, which allow our clients to benefit from our experience in using PHRAP to assemble some of the largest sequencing projects done so far.
For purchases of PHRAP executables, please visit www.phrap.com.
Cross_match, also developed by Dr. Phil Green and copyrighted by the University of Washington, performs fast comparisons of DNA sequences. For example, the comparison of several hundred thousand bases of "raw" sequences to the sequence of an entire BAC typically takes less than one minute. Within the PHRED - PHRAP system, CROSS_MATCH is typically used for vector screening. Other common uses of CROSS_MATCH include:
- Identification of overlaps between contig ends after assembly with PHRAP or other assembly programs.
- Identification of potential repeat sequences in assemblies.
- Generation of error summaries and lists after completion of sequencing projects.
- Estimation of vector contamination in newly created libraries.