The Institute for Systems Biology

Interspersed Repeat Masking Based on Protein Similarity


Query DNA sequences are compared to a database of proteins encoded by transposable elements. Because the comparison is made at the protein level, copies of non-coding transposable elements, such as SINEs and the long terminal repeats (LTRs) of retrovirus-like elements, will not be masked. The masked sequence is therefore not yet suitable for DNA-based comparisons. However, this approach is especially useful when no repeat library is available for the query species and the main concern is avoiding spurious matches in BLASTX-like searches. It is also much faster than the DNA-based approach. False positives are rare, but be aware that some genes are derived from transposable elements and may be masked as well.
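A minimal sketch of the final masking step, assuming the protein-level search has already been run and has reported the query coordinates of each hit to a transposable-element protein. It does not perform the search itself. The function name `mask_protein_hits`, the 1-based inclusive coordinate convention (as in BLAST tabular output, where hits in a reverse-strand frame have the start greater than the end), and the use of "N" as the mask character are all assumptions for illustration. Non-coding elements such as SINEs and LTRs would simply never appear in `hits`, which is why they stay unmasked.

```python
from typing import Iterable, Tuple


def mask_protein_hits(seq: str, hits: Iterable[Tuple[int, int]], mask_char: str = "N") -> str:
    """Hypothetical helper: mask query regions that matched TE-encoded proteins.

    hits: (qstart, qend) pairs, assumed 1-based and inclusive. For hits in a
    reverse-strand reading frame qstart may exceed qend, so each pair is sorted.
    """
    masked = list(seq)
    for qstart, qend in hits:
        lo, hi = sorted((qstart, qend))
        # Clip to the sequence bounds in case a hit runs past either end.
        lo = max(lo, 1)
        hi = min(hi, len(seq))
        # Convert 1-based inclusive [lo, hi] to 0-based indices lo-1 .. hi-1.
        for i in range(lo - 1, hi):
            masked[i] = mask_char
    return "".join(masked)


query = "ACGTACGTACGTACGTACGT"
# One forward-frame hit and one reverse-frame hit (start > end).
print(mask_protein_hits(query, [(4, 9), (18, 13)]))  # ACGNNNNNNCGTNNNNNNGT
```

Sorting each coordinate pair lets forward- and reverse-frame hits be handled the same way, since masking only cares about which query bases were covered.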

