The Institute for Systems Biology RepeatMasker

    Welcome!

    RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, the program currently masks almost 50% of a human genomic DNA sequence. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green.
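Once the repeats have been annotated, producing the masked sequence is a simple substitution. The Python sketch below is illustrative only, not RepeatMasker's code: it assumes each repeat is given as a (start, end) pair in 1-based, inclusive query coordinates, and the function name `mask_repeats` and its `mask_char` parameter are hypothetical.

```python
from typing import Iterable, Tuple


def mask_repeats(seq: str, repeats: Iterable[Tuple[int, int]], mask_char: str = "N") -> str:
    """Return a copy of seq in which every annotated repeat is replaced by mask_char.

    Each repeat is a (start, end) pair using 1-based, inclusive coordinates.
    Overlapping intervals are harmless: a position is simply masked twice.
    """
    masked = list(seq)
    for start, end in repeats:
        # Convert 1-based inclusive coordinates to a 0-based half-open range,
        # clipped to the sequence length.
        lo = max(start - 1, 0)
        hi = min(end, len(masked))
        for i in range(lo, hi):
            masked[i] = mask_char
    return "".join(masked)


print(mask_repeats("ACGTACGTAC", [(3, 5), (9, 10)]))  # ACNNNCGTNN
```

Working on a character list keeps each interval update cheap and avoids rebuilding the string once per repeat.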

    Latest News

    If you would like to keep up with news and announcements relating to RepeatMasker, you can subscribe to the new RepeatMasker Announcements List.

    RepeatModeler: RECON Bug?
    Wednesday May 7, 2008
    We have noticed a few times that RECON's "re-definition of elements" step ( eleredef ) appears to hang ( in a recent run it sat on this step for over 6 hours before we killed it ). If you also experience this problem while running RepeatModeler, please let us know. When we restarted the run from the beginning, it completed this step in a reasonable amount of time ( hg18: 1-5 minutes ).
    Pre-Masked Genomes Update - HG18 and MM9
    Tuesday May 6, 2008
    Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies HG18 and MM9. In addition to the data query service we have also provided the ability to download the complete annotation sets as compressed files.
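For local analysis of the downloaded annotation sets, a small reader is often handy. This sketch assumes the files follow RepeatMasker's usual whitespace-delimited .out layout (a few header lines, then one annotation per line whose first 11 columns are score, % divergence, % deletions, % insertions, query name, query begin, query end, (left), strand, repeat name and repeat class/family) and that they are gzip-compressed; both are assumptions, and `RepeatHit`, `parse_out` and `read_out_gz` are hypothetical names.

```python
import gzip
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RepeatHit:
    score: int         # Smith-Waterman alignment score
    perc_div: float    # percent divergence from the consensus
    query: str         # query sequence name, e.g. a chromosome
    begin: int         # 1-based start in the query
    end: int           # 1-based end in the query (inclusive)
    strand: str        # "+" or "C" (complement)
    repeat: str        # matching repeat name
    repeat_class: str  # repeat class/family


def parse_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per annotation line of a .out-style table."""
    for line in lines:
        fields = line.split()
        # Header and blank lines are skipped: data lines begin with an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield RepeatHit(
            score=int(fields[0]),
            perc_div=float(fields[1]),
            query=fields[4],
            begin=int(fields[5]),
            end=int(fields[6]),
            strand=fields[8],
            repeat=fields[9],
            repeat_class=fields[10],
        )


def read_out_gz(path: str) -> List[RepeatHit]:
    """Read a gzip-compressed .out file (the compression format is an assumption)."""
    with gzip.open(path, "rt") as handle:
        return list(parse_out(handle))
```

Only the first 11 columns are read because the repeat-coordinate columns that follow are ordered differently for complement-strand hits.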
    WUBlastXSearchEngine.pm Missing
    Friday May 2, 2008
    The WUBlastXSearchEngine.pm module was missing from yesterday's RepeatMasker release. This module is needed for the RepeatProteinMask program. Please re-download the 3.2.2 release of RepeatMasker if you had previously obtained it.
    RepeatProteinMask Released, RepeatModeler/RepeatMasker Updates
    Thursday May 1, 2008
    The program which runs the repeat protein search on the website is now available as a standalone program within the RepeatMasker package.

    Thanks to the Sanger Institute and various testers, we have patched a few bugs in the first RepeatModeler release. The fixes required a new version of RepeatMasker ( version 3.2.2 ), although the changes do not affect RepeatMasker results. If you have experienced problems installing the first release or running the repeat classifier, please download the latest RepeatMasker and RepeatModeler packages.

    RepeatModeler Beta: Repeat Discovery Workbench Released
    Wednesday April 16, 2008
    RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
    The software is available for download here.

    Also note that RepeatMasker is now up to version 3.2.1. This version as well as the previous version are organizational updates to support RepeatModeler and have little or no impact on RepeatMasker results.

    RepeatScout On Multiple Sequences
    Tuesday April 1, 2008
    RepeatScout is a highly successful de-novo repeat discovery algorithm developed by Price, A. L. et al. We have created a modified version of RepeatScout ( version 1.0.3 ) which supports searches on highly fragmented genomes. The new version does not attempt to extend seeds across sequence boundaries. Pavel Pevzner's lab has offered to host the download for this new version at: http://repeatscout.bioprojects.org/
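The change in RepeatScout 1.0.3 can be pictured as a boundary check during seed extension. The sketch below is a simplified, exact-match illustration of that idea only (RepeatScout itself is a C program whose extension step is more involved); the helper names `concatenate`, `boundary_after` and `extend_right` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def concatenate(seqs: List[str]) -> Tuple[str, List[int]]:
    """Join sequences end to end; also return each sequence's exclusive end offset."""
    ends: List[int] = []
    total = 0
    for s in seqs:
        total += len(s)
        ends.append(total)
    return "".join(seqs), ends


def boundary_after(ends: List[int], pos: int) -> int:
    """Exclusive end offset of the sequence that contains concatenated position pos."""
    return ends[bisect_right(ends, pos)]


def extend_right(text: str, ends: List[int], a: int, b: int) -> int:
    """Extend two seed occurrences at a and b to the right while bases match.

    Extension stops at a mismatch or at the first sequence boundary reached by
    either occurrence, so it never runs from one input sequence into the next.
    Returns the number of matching positions.
    """
    limit = min(boundary_after(ends, a) - a, boundary_after(ends, b) - b)
    n = 0
    while n < limit and text[a + n] == text[b + n]:
        n += 1
    return n


text, ends = concatenate(["TTACG", "TACGTAC"])
print(extend_right(text, ends, 2, 6))  # 3; without the boundary check the match would run to 6
```

On a highly fragmented assembly, many seeds sit near a contig end, which is why clipping the extension at the boundary matters.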

    Continuing with the theme of de-novo repeat identification I have added two recent papers on the topic by Saha et al. to the Related Papers page.

    De-Novo Repeat Discovery and Detection
    Thursday, February 28 2008
    A nice survey paper by Bergman and Quesneville appeared recently in Briefings In Bioinformatics ( "Discovering and detecting transposable elements in genome sequences", Vol 8, No 6, 382-392 ). Many software packages have been developed to study repetitive DNA, and this paper provides a succinct summary of each program's capabilities and their relationships to one another.
    Unexpected Downtime
    Thursday, February 28 2008
    At 3am last night the RepeatMasker cluster went down unexpectedly. We restored service at 10am this morning and are looking into the cause of the problem. If you had jobs queued/running at the time they will need to be resubmitted. We apologize for any inconvenience this may have caused.
    New Compute Node & Scheduler
    Wednesday, January 23 2008
    The RepeatMasker cluster received a new compute node ( for processing web requests ) and an upgrade to the job scheduler over the weekend. This upgrade should improve the overall throughput of the masking service.
    RepeatMasker open-3.1.9 Released
    Friday, January 11 2008
    A new version of RepeatMasker is available for download. Updates include:
    • Codebase synced to recently released library RMLib 20071204.
    • Improved DNA transposon fragment identification.
    • BugFix: The .align files generated by 3.1.8 did not contain cross_match style headers for each alignment.
    • BugFix: ProcessRepeats would occasionally exit in cycle 10 with the error: cycle 10 Can't call method "addDerivedFromAnnot" on an undefined value at ProcessRepeats line 3835. When this happened, no results were returned for the search.
    RepeatMasker Library Update
    Wednesday, December 12th, 2007
    A new version of the RepeatMasker repeat library ( RMLib: 20071204, RepBase: 12.06 ) is now available for download from GIRI.
    RepeatMasker Webserver Upgraded
    Wednesday, December 5th, 2007
    We replaced the main RepeatMasker webserver today with a new server containing two quad-core Xeon processors and 8GB of main memory. The new server also has more disk storage for expanding the set of cached genomes.
    RepeatMasker Evidence Reporting
    Thursday August 9th, 2007
    A new version of the RepeatMasker webservice has been installed. The new version produces an additional output file ( *.out.html ) that provides the evidence ( source HSPs ) for each final annotation call. The page uses the usual one-annotation-per-line format, with links ( the "+" preceding each line ) that expand the evidence data below the line. In addition to evidence reporting, the repeat names on this new page link to details about each type of repeat.
    Open 3.1.8 Bugfix
    A bug in ProcessRepeats causes the program to crash when rare transposon join scenarios are encountered. The error message looks like this: "join(): Invalid join!$this == $partner at ProcessRepeats line 8164." or this: "This violates recursion....Died at ProcessRepeats line 1828". The fix is to replace your Open-3.1.8 ProcessRepeats file with the one contained in the archive RepeatMasker-open-3-1-8-patch-2.tar.gz. NOTE: You may have to alter the first line in ProcessRepeats so that it correctly references the location of your Perl installation.
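The first line mentioned in the note is the script's interpreter ("#!") line. The helper below is a hypothetical convenience, not part of the RepeatMasker distribution: it points that line at a given Perl binary (or the first `perl` on your PATH) while keeping any flags such as `-w` that followed the old path.

```python
import shutil
from pathlib import Path
from typing import Optional


def rewrite_shebang(text: str, perl_path: str) -> str:
    """Return text with its "#!" line pointing at perl_path, keeping any flags (e.g. -w)."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # Drop the old interpreter path but keep whatever options followed it.
        flags = lines[0][2:].split()[1:]
        lines[0] = "#!" + " ".join([perl_path, *flags]) + "\n"
    else:
        lines.insert(0, f"#!{perl_path}\n")
    return "".join(lines)


def fix_script(script: str, perl_path: Optional[str] = None) -> None:
    """Rewrite the interpreter line of a script file in place."""
    if perl_path is None:
        perl_path = shutil.which("perl")
        if perl_path is None:
            raise FileNotFoundError("no perl interpreter found on PATH")
    path = Path(script)
    # Writing to the existing file keeps its permissions, including the executable bit.
    path.write_text(rewrite_shebang(path.read_text(), perl_path))


print(rewrite_shebang("#!/usr/local/bin/perl -w\nuse strict;\n", "/usr/bin/perl"), end="")
```

For example, running `fix_script("ProcessRepeats")` from the RepeatMasker directory would retarget the patched file at the first `perl` on the PATH; back up the file first.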
    RepeatMasker Official Release Available
    The recent beta version of RepeatMasker has been tested and is now ready for an official release. We assigned it the version "Open-3-1-8" because several minor bugs were fixed. You may download the release from the download page. The webserver is now running the updated version as well.
    RepeatMasker Beta Release Available
    A new version of RepeatMasker ( Open-3-1-7 ) is available for testing from the download page. This version includes a major refactoring of the ProcessRepeats code along with many bugfixes. Due to the volume of changes in this release, we are offering it as a downloadable beta release for a short period while we continue to test it. The webserver will continue to use open-3-1-6 until we are ready for the official release. Changes in the release include:
    • Repeat Defragmentation Improvements: The defragmentation stages of ProcessRepeats have been refactored improving the annotation of LTR, LINE and SINE repeats.
    • Metadata Migration: We have begun to move metadata ( subfamily relationships, consensus model relationships, genomic frequency, etc. ) out of the ProcessRepeats code. In the near future this will give researchers greater access to these detailed repeat characteristics and enable the same processing rules to be applied to custom-generated repeat libraries.
    • Bugfixes:
      • IS Element Bugfix: In certain cases the extraction of IS elements fails, causing the sequence indices to be off. The final result is an error message of the form: "ArrayList::get( -1 ) Index out of bounds!".
      • Division By Zero Bugfix: Under special circumstances ProcessRepeats produces an "Illegal division by zero at ProcessRepeats line 1860." error.
      • Long Sequence Names Bugfix: Sequence names longer than 20 characters can cause ProcessRepeats to fail. Thanks to Gordon Lack for finding and reporting this.
      • Negative Sequence Positions Bugfix: ProcessRepeats was reporting negative sequence positions in the final output file.
    RepeatMasker open-3.1.6 Released
    A new version of RepeatMasker is available for download. Included in this new release are several major improvements:
    • The repeat database is updated with 694 new entries and 147 improvements on existing ones, including RepBase version 11.06. Major advances were for ancient mammalian repeats, which are shared by all mammals or all eutherians, and for marsupial repeats, especially for the opossum. Other significant additions were for Chlamydomonas and Caenorhabditis briggsae.
    • The annotation of DNA Transposon fragments has been improved. A new method of joining related transposon fragments improves the classification of ambiguous fragments. More details here.
    • A new option ( -lcambig ) identifies DNA Transposon annotations that, in the absence of any supporting evidence, are ambiguously defined, i.e. the fragment falls within a non-unique portion of the family consensus. When this option is used, all ambiguous repeat names are printed in lowercase while the rest are printed in uppercase.
    • Fixed a bug affecting FASTA files that contain more than 60MB of sequence on a single line.
    • Updated the taxonomy database and added the "-tree" option to the queryRepeatDatabase.pl script. The new option prints out the taxonomic tree of all species contained in the RepeatMasker database with information on the number of repeat families defined at each level.
    • Several bugs have been fixed in the DateRepeats routine that appeared when large numbers were involved (e.g. analysis of whole-chromosome RepeatMasker output) and/or when the input is a concatenation of RepeatMasker outputs (in which repeat IDs are no longer necessarily unique).
    • Other improvements in DateRepeats include better labeling of ambiguously called repeats, correct assignment of lineage specificity to some elements that have inserted independently in separate lineages of mammals, and refinements in the phylogeny.
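The -lcambig naming rule described in the list above can be illustrated with a toy model. This sketch is not RepeatMasker's logic (which also weighs supporting evidence); it assumes a fragment is ambiguous when its consensus coordinates fall entirely inside one pre-computed, already-merged non-unique region of its family consensus. The functions, the `shared_regions` input and the family name `FamA` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive positions on the family consensus


def is_ambiguous(frag: Interval, shared: List[Interval]) -> bool:
    """True if the fragment lies entirely inside one non-unique consensus region.

    Assumes overlapping or adjacent shared regions have already been merged.
    """
    start, end = frag
    return any(s <= start and end <= e for s, e in shared)


def lcambig_names(hits: List[Tuple[str, Interval]],
                  shared_regions: Dict[str, List[Interval]]) -> List[str]:
    """Print ambiguous names in lowercase and all other names in uppercase."""
    out = []
    for name, frag in hits:
        if is_ambiguous(frag, shared_regions.get(name, [])):
            out.append(name.lower())
        else:
            out.append(name.upper())
    return out


print(lcambig_names([("FamA", (10, 50)), ("FamA", (90, 150))], {"FamA": [(1, 100)]}))
# ['fama', 'FAMA']
```

The second fragment extends past the shared region into unique consensus sequence, so it is treated as unambiguous.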
    Pre-Masked Genome Annotations Available
    The November 2003 and the January 2006 assemblies of the chimp genome ( panTro1 and panTro2 ), the May 2005 assembly of the dog ( canFam2 ), the May 2005 assembly of the Zebrafish genome ( danRer3 ), and the August 2002 assembly of the Takifugu genome ( fr1 ) have been added to the Pre-Masked Genomes Page. You can query RepeatMasker annotations, alignments, and masked sequence using this webservice.

    Links
    - RepeatMasker makes use of Repbase which is a service of the Genetic Information Research Institute. Repbase is a comprehensive database of repetitive element consensus sequences.
    - Data and computational resources for the Pre-Masked Genomes page are provided courtesy of the UCSC Genome Bioinformatics group.

    Institute for Systems Biology
    This server is made possible by funding from the National Human Genome Research Institute (NHGRI grant # RO1 HG002939-01) 2003.