The Institute for Systems Biology RepeatMasker Download

Prerequisites
  1. Unix system with Perl 5.8.0 or higher installed
  2. Python 3 and the h5py Python library.
  3. Sequence Search Engine
    RepeatMasker uses a sequence search engine to perform its search for repeats. Currently Cross_Match, RMBlast, HMMER and WUBlast/ABBlast are supported. You will need to obtain one of these and install it on your system.
    • For Cross_Match, go to http://www.phrap.org. You will want to select "Phred/Phrap/Consed", as Cross_Match is part of the Phrap package.
    • For RMBlast ( NCBI Blast modified for use with RepeatMasker/RepeatModeler ) please go to our download page: http://www.repeatmasker.org/rmblast. It is highly recommended to use version 2.13.0 or higher.
    • For HMMER, please download version 3.2.1 from http://hmmer.org/
    • For ABBlast/WUBlast: rights to BLAST 2.0 (WU-BLAST) have been acquired by Advanced Biocomputing, LLC; see http://blast.advbiocomp.com/licensing/. RepeatMasker 3.2.8 and above fully support both variants.
  4. TRF - Tandem Repeat Finder, G. Benson et al.
    You can obtain a free copy at http://tandem.bu.edu/trf/trf.html. RepeatMasker was developed using TRF version 4.0.9.
  5. Repeat Database
    RepeatMasker can be used with custom libraries, or with Dfam out of the box. Dfam is an open database of transposable element (TE) profile HMM models and consensus sequences. The current release of RepeatMasker is shipped without a database; however, a minimal version of Dfam 3.8 ( root partition ) can be downloaded automatically by the configure script. Additional taxa partitions may be downloaded and configured at any time.
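
Since Python 3 is already a prerequisite, the checks above can be scripted. The following is a minimal sketch and not part of RepeatMasker: the executable names (cross_match, rmblastn, nhmmer, trf) are assumptions, and installations often use different or versioned names (TRF binaries in particular), so "not found" only means that name is not on PATH. ABBlast/WUBlast is not checked.

```python
import importlib.util
import shutil
import subprocess

MIN_PERL = (5, 8, 0)  # minimum Perl version from the prerequisites list

# Assumed executable names; adjust to match your installation.
ENGINE_EXECUTABLES = {
    "Cross_Match": "cross_match",
    "RMBlast": "rmblastn",
    "HMMER": "nhmmer",
}
TRF_EXECUTABLE = "trf"  # assumption: TRF is often installed under a versioned name


def parse_perl_version(dollar_bracket: str) -> tuple:
    """Convert Perl's $] value (e.g. "5.030000") into (major, minor, patch)."""
    major, _, frac = dollar_bracket.strip().partition(".")
    frac = frac.ljust(6, "0")  # older style "5.008" -> "008000"
    return (int(major), int(frac[0:3]), int(frac[3:6]))


def perl_is_new_enough(version: tuple) -> bool:
    return version >= MIN_PERL


def installed_perl_version():
    """Return the local Perl version as a tuple, or None if perl is not on PATH."""
    if shutil.which("perl") is None:
        return None
    out = subprocess.run(["perl", "-e", "print $]"],
                         capture_output=True, text=True, check=True)
    return parse_perl_version(out.stdout)


def h5py_available() -> bool:
    # find_spec locates the package without importing it
    return importlib.util.find_spec("h5py") is not None


def available_engines() -> list:
    return [name for name, exe in ENGINE_EXECUTABLES.items() if shutil.which(exe)]


if __name__ == "__main__":
    perl = installed_perl_version()
    print("Perl:", perl, "OK" if perl and perl_is_new_enough(perl) else "missing or too old")
    print("h5py:", "OK" if h5py_available() else "missing")
    print("Search engines found:", ", ".join(available_engines()) or "none")
    print("TRF:", "OK" if shutil.which(TRF_EXECUTABLE) else "not found on PATH")
```

The configure script still has to be run afterwards; this only gives a quick view of what is missing before starting.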
Installation
  1. Download RepeatMasker
    Latest Released Version: 12/5/23: RepeatMasker-4.1.6.tar.gz
    Previous Released Version: 3/23/23: RepeatMasker-4.1.5.tar.gz
  2. Unpack Distribution
    Unpack the distribution in your home directory or in a location where it may be shared with other users of your system ( e.g. /usr/local/ ). Make sure you do not extract in a directory already containing a directory called "RepeatMasker", as it will attempt to overwrite the files contained within.
    • cp RepeatMasker-#.#.#.tar.gz /usr/local
    • cd /usr/local
    • gunzip RepeatMasker-#.#.#.tar.gz
    • tar xvf RepeatMasker-#.#.#.tar
  3. Install RepeatMasker Libraries
    RepeatMasker is currently not distributed with a database. The configure script will prompt to download a minimal Dfam database during installation. The main RepeatMasker library can be supplemented or updated in the following ways:
    • The complete Dfam 3.8 database may be downloaded from www.dfam.org in the partitioned FamDB HDF5 format, or individual partitions (divided by taxa) may be downloaded as needed. For example:
      • wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.0.h5.gz
      • gunzip dfam38_full.0.h5.gz
      • mv dfam38_full.0.h5 /usr/local/RepeatMasker/Libraries/famdb
        • NOTE: only partitions from the same Dfam release should be in this directory
    • and/or:
    • The RepBase RepeatMasker Edition ( final version 10/26/2018 ) may be downloaded from www.girinst.org and unpacked in the RepeatMasker directory. For example:
      • cp RepBaseRepeatMaskerEdition-20181026.tar.gz /usr/local/RepeatMasker/
      • cd /usr/local/RepeatMasker
      • gunzip RepBaseRepeatMaskerEdition-20181026.tar.gz
      • tar xvf RepBaseRepeatMaskerEdition-20181026.tar
      • rm RepBaseRepeatMaskerEdition-20181026.tar
  4. Run Configure Script
    The program requires some initial configuration. The configure script should also be re-run after any update to the library files.
    • cd /usr/local/RepeatMasker
    • perl ./configure
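
The unpack step (step 2) can also be done from Python, which already has to be installed. This is a minimal sketch, not part of the RepeatMasker distribution: `safe_unpack` is a hypothetical helper, and it assumes the tarball unpacks into a single top-level directory named "RepeatMasker". Instead of overwriting an existing installation, it refuses to extract.

```python
import tarfile
from pathlib import Path


def safe_unpack(archive: str, dest: str, top_dir: str = "RepeatMasker") -> Path:
    """Extract a RepeatMasker tarball into dest without clobbering an existing top_dir.

    Hypothetical helper: assumes the archive contains a single top-level
    directory named top_dir (by default "RepeatMasker").
    """
    dest_path = Path(dest)
    target = dest_path / top_dir
    if target.exists():
        # Mirrors the warning in step 2: never extract over an existing "RepeatMasker" directory.
        raise FileExistsError(f"{target} already exists; move or rename it first")
    # Mode "r:gz" decompresses on the fly, covering both the gunzip and tar steps.
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and members that would land outside dest.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return target
```

For a shared installation, `dest` would typically be /usr/local, as in the steps above; configure still has to be run in the extracted directory afterwards.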

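The note in step 3 says the famdb directory should only contain partitions from a single Dfam release. A minimal sketch of such a check follows. It assumes partition files are named like the example in step 3 (dfam38_full.0.h5), where the number after "dfam" identifies the release, and it ignores files that do not match that pattern. `dfam_releases` and `check_single_release` are hypothetical helpers, not RepeatMasker utilities.

```python
import re
from pathlib import Path

# Assumed naming scheme: dfam<release>_<name>.<partition>.h5, e.g. "dfam38_full.0.h5".
PARTITION_RE = re.compile(r"^dfam(\d+)_[^.]+\.\d+\.h5$")


def dfam_releases(famdb_dir: str) -> set:
    """Collect the Dfam release tags of all partition files in famdb_dir."""
    releases = set()
    for path in Path(famdb_dir).glob("*.h5"):
        match = PARTITION_RE.match(path.name)
        if match:  # files with other names are ignored
            releases.add(match.group(1))
    return releases


def check_single_release(famdb_dir: str) -> str:
    """Return the single release tag present, or raise if zero or several are found."""
    releases = dfam_releases(famdb_dir)
    if len(releases) != 1:
        raise RuntimeError(
            f"expected partitions from exactly one Dfam release, found: {sorted(releases) or 'none'}"
        )
    return releases.pop()
```

Pointed at /usr/local/RepeatMasker/Libraries/famdb, this would return "38" for a Dfam 3.8 installation and raise if partitions from another release had been copied in.
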
RepeatMasker "open-3.0/4.0" is licensed under the Open Source License v2.1.

Release Notes
RepeatMasker-4.1.6

  • Upgraded to FamDB 1.0.2 to support Dfam 3.8 and the new partitioned database format.
  • Added Libraries/RMRB_spec_to_tax.json to project. This maps the RepBase taxon names to current NCBI tax_ids and needs to be refreshed with each new Dfam release.
  • Added softmasking support to NCBIBlastSearchEngine.pm.
  • Added a new '--uncurated' flag to handle the single-export Dfam format. If this flag is used, the CONS/HMM cache directories will be suffixed with "_wunc".
  • Fixed error messages from famdb.py being silently swallowed. They are now displayed and cause RepeatMasker to quit.
  • Additional library setup steps and error checking for configure utility.
  • Added CAF documentation to SearchResult.
  • calcDivergenceFromAlign: clarified the use of "-a" in the documentation.
RepeatMasker-4.1.5

  • Updated codebase for Dfam 3.7 compatibility (famdb format 4.3).
  • A Penelope classification change caused the *.tbl file accounting to place these elements in the Unknown category; this is now fixed. The landscape generation tool was also fixed.
  • Added a new utility to merge the *.out and *.align files generated by running RepeatMasker serially.
  • Repbase metadata was out of date; species names have been updated to match the current NCBI Taxonomy names.
  • Fixed an issue with the HMM parser. It wasn't recognizing negative values for Tau with models that do not have GA thresholds.
RepeatMasker-4.1.4

  • Added support for RMBlast 2.13.0.
  • Release of the TE genome browser visualization (UCSC) and trackhub generation tool.
  • New CpGSites and unadjusted Kimura stats in the *.align file.
  • Fixed a bug that caused the read-only state of the input fasta file to propagate to the intermediate files and cause the program to exit.
  • Removed DateRepeats as it's based on old library formats - this functionality will return with the refactored version of RM in the works.
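
For reference, the Kimura two-parameter distance behind these divergence statistics can be computed as below. This is the textbook formula K = -1/2 ln(1 - 2p - q) - 1/4 ln(1 - 2q), where p and q are the fractions of aligned sites differing by a transition and by a transversion; "unadjusted" is read here as meaning no special treatment of CpG sites. RepeatMasker's own counting rules (gaps, ambiguous bases, CpG handling) are not reproduced, so values may differ from those written to *.align files.

```python
import math


def kimura_2p(transitions: int, transversions: int, aligned_sites: int) -> float:
    """Kimura two-parameter distance from raw substitution counts (no CpG adjustment)."""
    if aligned_sites <= 0:
        raise ValueError("aligned_sites must be positive")
    p = transitions / aligned_sites    # transition fraction (A<->G, C<->T)
    q = transversions / aligned_sites  # transversion fraction (purine<->pyrimidine)
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

For example, 10 transitions and 5 transversions over 100 aligned sites give a distance of about 0.17, slightly above the raw mismatch fraction of 0.15, because the correction accounts for multiple substitutions at the same site.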
RepeatMasker-4.1.3-p1

  • A recent change in 4.1.3 to correct blank fragment ID fields could, in rare cases, cause the error message: 'Can't call method "setLeftLinkedHit"'.
  • The RepeatAnnotationData.pm file containing necessary information for recognizing equivalent fragments of DNA transposons was missing data.
  • The MULE-MuDR class was added to the *.tbl file for "-lib" searches.
RepeatMasker-4.1.3

  • A new utility for generating trackHubs for our new UCSC TE visualization.
  • Fixed a bug where killing RM while starting up could leave the cached libraries in an inconsistent state.
  • Fixed a bug where in rare cases the joined fragment ID field is blank
  • Merged in changes to Dupmasker supporting multi-threaded use
  • Fixed legacy RepBase taxonomic labels
  • Added support for GFF v3 output and fixed the utility/rmOutToGFF3.pl
RepeatMasker-4.1.2-p1

  • Releases 4.1.1-4.1.2 contained a bug with the processing of Alu sequences in primates. The step where an initial annotation is refined into a particular Alu subfamily was not performed and the annotations remained labeled with the initial capture sequence ( AluJb, AluSx, or AluY ). This patch release fixes this one issue.
RepeatMasker-4.1.2

  • Fixed 21 protein family classifications in RepeatProteinLib.
  • Fixed a problem with the generation of the RepeatMasker.lib file for use by RepeatModeler. In release 4.1.1 it did not add the classification info to this auxiliary file.
  • Fixed a "log(0)" error that can cause the program to fault in rare circumstances.
  • buildSummary now supports FamDB and has improved documentation.
  • Bugfixes and improvements to FamDB.
RepeatMasker-4.1.1

  • Dfam (starting with version 3.2) is now distributed in the FamDB file format based on HDF5, which has improved support for large datasets compared to the EMBL and HMM formats that were previously used. RepeatMasker therefore includes a copy of famdb.py, and depends on the python package h5py.
    • The 'configure' script and other parts of RepeatMasker have been updated to accommodate these changes.
    • The utilities 'queryTaxonomyDatabase.pl' and 'queryRepeatDatabase.pl' are no longer included, since that data is now included in FamDB. The 'famdb.py' tool can be used to make many of the same queries as the removed utilities, and even more.
RepeatMasker-4.1.0

  • RepeatMasker now has a refactored configuration system making it easier to distribute RepeatMasker via package managers and/or bundle RepeatMasker into containers.
RepeatMasker-open-4-0-9-p1

  • Input files containing multiple FASTA sequences caused RepeatMasker to error out with a message like:

    "WARNING: TRF returned an error (Return code = ### )
    TRF parameters: 2.7.7.80.10.50.10
    A search phase could not complete on this batch.
    The batch file will be re-run and if possible the
    program will resume.
    WARNING: Retrying batch ( 1 ) [ 255,, 195]..."

    This bug was introduced when we attempted to improve TRF error catching. Unfortunately, the return codes are not documented for TRF, and the assumption that 256 is the only successful return code is wrong. The "success" code appears to change depending on the number of sequences in the file. The workaround is to fail only if there is a message in the error output file.
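
The change in decision rule can be illustrated with a short Python sketch. It is not RepeatMasker's actual (Perl) code: both function names are hypothetical, and treating a missing error file the same as an empty one is an assumption.

```python
from pathlib import Path


def trf_failed_before_p1(return_code: int) -> bool:
    # 4.0.9 logic (the bug): any return code other than 256 counted as a failure,
    # even though TRF's "success" code varies with the number of input sequences.
    return return_code != 256


def trf_failed(error_file: str) -> bool:
    # 4.0.9-p1 workaround: ignore the return code entirely and report a failure
    # only if TRF wrote a message to its error output file.
    # Assumption: a missing error file is treated the same as an empty one.
    path = Path(error_file)
    return path.is_file() and path.read_text().strip() != ""
```

With the old rule, the "Return code = 255" run shown in the warning above is retried; with the new rule, it only fails if the error output contains text.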

RepeatMasker-open-4-0-9

  • General compatibility update for Dfam 3.0. Dfam and Dfam_consensus have merged into one combined database. RepeatMasker can use Dfam with any of its search engines and will automatically switch to using consensus sequences or profile HMMs based on the engine used. It is important to note that, by default, RepeatMasker will use Dfam consensus sequences when library duplicates are detected.
  • Bugfix: The -dir option no longer assumes that the directory already exists.
  • Feature: The configure script now accepts command-line parameters to change configuration settings. Configure also re-reads existing configuration options to use as prompt defaults.
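
The engine-dependent switch described in the Dfam 3.0 note amounts to a lookup: HMMER searches with Dfam's profile HMMs, while the pairwise aligners search with its consensus sequences. The sketch below is illustrative only; `dfam_model_type` is hypothetical and the engine identifiers are assumed to mirror RepeatMasker's -engine option values.

```python
# Assumed engine identifiers; illustrative only.
PROFILE_HMM_ENGINES = {"hmmer"}
CONSENSUS_ENGINES = {"crossmatch", "rmblast", "abblast", "wublast"}


def dfam_model_type(engine: str) -> str:
    """Return which kind of Dfam model a given search engine would search with."""
    name = engine.lower()
    if name in PROFILE_HMM_ENGINES:
        return "profile HMM"
    if name in CONSENSUS_ENGINES:
        return "consensus sequence"
    raise ValueError(f"unknown search engine: {engine}")
```

The design point is that the library itself is shared: one Dfam release serves all engines, and only the representation used for the search changes.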
Archived Releases

Institute for Systems Biology
This server is made possible by funding from the National Human Genome Research Institute (NHGRI grant # R01 HG002939).