Welcome!
|
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, the program currently masks almost 50% of a human genomic DNA sequence. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green.
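To make the default masking behaviour concrete, here is a minimal Python sketch of replacing annotated intervals with Ns. It is an illustration only, not RepeatMasker's own code: the function name `mask_repeats`, the 1-based inclusive coordinate convention, and the example sequence and interval are all assumptions made for this example.

```python
def mask_repeats(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each annotated interval masked.

    `intervals` holds (start, end) pairs. They are assumed here to be
    1-based and inclusive; that convention is an assumption of this sketch.
    """
    masked = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"invalid interval: {start}-{end}")
        # Convert the 1-based inclusive interval to Python's 0-based range.
        for i in range(start - 1, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical query: an 8-base run of T flanked by unique sequence.
query = "ACGTACGT" + "T" * 8 + "ACGT"
annotations = [(9, 16)]  # hypothetical annotation covering the T run
print(mask_repeats(query, annotations))
```

Overlapping intervals need no special handling in this sketch, since masking the same position twice leaves it masked.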
|
|
Latest News
|
If you would like to keep up with news and announcements relating to
RepeatMasker, you can subscribe to the new
RepeatMasker Announcements List.
RepeatModeler: RECON Bug?
Wednesday May 7, 2008
|
We have noticed a few times that RECON's "re-definition of elements ( eleredef )" step appears to hang ( in a recent run it sat on this step for over 6 hours before we killed it ). If you also experience this problem while running RepeatModeler, please let us know. When we restart the run from the beginning, it completes this step in a reasonable amount of time ( hg18: 1-5 minutes ).
|
Pre-Masked Genomes Update - HG18 and MM9
Tuesday May 6, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies HG18 and MM9. In addition to the data query service we have also provided the ability to download the complete annotation sets as compressed files.
|
WUBlastXSearchEngine.pm Missing
Friday May 2, 2008
|
The WUBlastXSearchEngine.pm module was missing from yesterday's RepeatMasker release. This module is needed for the RepeatProteinMask program. Please re-download the 3.2.2 release of RepeatMasker if you had previously obtained it.
|
RepeatProteinMask Released, RepeatModeler/RepeatMasker Updates
Thursday May 1, 2008
|
The program which runs the repeat protein search on the website is now available as a standalone program within the RepeatMasker package.
Thanks to the Sanger Institute and various testers we have patched a few bugs in the first RepeatModeler release. The fixes required a new version of RepeatMasker to be created ( version 3.2.2 ) although the changes will not impact RepeatMasker results. If you have experienced problems installing the first release or with running the repeat classifier, please download the latest RepeatMasker and RepeatModeler packages.
|
RepeatModeler Beta: Repeat Discovery Workbench Released
Wednesday April 16, 2008
|
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
The software is available for download here.
Also note that RepeatMasker is now up to version 3.2.1. This version as well as the previous version are organizational updates to support RepeatModeler and have little or no impact on RepeatMasker results.
|
RepeatScout On Multiple Sequences
Tuesday April 1, 2008
|
RepeatScout is a highly successful de-novo repeat discovery algorithm developed by Price, A. L. et al. We have created a modified version of RepeatScout ( version 1.0.3 ) which supports searches on highly fragmented genomes. The new version does not attempt to extend seeds across sequence boundaries. Pavel Pevzner's lab has offered to host the download for this new version at: http://repeatscout.bioprojects.org/
Continuing with the theme of de-novo repeat identification I have added two recent papers on the topic by Saha et al. to the Related Papers page.
|
Unexpected Downtime
Thursday, February 28 2008
|
At 3am last night the RepeatMasker cluster went down unexpectedly. We restored service at 10am this morning and are looking into the cause of the problem. If you had jobs queued/running at the time they will need to be resubmitted. We apologize for any inconvenience this may have caused.
|
New Compute Node & Scheduler
Wednesday, January 23 2008
|
The RepeatMasker cluster received a new compute node ( for processing web requests ) and an upgrade to the job scheduler over the weekend. This upgrade should improve the overall throughput of the masking service.
|
RepeatMasker open-3.1.9 Released
Friday, January 11 2008
|
A new version of RepeatMasker is available for download. Updates include:
- Codebase synced to recently released library RMLib 20071204.
- Improved DNA transposon fragment identification.
- BugFix: The .align files generated by 3.1.8 did not contain
cross_match style headers for each alignment.
- BugFix: ProcessRepeats will infrequently exit in cycle 10 with the
error: cycle 10 Can't call method "addDerivedFromAnnot" on an undefined value at ProcessRepeats line 3835. No results for the search are given.
|
RepeatMasker Library Update
Wednesday, December 12th, 2007
|
A new version of the RepeatMasker repeat library ( RMLib: 20071204, RepBase: 12.06 ) is now available for download from GIRI.
|
RepeatMasker Webserver Upgraded
Wednesday, December 5th, 2007
|
We replaced the main RepeatMasker webserver today with a new dual Xeon quad core server with 8GB of main memory. The new server also contains increased disk storage for expansion of the cached genomes.
|
RepeatMasker Evidence Reporting
Thursday August 9th, 2007
|
A new version of the RepeatMasker webservice has been installed. The new
version produces an additional output file ( *.out.html ) which provides
the evidence ( source HSPs ) for each final annotation call. The page is
displayed in the typical one-annotation-per-line format with links
( the "+" preceding each line ) to expand the evidence data below the line.
In addition to evidence reporting the repeat names on this new page
link to details for each particular type of repeat.
|
|
Open 3.1.8 Bugfix
|
A bug in ProcessRepeats causes the program to crash when rare transposon join scenarios are encountered. The error message looks like this: "join(): Invalid join!$this == $partner at ProcessRepeats line 8164." or this: "This violates recursion....Died at ProcessRepeats line 1828". The fix is to replace your Open-3.1.8 ProcessRepeats file with the one contained in this archive RepeatMasker-open-3-1-8-patch-2.tar.gz. NOTE: You may have to alter the first line in ProcessRepeats to correctly reference your perl installation location.
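The note about the first line of ProcessRepeats refers to the script's `#!` interpreter line. As a rough sketch of that adjustment, the Python below rewrites the interpreter path while keeping any flags that follow it. The function name `set_perl_interpreter` and both perl paths are hypothetical; the sketch also assumes the line has the form `#!<path> [flags]` and leaves `env`-style lines untouched.

```python
def set_perl_interpreter(script_text, perl_path):
    """Return `script_text` with its #! line pointing at `perl_path`.

    Assumes the first line looks like '#!<path> [flags]'. Lines that go
    through 'env' (e.g. '#!/usr/bin/env perl') are returned unchanged.
    """
    lines = script_text.split("\n")
    if not lines[0].startswith("#!"):
        raise ValueError("script does not start with a #! line")
    parts = lines[0][2:].split()
    if parts and parts[0].rsplit("/", 1)[-1] == "env":
        return script_text
    flags = parts[1:]  # keep interpreter flags such as -w
    lines[0] = " ".join(["#!" + perl_path] + flags)
    return "\n".join(lines)


# Hypothetical example: both paths are placeholders, not RepeatMasker defaults.
original = "#!/usr/local/bin/perl -w\nprint 1;\n"
print(set_perl_interpreter(original, "/usr/bin/perl"))
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back after making a backup copy.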
|
|
RepeatMasker Official Release Available
|
The recent beta version of RepeatMasker has been tested and is now
ready for an official release. We assigned it the version "Open-3-1-8"
as there were several minor bugs fixed. You may download the release
from here: download. The webserver
is now running the updated version as well.
|
|
RepeatMasker Beta Release Available
|
A new version of RepeatMasker ( Open-3-1-7 ) is available for testing at
download. This version includes a major
refactoring of the ProcessRepeats code along with many bugfixes. Due to the
volume of changes in this release we are offering it as a downloadable
beta-release for a short period while we continue to test it. The webserver
will continue to use open-3-1-6 until we are ready for the official release.
Changes in the release include:
- Repeat Defragmentation Improvements: The defragmentation stages of ProcessRepeats
have been refactored improving the annotation of LTR, LINE and SINE repeats.
- Metadata Migration: We have begun to move metadata ( subfamily relationships,
consensus model relationships, genomic frequency etc ) out of the ProcessRepeats
code. In the near future this will provide researchers greater access to these detailed
repeat characteristics and enable the same processing rules to be used on custom
generated repeat libraries.
- Bugfixes:
- IS Element Bugfix: In certain cases the extraction of IS elements fails, causing the sequence indices to be off. The final result is an error message of the form: "ArrayList::get( -1 ) Index out of bounds!".
- Division By Zero Bugfix: Under special circumstances ProcessRepeats produces an "Illegal division by zero at ProcessRepeats line 1860." error.
- Long Sequence Names Bugfix: Sequence names longer than 20 characters can cause ProcessRepeats to fail. Thanks to Gordon Lack for finding and reporting this.
- Negative Sequence Positions Bugfix: ProcessRepeats was reporting negative sequence positions in the final output file.
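For anyone still running a release affected by the long sequence name bug listed above, a quick pre-check of the input can flag problem records. The Python sketch below is one way to do that. It assumes the sequence name is the first whitespace-delimited token after ">" on a FASTA header line; the function name `long_sequence_names` and the example records are hypothetical.

```python
def long_sequence_names(fasta_lines, limit=20):
    """Return the sequence names longer than `limit` characters.

    The name is taken to be the first whitespace-delimited token after
    '>' on each header line, which is an assumption of this sketch.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > limit:
                names.append(name)
    return names


# Hypothetical input with one short and one long sequence name.
example = [
    ">chr1 example description",
    "ACGTACGT",
    ">scaffold_0000000000000001234",
    "TTTTACGT",
]
print(long_sequence_names(example))
```

An open file handle can be passed directly as `fasta_lines`, since the function only iterates over lines.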
|
|
RepeatMasker open-3.1.6 Released
|
A new version of RepeatMasker is available for download. Included in
this new release are several major improvements:
- The repeat database is updated with 694 new entries and 147 improvements on
existing ones, including RepBase version 11.06. Major advances were for
ancient mammalian repeats, which are shared by all mammals or all
eutherians, and for marsupial repeats, especially for the opossum. Other
significant additions were for Chlamydomonas and Caenorhabditis briggsae.
- The annotation of DNA Transposon fragments has been improved. A new method
of joining related transposon fragments improves classification of ambiguous
fragments. More details here
- A new option ( -lcambig ) identifies DNA Transposon annotations which, without
any supporting evidence, are ambiguously defined, i.e. the fragment falls within a
non-unique portion of the family consensus. When this option is used all ambiguous
repeat names are printed in lower case while the rest are in uppercase.
- Fixed a bug with fasta files containing more than 60MB of sequence on a single
file line.
- Updated the taxonomy database and added the "-tree" option to the queryRepeatDatabase.pl
script. The new option prints out the taxonomic tree of all species contained in the
RepeatMasker database with information on the number of repeat families defined at
each level.
- Several bugs have been fixed in the DateRepeats routine, which appeared when
large numbers were involved (e.g. analysis of whole chromosome RepeatMasker
output) and/or when the input is a concatenation of RepeatMasker outputs (repeat
IDs are not necessarily unique anymore).
- Other improvements in DateRepeats are better labeling of ambiguously called
repeats, correct assignment of lineage specificity to some elements that
have independently inserted in separate lineages of mammals, and refinements
in the phylogeny.
|
|
Pre-Masked Genome Annotations Available
|
The November 2003 and the January 2006 assemblies of the chimp genome ( panTro1 and panTro2 ), the May 2005 assembly of the dog genome ( canFam2 ), the May 2005 assembly of the zebrafish genome ( danRer3 ), and the August 2002 assembly of the Takifugu genome ( fr1 ) have been added to the
Pre-Masked Genomes Page. You can query RepeatMasker annotations, alignments,
and masked sequence using this webservice.
|
[Archived News]
|
|
Links
|
- RepeatMasker makes use of Repbase which is a service of the Genetic Information Research Institute. Repbase is a comprehensive database of repetitive element consensus sequences.
- Data and computational resources for the Pre-Masked Genomes page are
provided courtesy of the UCSC Genome Bioinformatics group.
|
|