Mobile elements play an important role in genomic island detection. Several mobile element resources already exist, such as ISFinder, ICEfinder, GyDB, and PHAST, but each focuses on a specific type of mobile element. A comprehensive mobile element database that supports the identification of genomic islands is therefore still needed.
To address this need, a research team led by Dai Qi published new research on 15 July 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed MEGI, a comprehensive dataset of mobile elements for genomic island detection that covers insertion sequences, integrative and conjugative elements, transposons, and bacteriophages. Using the MEGI dataset, the Centroid and SIGI-HMM methods achieved a recall increase of over 20%, along with higher accuracy.
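The article does not say exactly how recall and accuracy were computed. One way to score genomic island predictions is per nucleotide, comparing predicted island intervals against reference island intervals base by base. The following Python sketch assumes that per-base convention and half-open base-pair coordinates; the helper names `to_mask` and `recall_and_accuracy` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Half-open interval [start, end) in base pairs (assumed coordinate convention).
Interval = Tuple[int, int]


def to_mask(length: int, intervals: Sequence[Interval]) -> bytearray:
    """Mark every base covered by an interval with 1 (assumed per-base scoring)."""
    mask = bytearray(length)
    for start, end in intervals:
        a, b = max(start, 0), min(end, length)
        if a < b:
            mask[a:b] = b"\x01" * (b - a)
    return mask


def recall_and_accuracy(length: int,
                        predicted: Sequence[Interval],
                        reference: Sequence[Interval]) -> Tuple[float, float]:
    """Per-base recall TP/(TP+FN) and accuracy (TP+TN)/length."""
    pred, ref = to_mask(length, predicted), to_mask(length, reference)
    tp = fn = tn = 0
    for p, r in zip(pred, ref):
        if p and r:
            tp += 1   # island base correctly predicted
        elif r:
            fn += 1   # island base missed
        elif not p:
            tn += 1   # non-island base correctly left out
        # remaining case (p and not r) is a false positive; not needed here
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / length if length else 0.0
    return recall, accuracy


# Toy 100 bp genome: reference island at 10-30, prediction at 20-40.
print(recall_and_accuracy(100, predicted=[(20, 40)], reference=[(10, 30)]))
# (0.5, 0.8)
```

In this toy case half of the reference island is recovered (recall 0.5), and 80 of the 100 bases are classified correctly (accuracy 0.8). Under this convention, a filtering step that adds true island bases raises recall directly.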
In the research, the team compiled standard datasets used in genomic island research, including the GIs/non-GIs dataset, the S. enterica Typhi CT18 dataset, the L-data dataset, and the dataset from IslandViewer. For each genomic sequence, they manually annotated mobile elements, including insertion sequences, integrative and conjugative elements, transposons, and bacteriophages.
To validate the effectiveness of the database, the researchers used these mobile elements to assist in identifying genomic islands. Candidate genomic islands were first identified with Centroid and SIGI-HMM, and candidates that did not contain insertion sequences (IS) were filtered out. A 1 kb sliding window was then used to identify genomic segments containing integrative and conjugative element (ICE) and phage mobile elements, and these segments were also treated as candidate genomic islands. Together, these steps produce the final set of candidate genomic islands. With the help of MEGI, recall increased by more than 20%, and accuracy increased by 3% and 4% for Centroid and SIGI-HMM, respectively.
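The filtering and windowing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes half-open base-pair intervals, treats a candidate as "containing" an IS when the two overlap, advances the 1 kb window in 1 kb steps (the step size is not given in the article), and takes the final set to be the merged union of IS-supported candidates and ICE/phage window segments. The names `MobileElement`, `filter_by_is`, `window_segments`, and `refine_candidates`, and the type labels "IS", "ICE", and "phage", are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Half-open interval [start, end) in base pairs (assumed coordinate convention).
Interval = Tuple[int, int]


@dataclass(frozen=True)
class MobileElement:
    """Hypothetical record for one annotated mobile element."""
    kind: str   # hypothetical labels, e.g. "IS", "ICE", "transposon", "phage"
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into disjoint segments."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_by_is(candidates: Sequence[Interval],
                 elements: Sequence[MobileElement]) -> List[Interval]:
    """Keep candidates overlapping at least one IS (assumed meaning of 'contain')."""
    is_spans = [(m.start, m.end) for m in elements if m.kind == "IS"]
    return [c for c in candidates if any(overlaps(c, s) for s in is_spans)]


def window_segments(genome_length: int,
                    elements: Sequence[MobileElement],
                    kinds: Tuple[str, ...] = ("ICE", "phage"),
                    window: int = 1000,
                    step: int = 1000) -> List[Interval]:
    """Slide a fixed-size window along the genome; merge windows that hit ICE/phage."""
    spans = [(m.start, m.end) for m in elements if m.kind in kinds]
    hits: List[Interval] = []
    for start in range(0, genome_length, step):
        win = (start, min(start + window, genome_length))
        if any(overlaps(win, s) for s in spans):
            hits.append(win)
    return merge(hits)


def refine_candidates(candidates: Sequence[Interval],
                      elements: Sequence[MobileElement],
                      genome_length: int) -> List[Interval]:
    """Assumed combination rule: union of IS-supported candidates and ICE/phage segments."""
    kept = filter_by_is(candidates, elements)
    extra = window_segments(genome_length, elements)
    return merge(kept + extra)


if __name__ == "__main__":
    elements = [MobileElement("IS", 3500, 3600), MobileElement("phage", 8500, 9200)]
    candidates = [(0, 2000), (3000, 5000), (7000, 8000)]
    print(refine_candidates(candidates, elements, genome_length=10_000))
    # [(3000, 5000), (8000, 10000)]
```

On this toy 10 kb genome, only the candidate overlapping the IS survives the filter, and the phage at 8,500 to 9,200 bp hits two adjacent 1 kb windows that merge into a single segment. Merging touching windows keeps one phage from being split into several fragments.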
Future work could focus on finding more suitable public datasets that contain annotated genomic islands and mobile elements, and on building a more comprehensive mobile element annotation dataset for genomic island detection.