At a time when large-scale human genome analysis was not yet common, the Tohoku Medical Megabank Organization (ToMMo) launched its genome cohort study. After ten years of operating this ambitious project, its researchers are sharing key insights into the techniques required to analyze, manage, maintain, and update a genomic database of 100,000 people. For researchers around the world, the knowledge gained from this study is a valuable resource for advancing genome research, and the findings derived from it can help build the foundation for genetics-based personalized healthcare.
The findings were published in JMA Journal on October 3, 2025.
Since starting in 2013, ToMMo has completed whole genome sequencing for 100,000 Japanese individuals. Whole genome sequencing is the process of reading the entirety of the DNA sequence, the building blocks of life that make up who we are. However, in-depth analysis at such a large scale is a major undertaking that poses many technical and operational challenges. Even today, only a few countries have conducted genome sequencing at this scale.
"Maintaining high accuracy and consistent quality required careful planning, optimized equipment, and developing innovative new techniques," explains Fumiki Katsuoka, first author of the paper.
The paper shares insights gained over ten years of operating whole genome sequencing, managing quality, and building data infrastructure.
In the early phase, the team developed a method named qMiSeq, in which a small-scale sequencing analysis was performed for each group of samples (typically 96 samples) and the optimal sequencing conditions were determined from the data volume obtained. After the introduction of high-throughput sequencers, they established a protocol named iDeal, which divides the sequencing of each group into multiple runs to equalize data yield. Both approaches rest on simple concepts, yet both proved extremely effective.
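To make the idea of yield equalization concrete, the following Python sketch shows one plausible way to act on per-sample data volumes. It is an illustration under stated assumptions, not the published qMiSeq or iDeal procedure: the function names, the inverse-proportional pooling rule, and the example numbers are all hypothetical.

```python
from typing import Dict


def rebalance_pool(pilot_reads: Dict[str, int], total_volume_ul: float) -> Dict[str, float]:
    """Assumed rule: pool each sample in inverse proportion to its pilot read count,
    so that samples that yielded fewer reads contribute more volume next time."""
    if any(n <= 0 for n in pilot_reads.values()):
        raise ValueError("every sample needs a positive pilot read count")
    weights = {sample: 1.0 / n for sample, n in pilot_reads.items()}
    scale = total_volume_ul / sum(weights.values())
    return {sample: w * scale for sample, w in weights.items()}


def remaining_yield(target_gb: float, yields_gb: Dict[str, float]) -> Dict[str, float]:
    """Data still needed per sample after an earlier run (never negative).
    A later run could be weighted by these shortfalls to even out the final yield."""
    return {sample: max(target_gb - y, 0.0) for sample, y in yields_gb.items()}


if __name__ == "__main__":
    # Hypothetical pilot read counts for three samples of a group.
    pilot = {"S1": 100, "S2": 200, "S3": 400}
    print(rebalance_pool(pilot, total_volume_ul=70.0))

    # Hypothetical yields (in Gb) after a first full-scale run, with a 90 Gb target.
    print(remaining_yield(90.0, {"S1": 60.0, "S2": 95.0}))
```

The first function mirrors the pilot-run idea (use a small run to decide how to load the full run). The second mirrors the multi-run idea (use earlier yields to decide what later runs must still deliver). Real protocols would also account for library concentration, flow cell capacity, and quality metrics.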
"As large-scale genome sequencing is becoming more common, we want to share everything we learned during these ten years," remarks first author Fumiki Katsuoka. "We are very proud that some of the unique techniques we developed are now used by other institutions."
Transparency is an important aspect of the project. Frequency and summary data from ToMMo's 100,000-genome project are freely available on jMorp and widely used by researchers worldwide, while individual-level genome data are accessible under appropriate conditions following an application-based review process.
As more researchers conduct large-scale genome analyses, more healthcare providers are expected to use the resulting data to offer innovative medical solutions. The insights from this study will serve as a valuable resource for the genomics community in Japan and around the world, contributing to the advancement of genomic medicine and personalized prevention.