Big Data Tools Help Decipher Disease Progression

The MEGENA tool has 3D spheres that help uncover precise network clusters associated with disease progression.

Scientists in the Multiscale Network Modeling Laboratory at the Icahn School of Medicine at Mount Sinai recently made two new Big Data analysis tools available to the public. Both tools help pinpoint specific genes that are actively involved in disease progression.

The team, led by Bin Zhang, PhD, Associate Professor in the Department of Genetics and Genomic Sciences, published the pair of algorithm-based tools online in November 2015 in PLoS Computational Biology and in Scientific Reports, a Nature publication. The open-source tools are available to all researchers who wish to gain a better understanding of disease mechanisms in order to develop more effective drugs and create individualized treatments.

One of the tools, MEGENA (Multiscale Embedded Gene Co-Expression Network Analysis), enables researchers to project massive amounts of gene expression data from complex diseases such as cancer onto a three-dimensional sphere to uncover highly precise network clusters associated with disease progression.

Big Data developers, from left: Won-min Song, PhD, Senior Scientist; Bin Zhang, PhD; Minghui Wang, PhD, Postdoctoral Fellow; and Yongzhong Zhao, PhD, Senior Scientist

“In reality, the boundaries of clusters formulated by traditional approaches aren’t so clear, and many genes may be excluded from a cluster at a certain threshold of compactness when they shouldn’t,” says Dr. Zhang. “MEGENA resolves the problem by detecting clusters at multiple compactness scales that are naturally determined by the data. When you remove redundant correlations, you get a more reliable, more robust network structure.”

The other software tool, SuperExactTest, helps scientists determine the significance of similarities among multiple data sets. SuperExactTest allows researchers to compare three or more data sets, a feat that is new in the field of statistics. The tool has wide application and is expected to yield valuable new insights.

“Before we came up with SuperExactTest, nobody could exactly calculate the statistical significance of intersections among multiple sets of genes,” according to Dr. Zhang.
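To make the idea of an exact multi-set intersection test concrete, here is a minimal Python sketch. It is not the SuperExactTest implementation. It assumes a simple null model in which each gene set is an independent random draw, without replacement, from a shared background of n genes, and it builds the exact distribution of the intersection size by adding one set at a time. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(x, n, m, size):
    """P(overlap == x) when `size` items are drawn without replacement
    from n items, of which m are marked."""
    if x < 0 or x > m or x > size or size - x > n - m:
        return 0.0
    return comb(m, x) * comb(n - m, size - x) / comb(n, size)


def intersection_pmf(n, set_sizes):
    """Exact distribution of the size of the intersection of independent
    random subsets of a background of n items (assumed null model)."""
    # The "intersection" of the first set alone is the set itself.
    dist = {set_sizes[0]: 1.0}
    for size in set_sizes[1:]:
        new_dist = {}
        for m, p in dist.items():
            # Given a current intersection of size m, its overlap with the
            # next random set follows a hypergeometric distribution.
            for x in range(min(m, size) + 1):
                q = hypergeom_pmf(x, n, m, size)
                if q > 0.0:
                    new_dist[x] = new_dist.get(x, 0.0) + p * q
        dist = new_dist
    return dist


def intersection_p_value(n, set_sizes, observed):
    """Probability of an intersection at least as large as `observed`."""
    dist = intersection_pmf(n, set_sizes)
    return sum(p for x, p in dist.items() if x >= observed)


if __name__ == "__main__":
    # Hypothetical example: three gene sets of 200, 150 and 100 genes drawn
    # from a background of 20,000 genes, sharing 5 genes in common.
    print(intersection_p_value(20000, [200, 150, 100], 5))
```

With two sets, this reduces to the familiar hypergeometric (Fisher-style) overlap test. Each additional set contributes one more pass of the loop, which is why the same calculation extends to three or more sets.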

Mount Sinai researchers have already used MEGENA to uncover novel regulatory targets in breast and lung cancers, after testing it on a large volume of genomic data from U.S. patients in The Cancer Genome Atlas (TCGA), a master data repository assembled by the National Institutes of Health. MEGENA showed improved performance over well-established clustering methods and gene co-expression network construction approaches.

SuperExactTest led to the discovery that a set of nine genes is the most critical in determining whether a person is predisposed to getting cancer. The pioneering software allowed Mount Sinai’s researchers to determine the significance of this core set of genes that are common to seven mutated gene sets in cancer.

“Our team is dedicated to crafting high-performance analysis tools and to sharing those resources with the broader genomics community to help us all generate the best possible results,” says Eric Schadt, PhD, the Jean C. and James W. Crystal Professor of Genomics at the Icahn School of Medicine at Mount Sinai, and Founding Director of the Icahn Institute for Genomics and Multiscale Biology. “These new tools demonstrate thoughtful and creative solutions to computational challenges faced by scientists around the world, and I look forward to seeing what the community will accomplish with them.”
