Abstract
Weighted-gene correlation network analysis (WGCNA) is frequently used to identify highly co-expressed clusters of genes (modules) within whole-transcriptome datasets. However, transcriptome-scale networks tend to be highly connected. This makes it difficult for the hierarchical clustering that underlies WGCNA-based classification to discriminate coherently expressed gene sets without substantial information loss, whether from a priori filtering of the expression dataset or from a posteriori pruning of the cluster dendrogram.
Here we present iterativeWGCNA, a Python-wrapped extension for the WGCNA R software package that improves the robustness of detected modules and minimizes information loss. The method works by pruning poorly fitting genes from estimated modules and then re-running WGCNA to refine the gene clusters. After refinement, the pruned genes are assembled into a new expression dataset to isolate overlapping modules, and the process is repeated. In doing so, iterativeWGCNA provides unsupervised, unbiased filtering that generates a robust, comprehensive network-based classification of whole-transcriptome expression datasets.
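The prune, re-run, and repeat cycle described above can be illustrated with a minimal Python sketch. It is not the iterativeWGCNA implementation and does not call WGCNA. The function names (`detect_modules`, `fitness`, `iterative_modules`), the greedy seed-based grouping that stands in for a WGCNA run, the use of correlation with the module mean profile as the goodness-of-fit measure, the thresholds, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    denom = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return cov / denom if denom else 0.0


def detect_modules(expr, seed_cor=0.5, min_size=2):
    """Hypothetical stand-in for one WGCNA run: greedy seed-based grouping."""
    unassigned, modules = list(expr), []
    while unassigned:
        seed = unassigned[0]
        members = [g for g in unassigned if pearson(expr[seed], expr[g]) >= seed_cor]
        if len(members) >= min_size:
            modules.append(members)
            unassigned = [g for g in unassigned if g not in members]
        else:
            unassigned = unassigned[1:]  # the seed is left unclassified
    return modules


def fitness(gene, module, expr):
    """Assumed fit measure: correlation of a gene with its module's mean profile."""
    profile = [sum(vals) / len(module) for vals in zip(*(expr[g] for g in module))]
    return pearson(expr[gene], profile)


def iterative_modules(expr, min_fit=0.85):
    """Prune poorly fitting genes, re-detect modules, then repeat on the residuals."""
    final_modules, remaining = [], dict(expr)
    while remaining:
        current, residuals = dict(remaining), {}
        while True:  # refine until no gene is pruned
            modules = detect_modules(current) if current else []
            assigned = {g for m in modules for g in m}
            drop = {g for m in modules for g in m if fitness(g, m, current) < min_fit}
            drop |= set(current) - assigned  # genes that joined no module
            if not drop:
                break
            for g in list(current):  # keep the original gene order
                if g in drop:
                    residuals[g] = current.pop(g)
        if not modules:  # nothing detected in this pass: stop
            break
        final_modules.extend(modules)
        remaining = residuals  # pruned genes form the next expression dataset
    return final_modules, sorted(remaining)


# Toy expression profiles (five samples per gene), invented for illustration.
toy = {
    "a1": [1, 2, 3, 4, 5], "a2": [2, 4, 6, 8, 10], "a3": [1, 2, 3, 4, 6],
    "x1": [2, 1, 4, 3, 3], "x2": [4, 2, 8, 6, 6],
    "b1": [5, 4, 3, 2, 1], "b2": [10, 8, 6, 4, 2], "b3": [6, 4, 3, 2, 1],
    "n1": [3, 1, 2, 1, 3],
}
modules, unclassified = iterative_modules(toy)
print(modules, unclassified)
```

In this toy dataset, `x1` and `x2` are loosely correlated with the `a` genes, so the first detection step groups them together. They fall below the assumed fit threshold, are pruned, and are then recovered as a separate module when the residual genes are re-analyzed. Gene `n1` fits no module in any pass and is reported as unclassified.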