Pattern Recognition Letters

Volume 32, Issue 13, 1 October 2011, Pages 1544-1553

Similarity-based multimodality image fusion with shiftable complex directional pyramid

https://doi.org/10.1016/j.patrec.2011.06.002

Abstract

For multimodality images, a novel fusion algorithm based on the shiftable complex directional pyramid transform (SCDPT) is proposed in this paper. With the aid of the structural similarity (SSIM) index, a ‘similarity-based’ idea is employed to distinguish regions with ‘redundant’ or ‘complementary’ information between source images before the SCDPT coefficients are merged. A ‘weighted averaging’ scheme is then applied to regions with ‘redundant’ information and a ‘selecting’ scheme to regions with ‘complementary’ information. When merging the low-pass subband coefficients, the SSIM index in the spatial domain (SP-SSIM) is employed as the similarity measure, and three types of regions are thus determined. In particular, for regions with similar intensity values but different intensity changing directions between the source images, a ‘selecting’ scheme based on gradient and energy is proposed. When merging the directional band-pass subband coefficients, the SSIM index in the complex wavelet domain (CW-SSIM) is employed as the similarity measure. With the CW-SSIM index, not only the magnitude information but also the phase information of the SCDPT coefficients can be exploited. Compared with traditional energy matching (EM) index based fusion methods, the proposed method deals better with the ‘redundant’ and ‘complementary’ information of the source images. In addition, because of the shift-invariance of the SCDPT and the CW-SSIM index, the proposed fusion algorithm performs well even if the input images are not well registered. Several sets of experimental results demonstrate the validity and feasibility of the proposed method in terms of both visual quality and objective evaluation.

Highlights

► Magnitude information and phase information of the SCDPT are employed.
► Regions with redundant or complementary information are distinguished and merged.
► Regions containing borders or edges are merged with a dedicated scheme.
► The fusion method is general and robust to mis-registration.

Introduction

With the development of image sensors, multimodality image fusion has emerged as a new and promising research area. Due to the different imaging mechanisms of the sensors, multimodality images of the same scene captured by different image sensors can provide much ‘redundant’ and ‘complementary’ information. Such information from multiple source images can be integrated to create a new, improved image that contains a ‘better’ description of the scene than any of the individual source images (Lu et al., 2006, Pajares and de la Cruz, 2004). This process is called image fusion. To date, image fusion has been successfully applied in many fields, such as military affairs, medical imaging, remote sensing, and digital photography (Jan et al., 2005, Li and Yang, 2008).

The multiscale transform (MST) based fusion methods (Piella, 2003, Zhang and Blum, 1999) are considered important among the numerous image fusion algorithms. The multiscale decomposition/reconstruction tool and the fusion rule are the two most important components of such algorithms. Commonly used MST tools include the Laplacian pyramid transform (Petrović and Xydeas, 2004, Toet, 1989) and the discrete wavelet transform (DWT) (Amolins et al., 2007, Lewis et al., 2007, Li et al., 2006, Loza et al., 2010, Pajares and de la Cruz, 2004). Because the DWT has several advantages over the pyramid transforms, such as localization and directionality, the DWT-based methods are generally superior to the pyramid-based methods (Li et al., 2006).
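To make the pyramid idea concrete, the sketch below builds a simplified Laplacian-style pyramid in pure NumPy and verifies perfect reconstruction. It is a toy illustration only: plain decimation and nearest-neighbour expansion stand in for the Gaussian filtering used in Burt-style pyramid implementations, and the function names are our own.

```python
import numpy as np

def downsample2(x):
    """Crude 2x decimation (real pyramids low-pass filter before subsampling)."""
    return x[::2, ::2]

def upsample2(x, shape):
    """Nearest-neighbour 2x expansion, cropped back to `shape`."""
    y = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return y[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Each level stores the detail lost by downsampling;
    the last element is the coarse residual."""
    pyr, cur = [], img.astype(float)
    for _ in range(levels):
        small = downsample2(cur)
        pyr.append(cur - upsample2(small, cur.shape))
        cur = small
    pyr.append(cur)
    return pyr

def reconstruct(pyr):
    """Invert the pyramid by successive upsample-and-add."""
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = upsample2(cur, detail.shape) + detail
    return cur

img = np.arange(64, dtype=float).reshape(8, 8)
assert np.allclose(reconstruct(laplacian_pyramid(img)), img)  # perfect reconstruction
```

MST-based fusion methods merge the per-level detail images (and the coarse residual) of the source pyramids before running the reconstruction step.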

Although the DWT has been widely applied to image fusion, the separable DWT has some limitations in this application. Firstly, the DWT offers only a limited number of directions and cannot sparsely represent images with edges: it needs many coefficients to represent ‘line’ or ‘curve’ discontinuities (Do and Vetterli, 2005), so artifacts are easily introduced into the fused images (Zhang and Guo, 2009). Secondly, the DWT is not shift-invariant. Consequently, DWT-based fusion methods are shift-dependent, which is undesirable since different fusion results are obtained once the input images are mis-registered.
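The shift-dependence of the DWT is easy to demonstrate with a single-level 1-D Haar transform (a minimal sketch, not tied to any particular fusion method): shifting a step edge by one sample completely changes its detail coefficients.

```python
import numpy as np

def haar_dwt_1d(x):
    """Single-level 1-D Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass (detail)
    return a, d

# A step edge, and the same edge shifted by one sample.
x  = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
xs = np.roll(x, 1)

_, d  = haar_dwt_1d(x)
_, ds = haar_dwt_1d(xs)

# The edge of `x` falls between sample pairs, so every detail coefficient
# is zero; after the one-sample shift the edge splits a pair and a nonzero
# detail coefficient appears.
print(d)   # → [0. 0. 0. 0.]
print(ds)  # contains nonzero entries
```

A fusion rule operating on these coefficients would therefore produce different results for the two (identical up to a shift) inputs, which is exactly the mis-registration sensitivity discussed above.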

To overcome the disadvantages of wavelets in image analysis, Do and Vetterli (2005) developed a ‘true’ two-dimensional image representation tool, the contourlet transform (CNT). Compared with the traditional DWT, the CNT offers not only multiscale analysis and localization but also multidirectionality and anisotropy. As a result, the CNT can represent edges and other singularities along curves much better. Several image fusion algorithms based on the CNT have been proposed in recent years (Ibrahim and Wirth, 2009, Yang and Guo, 2008). However, the CNT also lacks shift-invariance.

In 2008, Nguyen and Oraintara (2008a, 2008b) proposed the shiftable complex directional pyramid transform (SCDPT), which can be regarded as a complex, shiftable (in the energy sense) version of the CNT and has been used in multispectral and panchromatic image fusion (Saeedi and Faez, 2011). The SCDPT has a number of properties desirable for image analysis, such as shiftable subbands, arbitrarily high directionality, and phase information, so it can provide more information for fusion. Moreover, because the SCDPT is shift-invariant, the impact of mis-registration on the fused results can be reduced effectively. The SCDPT is therefore well suited to image fusion.

Besides the multiscale decomposition and reconstruction tool, how to form the MST coefficients of the fused image from those of the source images, i.e., the fusion rule, is the other key issue in MST-based image fusion algorithms (Zhang and Blum, 1999). Various fusion rules, from pixel-based to window-based and region-based (Piella, 2003, Lewis et al., 2007, Zaveri and Zaveri, 2009a, Zaveri and Zaveri, 2009b), have been discussed thoroughly.

However, for the low-pass subband coefficients, the typical fusion rule is approximately an ‘averaging’ or ‘weighted averaging’ scheme. This may work well on images of the same modality, but multimodality images may have different dynamic ranges, and much ‘complementary’ information exists between the source images (Lewis et al., 2007); the ‘averaging’ or ‘weighted averaging’ scheme may therefore significantly reduce the contrast of the fused image. For the band-pass subband coefficients, the typical fusion rule is approximately an ‘absolute-maximum-selecting’ scheme: only the coefficient with the larger absolute value is kept as the fused one and the rest are discarded. As a result, some ‘redundant’ information between the source images is easily lost during fusion, and ‘artifacts’ or noise are easily introduced into the fused image.
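The two conventional rules described above can be sketched in a few lines of NumPy; the subband arrays here are illustrative toy values, not taken from any real decomposition.

```python
import numpy as np

def fuse_lowpass_average(lA, lB):
    """Conventional low-pass rule: plain averaging of approximation subbands."""
    return 0.5 * (lA + lB)

def fuse_bandpass_absmax(hA, hB):
    """Conventional band-pass rule: at each position, keep the coefficient
    with the larger absolute value and discard the other."""
    return np.where(np.abs(hA) >= np.abs(hB), hA, hB)

# Toy subbands
lA = np.array([[0.2, 0.8], [0.4, 0.6]])
lB = np.array([[0.6, 0.2], [0.8, 0.4]])
hA = np.array([[ 1.0, -0.1], [0.0,  0.5]])
hB = np.array([[-0.3,  0.9], [0.2, -0.4]])

print(fuse_lowpass_average(lA, lB))   # → [[0.4 0.5], [0.6 0.5]]
print(fuse_bandpass_absmax(hA, hB))   # → [[1.0 0.9], [0.2 0.5]]
```

The averaged low-pass output flattens the contrast of complementary regions (e.g. 0.2 vs. 0.8 both become 0.5), and the absolute-maximum rule keeps only one source's response at each position, which is precisely the behaviour the proposed similarity-based rule is designed to avoid.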

Focusing on multimodality images, this paper proposes a novel fusion algorithm based on the SCDPT and the structural similarity (SSIM) indexes (Wang and Bovik, 2002, Wang et al., 2004, Wang and Simoncelli, 2005). The source images are decomposed into subbands with the SCDPT. When merging the low-pass and the band-pass subband coefficients, the SSIM index in the spatial domain (SP-SSIM) and its counterpart in the complex wavelet domain (CW-SSIM) are applied, respectively, to distinguish regions with ‘redundant’ or ‘complementary’ information between the source images. A ‘weighted averaging’ scheme is performed on regions with ‘redundant’ information, while a ‘selecting’ scheme based on a salience measure is performed on regions with ‘complementary’ information. The proposed algorithm can therefore properly handle both the ‘redundant’ and the ‘complementary’ information between the source images. In addition, owing to the shift-invariance of the SCDPT and the CW-SSIM index, the proposed method is robust to mis-registration. Three sets of well-registered multimodality images and one set of mis-registered images are employed to demonstrate the validity and feasibility of the proposed algorithm.
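The similarity-based idea can be illustrated with the following hypothetical block-wise sketch. The block size, the threshold, the variance-based structure term, and the energy salience are illustrative stand-ins chosen by us, not the SP-SSIM/CW-SSIM measures and parameters actually used in the paper.

```python
import numpy as np

def blockwise_similarity_fusion(A, B, bs=8, thr=0.6, C=1e-6):
    """Per block: compute an SSIM-style similarity; fuse 'redundant' blocks
    (high similarity) by energy-weighted averaging and 'complementary' blocks
    (low similarity) by selecting the block with larger local energy."""
    F = np.empty_like(A, dtype=float)
    H, W = A.shape
    for i in range(0, H, bs):
        for j in range(0, W, bs):
            a = A[i:i+bs, j:j+bs].astype(float)
            b = B[i:i+bs, j:j+bs].astype(float)
            cov = ((a - a.mean()) * (b - b.mean())).mean()
            s = (2 * cov + C) / (a.var() + b.var() + C)  # structure term of SSIM
            ea, eb = (a**2).sum(), (b**2).sum()          # local energy (salience)
            if s > thr:   # 'redundant' -> weighted averaging
                wa = ea / (ea + eb + C)
                F[i:i+bs, j:j+bs] = wa * a + (1 - wa) * b
            else:         # 'complementary' -> selecting
                F[i:i+bs, j:j+bs] = a if ea >= eb else b
    return F

# Demo: left half identical in A and B (redundant), right half present only in B.
rng = np.random.default_rng(0)
p, q = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
A = np.zeros((8, 16)); A[:, :8] = p
B = np.zeros((8, 16)); B[:, :8] = p; B[:, 8:] = q
F = blockwise_similarity_fusion(A, B, bs=8)
# The redundant left block survives the averaging unchanged, while the
# complementary right block is selected intact from B.
```

Unlike the plain averaging rule, this scheme never mixes structurally dissimilar content, so the contrast of complementary regions is preserved.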

The rest of the paper is organized as follows. Section 2 briefly reviews the theory of the SCDPT. Section 3 describes the image fusion algorithm using the SCDPT. Experimental results and discussion are given in Section 4, and concluding remarks are presented in Section 5.

Section snippets

Shiftable complex directional pyramid transform (SCDPT)

In this section, we briefly review the theory and main properties of the SCDPT, presented in Nguyen and Oraintara (2008a, 2008b), which will be used in the subsequent sections.

Similarly to the CNT, the SCDPT firstly adopts an iterative multiscale filter bank (FB) for multiscale decomposition, and then employs a complex directional filter bank (DFB) at each scale for directional decomposition. To achieve shift-invariance, the complex DFB is constructed by a dual-tree structure of

Similarity-based multimodality image fusion with SCDPT

In this section, we will introduce the SCDPT into image fusion and propose a novel fusion algorithm for multimodality images.

Fig. 2 illustrates the block diagram of the proposed image fusion algorithm. To simplify the discussion, we assume the fusion process is to generate a composite image F from a pair of source images denoted by A and B. Again, we assume here that source images have been well registered. The proposed image fusion approach consists of the following steps:

Firstly, perform a J

Experiments and analysis

Three sets of multimodality images with perfect registration and one set of images with mis-registration are used to evaluate the proposed fusion algorithm. For comparison, the fusion is also performed using another four pixel-level fusion methods and one region-level fusion method.

The first three pixel-level fusion methods, including the DWT-based method, the CNT-based method and the SCDPT-simple-based method, are employed to validate the performance of the SCDPT in image fusion. In the above

Conclusion

As a new image multiscale geometric analysis tool, the SCDPT is very suitable for image fusion because of many advantages, such as multiscale, multidirection, efficient implementation, low redundancy and shift-invariance. Therefore, the SCDPT is introduced into image fusion and a novel multimodality image fusion algorithm is proposed in this paper. In the proposed fusion algorithm, with the SP-SSIM index and the CW-SSIM index, a similarity-based fusion rule is proposed to distinguish and merge

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 10972002 and 60736022, and by the Fundamental Research Funds for the Central Universities under Grant No. K50510040003. The original four sets of test images, all available online at www.imagefusion.org, were kindly provided by Mr. Lewis and Dr. Nikolov, by Dr. Rockinger, by Manchester University and Dr. Petrovic, and by Dr. Toet, respectively.

References (28)

  • A.L. Da Cunha et al. The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Trans. Image Process. (2006)
  • M.N. Do et al. The contourlet transform: An efficient directional multiresolution image representation. IEEE Trans. Image Process. (2005)
  • Ibrahim, S., Wirth, M., 2009. Visible and IR data fusion technique using the contourlet transform. In: Internat. Conf....
  • M.L. Jan et al. A three-dimensional registration method for automated fusion of micro PET-CT-SPECT whole-body images. IEEE Trans. Med. Imaging (2005)