The internet revolution has made information acquisition easy and cheap, producing massive Web/social data in everyday life. The emergence of big social media has led researchers to study the possibility of exploiting such data to identify hidden knowledge. However, a great number of issues arise in the obtained big social data [23, 24, 26, 28]. First, social data are often incomplete for a variety of reasons, such as security and privacy concerns. Second, social data differ in structure, including structured data (e.g., social Web data), semi-structured data (e.g., XML data), and unstructured data (e.g., social networks). Third, Web data are often high-dimensional. However, current computing techniques can only deal with structured, complete, and moderately dimensional data. Moreover, current technologies can only mine the basic structure of such data and are not capable of mining their natural complex structure (or deep structure). Hence, there is a huge gap between existing technologies and the real requirements of actual big social data. In this case, deep mining of big social data (such as data preprocessing, deep pattern discovery, pattern fusion, and outlier/noise detection) stands as a promising way to bridge this gap [4, 8, 22, 25, 27].

In [7], Komarasamy et al. proposed a multi-phase scheduling method for parallel jobs in a hierarchical model. The method comprises job preprocessing, prioritization, and scheduling among nodes in cloud storage. Moreover, it uses intermediate idle nodes during preprocessing and batch processing to avoid starvation and to mitigate unwanted delays.

In [5], Huang et al. proposed an intelligent trading method based on the Naive Bayes and AdaBoost algorithms. The method first employs dual clustering to detect transaction patterns, and then uses the discovered trading patterns to predict market trends with a Naive Bayes classifier. Finally, AdaBoost boosts the Naive Bayes classifier into a robust classifier, which is compared with four existing algorithms.
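To illustrate the final boosting step, the sketch below (a minimal illustration using scikit-learn and synthetic data, not the authors' implementation) wraps a Gaussian Naive Bayes learner in AdaBoost:

```python
# Minimal sketch: boosting a Naive Bayes classifier with AdaBoost.
# Assumptions: scikit-learn is available; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# GaussianNB supports sample_weight, so it can serve as the weak learner.
# (The parameter is named base_estimator in scikit-learn < 1.2.)
clf = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```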

In [6], Huang et al. proposed a CAD system to overcome the opacity of recent CAD systems. Experimental results show that the proposed system performs well in breast diagnosis and can effectively identify whether a breast tumor is benign.

In [13], Rao et al. proposed an active learning scheme that utilizes both labeled and unlabeled images to build the initial Support Vector Machine (SVM) classifier for image retrieval.
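A generic sketch of SVM-based active learning is shown below (assuming scikit-learn and synthetic features; the uncertainty-sampling query rule is a common stand-in, not necessarily the scheme of [13]):

```python
# Minimal sketch: active learning with an SVM via uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# Seed the labeled pool with a few examples from each class.
labeled = np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]])
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

for round_ in range(5):
    svm = SVC(kernel="rbf").fit(X[labeled], y[labeled])
    # Query the unlabeled points closest to the decision boundary.
    margins = np.abs(svm.decision_function(X[unlabeled]))
    query = unlabeled[np.argsort(margins)[:10]]
    labeled = np.concatenate([labeled, query])
    unlabeled = np.setdiff1d(unlabeled, query)
    print(f"round {round_}: {len(labeled)} labeled samples")
```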

In [15], Wan et al. proposed building a suitable coding framework and providing developers with the right coding skills. To this end, the paper designed a probabilistic expert ranking model, incorporated project regularity as a graph regularization term in the expert ranking, and added a correlation propagation graph.

In [21], Zhang et al. proposed a new feature selection method for data classification that efficiently combines the discriminative capability of features with a ridge regression model. It first uses linear discriminant analysis to capture the global structure of the training data and thereby identify discriminative features. Then, the ridge regression model is used to evaluate the features' representation and discriminant information so as to obtain a representative sparse matrix. Finally, the selected feature subset is fed to a linear support vector machine for classification.
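The sketch below illustrates the last two steps under simplifying assumptions (scikit-learn, synthetic data, and plain ridge coefficient norms as feature scores rather than the exact criterion of [21]):

```python
# Minimal sketch: ridge-regression feature scoring followed by a linear SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=50, n_informative=8,
                           n_classes=3, random_state=0)
Y = label_binarize(y, classes=np.unique(y))      # one-hot targets

# Ridge regression from features to labels; the row norms of the
# coefficient matrix score each feature's discriminative weight.
W = Ridge(alpha=1.0).fit(X, Y).coef_.T           # shape (n_features, n_classes)
scores = np.linalg.norm(W, axis=1)
top = np.argsort(scores)[::-1][:10]              # keep the 10 strongest features

svm = LinearSVC().fit(X[:, top], y)
print("training accuracy on selected features:", svm.score(X[:, top], y))
```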

In [18], Wen et al. proposed a spectral clustering method tailored to high-dimensional data: it first uses an affinity matrix learning method to learn a high-quality affinity matrix from the intrinsic feature space, and then utilizes local PCA to prune the affinity matrix and solve the intersection problem. Finally, the paper employs a robust clustering method that conducts the clustering tasks directly on the affinity matrix, thereby overcoming the cluster-specification problem and the initialization sensitivity problem.
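Clustering directly on a given affinity matrix can be sketched as follows (assuming scikit-learn; the RBF kernel here is an ordinary stand-in for the learned affinity of [18]):

```python
# Minimal sketch: spectral clustering on a precomputed affinity matrix.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
A = rbf_kernel(X, gamma=0.5)                     # stand-in affinity matrix
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels[:20])
```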

In [16], Wang et al. proposed a type of node similarity based on the frequency of connections between nodes, which describes the evolving relationships between entities over a period of time.

In [14], Sood proposed the new concept of free-space fog, which collects the available free resources of all assigned jobs in order to eliminate deadlock.

In [10], Liu et al. proposed a new data clustering algorithm based on a potentiality model. The algorithm merges sub-clusters using a cluster-merging criterion and automatically terminates the clustering process.

In [9], Li et al. formulated a new problem in dynamic traffic networks. The work first considers two dynamic transportation networks, the traffic spatial network and the dynamic public transportation network, and then uses uncertain trajectory data to establish a spatial network of traffic regions.

In [19], Xie et al. proposed an information-theoretic method to optimize tag interaction efficiently. To generate a recommendation list, the paper applies probabilistic matrix factorization to predict user preferences and alleviates rating sparsity by embedding similar-user and similar-resource information.
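Probabilistic matrix factorization can be sketched as SGD with Gaussian priors, i.e., L2-regularized matrix factorization (the toy data and hyperparameters below are illustrative, not those of [19]):

```python
# Minimal sketch: probabilistic matrix factorization via SGD.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 5
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.2      # observed ratings only

U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
lr, lam = 0.01, 0.05                             # step size, L2 (prior) strength
rows, cols = np.nonzero(mask)

for epoch in range(30):
    for i, j in zip(rows, cols):
        err = R[i, j] - U[i] @ V[j]
        u_old = U[i].copy()                      # keep U[i] fixed for V's step
        U[i] += lr * (err * V[j] - lam * U[i])
        V[j] += lr * (err * u_old - lam * V[j])

pred = U @ V.T
print("train RMSE:", np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))
```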

In [29], Zhu et al. proposed an unsupervised feature selection method that embeds a subspace learning regularization (PCA) into a feature selection framework.

In [3], Hu et al. proposed a method named GRID for constructing popular routes based on collective knowledge. Experimental results on two real data sets show that this method is superior to state-of-the-art methods in both efficiency and efficacy.

In [2], Gu et al. proposed a strategy to resolve ambiguity problems in short text categorization. By using Bi-directional Recurrent Neural Networks (Bi-RNN) and latent Dirichlet allocation (LDA), the proposed method captures more contextual and latent semantic information for categorization. Apart from that, it uses the topic model to enhance the neural network's representation of short texts.
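A bidirectional RNN encoder for short texts can be sketched as follows (PyTorch, toy dimensions; not the exact architecture of [2]):

```python
# Minimal sketch: a Bi-RNN (bidirectional LSTM) short-text classifier.
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden=64, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)  # concat of both directions

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.fc(h.mean(dim=1))               # mean-pool over time steps

model = BiRNNClassifier()
dummy = torch.randint(0, 1000, (8, 12))             # batch of 8 short texts
print(model(dummy).shape)                           # -> torch.Size([8, 4])
```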

In [12], Pan et al. used multi-scale fully convolutional neural networks for regression on density maps. The method regresses a structured proximity map in which patches near cell centers take larger values, and then uses convolutional regression networks to detect different kinds of cells from the feature maps.
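A minimal multi-scale fully convolutional regressor can be sketched as follows (PyTorch, toy channel sizes; not the network of [12]):

```python
# Minimal sketch: a two-scale fully convolutional proximity-map regressor.
import torch
import torch.nn as nn

class MultiScaleFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale3 = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # fine scale
        self.scale7 = nn.Conv2d(1, 8, kernel_size=7, padding=3)   # coarse scale
        self.head = nn.Conv2d(16, 1, kernel_size=1)               # fuse + regress

    def forward(self, img):
        feats = torch.cat([self.scale3(img), self.scale7(img)], dim=1)
        return torch.relu(self.head(torch.relu(feats)))           # proximity map

net = MultiScaleFCN()
print(net(torch.rand(1, 1, 64, 64)).shape)          # -> torch.Size([1, 1, 64, 64])
```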

In [17], Wang et al. applied self-representation of each feature to sparsify the data set, and then used the Frobenius norm and Locality Preserving Projection (LPP) as regularization terms to avoid over-fitting and preserve local relations.
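Dropping the LPP term for brevity, the self-representation step with a Frobenius-norm penalty has a closed form, sketched below (NumPy, synthetic data):

```python
# Minimal sketch: self-representation feature scoring.
# Reconstruct each feature from all features, X ~ XW, with a
# Frobenius-norm penalty, then rank features by the row norms of W.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))               # samples x features
lam = 1.0

# Closed-form solution of min_W ||X - XW||_F^2 + lam * ||W||_F^2:
# (X^T X + lam * I) W = X^T X
d = X.shape[1]
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ X)
scores = np.linalg.norm(W, axis=1)               # importance of each feature
print("top features:", np.argsort(scores)[::-1][:5])
```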

In [1], Gao et al. proposed a bin-based attack model to re-identify individuals in social networks. Besides that, the paper proposes a k-anonymity scheme to protect social individuals. Experiments showed that the proposed method is effective.
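A k-anonymity check over generalized quasi-identifiers, a building block of such schemes, can be sketched as follows (pandas, toy records; the columns are hypothetical):

```python
# Minimal sketch: checking k-anonymity of a generalized table.
import pandas as pd

df = pd.DataFrame({
    "age":    ["20-30", "20-30", "20-30", "30-40", "30-40"],
    "zip":    ["130**", "130**", "130**", "148**", "148**"],
    "degree": [3, 3, 3, 5, 5],       # generalized node degree
})

def is_k_anonymous(frame, quasi_ids, k):
    # Every combination of quasi-identifier values must occur >= k times.
    return frame.groupby(quasi_ids).size().min() >= k

print(is_k_anonymous(df, ["age", "zip", "degree"], k=2))  # True
```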

In [11], Menasria et al. concentrated on protecting private information in accelerometer-based activity recognition by leveraging the connection between irrelevant and relevant private information. In this way, the usage of irrelevant information can be reduced to protect private information. Experiments illustrate that the proposed method reduces privacy leakage.

In [20], Zhang proposed a target-source framework that minimizes the total cost by controlling one kind of cost while minimizing another. Besides, the paper also proposes a cost-sensitive learning model to help analyze complex information. Experimental results showed that the proposed method works well on real medical data.
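Cost-sensitive learning can be approximated by per-class misclassification weights, sketched below (scikit-learn; the 10:1 cost ratio is illustrative, not from [20]):

```python
# Minimal sketch: cost-sensitive classification via class weights.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Imbalanced data standing in for, e.g., rare positive medical cases.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Penalize missing the rare class ten times more heavily.
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X, y)
print(confusion_matrix(y, clf.predict(X)))
```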