Improved adaptive-phase fuzzy high utility pattern mining algorithm based on tree-list structure for intelligent decision systems

Chen, Jing; Liu, Aijun; Zhang, Hongjun; Yang, Shengyi; Zheng, Hui; Zhou, Ning; Li, Peng

doi:10.1038/s41598-023-50375-y

Download PDF

Article
Open access
Published: 10 January 2024

Improved adaptive-phase fuzzy high utility pattern mining algorithm based on tree-list structure for intelligent decision systems

Jing Chen^1,2,
Aijun Liu²,
Hongjun Zhang¹,
Shengyi Yang³,
Hui Zheng⁴,
Ning Zhou⁵ &
…
Peng Li^5,6

Scientific Reports volume 14, Article number: 945 (2024) Cite this article

622 Accesses
2 Altmetric
Metrics details

Subjects

Abstract

With the rapid development of AI and big data mining technologies, computerized medical decision-making has become increasingly prominent. The aim of high-utility pattern mining (HUPM) is to discover meaningful patterns in medical databases that contribute to maximizing the utility from the perspective of diagnosis. However, HUPM pays less attention to the interpretability and explainability of these patterns in medical decision-making scenarios. This paper proposes a novel algorithm called the Improved fuzzy high-utility pattern mining (IF-HUPM) to address this problem. First, the paper applies a fuzzy preprocessing method to divide the fuzzy intervals of a medical quantitative data set, which enhances the fuzziness and interpretability of the data. Next, in the process of IF-HUPM, both fuzzy tree and list structures are employed to calculate fuzzy high-utility values. By combining the characteristics of the one-stage and two-stage algorithms of HUPM, an adaptive-phase Fuzzy HUPM hybrid frame is proposed. The experimental results demonstrate that the proposed IF-HUPM algorithm enhances both accuracy and efficiency and the mining process requires less time and space on average.

A new multi-attribute group decision-making method based on Einstein Bonferroni operators under interval-valued Fermatean hesitant fuzzy environment

Article Open access 29 May 2024

Entropy for q-rung linear diophantine fuzzy hypersoft set with its application in MADM

Article Open access 08 March 2024

Distance and similarity measures for normal wiggly dual hesitant fuzzy sets and their application in medical diagnosis

Article Open access 12 August 2022

Introduction

In the era of AI and the interpretation of medical diagnostic decisions, the use of current computer-based medical diagnostic and predictive methods¹ can effectively assist doctors in the accuracy of medical diagnosis. It is also necessary to analyse the excavation results in terms of interpretation, which is convenient for decision-makers to interpret in their decision-making, recommendation system^2,3,4,5, and Internet of things^6,7.

Extracting valuable information^8,9 and knowledge from large datasets^10,11,12 requires the use of various approaches such as statistics, machine learning, artificial intelligence, and database technology to find patterns, associations^13,14,15,16, and rules in diverse areas^17,18,19. Zadeh²⁰ utilized association rule mining methods to explore potential drugs or drug combinations associated with newly diagnosed diabetes. Similarly, Vhaduri et al.²¹ conducted a study where they provided continuous glucose monitoring devices to 44 participants to investigate the application n diabetes management.

While traditional Apriori algorithm²² and Frequent Pattern Mining (FPM) algorithms²³ solely focus on identifying the existence of item sets, disregarding their actual value, a classic technique known as High-Utility Item set Mining (HUIM) addresses this limitation. Liu et al.²⁴ introduced HUIM in 2005, which incorporates both the frequency of occurrence and the value contained within item sets. It achieves this by utilizing external utilities and internal utilities. For instance, by analyzing a large amount of data from diabetes patients, HUPM may reveal significant associations between specific physiological indicators, other factors, and the severity of diabetes risk. These patterns offer valuable reference information for doctors, aiding them in better-assessing patients' conditions and developing personalized treatment plans^25,26.

However, HUPM is primarily applied to mine databases containing profit information^27,28,29,30, without considering the interpretability and explainability of these items. Many researchers have conducted relevant studies to address this issue. In 2016, Fournier-Viger proposed FHMUF algorithm that combines HUPM with fuzzy set theory to mine Fuzzy High-Utility Pattern (FHUP). To identify FHUPs, the method preprocesses and calculates fuzzy support and fuzzy utility values. For mining FHUPs, it is based on fuzzy partitioning and multilevel fuzzy set theory. The rule-based fuzzy set theory may effectively soften the threshold and border in the medical environment, alleviating the problem of sharp points. Researchers have proposed many efficient usable itemset mining algorithms, but the resultant itemsets only show the correlation between efficient usable items, and the efficient usable itemset mining is still mainly applied to mining databases with profit information without considering the numerical information of these items, which has the problem of poor interpretability.

To optimize computerized medical decision-making—a field traditionally underserved by existing HUPM, we propose an novel improved fuzzy high utility pattern mining algorithm (IF-HUPM) with adaptive-phase strategy based on Tree-List Structure under medical senoras for decision making. The proposed IF-HUPM algorithm integrates fuzzy preprocessing to bolster interpretability of quantitative medical databases, thereby refining diagnostic accuracy. It also merges fuzzy tree and list structures, and marries one-phase and two-phase algorithmic properties of HUPM into an adaptive, hybrid framework, further enhancing mining efficiency. Experimental results highlight that IF-HUPM not only elevates mining accuracy but economizes on time and space—critical when dealing with extensive real-time medical data. In essence, IF-HUPM provides a viable solution to the prevalent challenge of pattern interpretability in medical decision-making, thus marking a significant stride towards the enhancement of intelligent decision-making systems and medical service quality. The three main contributions of this paper are as follows:

1.
To address the issues of sharp peaks in data and poor interpretability in existing efficient itemset mining models, we introduce the fuzzy sets theory to FHUPM. We proposed IF-HUPM algorithm which can adapt to diverse data properties in various medical contexts which is efficient in terms of space–time complexity and has good performance, making it suitable for practical use in medical data analysis.
2.
In this paper, we propose the Adaptive-Phase Fuzzy UFH Hybrid Frame for fuzzy high-utility pattern mining based on multi-dimensional data fuzzification. To achieve this, we employ a switch module that combines one-stage and two-stage mining methods. This approach allows us to develop an interpretable decision-making framework for medical databases, which is efficient and effective when applied to diverse data properties.
3.
Experimental results on multiple publicly available real-world datasets demonstrate that introducing fuzziness can better reflect the meaning of the numerical attributes of the data, and enhance the interpretability and comprehensibility of the mining results.

Related work on fuzzy high-utility pattern mining (FHUPM)

The fuzzy data mining method is a common technique in data mining. Zadeh et al.²⁰ proposed the membership function as the feature function of fuzzy sets, which is the core of fuzzy sets, in his paper Fuzzy Sets, published in 1965. As the application scope of fuzzy sets expands, some extended fuzzy sets, such as interval-valued fuzzy sets, intuitive fuzzy sets, Vague sets, type II fuzzy sets, etc.³¹, are gradually adopted in various fields.

In 2009, Wang et al.³² proposed a fuzzy approach for mining high-utility quantitative itemsets. Their method considers both profits and quantities of items to discover FHUIs from quantitative databases. Fournier-Viger et al.³³ provided an up-to-date survey that serves as an introduction and guide to recent advances and opportunities in the field of mining high-utility item sets (HUIs) in 2017. Mohbey introduced a new approach called HFUBPM (High Fuzzy Utility Based Patterns Mining)³⁴ for extracting high fuzzy utility patterns from mobile web services access sequences. They designed a modified tree structure based on frequent-pattern trees and propose a mining algorithm to extract high temporal fuzzy utility patterns from transactional datasets with temporal properties.

Hong et al. have noted that the aforementioned approaches can only handle binary situations, which complicates the representation of discovered knowledge as linguistic variables³⁵. A study by Wu et al. investigate a genetic algorithm-based framework for mining high fuzzy utility itemsets³⁶. Previous research has primarily focused on finding high fuzzy utility itemsets in small databases, a circumstance which is not reflective of real-world scenarios in big data environments. In 2022, Ganesan et al.³⁷ conducted an investigation into HUFPM, specifically aiming to discover product sets with high benefits.

Gan et al.³⁸ propose a method called Pattern Growth Fuzzy Utility Mining (PGFUM), which leverages fuzzy set theory for mining fuzzy high-utility sequences with linguistic meaning. In 2022, Wan et al. propose a novel one-phase temporal fuzzy utility item set mining approach called TFUM³⁹. Ryu et al.⁴⁰ propose a scalable and efficient approach with a novel data structure for mining high temporal fuzzy utility patterns without generating candidates. All existing HUSRM algorithms aim to find high-utility partially-ordered sequential rules (HUSRs), which are not consistent with reality and may generate fake HUSRs.

To address this, Zhang et al.⁴¹ propose two novel algorithms, called TotalSR and TotalSR + , which aim to identify all High-Utility Totally-ordered Sequential Rules (HTSRs). However, due to the symmetry of frequent itemset pattern (FIPs), these strategies cannot be directly applied to Periodic High-Utility Sequential Pattern Mining (PHUSPM). To address these issues, Dinh et al.⁴² propose the Stable Periodic High-Utility Sequential Pattern Mining (HUSPM) algorithm.

To reduce complexity and obtain a compact set of frequent high-utility sequential patterns⁴³, Huang et al. proposed an efficient algorithm called TMKU based on the TargetUM algorithm to discover the top-k Target High-Utility Itemsets (top-k THUIs) in 2023. The research achievements and latest algorithms⁴⁴ mentioned have led to significant advancements in FHUPM algorithms, including algorithm innovation, improvement of data structures, and performance enhancement.

However, there are limitations in the research and applicability of these algorithms in multidimensional medical data and real healthcare scenarios. Mining data in the medical field, with multidimensional data and complex features, presents a challenging task. By conducting in-depth research on existing algorithms and better adapting them to the needs of the medical field, it is possible to improve the accuracy and interpretability of mining results.

In this paper, we addresses the challenges of FHUPM in medical data mining. Specifically, we leverage the characteristics of one-phase and two-phase mining. This approach aims to enhance the effectiveness, applicability, and interpretability of the method in medical scenarios. Additionally, integrating the characteristics of one-phase and two-phase mining can help reduce computational complexity and improve mining efficiency. This methodology could possibly enable more efficient mining processes of large-scale medical data and facilitate a better exploration of patterns and associations within the data.

FHUPM algorithm and framework

This section focuses on the application of fuzzy set theory to the process of fuzzifying medical databases, followed by the mining of their FHUP^{45,46,47,48,49}.

Hybrid high utility framework

HUPM algorithms predominantly employ either one-phase or two-phase approaches for the discovery of High Utility Itemsets (HUIs). In the case of two-phase algorithms, a tree-based data structure is commonly employed for the generation of candidate itemsets and their subsequent validation. This data structure facilitates the storage of upper-bound utility estimations for each itemset at each node, enabling the rapid generation of candidate HUIs. However, the generation of candidate HUIs continues to represent the primary computational overhead in two-phase mining algorithms.

Conversely, one-phase algorithms typically leverage inverted lists and transaction projection methodologies, thereby circumventing the need for candidate item set generation and enabling precise utility value calculations. The two-phase HUIM model is depicted in Fig. 1 below.

To combine the advantages of one-phase and two-phase algorithm frameworks, Dawar et al.⁵⁰ proposed a hybrid algorithm in 2017 by integrating UP-Growth + and FHM. This algorithm essentially functions as a hybrid framework, integrating both one-phase and two-phase approaches. In various scenarios, it allows for the dynamic selection of an algorithm framework, or executing a portion of one framework before transitioning to the algorithm of another framework. However, a significant challenge lies in defining the criteria for this transition and determining the most appropriate method for implementing the switch.

Basic definitions

For the original database in the medical field, we need to apply fuzzy set theory to fuzzify each transaction and obtain the fuzzy value of each item set. This not only solves the problem of sharp point division but also generates the internal utility value of the database based on the fuzzy value. For instance, in a specific transaction, the fasting blood glucose value is 7. Its membership degree to the hyperglycemia interval is 1, and its internal utility value after processing is 12. The data can then be mined for fuzzy but effective usage patterns. However, in the process of mining, most algorithms either use one-stage mining to construct a utility list and calculate the actual utility value, or use a two-stage construction tree to first calculate the overestimated utility value and then the actual utility value. In this paper, a hybrid framework will be used to integrate a one-stage algorithm and a second-order feature. This integration will enable the mining algorithm to perform effectively on data sets with diverse characteristics. Before describing the fuzzification process and mining algorithm in detail, we will first explain and define their symbolic meanings. First, each transaction contains a group of items and is associated with a unique transaction identifier, Tid. Let I = {i1, i2, i3, …, in} be the complete set of distinct items that appear in the transaction database.

As shown in Table 1, the transaction data set contains six transactions $T_{j} = \left\{ {T_{1} ,T_{2} ,T_{3} ...T_{6} } \right\}$ and eight items I = A, B, C, D, E, F, G, and H. The internal utility of item X in a transaction $T_{j} \in \left\{ {T_{1} ,T_{2} ,T_{3} ...T_{6} } \right\}$ is defined as the number of x in a transaction ${\text{T}}_{{\text{j}}}$, expressed as ${\text{IU}}\left( {{\text{x,T}}_{{\text{j}}} } \right)$, and the external utility of item X is expressed as ${\text{EU}}\left( {\text{x}} \right)$. As shown in Table 1, the internal utility of item A in the transaction ${\text{T}}_{{1}}$ is. As shown in Table 2, the external utility values of items A, B, C, D, E, F, G, and H are 5, 2, 1, 2, 3, 5, 1, 1.

Table 1 Transaction database.

Full size table

Table 2 Utility list.

Full size table

Definition 1

⁴⁵. Item's utility in a transaction. The utility of item X in transaction $T_{j}$ is expressed as ${\text{U}}\left( {x{\text{,T}}_{{\text{j}}} } \right)$, and defined as

$$ U\left( {x,T{}_{j}} \right) = IU\left( {x,T_{j} } \right) \times EU\left( x \right) $$

(1)

Definition 2

⁴⁶. The utility of an item set in a transaction. The utility of item set X in transaction $T_{j}$ is expressed as $U\left( {x,T_{j} } \right)$ and defined as

$$ U\left( {X,T_{j} } \right) = \sum\nolimits_{x \in X} {U\left( {x,T_{j} } \right)} $$

(2)

Definition 3

⁴⁷. Utility of item sets in a transaction database. The utility value of item set X in the transaction database is expressed as $U\left( X \right)$, is the sum of the utility values of item set X in all database transactions, and is defined as

$$ U\left( X \right) = \sum\limits_{{x \subseteq T_{j \wedge } T_{j} \in D}} {U\left( {X,T_{j} } \right)} $$

(3)

Definition 4

⁴⁸. Transaction utility. The transaction utility of transaction $T_{j}$ is expressed as $TU\left( {T_{j} } \right)$, where m is the number of items of transaction $T_{j}$, and defined as

$$ TU\left( {T_{j} } \right) = \sum\nolimits_{i}^{m} {U\left( {x_{i} ,T_{j} } \right)} $$

(4)

In Tables 1 and 2. Transaction utility of n $T_{5}$ is expressed as $TU\left( {T_{5} } \right) = 2 \times 5 + 6 \times 1 + 2 \times 3 + 5 \times 1 = 27$.

Definition 5

⁴⁹. The support of item set. The support of item set X is defined as the number of transactions containing X in data set, expressed as ${\text{support}}(X)$. As shown in Table 1, support of item set $\{ A,C,D\}$ is 2 and appears in transactions $T_{1}$ and $T_{6}$.

Definition 6

Transaction weighted utility. The transaction weighted utility of item set X is the sum of all transaction utility of item set X, expressed as $TWU\left( X \right)$. For example the value of $TWU\left( G \right)$ is $TWU\left( G \right) = TU\left( {T_{2} } \right) + TU\left( {T_{5} } \right) = 13 + 27 = 40$.

Definition 7

High-utility value. Suppose the utility value of item set X in the transaction database is not less than the minimum utility threshold min-utility set by the user. In that case, the high-utility value is the HUIs. For any item set X, if the TWU of X is less than the minimum utility threshold min-utility, then any super-set of X must not a HUI.

As shown in Table 1, according to Eq. (4) the transaction utility of A is 124. If min-utility is 120, then the super-set of item A has the possibility of high-utility item sets. If min-utility is 160, the super-set of item A must not be an HUI.

Definition 8

Fuzzy set is an extension of the clear set in the traditional set members will only belong to a certain set. In fuzzy sets, members can belong to multiple sets simultaneously. The membership function of the traditional clear set is in Eq. (5).

$$ {\text{U}}_{{\text{A}}} ({\text{X}}) = \left\{ {\begin{array}{*{20}c} 1 & {{\text{if X }} \in A} \\ 0 & {{\text{if X}} \notin A} \\ \end{array} } \right. $$

(5)

The fuzzy set theory extends this concept using tiny partial membership, ranging from 0 to 1. The membership function is formally defined in Eq. (6).

$$ {\text{U}}_{{\text{A}}} ({\text{X)}}:\,\,X \to \left[ {0,1} \right] $$

(6)

Fuzzy high utility itemset construction

In the conventional medical scene, data acquired for various indicators typically manifest as floating-point values. Both the classical FIM algorithm and the HUIM algorithm excel in processing Boolean, binary, and other data types. They can unveil relationships among different data features, thereby presenting distinct advantages in disease prediction and diagnosis, exerting a unique impact on the future evolution of diagnostic practices. In a conventional medical setting, the data obtained from indicators are predominantly in the form of floating-point values. To address this, after processing raw data and handling inconsistencies, various data features undergo fuzzification based on reference values for human indicators. Subsequently, they are transformed into Boolean, binary, or categorical data, and data mining is executed using mining algorithms^1,2,9,17. The direct partition interval method and the fuzzy partition interval method are commonly employed to convert floating-point data from physical examinations into Boolean data. The former is straightforward but may lead to the issue of sharp boundaries in certain situations. Consider, for instance, the measurement of fasting blood glucose levels in diabetes screenings during a routine medical checkup.

The classification of a person whose fasting glucose level is 6.4 mmol/L can be approached using the fuzzy partition interval method. This method helps avoid sharp distinctions by extending the data space into the fuzzy space and dividing it into intervals. By assigning membership degrees, a floating-point number between 0 and 1 to each interval, we can assess the likelihood of different classifications.

In the case of a fasting blood glucose level of 6.4 mmol/L, we can assign membership degrees to different categories. For example, it can be classified as hyperglycemia with a high membership degree (e.g., 0.99) if the threshold for hyperglycemia is set below 6.4 mmol/L. On the other hand, if the threshold for normal blood glucose is set above 6.4 mmol/L, it can be classified as normal blood glucose with a low membership degree (e.g., 0.05). Additionally, if the threshold for hypoglycemia is set significantly lower than 6.4 mmol/L, it can be classified as hypoglycemia with a very low membership degree (e.g., 0).

By using this approach, we can capture the uncertainty and gradual transition between different classifications rather than relying solely on a fixed threshold. It allows for a more nuanced interpretation of blood glucose levels and helps in identifying potential health conditions that may require further examination.

This paper proposes a fuzzy and efficient pattern mining algorithm for medical data. The algorithm aims to de-weight the diabetes dataset, fill in missing values, and normalize the data. The normalization process ensures that attribute values, such as fast blood glucose, fall within the range of 0–7. This range is convenient for fuzzy processing. The algorithm takes into account the medical reference values of various indicators for human beings, as shown in Table 3. In Table 3, we outline the reference standards for human medical parameters. Based on Table 3, we first process the original data by normalizing it according to the Fuzzy Membership Function shown in Fig. 2. Table 4 presents the original data set from the Pima Indians data set⁴⁴, where the order of attributes represented by the data from left to right corresponds to the order of attributes in Table 3. On the right side of Table 4, 1 represents diabetic patients and 0 represents non-diabetic patients. Following fuzzification, the data set is partitioned into three sections, as shown in Table 5 below, where Items represent physical examination attributes. This transaction's item utilities represent the membership value of each attribute within the fuzzy interval. As we are more concerned with the health evaluation of physical examinations, the external utility of other attributes in the external utility table is set to 1 by default. In contrast, the external utility without abnormality in the health evaluation is set to 0, reducing its utility value and focusing on the abnormal situations we are concerned about. Transaction utility represents the total number of attributes that belong to the fuzzy interval, which is the sum of the values in the second and third columns.

Table 3 Reference table of medical monitoring of human parameters.

Full size table

Table 4 Original data sets of medical monitoring of human parameters.

Full size table

Table 5 Fuzzification of original data sets of medical monitoring of human parameters.

Full size table

The proposed algorithm

Improved fuzzy high-utility pattern mining algorithm

This section aims to provide guidance on extracting high utility patterns from fuzzy databases. To achieve this goal, an improved fuzzy high-utility pattern mining algorithm with an adaptive-phase UFH hybrid frame is proposed. It is important to note that there is no clear winner between one-stage and two-stage mining algorithms, as their performance depends on the specific scenario. Thus, defining and implementing switching criteria can be a challenging task.

In this study, we propose the adaptive-phase fuzzy UFH hybrid framework to extract high-utility itemsets (HUIs) after the multi-dimensional data fuzzification. This algorithm works by first scanning the dataset and constructing a fuzzy tree after fuzzification. It then calculates the overestimated utility value and compares it to the lowest utility value in order to eliminate a portion of the itemset. Next, the actual utility value is calculated and compared to the minimum utility threshold value in order to identify the most relevant HUIs for the problem.

A single-stage mining algorithm is utilized. Scan the database for information using lists. Similar to the Apriori principle of generating item sets of varying lengths, this method omits the step of generating candidate itemsets and calculates real utility values directly. The candidate HUI is screened before constructing the utility tree. Using a list-based algorithm, the utility values are computed. Execute the tree-based algorithm first and switch to the list-based algorithm at a certain point.

The switching module can define switching criteria to assist in determining the optimal switching point to maximize overall efficiency. For example, the switching criteria can be determined by observing the data distribution. The Fuzzy UFH hybrid framework with adaptive-phase is illustrated in Fig. 3.

The Adaptive-Phase UFH Hybrid Framework has the capability of incorporating both one-stage and two-stage mining algorithms. Two notable algorithms that utilize this framework are the two-stage recursive mining algorithm Up-growth, proposed by Tseng et al.⁵¹, and the vertical data mining algorithm proposed by Fournier-Viger et al.⁵². The mining process is introduced using the utility list data structure to mine HUI. To mine high-utility itemsets (HUI), the mining process employs the utility list data structure. Initially, the dataset is fuzzified and a fuzzy database is generated, which includes an internal utility table obtained from the initial database scan. Given the characteristics of the fuzzy database, the UFH algorithm constructs an Up-Tree and calculates the overestimated utility value, which is then used to eliminate high-utility candidate itemsets. Each node in the Up-Tree comprises the item name N, its overestimated utility value N.nu, the degree of support N.mount, a pointer n.parent node, and a pointer n.link to other nodes shown in Figs. 4 and 5. The root node is an empty node that points to the child node. A prefix itemset is an itemset formed by the path from the root node to any other node in the tree depicted in Figs. 6, 7, 8 and 9.

Furthermore, a header table is maintained, which includes the name of the item set, its TWU value, and a pointer to the Up-Tree. The Up-Tree's nodes are arranged in descending order, based on the TWU values in the header table. To connect identical items from different transactions, they are linked using a linked list.The Up-Growth + algorithm calculates the transaction weighted utility (TWU) for all items during the initial database scan. Entries that have a TWU below the predetermined minimum utility threshold are removed from the database. TWU values in the header table are utilized to sort the database items within a transaction. Next, the new database's data is utilized to establish a UP-Tree. Figures 4, 5, 6, 7, 8 and 9 depict the up-tree process created using data from the provided sample database and Fig. 10 shows the whole process of UP-Tree building.

Fuzzy hybrid mining (FHM) algorithm

In the proposed IF-FUPM algorithm, a Fuzzy Hybrid Mining model was applied for mining fuzzy high-utility itemsets based on the UFH algorithm (Hybrid Algorithm Based on UP-Growth + and FHM)³. FHM is a vertical data mining algorithm that utilizes the Utility-list data structure to extract high-utility itemsets. The Utility-list is a compact data structure designed to store the itemsets and their utility values that appear in a database.

FHM uses a utility-list to identify HUP. Utility-list is a compact data structure designed to store transaction database item sets and Utility values. It consists of three main columns: TID (transaction ID), Iutils (items utilities), and Rutils (residue utilities). TID is the unique identifier for each piece of transaction data stored in the transaction database. Iutils is the actual utility value of item set I in the transaction. Rutils is the residual utility of item set I.

Each database transaction's internal items are arranged and reorganized based on the ascending TWU value of an item set. Reorganized database is the name of the reorganized database. Table 6 illustrates a sample database transaction for reorganizing the database of Table 1. The utility list number structure for item sets {A} and {B} is depicted in Figs. 7 and 8. For example, in T1 transaction, the Iutils value of A is the product of internal utility value and external utility value of {A} in T1 is 12, in Eq. (7).

$$ Rutils\left( \{ {\text{A} \} ,T}_{{1}} \right) = {\text{I}}utils\left( {\{ D\} ,{\text{T}}_{1} } \right) + {\text{I}}utils\left( {\{ C\} ,{\text{T}}_{1} } \right) $$

(7)

Table 6 The transaction database.

Full size table

In transaction T5, the Iutils value of A is the product of internal and external utility values of {A} in T5 database, namely $Iutils\left( {\{ {\text{A}\} ,T}_{{5}} } \right) = 2*5 = 10$, The Rutils value of A is the sum of the internal and external utility values of the {C} item set after the {A} item set after the transaction sorting of T5 in Eq. (8).

$$ Rutils\left( {\{ {\text{A}\} ,T}_{{5}} } \right) = {\text{I}}utils\left( {\{ C\} ,{\text{T}}_{5} } \right) = 1 \times 6 = 6 $$

(8)

The algorithm identifies a set whose TWU value exceeds the minimum utility threshold, and arranges it in ascending order during the initial database scan. During the second scan of the database, it is reorganized based on item set order, and a list of utilities is generated for each item set. Once the utility list of an item set has been generated, the algorithm combines the utility lists of {K − 1}-length to construct the utility list of {K}-length item sets. For instance, the utility list of {A, B} itemsets in Table 7 (the utility list of {A,B}) is constructed by merging the utility lists of {A} and {B} itemsets found in Table 8 (the utility list of {A} and {B}), as illustrated in Fig. 9 (TID = 6).

Table 7 The utility list of {A,B}.

Full size table

Table 8 The utility list of {A} and {B}.

Full size table

The FHM algorithm does not generate an effective set of candidate items. Instead, it employs estimated utility co-occurrence pruning (EUCP) to reduce the number of join operations. This technique is supported by the EUCS data structure, which is created during the second scan of the database. The EUCS structure determines whether the superset of any item set I is a High Utility Itemset (HUI) by checking whether the sum of Iutils and Rutils of the superset exceeds the minimum utility threshold. If it does not, the superset will not be further investigated, thus reducing the number of connections. Table 9 illustrates the EUCS structure of the database.

Table 9 Data structure of EUCS.

Full size table

The IF-HUPM algorithm

The IF-HUPM main algorithm consists of three main modules. The first module is the Fuzzy function Fuzzy( ), which is responsible for fuzzifying the database^32,43. The second module is the UPGrowth( ) (Function 1), which employs a two-phase approach for mining fuzzy databases and incorporates a switching module. Then, the third module, FHM( ) (Function 2), is called upon to compute the real utility values, compare them with the minimum utility threshold, and discover all the high-utility itemsets.The specific pseudo-code is shown in IF-FUPM Algorithm for the main function, and the Up-Growth(T, H, minUtil) is shown in Function 1.

In the IF-FUPM Algorithm from line 1 to 8 involves scanning the original database, generating the fuzzy database through fuzzification, initializing data structures, and calling the Up-Growth( ) function. The pseudo-code from line 10 to 19 represents the fuzzy function module, which handles the fuzzification of the original database, computes frequent fuzzy intervals, and returns the fuzzy database.

The pseudo-code for the UPGrowth( ) function is shown in Function 1, where lines 20–42 represent the two-phase mining process of the UPGrowth( ) of the fuzzy database. During the mining process, the algorithm calculates the overestimated utility values and compares them with the utility threshold to prune the utility tree. When a fuzzy high-utility 1-itemset is found, the algorithm checks whether the upper bound utility value is greater than or equal to the minimum utility threshold. If the condition is met, the FHM algorithm module is called, which is represented by lines 44 to 61 in Function 2. If not, the UPGrowth( ) function module will be called again until the switching criteria are met or all the fuzzy high-utility itemsets in the database are discovered.

The FHM( ) function is a one-phase high-utility itemset mining technique that directly computes real utility values and compares them with the user-defined utility threshold during execution. If a computed value exceeds the threshold, it is output, otherwise, it is discarded. Additionally, during the function's execution, the algorithm checks if all the fuzzy high-utility itemsets in the dataset have been discovered. If they have, the algorithm terminates and outputs the results of the mining, which are highly interpretable. Otherwise, it continues executing until all the fuzzy high-utility itemsets are found.

Experimental evaluation

In this experiment, the algorithm presented in this paper is compared to the EFIM algorithm⁵³ and UPHist algorithm⁵⁴. To perform the comparison, several distinct datasets were used, including both publicly accessible real datasets and synthetic datasets, which are commonly used to evaluate data mining algorithms. One of the real datasets used in the experiment was the Diabetes dataset, which contains information on diabetes patients from the experimental database. The Pima dataset, made available to the public by the National Institute of Diabetes and Digestive and Kidney Diseases⁵⁵, was also utilized. Additionally, the foodmart dataset, an actual customer transaction dataset for retail stores in SQL Server 2000, was used. Finally, the experiment included the Mushroom dataset, which is a synthetic dataset.

Table 10 shows the characteristics of the experimental dataset. The selected datasets are standard datasets with diverse characteristics, making them suitable for evaluating all aspects of the algorithm. The experimental results for each dataset are presented in Figs. 11 and 12.

Table 10 Data set feature table.

Full size table

Throughout the execution of the algorithm, we assessed three distinct algorithms based on parameters such as runtime, memory usage, and the quantity of generated fuzzy high utility itemsets, all evaluated under varying utility thresholds. We conducted a comparative analysis across diverse datasets, identifying disparities amongst them as depicted in Figs. 11 and 12. Within an identical dataset, we instigated experiments by adjusting the utility threshold values incrementally from 500 to 4500 thousand at a fixed high utility value. Ultimately, we juxtaposed the performance of our proposed algorithm against EFIM and UPH, focusing on runtime and memory consumption.

Figure 11a illustrates a gradual decrease in the number of fuzzy high-utility itemsets mined in the Pima dataset as the minimum utility threshold ascends, with the rate of this decrease decelerating over time. Figure 11b indicates a relative stability in the execution time of the F-UFH, UPHist, and EFIM algorithms across varying utility thresholds. As depicted in Fig. 11c, memory consumption plateaus with an increase in the utility threshold. A review of the F-UFH model's experimental results in the Pima dataset (Fig. 11d) suggests a majority of processing time is allocated to the fuzzy process. Lastly, Fig. 11e reveals higher memory consumption during the fuzzy process than the mining process, although the total memory usage and distribution remain fairly constant.

Figure 12a illustrates a steady decrease in the number of discovered fuzzy high-utility itemsets in the Diabetes dataset with an increasing minimum utility threshold. Figure 12b reveals an initial slow decrease in runtime, accelerating in the latter half as the utility threshold increases, with notable performance improvement in the UPhist algorithm. Runtime drops significantly from 2.5 to about 0.8 s as the utility threshold increases from 4 to 4.5 million, surpassing the speed of F-UFH but lagging behind EFIM. As depicted in Fig. 12c, memory usage generally declines and stabilizes with utility threshold elevation, except for a minor surge in F-UFH at the maximum utility setting. Figure 12d exhibits the F-UFH experiments on the Diabetes dataset, revealing minimal time for fuzzification but high usage during the mining phase. Finally, Fig. 12e indicates constant memory usage in the mining process, yet decreasing memory consumption during fuzzification with an increasing minimum utility threshold, that ultimately stabilizes.

Evidently, the findings allow us to deduce definitive conclusions about temporal and spatial complexity. Reflecting on temporal complexity, an upward movement in the minimum utility threshold in the Diabetes dataset corresponds with a progressive decline in the quantity of high utility fuzzy itemsets detected. This illustrates the algorithm's enhanced pruning speed at higher utility thresholds, effectively diminishing the search space and thereby reducing temporal complexity. Notably, the UPhist algorithm demonstrates a significant time reduction from 2.5 to about 0.8 s when the utility threshold escalates from 4 to 4.5 million, signaling its high performance optimization at larger utility thresholds. Conversely, the F-UFH technique, while inferior in performance to UPhist, outperforms the EFIM algorithm at increased utility thresholds, underscoring the varying efficiency of different algorithms at different thresholds.

Exemplary performance is demonstrated by the algorithms concerning space complexity. With increasing utility thresholds, a decline and subsequent stabilization are observed in memory consumption, signaling the algorithms' commendable scalability in the realm of space complexity. One outlier to note is the increased memory usage of F-UFH at the maximum utility threshold, potentially due to the processing of a larger number of fuzzy itemsets at high thresholds. Referring to Fig. 12d, the application of the F-UFH model to the Diabetes dataset centers most processing time in the mining phase, with less time consumed in the fuzzification phase, indicating the high spatial complexity of the mining phase. Yet, Fig. 12e reveals consistent memory usage during the mining process with decreasing memory utilization during fuzzification, which ultimately stabilizes as the minimum utility threshold is enhanced. This underlines the algorithm's scalability in relation to space complexity.

Consolidating the aforementioned analysis leads us to infer that a rising minimum utility threshold corresponds to reduced temporal complexity, while spatial complexity commendably retains its scalability. Performance divergences are evident among algorithms interacting with different thresholds, thus underscoring the need to judiciously select algorithms congruent with specified scenarios in practical applications.

Table 11 presents a comparative evaluation of the IF-HUPM and FHTFUP^56,57 algorithms in terms of runtime and memory usage. In our empirical assessment, we employed the Mushroom dataset. The comparison reveales that the IF-HUPM algorithm reported lower average temporal and spatial complexities as opposed to the FHTFUP algorithm. This quantitative data emphasizes the superior performance efficiency of IF-HUPM, signifying a balanced achievement between execution time and memory consumption.

Table 11 Results of statistical tests for performance.

Full size table

Besides, the final results of the algorithm proposed in this article are shown in the following Table 12. Each row represents a HUI, and the last column indicates the fuzzy utility value of that item set. Based on the first entry in Table 12, it can be inferred that based on the three interval of fuzzification(L represent Low, M represents Middle, H represents High based on Table 3). The BMI.H, FBG.H, HbAlc.H, ALT.L can form a HUI with a utility value of 1,147,576. When high-utility patterns occur frequently, interpretable fuzzy rules can be derived based on association rules. The high-utility fuzzy association rule that the corresponds to the most significant utility value when health indicators are poor can be explained. The fuzzy association rule can be presented like below for example, if BMI is High, FBG is High, HbAlc is High and ALT is Low, then health index is Low.

Table 12 The results of IF-HUMP algorithm.

Full size table

Conclusions and future research

We propose an innovative IF-HUIM algorithm for the medical field to overcome the difficulties in understanding and interpreting the HUPM algorithm. We have successfully solved this problem by fuzzifying the database and combining one-phase and two-phase algorithms. The results show that fuzzy processing exhibits relative stability in terms of time and space consumption even in the face of datasets with various features. Moreover, our mining process is more frugal in terms of average time and space requirements compared to EFIM and UPHist, thus demonstrating the efficiency of the algorithm.

Although the comprehensibility and interpretability of the algorithm is improved by fuzzifying the database, this may also lead to an increase in the complexity of the algorithm. In addition, the algorithm is primarily designed for medical data and may require further tuning and optimization for other types of data. This limitation makes the applicability of the algorithm in certain scenarios may be compromised. Nevertheless, our IF-HUIM algorithm still has significant advantages. It is innovative in that it improves the stability and efficiency of the algorithm through fuzzification and the combination of one-phase and two-phase algorithms. This is an important advantage because time and space efficiency are often key determinants of algorithm performance when dealing with large-scale datasets.

In addition, our algorithm shows superior performance when compared to other state-of-the-art algorithms. Compared to EFIM and UPHist, our mining process requires less time and space on average. This means that our algorithm requires fewer resources to process the same dataset and is therefore able to process more data. Our overall goal is to overcome the limitations of existing methods and further improve the effectiveness and applicability of the FHUPM algorithm so that it can better cope with the challenges of multidimensional medical data. Although the IF-HUIM algorithm still needs to be improved in some aspects, we believe that through continuous research and optimization, we have the ability to further improve the performance of the algorithm to better meet the needs of practical applications.

In the future, we plan to focus on the following three areas of research. First, we will explore the application of type-2 fuzzy set theory⁵⁸ to the HUPM algorithm⁵⁹ and compare it with the type-1 fuzzy set mining algorithm. This comparison aims to identify more valuable fuzzy high-utility itemsets in temporal pattern mining^1,8,9,60. Second, we will investigate the interpretability of FHUPM and enhance its informativeness for various application scenarios, such as social network distribution⁵¹, recommendation in mobile app development⁴, anomaly detection⁷, and flow prediction⁶. Finally, building upon existing work, we aim to discover fuzzy association rules based on the fuzzy high utility pattern mining.

In conclusion, the utilization of data mining in the diagnosis and treatment of diabetes enables physicians to extract valuable insights from intricate medical datasets, identify potential association rules and patterns, and offer more precise and personalized diagnostic and treatment recommendations for patients with diabetes. This will significantly contribute to enhancing health management and optimizing treatment outcomes through improved interpretation of medical diagnostic decisions.

Data availability

The dataset used in this paper is a publicly available dataset sourced from the internet, and it can be accessed from the following website: https://www.kaggle.com/uciml/pima-indians-diabetes-database.

References

Chen, J. et al. iPAS: A deep Monte Carlo Tree Search-based intelligent pilot-power allocation scheme for massive MIMO system [J]. Digital Commun. Netw. 7(3), 362–372 (2021).
Article Google Scholar
Zhang, P., Huang, X. & Zhang, L. Information mining and similarity computation for semi-un-structured sentences from the social data [J]. Digital Commun. Netw. 7(4), 518–525 (2021).
Article Google Scholar
Zhang, Z., Li, X. & Gan, C. Identifying influential nodes in social networks via community structure and influence distribution difference [J]. Digital Commun. Netw. 7(1), 131–139 (2021).
Article Google Scholar
Yang, Y. et al. ASTREAM: Data-stream-driven scalable anomaly detection with accuracy guarantee in IIoT environment. IEEE Trans. Netw. Sci. Eng. 1, 1 (2022).
Google Scholar
Shengqi, Wu. et al. Popularity-aware and diverse web APIs recommendation based on correlation graph. IEEE Trans. Comput. Soc. Syst. 10(2), 771–782 (2023).
Article Google Scholar
Ali, A. et al. Temporal pattern mining from user-generated content [J]. Digital Commun. Netw. 8(6), 1027–1039 (2022).
Article Google Scholar
Wang, F. et al. Privacy-aware traffic flow prediction based on multi-party sensor data with zero trust in smart city [J]. ACM Trans. Internet Technol. 23(3), 1–19 (2023).
ADS Google Scholar
Mowafi, M., Taqieddin, E. & Al-Dahoud, H. Energy efficient fuzzy-based DASH adaptation algorithm [J]. Digital Commun. Netw. 7(1), 113–119 (2021).
Article Google Scholar
Wang, F. et al. Edge-cloud-enabled matrix factorization for diversified APIs recommendation in Mashup creation. World Wide Web J. 25(5), 1809–1829 (2022).
Article Google Scholar
Wang, G. & Jiang, P. Overview of data mining [J]. J. Tongji Univ. 32(2), 246–252 (2004).
Google Scholar
Radhakrishna, V. et al. Challenge paper: The vision for time profiled temporal association mining [J]. J. Data Inf. Qual. (JDIQ) 13(2), 1–8 (2021).
Article Google Scholar
Gan, W. et al. Fast utility mining on sequence data [J]. IEEE Trans. Cybern. 51(2), 487–500 (2020).
Article Google Scholar
Shaheen, M. & Abdullah, U. CARM: Context based association rule mining for conventional data [J]. Comput. Mater. Contin. 68(3), 1 (2021).
Google Scholar
Shaheen, M. & Shahbaz, M. An algorithm of association rule mining for microbial energy prospection[J]. Sci. Rep. 7(1), 46108 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Khan, S. & Shaheen, M. From data mining to wisdom mining [J]. J. Inf. Sci. 49(4), 952–975 (2023).
Article Google Scholar
Khan, S. & Shaheen, M. WisRule: First cognitive algorithm of wise association rule mining [J]. J. Inf. Sci. 86, 95 (2022).
Google Scholar
Kong, L. et al. LSH-aware multi-type health data prediction with privacy preservation in edge environment [J]. World Wide Web 1, 1–16 (2021).
Google Scholar
Wang, F. et al. Robust collaborative filtering recommendation with user-item-trust records [J]. IEEE Trans. Comput. Soc. Syst. 9(4), 986–996 (2021).
Article Google Scholar
Shaheen, M., Rehman, S. & Ghaffar, F. Correlation and congruence modulo based clustering technique and its application in energy classification [J]. Sustain. Comput. Inf. Syst. 30, 100561 (2021).
Google Scholar
Zadeh, L. A. Fuzzy sets [J]. Inf. Control 8(3), 338–353 (1965).
Article Google Scholar
Pan, X. & Xu, Y. Redefinition of the concept of fuzzy set based on vague partition from the perspective of axiomatization [J]. Soft Comput. 22(6), 1777–1789 (2018).
Article Google Scholar
Srikant, R. & Agrawal, R. Mining generalized association rules [J]. Future Gen. Comput. Syst. Int. J. Sci. 13(2–3), 161–180 (1997).
Article Google Scholar
Han, J. & Pei, J. Mining frequent patterns without candidate generation [J]. Assoc. Comput. Mach. 29(2), 1–12 (2000).
MathSciNet Google Scholar
Liu, Y., Liao, W. K., & Choudhary, A. A two-phase algorithm for fast discovery of high utility itemsets [C]. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 689–695 (2005).
Zhao, Z., Ma, Z. & Yan, L. An efficient classification of fuzzy XML documents based on kernel ELM [J]. Inf. Syst. Front. 23(3), 515–530 (2021).
Article Google Scholar
Chen, Q. et al. Cookie swap party: Abusing first-party cookies for web tracking [C]. Proc. Web Conf. 2021, 2117–2129 (2021).
Google Scholar
Gan, W., Lin, J. C. W. & Fournier-Viger, P. A survey of utility-oriented pattern mining [J]. IEEE Trans. Knowl. Data Eng. 33(4), 1306–1327 (2021).
Article Google Scholar
Wu, J. M., Lin, J. C. & Tamrakar, A. High-utility itemset mining with effective pruning strategies [J]. ACM Trans. Knowl. Discov. Data 13(6), 1556–4681 (2019).
Article Google Scholar
Duong, Q. H. et al. Efficient high utility itemset mining using buffered utility-lists [J]. Appl. Intell. 48(7), 1859–1877 (2018).
Article Google Scholar
Krishnamoorthy, S. Efficient mining of high utility itemsets with multiple minimum utility thresholds [J]. Eng. Appl. Artif. Intell. 69, 112–126 (2018).
Article Google Scholar
Wu, D. et al. Interval type two fuzzy sets and fuzzy systems: Summary and prospect [J]. Acta Autom. Sin. 46(08), 1539–1556 (2020).
Google Scholar
Wang, C.-M., Chen, S.-H., & Huang, Y.-F. A fuzzy approach for mining high utility quantitative itemsets. In 2009 IEEE International Conference on Fuzzy Systems, Jeju, Korea (South), pp. 1909–1913. https://doi.org/10.1109/FUZZY.2009.5277408 (2009).
Fournier-Viger, P. et al. A survey of itemset mining [J]. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 7(4), e1207 (2017).
Article Google Scholar
Mohbey, K. High fuzzy utility based frequent patterns mining approach for mobile web services sequences [J]. Int. J. Eng. 30(2), 182–191 (2017).
Google Scholar
Wu, J. M.-T., Lin, J. C.-W., Fournier-Viger, P., Wiktorski, T., Hong, T.-P., & Pirouz, M. A GA-based framework for mining high fuzzy utility itemsets. In 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, pp. 2708–2715 (2019).
Wu, J. M. T., Lin, J. C. W., & Fournier-Viger, P., et al. A Ga-based framework for mining high fuzzy utility itemsets [C]. In 2019 IEEE international conference on big data (big data). IEEE, pp. 2708–2715 (2019).
Ganesan, M. & Shankar, S. High utility fuzzy product mining (HUFPM) using investigation of HUWAS approach [J]. J. Ambient Intell. Hum. Comput. 1, 1–11 (2022).
Google Scholar
Gan, W. et al. Explainable fuzzy utility mining on sequences [J]. IEEE Trans. Fuzzy Syst. 29(12), 3620–3634 (2021).
Article Google Scholar
Wan, S., Ye, Z., Gan, W., et al. Temporal fuzzy utility maximization with remaining measure [J]. arXiv preprint arXiv:2208.12439 (2022).
Ryu, T. et al. Scalable and efficient approach for high temporal fuzzy utility pattern mining [J]. IEEE Trans. Cybern. 1, 1–17 (2022).
Google Scholar
Xie, S. & Zhao, L. An efficient algorithm for mining stable periodic high-utility sequential patterns [J]. Symmetry 14(10), 2032 (2022).
Article ADS Google Scholar
Dinh, T., Fournier-Viger, P., & Van Hong, H. Mining compact high utility sequential patterns [J]. arXiv preprint arXiv:2302.11426 (2023).
Huang, S., Gan, W., Miao, J., et al. Targeted mining of top-k high utility itemsets [J]. arXiv preprint arXiv:2303.14510 (2023).
Mansoor, M. et al. Deep learning based semantic similarity detection using text data [J]. Inf. Technol. Control 49(4), 495–510 (2020).
Article Google Scholar
Biedma-Rdguez, C., Gacto, M. J., Alcalá, R., et al. Meta-fuzzy items for fuzzy association rules[C]. In 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, pp. 1–6 (2021).
Hong, T. P. et al. One-phase temporal fuzzy utility mining [C]. IEEE Int. Conf. Fuzzy Syst. 2020, 1–5 (2020).
Google Scholar
Lin, J. C. W. et al. Mining multiple fuzzy frequent patterns with compressed list structures [C]. IEEE Int. Conf. Fuzzy Syst. 2020, 1–8 (2020).
Google Scholar
Wu, T. et al. An efficient algorithm for fuzzy frequent itemset mining [J]. J. Intell. Fuzzy Syst. 38(5), 5787–5797 (2020).
Article Google Scholar
Al-diabat, M. Fuzzy data mining for autism classification of children [J]. Int. J. Adv. Comput. Sci. Appl. 9(7), 11–17 (2018).
Google Scholar
Dawar, S., Goyal, V. & Bera, D. A hybrid framework for mining high-utility itemsets in a sparse transaction database [J]. Appl. Intell. 47, 809–827 (2017).
Article Google Scholar
Tseng, V. S., Wu, C. W., Shie, B. E., et al. UP-growth: An efficient algorithm for high utility itemset mining [C]. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 253–262 (2010).
Fournier-Viger, P., Wu, C., Zida, S., et al. FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning [C]. In International symposium on methodologies for intelligent systems, pp. 83–92 (2014).
Zida, S., Fournier-Viger, P., Lin, J. C., et al. EFIM: A highly efficient algorithm for high-utility itemset mining [C]. In Mexican international conference on artificial intelligence, 230–546 (2015).
Dawar, S., & Goyal, V. UP-Hist tree: An efficient data structure for mining high utility patterns from transaction databases [C]. In Proceedings of the 19th International Database Engineering & Applications Symposium, pp. 56–61 (2015).
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database.
Hong, T. et al. Using tree structure to mine high temporal fuzzy utility itemsets. IEEE Access 8, 153692–153706 (2020).
Article Google Scholar
Ryu, T. et al. Scalable and efficient approach for high temporal fuzzy utility pattern mining. IEEE Trans. Cybern. 53(12), 7672–7685. https://doi.org/10.1109/TCYB.2022.3198661 (2023).
Article PubMed Google Scholar
Jing, C. et al. Fuzzy frequent pattern mining algorithm based on weighted sliding window and type-2 fuzzy sets over medical data stream [J]. Wirel. Commun. Mobile Comput. 1, 1–17 (2021).
Google Scholar
Miao, Y. et al. A novel short-term traffic prediction model based on SVD and ARIMA with blockchain in industrial internet of Things. IEEE Internet Things J. 1, 1 (2023).
Google Scholar
Kong, L. et al. Time-aware missing healthcare data prediction based on ARIMA model. IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 1 (2022).
Article Google Scholar

Download references

Funding

The subject is sponsored by the National Natural Science Foundation of P. R. China (No. 62102194, No. 62102196). Natural Science Foundation of Inner Mongolia Autonomous Region of China (No. 2022MS06010), Natural Science Research Project of Department of Education of Guizhou Province (No. QJJ2022015). Scientific Research Project of Baotou Teachers' College (BSYKY2021-ZZ01, BSYHY202212, BSYHY202211, BSJG23Z07).

Author information

Authors and Affiliations

School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Jing Chen & Hongjun Zhang
Baotou Teachers’ College of Inner Mongolia University of Science and Technology, Baotou, 014030, Inner Mongolia, China
Jing Chen & Aijun Liu
School of Physics and Mechatronic Engineering, Guizhou Minzu University, Guiyang, 550025, China
Shengyi Yang
Software and Computational Systems Program, Data 61, CSIRO, Canberra, Australia
Hui Zheng
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Ning Zhou & Peng Li
Institute of Network Security and Trusted Computing, Nanjing, 210023, China
Peng Li

Authors

Jing Chen
View author publications
You can also search for this author in PubMed Google Scholar
Aijun Liu
View author publications
You can also search for this author in PubMed Google Scholar
Hongjun Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Shengyi Yang
View author publications
You can also search for this author in PubMed Google Scholar
Hui Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Ning Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Peng Li
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.J.: writing original draft, review response, commentary, revision. L.A. and Z.N.: writing original draft, commentary. Z.H.: validations. Y.S.: commentary, revision, validations. Z.H.: validation, conceptualization. L.P.: conceptualization, funding acquisition, methodology, supervision, review.

Corresponding author

Correspondence to Peng Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Glossary of terms

See Table

Table 13 Glossary of terms.

Full size table

13.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Chen, J., Liu, A., Zhang, H. et al. Improved adaptive-phase fuzzy high utility pattern mining algorithm based on tree-list structure for intelligent decision systems. Sci Rep 14, 945 (2024). https://doi.org/10.1038/s41598-023-50375-y

Download citation

Received: 17 August 2023
Accepted: 19 December 2023
Published: 10 January 2024
DOI: https://doi.org/10.1038/s41598-023-50375-y

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

A new multi-attribute group decision-making method based on Einstein Bonferroni operators under interval-valued Fermatean hesitant fuzzy environment

Entropy for q-rung linear diophantine fuzzy hypersoft set with its application in MADM

Distance and similarity measures for normal wiggly dual hesitant fuzzy sets and their application in medical diagnosis

Introduction

Related work on fuzzy high-utility pattern mining (FHUPM)

FHUPM algorithm and framework

Hybrid high utility framework

Basic definitions

Definition 1

Definition 2

Definition 3

Definition 4

Definition 5

Definition 6

Definition 7

Definition 8

Fuzzy high utility itemset construction

The proposed algorithm

Improved fuzzy high-utility pattern mining algorithm

Fuzzy hybrid mining (FHM) algorithm

The IF-HUPM algorithm

Experimental evaluation

Conclusions and future research

Data availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Appendix: Glossary of terms

Appendix: Glossary of terms

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links