e-ISSN : 0975-4024 p-ISSN : 2319-8613   
CODEN : IJETIY    

International Journal of Engineering and Technology

ABSTRACT

Title : Removing Duplicate Records from Data Warehouse by Q-gram and Neural Network
Authors : Murtadha M. Hamad, Salih S. Salih
Keywords : Duplicate Detection, Duplicate Elimination, Similarity score, Q-Gram, Neural Network, Key Generation.
Issue Date : Oct-Nov 2016
Abstract :
Discovering and removing duplicate records is one of the main problems in data cleaning and data quality for data warehouses. This paper addresses finding similar records within a set of data records. Each record is assigned a similarity score relative to other records, based on the similarity between the records' tokens. Records whose similarity scores with respect to each other exceed a threshold form one or more groups of records. In the proposed system, a key is created for each record in the database, as described in the suggested algorithms, and this key is the input to a Q-gram similarity algorithm that computes the percentage of similarity between each pair of keys. The similarity threshold is set to 0.68. If the similarity between two key values exceeds this threshold, the pair is passed to a Neural Network algorithm that works in two phases, training and testing. The suggested approach is tested on several different data warehouses to evaluate its efficiency. The accuracy obtained across multiple data warehouses is 96.94%.
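The Q-gram filtering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact similarity formula, the value of q, or the key generation procedure, so several choices here are assumptions made only for illustration: the Dice coefficient over padded bigrams (q = 2), the `#` padding character, keys treated as plain strings, and the hypothetical function names `qgrams`, `qgram_similarity`, and `candidate_pairs`. Only the 0.68 threshold comes from the abstract.

```python
from collections import Counter
from itertools import combinations

THRESHOLD = 0.68  # similarity threshold stated in the abstract


def qgrams(text, q=2):
    """Return the multiset of q-grams of text (assumed: lowercase, '#' padding)."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_similarity(a, b, q=2):
    """Dice coefficient over q-gram multisets (an assumed formula, range 0..1)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 1.0  # two empty keys are treated as identical
    shared = sum((ga & gb).values())  # multiset intersection
    return 2.0 * shared / total


def candidate_pairs(keys, threshold=THRESHOLD, q=2):
    """Return (i, j, score) for every pair of keys whose score exceeds threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(keys), 2):
        score = qgram_similarity(a, b, q)
        if score > threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    keys = ["jonathan", "jonathon", "maria"]  # hypothetical record keys
    for i, j, score in candidate_pairs(keys):
        print(keys[i], keys[j], round(score, 3))
```

In the system the abstract describes, pairs that pass this filter would then go to the neural network stage; that stage is not sketched here because the abstract gives no detail about its inputs or architecture. Comparing all pairs is quadratic in the number of keys, which is acceptable for a sketch but would need blocking or indexing on a real data warehouse.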
Page(s) : 2374-2282
ISSN : 0975-4024 (Online) 2319-8613 (Print)
Source : Vol. 8, No.5
DOI : 10.21817/ijet/2016/v8i5/160805121