A spatial data pre-processing tool to improve the quality of the analysis and to reduce preparation duration
Introduction
Understanding how the spatial environment affects the performance of an activity is a real advantage for many organizations in the public and private sectors. As the capabilities and costs of technologies change, more and more organizations are accumulating data on their activities (such as sales metrics) that include spatial characteristics such as addresses or GPS coordinates. At the same time, a growing amount of data is available on elements that may influence the performance of these activities. For these reasons, significant research in various fields has sought to extract relevant information to understand what actually influences an activity. Mennis and Guo (2009) described spatial data mining as a trending research area. Spatial data mining, as defined by Koperski and Han (1995), consists of extracting implicit knowledge from spatial data. This research field is an extension of Knowledge Discovery in Databases (KDD), introduced by Fayyad, Piatetsky-Shapiro, and Smyth (1996). However, many existing data mining algorithms cannot exploit the spatial aspect of the data. The spatial components must therefore be prepared before they can be taken into consideration, and this preparation is a complex and tedious task.
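To make the idea of "preparing" spatial components concrete, the following minimal sketch turns raw coordinates into ordinary numeric columns that a non-spatial data mining algorithm could consume. It is an illustration only, not the tool presented in this paper: the function names, the 5 km radius, the choice of features (distance to the nearest point of interest and a count within a radius) and all data values are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_features(site, points_of_interest, radius_km=5.0):
    """Summarize the surroundings of `site` as plain numeric attributes.

    Hypothetical helper: the two features below are example choices,
    not the spatial relations selected in the paper.
    """
    distances = [
        haversine_km(site["lat"], site["lon"], p["lat"], p["lon"])
        for p in points_of_interest
    ]
    return {
        "nearest_km": min(distances) if distances else None,
        "count_within_radius": sum(d <= radius_km for d in distances),
    }


# Made-up example data: one retailer and three nearby points of interest.
retailer = {"id": "R1", "sales": 120000, "lat": 46.81, "lon": -71.21}
competitors = [
    {"lat": 46.82, "lon": -71.22},
    {"lat": 46.90, "lon": -71.40},
    {"lat": 45.50, "lon": -73.57},
]

# The resulting row mixes the activity data with the derived spatial columns.
row = {**retailer, **spatial_features(retailer, competitors)}
print(row)
```

Even in this small case the analyst has to decide which relations to compute and with which parameters (here, a radius), which is the kind of choice that makes manual pre-processing tedious and error-prone.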
The aim of this research is to present a tool that automates the pre-processing of spatial data, removing the need for GIS skills while improving analysis quality and saving processing time.
Section 2 presents the literature related to spatial data analysis and pre-processing, to give a better understanding of the problems that arise when spatial data are taken into consideration. Section 3 first presents the specificities of spatial data preparation, then focuses on how the choices made during this pre-processing may influence the quality of subsequent analysis. Section 4 presents the specifications of our approach, along with the technical aspects of its implementation. Section 5 evaluates the improvements provided by our tool: a case study with real data shows the pre-processing tasks with and without our tool and how it performs. Finally, the limitations and perspectives of our research are discussed.
Spatial decision-making
Thirty years ago, Schmidt (1983) revealed that localization decisions were made quickly by people without experience or knowledge of the issues involved. Decisions were made subjectively with few requirements and considering only a small portion of existing options. At the same time, Herring and DeBinder (1981) argued that the use of computer tools could greatly improve the localization decision-making process. A few years ago MacEachren and Kraak (2001) noted that many problems in the
Necessity and complexity of pre-processing
To illustrate the problems associated with spatial data pre-processing, the following section focuses on the real case of a partner company for which we are developing an SDSS for retail. The company works in construction materials and distributes its products through third-party retailers. As mentioned by Cliquet et al. (2006) and Dubelaar et al. (2002), in the particular case of the retail sector, knowledge of the environment can be a major competitive advantage for improving performance. In
Specifications and technical aspects
Because the tool guides users through several steps, a diagram of that sequence is presented first, and each step is then detailed. Next, the prerequisites for implementing this solution are described, a possible architecture is proposed, and optimizations to improve usability are presented.
Case study to evaluate improvements
To demonstrate the efficiency of the proposed pre-processing tool, the next subsections first describe the steps to be taken with the tool (Section 5.1) and without it (Section 5.2). The quality improvement in subsequent analysis is then presented (Section 5.3).
Conclusions and perspectives
Numerous studies highlight the complexity and time cost of spatial data pre-processing. A few studies have tried to address these issues and have proposed methodological approaches or frameworks to facilitate pre-processing. Although these studies aim to simplify spatial data pre-processing, they do not remove the need for GIS knowledge, nor do they ease the difficulty of choosing the spatial relations to be taken into account. From this observation, our research proposes a tool that
Acknowledgments
This research was supported by the FORAC research consortium and its partners, and received financial support from NSERC.
References (50)
- et al. How do consumers define retail centre convenience? Australasian Marketing Journal (AMJ) (2009)
- et al. Measuring retail productivity: What really matters? Journal of Business Research (2002)
- et al. Customer profitability forecasting using big data analytics: A case study of the insurance industry. Computers & Industrial Engineering (2016)
- et al. Covering problems in facility location: A review. Computers & Industrial Engineering (2012)
- et al. Design and implementation of a web-based platform to support interactive environmental planning. Computers, Environment and Urban Systems (2009)
- et al. Use of distributed data sources in facility location. Computers & Industrial Engineering (2012)
- et al. A computer-aided modeling system for geobased data. Computers & Industrial Engineering (1981)
- et al. Spatial data mining and geographic knowledge discovery—An introduction. Computers, Environment and Urban Systems (2009)
- et al. The retail site location decision process using GIS and the analytical hierarchy process. Applied Geography (2013)
- et al. The evolving concept of retail attractiveness: What makes retail agglomerations attractive when customers shop at them? Journal of Retailing and Consumer Services (2008)
- Geographic information systems in warehouse site selection decisions. International Journal of Production Economics
- Creating a spatial multi-criteria decision support system for energy related integrated environmental impact assessment. Environmental Impact Assessment Review
- A spatial-based KDD process to better understand the spatiotemporal phenomena
- Exploratory analysis of spatial data using interactive maps and data mining. Cartography and Geographic Information Science
- GeoDa: An introduction to spatial data analysis. Geographical Analysis
- Discovery of spatial association rules in geo-referenced census data: A relational mining approach. Intelligent Data Analysis
- A knowledge-based approach for supporting locational decision making. Environment and Planning B: Planning and Design
- Spatial data preparation for knowledge discovery
- The knowledge discovery process
- A small set of formal topological relationships suitable for end-user interaction
- Management de la distribution
- Development of business spatial analysis tools: Methodology and framework
- Spatial decisions support systems
- Categorizing binary topological relations between regions, lines, and points in geographic databases