IEICE Transactions on Information and Systems 2006 E89-D(8):2332-2339; doi:10.1093/ietisy/e89-d.8.2332
Copyright © 2006 The Institute of Electronics, Information and Communication Engineers

Special Section on Invited Papers from New Horizons in Computing -- Papers

The Bump Hunting Method Using the Genetic Algorithm with the Extreme-Value Statistics

Takahiro YUKIZANE1, Shin-ya OHI1, Eiji MIYANO1 and Hideo HIROSE1

1 The authors are with the Department of Systems Innovation and Informatics, Kyushu Institute of Technology, Fukuoka-shi, 820–8502 Japan. E-mail: yukizane{at}ume98.ces.kyutech.ac.jp, ohi{at}ume98.ces.kyutech.ac.jp, miyano{at}ces.kyutech.ac.jp, hirose{at}ces.kyutech.ac.jp

In difficult classification problems, where z-dimensional points with 0-1 responses cannot be cleanly divided into two groups because of a messy data structure, we try to find the denser regions of favorable customers (response 1) instead of finding boundaries that separate the two groups. Such regions are called bumps, and finding the boundaries of the bumps is called bump hunting. The main objective of this paper is to find the largest bump region under a specified ratio of the number of response-1 points to the total number of points. From this, we may obtain a trade-off curve between the number of response-1 points and the specified ratio. The decision tree method with the Gini index provides simply shaped boundaries for the bumps if the marginal density for response 1 has a rather simple or monotonic shape. Because searching for the optimal tree is computationally expensive owing to the NP-hardness of the problem, random search methods, e.g., a genetic algorithm adapted to the tree, are useful. Because many local maxima exist, unlike in ordinary genetic algorithm search results, extreme-value statistics are useful for estimating the global optimum number of captured points; this also guarantees the accuracy of the semi-optimal solution with simple descriptive rules. This combination of genetic algorithm search and extreme-value statistics is new. We apply the method to an artificial messy data set that mimics a real customer database and obtain a successful result. The reliability of the solution is also discussed.
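As an illustration of the extreme-value step described above, the following Python sketch estimates a plausible upper level for the number of captured points from the best results of repeated genetic algorithm runs. It is only a sketch under stated assumptions: the abstract does not say which extreme-value distribution or fitting method the authors use, so the Gumbel distribution and the method-of-moments fit are assumptions made here, the function names are hypothetical, and the input numbers are made up for illustration rather than taken from the paper.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments fit of a Gumbel (maxima) distribution.

    Assumption: the per-run best values behave like Gumbel-distributed
    maxima. The paper may use a different extreme-value model.
    """
    mean = statistics.fmean(maxima)
    std = statistics.stdev(maxima)            # sample standard deviation
    scale = std * math.sqrt(6.0) / math.pi    # Gumbel scale parameter
    loc = mean - EULER_GAMMA * scale          # Gumbel location parameter
    return loc, scale


def gumbel_quantile(loc, scale, p):
    """Return x such that P(X <= x) = p under the fitted Gumbel model."""
    return loc - scale * math.log(-math.log(p))


# Hypothetical data: best number of captured response-1 points found in
# each of 10 independent genetic algorithm runs (invented numbers).
best_per_run = [412, 398, 405, 420, 401, 409, 415, 396, 407, 411]

loc, scale = fit_gumbel_moments(best_per_run)
upper = gumbel_quantile(loc, scale, 0.99)

print(f"best observed    : {max(best_per_run)}")
print(f"Gumbel loc/scale : {loc:.2f} / {scale:.2f}")
print(f"0.99 quantile    : {upper:.2f}")
```

Comparing the best observed value with a high quantile of the fitted distribution gives a rough sense of how far the semi-optimal solution may lie from the global optimum. Because the number of captured points is bounded above by the data size, a bounded extreme-value model may be more appropriate than the Gumbel model assumed here.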

Key Words: data mining, data science, bump hunting, genetic algorithm, extreme-value statistics, trade-off curve, decision tree, bootstrap


Manuscript received February 17, 2006. Manuscript revised April 17, 2006.

