DOI: 10.1145/3331076.3331114
research-article

Exploratory data analysis and crime prediction for smart cities

Published: 10 June 2019

ABSTRACT

Crime has long been prevalent in society and remains so today. Many cities now release crime-related data as part of open data initiatives. With this data as input, analytics can be applied to predict, and hopefully help prevent, future crime. In this work, we apply big data analytics to the San Francisco crime dataset, collected by the San Francisco Police Department and available through the Open Data initiative. The main focus is an in-depth analysis of the major types of crime that occurred in the city, the trends over the years, and how various attributes contribute to specific crimes. We then use the results of the exploratory data analysis to inform data preprocessing before training various machine learning models for crime type prediction. More specifically, the model predicts the type of crime that will occur in each district of the city. We observe that the provided dataset is highly imbalanced, so the metrics used in previous research mainly reflect the majority class and disregard classifier performance on minority classes; we propose a methodology to address this issue. The proposed model has applications in law enforcement resource allocation in a Smart City.
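The point about imbalance can be illustrated with a minimal sketch. It is not the methodology proposed in the paper, which the abstract does not detail; the toy labels, class counts, and function names below are assumptions chosen only to show how an aggregate metric such as accuracy can look strong while every minority class is missed, and how a per-class view (here, macro-averaged recall) exposes that.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each true class: correctly predicted / total in that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy data (not taken from the paper): one majority class
# and two small minority classes, with illustrative category names.
y_true = ["LARCENY/THEFT"] * 90 + ["ASSAULT"] * 8 + ["ARSON"] * 2

# A trivial classifier that always predicts the majority class.
y_pred = ["LARCENY/THEFT"] * len(y_true)

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro = macro_recall(y_true, y_pred)

print(f"accuracy:     {acc:.3f}")
print(f"macro recall: {macro:.3f}")
for label, value in recalls.items():
    print(f"  recall[{label}] = {value:.3f}")
```

In this toy setting the always-majority classifier scores 90% accuracy yet has zero recall on both minority classes, which is the kind of gap that majority-dominated metrics can hide.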


        Published in

        IDEAS '19: Proceedings of the 23rd International Database Applications & Engineering Symposium
        June 2019
        364 pages
        ISBN: 9781450362498
        DOI: 10.1145/3331076

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 74 of 210 submissions, 35%
