ABSTRACT
Crime has long been prevalent in society and remains so today. Many cities now release crime-related data as part of open data initiatives. Using this data as input, we can apply analytics to predict, and hopefully prevent, future crime. In this work, we apply big data analytics to the San Francisco crime dataset, collected by the San Francisco Police Department and available through the Open Data initiative. The main focus is an in-depth analysis of the major types of crime that occurred in the city: we observe trends over the years and determine how various attributes contribute to specific crimes. We then use the results of this exploratory data analysis to inform data preprocessing, prior to training several machine learning models for crime type prediction. More specifically, the models predict the type of crime that will occur in each district of the city. We observe that the dataset is highly imbalanced; as a result, the metrics used in previous research mainly reflect performance on the majority class and disregard classifier performance on minority classes. We propose a methodology to address this issue. The proposed model has applications in law enforcement resource allocation in a Smart City.
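The point about imbalanced classes can be illustrated with a minimal sketch. It is not the methodology proposed in the paper; it only shows, on hypothetical toy labels, why a metric dominated by the majority class can look good while minority classes are never predicted correctly. The label counts, the always-predict-majority classifier, and the helper names `accuracy`, `per_class_recall`, and `macro_recall` are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each true class: correctly predicted samples / class size."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, heavily imbalanced toy labels (not taken from the real dataset).
y_true = ["LARCENY/THEFT"] * 90 + ["ROBBERY"] * 8 + ["ARSON"] * 2

# A trivial classifier that always predicts the majority class.
y_pred = ["LARCENY/THEFT"] * len(y_true)

overall = accuracy(y_true, y_pred)
by_class = per_class_recall(y_true, y_pred)
macro = macro_recall(y_true, y_pred)

print(f"accuracy:     {overall:.3f}")
print(f"macro recall: {macro:.3f}")
for label, value in by_class.items():
    print(f"  recall[{label}] = {value:.3f}")
```

In this toy setting the trivial classifier scores high on accuracy while its recall on both minority classes is zero, which is the kind of gap a class-aware metric is meant to expose.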
Index Terms
- Exploratory data analysis and crime prediction for smart cities