Clustering Activity–Travel Behavior Time Series using Topological Data Analysis

  • Original Paper
  • Journal of Big Data Analytics in Transportation

Abstract

Over the last few years, the volume of traffic data has grown rapidly and the transportation discipline has entered the era of big data. This brings new opportunities for data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new divide-and-combine approach to K-means clustering of activity–travel behavior time series, using features derived with tools from time series analysis and topological data analysis. We illustrate the approach with a case study in which each individual’s daily activity–travel behavior is characterized as a categorical time series with three levels. Clustering data from five waves of the National Household Travel Survey, ranging from 1990 to 2017, suggests that the activity–travel patterns of individuals over the last three decades can be grouped into three clusters. The results also provide evidence in support of recent claims about differences in the activity–travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited to activity–travel behavior analysis in transportation studies. Driving behavior, travel mode choice, and household vehicle ownership, when characterized as categorical time series, can all be analyzed using the proposed method.

Notes

  1. A brief review is provided in the appendix. For details on TDA, see Edelsbrunner and Harer (2010) and Wang et al. (2018).

References

  • Bubenik P (2015) Statistical topological data analysis using persistence landscapes. J Mach Learn Res 16(1):77–102

  • Calabrese F, Diao M, Lorenzo GD, Ferreira J, Ratti C (2013) Understanding individual mobility patterns from urban sensing data: a mobile phone trace example. Transp Res Part C Emerg Technol 26:301–313

  • Candia J, González MC, Wang P, Schoenharl T, Madey G, Barabási AL (2008) Uncovering individual and collective human dynamics from mobile phone records. J Phys A: Math Theor 41(22):224015

  • Carlsson G (2009) Topology and data. Bull Am Math Soc 46(2):255–308

  • Edelsbrunner H, Harer J (2010) Computational topology: an introduction. American Mathematical Society

  • Figueiras P, Silva R, Ramos A, Guerreiro G, Costa R, Jardim-Goncalves R (2016) Big data processing and storage framework for ITS: a case study on dynamic tolling. ASME 2016 International Mechanical Engineering Congress and Exposition

  • Goulias KG (1999) Longitudinal analysis of activity and travel pattern dynamics using generalized mixed Markov latent class models. Transp Res Part B Methodol 33(8):535–558

  • Huang J, Levinson D, Wang J, Zhou J, Wang ZJ (2018) Tracking job and housing dynamics with smartcard data. Proc Natl Acad Sci 115(50):12710–12715

  • Jandui Silva LLVSFF Bárbara França (2015) Towards smart traffic lights using big data to improve urban traffic. SMART 2015: The Fourth International Conference on Smart Systems, Devices and Technologies

  • Joh CH, Arentze T, Timmermans H (2001) Pattern recognition in complex activity travel patterns: comparison of Euclidean distance, signal-processing theoretical, and multidimensional sequence alignment methods. Transp Res Record J Transp Res Board 1752:16–22

  • Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: an analysis and critique. Strateg Manag J 17(6):441–458

  • Kwan MP (2000) Interactive geovisualization of activity-travel patterns using three dimensional geographical information systems: a methodological exploration with a large data set. Transp Res Part C Emerg Technol 8:185–203

  • Pas EI (1988) Weekly travel-activity behavior. Transportation 15(1):89–109

  • Recker WW, McNally MG, Root GS (1985) Travel/activity analysis: pattern recognition, classification and interpretation. Transp Res Part A Gen 19(4):279–296

  • Shanks JL (1969) Computation of the fast Walsh–Fourier transform. IEEE Trans Comput 18(5):457–459

  • Federal Highway Administration (2017) 2017 National Household Travel Survey. U.S. Department of Transportation, Washington, DC

  • Shoval N, Isaacson M (2007) Sequence alignment as a method for human activity analysis in space and time. Ann Assoc Am Geogr 97:282–297

  • Stoffer DS (1991) Walsh–Fourier analysis and its statistical applications. J Am Stat Assoc 86(414):461–479

  • Stolz BJ, Harrington HA, Porter MA (2017) Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos Interdiscip J Nonlinear Sci 27(4):47410

  • Thorndike RL (1953) Who belongs in the family? Psychometrika 18(4):267–276

  • Wang Y, Ombao H, Chung MK (2018) Topological data analysis of single-trial electroencephalographic signals. Ann Appl Stat 12(3):1506

  • Wilson C (2001) Activity patterns of Canadian women: application of ClustalG sequence alignment software. Transp Res Record 1777(1):55–67

  • Zhang A, Kang JE, Axhausen K, Kwon C (2018) Multi-day activity-travel pattern sampling based on single-day data. Transp Res Part C Emerg Technol 89:96–112

Acknowledgements

The authors are grateful to the editor and anonymous reviewers whose suggestions helped enhance this paper.

Author information

Corresponding author

Correspondence to Renjie Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: TDA and the First-order Persistence Landscape

We start with a brief review of topological data analysis (TDA), an emerging area for analyzing big data with complex structures. TDA uses computational homology to analyze the topological features of data and to summarize these features in low-dimensional representations (Carlsson 2009). The input to TDA is often a set of data points (a point cloud) or a function. Persistent homology distills the essential topological features of the data, which can then be used together with suitable dissimilarity measures to identify patterns in the data sets. We discuss TDA on functions, which is the approach developed in Sections 2 and 3.

Computational Procedure for TDA on Functions

We describe how to construct the persistence diagram of a function using the sublevel set filtration. Figure 10 illustrates the procedure. Suppose \(y_j = f(j), j=1,\dots , 10\), and let the sublevel set at level \(r\) be \(L_r = \{y_j|y_j \le r\}\). The persistence diagram is built by tracking the connected components of \(L_r\) as \(r\) increases.

  (i) When \(r = 0\), a connected component is identified (marked as a blue dot); this is the oldest connected component. The vertical slash line in the second plot records the “birth time \(= 0\)”, and the horizontal slash line indicates \(r\). There is no point on the birth/death plot, since no connected component has died at \(r=0\).

  (ii) When \(r = 0.5\), two more connected components appear (indicated in blue). The blue dot in the middle, with a blue line connecting it to the dark green dot, indicates that the oldest connected component “enlarges” and is “still alive”. The other black vertical slash line in the second plot gives the “birth time” of the two new connected components. No connected component has died yet, and hence no points are shown on the birth/death plot.

  (iii) When \(r = 1\), all old components “enlarge”, and one newer component is “killed” by the older one. Therefore, a black dot with birth \(= 0.5\) and death \(= 1\) is shown on the second plot.

  (iv) When \(r = 2\), the last component is “killed” with birth \(= 0\) and death \(= 2\), which gives the black dot at location (0, 2). The other black dot at (0.5, 1.5) in the second plot records the birth and death of another connected component.

Fig. 10 Four pairs of plots, in order from top left to bottom right, illustrating the procedure for obtaining the persistence diagram of a function
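The steps above can be sketched in code. The following Python snippet is a minimal illustration, not the authors' implementation: it tracks connected components of the sublevel sets of a sequence with a union-find structure, and it follows the convention described in this appendix of pairing the oldest component with the maximum of the function. The function name `sublevel_persistence` and the ten-point sequence `y` are hypothetical; `y` was chosen only so that the resulting diagram matches the points (0, 2), (0.5, 1) and (0.5, 1.5) mentioned in the text, and it is not the data behind Fig. 10.

```python
def sublevel_persistence(y):
    """0-dimensional persistence pairs of a sequence under the sublevel set filtration."""
    n = len(y)
    parent = [None] * n   # None means the vertex has not entered the sublevel set yet
    birth = {}            # component root -> birth value of that component

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Add vertices in increasing order of function value (the level r grows).
    for i in sorted(range(n), key=lambda j: y[j]):
        parent[i] = i
        birth[i] = y[i]
        for nb in (i - 1, i + 1):
            if 0 <= nb < n and parent[nb] is not None:
                a, b = find(i), find(nb)
                if a == b:
                    continue
                # Elder rule: the younger component dies when two components merge.
                old, young = (a, b) if birth[a] <= birth[b] else (b, a)
                if birth[young] < y[i]:   # skip zero-persistence pairs
                    pairs.append((birth[young], y[i]))
                parent[young] = old
    # Assumed convention (as in the text): the oldest component is paired with the maximum.
    pairs.append((min(y), max(y)))
    return sorted(pairs)


# Hypothetical sequence with minima at 0, 0.5, 0.5 and maxima at 1, 1.5, 2.
y = [2, 1, 0.5, 1, 0, 1, 1.5, 0.5, 1, 2]
diagram = sublevel_persistence(y)
print(diagram)  # [(0, 2), (0.5, 1), (0.5, 1.5)]
```

In standard persistent homology the oldest component never dies; assigning it the death value \(\max f\) is the convention that yields the point (0, 2) in step (iv).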

First-Order Persistence Landscape

First, in the persistence diagram obtained using the sublevel set filtration, the point furthest from the diagonal is always born at the minimum value of the function and dies at the maximum value of the function.

Second, following the definition of the persistence landscape in Section 2.3 of Bubenik (2015), given a persistence diagram \(\{(b_i, d_i), \forall i\}\), the first-order persistence landscape is

$$\begin{aligned} \text{ PL }(\ell ) = \max _i\{ \min (\ell -b_i, d_i-\ell )_+ \}, \end{aligned}$$

where \( \ell \) is a real number and \((x)_+ = \max (x, 0)\). Let \(d_{\min }\) and \(d_{\max }\) denote the minimum and maximum values of the function. Because the persistence diagram uses a sublevel set filtration, it contains the point \((d_{\min }, d_{\max })\). For all \((b_i, d_i)\) that belong to the persistence diagram, \(d_{\min }\le b_i \le d_i \le d_{\max }\). Therefore, for any real number \(\ell \), \(\ell -d_{\min } \ge \ell -b_i\) and \(d_{\max } - \ell \ge d_i - \ell \), which implies that

$$\begin{aligned} \min (\ell -d_{\min }, d_{\max }-\ell )_+ \ge \min (\ell -b_i, d_i-\ell )_+ , \end{aligned}$$

which in turn implies that

$$\begin{aligned} \text{ PL }(\ell )= & {} \max _i\{ \min (\ell -b_i, d_i-\ell )_+ \} \\= & {} \min (\ell -d_{\min }, d_{\max }-\ell )_+. \end{aligned}$$
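The identity above can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `first_landscape` and `tent` are hypothetical, and the diagram is the three-point example from the previous subsection, which contains the point \((d_{\min }, d_{\max }) = (0, 2)\). It evaluates the general definition and the simplified single-tent formula on a range of values of \(\ell \) and compares them.

```python
def first_landscape(diagram, ell):
    """General definition: max over points of min(ell - b, d - ell)_+."""
    return max(max(min(ell - b, d - ell), 0.0) for b, d in diagram)


def tent(d_min, d_max, ell):
    """Simplified form: min(ell - d_min, d_max - ell)_+."""
    return max(min(ell - d_min, d_max - ell), 0.0)


# Example diagram from a sublevel set filtration; it contains (d_min, d_max) = (0, 2).
diagram = [(0.0, 2.0), (0.5, 1.0), (0.5, 1.5)]
d_min = min(b for b, _ in diagram)
d_max = max(d for _, d in diagram)

# Compare both formulas on values of ell inside and outside [d_min, d_max].
checks = [i / 10 for i in range(-5, 30)]
agree = all(
    abs(first_landscape(diagram, t) - tent(d_min, d_max, t)) < 1e-12
    for t in checks
)
print(agree)  # True
```

The check relies on the diagram containing \((d_{\min }, d_{\max })\); for a diagram without that point the two formulas generally differ.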

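Finally, let \((d_{\min }, d_{\max }) \subset (D_{\min }, D_{\max })\) and take the grid \(\{D_{\min }+\frac{(\ell -1) (D_{\max }-D_{\min })}{L-1},\ell =1, 2, \ldots , L\}\). With \(\ell \) now denoting the index of a grid point, evaluating the landscape at the \(\ell \)-th grid point gives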

$$\begin{aligned} \text{ PL }(\ell ) = \min (V_1 (\ell ), V_2 (\ell ))_{+}, \end{aligned}$$

where

$$\begin{aligned} V_1(\ell )= & {} D_{\min } + \frac{(\ell -1) (D_{\max }-D_{\min })}{L-1} - d_{\min } \\ V_2(\ell )= & {} d_{\max } - D_{\min } - \frac{(\ell -1) (D_{\max } - D_{\min })}{L-1}. \end{aligned}$$

These expressions are applied to the Walsh–Fourier transform (WFT) function obtained from each time series \(n=1,\ldots ,N\) in Section 2.
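As a rough sketch of how the grid expressions could be turned into a feature vector, the Python function below evaluates \(\min (V_1(\ell ), V_2(\ell ))_+\) for \(\ell = 1, \ldots , L\). The function name `landscape_on_grid` and the numerical values in the example are assumptions made for illustration; the paper does not specify an implementation.

```python
def landscape_on_grid(d_min, d_max, D_min, D_max, L):
    """First-order persistence landscape evaluated on L equally spaced grid points."""
    step = (D_max - D_min) / (L - 1)
    values = []
    for ell in range(1, L + 1):
        point = D_min + (ell - 1) * step   # ell-th grid point
        v1 = point - d_min                 # V_1(ell)
        v2 = d_max - point                 # V_2(ell)
        values.append(max(min(v1, v2), 0.0))
    return values


# Hypothetical values: function range (0, 2) inside the common range (-1, 3), L = 5.
features = landscape_on_grid(d_min=0.0, d_max=2.0, D_min=-1.0, D_max=3.0, L=5)
print(features)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

Using a common range \((D_{\min }, D_{\max })\) and a common \(L\) for all series gives vectors of equal length, which is what makes them usable as clustering features.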

Cite this article

Chen, R., Zhang, J., Ravishanker, N. et al. Clustering Activity–Travel Behavior Time Series using Topological Data Analysis. J. Big Data Anal. Transp. 1, 109–121 (2019). https://doi.org/10.1007/s42421-019-00008-6
