Description
|
These datasets simulate the overlap of two point patterns across a range of geographic scale configurations. The attribute values attached to the points reproduce a situation of spatial heterogeneity (i.e., differing means but constant variance across the patterns). The overall aim of these datasets is to mimic the overlap of different phenomena within social media data, especially Twitter and similar kinds of feeds to which users contribute fully autonomously. (2016-03-18)
|
Notes
| These data represent different configurations of overlapping patterns. The geographic scales of the patterns are varied, and the possible scale differences are combined with each other. This was done for 90 configurations, and random sampling (of geometry as well as attributes) was repeated 100 times for each pattern configuration, so that 9,000 patterns are contained overall. The datasets in the uploaded ZIP file are organized into four folders: overlap_lws_non_rev, overlap_lws_rev, overlap_swl_non_rev and overlap_swl_rev. Here, swl stands for "small with large," while lws stands for "large with small." The reason for this naming scheme is that, in the lws case, 23.8% of the points from the larger-scale sub-pattern interact with at least one point from the smaller-scale counterpart. For swl the same was done, but vice versa. When the scale differences become too large, the ratio of 23.8% is no longer achievable (for purely geometric reasons). In these cases, a closest-fit solution was sought instead. The suffix "non_rev" means that the attribute values were dispersed such that they increase from the center towards the pattern boundary; "rev" refers to the reversed attribute dispersal mechanism. Each CSV file follows the naming scheme clust_rep_min_max. Here, "rep" is the repetition (1 to 100), "min" is the minimum distance (m) at which any point within the larger-scale pattern interacts, and "max" is the corresponding maximum point spacing distance. Note that the scale of the small-scale pattern was fixed to [1,10]; only the scale of the large-scale pattern was varied. The files contain the following columns: The first column (no title) contains unique row IDs. The second column ("V1") contains the X-part of the geographic coordinates. The third column ("V2") contains the Y-part of the geographic coordinates. 
The fourth column ("V3") contains an integer indicating which of the two sub-patterns a point belongs to: 0 means "small-scale," while any other number indicates the large-scale pattern. The fifth column ("vals") contains Gaussian random values. The values for the small-scale pattern were drawn from N(250,150), while the values for the large-scale pattern were drawn from N(750,150). The sixth and last column ("lag") contains the spatial lag of each observation. The lag is computed over neighborhoods matching the larger-scale pattern (i.e., with a cut-off distance of "max"). The spatial weights were generated by an inverse distance weighting scheme. |
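The file layout and the lag column described in the notes can be illustrated with a minimal Python sketch. It is not the procedure used to generate the data. Several things are assumptions made for illustration only: the example file name clust_7_10_100.csv, the exact CSV header line, the helper names parse_name, read_points and idw_lag, the use of all points (both sub-patterns) as potential neighbors, and the row-standardisation of the inverse-distance weights. The notes state only that an inverse distance weighting scheme with a cut-off distance of "max" was used.

```python
import csv
import io
import math
import os


def parse_name(filename):
    """Split a name of the form clust_rep_min_max(.csv) into its parts."""
    stem = os.path.basename(filename)
    if stem.endswith(".csv"):
        stem = stem[:-4]
    prefix, rep, dmin, dmax = stem.rsplit("_", 3)
    return {"prefix": prefix, "rep": int(rep),
            "min": float(dmin), "max": float(dmax)}


def read_points(handle):
    """Read the columns described in the notes (V1, V2, V3, vals)."""
    points = []
    for rec in csv.DictReader(handle):
        points.append({
            "x": float(rec["V1"]),
            "y": float(rec["V2"]),
            "large": int(rec["V3"]) != 0,  # 0 = small-scale, else large-scale
            "val": float(rec["vals"]),
        })
    return points


def idw_lag(points, cutoff):
    """Inverse-distance spatial lag within `cutoff`.

    Assumption: weights are row-standardised (they sum to 1 per point).
    Points without any neighbor inside the cut-off get NaN.
    """
    lags = []
    for i, p in enumerate(points):
        num = den = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.hypot(p["x"] - q["x"], p["y"] - q["y"])
            if 0.0 < d <= cutoff:
                num += q["val"] / d
                den += 1.0 / d
        lags.append(num / den if den else float("nan"))
    return lags


# Tiny made-up sample in the assumed layout (not taken from the dataset).
SAMPLE_CSV = (
    '"","V1","V2","V3","vals","lag"\n'
    '"1",0,0,0,200,0\n'
    '"2",3,4,1,800,0\n'
    '"3",100,0,1,500,0\n'
)

info = parse_name("clust_7_10_100.csv")      # hypothetical file name
points = read_points(io.StringIO(SAMPLE_CSV))
lags = idw_lag(points, cutoff=info["min"])   # small cut-off, for the demo only
```

To compare against the stored "lag" column of a real file, the cut-off would be the "max" value from the file name. Differences from the stored values are to be expected if the original weights were not row-standardised or were built from a different neighbor set.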