Any information found on Location Based Social Networks (LBSN) can become the context of a visual analytics question - which can be called a measurement, metric, or overlay (see Dunkel et al. 2018, p. 14).
From a user privacy perspective, some metrics are more problematic than others. To organize visual analytics questions systematically, a task matrix was proposed in Dunkel et al. 2018.
The LBSN Structure presented here directly reflects the components of this task matrix, which can be summarized as follows:
Relationships: Objects on LBSN relate to each other in various ways. For example, several Posts are related to a particular User. Or, two users are related by being connected, e.g. as Friends. These relationships are organized under Interlinkage, which can also be considered as the 5th Facet.
Bases: Objects can be broken down further into bases. A Post, for example, consists of several attributes, which we consider bases, such as its title, the post_body (the content and description), or other information that is added automatically (e.g. the timestamp of publication).
Metrics (Overlays): Any base can become the context of analysis, which is expressed in the task matrix presented in Dunkel et al. 2018. Typical metrics (overlays) in visual analytics include:
- postcount: The number of posts for a particular context ("PC")
- usercount: The number of users for a particular context ("UC")
- userdays: The cumulative number of distinct users per day for a particular context ("PUD", as coined by Wood, Guerry, Silver and Lacayo 2013)
Note the difference
The difference between a base and an overlay may not be immediately obvious. A base defines the underlying context that is explored. For example, a specific region (spatial facet), or a distinct temporal window (temporal facet) would be considered a base. A metric (or overlay) reflects what is measured, e.g. the number of posts (postcount), the number of users (usercount) or the number of distinct user days (userdays).
These were the most frequent metrics that we observed in practice. However, many other metrics exist and this list is not exhaustive.
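As a minimal sketch of how the three metrics relate to a base, the following Python example groups a handful of posts by a spatial base (a region name) and computes postcount, usercount, and userdays for each region. The record layout, the sample values, and the function name `metrics_by_region` are assumptions made for illustration only; they are not part of the LBSN Structure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical post records: (post_id, user_id, region, publication date)
posts = [
    ("p1", "u1", "Dresden", date(2019, 5, 1)),
    ("p2", "u1", "Dresden", date(2019, 5, 1)),
    ("p3", "u2", "Dresden", date(2019, 5, 1)),
    ("p4", "u1", "Dresden", date(2019, 5, 2)),
    ("p5", "u3", "Berlin", date(2019, 5, 2)),
]


def metrics_by_region(records):
    """Compute exact postcount, usercount and userdays per region (the base)."""
    post_ids = defaultdict(set)
    user_ids = defaultdict(set)
    user_days = defaultdict(set)
    for post_id, user_id, region, day in records:
        post_ids[region].add(post_id)          # distinct posts
        user_ids[region].add(user_id)          # distinct users
        user_days[region].add((user_id, day))  # distinct (user, day) pairs
    return {
        region: {
            "postcount": len(post_ids[region]),
            "usercount": len(user_ids[region]),
            "userdays": len(user_days[region]),
        }
        for region in post_ids
    }


result = metrics_by_region(posts)
for region, metrics in result.items():
    print(region, metrics)
```

Note that this exact approach has to keep every raw post ID and user ID in memory until the counts are taken.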
The connection between bases, metrics, and privacy follows directly. To measure postcount, one needs to count the number of distinct posts. Posts are typically referenced by an ID, a unique identifier. Each of these IDs is a reference to a person in a specific situation. Similarly, user IDs make it possible to identify users across several posts. Such unique identifiers are therefore the primary cause of privacy conflicts.
In a conference paper, we proposed a general conceptual framework for systematically improving privacy awareness in visual analytics questions (see Löchner et al. 2019). This framework proposes HyperLogLog, a cardinality estimation algorithm by Flajolet et al. (2007), as a key component for mitigating privacy risks.
Because HyperLogLog estimates the number of distinct elements in a set (the count-distinct problem), it can be applied directly to the key metrics used in LBSN visual analytics, such as usercount, postcount, or userdays. However, since HyperLogLog is not, per se, privacy preserving, it must be combined with other approaches and components (see Desfontaines et al. 2018).
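To illustrate the idea, the following is a minimal Python sketch of a HyperLogLog estimator in the spirit of Flajolet et al. (2007). It is not the implementation used with LBSN Structure. The class name, the choice of SHA-1 as hash function, the 64-bit hash width, and the default precision `p=12` are all assumptions made for this example, and the large-range correction of the original paper is omitted.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production-grade)."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is only valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Use 64 bits of a SHA-1 digest as the hash value (an assumption;
        # any well-mixing hash function would do).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches is the register-wise maximum
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Estimate a usercount from hypothetical user IDs, each seen twice
usercount = HyperLogLog()
for i in range(10_000):
    usercount.add(f"user-{i}")
    usercount.add(f"user-{i}")  # duplicates leave the registers unchanged
estimate = usercount.count()
print(f"estimated distinct users: {estimate:.0f}")
```

The sketch stores only register maxima rather than the IDs themselves, and two sketches can be combined with `merge` to estimate the size of a union. As noted above, this alone does not make the result privacy preserving.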
In the Tutorial & User Guide section, we demonstrate how LBSN Structure can be applied and discuss several approaches to privacy-aware processing in more detail, using specific data processing examples.
This metrics section of the LBSN Structure is at a very early stage of development. We hope to revise it frequently so that it reflects a broader range of application contexts in the future.
Desfontaines, D., Lochbihler, A., & Basin, D. (2018). Cardinality Estimators do not Preserve Privacy. 1–21.
Dunkel, A., Andrienko, G., Andrienko, N., Burghardt, D., Hauthal, E., & Purves, R. (2018). A conceptual framework for studying collective reactions to events in location-based social media. International Journal of Geographical Information Science, 00(00), 1–25. https://doi.org/10.1080/13658816.2018.1546390
Flajolet, P., Fusy, E., Gandouet, O., & Meunier, F. (2007). HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Conference on Analysis of Algorithms, AofA 07. Nancy, France.
Löchner, M., Dunkel, A., & Burghardt, D. (2019). Protecting privacy using HyperLogLog to process data from Location Based Social Networks. 1–7.
Wood, S., Guerry, A. D., Silver, J. M., & Lacayo, M. (2013). Using social media to quantify nature-based tourism and recreation. Scientific Reports, 3, 2976. http://doi.org/10.1038/srep02976