Module lbsntransform.output.hll.hll_bases
Sample bases and metrics for HLL aggregation
The base classes shown here are only examples that illustrate how to extract typical bases and metrics from Location Based Social Media (LBSN). The structure is motivated by the 4 facets of Social Media discussed in [1]. Base classes are organized per facet in subfolders base/facet
A template-class is provided that allows extending this module for individual needs (see class topical.TemplateBase). The general idea is that each class has
- a NAME, which is a reference of type Tuple(str, str) consisting of (1) its facet (either spatial, temporal, topical or social) and (2) a unique base reference (e.g. latlng, place, region, city or country). In our examples, base classes per facet follow the granularity hierarchy proposed in [4]
- a Key, which is a unique reference (i.e. the "base") that is measured
- a list of addititional (optional) attributes for the base, e.g. the lat-lng key as Postgis Geometry, or a name for the place etc.
- a list of (hll) metrics that are measured for the base (the "overlay"), e.g. a list of post_guids ("post_hll"), or user_guids ("user_hll"), or more complex metrics such as pud_hll (user days, as termed by [2]). These lists will be transformed into a "hll shard".
Additional Notes:
- the structure here is closely aligned with the SQL hll files maintained in [3], if you add any classes or metrics, make youre they're also updated in your hll db
- order is important: the sql commands are constructed dynamically from the class structures defined here, thus it is important that order of keys, attrs and metrics exactly match the sql db definitions. For this reason, OrderedDicts() are used, which store the order in which keys are added
- most classes here make use of (multiple) class inheritance to reduce code duplication
- how bases and metrics are mapped from lbsnstructure entirely depends on individual needs, the mappings demonstrated here are mere examples. Any complex mapping can be added to any of the classes and from any of the lbsnstructure objects (e.g. lbsn.Post, lbsn.Place, lbsn.Reaction, lbsn.User etc.)
[1] Dunkel, A., Andrienko, G., Andrienko, N., Burghardt, D., Hauthal, E., & Purves, R. (2019). A conceptual framework for studying collective reactions to events in location-based social media. International Journal of Geographical Information Science, 33, 4, 780-804.
[2] Sessions, C., Wood, S. A., Rabotyagov, S., & Fisher, D. M. (2016). Measuring recreational visitation at U.S. National Parks with crowd-sourced photo-graphs. Journal of Environmental Management, 183, 703–711. DOI: 10.1016/j.jenvman.2016.09.018
[3] https://gitlab.vgiscience.de/lbsn/databases/hlldb
[4] Löchner, M., Dunkel, A., & Burghardt, D. (2018). A privacy-aware model to process data from location-based social media.
Expand source code
# -*- coding: utf-8 -*-
"""
Sample bases and metrics for HLL aggregation
The base classes shown here are only examples that illustrate how to extract
typical bases and metrics from Location Based Social Media (LBSN). The
structure is motivated by the 4 facets of Social Media discussed in [1].
Base classes are organized per facet in subfolders base/facet
A template-class is provided that allows extending this module for individual
needs (see class topical.TemplateBase). The general idea is that each class has
* a NAME, which is a reference of type Tuple(str, str) consisting of (1)
its facet (either spatial, temporal, topical or social) and (2) a
unique base reference (e.g. latlng, place, region, city or country). In
our examples, base classes per facet follow the granularity hierarchy
proposed in [4]
* a Key, which is a unique reference (i.e. the "base") that is measured
* a list of addititional (optional) attributes for the base, e.g. the lat-lng
key as Postgis Geometry, or a name for the place etc.
* a list of (hll) metrics that are measured for the base (the "overlay"), e.g.
a list of post_guids ("post_hll"), or user_guids ("user_hll"), or more
complex metrics such as pud_hll (user days, as termed by [2]). These
lists will be transformed into a "hll shard".
Additional Notes:
* the structure here is closely aligned with the SQL hll files
maintained in [3], if you add any classes or metrics, make youre they're
also updated in your hll db
* order is important: the sql commands are constructed dynamically from the
class structures defined here, thus it is important that order of keys,
attrs and metrics exactly match the sql db definitions. For this reason,
OrderedDicts() are used, which store the order in which keys are added
* most classes here make use of (multiple) class inheritance to reduce code
duplication
* how bases and metrics are mapped from lbsnstructure entirely depends on
individual needs, the mappings demonstrated here are mere examples. Any
complex mapping can be added to any of the classes and from any of the
lbsnstructure objects (e.g. lbsn.Post, lbsn.Place, lbsn.Reaction,
lbsn.User etc.)
[1] Dunkel, A., Andrienko, G., Andrienko, N., Burghardt, D., Hauthal,
E., & Purves, R. (2019). A conceptual framework for studying collective
reactions to events in location-based social media. International
Journal of Geographical Information Science, 33, 4, 780-804.
[2] Sessions, C., Wood, S. A., Rabotyagov, S., & Fisher, D. M. (2016).
Measuring recreational visitation at U.S. National Parks with
crowd-sourced photo-graphs. Journal of Environmental Management,
183, 703–711. DOI: 10.1016/j.jenvman.2016.09.018
[3] https://gitlab.vgiscience.de/lbsn/databases/hlldb
[4] Löchner, M., Dunkel, A., & Burghardt, D. (2018).
A privacy-aware model to process data from location-based social media.
"""
import inspect
import sys
from collections import OrderedDict, namedtuple
from typing import List
import lbsnstructure as lbsn
from lbsntransform.tools.helper_functions import HelperFunctions as HF
# named tuple of defined hll metrics
HllMetrics = namedtuple( # pylint: disable=C0103
"HllMetricsTuple",
"user_hll post_hll pud_hll latlng_hll upl_hll utl_hll "
"upt_hll term_hll place_hll",
defaults=(None,) * 9,
)
HllBaseRef = namedtuple("HllBaseRefTuple", "facet base")
BaseRecordValue = namedtuple("BaseRecordValue", "record metrics")
# global static variable (dict) to register
# the base classes defined in this module, e.g.
# BASE_REGISTER = {
# LatLngBase.NAME: LatLngBase,
# PlaceBase.NAME: PlaceBase,
# CityBase.NAME: CityBase,
# ...
# }
BASE_REGISTER = {}
# global static dict for (hard-coded) base header,
# list of sql column names in correct order
# BASE_HEADER = {}
BASE_KEY = {}
BASE_ATTRS = {}
BASE_METRICS = {}
def register_classes():
"""Function to dynamically register base classes for each facet"""
for facet in ["social", "spatial", "topical", "temporal"]:
for __, obj in inspect.getmembers(
sys.modules[f"lbsntransform.output.hll.base.{facet}"]
):
if inspect.isclass(obj):
if hasattr(obj, "NAME"):
BASE_REGISTER[obj.NAME] = obj
BASE_KEY[obj.NAME] = obj().get_key()
BASE_ATTRS[obj.NAME] = obj().get_attr_keys()
BASE_METRICS[obj.NAME] = obj().get_metric_keys()
def merge_base_metrics(base1, base2):
"""Merge two base-metrics by union of its set values"""
# merge metric dicts
for key in base1.metrics.keys():
new_set = base2.metrics.get(key)
if new_set is None:
continue
base1.metrics[key] |= new_set
def append_baserecord(base_records: List["HllBase"], base_record: "HllBase"):
"""Append base_record to list, if all keys have valid values (not None)"""
if not base_record:
return
if None in base_record.key.values():
return
base_records.append(base_record)
def base_factory(facet=None, base=None, record: lbsn.Post = None):
"""Base is initialized based on facet-base tuple
and constructed by parsing lbsn records
Any bases that require special hooks need to be registered here. This
applies, for example, for bases that can appear multiple times
in a single record (hashtags, emoji, terms etc.).
"""
records = []
base_structure = BASE_REGISTER.get((facet, base))
if base_structure is None:
return
# for topical bases (e.g. hashtag, emoji, term)
# multiple bases can be created
# from a single lbsn record
if base == "hashtag":
# only explicit hashtags
tag_terms = HF.filter_terms(record.hashtags)
for tag in tag_terms:
records.append(base_structure(tag))
elif base == "emoji":
# do nothing
all_post_emoji = HF.extract_emoji(record.post_body)
for emoji in all_post_emoji:
# create base for each term
records.append(base_structure(emoji))
elif base == "term":
# any term mentioned in title,
# body or hashtag
all_post_terms = HF.get_all_post_terms(record)
for term in all_post_terms:
# create base for each term
records.append(base_structure(term))
elif base == "topic":
raise NotImplementedError("Parsing of Topics base is currently not implemented")
elif base == "domain":
raise NotImplementedError(
"Parsing of Domains base is currently not implemented"
)
elif base == "_term_latlng":
# any term mentioned in title,
# body or hashtag
all_post_terms = HF.get_all_post_terms(record)
for term in all_post_terms:
# create base for each term
base_record = base_structure(record=record, term=term)
append_baserecord(records, base_record)
elif base == "_hashtag_latlng":
# any hashtag explicitly used
tag_terms = HF.filter_terms(record.hashtags)
for tag in tag_terms:
base_record = base_structure(record=record, hashtag=tag)
append_baserecord(records, base_record)
elif base == "_emoji_latlng":
# any term mentioned in title,
# body or hashtag
all_post_emoji = HF.extract_emoji(record.post_body)
for emoji in all_post_emoji:
# create base for each emoji
base_record = base_structure(record=record, emoji=emoji)
append_baserecord(records, base_record)
elif base == "_month_hashtag":
# any hashtag explicitly used
tag_terms = HF.filter_terms(record.hashtags)
for tag in tag_terms:
base_record = base_structure(record=record, hashtag=tag)
append_baserecord(records, base_record)
elif base == "_month_hashtag_latlng":
# any hashtag explicitly used
tag_terms = HF.filter_terms(record.hashtags)
for tag in tag_terms:
base_record = base_structure(record=record, hashtag=tag)
append_baserecord(records, base_record)
else:
# init for all other bases with single lbsn record
base_record = base_structure(record)
append_baserecord(records, base_record)
return records
class HllBase:
"""Shared attributes for all hll bases"""
def __init__(self):
# the key, used as primary (unique) key
# in the relational db, e.g. 2019-01-01
# keys can consist of multiple parts, e.g.
# (lat, lng) which then form composite
# primary keys in the db
self.key = OrderedDict()
# any additional attributes that
# are stored additionally to the
# key, e.g. for date-key
# name: "New Year's Day"
self.attrs = OrderedDict()
# the hll-metrics, in constistent
# order matching the SQL upsert order
self.metrics = OrderedDict([("user_hll", set()), ("post_hll", set())])
def __ior__(self, other):
"""Implements bitwise or using the | operator."""
if other is None:
return self
merge_base_metrics(self, other)
return self
def __repr__(self):
"""Implement custom format str for debug"""
return HF.format_base_repr(self)
def get_key_value(self):
"""Returns key value for base"""
return tuple(self.key.values())
def get_key(self):
"""Returns key name for base"""
return list(self.key.keys())
def get_attr_keys(self):
"""Returns attr keys for base"""
return list(self.attrs.keys())
def get_metric_keys(self):
"""Returns metric keys for base"""
return list(self.metrics.keys())
def get_sql_header(self) -> List[str]:
"""Get joined header for hll upsert sql
Concat column names for key, attrs and metrics, e.g.:
latitude, longitude, latlng_geom, user_hll, post_hll, date_hll, utl_hll
"""
base_key_cols = [self.key.keys()]
base_attr_cols = [self.attrs.keys()]
base_metrics_cols = [self.metrics.keys()]
return base_key_cols + base_attr_cols + base_metrics_cols
def get_prepared_record(self):
"""Return prepared sql values tuple
Consisting of
* key and attributes tuple (base_record) = the record
* metric dicts with values = the metrics
"""
base_record = tuple(self.key.values())
for attr in self.attrs.values():
base_record += (attr,)
return BaseRecordValue(base_record, self.metrics)
Functions
def append_baserecord(base_records: List[ForwardRef('HllBase')], base_record: HllBase)
-
Append base_record to list, if all keys have valid values (not None)
Expand source code
def append_baserecord(base_records: List["HllBase"], base_record: "HllBase"): """Append base_record to list, if all keys have valid values (not None)""" if not base_record: return if None in base_record.key.values(): return base_records.append(base_record)
def base_factory(facet=None, base=None, record: lbsnstructure.topical_pb2.Post = None)
-
Base is initialized based on facet-base tuple and constructed by parsing lbsn records
Any bases that require special hooks need to be registered here. This applies, for example, for bases that can appear multiple times in a single record (hashtags, emoji, terms etc.).
Expand source code
def base_factory(facet=None, base=None, record: lbsn.Post = None): """Base is initialized based on facet-base tuple and constructed by parsing lbsn records Any bases that require special hooks need to be registered here. This applies, for example, for bases that can appear multiple times in a single record (hashtags, emoji, terms etc.). """ records = [] base_structure = BASE_REGISTER.get((facet, base)) if base_structure is None: return # for topical bases (e.g. hashtag, emoji, term) # multiple bases can be created # from a single lbsn record if base == "hashtag": # only explicit hashtags tag_terms = HF.filter_terms(record.hashtags) for tag in tag_terms: records.append(base_structure(tag)) elif base == "emoji": # do nothing all_post_emoji = HF.extract_emoji(record.post_body) for emoji in all_post_emoji: # create base for each term records.append(base_structure(emoji)) elif base == "term": # any term mentioned in title, # body or hashtag all_post_terms = HF.get_all_post_terms(record) for term in all_post_terms: # create base for each term records.append(base_structure(term)) elif base == "topic": raise NotImplementedError("Parsing of Topics base is currently not implemented") elif base == "domain": raise NotImplementedError( "Parsing of Domains base is currently not implemented" ) elif base == "_term_latlng": # any term mentioned in title, # body or hashtag all_post_terms = HF.get_all_post_terms(record) for term in all_post_terms: # create base for each term base_record = base_structure(record=record, term=term) append_baserecord(records, base_record) elif base == "_hashtag_latlng": # any hashtag explicitly used tag_terms = HF.filter_terms(record.hashtags) for tag in tag_terms: base_record = base_structure(record=record, hashtag=tag) append_baserecord(records, base_record) elif base == "_emoji_latlng": # any term mentioned in title, # body or hashtag all_post_emoji = HF.extract_emoji(record.post_body) for emoji in all_post_emoji: # create base for each emoji base_record = base_structure(record=record, emoji=emoji) append_baserecord(records, base_record) elif base == "_month_hashtag": # any hashtag explicitly used tag_terms = HF.filter_terms(record.hashtags) for tag in tag_terms: base_record = base_structure(record=record, hashtag=tag) append_baserecord(records, base_record) elif base == "_month_hashtag_latlng": # any hashtag explicitly used tag_terms = HF.filter_terms(record.hashtags) for tag in tag_terms: base_record = base_structure(record=record, hashtag=tag) append_baserecord(records, base_record) else: # init for all other bases with single lbsn record base_record = base_structure(record) append_baserecord(records, base_record) return records
def merge_base_metrics(base1, base2)
-
Merge two base-metrics by union of its set values
Expand source code
def merge_base_metrics(base1, base2): """Merge two base-metrics by union of its set values""" # merge metric dicts for key in base1.metrics.keys(): new_set = base2.metrics.get(key) if new_set is None: continue base1.metrics[key] |= new_set
def register_classes()
-
Function to dynamically register base classes for each facet
Expand source code
def register_classes(): """Function to dynamically register base classes for each facet""" for facet in ["social", "spatial", "topical", "temporal"]: for __, obj in inspect.getmembers( sys.modules[f"lbsntransform.output.hll.base.{facet}"] ): if inspect.isclass(obj): if hasattr(obj, "NAME"): BASE_REGISTER[obj.NAME] = obj BASE_KEY[obj.NAME] = obj().get_key() BASE_ATTRS[obj.NAME] = obj().get_attr_keys() BASE_METRICS[obj.NAME] = obj().get_metric_keys()
Classes
class BaseRecordValue (record, metrics)
-
BaseRecordValue(record, metrics)
Ancestors
- builtins.tuple
Instance variables
var metrics
-
Alias for field number 1
var record
-
Alias for field number 0
class HllBase
-
Shared attributes for all hll bases
Expand source code
class HllBase: """Shared attributes for all hll bases""" def __init__(self): # the key, used as primary (unique) key # in the relational db, e.g. 2019-01-01 # keys can consist of multiple parts, e.g. # (lat, lng) which then form composite # primary keys in the db self.key = OrderedDict() # any additional attributes that # are stored additionally to the # key, e.g. for date-key # name: "New Year's Day" self.attrs = OrderedDict() # the hll-metrics, in constistent # order matching the SQL upsert order self.metrics = OrderedDict([("user_hll", set()), ("post_hll", set())]) def __ior__(self, other): """Implements bitwise or using the | operator.""" if other is None: return self merge_base_metrics(self, other) return self def __repr__(self): """Implement custom format str for debug""" return HF.format_base_repr(self) def get_key_value(self): """Returns key value for base""" return tuple(self.key.values()) def get_key(self): """Returns key name for base""" return list(self.key.keys()) def get_attr_keys(self): """Returns attr keys for base""" return list(self.attrs.keys()) def get_metric_keys(self): """Returns metric keys for base""" return list(self.metrics.keys()) def get_sql_header(self) -> List[str]: """Get joined header for hll upsert sql Concat column names for key, attrs and metrics, e.g.: latitude, longitude, latlng_geom, user_hll, post_hll, date_hll, utl_hll """ base_key_cols = [self.key.keys()] base_attr_cols = [self.attrs.keys()] base_metrics_cols = [self.metrics.keys()] return base_key_cols + base_attr_cols + base_metrics_cols def get_prepared_record(self): """Return prepared sql values tuple Consisting of * key and attributes tuple (base_record) = the record * metric dicts with values = the metrics """ base_record = tuple(self.key.values()) for attr in self.attrs.values(): base_record += (attr,) return BaseRecordValue(base_record, self.metrics)
Subclasses
- SocialBase
- LatLngBase
- PlaceBase
- SpatialBase
- DateBase
- DayofmonthBase
- DayofweekBase
- DayofyearBase
- HourofdayBase
- MonthBase
- MonthHashtagBase
- MonthHashtagLatLngBase
- MonthLatLngBase
- MonthofyearBase
- TimeofdayBase
- TimestampBase
- YearBase
- EmojiLatLngBase
- HashtagLatLngBase
- TermLatLngBase
- TopicalBase
Methods
def get_attr_keys(self)
-
Returns attr keys for base
Expand source code
def get_attr_keys(self): """Returns attr keys for base""" return list(self.attrs.keys())
def get_key(self)
-
Returns key name for base
Expand source code
def get_key(self): """Returns key name for base""" return list(self.key.keys())
def get_key_value(self)
-
Returns key value for base
Expand source code
def get_key_value(self): """Returns key value for base""" return tuple(self.key.values())
def get_metric_keys(self)
-
Returns metric keys for base
Expand source code
def get_metric_keys(self): """Returns metric keys for base""" return list(self.metrics.keys())
def get_prepared_record(self)
-
Return prepared sql values tuple
Consisting of * key and attributes tuple (base_record) = the record * metric dicts with values = the metrics
Expand source code
def get_prepared_record(self): """Return prepared sql values tuple Consisting of * key and attributes tuple (base_record) = the record * metric dicts with values = the metrics """ base_record = tuple(self.key.values()) for attr in self.attrs.values(): base_record += (attr,) return BaseRecordValue(base_record, self.metrics)
def get_sql_header(self) ‑> List[str]
-
Get joined header for hll upsert sql Concat column names for key, attrs and metrics, e.g.: latitude, longitude, latlng_geom, user_hll, post_hll, date_hll, utl_hll
Expand source code
def get_sql_header(self) -> List[str]: """Get joined header for hll upsert sql Concat column names for key, attrs and metrics, e.g.: latitude, longitude, latlng_geom, user_hll, post_hll, date_hll, utl_hll """ base_key_cols = [self.key.keys()] base_attr_cols = [self.attrs.keys()] base_metrics_cols = [self.metrics.keys()] return base_key_cols + base_attr_cols + base_metrics_cols
class HllBaseRef (facet, base)
-
HllBaseRefTuple(facet, base)
Ancestors
- builtins.tuple
Instance variables
var base
-
Alias for field number 1
var facet
-
Alias for field number 0
class HllMetrics (user_hll=None, post_hll=None, pud_hll=None, latlng_hll=None, upl_hll=None, utl_hll=None, upt_hll=None, term_hll=None, place_hll=None)
-
HllMetricsTuple(user_hll, post_hll, pud_hll, latlng_hll, upl_hll, utl_hll, upt_hll, term_hll, place_hll)
Ancestors
- builtins.tuple
Instance variables
var latlng_hll
-
Alias for field number 3
var place_hll
-
Alias for field number 8
var post_hll
-
Alias for field number 1
var pud_hll
-
Alias for field number 2
var term_hll
-
Alias for field number 7
var upl_hll
-
Alias for field number 4
var upt_hll
-
Alias for field number 6
var user_hll
-
Alias for field number 0
var utl_hll
-
Alias for field number 5