Protocol Documentation
social.proto
Social Facet of the Common LBSN Data Structure
Overview of LBSN Structures that are organized under the Social Facet:
Object | Description |
---|---|
Origin | A Location Based Social Network consisting of a large group of people |
CompositeKey | A Composite Key used to reference unique objects across different LBSN |
User | A single user (e.g. a profile or an account) on a location based social network (LBSN) |
UserGroup | A single group of users on a LBSN |
Language | A common language used on LBSN, relating to a larger group of people sharing the same language |
Note that these assignments are not clear-cut, e.g. there are aspects of a User that may as well belong to one of the other facets.
CompositeKey
Except for Language, Composite Keys are used for all objects in the structure. They allow creating Composite References consisting of an Origin (a reference to a Location Based Social Network, LBSN) and the original or derived (hashed) (gu)id for each object on the respective LBSN.
Field | Type | Description |
---|---|---|
origin | Origin | e.g. 1= Instagram, 2= Flickr, 3=Twitter |
id | string | the service's original unique (gu)id for this object |
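A minimal Python sketch of how such a composite reference might be built, assuming plain dataclasses rather than the generated protobuf classes. The `origin_id` integer field (standing in for the `origin` message), the `hashed_key` helper, and the choice of SHA-256 are illustrative assumptions; the specification only states that the id may be original or derived (hashed).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeKey:
    """Stand-in for the CompositeKey message: an origin plus a service-local id."""
    origin_id: int  # e.g. 2 = Flickr (see Origin.OriginID)
    id: str         # the service's original or derived (gu)id


def hashed_key(origin_id: int, original_id: str) -> CompositeKey:
    """Hypothetical helper: derive a hashed id from the original one."""
    digest = hashlib.sha256(original_id.encode("utf-8")).hexdigest()
    return CompositeKey(origin_id=origin_id, id=digest)


# The same local id on two different services yields two distinct keys.
flickr_user = CompositeKey(origin_id=2, id="12345")
foursquare_user = CompositeKey(origin_id=5, id="12345")
```

Because the origin is part of the key, identical ids from different services never collide.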
Language
A language identifier on LBSN.
Field | Type | Description |
---|---|---|
language_short | string | A BCP 47 language identifier corresponding to the language of a Post or User (e.g.). Languages are organized under the Social Facet because they're usually references to cultures, which connect many people. |
name | string | Name of the language (English) |
language_name_de | string | Name of the language (German) |
Origin
An Origin is a reference to a unique Location Based Social Network (LBSN). We've added some of the most popular social media networks to the list, which can be extended further.
An Origin is the base unit of the LBSN structure and it is organized under the Social Facet because Social Media Networks are formed by large groups of people (the social part) around some common interest (e.g. Flickr or Instagram for photography, Twitter for opinion formation and exchange of political perspectives).
While OriginIDs are entirely open, we added a list of predefined common Networks.
Field | Type | Description |
---|---|---|
origin_id | Origin.OriginID | A unique Origin ID as a reference for the LBSN |
name | string | The name of Origin, e.g. the service's name |
User
A user (e.g. a profile or an account) on a location based social network (LBSN)
Note that it is often challenging to determine whether social media profiles represent fictitious or real persons, bots or even ‘cyborgs’ (You et al. 2012). Therefore, a user may also be considered an ‘avatar’ representing an organization or a group of individuals.
See also the Wikipedia entry.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
user_name | string | Name of the User. Can be an alias, email or real name etc. |
user_fullname | string | Full name of the User. Can be an alias, email or real name etc. |
follows | int64 | Number of other users this user follows. |
followed | int64 | Number of times this user is followed by others. |
group_count | int64 | The number of public groups or communities this user is a member of |
biography | string | A short user biography or description. |
post_count | int64 | Number of posts this user has created. |
url | string | Full URL to public user profile. |
is_private | bool | Whether the user has chosen to remain private (e.g. profile not publicly visible). |
is_available | bool | A user that is not available can mean several things. When the user's account was deactivated or when users explicitly chose to delete their account, but keep public data, this field would be False. |
user_language | Language | A BCP 47 language identifier corresponding to the machine-detected or user selected language. |
user_location | string | The user-defined location for this profile. Not necessarily a location, nor machine-parseable (e.g. a user can choose 'world' as his/her location, or any other string) |
user_location_geom | string | Coordinates (Point: lat/lng) of the user-location, either provided by user or geocoded from the user's location. |
liked_count | int64 | The number of Posts this user has liked in total. |
active_since | google.protobuf.Timestamp | UTC datetime when the user was first active (e.g. time of account creation, or derived from first post_publish_date). |
profile_image_url | string | URL pointing to the public profile image of the user. |
user_timezone | string | Time zone ID that can be specified by the user. |
user_utc_offset | sint32 | Optional difference in hours from Coordinated Universal Time (UTC) for a particular user defined place. |
user_groups_member | string | The list of groups this user has joined/ is a member of (active participation interest). |
user_groups_follows | string | The list of groups this user follows (viewing interest). |
UserGroup
A user group on a location based social network (LBSN). Central to user groups is a common interest.
User groups are organized differently on different LBSN: sometimes they're centrally organized by the organization or by a single user; at other times, they're self-organized, sometimes with specific limitations to join etc. For example, on Facebook user groups are self-organized on pages; on Twitter, 'Lists' may be used by single users to produce a curated list of Twitter accounts.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
usergroup_name | string | Name of the UserGroup |
usergroup_description | string | Description of the UserGroup |
member_count | int64 | Total member count of this UserGroup |
usergroup_createdate | google.protobuf.Timestamp | Time of creation for this UserGroup |
user_owner_pkey | CompositeKey | A Reference to the owner of this UserGroup. |
Origin.OriginID
Predefined values for OriginID. Default origin id is LBSN (0)
Name | Number | Description |
---|---|---|
LBSN | 0 | default |
INSTAGRAM | 1 | |
FLICKR | 2 | |
TWITTER | 3 | |
4 | ||
FOURSQUARE | 5 | |
WIKIDATA | 6 | |
WIKIPEDIA | 7 | |
8 | ||
GEOGRAPH | 9 | |
GOOGLEPLACEPHOTO | 10 | |
11 | ||
MAPILLARY | 12 | |
SNAPCHAT | 13 | |
POKEMONGO | 14 | |
WIKIMEDIACOMMONS | 15 | |
WIKIMAPIA | 16 | |
AIRBNB | 17 | |
PORTALNINANTIC | 18 | |
TIKTOK | 19 | |
TELEGRAM | 20 | |
GAB | 21 | |
IBIRD | 22 | |
INATURALIST | 23 | |
ISPOTNATURE | 24 |
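Since OriginIDs are described as open, a consumer may encounter ids outside the predefined list. The following is a rough Python sketch of one way to handle that, assuming a hand-written `IntEnum` with a subset of the values above instead of the generated protobuf enum; `origin_label` and its `ORIGIN_<n>` fallback format are hypothetical.

```python
from enum import IntEnum


class OriginID(IntEnum):
    """Subset of the predefined values listed above."""
    LBSN = 0
    FLICKR = 2
    FOURSQUARE = 5
    WIKIDATA = 6
    WIKIPEDIA = 7


def origin_label(origin_id: int) -> str:
    """Return the predefined name, or a generic label for ids outside the list."""
    try:
        return OriginID(origin_id).name
    except ValueError:
        # Not a predefined network: fall back to a neutral placeholder label.
        return f"ORIGIN_{origin_id}"
```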
topical.proto
Topical (or thematic) Facet of the Common LBSN Data Structure
Overview of LBSN Objects that are organized under the Topical Facet:
Object | Description |
---|---|
Post | A single post on a location based social network (LBSN) providing original (new) content |
PostReaction | A reaction on a location based social network (LBSN) such as like, quote, share etc. |
Post
An original post on a location based social network (LBSN)
Note that:
- all LBSM posts are reactions,
- all reactions have a referent event
- referent events may consist of complex motivational patterns and are therefore often difficult to identify
See also the Wikipedia entry.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object across networks. |
post_latlng | string | Location of the post, either chosen by the user, automatically attached to the post by the input device (GPS) or complemented by algorithms of the service (e.g. Twitter post geocoding, as derived from the post_body text). If lat/lng coordinates are not submitted, this field can be substituted with location information from the place, city or country table. In those cases, post_geoaccuracy indicates the lower level of geoaccuracy, e.g. 'place', 'city', or 'country'. Formatting: WKT (Well-Known-Text String) |
place_pkey | CompositeKey | Reference to a place this post is associated with. |
city_pkey | CompositeKey | Reference to a city this post is associated with. |
country_pkey | CompositeKey | Reference to a country this post is associated with. |
user_pkey | CompositeKey | Reference to the user who created the post. |
post_publish_date | google.protobuf.Timestamp | The time when the post content was shared online, e.g. on Flickr, the publish_date refers to the time of photo sharing (upload-time) |
post_body | string | The textual content of the post, e.g. the description of the photo on Flickr, the tweet text on Twitter etc. |
post_geoaccuracy | Post.PostGeoaccuracy | This field specifies the highest location accuracy available for this post, either 'latlng', 'place', 'city' or 'country'. |
user_mentions_pkey | CompositeKey | A list of referenced user_guids that are mentioned in the post_body, post_title or other parts of a post. In postgres mapping, these are not direct references that are checked, but mere lists of strings (array), since Foreign Key Arrays are not supported. |
hashtags | string | List of hashtags explicitly assigned to the post, either inside post_body (e.g. with hash-character (#), or in a separate field such as "tags" on Flickr). Note that Flickr users may still use the hash symbol (#). Therefore, hashtag and tag are synonyms for users explicitly highlighting single terms inside the larger context of the post. |
emoji | string | List of Emoji Symbols, either extracted from post_body or provided in a separate field. Duplicates allowed. For possible symbols, see: unicode.org/emoji/charts/full-emoji-list.html |
post_like_count | int64 | Number of times this Post has been liked by other users. |
post_comment_count | int64 | Number of times this Post has been commented on by other users, e.g. count of Reply-Tweets on Twitter, count of photo comments on Flickr etc. |
post_views_count | int64 | Number of times this Post has been viewed by other users. |
post_title | string | The title of the post. This is sometimes available in a separate field. E.g. on Flickr, a photo can have both a title and a description. On Instagram, however, only the post_body is available. |
post_create_date | google.protobuf.Timestamp | The time when the post content was originally created. Most often, this matches the publish_date (e.g. on Twitter or Instagram). On Flickr, the create_date refers to the photo's timestamp, and the publish_date refers to the time of photo sharing (upload-time) |
post_thumbnail_url | string | URL to the public thumbnail of this post. Usually this will only be available for posts of type IMAGE. |
post_url | string | Url to the original post. |
post_type | Post.PostType | Type of post, e.g. text, image, video or other. If possible, choose the more specific type (e.g. VIDEO over TEXT even if text is present in a video-post). |
post_filter | string | Any filters/labels applied to the post (e.g. Instagram photo filters such as Amaro; automatic translations of text; or the "flair" of Reddit posts). |
post_quote_count | int64 | Number of times this Post has been quoted by other users, e.g. count of Quote-Tweets on Twitter. |
post_share_count | int64 | Number of times this Post has been shared by other users, e.g. count of Retweets on Twitter. |
input_source | string | Type of input device used by the user to post, for a list see Twitter, e.g. 'Web', 'IPhone', 'Android' etc. Recommendation: should be oriented at Twitter's large list of source types. For camera models, have a look at Flickr. |
post_language | Language | Language of the post (a BCP 47 language identifier corresponding to the (machine-detected) language of the Post body-text), empty if no language could be detected, NULL if not specified. |
post_content_license | int32 | An integer specifying the content license of the post, which can optionally be chosen by users on some services (e.g. Flickr). For example: All Rights Reserved = 0. Numbers can be oriented at Flickr's list of content licenses. |
topic_group | string | Whether the post is assigned to any explicit topic groups. This could be a Reddit submission that belongs to a Subreddit (= the topic group); or a Flickr image posted to a number of photo groups. A Post can belong to multiple topics (e.g. a "cross-post" on Reddit). |
post_downvotes | int64 | Number of times this Post has been downvoted by other users (HackerNews, Reddit) |
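The fallback described for post_latlng and post_geoaccuracy can be sketched roughly as follows. This is an illustrative Python example, not part of the specification: `resolve_location` and its parameter names are assumptions, the enum is re-declared by hand from Post.PostGeoaccuracy, and geometries are treated as opaque WKT strings.

```python
from enum import IntEnum
from typing import Optional, Tuple


class PostGeoaccuracy(IntEnum):
    UNKNOWN = 0
    LATLNG = 1
    PLACE = 2
    CITY = 3
    COUNTRY = 4


def resolve_location(
    post_latlng: Optional[str],
    place_geom: Optional[str],
    city_geom: Optional[str],
    country_geom: Optional[str],
) -> Tuple[Optional[str], PostGeoaccuracy]:
    """Pick the most precise available WKT geometry and its accuracy level."""
    candidates = [
        (post_latlng, PostGeoaccuracy.LATLNG),
        (place_geom, PostGeoaccuracy.PLACE),
        (city_geom, PostGeoaccuracy.CITY),
        (country_geom, PostGeoaccuracy.COUNTRY),
    ]
    for wkt, accuracy in candidates:
        if wkt:  # skips None and empty strings
            return wkt, accuracy
    return None, PostGeoaccuracy.UNKNOWN
```

The returned accuracy level would then be stored in post_geoaccuracy alongside the substituted geometry.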
PostReaction
A reaction on a location based social network (LBSN).
Note that posts are also reactions. However, a post reaction is a post with a reduced structure suitable for simple expressions such as likes that don't have all post attributes.
The difference between an original post and a post reaction is not clear cut. In general, original posts provide original (new) content that is compiled by the posting user/author. Post reactions merely add information, e.g. by quoting an original post, or provide an expression or stance towards a post (or another reaction), e.g. a like or 'star'.
Post reactions are suitable for mapping the spread of information, because they contain two reference attributes: one for the original post that motivated the reaction (referencedPost), and one for another reaction that was reacted upon (referencedPostreaction).
Example reaction_types:
- share
- comment/reply
- quote
- like/star/highlight
- emoji etc.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
user_pkey | CompositeKey | Reference to the user who reacted. |
referencedPost_pkey | CompositeKey | A reference to the original post to which this reaction refers (e.g. for a reaction of type REPLY, reference of the original post_pkey) |
referencedPostreaction_pkey | CompositeKey | A reference to another reaction (e.g. if this reaction is a "like" of another reaction, reference original postreaction_pkey here) |
reaction_latlng | string | Location of the reaction (point), either chosen by the user, automatically attached to the reaction by input device (GPS) or complemented by algorithms of the service. |
reaction_type | PostReaction.ReactionType | Type of reaction. Choose the more specific type if multiple apply. Merge similar types: Retweet → Share; Reply → Comment; Star → Like |
reaction_date | google.protobuf.Timestamp | Time and Date of the reaction. |
reaction_content | string | Content of the reaction (e.g. the text). |
reaction_like_count | int64 | Number of times this reaction has been liked by others. |
user_mentions_pkey | CompositeKey | A list of referenced user_guids that are mentioned in the reaction. |
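To illustrate how the two reference fields might be used to trace the spread of information, here is a small Python sketch that walks from a reaction back to the post that started the chain. It is a simplification under stated assumptions: keys are plain strings instead of CompositeKeys, and the `Reaction` class and `find_root_post` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reaction:
    pkey: str
    referenced_post: Optional[str] = None      # pkey of the original post
    referenced_reaction: Optional[str] = None  # pkey of another reaction


def find_root_post(pkey: str, reactions: Dict[str, Reaction]) -> Optional[str]:
    """Follow reaction-to-reaction references until a post reference is found."""
    seen = set()  # guards against accidental reference cycles
    current = reactions.get(pkey)
    while current is not None and current.pkey not in seen:
        seen.add(current.pkey)
        if current.referenced_post is not None:
            return current.referenced_post
        if current.referenced_reaction is None:
            return None
        current = reactions.get(current.referenced_reaction)
    return None


reactions = {
    "r1": Reaction("r1", referenced_post="p1"),      # reply to post p1
    "r2": Reaction("r2", referenced_reaction="r1"),  # like of reply r1
    "r3": Reaction("r3", referenced_reaction="r2"),  # share of r2
}
```

Here `find_root_post("r3", reactions)` resolves through r2 and r1 to the post p1.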
Post.PostGeoaccuracy
Spatial information can have different levels of granularity and users can often choose which locational accuracy they want to use.
Name | Number | Description |
---|---|---|
UNKNOWN | 0 | |
LATLNG | 1 | A single coordinate |
PLACE | 2 | A place reference |
CITY | 3 | A city reference |
COUNTRY | 4 |
Post.PostType
Type of post
Name | Number | Description |
---|---|---|
TEXT | 0 | Default post type is text (e.g. a tweet on Twitter) |
IMAGE | 1 | Post of type "image" (e.g. a photo on Flickr) |
VIDEO | 2 | Post of type "video" (e.g. a video on Youtube) |
LINK | 3 | Post of type "link" (e.g. a link share on Reddit) |
OTHER | 4 | Post of specific type not yet added to the specification |
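The recommendation to choose the more specific type could be implemented along these lines. This Python sketch is only one possible reading: the specification states that VIDEO takes precedence over TEXT, while the remaining order (IMAGE before LINK before TEXT) and the `pick_post_type` function are assumptions made for illustration.

```python
from enum import IntEnum


class PostType(IntEnum):
    TEXT = 0
    IMAGE = 1
    VIDEO = 2
    LINK = 3
    OTHER = 4


def pick_post_type(has_text: bool = False, has_image: bool = False,
                   has_video: bool = False, has_link: bool = False) -> PostType:
    """Return the most specific type present (assumed priority order)."""
    if has_video:
        return PostType.VIDEO
    if has_image:
        return PostType.IMAGE
    if has_link:
        return PostType.LINK
    if has_text:
        return PostType.TEXT
    return PostType.OTHER
```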
PostReaction.ReactionType
Possible type of reactions.
Name | Number | Description |
---|---|---|
UNKNOWN | 0 | |
SHARE | 1 | A sharing reaction usually does not add much content |
COMMENT | 2 | A comment reaction adds additional content |
QUOTE | 3 | A quote reaction adds some additional content |
LIKE | 4 | A like reaction is a basic form of appreciation |
EMOJI | 5 | An emoji usually encodes different expressions of feelings |
OTHER | 6 |
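Merging service-specific reaction names into these types (Retweet → Share, Reply → Comment, Star → Like) might look like the following Python sketch. The lookup table keys and the `to_reaction_type` function are hypothetical; mapping empty labels to UNKNOWN and unrecognized ones to OTHER is an assumption, not something the specification prescribes.

```python
from enum import IntEnum


class ReactionType(IntEnum):
    UNKNOWN = 0
    SHARE = 1
    COMMENT = 2
    QUOTE = 3
    LIKE = 4
    EMOJI = 5
    OTHER = 6


# Hypothetical service-specific labels, merged into the common types.
SERVICE_LABELS = {
    "retweet": ReactionType.SHARE,
    "share": ReactionType.SHARE,
    "reply": ReactionType.COMMENT,
    "comment": ReactionType.COMMENT,
    "quote": ReactionType.QUOTE,
    "star": ReactionType.LIKE,
    "like": ReactionType.LIKE,
}


def to_reaction_type(label: str) -> ReactionType:
    """Map a raw label to a ReactionType (case-insensitive)."""
    key = label.strip().lower()
    if not key:
        return ReactionType.UNKNOWN
    return SERVICE_LABELS.get(key, ReactionType.OTHER)
```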
spatial.proto
Spatial Facet of the Common LBSN Data Structure
Overview of LBSN Structures that are organized under the Spatial Facet:
Object | Description |
---|---|
Place | A particular (named) place on a location based social network (LBSN). |
City | A city on a location based social network (LBSN). |
Country | A country on a location based social network (LBSN). |
City
A city on a location based social network (LBSN).
Cities are hierarchically above places. Sometimes they have a specific public page where a City's official representation is presented; sometimes they're automatically added to structure place information into common groups.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
name | string | Name of the city in English. |
name_alternatives | string | Alternative names (e.g. in other languages; synonyms). |
sub_type | string | Optionally add a subtype of City (e.g. "Neighborhood", "Admin", etc.) |
url | string | Url to the public web address of the city |
geom_center | string | WKT Point (centroid of geom_area) |
geom_area | string | WKT Polygon (boundary of the city) |
country_pkey | CompositeKey | Reference to the country this city belongs to. |
Country
A country on a location based social network (LBSN).
Countries are hierarchically above cities and places. Sometimes they have a specific public page where a Country's official representation is presented; sometimes they're automatically added to structure city and place information into common groups.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
name | string | Name of the country in English. |
name_alternatives | string | Alternative names (e.g. in other languages; synonyms). |
url | string | Url to the public web address of the country |
geom_center | string | WKT Point (centroid of geom_area) |
geom_area | string | WKT Polygon (boundary of the country) |
Place
A place on a location based social network (LBSN).
Places are named spatial references of interest, such as POIs, often added by users themselves, around which discussions may evolve.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
name | string | Name of the place in English |
name_alternatives | string | Alternative names (e.g. in other languages; synonyms). |
post_count | int64 | Number of total posts for this place. |
url | string | URL to the public place-page representation on the respective LBSN |
geom_center | string | WKT Point (centroid of geom_area) |
geom_area | string | WKT Polygon (boundary of the place) |
city_pkey | CompositeKey | Reference to the city this place belongs to. |
place_description | string | Public description of the place. |
place_website | string | A link provided by users for this place (e.g. webpage for restaurant, park-management etc.) |
place_phone | string | Phone number publicly provided for some places on LBSN. |
address | string | Address publicly provided for some places on LBSN. |
zip_code | string | Zip_code publicly provided for some places on LBSN. |
checkin_count | int64 | Total number of user checkins for this place (e.g. checkin functionality on Foursquare or Facebook) |
like_count | int64 | Total number of times this place has been liked. |
parent_places | string | Places can be hierarchically structured, list any up-hierarchy places parent to this one as guids here |
attributes | Place.AttributesEntry | Any additional place attributes (key-value pair). Example: category → park; owner → "Katherine Dunn". |
Place.AttributesEntry
Field | Type | Description |
---|---|---|
key | string | |
value | string |
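The spatial objects reference each other upwards (a Place points to its City, a City to its Country), and a Place carries free-form key-value attributes. A rough Python sketch of resolving that chain, assuming plain dataclasses with string keys instead of CompositeKeys; the `country_of` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class City:
    pkey: str
    name: str
    country_pkey: Optional[str] = None


@dataclass
class Place:
    pkey: str
    name: str
    city_pkey: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value pairs


def country_of(place: Place, cities: Dict[str, City]) -> Optional[str]:
    """Resolve place -> city -> country; None if any link is missing."""
    if place.city_pkey is None:
        return None
    city = cities.get(place.city_pkey)
    return city.country_pkey if city is not None else None


cities = {"city-1": City("city-1", "Example City", country_pkey="country-1")}
park = Place("place-1", "Example Park", city_pkey="city-1",
             attributes={"category": "park"})
```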
temporal.proto
Temporal Facet of the Common LBSN Data Structure
Overview of LBSN Structures that are organized under the Temporal Facet:
Object | Description |
---|---|
Event | A (named) event with a representation on LBSN. |
Event
An event with a representation on LBSN.
Events are temporal reference points with a start and end date. Start and end date may coincide.
Field | Type | Description |
---|---|---|
pkey | CompositeKey | Primary Key. A unique identifier of the object. |
name | string | Name of the event |
event_date | google.protobuf.Timestamp | Date and time of the event |
event_date_start | google.protobuf.Timestamp | Start date of the event |
event_date_end | google.protobuf.Timestamp | End date of the event |
duration | google.protobuf.Duration | Duration of the event in seconds |
event_latlng | string | Location of the event (WKT Point) |
event_area | string | Location of the event (WKT Polygon) |
place_pkey | CompositeKey | Place reference |
city_pkey | CompositeKey | City reference |
country_pkey | CompositeKey | Country reference |
user_pkey | CompositeKey | User reference (e.g. the owner of the event) |
event_description | string | A description of the event. |
event_website | string | Url to the public website of the event. |
event_type | string | Any string to describe the type of event |
event_share_count | int64 | Number of times this Event has been shared by other users. |
event_like_count | int64 | Number of times this Event has been liked/highlighted by other users. |
event_comment_count | int64 | Number of times this Event has been commented on. |
event_views_count | int64 | Number of times this Event has been viewed. |
event_engage_count | int64 | Number of users who participate in this Event. |
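The duration field can be derived from the start and end dates. A minimal Python sketch using the standard library's `datetime` in place of `google.protobuf.Timestamp` and `google.protobuf.Duration`; the `event_duration` function and its validation rule are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def event_duration(start: datetime, end: datetime) -> timedelta:
    """Return end - start; start and end may coincide (zero duration)."""
    if end < start:
        raise ValueError("event_date_end must not precede event_date_start")
    return end - start


start = datetime(2020, 1, 1, 18, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 1, 21, 30, tzinfo=timezone.utc)
duration_seconds = event_duration(start, end).total_seconds()
```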
interlinkage.proto
Describes additional relationships of the LBSN Data Structure
Interlinkage and the spread of information
Relationship
LBSN Relationships map one-to-many and many-to-many relationships.
Field | Type | Description |
---|---|---|
pkey | RelationshipKey | Primary Key. A unique identifier of the object. |
relationship_type | Relationship.RelationshipType | Type of the relationship |
RelationshipKey
Many-to-many relationships that could otherwise not be implemented in the relational LBSN structure. Relationships can also link entities between two different origin_ids (e.g. different services).
Field | Type | Description |
---|---|---|
relation_to | CompositeKey | Relation to reference |
relation_from | CompositeKey | Relation from reference |
Relationship.RelationshipType
Available types of LBSN relationship
Name | Number | Description |
---|---|---|
UNKNOWN | 0 | |
isFRIEND | 1 | A friend of a user (i.e. this user y is the friend of user x). Being a friend is a mutual relationship. |
isCONNECTED | 2 | A user that is connected to someone, e.g. a follower of user x (i.e. this user y is the follower of user x). Being connected to someone (e.g. being a follower) is not a mutual relationship. |
isEQUAL | 3 | A user that has multiple representations on the same service or is linked across services. |
inGROUP | 4 | A user x that is a member of the group y. |
followsGROUP | 5 | A user x that follows the group y. |
inCOMMUNITY | 6 | A user x that is a member of the community y. |
MENTIONS_USER | 7 | A user x that mentions user y. |
hasHASHTAG | 8 | A post x that is tagged with term y. |
hasEMOJI | 9 | A post x contains emoji y. |
OTHER | 10 | Any other relation type. |
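The difference between mutual (isFRIEND) and non-mutual (isCONNECTED) relationships can be made explicit when building a graph from relationship records. The Python sketch below assumes simplified string keys instead of CompositeKeys and uses the enum value names as plain strings; `Relationship`, `MUTUAL_TYPES`, and `with_reverse_edges` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Relationship:
    """Simplified stand-in: keys are plain strings instead of CompositeKeys."""
    relation_from: str
    relation_to: str
    relationship_type: str  # enum value name, e.g. "isFRIEND" or "isCONNECTED"


MUTUAL_TYPES = {"isFRIEND"}  # described above as a mutual relationship


def with_reverse_edges(relationships: Iterable[Relationship]) -> Set[Relationship]:
    """Add the reverse edge for mutual types; keep directed types as they are."""
    result = set()
    for rel in relationships:
        result.add(rel)
        if rel.relationship_type in MUTUAL_TYPES:
            result.add(Relationship(rel.relation_to, rel.relation_from,
                                    rel.relationship_type))
    return result


edges = with_reverse_edges([
    Relationship("user_x", "user_y", "isFRIEND"),
    Relationship("user_y", "user_x", "isCONNECTED"),
])
```

Only the friendship gains a reverse edge; the follower relation stays one-directional.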