Protocol Documentation

social.proto

Social Facet of the Common LBSN Data Structure

Who?

Overview of LBSN Structures that are organized under the Social Facet:

Object Description
Origin A Location Based Social Network consisting of a large group of people
CompositeKey A Composite Key used to reference unique objects across different LBSN
User A single user (e.g. a profile or an account) on a location based social network (LBSN)
UserGroup A single group of users on a LBSN
Language A common language used on LBSN, relating to a larger group of people sharing the same language

Note that these assignments are not clear-cut, e.g. there are aspects of a User that may as well belong to one of the other facets.

CompositeKey

Except for Language, Composite Keys are used for all objects in the structure. They allow creating Composite References consisting of an Origin (a reference to a Location Based Social Network, LBSN) and the original or derived (hashed) (gu)id for each object on the respective LBSN.

Field Type Description
origin Origin e.g. 1= Instagram, 2= Flickr, 3=Twitter
id string The service's original unique (gu)id for this object
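
As an illustration only, a composite reference could be modelled in Python as shown below. The helper name `hashed_key` and the choice of SHA-256 for the derived id are assumptions of this sketch and are not part of the specification.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeKey:
    """Sketch of a composite reference: origin number plus the service's id."""
    origin: int  # e.g. 1 = Instagram, 2 = Flickr, 3 = Twitter
    id: str      # original or derived (hashed) (gu)id on that LBSN


def hashed_key(origin: int, original_id: str) -> CompositeKey:
    """Build a key with a derived id (SHA-256 is an assumption of this sketch)."""
    digest = hashlib.sha256(original_id.encode("utf-8")).hexdigest()
    return CompositeKey(origin=origin, id=digest)


# The same id on two different networks yields two distinct references.
flickr_user = CompositeKey(origin=2, id="12345")
twitter_user = CompositeKey(origin=3, id="12345")
```

Because the origin is part of the key, identical ids from different services do not collide.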

Language

A language identifier on LBSN.

Field Type Description
language_short string A BCP 47 language identifier corresponding to the language of, e.g., a Post or User. Languages are organized under the Social Facet because they're usually references to cultures, which connect many people.
name string Name of the language (English)
language_name_de string Name of the language (German)

Origin

An Origin is a reference to a unique Location Based Social Network (LBSN). We've added some of the most popular social media networks to the list, which can be extended further.

An Origin is the base unit of the LBSN structure and it is organized under the Social Facet because Social Media Networks are formed by large groups of people (the social part) around some common interest (e.g. Flickr or Instagram for photography, Twitter for opinion formation and exchange of political perspectives).

While OriginIDs are entirely open, we added a list of predefined common Networks.

Field Type Description
origin_id Origin.OriginID A unique Origin ID as a reference for the LBSN
name string The name of Origin, e.g. the service's name

User

A user (e.g. a profile or an account) on a location based social network (LBSN)

Note that it is often challenging to determine whether social media profiles represent fictitious or real persons, bots or even ‘cyborgs’ (You et al. 2012). Therefore, a user may also be considered as an ‘avatar’ representing an organization or a group of individuals.

See also the Wikipedia entry.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
user_name string Name of the User. Can be an alias, email or real name etc.
user_fullname string Full name of the User. Can be an alias, email or real name etc.
follows int64 Number of other users this user follows.
followed int64 Number of times this user is followed by others.
group_count int64 The number of public groups or communities this user is a member of
biography string A short user biography or description.
post_count int64 Number of posts this user has created.
url string Full URL to public user profile.
is_private bool Whether the user has chosen to remain private (e.g. profile not publicly visible).
is_available bool A user that is not available can mean several things. When the user's account was deactivated or when users explicitly chose to delete their account, but keep public data, this field would be False.
user_language Language A BCP 47 language identifier corresponding to the machine-detected or user selected language.
user_location string The user-defined location for this profile. Not necessarily a location, nor machine-parseable (e.g. a user can choose 'world' as his/her location, or any other string)
user_location_geom string Coordinates (Point: lat/lng) of the user-location, either provided by user or geocoded from the user's location.
liked_count int64 The number of Posts this user has liked in total.
active_since google.protobuf.Timestamp UTC datetime when the user was first active (e.g. time of account creation, or derived from first post_publish_date).
profile_image_url string URL pointing to the public profile image of the user.
user_timezone string Time zone ID that can be specified by the user.
user_utc_offset sint32 Optional difference in hours from Coordinated Universal Time (UTC) for a particular user defined place.
user_groups_member string The list of groups this user has joined/ is a member of (active participation interest).
user_groups_follows string The list of groups this user follows (viewing interest).
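
The description of active_since mentions that the value may be derived from the first post_publish_date. A minimal sketch of that derivation follows; the function name `derive_active_since` is hypothetical, and the sketch assumes timezone-aware datetimes as input.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def derive_active_since(post_publish_dates: Iterable[datetime]) -> Optional[datetime]:
    """Return the earliest publish date converted to UTC, or None without posts.

    Assumption: all datetimes are timezone-aware.
    """
    dates = [d.astimezone(timezone.utc) for d in post_publish_dates]
    return min(dates) if dates else None
```

If the service reports an account creation time, that value would normally take precedence over a date derived from posts.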

UserGroup

A user group on a location based social network (LBSN). Central to user groups is a common interest.

User groups are organized differently on different LBSN: sometimes they're centrally organized by the organization or by a single user; at other times they're self-organized, sometimes with specific limitations to join. For example, on Facebook, user groups are self-organized on pages; on Twitter, 'Lists' may be used by single users to produce a curated list of Twitter accounts.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
usergroup_name string Name of the UserGroup
usergroup_description string Description of the UserGroup
member_count int64 Total member count of this UserGroup
usergroup_createdate google.protobuf.Timestamp Time of creation for this UserGroup
user_owner_pkey CompositeKey A Reference to the owner of this UserGroup.

Origin.OriginID

Predefined values for OriginID. Default origin id is LBSN (0)

Name Number Description
LBSN 0 default
INSTAGRAM 1
FLICKR 2
TWITTER 3
FACEBOOK 4
FOURSQUARE 5
WIKIDATA 6
WIKIPEDIA 7
REDDIT 8
GEOGRAPH 9
GOOGLEPLACEPHOTO 10
PINTEREST 11
MAPILLARY 12
SNAPCHAT 13
POKEMONGO 14
WIKIMEDIACOMMONS 15
WIKIMAPIA 16
AIRBNB 17
PORTALNINANTIC 18
TIKTOK 19
TELEGRAM 20
GAB 21
IBIRD 22
INATURALIST 23
ISPOTNATURE 24
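
For orientation, the predefined values can be mirrored in Python as an integer enum. The sketch below repeats only the first few entries of the list above; the lookup helper `origin_from_name` and its fallback to LBSN (0) for unrecognized names are assumptions of this example.

```python
from enum import IntEnum


class OriginID(IntEnum):
    """Subset of the predefined OriginID values (remaining entries omitted)."""
    LBSN = 0
    INSTAGRAM = 1
    FLICKR = 2
    TWITTER = 3
    FACEBOOK = 4
    FOURSQUARE = 5


def origin_from_name(name: str) -> OriginID:
    """Look up an origin by name; fall back to the default LBSN (0)."""
    try:
        return OriginID[name.strip().upper()]
    except KeyError:
        return OriginID.LBSN
```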

topical.proto

Topical (or thematic) Facet of the Common LBSN Data Structure

What?

Overview of LBSN Objects that are organized under the Topical Facet:

Object Description
Post A single post on a location based social network (LBSN) providing original (new) content
PostReaction A reaction on a location based social network (LBSN) such as like, quote, share etc.

Post

An original post on a location based social network (LBSN)

Note that:

  • all LBSM posts are reactions,
  • all reactions have a referent event
  • referent events may consist of complex motivational patterns and are therefore often difficult to identify

See also the Wikipedia entry.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object across networks.
post_latlng string Location of the post, either chosen by the user, automatically attached to the post by input device (GPS) or complemented by algorithms of the service (e.g. Twitter post geocoding, as derived from the post_body text). If lat/lng coordinates are not submitted, this field can be substituted with location information from the place, city or country table. In those cases, post_geoaccuracy indicates the lower level of geoaccuracy, e.g. 'place', 'city' or 'country'. Formatting: WKT (Well-Known Text string)
place_pkey CompositeKey Reference to a place this post is associated with.
city_pkey CompositeKey Reference to a city this post is associated with.
country_pkey CompositeKey Reference to a country this post is associated with.
user_pkey CompositeKey Reference to the user who created the post.
post_publish_date google.protobuf.Timestamp The time when the post content was shared online, e.g. on Flickr, the publish_date refers to the time of photo sharing (upload-time)
post_body string The textual content of the post, e.g. the description of the photo on Flickr, the tweet text on Twitter etc.
post_geoaccuracy Post.PostGeoaccuracy This field specifies the highest location accuracy available for this post, either 'latlng', 'place', 'city' or 'country'.
user_mentions_pkey CompositeKey A list of referenced user_guids that are mentioned in the post_body, post_title or other parts of a post. In postgres mapping, these are not direct references that are checked, but mere lists of strings (array), since Foreign Key Arrays are not supported.
hashtags string List of hashtags explicitly assigned to the post, either inside post_body (e.g. with hash-character (#), or in a separate field such as "tags" on Flickr). Note that Flickr users may still use the hash symbol (#). Therefore, hashtag and tag are synonyms for users explicitly highlighting single terms inside the larger context of the post.
emoji string List of Emoji Symbols, either extracted from post_body or provided in a separate field. Duplicates allowed. For possible symbols, see: unicode.org/emoji/charts/full-emoji-list.html
post_like_count int64 Number of times this Post has been liked by other users.
post_comment_count int64 Number of times this Post has been commented by other users, e.g. count of Reply-Tweets on Twitter, count of photo comments on Flickr etc.
post_views_count int64 Number of times this Post has been viewed by other users.
post_title string The title of the post. This is sometimes available in a separate field. E.g. on Flickr, a photo can have both a title and a description. On Instagram, however, only the post_body is available.
post_create_date google.protobuf.Timestamp The time when the post content was originally created. Most often, this matches the publish_date (e.g. on Twitter or Instagram). On Flickr, the create_date refers to the photo's timestamp, and the publish_date refers to the time of photo sharing (upload-time)
post_thumbnail_url string Url to the public thumbnail of this post. Usually this will only be available for posts of type IMAGE.
post_url string Url to the original post.
post_type Post.PostType Type of post, e.g. text, image, video or other. If possible, choose the more specific type (e.g. VIDEO over TEXT even if text is present in a video-post).
post_filter string Any filters/labels applied to the post (e.g. Instagram photo filters such as Amaro; automatic translations of text; or the "flair" of Reddit posts).
post_quote_count int64 Number of times this Post has been quoted by other users, e.g. count of Quote-Tweets on Twitter.
post_share_count int64 Number of times this Post has been shared by other users, e.g. count of Retweets on Twitter.
input_source string Type of input device used by the user to post, for a list see Twitter, e.g. 'Web', 'IPhone', 'Android' etc. Recommendation: should be oriented at Twitter's large list of source types. For camera models, have a look at Flickr.
post_language Language Language of the post: a BCP 47 language identifier corresponding to the (machine-detected) language of the Post body-text. Empty if no language could be detected, NULL if not specified.
post_content_license int32 An integer for specifying the content license of the post, which can be optionally chosen by users on some services (e.g. Flickr). For example: All Rights Reserved = 0. Numbers can be oriented at Flickr's list of content licenses.
topic_group string Whether the post is assigned to any explicit topic groups. This could be a Reddit submission that belongs to a Subreddit (= the topic group); or a Flickr image posted to a number of photo groups. A Post can belong to multiple topics (e.g. a "cross-post" on Reddit).
post_downvotes int64 Number of times this Post has been downvoted by other users (HackerNews, Reddit)
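
As a rough illustration of the hashtags field, the following sketch extracts hash-prefixed terms from a post_body. The function name `extract_hashtags` and the regular expression are assumptions of this example; services differ in what they accept as a hashtag.

```python
import re
from typing import List

# Assumption: a hashtag is '#' followed by one or more word characters.
_HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(post_body: str) -> List[str]:
    """Return hashtags in order of appearance, without the leading '#'."""
    return _HASHTAG.findall(post_body)
```

Tags that a service provides in a separate field (such as "tags" on Flickr) would be merged into the same list.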

PostReaction

A reaction on a location based social network (LBSN).

Note that posts are also reactions. However, a post reaction is a post with a reduced structure suitable for simple expressions such as likes that don't have all post attributes.

The difference between an original post and a post reaction is not clear cut. In general, original posts provide original (new) content that is compiled by the posting user/author. Post reactions merely add information, e.g. by quoting an original post, or provide an expression or stance towards a post (or another reaction), e.g. a like or 'star'.

Post reactions are suitable for mapping the spread of information, because they contain two reference attributes: one for the original post that motivated the reaction (referencedPost), and one for another reaction that was reacted upon (referencedPostreaction).

Example reaction_types:

  • share
  • comment/reply
  • quote
  • like/star/highlight
  • emoji etc.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
user_pkey CompositeKey Reference to the user who reacted.
referencedPost_pkey CompositeKey A reference to the original post to which this reaction refers (e.g. for a reaction of type REPLY, reference of the original post_pkey)
referencedPostreaction_pkey CompositeKey A reference to another reaction (e.g. if this reaction is a "like" of another reaction, reference original postreaction_pkey here)
reaction_latlng string Location of the reaction (point), either chosen by the user, automatically attached to the reaction by input device (GPS) or complemented by algorithms of the service.
reaction_type PostReaction.ReactionType Type of reaction. Choose the more specific type if multiple apply. Merge similar types: Retweet → Share; Reply → Comment; Star → Like
reaction_date google.protobuf.Timestamp Time and Date of the reaction.
reaction_content string Content of the reaction (e.g. the text).
reaction_like_count int64 Number of times this reaction has been liked by others.
user_mentions_pkey CompositeKey A list of referenced user_guids that are mentioned in the reaction.

Post.PostGeoaccuracy

Spatial information can have different levels of granularity, and users can often choose which locational accuracy they want to use.

Name Number Description
UNKNOWN 0
LATLNG 1 A single coordinate
PLACE 2 A place reference
CITY 3 A city reference
COUNTRY 4
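
The fallback described for post_latlng and post_geoaccuracy can be sketched as follows. The function `resolve_post_location` and its parameter names are hypothetical; the sketch assumes that the geometries of the referenced place, city and country are already available as WKT strings.

```python
from enum import IntEnum
from typing import Optional, Tuple


class PostGeoaccuracy(IntEnum):
    UNKNOWN = 0
    LATLNG = 1
    PLACE = 2
    CITY = 3
    COUNTRY = 4


def resolve_post_location(
    latlng: Optional[str] = None,
    place_geom: Optional[str] = None,
    city_geom: Optional[str] = None,
    country_geom: Optional[str] = None,
) -> Tuple[Optional[str], PostGeoaccuracy]:
    """Return (WKT string, accuracy) for the most precise location available."""
    candidates = [
        (latlng, PostGeoaccuracy.LATLNG),
        (place_geom, PostGeoaccuracy.PLACE),
        (city_geom, PostGeoaccuracy.CITY),
        (country_geom, PostGeoaccuracy.COUNTRY),
    ]
    for wkt, accuracy in candidates:
        if wkt:  # skip missing or empty geometries
            return wkt, accuracy
    return None, PostGeoaccuracy.UNKNOWN
```

The returned accuracy value is what would be stored in post_geoaccuracy alongside the substituted geometry.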

Post.PostType

Type of post

Name Number Description
TEXT 0 Default post type is text (e.g. a tweet on Twitter)
IMAGE 1 Post of type "image" (e.g. a photo on Flickr)
VIDEO 2 Post of type "video" (e.g. a video on Youtube)
LINK 3 Post of type "link" (e.g. a link share on Reddit)
OTHER 4 Post of specific type not yet added to the specification

PostReaction.ReactionType

Possible type of reactions.

Name Number Description
UNKNOWN 0
SHARE 1 A sharing reaction usually does not add much content
COMMENT 2 A comment reaction adds additional content
QUOTE 3 A quote reaction adds some additional content
LIKE 4 A like reaction is a basic form of appreciation
EMOJI 5 An emoji usually encodes different expressions of feelings
OTHER 6
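
The merging of similar reaction types mentioned for reaction_type (Retweet → Share; Reply → Comment; Star → Like) could be implemented with a simple lookup. The label strings and the function `to_reaction_type` are hypothetical, and mapping unrecognized labels to OTHER is an assumption of this sketch.

```python
from enum import IntEnum


class ReactionType(IntEnum):
    UNKNOWN = 0
    SHARE = 1
    COMMENT = 2
    QUOTE = 3
    LIKE = 4
    EMOJI = 5
    OTHER = 6


# Service-specific labels (hypothetical strings) merged into the common types.
_LABELS = {
    "share": ReactionType.SHARE,
    "retweet": ReactionType.SHARE,
    "comment": ReactionType.COMMENT,
    "reply": ReactionType.COMMENT,
    "quote": ReactionType.QUOTE,
    "like": ReactionType.LIKE,
    "star": ReactionType.LIKE,
    "emoji": ReactionType.EMOJI,
}


def to_reaction_type(label: str) -> ReactionType:
    """Map a service-specific label to a ReactionType (OTHER if unrecognized)."""
    return _LABELS.get(label.strip().lower(), ReactionType.OTHER)
```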

spatial.proto

Spatial Facet of the Common LBSN Data Structure

Where?

Overview of LBSN Structures that are organized under the Spatial Facet:

Object Description
Place A particular (named) place on a location based social network (LBSN).
City A city on a location based social network (LBSN).
Country A country on a location based social network (LBSN).

City

A city on a location based social network (LBSN).

Cities are hierarchically above places. Sometimes they have a specific public page where a City's official representation is presented; sometimes they're automatically added to structure place information into common groups.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
name string Name of the city in English.
name_alternatives string Alternative names (e.g. in other languages; synonyms).
sub_type string Optionally add a subtype of City (e.g. "Neighborhood", "Admin", etc.)
url string Url to the public web address of the city
geom_center string WKT Point (centroid of geom_area)
geom_area string WKT Polygon (boundary of the city)
country_pkey CompositeKey Reference to the country this city belongs to.

Country

A country on a location based social network (LBSN).

Countries are hierarchically above cities. Sometimes they have a specific public page where a Country's official representation is presented; sometimes they're automatically added to structure city and place information into common groups.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
name string Name of the country in English.
name_alternatives string Alternative names (e.g. in other languages; synonyms).
url string Url to the public web address of the country
geom_center string WKT Point (centroid of geom_area)
geom_area string WKT Polygon (boundary of the country)

Place

A place on a location based social network (LBSN).

Places are named spatial references of interest, such as POIs, often added by users themselves, around which discussions may evolve.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
name string Name of the place in English
name_alternatives string Alternative names (e.g. in other languages; synonyms).
post_count int64 Number of total posts for this place.
url string URL to the public place-page representation on the respective LBSN
geom_center string WKT Point (centroid of geom_area)
geom_area string WKT Polygon (boundary of the place)
city_pkey CompositeKey Reference to the city this place belongs to.
place_description string Public description of the place.
place_website string A link provided by users for this place (e.g. webpage for restaurant, park-management etc.)
place_phone string Phone number publicly provided for some places on LBSN.
address string Address publicly provided for some places on LBSN.
zip_code string Zip_code publicly provided for some places on LBSN.
checkin_count int64 Total number of user checkins for this place (e.g. checkin functionality on Foursquare or Facebook)
like_count int64 Total number of times this place has been liked.
parent_places string Places can be hierarchically structured, list any up-hierarchy places parent to this one as guids here
attributes Place.AttributesEntry Any additional place attributes (key-value pair). Example: category → park; owner → "Katherine Dunn".

Place.AttributesEntry

Field Type Description
key string
value string
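
Place.AttributesEntry is the entry type of a string-to-string map, so in most languages the attributes field behaves like a dictionary. The sketch below is a plain-Python stand-in rather than generated protobuf code; the class layout and the place name are made up for illustration, while the attribute values follow the example given in the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Place:
    """Reduced stand-in for the Place message (most fields omitted)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    parent_places: List[str] = field(default_factory=list)  # guids of parent places


# Hypothetical place carrying the attribute example from the table above.
park = Place(name="Example Park")
park.attributes["category"] = "park"
park.attributes["owner"] = "Katherine Dunn"
```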

temporal.proto

Temporal Facet of the Common LBSN Data Structure

When?

Overview of LBSN Structures that are organized under the Temporal Facet:

Object Description
Event A (named) event with a representation on LBSN.

Event

An event with a representation on LBSN.

Events are temporal reference points with a start and end date. Start and end date may coincide.

Field Type Description
pkey CompositeKey Primary Key. A unique identifier of the object.
name string Name of the event
event_date google.protobuf.Timestamp Date and time of the event
event_date_start google.protobuf.Timestamp Start date of the event
event_date_end google.protobuf.Timestamp End date of the event
duration google.protobuf.Duration Duration of the event in seconds
event_latlng string Location of the event (WKT Point)
event_area string Location of the event (WKT Polygon)
place_pkey CompositeKey Place reference
city_pkey CompositeKey City reference
country_pkey CompositeKey Country reference
user_pkey CompositeKey User reference (e.g. the owner of the event)
event_description string A description of the event.
event_website string Url to the public website of the event.
event_type string Any string to describe the type of event
event_share_count int64 Number of times this Event has been shared by other users.
event_like_count int64 Number of times this Event has been liked/highlighted by other users.
event_comment_count int64 Number of times this Event has been commented on.
event_views_count int64 Number of times this Event has been viewed.
event_engage_count int64 Number of users who participate in this Event.
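
Since start and end date may coincide, the duration of an event can be zero. A minimal sketch of deriving duration from event_date_start and event_date_end follows; the function name `event_duration` is hypothetical, and rejecting an end date before the start date is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone


def event_duration(start: datetime, end: datetime) -> timedelta:
    """Return end - start; a zero duration is valid for point-in-time events."""
    if end < start:
        raise ValueError("event_date_end must not precede event_date_start")
    return end - start


# Hypothetical two-hour event.
opening = datetime(2023, 3, 31, 18, 0, tzinfo=timezone.utc)
closing = datetime(2023, 3, 31, 20, 0, tzinfo=timezone.utc)
seconds = event_duration(opening, closing).total_seconds()
```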

interlinkage.proto

Describes additional relationships of the LBSN Data Structure

Interlinkage and the spread of information

Relationship

LBSN Relationships map one-to-many and many-to-many relationships.

Field Type Description
pkey RelationshipKey Primary Key. A unique identifier of the object.
relationship_type Relationship.RelationshipType Type of the relationship

RelationshipKey

Many-to-many relationships that could otherwise not be implemented in the relational lbsn structure. Relationships can also link entities between two different origin_id's (e.g. different services).

Field Type Description
relation_to CompositeKey Relation to reference
relation_from CompositeKey Relation from reference

Relationship.RelationshipType

Available types of LBSN relationship

Name Number Description
UNKNOWN 0
isFRIEND 1 A friend of a user (i.e. this user y is the friend of user x). Being a friend is a mutual relationship.
isCONNECTED 2 A user that is connected to someone, e.g. a follower of user x (i.e. this user y is the follower of user x). Being connected to someone (e.g. being a follower) is not a mutual relationship.
isEQUAL 3 A user that has multiple representations on the same service or is linked across services.
inGROUP 4 A user x that is a member of the group y.
followsGROUP 5 A user x that follows the group y.
inCOMMUNITY 6 A user x that is a member of the community y.
MENTIONS_USER 7 A user x that mentions user y.
hasHASHTAG 8 A post x that is tagged with term y.
hasEMOJI 9 A post x contains emoji y.
OTHER 10 Any other relation type.
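
The difference between mutual and non-mutual relationships can be made explicit when relationships are expanded into directed edges. In the sketch below, keys are simplified to (origin, id) tuples, only a subset of the relationship types is repeated, and the function `directed_edges` is hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple

Key = Tuple[int, str]  # simplified stand-in for CompositeKey: (origin, id)


class RelationshipType(IntEnum):
    """Subset of the relationship types listed above."""
    UNKNOWN = 0
    isFRIEND = 1
    isCONNECTED = 2


def directed_edges(
    relation_from: Key, relation_to: Key, relationship_type: RelationshipType
) -> List[Tuple[Key, Key]]:
    """Expand one relationship into directed edges.

    isFRIEND is mutual, so it yields an edge in both directions;
    other types yield a single edge from relation_from to relation_to.
    """
    edges = [(relation_from, relation_to)]
    if relationship_type == RelationshipType.isFRIEND:
        edges.append((relation_to, relation_from))
    return edges
```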

Scalar Value Types

.proto Type | Notes | C++ Type | Java Type | Python Type
double | | double | double | float
float | | float | float | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long
uint32 | Uses variable-length encoding. | uint32 | int | int/long
uint64 | Uses variable-length encoding. | uint64 | long | int/long
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long
sfixed32 | Always four bytes. | int32 | int | int
sfixed64 | Always eight bytes. | int64 | long | int/long
bool | | bool | boolean | boolean
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str
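
The notes on variable-length encoding can be checked with a short calculation. The sketch below computes encoded sizes only (it does not produce wire bytes); the function names are chosen for this example and do not belong to any protobuf library.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint_size expects a non-negative integer")
    size = 1
    while value >= 0x80:  # each byte carries 7 payload bits
        value >>= 7
        size += 1
    return size


def int32_size(value: int) -> int:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    """sint32: ZigZag encoding maps small magnitudes to small unsigned values."""
    zigzag = (value << 1) ^ (value >> 31)
    return varint_size(zigzag & 0xFFFFFFFF)
```

With these helpers, -1 takes ten bytes as an int32 but one byte as a sint32, which is why sint32 is used for user_utc_offset. Likewise, a varint needs five bytes for values of 2^28 and above, whereas fixed32 always uses four.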

Last update: March 31, 2023