Arguments and Usage

This page is generated from the source code and provides an overview of lbsntransform command line arguments.


usage: lbsntransform [-h] [--version] [-o ORIGIN] [--dry-run] [-l]
                     [--file_type FILE_TYPE] [--input_path_url INPUT_PATH_URL]
                     [--is_stacked_json] [--is_line_separated_json]
                     [--dbpassword_hllworker DBPASSWORD_HLLWORKER]
                     [--dbuser_hllworker DBUSER_HLLWORKER]
                     [--dbserveraddress_hllworker DBSERVERADDRESS_HLLWORKER]
                     [--dbname_hllworker DBNAME_HLLWORKER]
                     [-p DBPASSWORD_OUTPUT] [-u DBUSER_OUTPUT]
                     [-a DBSERVERADDRESS_OUTPUT] [-n DBNAME_OUTPUT]
                     [--dbformat_output DBFORMAT_OUTPUT]
                     [--dbpassword_input DBPASSWORD_INPUT]
                     [--dbuser_input DBUSER_INPUT]
                     [--dbserveraddress_input DBSERVERADDRESS_INPUT]
                     [--dbname_input DBNAME_INPUT]
                     [--dbformat_input DBFORMAT_INPUT] [-t TRANSFERLIMIT]
                     [--transfer_count TRANSFER_COUNT]
                     [--commit_volume COMMIT_VOLUME]
                     [--records_tofetch RECORDS_TOFETCH]
                     [--startwith_db_rownumber STARTWITH_DB_ROWNUMBER]
                     [--endwith_db_rownumber ENDWITH_DB_ROWNUMBER]
                     [--debug_mode DEBUG_MODE]
                     [--geocode_locations GEOCODE_LOCATIONS]
                     [--ignore_input_source_list IGNORE_INPUT_SOURCE_LIST]
                     [--mappings_path MAPPINGS_PATH]
                     [--input_lbsn_type INPUT_LBSN_TYPE]
                     [--map_full_relations] [--csv_output]
                     [--csv_allow_linebreaks] [--csv_delimiter CSV_DELIMITER]
                     [--use_csv_dictreader] [--recursive_load]
                     [--skip_until_file SKIP_UNTIL_FILE]
                     [--skip_until_record SKIP_UNTIL_RECORD] [--zip_records]
                     [--min_geoaccuracy MIN_GEOACCURACY]
                     [--include_lbsn_objects INCLUDE_LBSN_OBJECTS]
                     [--include_lbsn_bases INCLUDE_LBSN_BASES]
                     [--override_lbsn_query_schema OVERRIDE_LBSN_QUERY_SCHEMA]
                     [--hmac_key HMAC_KEY]


Quick reference table

The quick reference table contains truncated short summaries of descriptions. Jump to individual arguments in the navigation submenu on the left side.

Short Long Default Description
-h --help show this help message
--version show program's version
-o --origin 0 Input source type (id)
--dry-run Perform a trial run
-l --file_input This flag enables file
--file_type json Specify filetype
--input_path_url 01_Input Path to input folder.
--is_stacked_json Input is stacked json.
--is_line_separated_json Json is line separated
--dbpassword_hllworker None Password for hllworker
--dbuser_hllworker postgres Username for hllworker
--dbserveraddress_hllworker None IP for hllworker db
--dbname_hllworker None DB name for hllworker
-p --dbpassword_output None Password for out-db
-u --dbuser_output postgres Username for out-db.
-a --dbserveraddress_output None IP for output db,
-n --dbname_output None DB name for output db
--dbformat_output lbsn Format of the out-db.
--dbpassword_input None Password for input-db
--dbuser_input postgres Username for input-db.
--dbserveraddress_input None IP for input-db,
--dbname_input None DB name for input-db,
--dbformat_input json Format of the input-db
-t --transferlimit None Abort after x records.
--transfer_count 50000 Transfer batch limit x
--commit_volume None After x commit_volume,
--records_tofetch 10000 Fetch x records /batch
--disable_transfer_reactions Disable reactions.
--disable_reaction_post_referencing Disable reactions-refs
--ignore_non_geotagged Ignore none-geotagged.
--startwith_db_rownumber None Start with db row x.
--endwith_db_rownumber None End with db row x.
--debug_mode None Enable debug mode.
--geocode_locations None Path to loc-geocodes.
--ignore_input_source_list None Path to input ignore.
--mappings_path None Path mappings folder.
--input_lbsn_type None Input sub-type
--map_full_relations Map full relations.
--csv_output Store to local CSV.
--csv_allow_linebreaks Disable linebreak-rem.
--csv_delimiter , CSV delimiter.
--use_csv_dictreader Use csv.DictReader.
--recursive_load Recursive local sub di
--skip_until_file None Skip until file x.
--skip_until_record None Skip until record x.
--zip_records Zip records parallel.
--min_geoaccuracy None Min geoaccuracy to use
--include_lbsn_objects None lbsn objects to proces
--include_lbsn_bases None lbsn bases to update
--override_lbsn_query_schema None Override schema and ta
--hmac_key None Override db hmac key

-h, --help

show this help message and exit


--version

show program's version number and exit

-o, --origin

(Default: 0)

Input source type (id).

  • Defaults to 0: LBSN

Other possible values:

  • 1 - Instagram
  • 2 - Flickr
  • 21 - Flickr YFCC100M
  • 3 - Twitter


--dry-run

Perform a trial run with no changes made to database/output.

-l, --file_input

This flag enables file input

(instead of reading data from a database).

  • To specify which files to process, see parameter --input_path_url.
  • To specify file types, e.g. whether to process data from json or csv, or from URLs,
    see --file_type


--file_type

(Default: json)

Specify filetype

(json, csv etc.)

  • only applies if --file_input is used.


--input_path_url

(Default: 01_Input)

Path to input folder.

  • If not provided, subfolder ./01_Input/ will be used.
  • You can also provide a web-url, starting with http(s)
  • URLs will be accessed using requests.get(url, stream=True).
  • To separate multiple urls, use semicolon (;). In this case, see also --zip_records.
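Splitting semicolon-separated URLs can be sketched as follows (a minimal illustration, not lbsntransform's internal code; the URLs are made up):

```python
# Sketch: multiple input URLs are separated by semicolon (;) in
# --input_path_url; lbsntransform then streams each one with
# requests.get(url, stream=True).
input_path_url = "https://example.org/part1.json;https://example.org/part2.json"

urls = [url.strip() for url in input_path_url.split(";") if url.strip()]
```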


--is_stacked_json

Input is stacked json.

  • The typical form of json is [{json1},{json2}]
  • If --is_stacked_json is set, it will process stacked jsons in the form of {json1}{json2} (no comma)
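Parsing stacked json can be illustrated as below (a sketch only, not lbsntransform's actual parser, using the standard library's incremental decoder):

```python
import json

# Illustration: reading stacked JSON of the form {json1}{json2}
# (no separating comma) one object at a time.
stacked = '{"a": 1}{"b": 2}{"c": 3}'

decoder = json.JSONDecoder()
records, pos = [], 0
while pos < len(stacked):
    # raw_decode returns the parsed object and the index where it ended
    obj, pos = decoder.raw_decode(stacked, pos)
    records.append(obj)
```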


--is_line_separated_json

Json is line separated.

  • The typical form is [{json1},{json2}]
  • If --is_line_separated_json is set, it will process stacked jsons in the form of {json1} {json2} (with linebreak)
  • Unix style linebreaks (LF) will be used across platforms
  • Windows users: use (e.g.) Notepad++ to convert from Windows style linebreaks (CRLF)
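Parsing line-separated json can be sketched as follows (an illustration only, not lbsntransform's internal code):

```python
import json

# Illustration: line-separated JSON, one object per line (LF linebreaks),
# parsed line by line.
data = '{"a": 1}\n{"b": 2}\n{"c": 3}\n'

records = [json.loads(line) for line in data.splitlines() if line.strip()]
```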


--dbpassword_hllworker

(Default: None)

Password for hllworker db

  • If reading data into hlldb, all HLL Worker parameters must be supplied by default.
  • You can substitute hlldb parameters here.
  • In this case, lbsntransform will use hlldb to convert and union hll sets and to store output results.
  • Currently, this re-use of hlldb requires supplying the same set of parameters twice.
  • For separation of concerns, it is recommended to use a separate HLL Worker database


--dbuser_hllworker

(Default: postgres)

Username for hllworker db.


--dbserveraddress_hllworker

(Default: None)

IP for hllworker db.

  • Provide an IP address or hostname.
  • Optionally add the port to use; 5432 is the default port.


--dbname_hllworker

(Default: None)

DB name for hllworker db

  • e.g. hllworkerdb

-p, --dbpassword_output

(Default: None)

Password for out-db

(postgres raw/hll db)

-u, --dbuser_output

(Default: postgres)

Username for out-db.

-a, --dbserveraddress_output

(Default: None)

IP for output db.

  • Provide an IP address or hostname.
  • Optionally add the port to use; 5432 is the default port.

-n, --dbname_output

(Default: None)

DB name for output db

  • e.g. rawdb or hlldb


--dbformat_output

(Default: lbsn)

Format of the out-db.

  • Either hll or lbsn.
  • This setting affects how data is stored, either in anonymized and aggregate form (hll), or in the lbsn raw structure (lbsn).


--dbpassword_input

(Default: None)

Password for input-db


--dbuser_input

(Default: postgres)

Username for input-db.


--dbserveraddress_input

(Default: None)

IP for input-db.

  • Provide an IP address or hostname.
  • Optionally add the port to use; 5432 is the default port.


--dbname_input

(Default: None)

DB name for input-db.

  • e.g. rawdb


--dbformat_input

(Default: json)

Format of the input-db.

  • Either lbsn or json
  • If lbsn is used, the native lbsn raw input mapping (0) will be used
  • If json is used, a custom mapping must be provided for mapping the database JSON to the lbsn structure. See input mappings

-t, --transferlimit

(Default: None)

Abort after x records.

  • This can be used to limit the number of records that will be processed.
  • e.g. --transferlimit 10000 will process the first 10000 input records
  • Defaults to None (= process all)
  • Note that one input record can map to many output records. This number applies to the number of input records, not the output count.


--transfer_count

(Default: 50000)

Transfer batch limit x.

  • Defines after how many parsed records the results will be transferred to the DB.
  • Defaults to 50000
  • If you have a slow server but a fast machine, larger values improve speed, because the duplicate check happens in Python and not in Postgres (coalesce);
  • However, larger values require more local memory. If you have a fast server but a slow machine, try whether a smaller batch size (e.g. --transfer_count 5000) improves speed.


Use --transferlimit to limit the total number of records transferred; --transfer_count instead defines the batch size used to transfer data incrementally.
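The interplay of the two limits can be sketched as below (a hypothetical illustration, not lbsntransform's internal code; function and variable names are made up):

```python
# Hypothetical sketch of the two limits: --transfer_count flushes a batch
# to the DB every x parsed records; --transferlimit aborts after x input records.
def process(records, transferlimit=None, transfer_count=3):
    batch, flushed_batches = [], []
    for count, record in enumerate(records, start=1):
        if transferlimit is not None and count > transferlimit:
            break  # --transferlimit reached: abort
        batch.append(record)
        if len(batch) >= transfer_count:
            flushed_batches.append(batch)  # transfer batch to DB
            batch = []
    if batch:
        flushed_batches.append(batch)  # final partial batch
    return flushed_batches

# 7 input records, abort after 5, flush every 3: batches of 3 and 2
batches = process(range(7), transferlimit=5, transfer_count=3)
```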


--commit_volume

(Default: None)

After x commit_volume, changes (transactions) will be written to the output database (a Postgres COMMIT).

Note that updated entries in the output database are only written from the WAL buffer after a commit.

  • Default for rawdb: 10000
  • Default for hlldb: 100000


If you have concurrent writes to the DB (e.g. multiple lbsntransform processes) and if you see transaction deadlocks, reduce the commit_volume.


--records_tofetch

(Default: 10000)

Fetch x records per batch.

  • If retrieving data from a db (lbsn), limit the number of records to fetch at once.
  • Defaults to 10000


--disable_transfer_reactions

Disable reactions.

  • If set, processing of lbsn reactions will be skipped;
  • only original posts are transferred.
  • This is useful to reduce processing and data footprint for some service data, e.g. for Twitter, with a large number of reactions containing little original content.


--disable_reaction_post_referencing

Disable reactions-refs.

Enable this option to prevent empty posts being stored due to the Foreign-Key-Exists requirement.
Possible parameters:

  • 0 = Save Original Tweets of Retweets as posts;
  • 1 = do not store Original Tweets of Retweets;
  • 2 = !Not implemented: Store Original Tweets of Retweets as post_reactions


--ignore_non_geotagged

Ignore non-geotagged.

If set, posts that are not geotagged are ignored during processing.


--startwith_db_rownumber

(Default: None)

Start with db row x.

If transferring from a database (input), this flag can be used to resume processing, e.g. if a transfer has been aborted.

  • Provide a number (row-id) to start processing from live db.
  • If input db type is lbsn, this is the primary key, without the origin_id, (e.g. the post_guid, place_guid etc.).
  • This flag will only work if processing a single lbsn object (e.g. --include_lbsn_objects "post").


--startwith_db_rownumber "123456789"
will lead to the first batch-query from the DB looking like this:

SELECT * FROM topical."post"
WHERE post_guid > '123456789'
ORDER BY post_guid ASC
LIMIT 10000;


--endwith_db_rownumber

(Default: None)

End with db row x.

Provide a number (row-id) at which to end processing from the live db.


--debug_mode

(Default: None)

Enable debug mode.


--geocode_locations

(Default: None)

Path to loc-geocodes.

  • Provide a path to a CSV file with location geocodes.
  • The CSV header must be: lat, lng, name.
  • This can be used in mappings to assign coordinates (lat, lng) to locations provided as text.
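Reading such a geocode file can be sketched as below (an illustration; the file contents and the dictionary layout are made up):

```python
import csv
import io

# Sketch: a geocode CSV with the required header lat, lng, name
# (coordinates here are illustrative values only).
csv_text = "lat,lng,name\n51.0504,13.7373,Dresden\n48.1351,11.5820,Munich\n"

# Map each location name to its (lat, lng) pair
geocodes = {
    row["name"]: (float(row["lat"]), float(row["lng"]))
    for row in csv.DictReader(io.StringIO(csv_text))
}
```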


--ignore_input_source_list

(Default: None)

Path to input ignore.

Provide a path to a list of input_source types that will be ignored (e.g. to ignore certain bots etc.)


--mappings_path

(Default: None)

Path mappings folder.

Provide a path to a custom folder that contains one or more input mapping modules (*.py).


--input_lbsn_type

(Default: None)

Input sub-type.

  • e.g. post, profile, friendslist, followerslist etc.
  • This can be used to select an appropriate mapping procedure in a single mapping module.


--map_full_relations

Map full relations.

If set, full relations are mapped; e.g. many-to-many relationships, such as user_follows, user_friend, or user_mentions, are stored in a separate table. Defaults to False.


--csv_output

Store to local CSV.

If set, all submit values will be stored to local CSV instead. Currently, this type of output is not available.


--csv_allow_linebreaks

Disable linebreak removal.

If set, in-text linebreaks (\r or \n) will not be removed in output CSVs.
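The default behavior (i.e. without this flag) can be sketched as follows (an illustration, not lbsntransform's internal code):

```python
import re

# Sketch of the default behavior (without --csv_allow_linebreaks):
# in-text linebreaks (\r or \n) are replaced before writing CSV output.
text = "first line\r\nsecond line\nthird"
cleaned = re.sub(r"[\r\n]+", " ", text)
```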


--csv_delimiter

(Default: ,)

CSV delimiter.

  • Provide the CSV delimiter to be used.
  • Default is comma (,).
  • Note: to pass a tab, use shell variable substitution ($'\t')


--use_csv_dictreader

Use csv.DictReader.

By default, CSVs will be read line by line,
using the standard csv.reader().

This will enable csv.DictReader(), which allows accessing CSV fields by name in mappings.

A CSV with a header is required for this setting to work.

Note that csv.DictReader() may be slower than the default csv.reader().
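The difference between the two read modes can be illustrated as below (the column names are made up for this example):

```python
import csv
import io

# Illustrative CSV with a header row (column names are hypothetical)
csv_text = "user_guid,post_body\nu1,hello\nu2,world\n"

# Default: csv.reader() yields lists; fields are accessed by position
rows = list(csv.reader(io.StringIO(csv_text)))
body_by_position = rows[1][1]

# With --use_csv_dictreader: fields are accessed by header name
dict_rows = list(csv.DictReader(io.StringIO(csv_text)))
body_by_name = dict_rows[0]["post_body"]
```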


--recursive_load

Recursive local sub dirs.

If set, process input directories recursively (default depth: 2).


--skip_until_file

(Default: None)

Skip until file x.

If local input, skip all files until file with name x appears (default: start immediately)


--skip_until_record

(Default: None)

Skip until record x.

If local input, skip all records until record x (default: start with first)


--zip_records

Zip records parallel.

  • Use this flag to zip records of multiple input files
  • e.g. List1[A,B,C], List2[1,2,3] will be combined (zipped) on read to List[A1,B2,C3]
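The example above corresponds directly to Python's built-in zip:

```python
# Illustration of --zip_records: records from multiple input files are
# combined pairwise on read.
list1 = ["A", "B", "C"]
list2 = [1, 2, 3]

zipped = [f"{a}{b}" for a, b in zip(list1, list2)]
```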


--min_geoaccuracy

(Default: None)

Min geoaccuracy to use.

Set to latlng, place, or city to limit processing of records based on minimum geoaccuracy (default: no limit).
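Such a filter can be sketched as below (a hypothetical illustration; the ordering latlng > place > city, from most to least accurate, and the function name are assumptions):

```python
# Hypothetical sketch of a --min_geoaccuracy filter.
ACCURACY_RANK = {"latlng": 3, "place": 2, "city": 1}

def meets_min_geoaccuracy(record_accuracy, min_geoaccuracy=None):
    """Return True if a record's geoaccuracy satisfies the minimum."""
    if min_geoaccuracy is None:
        return True  # default: no limit
    return ACCURACY_RANK[record_accuracy] >= ACCURACY_RANK[min_geoaccuracy]
```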


--include_lbsn_objects

(Default: None)

lbsn objects to process.

If processing from lbsn db (rawdb), provide a comma separated list of lbsn objects to include.
May contain:

  • origin
  • country
  • city
  • place
  • user_groups
  • user
  • post
  • post_reaction


  • Excluded objects will not be queried, but empty objects may be created due to referenced foreign key relationships.
  • Defaults to origin,post


--include_lbsn_bases

(Default: None)

lbsn bases to update.

If the target output type is hll, provide a comma separated list of lbsn bases to include/update/store to.

Currently supported:

  • hashtag
  • emoji
  • term
  • _hashtag_latlng
  • _term_latlng
  • _emoji_latlng
  • _month_hashtag
  • _month_hashtag_latlng
  • _month_latlng
  • monthofyear
  • month
  • dayofmonth
  • dayofweek
  • hourofday
  • year
  • date
  • timestamp
  • country
  • region
  • city
  • place
  • latlng
  • community

Bases not included will be skipped. Per default, no bases will be considered.


--include_lbsn_bases hashtag,place,date,community

This will update entries in the Postgres hlldb tables (e.g. topical.hashtag); non-existing entries will be created, existing ones will be updated (a hll_union).

See the structure definition in SQL here for a full list of hlldb table structures.

This argument may only be used once.


--override_lbsn_query_schema

(Default: None)

Override schema and table name.

This can be used to redirect lbsn queries on the given object from input db to a specific schema/table such as a materialized view.

This can be useful (e.g.) to limit processing of input data to a specific query.

Format is lbsn_type,schema.table.


--override_lbsn_query_schema post,mviews.mypostquery

Argument can be used multiple times.


--hmac_key

(Default: None)

Override db hmac key.

The hmac key that is used for cryptographic hashing during creation of HLL sets. This overrides what is set in the hllworker database.

Remember to re-use the same hmac key for any consecutive update of HLL sets.
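Why the key must stay constant can be illustrated as below (an illustration only, not the hll extension's internal hashing; function name and inputs are made up):

```python
import hashlib
import hmac

# Illustration: an HMAC key pseudonymizes identifiers consistently, so
# HLL sets built in consecutive runs remain unionable only if the same
# key is reused; a different key produces unrelated hashes.
def pseudonymize(value: str, key: str) -> str:
    return hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()

h1 = pseudonymize("user-123", "CRYPTSALT")
h2 = pseudonymize("user-123", "CRYPTSALT")   # same key: same hash
h3 = pseudonymize("user-123", "other-key")   # different key: different hash
```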

The crypt.salt variable can also be set (temporarily or permanently) in the hll worker database itself.

ALTER DATABASE hllworkerdb SET crypt.salt = 'CRYPTSALT';

Further information is available in the YFCC HLL tutorial.