A minimal setup example

For testing and demonstration purposes, we have created several Docker containers.

At the core, these include the raw (rawdb) and the HyperLogLog (hll) version (hlldb) of the LBSN structure.

There are many ways to run these containers, e.g. in Linux, WSL, or native Windows.

Windows user?

If you're working with Windows, the instructions below will only work in Windows Subsystem for Linux (WSL). Although it is possible to run Docker containers natively in Windows, we strongly recommend using WSL or WSL2.

A minimal setup starts with cloning and starting rawdb and hlldb.

hlldb, cloned via HTTPS:

git clone --recursive https://gitlab.vgiscience.de/lbsn/databases/hlldb.git
cd hlldb
mv .env.example .env
docker-compose up -d
cd ..

hlldb, cloned via SSH (alternative to HTTPS):

git clone --recursive git@gitlab.vgiscience.de:lbsn/databases/hlldb.git
cd hlldb
mv .env.example .env
docker-compose up -d
cd ..

rawdb, cloned via HTTPS:

git clone --recursive https://gitlab.vgiscience.de/lbsn/databases/rawdb.git
cd rawdb
mv .env.example .env
docker-compose up -d
cd ..

rawdb, cloned via SSH (alternative to HTTPS):

git clone --recursive git@gitlab.vgiscience.de:lbsn/databases/rawdb.git
cd rawdb
mv .env.example .env
docker-compose up -d
cd ..
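
The clone-and-start steps are the same for both databases, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the official instructions: the `run` and `setup_db` functions and the `DRY_RUN` variable are hypothetical names introduced here. By default the script only prints the commands it would run; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: clone and start rawdb and hlldb side by side in the current folder.
# DRY_RUN is a hypothetical helper variable: 1 (default) prints the commands,
# 0 executes them.
DRY_RUN="${DRY_RUN:-1}"
BASE_URL="https://gitlab.vgiscience.de/lbsn/databases"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone one database repository, create its .env file and start the container.
setup_db() {
    name="$1"
    run git clone --recursive "$BASE_URL/$name.git" || return 1
    # The subshell keeps the caller's working directory unchanged.
    (
        run cd "$name" &&
        run mv .env.example .env &&
        run docker-compose up -d
    )
}

for db in rawdb hlldb; do
    setup_db "$db"
done
```

Because each database is set up in a subshell, both repositories end up next to each other rather than nested.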

Afterwards,

  • rawdb will be available at 127.0.0.1:15432 and
  • hlldb will be available at 127.0.0.1:25432
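
Before importing, it can help to confirm that both ports accept connections. The sketch below assumes bash, since it relies on bash's `/dev/tcp` redirection; `port_open` and `check_db` are hypothetical helper names. An open port only shows that the container is listening, not that Postgres has finished initializing.

```shell
#!/usr/bin/env bash
# Sketch: check whether the database ports accept TCP connections.
# Requires bash, because /dev/tcp is a bash-specific redirection target.

# Return 0 if a TCP connection to host $1, port $2 can be opened.
port_open() {
    # The subshell opens the connection and closes it again on exit.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Print a one-line status for a named database; return 1 if unreachable.
check_db() {
    name="$1"; host="$2"; port="$3"
    if port_open "$host" "$port"; then
        echo "$name: reachable at $host:$port"
    else
        echo "$name: NOT reachable at $host:$port"
        return 1
    fi
}

check_db rawdb 127.0.0.1 15432 || true
check_db hlldb 127.0.0.1 25432 || true
```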

To import data:

Follow the instructions to install lbsntransform.

Get the appropriate mapping for your data. For example, the mapping for the YFCC100M dataset (CSV files) is included in the lbsntransform resources folder.

git clone https://gitlab.vgiscience.de/lbsn/lbsntransform.git \
    && cd lbsntransform \
    && git filter-branch --subdirectory-filter resources

Store the files to import (here, the YFCC100M CSVs) in a folder such as lbsntransform/01_Input.

For the YFCC100M dataset, we have provided subsets with the first 10,000 CSV records.

cd lbsntransform
mkdir 01_Input
cd 01_Input
wget --quiet https://cloudstore.zih.tu-dresden.de/index.php/s/f3knjyE7ZdpE9Wp/download \
    -O 02_yfcc100m_places_first_10000.csv
wget --quiet https://cloudstore.zih.tu-dresden.de/index.php/s/kizmNKkTP2qbdk7/download \
    -O 01_yfcc100m_posts_first_10000.csv
cd ..
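
A quick sanity check of the downloaded files can catch failed or empty downloads before the import. This is a minimal sketch, assuming the files are tab-separated (as the `--csv_delimiter $'\t'` option used for the import suggests) and that it is run from the lbsntransform folder; `count_columns` and `count_lines` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check tab-separated input files before importing them.

# Print the number of tab-separated fields in the first line of file $1.
count_columns() {
    head -n 1 "$1" | awk -F'\t' '{ print NF }'
}

# Print the number of lines in file $1.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

for f in 01_Input/*.csv; do
    # -s is true only for files that exist and are not empty.
    [ -s "$f" ] || { echo "$f: missing or empty"; continue; }
    echo "$f: $(count_lines "$f") lines, $(count_columns "$f") columns in first line"
done
```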

To import data into rawdb and hlldb, run lbsntransform with default parameters, once for each database.

Import into rawdb:

conda activate lbsntransform
lbsntransform --origin 21 \
    --file_input \
    --dbpassword_output "eX4mP13p455w0Rd" \
    --dbuser_output "postgres" \
    --dbserveraddress_output "127.0.0.1:15432" \
    --dbname_output "rawdb" \
    --csv_delimiter $'\t' \
    --file_type "csv" \
    --zip_records \
    --mappings_path "mappings/"

Import into hlldb:

conda activate lbsntransform
lbsntransform --origin 21 \
    --file_input \
    --dbpassword_output "eX4mP13p455w0Rd" \
    --dbuser_output "postgres" \
    --dbserveraddress_output "127.0.0.1:25432" \
    --dbname_output "hlldb" \
    --dbformat_output "hll" \
    --dbpassword_hllworker "eX4mP13p455w0Rd" \
    --dbuser_hllworker "postgres" \
    --dbserveraddress_hllworker "127.0.0.1:25432" \
    --dbname_hllworker "hlldb" \
    --csv_delimiter $'\t' \
    --file_type "csv" \
    --include_lbsn_objects "origin,post" \
    --zip_records \
    --mappings_path "mappings/"
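
After the import, a row count is a simple way to confirm that records arrived in rawdb. The sketch below assumes the `psql` client is installed and that the password is exported as `PGPASSWORD` beforehand. The table name `topical.post` is an assumption about the rawdb schema and may need adjusting. `addr_host`, `addr_port` and `count_rows` are hypothetical helper names; they split the `host:port` form used by lbsntransform into the separate `-h` and `-p` options that psql expects.

```shell
#!/usr/bin/env bash
# Sketch: count imported rows in rawdb with psql.

addr_host() { echo "${1%%:*}"; }   # part before the first colon
addr_port() { echo "${1##*:}"; }   # part after the last colon

# Print the row count of table $3 in database $2 at address $1 (host:port).
count_rows() {
    addr="$1"; db="$2"; table="$3"
    # -w: never prompt for a password (psql reads PGPASSWORD instead)
    # -tA: print the bare value without headers or alignment
    psql -w -h "$(addr_host "$addr")" -p "$(addr_port "$addr")" \
         -U postgres -d "$db" -tA -c "SELECT count(*) FROM $table;"
}

# topical.post is an assumed table name; adjust it to your schema.
if command -v psql >/dev/null 2>&1; then
    count_rows "127.0.0.1:15432" rawdb topical.post || echo "query failed"
fi
```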

Speed up processing with a separate 'hll worker db'

A separate, third Docker container is available that contains an empty Postgres database with the Citus hll extension installed and a read-only user. This database can be used for hll conversions.

  1. First clone the hll worker Docker container

    git clone https://gitlab.vgiscience.de/lbsn/databases/pg-hll-empty.git
    cd pg-hll-empty
    mv .env.example .env
    mv vars.env.example vars.env
    docker-compose up -d
    

  2. Use the hll worker db for hll conversions in lbsntransform

    conda activate lbsntransform
    lbsntransform --origin 21 \
        --file_input \
        --dbpassword_output "eX4mP13p455w0Rd" \
        --dbuser_output "postgres" \
        --dbserveraddress_output "127.0.0.1:25432" \
        --dbname_output "hlldb" \
        --dbformat_output "hll" \
        --dbpassword_hllworker "eX4mP13p455w0Rd" \
        --dbuser_hllworker "postgres" \
        --dbserveraddress_hllworker "127.0.0.1:5432" \
        --dbname_hllworker "hllworkerdb" \
        --csv_delimiter $'\t' \
        --file_type "csv" \
        --include_lbsn_objects "origin,post" \
        --zip_records \
        --mappings_path "mappings/"
    

lbsnctl

Alternatively, use lbsnctl, a shell script that starts the following Docker services:

  • rawdb: A ready-to-use Docker container with the SQL implementation of the LBSN structure
  • hlldb: A ready-to-use Docker container with a privacy-aware version of the LBSN structure, e.g. for visual analytics
  • pgadmin: A web-based PostgreSQL database interface
  • jupyterlab: A modern web-based user interface for Python visual analytics

Last update: May 26, 2021