A minimal setup example
For testing and demonstration purposes, we have created several docker containers.
At the core, these include the raw (rawdb) and hll (hlldb) versions of the lbsn structure.
There are many ways to run these containers, e.g. in Linux, WSL, or native Windows.
Windows user?
If you're working with Windows, the instructions below will only work in Windows Subsystem for Linux (WSL). Although it is possible to run Docker containers natively in Windows, we strongly recommend using WSL or WSL2.
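A minimal setup consists of cloning and starting rawdb and hlldb. The commands for each repository are listed twice below, once cloning via HTTPS and once via SSH; use only one variant per repository and run each from the same parent directory. The network lbsn-network only needs to be created once; a second docker network create with the same name reports an error that can be ignored.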
git clone --recursive https://gitlab.vgiscience.de/lbsn/databases/hlldb.git
cd hlldb
mv .env.example .env
docker network create lbsn-network
docker-compose up -d
git clone --recursive git@gitlab.vgiscience.de:lbsn/databases/hlldb.git
cd hlldb
mv .env.example .env
docker network create lbsn-network
docker-compose up -d
git clone --recursive https://gitlab.vgiscience.de/lbsn/databases/rawdb.git
cd rawdb
mv .env.example .env
docker network create lbsn-network
docker-compose up -d
git clone --recursive git@gitlab.vgiscience.de:lbsn/databases/rawdb.git
cd rawdb
mv .env.example .env
docker network create lbsn-network
docker-compose up -d
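Because `docker network create` fails when a network with that name already exists, the step can be made safe to repeat for both repositories. The following is a minimal sketch, not part of the lbsn repositories; the function name `ensure_network` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper (not part of the lbsn repositories):
# create a Docker network only if it does not exist yet.
ensure_network() {
    local name="$1"
    if docker network inspect "$name" >/dev/null 2>&1; then
        echo "network '$name' already exists"
    else
        docker network create "$name"
    fi
}

# Usage (replaces the plain "docker network create" step):
#   ensure_network lbsn-network
```

With this in place, the same sequence of commands can be run in the hlldb and rawdb folders without the second call reporting an error.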
Afterwards,
- rawdb will be available at 127.0.0.1:15432 and
- hlldb will be available at 127.0.0.1:25432
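The containers may need a moment before these ports accept connections. The following sketch waits until a TCP connection succeeds; it assumes `bash` and the coreutils `timeout` command are installed, and the function names `split_addr` and `wait_for_port` are hypothetical. A successful TCP connection only shows that the port is open, not that PostgreSQL has finished initializing.

```shell
# Hypothetical helpers (assumed names, not part of the lbsn tooling).

# Split "host:port" into two words, e.g. "127.0.0.1 15432".
split_addr() {
    local addr="$1"
    echo "${addr%:*} ${addr##*:}"
}

# Return 0 as soon as a TCP connection to host:port succeeds,
# or 1 after the given number of attempts (default 30),
# pausing one second between attempts.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-30}" i=0
    while [ "$i" -lt "$tries" ]; do
        # /dev/tcp is a Bash feature, so the probe runs in an explicit bash.
        if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_port $(split_addr "127.0.0.1:15432") && echo "rawdb port is open"
#   wait_for_port $(split_addr "127.0.0.1:25432") && echo "hlldb port is open"
```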
To import data:
Follow the instructions to install lbsntransform.
Get the appropriate mapping for your data. For example, the mapping for the YFCC100M dataset (CSV files) is included in the lbsntransform resources folder.
git clone https://gitlab.vgiscience.de/lbsn/lbsntransform.git \
&& cd lbsntransform \
&& git filter-branch --subdirectory-filter resources
Store your files (YFCC100M CSVs) for import in, e.g., lbsntransform/01_Input.
For the YFCC100M dataset, we have provided subsets containing the first 10,000 CSV records.
cd lbsntransform
mkdir 01_Input
cd 01_Input
wget --quiet https://cloudstore.zih.tu-dresden.de/index.php/s/f3knjyE7ZdpE9Wp/download \
-O 02_yfcc100m_places_first_10000.csv
wget --quiet https://cloudstore.zih.tu-dresden.de/index.php/s/kizmNKkTP2qbdk7/download \
-O 01_yfcc100m_posts_first_10000.csv
cd ..
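Since `wget --quiet` prints nothing, it can be worth checking the downloaded files before importing. The sketch below uses a hypothetical helper, `check_tsv`, which reports the line count of a file and fails if the file is missing, empty, or has no tab in its first line. The tab check is an assumption based on the tab delimiter passed to lbsntransform in the import commands.

```shell
# Hypothetical helper: print the line count of a file and fail if the
# file is missing, empty, or its first line contains no tab character.
check_tsv() {
    local file="$1" lines
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # The import commands pass a tab as --csv_delimiter, so expect tabs here.
    if ! head -n 1 "$file" | grep -q "$(printf '\t')"; then
        echo "no tab delimiter found in first line: $file" >&2
        return 1
    fi
    # tr removes the padding that some wc implementations add.
    lines=$(wc -l < "$file" | tr -d ' ')
    echo "$file: $lines lines"
}

# Usage:
#   check_tsv 01_Input/01_yfcc100m_posts_first_10000.csv
#   check_tsv 01_Input/02_yfcc100m_places_first_10000.csv
```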
To import data into rawdb and hlldb, run lbsntransform with default parameters, once for rawdb (first command below) and once for hlldb (second command below).
conda activate lbsntransform
lbsntransform --origin 21 \
--file_input \
--dbpassword_output "eX4mP13p455w0Rd" \
--dbuser_output "postgres" \
--dbserveraddress_output "127.0.0.1:15432" \
--dbname_output "rawdb" \
--csv_delimiter $'\t' \
--file_type "csv" \
--zip_records \
--mappings_path "mappings/"
conda activate lbsntransform
lbsntransform --origin 21 \
--file_input \
--dbpassword_output "eX4mP13p455w0Rd" \
--dbuser_output "postgres" \
--dbserveraddress_output "127.0.0.1:25432" \
--dbname_output "hlldb" \
--dbformat_output "hll" \
--dbpassword_hllworker "eX4mP13p455w0Rd" \
--dbuser_hllworker "postgres" \
--dbserveraddress_hllworker "127.0.0.1:25432" \
--dbname_hllworker "hlldb" \
--csv_delimiter $'\t' \
--file_type "csv" \
--include_lbsn_objects "origin,post" \
--zip_records \
--mappings_path "mappings/"
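The two commands above share most of their arguments. The sketch below wraps the common part in a hypothetical function, `lbsn_import`, and reads the password from an assumed environment variable, `LBSN_DB_PASSWORD`, so it does not end up in the shell history. It uses only flags that appear in the commands above and assumes that lbsntransform accepts them in any order.

```shell
# Hypothetical wrapper around lbsntransform (assumed names:
# lbsn_import, LBSN_DB_PASSWORD). Arguments: server address, database
# name, then any extra flags to append to the common ones.
lbsn_import() {
    local address="$1" dbname="$2"
    shift 2
    if [ -z "${LBSN_DB_PASSWORD:-}" ]; then
        echo "set LBSN_DB_PASSWORD first" >&2
        return 1
    fi
    lbsntransform --origin 21 \
        --file_input \
        --dbpassword_output "$LBSN_DB_PASSWORD" \
        --dbuser_output "postgres" \
        --dbserveraddress_output "$address" \
        --dbname_output "$dbname" \
        --csv_delimiter "$(printf '\t')" \
        --file_type "csv" \
        --zip_records \
        --mappings_path "mappings/" \
        "$@"
}

# Usage (after "conda activate lbsntransform"):
#   export LBSN_DB_PASSWORD="eX4mP13p455w0Rd"
#   lbsn_import 127.0.0.1:15432 rawdb
#   lbsn_import 127.0.0.1:25432 hlldb \
#       --dbformat_output "hll" \
#       --dbpassword_hllworker "$LBSN_DB_PASSWORD" \
#       --dbuser_hllworker "postgres" \
#       --dbserveraddress_hllworker "127.0.0.1:25432" \
#       --dbname_hllworker "hlldb" \
#       --include_lbsn_objects "origin,post"
```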
Speed up processing with a separate 'hll worker db'
A separate, third Docker container is available that contains an empty Postgres database with the Citus hll extension installed and a read-only user. This database can be used for hll conversions.
- First clone the hll worker Docker container
git clone https://gitlab.vgiscience.de/lbsn/databases/pg-hll-empty.git
cd pg-hll-empty
mv .env.example .env
mv vars.env.example vars.env
docker-compose up -d
- Use the hll worker db for hll conversions in lbsntransform
conda activate lbsntransform
lbsntransform --origin 21 \
--file_input \
--dbpassword_output "eX4mP13p455w0Rd" \
--dbuser_output "postgres" \
--dbserveraddress_output "127.0.0.1:25432" \
--dbname_output "hlldb" \
--dbformat_output "hll" \
--dbpassword_hllworker "eX4mP13p455w0Rd" \
--dbuser_hllworker "postgres" \
--dbserveraddress_hllworker "127.0.0.1:5432" \
--dbname_hllworker "hllworkerdb" \
--csv_delimiter $'\t' \
--file_type "csv" \
--include_lbsn_objects "origin,post" \
--zip_records \
--mappings_path "mappings/"
lbsnctl
An alternative is to use lbsnctl, a shell script that starts the following Docker services:
- rawdb: A ready-to-use Docker container with the SQL implementation of the LBSN Structure
- hlldb: A ready-to-use Docker container with a privacy-aware version of the LBSN Structure, e.g. for visual analytics
- pgadmin: A web-based PostgreSQL database interface.
- jupyterlab: A modern web-based user interface for Python visual analytics.