/*
//paragraph [10] title: [{\Large \bf ] [}]
//characters [1] formula: [$] [$]
//[ae] [\"{a}]
//[oe] [\"{o}]
//[ue] [\"{u}]
//[ss] [{\ss}]
//[Ae] [\"{A}]
//[Oe] [\"{O}]
//[Ue] [\"{U}]
//[**] [$**$]
//[toc] [\tableofcontents]
//[=>] [\verb+=>+]
//[:Section Translation] [\label{sec:translation}]
//[Section Translation] [Section~\ref{sec:translation}]
//[:Section 4.1.1] [\label{sec:4.1.1}]
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
//[newpage] [\newpage]

[10] Creating a Distributed Spatial Database Using the Distributed2Algebra

Ralf Hartmut G[ue]ting, July 2015

[toc]

[newpage]

1 Introduction

In this example, we set up a distributed spatial database using the Distributed2Algebra of Secondo. We use a mini-cluster in Ralf's office, consisting of two computers running Ubuntu 14.04. The two computers provide the following resources:

  * ralf1: 4 disks, 6 cores, 8 GB main memory
  * ralf2: 4 disks, 8 cores, 32 GB main memory

On ralf1, we use one disk exclusively for the master and the remaining three for workers. Regarding cores and main memory, we let the master overlap with the workers. The main case of master and workers working simultaneously is data distribution from the master to the workers and back. In this case, each single worker is not very active compared to the master (which is quite busy), so the shared core should not be a problem. Similarly, these transfers do not use a lot of memory. We therefore configure the system as follows:

  * Master (ralf1): 1 disk, 1 core, 6 GB
  * Workers:
    * ralf1: 3 disks, 6 cores, 1000 MB per core (about 6 GB)
    * ralf2: 4 disks, 8 cores, 3600 MB per core (about 28 GB)

We always leave some memory for the operating system. For the workers we also leave a bit of extra headroom in case query operators underestimate their memory consumption.

2 Passphrase-less Connection

The first step is to set up an ssh connection from the master to the workers, so that one can start SecondoMonitors on the computing nodes without the user having to enter a password. This is done as follows.

(1) In the master's home directory, enter

----
ssh-keygen -t dsa
----

This generates a key pair. The system asks for a passphrase; just press return for an empty passphrase. Also press return to accept the default location for storing the key.

(2) Configure ssh to know the worker computers. Into the file .ssh/config, we put the following:

----
Host *ralf1
  HostName 132.176.69.130
  User ralf
Host *ralf2
  HostName 132.176.69.78
  User ralf
----

This makes the worker computers available under the names ralf1 and ralf2 and also sets the user name to be used on these computers.

(3) Transmit the public key to the target computers:

----
ssh-copy-id -i .ssh/id_dsa.pub <user>@<server>
ssh-copy-id -i .ssh/id_dsa.pub ralf@132.176.69.78
----

If the master system does not have the command ssh-copy-id, one can use instead:

----
cat ~/.ssh/*.pub | ssh <user>@<server> 'umask 077; cat >>.ssh/authorized_keys'
----

At this point the system on ralf2 asks once for a password. Subsequently it is possible to log in on ralf2 by simply typing

----
ssh ralf2
----

One should log in once on each of the nodes, because the master needs to enter each node into its list of known hosts.
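To verify the setup, the following minimal shell sketch (assuming the host aliases ralf1 and ralf2 from the .ssh/config above) performs this initial login on all nodes; no password prompt should appear:

----
# log in once to each worker and print its host name;
# this also records each node in the master's list of known hosts
for h in ralf1 ralf2; do
  ssh $h hostname
done
----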
3 Getting Spatial Data to the Master

We use OpenStreetMap data about the German state of North Rhine-Westphalia, obtained in the form of shapefiles from GeoFabrik. On http://download.geofabrik.de/ one can navigate to it by selecting Europe, then Germany. Then, from the table Sub Regions, in the row for Nordrhein-Westfalen, download

http://download.geofabrik.de/europe/germany/nordrhein-westfalen-latest.shp.zip

You can also use this link to download directly, if it is still available. Unpack the zip file into a directory nrw within secondo/bin. The dataset contains 8 kinds of objects, for example, Roads, Buildings, and Waterways.

In the directory secondo/bin, start a Secondo system without transactions (suitable for loading data) using the command

----
SecondoTTYNT
----

At the prompt, enter

----
Secondo => @Scripts/nrwImportShape.SEC
----

The contents of the file nrwImportShape.SEC are shown here:

----
# Importing NRW Data

close database

create database nrw

open database nrw

let Roads =
  dbimport2('../bin/nrw/roads.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/roads.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Waterways =
  dbimport2('../bin/nrw/waterways.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/waterways.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Railways =
  dbimport2('../bin/nrw/railways.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/railways.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Points =
  dbimport2('../bin/nrw/points.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/points.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Places =
  dbimport2('../bin/nrw/places.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/places.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Natural =
  dbimport2('../bin/nrw/natural.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/natural.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Buildings =
  dbimport2('../bin/nrw/buildings.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/buildings.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Landuse =
  dbimport2('../bin/nrw/landuse.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/landuse.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume
----

This creates the relations Roads, Waterways, etc. within a new database nrw. By now this dataset is fairly large; for example, it contains about 6.5 million buildings (see the cardinalities below). It takes roughly one hour to run this script.
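After the script has finished, a quick plausibility check is to count the tuples of some of the new relations, for example (a minimal sketch; the exact cardinalities depend on the downloaded snapshot):

----
query Roads count

query Buildings count
----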
Opening the database in the query optimizer collects information about the existing database objects, which can be displayed with the predicate ~showDatabase~:

----
2 ?- showDatabase.
Collected information for database 'nrw':

Relation Natural
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     4500.864
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   129317
	Avg.TupleSize: 4745.864 = sizeTerm(480.0,245.0,4500.864)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Places
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          point    32.0     32.0      0
Population       int      16.0     5.0       0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   15908
	Avg.TupleSize: 130.0 = sizeTerm(376.0,130.0,0)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Points
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          point    32.0     32.0      0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Timestamp        string   64.0     26.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   490421
	Avg.TupleSize: 151.0 = sizeTerm(424.0,151.0,0)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Railways
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     232.896  232.896   1189.248
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   53082
	Avg.TupleSize: 1515.144 = sizeTerm(560.896,325.89599609375,1189.24800390625)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Roads
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     221.76   221.76    948.864
Maxspeed         int      16.0     5.0       0
Tunnel           int      16.0     5.0       0
Bridge           int      16.0     5.0       0
Oneway           int      16.0     5.0       0
Type             string   64.0     22.0      0
Ref              string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   1505462
	Avg.TupleSize: 1305.624 = sizeTerm(677.76,356.760009765625,948.863990234375)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Waterways
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     250.944  250.944   1855.87
Width            int      16.0     5.0       0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   80904
	Avg.TupleSize: 2204.816 = sizeTerm(594.944,348.9440002441406,1855.8719997558592)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Buildings
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     1264.896
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   6516159
	Avg.TupleSize: 1509.896 = sizeTerm(480.0,245.0,1264.896)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Landuse
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     3151.488
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   359759
	Avg.TupleSize: 3396.488 = sizeTerm(480.0,245.0,3151.488)
	(Tuple size in memory is 136 + sum of attribute sizes.)

(Type 'showDatabaseSchema.' to view the complete database schema.)
true.

3 ?-
----

4 Setting Up the Distributed Infrastructure

In the introduction it was mentioned that we wish to use the following resources:

  * Master (ralf1): 1 disk, 1 core, 6 GB
  * Workers:
    * ralf1: 3 disks, 6 cores, 1000 MB per core (about 6 GB)
    * ralf2: 4 disks, 8 cores, 3600 MB per core (about 28 GB)

4.1 Setting Up the Monitors

Per worker disk we need to start one SecondoMonitor. We can start the monitors using a script ~remoteMonitors~ from the secondo/bin directory. It takes the following parameters (see also remoteMonitors.readme):

----
remoteMonitors <description file> <action>
----

Here the ~description file~ contains entries describing the monitors to be started, one line per monitor. Such a line has the format

----
<server> <configuration file> [<user>]
----

Here <server> is the IP address or the name of the node on which the monitor is to be started, and <configuration file> is the version of the SecondoConfig.ini file to be used. The third parameter is only needed to specify the user name in case it is different on the remote computer. The ~action~ is one of \{start, check, stop\}, with the obvious meanings.

Here we provide a description file ~ClusterRalf7~ with the following content:

----
132.176.69.130 SecondoConfig.ini.130.1271
132.176.69.130 SecondoConfig.ini.130.1272
132.176.69.130 SecondoConfig.ini.130.1273
132.176.69.78 SecondoConfig.ini.78.1274
132.176.69.78 SecondoConfig.ini.78.1275
132.176.69.78 SecondoConfig.ini.78.1276
132.176.69.78 SecondoConfig.ini.78.1277
----

The SecondoConfig.ini files need to lie in the secondo/bin directory of the respective computers (130 and 78). They are obtained from the standard SecondoConfig.ini by modifying

  1 the database directory to be used,

  2 the host IP address,

  3 the port,

  4 the global memory per server.

We have:

  * 130: GlobalMemory=1000
  * 78: GlobalMemory=3600

The secondo-databases directories at the respective disk locations need to be created before using the ~remoteMonitors~ script.
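Putting this together, the file for the first monitor on 130 might contain entries like the following (a sketch only; the database directory path is hypothetical, the key names are those of the standard SecondoConfig.ini):

----
# from SecondoConfig.ini.130.1271 (sketch; the path below is made up)
SecondoHome=/disk1/ralf/secondo-databases
SecondoHost=132.176.69.130
SecondoPort=1271
GlobalMemory=1000
----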
4.2 Starting and Stopping the Monitors

We can now start the monitors:

----
remoteMonitors ClusterRalf7 start
----

As a result, we see:

----
ralf@ralf-ubuntu6:~/secondo/bin$ remoteMonitors ClusterRalf7 start
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1271
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1272
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1273
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1274
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1275
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1276
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1277
ralf@ralf-ubuntu6:~/secondo/bin$ remoteMonitors ClusterRalf7 check
Try to check monitor on server 132.176.69.130
Monitor is running on port 1271 at 132.176.69.130
Try to check monitor on server 132.176.69.130
Monitor is running on port 1272 at 132.176.69.130
Try to check monitor on server 132.176.69.130
Monitor is running on port 1273 at 132.176.69.130
Try to check monitor on server 132.176.69.78
Monitor is running on port 1274 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1275 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1276 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1277 at 132.176.69.78
ralf@ralf-ubuntu6:~/secondo/bin$
----

We can stop all monitors using the command

----
remoteMonitors ClusterRalf7 stop
----

4.3 Setting Up, Starting and Stopping Workers

Workers are defined in a relation of the database to be used, hence in nrw. We will first use one worker (server) per disk in a set of experiments; later we use one worker per core.

----
let Workers7 = [const rel(tuple([Server: string, Port: int])) value (
  ("132.176.69.130" 1271)
  ("132.176.69.130" 1272)
  ("132.176.69.130" 1273)
  ("132.176.69.78" 1274)
  ("132.176.69.78" 1275)
  ("132.176.69.78" 1276)
  ("132.176.69.78" 1277)
)]
----

We now connect to one server for each entry:

----
query Workers7 feed
  extend[Started: connect(.Server, .Port, "SecondoConfig.ini")]
  consume
----

We can check the connections by

----
query checkConnections() consume
----

We can see that the servers have received the ids 0, ..., 6 in the order of relation Workers7. To shut down all worker connections, we can use the command

----
query disconnect()
----

To make starting and stopping easier, we define functions:

----
let start7 = fun()
  Workers7 feed
  extend[Started: connect(.Server, .Port, "SecondoConfig.ini")]
  consume

let check7 = fun() checkConnections() consume

let stop7 = fun() disconnect()
----
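From now on, the connections can be handled with these functions, for example (assuming the monitors are running):

----
query start7()

query check7()

query stop7()
----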
5 Distributed NRW with 7 Workers

5.1 Distributing by a Standard Attribute

We wish to distribute the Roads relation into a distributed array with 7 slots. We first need to create the array on the master:

----
let Roads0707_Osm_id = Workers7 feed
  extend[Config: "SecondoConfig.ini"]
  createDarray2[7, "Roads0707_Osm_id", Roads, Server, Port, Config]
----

The input stream defines the available workers and a configuration file, whose contents do not really matter here. The parameters in square brackets are the number of slots of the array, the name to be used for the components in the worker databases, a relation from which the tuple type is derived, and then the server, port, and configuration file attributes to be used for the workers.

The actual data distribution is done with a ddistribute4 operation:

----
query Roads feed
  ddistribute4[hashvalue(.Osm_id, 999997), Roads0707_Osm_id, "Roads0707_Osm_id"]
----

5:09 min (309.308 sec)

It distributes the tuples of the input stream by the function given as the first argument in brackets into the array given as the second argument. The last argument serves to name the components in the worker databases.

We count the tuples in each partition:

----
query Roads0707_Osm_id dloop2["", . count] getValue
----

0.89 seconds

Here the dloop2 operator builds a new distributed array of integers. By getValue we create from it a regular array on the master, which can be displayed. We can also use the tie operation of the regular array to sum up these numbers:

----
query Roads0707_Osm_id dloop2["", . count] getValue tie[. + ..]
----

0.93 seconds; result 1505462

----
query Roads count
----

Result 1505462. We can see that the total number of tuples distributed agrees with the cardinality of the original relation.

A sequential query:

----
query Roads feed filter[.Type contains "primary"] count
----

15.6, 11.6, 15.4 seconds; result 28388

The distributed version:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[.Type contains "primary"] count]
  getValue tie[. + ..]
----

5.5, 5.0, 5.1 seconds; result 28388

A sequential self join of Roads on Osm\_id:

----
query Roads feed {r1} Roads feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2] count
----

27.58, 24.73, 24.79 seconds; result 1505462 (correct)

The parallel version:

----
query Roads0707_Osm_id Roads0707_Osm_id
  dloop2a["", . feed {r1} .. feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2] count]
  getValue tie[. + ..]
----

9.3, 9.15, 10.28 seconds

The speedup is only about 3, although it could be 7. Why? A query on a single partition on one worker takes 4.85, 4.74, 4.70 seconds. Running the join on all slots concurrently (without summing up the counts):

----
query Roads0707_Osm_id Roads0707_Osm_id
  dloop2a["peter", . feed {r1} .. feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2]
    count]
  getValue
----

6.0, 6.0, 6.0 seconds
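A rough calculation based on these timings: the sequential join takes about 25 seconds and the distributed one about 9.5 seconds, a speedup of roughly 25 / 9.5 = 2.6. Since a single slot alone already needs about 4.8 seconds, even perfect parallelism could not push the distributed time below that, which bounds the achievable speedup by about 25 / 4.8 = 5.2. Moreover, running all seven slots concurrently takes 6.0 rather than 4.8 seconds, which suggests some contention among the workers sharing a machine, on ralf1 also with the master.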
5.2 Using a Local Object on the Master in Queries on Workers

We define a rectangle around the area of Dortmund as follows:

----
let Dortmund = [const rect value (7.34 7.70 51.42 51.61)]
----

We now formulate a distributed query involving this object:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[bbox(.GeoData) intersects Dortmund] count]
  getValue tie[. + ..]
----

We get an error message, because the object Dortmund mentioned in the query is known on the master, but not on the workers. With the ~share~ operation we can move it into the databases of all workers connected:

----
query share("Dortmund", TRUE)
----

The second parameter determines whether a possibly already existing object of the same name in a worker database should be overwritten. After sharing the object, the original query works fine:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[bbox(.GeoData) intersects Dortmund] count]
  getValue tie[. + ..]
----

5.66, 5.62 seconds; result 67078

We compare with the same query performed locally on the master:

----
query Roads feed filter[bbox(.GeoData) intersects Dortmund] count
----

12.66, 12.04 seconds; result 67068

*/