/*
//paragraph [10] title: [{\Large \bf ] [}]
//characters [1] formula: [$] [$]
//[ae] [\"{a}]
//[oe] [\"{o}]
//[ue] [\"{u}]
//[ss] [{\ss}]
//[Ae] [\"{A}]
//[Oe] [\"{O}]
//[Ue] [\"{U}]
//[**] [$**$]
//[toc] [\tableofcontents]
//[=>] [\verb+=>+]
//[:Section Translation] [\label{sec:translation}]
//[Section Translation] [Section~\ref{sec:translation}]
//[:Section 4.1.1] [\label{sec:4.1.1}]
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
//[newpage] [\newpage]

[10] Creating a Distributed Spatial Database Using the Distributed2Algebra

Ralf Hartmut G[ue]ting, July 2015

[toc]

[newpage]

1 Introduction

In this example, we set up a distributed spatial database using the Distributed2Algebra of Secondo. We use a mini-cluster in Ralf's office, consisting of two computers running Ubuntu 14.04. The two computers provide the following resources:

  * ralf1: 4 disks, 6 cores, 8 GB main memory
  * ralf2: 4 disks, 8 cores, 32 GB main memory

On ralf1, we use one disk exclusively for the master and the remaining three for workers. Regarding cores and main memory, we let the master overlap with the workers. The main case of master and workers working simultaneously is data distribution from the master to the workers and back. In this case, each single worker is not very active compared to the master (which is quite busy), so the shared core should not be a problem. Similarly, these transfers do not use a lot of memory. We therefore configure the system as follows:

  * Master (ralf1): 1 disk, 1 core, 6 GB
  * Workers:
    * ralf1: 3 disks, 6 cores, 1000 MB per core (about 6 GB)
    * ralf2: 4 disks, 8 cores, 3600 MB per core (about 28 GB)

We always leave some memory for the operating system. For the workers we also leave a bit of extra headroom in case query operators underestimate their memory consumption.

2 Passphrase-less Connection

The first step is to set up an ssh connection from the master to the workers, so that one can start SecondoMonitors on the computing nodes without the user having to enter a password. This is done as follows.

(1) In the master's home directory, enter

----
ssh-keygen -t dsa
----

This generates a key pair. The system asks for a passphrase; just press return for an empty passphrase. Also press return to accept the default location for storing the key.

(2) Configure ssh to know the worker computers. Into the file .ssh/config, we put the following:

----
Host *ralf1
  HostName 132.176.69.130
  User ralf
Host *ralf2
  HostName 132.176.69.78
  User ralf
----

This makes the worker computers available under the names ralf1 and ralf2 and also sets the user name to be used on these computers.

(3) Transmit the public key to the target computers:

----
ssh-copy-id -i .ssh/id_dsa.pub <user>@<server>
ssh-copy-id -i .ssh/id_dsa.pub ralf@132.176.69.78
----

If the master system does not have the command ssh-copy-id, one can use instead:

----
cat ~/.ssh/*.pub | ssh <user>@<server> 'umask 077; cat >>.ssh/authorized_keys'
----

At this point the system on ralf2 asks once for a password. Subsequently it is possible to log in on ralf2 by simply typing

----
ssh ralf2
----

One should log in once on each of the nodes, because the master needs to enter each node into its list of known hosts.
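To verify the setup, the following minimal shell sketch (assuming the host aliases ralf1 and ralf2 from the .ssh/config above) performs this initial login on all nodes; no password prompt should appear:

----
# log in once to each worker and print its host name;
# this also records each node in the master's list of known hosts
for h in ralf1 ralf2; do
  ssh $h hostname
done
----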
3 Getting Spatial Data to the Master

We use OpenStreetMap data about the German state of North Rhine-Westphalia, obtained in the form of shapefiles from GeoFabrik. On http://download.geofabrik.de/ one can navigate to it by selecting Europe, then Germany. Then, from the table Sub Regions, in the row for Nordrhein-Westfalen, download

http://download.geofabrik.de/europe/germany/nordrhein-westfalen-latest.shp.zip

You can also use this link to download directly, if it is still available. Unpack the zip file into a directory nrw within secondo/bin. The dataset contains 8 kinds of objects, for example, Roads, Buildings, and Waterways.

In the directory secondo/bin, start a Secondo system without transactions (suitable for loading data) using the command

----
SecondoTTYNT
----

At the prompt, enter

----
Secondo => @Scripts/nrwImportShape.SEC
----

The contents of the file nrwImportShape.SEC are shown here:

----
# Importing NRW Data

close database

create database nrw

open database nrw

let Roads =
  dbimport2('../bin/nrw/roads.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/roads.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Waterways =
  dbimport2('../bin/nrw/waterways.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/waterways.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Railways =
  dbimport2('../bin/nrw/railways.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/railways.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Points =
  dbimport2('../bin/nrw/points.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/points.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Places =
  dbimport2('../bin/nrw/places.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/places.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Natural =
  dbimport2('../bin/nrw/natural.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/natural.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Buildings =
  dbimport2('../bin/nrw/buildings.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/buildings.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume

let Landuse =
  dbimport2('../bin/nrw/landuse.dbf')
  addcounter[No, 1]
  shpimport2('../bin/nrw/landuse.shp')
  namedtransformstream[GeoData]
  addcounter[No2, 1]
  mergejoin[No, No2]
  remove[No, No2]
  filter[isdefined(bbox(.GeoData))]
  validateAttr
  consume
----

This creates the relations Roads, Waterways, etc. within a new database nrw. By now this dataset is fairly large; for example, it contains about 6.5 million buildings (see the cardinalities below). It takes roughly one hour to run this script.
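After the script has finished, a quick plausibility check is to count the tuples of some of the new relations, for example (a minimal sketch; the exact cardinalities depend on the downloaded snapshot):

----
query Roads count

query Buildings count
----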
Opening the database in the query optimizer collects information about the existing database objects, which can be displayed with the predicate ~showDatabase~:

----
2 ?- showDatabase.
Collected information for database 'nrw':

Relation Natural
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     4500.864
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   129317
	Avg.TupleSize: 4745.864 = sizeTerm(480.0,245.0,4500.864)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Places
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          point    32.0     32.0      0
Population       int      16.0     5.0       0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   15908
	Avg.TupleSize: 130.0 = sizeTerm(376.0,130.0,0)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Points
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          point    32.0     32.0      0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Timestamp        string   64.0     26.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   490421
	Avg.TupleSize: 151.0 = sizeTerm(424.0,151.0,0)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Railways
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     232.896  232.896   1189.248
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   53082
	Avg.TupleSize: 1515.144 = sizeTerm(560.896,325.89599609375,1189.24800390625)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Roads
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     221.76   221.76    948.864
Maxspeed         int      16.0     5.0       0
Tunnel           int      16.0     5.0       0
Bridge           int      16.0     5.0       0
Oneway           int      16.0     5.0       0
Type             string   64.0     22.0      0
Ref              string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   1505462
	Avg.TupleSize: 1305.624 = sizeTerm(677.76,356.760009765625,948.863990234375)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Waterways
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          line     250.944  250.944   1855.87
Width            int      16.0     5.0       0
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   80904
	Avg.TupleSize: 2204.816 = sizeTerm(594.944,348.9440002441406,1855.8719997558592)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Buildings
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     1264.896
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   6516159
	Avg.TupleSize: 1509.896 = sizeTerm(480.0,245.0,1264.896)
	(Tuple size in memory is 136 + sum of attribute sizes.)

Relation Landuse
	(Auxiliary objects:)
AttributeName    Type     Memory   DiskCore  DiskLOB
GeoData          region   152.0    152.0     3151.488
Type             string   64.0     22.0      0
Name             string   64.0     54.0      0
Osm_id           string   64.0     17.0      0

Indices:
Ordering: []
	Cardinality:   359759
	Avg.TupleSize: 3396.488 = sizeTerm(480.0,245.0,3151.488)
	(Tuple size in memory is 136 + sum of attribute sizes.)

(Type 'showDatabaseSchema.' to view the complete database schema.)
true.

3 ?-
----

4 Setting Up the Distributed Infrastructure

In the introduction it was mentioned that we wish to use the following resources:

  * Master (ralf1): 1 disk, 1 core, 6 GB
  * Workers:
    * ralf1: 3 disks, 6 cores, 1000 MB per core (about 6 GB)
    * ralf2: 4 disks, 8 cores, 3600 MB per core (about 28 GB)

4.1 Setting Up the Monitors

Per worker disk we need to start one SecondoMonitor. We can start the monitors using a script ~remoteMonitors~ from the secondo/bin directory. It takes the following parameters (see also remoteMonitors.readme):

----
remoteMonitors <description file> <action>
----

Here the ~description file~ contains entries describing the monitors to be started, one line per monitor. Such a line has the format

----
<server> <configuration file> [<user>]
----

Here <server> is the IP address or the name of the node on which the monitor is to be started, and <configuration file> is the version of the SecondoConfig.ini file to be used. The third parameter is only needed to specify the user name in case it is different on the remote computer. The ~action~ is one of \{start, check, stop\}, with the obvious meanings.

Here we provide a description file ~ClusterRalf7~ with the following content:

----
132.176.69.130 SecondoConfig.ini.130.1271
132.176.69.130 SecondoConfig.ini.130.1272
132.176.69.130 SecondoConfig.ini.130.1273
132.176.69.78 SecondoConfig.ini.78.1274
132.176.69.78 SecondoConfig.ini.78.1275
132.176.69.78 SecondoConfig.ini.78.1276
132.176.69.78 SecondoConfig.ini.78.1277
----

The SecondoConfig.ini files need to lie in the secondo/bin directory of the respective computers (130 and 78). They are obtained from the standard SecondoConfig.ini by modifying

  1 the database directory to be used,

  2 the host IP address,

  3 the port,

  4 the global memory per server.

We have:

  * 130: GlobalMemory=1000
  * 78: GlobalMemory=3600

The secondo-databases directories at the respective disk locations need to be created before using the ~remoteMonitors~ script.
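Putting this together, the file for the first monitor on 130 might contain entries like the following (a sketch only; the database directory path is hypothetical, the key names are those of the standard SecondoConfig.ini):

----
# from SecondoConfig.ini.130.1271 (sketch; the path below is made up)
SecondoHome=/disk1/ralf/secondo-databases
SecondoHost=132.176.69.130
SecondoPort=1271
GlobalMemory=1000
----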
4.2 Starting and Stopping the Monitors

We can now start the monitors:

----
remoteMonitors ClusterRalf7 start
----

As a result, we see:

----
ralf@ralf-ubuntu6:~/secondo/bin$ remoteMonitors ClusterRalf7 start
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1271
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1272
Try to start monitor on server 132.176.69.130
Monitor is running now at port 1273
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1274
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1275
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1276
Try to start monitor on server 132.176.69.78
Monitor is running now at port 1277
ralf@ralf-ubuntu6:~/secondo/bin$ remoteMonitors ClusterRalf7 check
Try to check monitor on server 132.176.69.130
Monitor is running on port 1271 at 132.176.69.130
Try to check monitor on server 132.176.69.130
Monitor is running on port 1272 at 132.176.69.130
Try to check monitor on server 132.176.69.130
Monitor is running on port 1273 at 132.176.69.130
Try to check monitor on server 132.176.69.78
Monitor is running on port 1274 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1275 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1276 at 132.176.69.78
Try to check monitor on server 132.176.69.78
Monitor is running on port 1277 at 132.176.69.78
ralf@ralf-ubuntu6:~/secondo/bin$
----

We can stop all monitors using the command

----
remoteMonitors ClusterRalf7 stop
----

4.3 Setting Up, Starting and Stopping Workers

Workers are defined in a relation of the database to be used, hence in nrw. We will first use one worker (server) per disk in a set of experiments; later we use one worker per core.

----
let Workers7 = [const rel(tuple([Server: string, Port: int])) value (
  ("132.176.69.130" 1271)
  ("132.176.69.130" 1272)
  ("132.176.69.130" 1273)
  ("132.176.69.78" 1274)
  ("132.176.69.78" 1275)
  ("132.176.69.78" 1276)
  ("132.176.69.78" 1277)
)]
----

We now connect to one server for each entry:

----
query Workers7 feed
  extend[Started: connect(.Server, .Port, "SecondoConfig.ini")]
  consume
----

We can check the connections by

----
query checkConnections() consume
----

We can see that the servers have received the ids 0, ..., 6 in the order of relation Workers7. To shut down all worker connections, we can use the command

----
query disconnect()
----

To make starting and stopping easier, we define functions:

----
let start7 = fun()
  Workers7 feed
  extend[Started: connect(.Server, .Port, "SecondoConfig.ini")]
  consume

let check7 = fun() checkConnections() consume

let stop7 = fun() disconnect()
----
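From now on, the connections can be handled with these functions, for example (assuming the monitors are running):

----
query start7()

query check7()

query stop7()
----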
5 Distributed NRW with 7 Workers

5.1 Distributing by a Standard Attribute

We wish to distribute the Roads relation into a distributed array with 7 slots. We first need to create the array on the master:

----
let Roads0707_Osm_id = Workers7 feed
  extend[Config: "SecondoConfig.ini"]
  createDarray2[7, "Roads0707_Osm_id", Roads, Server, Port, Config]
----

The input stream defines the available workers and a configuration file, whose contents do not really matter here. The parameters in square brackets are the number of slots of the array, the name to be used for the components in the worker databases, a relation from which the tuple type is derived, and then the server, port, and configuration file attributes to be used for the workers.

The actual data distribution is done with a ddistribute4 operation:

----
query Roads feed
  ddistribute4[hashvalue(.Osm_id, 999997), Roads0707_Osm_id, "Roads0707_Osm_id"]
----

5:09 min (309.308 sec)

It distributes the tuples of the input stream by the function given as the first argument in brackets into the array given as the second argument. The last argument serves to name the components in the worker databases.

We count the tuples in each partition:

----
query Roads0707_Osm_id dloop2["", . count] getValue
----

0.89 seconds

Here the dloop2 operator builds a new distributed array of integers. By getValue we create from it a regular array on the master, which can be displayed. We can also use the tie operation of the regular array to sum up these numbers:

----
query Roads0707_Osm_id dloop2["", . count] getValue tie[. + ..]
----

0.93 seconds; result 1505462

----
query Roads count
----

Result 1505462. We can see that the total number of tuples distributed agrees with the cardinality of the original relation.

A sequential query:

----
query Roads feed filter[.Type contains "primary"] count
----

15.6, 11.6, 15.4 seconds; result 28388

The distributed version:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[.Type contains "primary"] count]
  getValue tie[. + ..]
----

5.5, 5.0, 5.1 seconds; result 28388

A sequential self join of Roads on Osm\_id:

----
query Roads feed {r1} Roads feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2] count
----

27.58, 24.73, 24.79 seconds; result 1505462 (correct)

The parallel version:

----
query Roads0707_Osm_id Roads0707_Osm_id
  dloop2a["", . feed {r1} .. feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2] count]
  getValue tie[. + ..]
----

9.3, 9.15, 10.28 seconds

The speedup is only about 3, although it could be 7. Why? A query on a single partition on one worker takes 4.85, 4.74, 4.70 seconds. Running the join on all slots concurrently (without summing up the counts):

----
query Roads0707_Osm_id Roads0707_Osm_id
  dloop2a["peter", . feed {r1} .. feed {r2} itHashJoin[Osm_id_r1, Osm_id_r2]
    count]
  getValue
----

6.0, 6.0, 6.0 seconds
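A rough calculation based on these timings: the sequential join takes about 25 seconds and the distributed one about 9.5 seconds, a speedup of roughly 25 / 9.5 = 2.6. Since a single slot alone already needs about 4.8 seconds, even perfect parallelism could not push the distributed time below that, which bounds the achievable speedup by about 25 / 4.8 = 5.2. Moreover, running all seven slots concurrently takes 6.0 rather than 4.8 seconds, which suggests some contention among the workers sharing a machine, on ralf1 also with the master.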
5.2 Using a Local Object on the Master in Queries on Workers

We define a rectangle around the area of Dortmund as follows:

----
let Dortmund = [const rect value (7.34 7.70 51.42 51.61)]
----

We now formulate a distributed query involving this object:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[bbox(.GeoData) intersects Dortmund] count]
  getValue tie[. + ..]
----

We get an error message, because the object Dortmund mentioned in the query is known on the master, but not on the workers. With the ~share~ operation we can move it into the databases of all workers connected:

----
query share("Dortmund", TRUE)
----

The second parameter determines whether a possibly already existing object of the same name in a worker database should be overwritten. After sharing the object, the original query works fine:

----
query Roads0707_Osm_id
  dloop2["", . feed filter[bbox(.GeoData) intersects Dortmund] count]
  getValue tie[. + ..]
----

5.66, 5.62 seconds; result 67078

We compare with the same query performed locally on the master:

----
query Roads feed filter[bbox(.GeoData) intersects Dortmund] count
----

12.66, 12.04 seconds; result 67068

*/