/*
//paragraph [10] title: [{\Large \bf ] [}]
//[star] [$*$]
//[ue] [\"{u}]

[10] Example for Using the Distributed Query Optimizer

Ralf Hartmut G[ue]ting, 30.7.2020

This script allows one to create a small distributed example database based on database ~opt~ and use the distributed query optimizer with it.

1 Preliminaries

1.1 Preparing the Optimizer

Within the file ~calloptimizer.pl~ in directory ~secondo/Optimizer~ the loaded modules must be configured as shown here (currently line 827 ff). The standard configuration has [optimizerNewProperties] and [distributed] commented out. They must be loaded and [optimizer] commented out, as shown.

This is not necessary any more in recent versions of the optimizer (from 2021) as there is only one version of the optimizer now and [distributed] is loaded by default.

----
% The files for the standard optimization procedure will be
% loaded by default!
loadFiles(standard) :-
  ( not(loadedModule(standard)),
%    [optimizer],
    [optimizerNewProperties],  % requires also distributed.pl
    [costs2014],
    [statistics],
    [database],
    [operators],
    [boundary],
    [searchtree],
    [relations],
    [testExamples],
%    [operatorSQL], % operatorSQL
    [distributed],
% Section:Start:loadFiles_1_i
% Section:End:loadFiles_1_i
    retractall(loadedModule(_)),
    assert(loadedModule(standard))
  )
  ; true.
----

1.2 Preparing the Database

  1 Database ~opt~ must be present.

  2 Remote monitors have been started for a given ~Workers~ relation.

The following steps must be done manually:

----
open database opt
restore Workers from ...
let myPort = ...
----

The ~Workers~ relation must fit the Cluster file for which monitors have been started. The variable ~myPort~ must be set to a port number exclusive to this user.

Then run this script with the SecondoPLTTY interface from the Optimizer directory. This kind of script can be executed with

----
@%../bin/Scripts/DistOptExample.posec
----

This is explained in the Secondo User Manual, Section 5.16.
*/

distributedRelsAvailable

/*
Creates the three distributed relations

  * SEC2DISTRIBUTED

  * SEC2DISTINDEXES

  * SEC2WORKERS

which serve to provide the optimizer with information about distributed relations and indexes as well as the available workers.

2 Describing Distributed Relations

A distributed relation is described in the relation SEC2DISTRIBUTED with the following fields:

  * ~RelName~: the name of the (logical) distributed relation on the master

  * ~ArrayRef~: the name of the distributed array

  * ~DistType~: the type of the distributed array. Allowed values are ~dfarray~ (file based array) and ~darray~ (array stored in db).

  * ~NSlots~: the number of slots of the distributed array

  * ~PartType~: indicates how the relation is partitioned. Allowed values are

    * ~modulo~ (for d(f)distribute2),

    * ~random~ (d(f)distribute3),

    * ~function~ (d(f)distribute4),

    * ~share~ (replicated)

  * ~PartAttribute~: the attribute used to distribute the relation; for a random distribution the value is always [star].

  * ~PartParam~: an additional parameter to describe a partitioning such as the spatial grid object used. Ignored if not needed.

All attributes are of type ~string~ except ~NSlots~ which is of type ~int~.

3 Describing Distributed Indexes

On each slot of a darray representing a distributed relation we can create an index. The optimizer currently supports B-tree and R-tree indexes. Such an index is described in the relation SEC2DISTINDEXES with fields:

  * ~DistObj~: the d[f]array object representing the distributed relation

  * ~Attr~: the indexed attribute

  * ~IndexType~: the type of the index (~btree~ or ~rtree~)

  * ~IndexObj~: the d[f]array representing the distributed index

All attributes are of type ~string~.

4 Create Distributed Relations

Each distributed relation created must be described in SEC2DISTRIBUTED to be recognized by the optimizer.
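
To illustrate the description, here is a sketch of how the seven fields of SEC2DISTRIBUTED are filled for a relation ~plz~ that is partitioned by a hash function on attribute ~Ort~ into a ~darray~ named ~plzDOrte~ with 40 slots. The values are the ones used in this script; the annotations on the right are explanatory only and are not part of any command. The reading of the last value as a mere placeholder is an assumption based on the remark that ~PartParam~ is ignored if not needed.

----
RelName        "plz"        logical relation on the master
ArrayRef       "plzDOrte"   distributed array holding the tuples
DistType       "darray"     array stored in the database
NSlots         40           number of slots
PartType       "function"   created with ddistribute4
PartAttribute  "Ort"        attribute the partitioning function is applied to
PartParam      "*"          no additional parameter (assumed placeholder)
----

The order of the values in an ~insert~ command follows the order of the fields listed in Section 2.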
*/

update SEC2WORKERS := Workers;

let plzDOrte = plz feed ddistribute4["plzDOrte", hashvalue(.Ort, 999997), 40, Workers]

insert into SEC2DISTRIBUTED values ["plz", "plzDOrte", "darray", 40, "function", "Ort", "*"]

/*
The query

*/

select count(*) from plz_d

/*
works already and executes the plan:

----
query plzDOrte dmap["", . feed count] getValue tie[(. + .. )]
----

*/

let OrteDRandom = Orte feed ddistribute3["OrteDRandom", 40, TRUE, Workers]

insert into SEC2DISTRIBUTED values ["Orte", "OrteDRandom", "darray", 40, "random", "*", "*"]

/*
5 Some Example Queries

Some example queries that can be run are the following:

----
select * from plz_d where Ort = "Hannover"

select count(*) from [Orte_d as o, plz_d as p] where o.Ort = p.Ort

select Ort from Orte_d

select count(*) from plz_d

select[Ort, min(plz) as Smallest, count(*) as Anzahl]
from plz_d
where Ort starts "Mann"
groupby Ort.

select [Ort, Kennzeichen, BevT, count(*) as AnzahlPLZ,
  min(p.PLZ) as MinPLZ, max(p.PLZ) as MaxPLZ,
  avg(p.PLZ) as DurchschnittsPLZ]
from [Orte_d, plz_d as p]
where Ort = p.Ort
groupby [Ort, Kennzeichen, BevT]
----

6 Creating an Index

Each distributed index created needs to be described in SEC2DISTINDEXES to be recognized by the optimizer.

*/

let plzDOrte_Ort = plzDOrte dmap["plzDOrte_Ort", . createbtree[Ort]]

insert into SEC2DISTINDEXES values ["plzDOrte", "Ort", "btree", "plzDOrte_Ort"]

/*
7 Using an Index

----
select * from plz_d where Ort = "Unna"

select * from plz_d where Ort starts "Hann"
----

*/
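
/*
After the script has run, one way to check that the optimizer's description relations contain what was intended is to display them. This is only a sketch: it assumes that plain ~query~ commands are passed through to the Secondo kernel by the SecondoPLTTY interface in the same way as the ~let~ and ~update~ commands used above.

----
query SEC2DISTRIBUTED

query SEC2DISTINDEXES
----

Provided both relations were empty before the script was executed, one would expect two tuples in SEC2DISTRIBUTED (for ~plz~ and ~Orte~) and one tuple in SEC2DISTINDEXES (for the B-tree on ~Ort~).

*/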