/* //paragraph [10] title: [{\Large \bf ] [}] //characters [1] formula: [$] [$] //[ae] [\"{a}] //[oe] [\"{o}] //[ue] [\"{u}] //[ss] [{\ss}] //[Ae] [\"{A}] //[Oe] [\"{O}] //[Ue] [\"{U}] //[**] [$**$] //[toc] [\tableofcontents] //[=>] [\verb+=>+] //[:Section Translation] [\label{sec:translation}] //[Section Translation] [Section~\ref{sec:translation}] //[:Section 4.1.1] [\label{sec:4.1.1}] //[Section 4.1.1] [Section~\ref{sec:4.1.1}] //[Figure pog1] [Figure~\ref{fig:pog1.eps}] //[Figure pog2] [Figure~\ref{fig:pog2.eps}] //[newpage] [\newpage] [10] A Query Optimizer for Secondo Ralf Hartmut G[ue]ting, November - December 2002 [toc] [newpage] 1 Introduction 1.1 Overview This document not only describes, but ~is~ an optimizer for Secondo database systems. It contains the current source code for the optimizer, written in PROLOG. It can be compiled by a PROLOG system (SWI-Prolog 5.0 or higher) directly. The current version of the optimizer is capable of handling conjunctive queries, formulated in a relational environment. That is, it takes a set of relations together with a set of selection or join predicates over these relations and produces a query plan that can be executed by (the current relational system implemented in) Secondo. The selection of the query plan is based on cost estimates which in turn are based on given selectivities of predicates. Selectivities of predicates are maintained in a table (a set of PROLOG facts). If the selectivity of a predicate is not available from that table, then an interaction with the Secondo system should take place to determine the selectivity. There are various strategies conceivable for doing this which will be described elsewhere. However, the current version of the optimizer just emits a message that the selectivity is missing and quits. The optimizer also implements a simple SQL-like language for entering queries. 
The notation closely follows SQL, except that the lists that occur (lists of
attributes, relations, predicates) are written in PROLOG notation. Also note
that the where-clause is a list of predicates rather than an arbitrary boolean
expression and hence allows one to formulate conjunctive queries only.

1.2 Optimization Algorithm

The optimizer employs an optimization algorithm that is, as far as we know,
novel; it is based on ~shortest path search in a predicate order graph~. This
technique is remarkably simple to implement, yet efficient.

A predicate order graph (POG) is the graph whose nodes represent sets of
evaluated predicates and whose edges represent predicates, containing all
possible orders of predicates. Such a graph for three predicates ~p~, ~q~, and
~r~ is shown in [Figure pog1].

Figure 1: A predicate order graph for three predicates ~p~, ~q~ and ~r~ [pog1.eps]

Here the bottom node has no predicate evaluated and the top node has all
predicates evaluated. The example illustrates, more precisely, possible
sequences of selections on an argument relation of size 1000. If selectivities
of predicates are given (for ~p~ it is 1/2, for ~q~ 1/10, and for ~r~ 1/5),
then we can annotate the POG with sizes of intermediate results as shown,
assuming that all predicates are independent (not ~correlated~). This means
that the selectivity of a predicate is the same regardless of the order of
evaluation, which of course need not hold in practice.

If we can further compute for each edge of the POG possible evaluation
methods, adding a new ``executable'' edge for each method, and mark the edge
with estimated costs for this method, then finding a shortest path through the
POG corresponds to finding the cheapest query plan. [Figure pog2] shows an
example of a POG annotated with evaluation methods.

Figure 2: A POG annotated with evaluation methods [pog2.eps]

In this example, there is only a single method associated with each edge.
In general, however, there will be several methods. The example represents the query: ---- select * from Staedte, Laender, Regiert where Land = LName and PName = 'CDU' and LName = PLand ---- for relation schemas ---- Staedte(SName, Bev, Land) Laender(LName, LBev) Regiert(PName, PLand) ---- Hence the optimization algorithm described and implemented in the following sections proceeds in the following steps: 1 For given relations and predicates, construct the predicate order graph and store it as a set of facts in memory (Sections 2 through 4). 2 For each edge, construct corresponding executable edges (called ~plan edges~ below). This is controlled by optimization rules describing how selections or joins can be translated (Sections 5 and 6). 3 Based on sizes of arguments and selectivities (stored in the file ~database.pl~) compute the sizes of all intermediate results. Also annotate edges of the POG with selectivities (Section 7). 4 For each plan edge, compute its cost and store it in memory (as a set of facts). This is based on sizes of arguments and the selectivity associated with the edge and on a cost function (predicate) written for each operator that may occur in a query plan (Section 8). 5 The algorithm for finding shortest paths by Dijkstra is employed to find a shortest path through the graph of plan edges annotated with costs (called ~cost edges~). This path is transformed into a Secondo query plan and returned (Section 9). 6 Finally, a simple subset of SQL in a PROLOG notation is implemented. So it is possible to enter queries in this language. The optimizer determines from it the lists of relations and predicates in the form needed for constructing the POG, and then invokes step 1 (Section 11). 2 Data Structures In the construction of the predicate order graph, the following data structures are used. ---- pr(P, A) pr(P, B, C) ---- A selection or join predicate, e.g. pr(p, a), pr(q, b, c). 
These denote a selection predicate p on relation a, and a join predicate q on
relations b and c.

----	arp(Arg, Rels, Preds)
----

An argument, relations, predicate triple. It describes a set of relations
~Rels~ on which the predicates ~Preds~ have been evaluated. To access the
result of this evaluation one needs to refer to ~Arg~.

Arg is either arg(N) or res(N), N an integer. Examples: arg(5), res(1)

Rels is a list of relation names, e.g. [a, b, c]

Preds is a list of predicate names, e.g. [p, q, r]

----	node(No, Preds, Partition)
----

A node. ~No~ is the number of the node, into which the evaluated predicates
are encoded: each bit corresponds to a predicate number. For example, node
number 5 = 101 (binary) says that the first predicate (no 1) and the third
predicate (no 4) have been evaluated in this node. For predicate i, its
predicate number is "2^{i-1}"[1]. ~Preds~ is the list of names of evaluated
predicates, e.g. [p, q]. ~Partition~ is a list of arp elements, see above.

----	edge(Source, Target, Term, Result, Node, PredNo)
----

An edge, representing a predicate. ~Source~ and ~Target~ are the numbers of
source and target nodes in the predicate order graph, e.g. 0 and 1. ~Term~ is
either a selection or a join, for example, select(arg(0), pr(p, a)) or
join(res(4), res(1), pr(q, a, b)). ~Result~ is the number of the node into
which the result of this predicate application should be written. Normally it
is the same as Target, but for an edge leading to a node combining several
independent results, it is the number of the ``real'' node to obtain this
result. An example of this can be found in [Figure pog2] where the join edge
leading from node 3 to node 7 does not use the result of node 3 (there is
none) but rather the two independent results from nodes 1 and 2 (this pair is
conceptually the result available in node 3).

~Node~ is the source node for this edge, in the form node(...) as described
above.

~PredNo~ is the predicate number for the predicate represented by this edge.
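
The encoding of evaluated predicates into node numbers described above can be
illustrated by a small, self-contained sketch. The helper names predNumber,
nodeNumber and nodePreds are hypothetical; they are introduced here only for
illustration and are not used by the optimizer. The sketch assumes that
predicates are counted from 1 and that the indices passed to nodeNumber are
distinct.

```prolog
% Sketch only: predNumber/2, nodeNumber/2 and nodePreds/3 are hypothetical
% helper names and are not part of the optimizer itself.

% predNumber(+I, -PredNo): predicate I (counted from 1) has number 2^(I-1).
predNumber(I, PredNo) :-
  PredNo is 1 << (I - 1).

% nodeNumber(+Indices, -No): number of the node in which exactly the
% predicates with the given (distinct) indices have been evaluated.
nodeNumber(Indices, No) :-
  maplist(predNumber, Indices, PredNos),
  sum_list(PredNos, No).

% nodePreds(+No, +M, -Indices): decode node number No for a query with M
% predicates into the list of indices of the evaluated predicates.
nodePreds(No, M, Indices) :-
  findall(I,
          ( between(1, M, I),
            predNumber(I, PredNo),
            No /\ PredNo =\= 0 ),
          Indices).
```

With these definitions, the goal nodeNumber([1, 3], No) yields No = 5, and
nodePreds(5, 3, Is) yields Is = [1, 3], matching the example of node number 5
given for node(No, Preds, Partition).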
Predicate numbers are of the form "2^i" as explained for nodes. 3 Construction of the Predicate Order Graph 3.1 pog ---- pog(Rels, Preds, Nodes, Edges) :- ---- For a given list of relations ~Rels~ and predicates ~Preds~, ~Nodes~ and ~Edges~ are the predicate order graph where edges are annotated with selection and join operations applied to the correct arguments. Example call: ---- pog([staedte, laender], [pr(p, staedte), pr(q, laender), pr(r, staedte, laender)], N, E). ---- */ pog(Rels, Preds, Nodes, Edges) :- length(Rels, N), reverse(Rels, Rels2), deleteArguments, partition(Rels2, N, Partition0), length(Preds, M), reverse(Preds, Preds2), pog2(Partition0, M, Preds2, Nodes, Edges), deleteNodes, storeNodes(Nodes), deleteEdges, storeEdges(Edges), % RHG 2014 Create plan and cost edges during shortest path search. % deletePlanEdges, deleteVariables, % createPlanEdges, HighNode is 2**M -1, retract(highNode(_)), assert(highNode(HighNode)), deleteSizes. % deleteCostEdges. % end RHG 2014 /* 3.2 partition ---- partition(Rels, N, Partition0) :- ---- Given a list of ~N~ relations ~Rel~, return an initial partition such that each relation r is packed into the form arp(arg(i), [r], []). */ partition([], _, []). partition([Rel | Rels], N, [Arp | Arps]) :- N1 is N-1, Arp = arp(arg(N), [Rel], []), assert(argument(N, Rel)), partition(Rels, N1, Arps). /* 3.3 pog2 ---- pog2(Partition0, NoOfPreds, Preds, Nodes, Edges) :- ---- For the given start partition ~Partition0~, a list of predicates ~Preds~ containing ~NoOfPred~ predicates, return the ~Nodes~ and ~Edges~ of the predicate order graph. */ pog2(Part0, _, [], [node(0, [], Part0)], []). 
pog2(Part0, NoOfPreds, [Pred | Preds], Nodes, Edges) :- N1 is NoOfPreds-1, PredNo is 2**N1, pog2(Part0, N1, Preds, NodesOld, EdgesOld), newNodes(Pred, PredNo, NodesOld, NodesNew), newEdges(Pred, PredNo, NodesOld, EdgesNew), copyEdges(Pred, PredNo, EdgesOld, EdgesCopy), append(NodesOld, NodesNew, Nodes), append(EdgesOld, EdgesNew, Edges2), append(Edges2, EdgesCopy, Edges). /* 3.4 newNodes ---- newNodes(Pred, PredNo, NodesOld, NodesNew) :- ---- Given a predicate ~Pred~ with number ~PredNo~ and a list of nodes ~NodesOld~ resulting from evaluating all predicates with lower numbers, construct a list of nodes which result from applying to each of the existing nodes the predicate ~Pred~. */ newNodes(_, _, [], []). newNodes(Pred, PNo, [Node | Nodes], [NodeNew | NodesNew]) :- newNode(Pred, PNo, Node, NodeNew), newNodes(Pred, PNo, Nodes, NodesNew). newNode(Pred, PNo, node(No, Preds, Part), node(No2, [Pred | Preds], Part2)) :- No2 is No + PNo, copyPart(Pred, PNo, Part, Part2). /* 3.5 copyPart ---- copyPart(Pred, PNo, Part, Part2) :- ---- copy the partition ~Part~ of a node so that the new partition ~Part2~ after applying the predicate ~Pred~ with number ~PNo~ results. This means that for a selection predicate we have to find the arp containing its relation and modify it accordingly, the other arps in the partition are copied unchanged. For a join predicate we have to find the two arps containing its two relations and to merge them into a single arp; the remaining arps are copied unchanged. Or a join predicate may find its two relations in the same arp which means another join on the same two relations has already been performed. */ copyPart(_, _, [], []). copyPart(pr(P, Rel), PNo, Arps, [Arp2 | Arps2]) :- select(X, Arps, Arps2), X = arp(Arg, Rels, Preds), member(Rel, Rels), !, nodeNo(Arg, No), ResNo is No + PNo, Arp2 = arp(res(ResNo), Rels, [P | Preds]). 
copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :- select(X, Arps, Arps2), X = arp(Arg, Rels, Preds), member(R1, Rels), member(R2, Rels), !, nodeNo(Arg, No), ResNo is No + PNo, Arp2 = arp(res(ResNo), Rels, [P | Preds]). copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :- select(X, Arps, Rest), X = arp(ArgX, RelsX, PredsX), member(R1, RelsX), select(Y, Rest, Arps2), Y = arp(ArgY, RelsY, PredsY), member(R2, RelsY), !, nodeNo(ArgX, NoX), nodeNo(ArgY, NoY), ResNo is NoX + NoY + PNo, append(RelsX, RelsY, Rels), append(PredsX, PredsY, Preds), Arp2 = arp(res(ResNo), Rels, [P | Preds]). nodeNo(arg(_), 0). nodeNo(res(N), N). /* 3.6 newEdges ---- newEdges(Pred, PredNo, NodesOld, EdgesNew) :- ---- for each of the nodes in ~NodesOld~ return a new edge in ~EdgesNew~ built by applying the predicate ~Pred~ with number ~PNo~. */ newEdges(_, _, [], []). newEdges(Pred, PNo, [Node | Nodes], [Edge | Edges]) :- newEdge(Pred, PNo, Node, Edge), newEdges(Pred, PNo, Nodes, Edges). newEdge(pr(P, Rel), PNo, Node, Edge) :- findRel(Rel, Node, Source, Arg), Target is Source + PNo, nodeNo(Arg, ArgNo), Result is ArgNo + PNo, Edge = edge(Source, Target, select(Arg, pr(P, Rel)), Result, Node, PNo). newEdge(pr(P, R1, R2), PNo, Node, Edge) :- findRels(R1, R2, Node, Source, Arg), Target is Source + PNo, nodeNo(Arg, ArgNo), Result is ArgNo + PNo, Edge = edge(Source, Target, select(Arg, pr(P, R1, R2)), Result, Node, PNo). newEdge(pr(P, R1, R2), PNo, Node, Edge) :- findRels(R1, R2, Node, Source, Arg1, Arg2), Target is Source + PNo, nodeNo(Arg1, Arg1No), nodeNo(Arg2, Arg2No), Result is Arg1No + Arg2No + PNo, Edge = edge(Source, Target, join(Arg1, Arg2, pr(P, R1, R2)), Result, Node, PNo). /* 3.7 findRel ---- findRel(Rel, Node, Source, Arg):- ---- find the relation ~Rel~ within a node description ~Node~ and return the node number ~No~ and the description ~Arg~ of the argument (e.g. res(3)) found within the arp containing Rel. 
----	findRels(Rel1, Rel2, Node, Source, Arg1, Arg2):-
----

Similar for two relations.

*/

findRel(Rel, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel, RelsX).

findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX),
  member(Rel2, RelsX).

findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX, ArgY) :-
  select(X, Arps, Rest),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX), !,
  select(Y, Rest, _),
  Y = arp(ArgY, RelsY, _),
  member(Rel2, RelsY).

/*
3.8 copyEdges

----	copyEdges(Pred, PredNo, EdgesOld, EdgesCopy):-
----

Given a set of edges ~EdgesOld~ and a predicate ~Pred~ with number ~PredNo~,
return a copy of each edge in ~EdgesOld~ in ~EdgesCopy~ such that the copied
version reflects a previous application of predicate ~Pred~.

This is implemented by retrieving from each old edge its start node,
constructing for this start node and predicate ~Pred~ a target node, to which
the predicate associated with the old edge is then applied.

*/

copyEdges(_, _, [], []).

copyEdges(Pred, PNo, [Edge | Edges], [Edge2 | Edges2]) :-
  Edge = edge(_, _, Term, _, Node, PNo2),
  pred(Term, Pred2),
  newNode(Pred, PNo, Node, NodeNew),
  newEdge(Pred2, PNo2, NodeNew, Edge2),
  copyEdges(Pred, PNo, Edges, Edges2).

pred(select(_, P), P).
pred(join(_, _, P), P).

/*
3.9 writeEdgeList

----	writeEdgeList(List):-
----

Write the list of edges ~List~. The clause for the empty list lets the
predicate succeed once all edges have been written.

*/

writeEdgeList([]).

writeEdgeList([edge(Source, Target, Term, _, _, _) | Edges]) :-
  write(Source), write('-'), write(Target), write(':'), write(Term), nl,
  writeEdgeList(Edges).

/*
4 Managing the Graph in Memory

4.1 Storing and Deleting Nodes and Edges

----	storeNodes(NodeList).
	storeEdges(EdgeList).
	deleteNodes.
	deleteEdges.
----

Just as the names say. Store a list of nodes or edges, respectively, as
facts; and delete them from memory again.

*/

storeNodes([Node | Nodes]) :- assert(Node), storeNodes(Nodes).
storeNodes([]).

storeEdges([Edge | Edges]) :- assert(Edge), storeEdges(Edges).
storeEdges([]). deleteNode :- retract(node(_, _, _)), fail. deleteNodes :- not(deleteNode). deleteEdge :- retract(edge(_, _, _, _, _, _)), fail. deleteEdges :- not(deleteEdge). deleteArgument :- retract(argument(_, _)), fail. deleteArguments :- not(deleteArgument). /* 4.2 Writing Nodes and Edges ---- writeNodes. writeEdges. ---- Write the currently stored nodes and edges, respectively. */ writeNode :- node(No, Preds, Partition), write('Node: '), write(No), nl, write('Preds: '), write(Preds), nl, write('Partition: '), write(Partition), nl, nl, fail. writeNodes :- not(writeNode). writeEdge :- edge(Source, Target, Term, Result, _, _), write('Source: '), write(Source), nl, write('Target: '), write(Target), nl, write('Term: '), write(Term), nl, write('Result: '), write(Result), nl, nl, fail. writeEdges :- not(writeEdge). /* 5 Rule-Based Translation of Selections and Joins [:Section Translation] 5.1 Precise Notation for Input Since now we have to look into the structure of predicates, and need to be able to generate Secondo executable expressions in their precise format, we need to define the input notation precisely. 5.1.1 The Source Language [:Section 4.1.1] We assume the queries can be entered basically as select-from-where structures, as follows. Let schemas be given as: ---- plz(PLZ:string, Ort:string) Staedte(SName:string, Bev:int, PLZ:int, Vorwahl:string, Kennzeichen:string) ---- Then we should be able to enter queries: ---- select SName, Bev from Staedte where Bev > 500000 ---- In the next example we need to avoid the name conflict for PLZ ---- select * from Staedte as s, plz as p where s.SName = p.Ort and p.PLZ > 40000 ---- In the PROLOG version, we will use the following notations: ---- rel(Name, Var, Case) ---- For example ---- rel(staedte, *, u) ---- is a term denoting the ~Staedte~ relation; ~u~ says that it is actually to be written in upper case whereas ---- rel(plz, *, l) ---- denotes the ~plz~ relation to be written in lower case. 
The second argument ~Var~ contains an explicit variable if it has been
assigned, otherwise the symbol [*]. If an explicit variable has been used in
the query, we need to perform renaming in the plan. For example, in the second
query above, the relations would be denoted as

----	rel(staedte, s, u)
	rel(plz, p, l)
----

Within predicates, attributes are annotated as follows:

----	attr(Name, Arg, Case)

	attr(ort, 2, u)
----

This says that ~ort~ is an attribute of the second argument within a join
condition, to be written in upper case. For a selection condition, the second
argument is ignored; it can be set to 0 or 1.

Hence for the two queries above, the translation would be

----	fromwhere(
	  [rel(staedte, *, u)],
	  [pr(attr(bev, 0, u) > 500000, rel(staedte, *, u))]
	)

	fromwhere(
	  [rel(staedte, s, u), rel(plz, p, l)],
	  [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
	     rel(staedte, s, u), rel(plz, p, l)),
	   pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l))]
	)
----

Note that the upper or lower case distinction refers only to the first letter
of a relation or attribute name. Other letters are written on the PROLOG side
in the same way as in Secondo.

Note further that if explicit variables are used, the attribute name will
include them, e.g. s:sName.

The projection occurring in the select-from-where statement is for the moment
not passed to the optimizer; it is treated outside.

So example 2 is rewritten as:

*/

example3 :- pog([rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
     rel(staedte, s, u), rel(plz, p, l) ),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l)),
   pr((attr(p:pLZ, 1, u) mod 5) = 0, rel(plz, p, l))], _, _).

/*
The two queries mentioned above are:

*/

example4 :- pog(
  [rel(staedte, *, u)],
  [pr(attr(bev, 1, u) > 500000, rel(staedte, *, u))],
  _, _).

example5 :- pog(
  [rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
     rel(staedte, s, u), rel(plz, p, l)),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))],
  _, _).
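
/*
The following is a minimal sketch of how the terms rel(Name, Var, Case) and
attr(Name, Arg, Case) introduced above relate to the names written in a
Secondo plan. The helper names upperFirst, relText and attrText are
hypothetical and serve only as an illustration; the optimizer performs the
actual conversion in ~plan\_to\_atom~ further below, which in addition appends
a blank to relation names. As in those rules, the sketch capitalizes the first
letter of every attribute name and turns a renamed attribute Var:Name into
Name followed by an underscore and Var.

```prolog
% Sketch only: upperFirst/2, relText/2 and attrText/2 are hypothetical names
% and are not part of the optimizer.

% upperFirst(+Atom, -Upper): Atom with its first letter in upper case.
upperFirst(Atom, Upper) :-
  sub_atom(Atom, 0, 1, _, First),
  sub_atom(Atom, 1, _, 0, Rest),
  upcase_atom(First, First2),
  atom_concat(First2, Rest, Upper).

% relText(+Rel, -Text): spelling of a relation name; only the case flag
% decides whether the first letter is capitalized.
relText(rel(Name, _, l), Name).
relText(rel(Name, _, u), Text) :-
  upperFirst(Name, Text).

% attrText(+Attr, -Text): spelling of an attribute name; an attribute that
% was qualified with an explicit variable gets that variable as a suffix.
attrText(attr(Var:Name, _, _), Text) :-
  !,
  upperFirst(Name, Name2),
  atomic_list_concat([Name2, '_', Var], Text).
attrText(attr(Name, _, _), Text) :-
  upperFirst(Name, Text).
```

For example, relText(rel(staedte, s, u), T) yields T = 'Staedte',
relText(rel(plz, p, l), T) yields T = plz, and
attrText(attr(s:sName, 1, u), T) yields T = 'SName_s'.

*/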
/*
5.1.2 The Target Language

In the target language, we use the following operators:

----	feed:		rel(Tuple) -> stream(Tuple)
	consume:	stream(Tuple) -> rel(Tuple)

	filter:		stream(Tuple) x (Tuple -> bool) -> stream(Tuple)

	product:	stream(Tuple1) x stream(Tuple2) -> stream(Tuple3)

				where Tuple3 = Tuple1 o Tuple2

	hashjoin:	stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
			x nbuckets -> stream(Tuple3)

				where	Tuple3 = Tuple1 o Tuple2
					attrname1 occurs in Tuple1
					attrname2 occurs in Tuple2
					nbuckets is the number of hash buckets
					to be used

	sortmergejoin:	stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
			-> stream(Tuple3)

				where	Tuple3 = Tuple1 o Tuple2
					attrname1 occurs in Tuple1
					attrname2 occurs in Tuple2

	loopjoin:	stream(Tuple1) x (Tuple1 -> stream(Tuple2))
			-> stream(Tuple3)

				where Tuple3 = Tuple1 o Tuple2

	exactmatch:	btree(Tuple, AttrType) x rel(Tuple) x AttrType
			-> stream(Tuple)

	extend:		stream(Tuple1) x (Newname x (Tuple -> Attrtype))+
			-> stream(Tuple2)

				where	Tuple2 is Tuple1 to which pairs
					(Newname, Attrtype) have been appended

	remove:		stream(Tuple1) x Attrname+ -> stream(Tuple2)

				where	Tuple2 is Tuple1 from which the mentioned
					attributes have been removed.

	project:	stream(Tuple1) x Attrname+ -> stream(Tuple2)

				where	Tuple2 is Tuple1 projected on the
					mentioned attributes.

	rename		stream(Tuple1) x NewName -> stream(Tuple2)

				where	Tuple2 is Tuple1 modified by appending
					"_newname" to each attribute name

	count		stream(Tuple) -> int

				count the number of tuples in a stream

	sortby		stream(Tuple) x (Attrname, asc/desc)+ -> stream(Tuple)

				sort stream lexicographically by the given
				attribute names

	groupby		stream(Tuple) x GroupAttrs x NewFields -> stream(Tuple2)

				group stream by the grouping attributes; for each
				group compute new fields each of which is
				specified in the form Attrname : Expr. The
				argument stream must already be sorted by the
				grouping attributes.

	dloop		darray(X) x string x (X->Y) -> darray(Y)

				Performs a function on each element of a darray
				instance. The string argument specifies the name
				of the result. If the name is undefined or an
				empty string, a name is generated automatically.

	dloop2		darray(X) x darray(Y) x string x (fun : X x Y -> Z)
			-> darray(Z)

				Performs a function on the elements of two darray
				instances. The string argument specifies the name
				of the resulting darray. If the string is
				undefined or empty, a name is generated
				automatically.

	dmap		d[f]array x string x fun -> d[f]array

				Performs a function on a distributed file array.
				If the string argument is empty or undefined, a
				name for the result is chosen automatically. If
				not, the string specifies the name. The result is
				of type dfarray if the function produces a tuple
				stream or a relation; otherwise the result is a
				darray.

	dmap2		d[f]array x d[f]array x string x fun -> d[f]array

				Joins the slots of two distributed arrays.

	partition	d[f]array(rel(tuple)) x string x (tuple->int) x int
			-> dfmatrix

				Redistributes the contents of a dfarray value.
				The new slot contents are kept on the worker
				where the values were stored before
				redistributing them. The last argument (int)
				determines the number of slots of the
				redistribution. If this value is smaller than or
				equal to zero, the number of slots is taken over
				from the array argument.

	partitionF	d[f]array(rel(X)) x string x ([fs]rel(X)->stream(Y))
			x (Y -> int) x int -> dfmatrix(rel(Y))

				Repartitions a distributed [file] array. Before
				repartition, a function is applied to the slots.

	collect2	dfmatrix x string x int -> dfarray

				Collects the slots of a matrix into a dfarray.
				The string is the name of the resulting array,
				the int value specifies a port for file transfer.
				The port value can be any port usable on all
				workers. A corresponding file transfer server is
				started automatically.

	areduce		dfmatrix(rel(t)) x string x (fsrel(t)->Y) x int
			-> d[f]array(Y)

				Performs a function on the distributed slots of
				an array.
				The task distribution is dynamic, meaning that a
				fast worker will handle more slots than a slower
				one. The result type depends on the result of the
				function. For a relation or a tuple stream, a
				dfarray will be created. For other non-stream
				results, a darray is the resulting type.

	dsummarize	darray(DATA) -> stream(DATA) ,
			d[f]array(rel(X)) -> stream(X)

				Produces a stream of the darray elements.

	getValue	{darray(T),dfarray(T)} -> array(T)

				Converts a distributed array into a normal one.

	tie		((array t) (map t t t)) -> t

				Calculates the "value" of an array evaluating the
				elements of the array with a given function from
				left to right.
----

In PROLOG, all expressions involving such operators are written in prefix
notation.

Parameter functions are written as

----	fun([param(Var1, Type1), ..., param(VarN, TypeN)], Expr)
----

5.1.3 Converting Plans to Atoms and Writing them.

Predicate ~plan\_to\_atom~ converts a plan to a string atom, which represents
the plan as a SECONDO query in text syntax. For attributes we have to
distinguish whether a leading ``.'' needs to be written (if the attribute
occurs within a parameter function) or whether just the attribute name is
needed as in the arguments for hashjoin, for example. Predicate ~wp~ (``write
plan'') uses predicate ~plan\_to\_atom\_string~, which in turn calls
~plan\_to\_atom~, to convert its argument to an atom and then writes that atom
to standard output.

*/

upper(Lower, Upper) :-
  atom_codes(Lower, [First | Rest]),
  to_upper(First, First2),
  UpperList = [First2 | Rest],
  atom_codes(Upper, UpperList).

wp(Plan) :-
  plan_to_atom_string(Plan, PlanAtom),
  write(PlanAtom).

/*
Function ~newVariable~ outputs a new unique variable name.
The variable name is unique in the sense that ~newVariable~ never
outputs the same name twice (in a PROLOG session).
It should be emphasized that the output
is not a PROLOG variable but a variable name to be used for defining
abstractions in the Secondo system.

*/

:- dynamic(varDefined/1).
newVariable(Var) :- varDefined(N), !, N1 is N + 1, retract(varDefined(N)), assert(varDefined(N1)), atom_concat('var', N1, Var). newVariable(Var) :- assert(varDefined(1)), Var = 'var1'. deleteVariable :- retract(varDefined(_)), fail. deleteVariables :- not(deleteVariable). /* Arguments: */ %fapra 2015/16 /* To consider distributed queries with predicates containing non-relation objects, it's necessary to replicate the objects to the involved workers. For now we assume that every found object is contained in the distributed part of the query (function of dmap or dmap2). A possible later extension is to examine the distributed relations and to share the objects only to workers containing parts of those relations. */ :- dynamic(replicatedObject/1). %distributed query without objects replicateObjects(QueryPart, QueryPart) :- findall(X,replicatedObject(X), ObjectList), length(ObjectList,0),!. %distributed query using objects in predicate replicateObjects(QueryPart, Result) :- findall(X,replicatedObject(X), ObjectList), length(ObjectList,Length), Length >0, maplist(createSharedClause,ObjectList,CommandList), append(CommandList,[QueryPart], Result). createSharedClause(Obj, SharedCommand) :- atom_concat('share("',Obj,StrObj), atom_concat(StrObj,'",TRUE)',SharedCommand). plan_to_atom_string(X, Result) :- isDistributedQuery, retractall(replicatedObject(_)), plan_to_atom(X,QueryPart), replicateObjects(QueryPart, Result), !. plan_to_atom_string(X, Result) :- not(isDistributedQuery), plan_to_atom(X,Result), !. plan_to_atom(obj(Object,_,u), Result) :- isDistributedQuery, upper(Object, UpperObject), atom_concat(UpperObject, ' ', Result), assertOnce(replicatedObject(UpperObject)), !. plan_to_atom(obj(Object,_,l), Result) :- isDistributedQuery, atom_concat(Object, ' ', Result), assertOnce(replicatedObject(Object)), !. plan_to_atom(obj(Object,_,u), Result) :- upper(Object, UpperObject), atom_concat(UpperObject, ' ', Result), !. 
plan_to_atom(obj(Object,_,l), Result) :- atom_concat(Object, ' ', Result), !. plan_to_atom(dot, Result) :- atom_concat('.', ' ', Result), !. %end fapra 2015/16 plan_to_atom(rel(Name, _, l), Result) :- atom_concat(Name, ' ', Result), !. plan_to_atom(rel(Name, _, u), Result) :- upper(Name, Name2), atom_concat(Name2, ' ', Result), !. plan_to_atom(res(N), Result) :- atom_concat('res(', N, Res1), atom_concat(Res1, ') ', Result), !. plan_to_atom(Term, Result) :- is_list(Term), Term = [First | _], atomic(First), !, atom_codes(TermRes, Term), normalize_space(atom(Out),TermRes), concat_atom(['"', Out, '"'], '', Result). /* Lists: */ plan_to_atom([X], AtomX) :- plan_to_atom(X, AtomX), !. plan_to_atom([X | Xs], Result) :- plan_to_atom(X, XAtom), plan_to_atom(Xs, XsAtom), concat_atom([XAtom, ', ', XsAtom], '', Result), !. /* Operators: only special syntax. General rules for standard syntax see below. */ plan_to_atom(sample(Rel, S, T), Result) :- plan_to_atom(Rel, ResRel), concat_atom([ResRel, 'sample[', S, ', ', T, '] '], '', Result), !. plan_to_atom(hashjoin(X, Y, A, B, C), Result) :- plan_to_atom(X, XAtom), plan_to_atom(Y, YAtom), plan_to_atom(A, AAtom), plan_to_atom(B, BAtom), concat_atom([XAtom, YAtom, 'hashjoin[', AAtom, ', ', BAtom, ', ', C, '] '], '', Result), !. plan_to_atom(sortmergejoin(X, Y, A, B), Result) :- plan_to_atom(X, XAtom), plan_to_atom(Y, YAtom), plan_to_atom(A, AAtom), plan_to_atom(B, BAtom), concat_atom([XAtom, YAtom, 'sortmergejoin[', AAtom, ', ', BAtom, '] '], '', Result), !. plan_to_atom(mergejoin(X, Y, A, B), Result) :- plan_to_atom(X, XAtom), plan_to_atom(Y, YAtom), plan_to_atom(A, AAtom), plan_to_atom(B, BAtom), concat_atom([XAtom, YAtom, 'mergejoin[', AAtom, ', ', BAtom, '] '], '', Result), !. plan_to_atom(groupby(Stream, GroupAttrs, Fields), Result) :- plan_to_atom(Stream, SAtom), plan_to_atom(GroupAttrs, GAtom), plan_to_atom(Fields, FAtom), concat_atom([SAtom, 'groupby[', GAtom, '; ', FAtom, ']'], '', Result), !. 
plan_to_atom(field(NewAttr, Expr), Result) :- plan_to_atom(attrname(NewAttr), NAtom), plan_to_atom(Expr, EAtom), concat_atom([NAtom, ': ', EAtom], '', Result). plan_to_atom(exactmatchfun(IndexName, Rel, attr(Name, R, Case)), Result) :- plan_to_atom(Rel, RelAtom), plan_to_atom(a(Name, R, Case), AttrAtom), newVariable(T), concat_atom(['fun(', T, ' : TUPLE) ', IndexName, ' ', RelAtom, 'exactmatch[attr(', T, ', ', AttrAtom, ')] '], Result), !. plan_to_atom(newattr(Attr, Expr), Result) :- plan_to_atom(Attr, AttrAtom), plan_to_atom(Expr, ExprAtom), concat_atom([AttrAtom, ': ', ExprAtom], '', Result), !. plan_to_atom(rename(X, Y), Result) :- plan_to_atom(X, XAtom), concat_atom([XAtom, '{', Y, '} '], '', Result), !. plan_to_atom(fun(Params, Expr), Result) :- params_to_atom(Params, ParamAtom), plan_to_atom(Expr, ExprAtom), concat_atom(['fun ', ParamAtom, ExprAtom], '', Result), !. plan_to_atom(attribute(X, Y), Result) :- plan_to_atom(X, XAtom), plan_to_atom(Y, YAtom), concat_atom(['attr(', XAtom, ', ', YAtom, ')'], '', Result), !. plan_to_atom(increment(X), Result) :- plan_to_atom(X, XAtom), concat_atom([XAtom, '++'], '', Result), !. %fapra 2015/16 plan_to_atom(dloop2(PreArg1, PreArg2, PostArg1, PostArg2), Result) :- plan_to_atom(PreArg1, PreArg1Atom), plan_to_atom(PreArg2, PreArg2Atom), plan_to_atom(PostArg1, PostArg1Atom), plan_to_atom(PostArg2, PostArg2Atom), concat_atom( [PreArg1Atom, PreArg2Atom, 'dloop2[', PostArg1Atom, ', ', PostArg2Atom, ']'], '', Result), !. %end fapra 2015/16 /* Sort orders and attribute names. */ plan_to_atom(asc(Attr), Result) :- plan_to_atom(Attr, AttrAtom), atom_concat(AttrAtom, ' asc', Result). plan_to_atom(desc(Attr), Result) :- plan_to_atom(Attr, AttrAtom), atom_concat(AttrAtom, ' desc', Result). plan_to_atom(attr(Name, Arg, Case), Result) :- plan_to_atom(a(Name, Arg, Case), ResA), atom_concat('.', ResA, Result). plan_to_atom(attrname(attr(Name, Arg, Case)), Result) :- plan_to_atom(a(Name, Arg, Case), Result). 
plan_to_atom(a(A:B, _, _), Result) :- upper(B, B2), concat_atom([B2, '_', A], Result), !. plan_to_atom(a(X, _, _), X2) :- upper(X, X2), !. %fapra 2015/16 plan_to_atom(our_attrname(attr(Name, Arg, Case)), Result) :- plan_to_atom(our_a(Name, Arg, Case), Result). plan_to_atom(our_a(_:B, _, _), Result) :- upper(B, B2), concat_atom(['..', B2], Result), !. plan_to_atom(our_a(X, _, _), Result) :- upper(X, X2), concat_atom(['..', X2], Result), !. plan_to_atom(simple_attrname(attr(Name, Arg, Case)), Result) :- plan_to_atom(simple_a(Name, Arg, Case), Result), !. plan_to_atom(simple_a(_:B, _, _), B2) :- upper(B, B2), !. plan_to_atom(simple_a(X, _, _), X2) :- upper(X, X2), !. plan_to_atom(extendstream(A, B, C), Plan) :- plan_to_atom(A, PlanA), plan_to_atom(B, PlanB), plan_to_atom(C, PlanC), concat_atom([PlanA, ' ', 'extendstream(', PlanB, ': ', PlanC, ')'], Plan). %end fapra 2015/16 /* Translation of operators driven by predicate ~secondoOp~ in file ~opSyntax~. There are rules for * postfix, 1 or 2 arguments * postfix followed by one argument in square brackets, in total 2 or 3 arguments * prefix, 2 arguments Other syntax, if not default (see below) needs to be coded explicitly. */ plan_to_atom(Term, Result) :- functor(Term, Op, 1), secondoOp(Op, postfix, 1), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), concat_atom([Res1, ' ', Op, ' '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 2), secondoOp(Op, postfix, 2), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), concat_atom([Res1, ' ', Res2, ' ', Op, ' '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 2), secondoOp(Op, postfixbrackets, 2), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), concat_atom([Res1, ' ', Op, '[', Res2, '] '], '', Result), !. 
plan_to_atom(Term, Result) :- functor(Term, Op, 3), secondoOp(Op, postfixbrackets, 3), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, '] '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 2), secondoOp(Op, prefix, 2), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), concat_atom([Op, '(', Res1, ',', Res2, ') '], '', Result), !. %fapra 2015/16 /* Additional plan\_to\_atom rules to map Distributed2-operators. */ plan_to_atom(Term, Result) :- functor(Term, Op, 1), secondoOp(Op, prefix, 1), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), concat_atom([Op, '(', Res1, ') '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 4), secondoOp(Op, prefix, 4), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), concat_atom([Op, '(', Res1, ',', Res2, ', ', Res3, ', ', Res4, ') '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 4), secondoOp(Op, postfixbrackets, 4), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ', Res4, ']'], '' , Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 3), secondoOp(Op, postfixbrackets2, 3), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, '] '], '', Result), !. 
plan_to_atom(Term, Result) :- functor(Term, Op, 4), secondoOp(Op, postfixbrackets3, 4), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3,', ', Res4, '] '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 5), secondoOp(Op, postfixbrackets3, 5), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), arg(5, Term, Arg5), plan_to_atom(Arg5, Res5), concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ', Res4,', ',Res5, '] '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 5), secondoOp(Op, postfixbrackets4, 5), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), arg(5, Term, Arg5), plan_to_atom(Arg5, Res5), concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ', Res4,', ',Res5, '] '], '', Result), !. plan_to_atom(Term, Result) :- functor(Term, Op, 6), secondoOp(Op, postfixbrackets5, 6), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), arg(2, Term, Arg2), plan_to_atom(Arg2, Res2), arg(3, Term, Arg3), plan_to_atom(Arg3, Res3), arg(4, Term, Arg4), plan_to_atom(Arg4, Res4), arg(5, Term, Arg5), plan_to_atom(Arg5, Res5), arg(6, Term, Arg6), plan_to_atom(Arg6, Res6), concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ', Res4,', ', Res5,', ',Res6, '] '], '', Result), !. %end fapra 2015/16 /* Generic rules. Operators that are not recognized are assumed to be: * 1 argument: prefix * 2 arguments: infix * 3 arguments: prefix */ plan_to_atom(Term, Result) :- functor(Term, Op, 1), arg(1, Term, Arg1), plan_to_atom(Arg1, Res1), concat_atom([Op, '(', Res1, ')'], '', Result). 
plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  concat_atom(['(', Res1, ' ', Op, ' ', Res2, ')'], '', Result).

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  plan_to_atom(Arg3, Res3),
  concat_atom([Op, '(', Res1, ', ', Res2, ', ', Res3, ')'], '', Result).

plan_to_atom(X, Result) :-
  atomic(X),
  term_to_atom(X, Result),
  !.

plan_to_atom(X, _) :-
  write('Error while converting term: '),
  write(X),
  nl.


params_to_atom([], ' ').

params_to_atom([param(Var, Type) | Params], Result) :-
  type_to_atom(Type, TypeAtom),
  params_to_atom(Params, ParamsAtom),
  concat_atom(['(', Var, ': ', TypeAtom, ') ', ParamsAtom], '', Result),
  !.

type_to_atom(tuple, 'TUPLE').
type_to_atom(tuple2, 'TUPLE2').
type_to_atom(group, 'GROUP').

/*
5.2 Optimization Rules

We introduce a predicate [=>] which can be read as ``translates into''.

5.2.1 Translation of the Arguments of an Edge of the POG

If the argument is of the form res(N), then it is a stream already and can be
used unchanged. If it is of the form arg(N), then it is a base relation; a
~feed~ must be applied and possibly a ~rename~.

*/

ordered(plz, ort).
ordered(orte, ort).
ordered(staedte, sName).
ordered(thousand, no).
ordered(ten, no).

order(Name, Attr) :- ordered(Name, Attr), !.
order(_, none).

% The following rule is needed for listing all plan edges or cost edges,
% not for optimization as such.
res(N) => [res(N), none].

% arg(N) => feed(rel(Name, *, Case)) :-
%   argument(N, rel(Name, *, Case)), !.

% arg(N) => rename(feed(rel(Name, Var, Case)), Var) :-
%   argument(N, rel(Name, Var, Case)).

[res(N), P] => [res(N), P].

% Translate into distributed argument
arg(N) => [Plan, Properties] :-
  isDistributedQuery, !,
  distributedarg(N) => [Plan, Properties].

/*
Treat translation into distributed arguments. The properties we use are...
~distribution~(DistributionType, DistributionAttribute, DistributionParameter):
DistributionType is share, spatial, modulo, function or random;
DistributionAttribute is the attribute of the relation used to determine on
which partition(s) to put a given tuple (in theory this could also be a list);
DistributionParameter is the parameter used for the distribution (like a grid
or a function object / operator).

~distributedobjecttype~(Type) (Type is darray, dfarray or dfmatrix).

~disjointpartitioning~ signals that, if we treat a partition as the multiset
of the tuples it contains, the union of all partitions is the original relation
(put differently, insofar as duplicates exist, they were already present in
the original relation). Since some Secondo plans eliminate duplicates anyway,
they can do without their arguments having this property (e.g. spatial join).

*/

% Translate into object found in SEC2DISTRIBUTED.
distributedarg(N) => [ObjName, X] :-
  X = [distribution(DistType, DCDistAttr, DistParam),
       distributedobjecttype(DistObjType), disjointpartitioning],
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
                  DistType, DistAttr, DistParam),
  not(DistType = spatial),
  downcase_atom(DistAttr, DCDistAttr).

% Spatial partitioning does not in general yield disjoint partitions
% unless the tuples are filtered on the attribute original.
distributedarg(N) => [ObjName, [distribution(DistType, DCDistAttr, DistParam),
                      distributedobjecttype(DistObjType)]] :-
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
                  DistType, DistAttr, DistParam),
  DistType = spatial,
  downcase_atom(DistAttr, DCDistAttr).

% Filter spatially distributed argument on attribute original.
distributedarg(N) => [Plan, [distribution(spatial, DCDistAttr, DistParam),
                     distributedobjecttype(DistObjType),
                     disjointpartitioning]] :-
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
                  spatial, DistAttr, DistParam),
  downcase_atom(DistAttr, DCDistAttr),
  Plan = dmap(ObjName, " ", filter(feed(rel('.', *, u)),
                                   attr(original, l, u))).

/*
Redistribute an argument relation so that it is spatially distributed using the
provided attribute. The distribution type must be spatial and the attribute
must be provided as a ground term. A grid may be provided to be used for the
distribution. If it is not provided, we fall back to the grid object called
grid, which must then exist in your database. Yields a dfarray or a dfmatrix.

*/

distributedarg(N) => [Plan, [distribution(DistType,DistAttr,Grid),
                     distributedobjecttype(DistObjType)]] :-
  % only use this in one direction. Might be generalized in the future.
  ground(DistAttr), ground(DistType),
  % if we do not have a grid specified, use the grid-object
  (ground(Grid) -> true; Grid = grid),
  DistType = spatial,
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, _, OriginalDistType, _, _),
  % cannot redistribute replicated relations
  not(OriginalDistType = share),
  spelled(Name:DistAttr, AttrTerm),
  InnerPlan = partitionF(ObjName, " ",
    extendstream(feed(rel('.', *, u)), attrname(attr(cell, *, u)),
      cellnumber(bbox(AttrTerm), Grid)),
    attr('.Cell', *, u), 0),
  %there should be another option to add the 2nd dot
  % collect into dfarray or simply be content with the dfmatrix
  (DistObjType = dfarray, Plan = collect2(InnerPlan, " ", 1238);
   DistObjType = dfmatrix, Plan = InnerPlan).

arg(N) => [feed(rel(Name, *, Case)), [order(X)]] :-
  argument(N, rel(Name, *, Case)), !,
  order(Name, X).

arg(N) => [rename(feed(rel(Name, Var, Case)), Var), [order(Var:X)]] :-
  argument(N, rel(Name, Var, Case)), !,
  order(Name, X).
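/*
The two rules for base relations in a non-distributed query can be tried out in
isolation. The following is only a minimal stand-alone sketch under stated
assumptions: ~translateArg~ is a hypothetical stand-in for the [=>] rules on
arg(N), and the ~argument~ facts are invented for illustration (the optimizer
normally derives them from the query).

----
% Hypothetical facts: plz is used once without and once with a variable.
argument(1, rel(plz, *, l)).
argument(2, rel(plz, p2, l)).

ordered(plz, ort).

order(Name, Attr) :- ordered(Name, Attr), !.
order(_, none).

% translateArg(+Arg, -PlanAndProperties): hypothetical name.
% Without a variable a feed suffices; the order property is passed on.
translateArg(arg(N), [feed(rel(Name, *, Case)), [order(X)]]) :-
  argument(N, rel(Name, *, Case)), !,
  order(Name, X).
% With a variable the stream is renamed and the order attribute qualified.
translateArg(arg(N), [rename(feed(rel(Name, Var, Case)), Var),
                      [order(Var:X)]]) :-
  argument(N, rel(Name, Var, Case)), !,
  order(Name, X).

% ?- translateArg(arg(1), X).
% X = [feed(rel(plz, *, l)), [order(ort)]].
% ?- translateArg(arg(2), X).
% X = [rename(feed(rel(plz, p2, l)), p2), [order(p2:ort)]].
----

The second component of each result is the property list that later rules
(for example the merge join) may inspect.

*/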
/* 5.2.2 Translation of Selections */ %fapra 2015/16 % Translate selection into distributed selection. select(Arg, Y) => X :- isDistributedQuery, !, /* Operand is distributed. Do not translate into local selection. */ distributedselect(Arg, Y) => X. %end fapra 2015/16 % select(Arg, pr(Pred, _)) => filter(ArgS, Pred) :- % Arg => ArgS. % select(Arg, pr(Pred, _, _)) => filter(ArgS, Pred) :- % Arg => ArgS. select(Arg, pr(Pred, _)) => [filter(ArgS, Pred), P] :- Arg => [ArgS, P]. select(Arg, pr(Pred, _, _)) => [filter(ArgS, Pred), P] :- Arg => [ArgS, P]. /* Translation of selections using indices. */ select(arg(N), Y) => [X, P] :- indexselect(arg(N), Y) => [X, P], !. select(arg(N), Y) => [X, [none]] :- indexselect(arg(N), Y) => X. indexselect(arg(N), pr(attr(AttrName, Arg, Case) = Y, Rel)) => X :- indexselect(arg(N), pr(Y = attr(AttrName, Arg, Case), Rel)) => X. indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) => [exactmatch(IndexName, rel(Name, *, Case), Y), [order(AttrName)]] :- argument(N, rel(Name, *, Case)), !, hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName). indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) => [rename(exactmatch(IndexName, rel(Name, Var, Case), Y), Var), [order(AttrName)]] :- argument(N, rel(Name, Var, Case)), !, hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName). indexselect(arg(N), pr(attr(AttrName, Arg, Case) <= Y, Rel)) => X :- indexselect(arg(N), pr(Y >= attr(AttrName, Arg, Case), Rel)) => X. indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) => [leftrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]] :- argument(N, rel(Name, *, Case)), !, hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName). 
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) => [rename(leftrange(IndexName, rel(Name, Var, Case), Y), Var), [order(AttrName)]] :- argument(N, rel(Name, Var, Case)), !, hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName). indexselect(arg(N), pr(attr(AttrName, Arg, Case) >= Y, Rel)) => X :- indexselect(arg(N), pr(Y <= attr(AttrName, Arg, Case), Rel)) => X. indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) => [rightrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]] :- argument(N, rel(Name, *, Case)), !, hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName). indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) => [rename(rightrange(IndexName, rel(Name, Var, Case), Y), Var), [order(AttrName)]] :- argument(N, rel(Name, Var, Case)), !, hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName). %fapra 2015/16 /* Translation of selections that concern distributed relations. */ % Commutativity of intersects. distributedselect(ObjName, pr(Val intersects attr(Attr, Arg, Case), Rel)) => X :- distributedselect(ObjName, pr(attr(Attr, Arg, Case) intersects Val, Rel)) => X. % Use spatial index for an intersection predicate. 
distributedselect(arg(N), Pred) =>
  [dmap2(IndexObj, RelObj, " ",
     filter(filter(Intersection, InnerPred), attr(original, l, u)), 1238),
   [distributedobjecttype(dfarray), disjointpartitioning]] :-
  argument(N, Rel),
  Pred = pr(Attr intersects Val, rel(_, Var, _)),
  Pred = pr(InnerPred, _),
  % We need a materialized argument relation to use the index
  distributedRels(Rel, RelObj, _, _, _),
  RelObj = rel(RelObjName, _, _),
  % Look up an rtree index for the relation + attribute
  downcase_atom(RelObjName, DCRelObjName),
  attrnameDCAtom(Attr, DCAttr),
  distributedIndex(DCRelObjName, DCAttr, rtree, DCIndexObjName),
  % Check the database object for the correct spelling
  spelledObj(DCIndexObjName, IndexObjName,_, Case),
  IndexObj = rel(IndexObjName, *, Case),
  IndParam = rel('.', *, u),
  RelParam = rel('..', *, u),
  renameStream(windowintersects(IndParam, RelParam, Val),
    Var, Intersection).

% Use btree index for a starts predicate.
distributedselect(arg(N), pr(Attr starts Val, rel(_, Var, _))) =>
  [dmap2(IndexObj, RelObj, " ", Range, 1238),
   [distributedobjecttype(dfarray), disjointpartitioning]] :-
  argument(N, Rel),
  distributedRels(Rel, RelObj, _, _, _),
  RelObj = rel(RelObjName, _, _),
  downcase_atom(RelObjName, DCRelObjName),
  attrnameDCAtom(Attr, DCAttr),
  % Look up a btree index for the relation + attribute
  distributedIndex(DCRelObjName, DCAttr, btree, DCIndexObjName),
  spelledObj(DCIndexObjName, IndexObjName,_, Case),
  IndexObj = rel(IndexObjName, *, Case),
  IndParam = rel('.', *, u),
  RelParam = rel('..', *, u),
  renameStream(range(IndParam, RelParam, Val, increment(Val)),
    Var, Range).

% Generic case.
distributedselect(Arg, pr(Cond, rel(_,Var,_))) =>
  [dmap(ArgS," ", filter(Param,Cond)), P] :-
  Arg => [ArgS, P],
  % we accept darrays and dfarrays
  (member(distributedobjecttype(dfarray), P) ;
   member(distributedobjecttype(darray), P)),
  % partitions of the argument relation need to be disjoint
  member(disjointpartitioning, P),
  % rename if needed
  feedRenameRelation(rel('.',*, u), Var, Param).
%end fapra 2015/16

/*
Here ~ArgS~ is meant to indicate ``argument stream''.

5.2.3 Translation of Joins

A join can always be translated to filtering the Cartesian product.

*/

%fapra 2015/16
% We have two variants of joins in place; see if the first one can
% handle the join. If so, cut and use its result.
join(Arg1, Arg2, Pred) => SecondoPlan:-
  isDistributedQuery,
  distributedjoin(Arg1, Arg2, Pred) => _,
  !,
  distributedjoin(Arg1, Arg2, Pred) => SecondoPlan.

join(Arg1, Arg2, Pred) => SecondoPlan:-
  isDistributedQuery,
  !,
  Arg1 = arg(N1),
  Arg2 = arg(N2),
  not(N1=N2),
  Arg1 => [ObjName1, _],
  Arg2 => [ObjName2, _],
  distributedRels(_, ObjName1, _, _, _),
  distributedRels(_, ObjName2, _, _, _),
  distributedjoin(ObjName1, ObjName2, Pred) => SecondoPlan.
%end fapra 2015/16

join(Arg1, Arg2, pr(Pred, _, _)) =>
  [filter(product(Arg1S, Arg2S), Pred), P1] :-
  Arg1 => [Arg1S, P1],
  Arg2 => [Arg2S, _].

/*
Index joins:

*/

join(Arg1, arg(N), pr(X=Y, _, _)) => [loopjoin(Arg1S, MatchExpr), P1] :-
  isOfSecond(Attr2, X, Y),
  isNotOfSecond(Expr1, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr2, IndexName),
  Arg1 => [Arg1S, P1],
  exactmatch(IndexName, arg(N), Expr1) => MatchExpr.

join(arg(N), Arg2, pr(X=Y, _, _)) => [loopjoin(Arg2S, MatchExpr), P2] :-
  isOfFirst(Attr1, X, Y),
  isNotOfFirst(Expr2, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr1, IndexName),
  Arg2 => [Arg2S, P2],
  exactmatch(IndexName, arg(N), Expr2) => MatchExpr.

exactmatch(IndexName, arg(N), Expr) =>
  exactmatch(IndexName, rel(Name, *, Case), Expr) :-
  argument(N, rel(Name, *, Case)),
  !.

exactmatch(IndexName, arg(N), Expr) =>
  rename(exactmatch(IndexName, rel(Name, Var, Case), Expr), Var) :-
  argument(N, rel(Name, Var, Case)),
  !.

/*
For a join with a predicate of the form X = Y we can distinguish four cases
depending on whether X and Y are attributes or more complex expressions.
For example, a query condition might be ``PLZA = PLZB'' in which case we have just attribute names on both sides of the predicate operator, or it could be ``PLZA = PLZB + 1''. In the latter case we have an expression on the right hand side. This can still be translated to a hashjoin, for example, by first extending the second argument by a new attribute containing the value of the expression. For example, the query ---- select * from plz as p1, plz as p2 where p1.PLZ = p2.PLZ + 1 ---- can be translated to ---- plz feed {p1} plz feed {p2} extend[newPLZ: PLZ_p2 + 1] hashjoin[PLZ_p1, newPLZ, 997] remove[newPLZ] consume ---- This technique is built into the optimizer as follows. We first define the four cases (at the moment for equijoin only; this may later be extended) which also translate the arguments into streams. Then the rules translating to join methods can be formulated independently from this general technique. They translate terms of the form join00(Arg1Stream, Arg2Stream, Pred). */ join(Arg1, Arg2, pr(X=Y, R1, R2)) => [JoinPlan, P] :- X = attr(_, _, _), Y = attr(_, _, _), !, Arg1 => [Arg1S, P1], Arg2 => [Arg2S, P2], join00([Arg1S, P1], [Arg2S, P2], pr(X=Y, R1, R2)) => [JoinPlan, P]. join(Arg1, Arg2, pr(X=Y, R1, R2)) => [remove(JoinPlan, [attrname(attr(r_expr, 2, l))]), P] :- X = attr(_, _, _), not(Y = attr(_, _, _)), !, Arg1 => [Arg1S, P1], Arg2 => [Arg2S, _], Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]), join00([Arg1S, P1], [Arg2Extend, none], pr(X=attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P]. join(Arg1, Arg2, pr(X=Y, R1, R2)) => [remove(JoinPlan, [attrname(attr(l_expr, 2, l))]), P] :- not(X = attr(_, _, _)), Y = attr(_, _, _), !, Arg1 => [Arg1S, _], Arg2 => [Arg2S, P2], Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]), join00([Arg1Extend, none], [Arg2S, P2], pr(attr(l_expr, 1, l)=Y, R1, R2)) => [JoinPlan, P]. 
join(Arg1, Arg2, pr(X=Y, R1, R2)) => [remove(JoinPlan, [attrname(attr(l_expr, 1, l)), attrname(attr(r_expr, 2, l))]), P] :- not(X = attr(_, _, _)), not(Y = attr(_, _, _)), !, Arg1 => [Arg1S, _], Arg2 => [Arg2S, _], Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]), Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]), join00([Arg1Extend, none], [Arg2Extend, none], pr(attr(l_expr, 1, l)=attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P]. join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [sortmergejoin(Arg1S, Arg2S, attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :- isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _), isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _). % use order property join00([Arg1S, P1], [Arg2S, P2], pr(X = Y, _, _)) => [mergejoin(Arg1S, Arg2S, attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :- isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _), isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _), select(order(Name1), P1, _), select(order(Name2), P2, _). % hashjoin has asymmetric cost, therefore consider both orders join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg1S, Arg2S, attrname(Attr1), attrname(Attr2), 999997), [none]] :- isOfFirst(Attr1, X, Y), isOfSecond(Attr2, X, Y). join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg2S, Arg1S, attrname(Attr2), attrname(Attr1), 999997), [none]] :- isOfFirst(Attr1, X, Y), isOfSecond(Attr2, X, Y). %fapra 2015/16 % Translate a distributed spatial join with an intersection predicate. 
distributedjoin(Arg1, Arg2, Pred) =>
  [SecondoPlan, [DistAttr1, distributedobjecttype(dfarray),
   disjointpartitioning]]:-
  Pred = pr(Attr1 intersects Attr2, rel(_, Rel1Var, _), rel(_, Rel2Var, _)),
  isOfFirst(Attr1, Rel1, Rel2),
  isOfSecond(Attr2, Rel1, Rel2),
  attrnameDCAtom(Attr1, Attr1Name),
  attrnameDCAtom(Attr2, Attr2Name),
  % allow using replicated + any distribution or both distributed by
  % join predicate
  ((DistAttr1 = distribution(_, _, _),
    DistAttr2 = distribution(share, _, _));
   (DistAttr1 = distribution(spatial, Attr2Name, GridObj),
    DistAttr2 = distribution(spatial, Attr1Name, GridObj))),
  Arg1 => [ObjName1, [DistAttr1| Props1]],
  Arg2 => [ObjName2, [DistAttr2| Props2]],
  % rename the parameter relations if needed
  feedRenameRelation(param1, Rel1Var, Param1Plan),
  feedRenameRelation(param2, Rel2Var, Param2Plan),
  % rename the cell attribute if needed
  renamedRelAttr(attr(cell, 1, u), Rel1Var, CellAttr1),
  renamedRelAttr(attr(cell, 2, u), Rel2Var, CellAttr2),
  Scheme =
    filter(
      filter(
        filter(
          itSpatialJoin(
            Param1Plan,
            Param2Plan,
            attrname(Attr1),
            attrname(Attr2)
          ),
          CellAttr1 = CellAttr2
        ),
        gridintersects(
          GridObj,
          bbox(Attr1),
          bbox(Attr2),
          CellAttr1
        )
      ),
      Attr1 intersects Attr2
    ),
  % We have the actual query now. Distribute it to the workers.
  distributedquery([ObjName1, [DistAttr1| Props1]],
    [ObjName2, [DistAttr2| Props2]], Scheme) => SecondoPlan.

/*
----    distributedquery(Arg1, Arg2, QueryScheme) =>
----

Distribute the query given by QueryScheme to the workers. The scheme contains
the placeholders param1 and param2 for its arguments. The actual arguments are
given in Arg1 and Arg2, each as a pair of a plan and a property list. Several
cases arise depending on the distribution type of Arg1 and Arg2 (replicated vs
partitioned) and their distributed object type (d(f)array vs dfmatrix).
*/

% Arg1 replicated, Arg2 partitioned, Arg2 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  not(isPartitioned([Arg1S, P1])),
  isPartitioned([Arg2S, P2]),
  not(isDfmatrix([Arg2S, P2])),
  substituteSubterm(param2, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param1, Arg1S, QueryScheme1, QueryScheme2),
  Query = dmap(Arg2S, " ", QueryScheme2),
  !.

% Arg2 replicated, Arg1 partitioned, Arg1 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  not(isPartitioned([Arg2S, P2])),
  not(isDfmatrix([Arg1S, P1])),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, Arg2S, QueryScheme1, QueryScheme2),
  Query = dmap(Arg1S, " ", QueryScheme2),
  !.

% Arg1 partitioned, Arg2 partitioned, both are d(f)arrays
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  isPartitioned([Arg2S, P2]),
  not(isDfmatrix([Arg2S, P2])),
  not(isDfmatrix([Arg1S, P1])),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
  Query = dmap2(Arg1S, Arg2S, " ", QueryScheme2, 1238),
  !.

% Arg1 partitioned, Arg2 partitioned, both dfmatrices
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  isPartitioned([Arg2S, P2]),
  isDfmatrix([Arg2S, P2]),
  isDfmatrix([Arg1S, P1]),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
  Query = areduce2(Arg1S, Arg2S, "", QueryScheme2, 1238),
  !.

% Arg1 replicated, Arg2 replicated
distributedquery([Arg1S, P1], [Arg2S, P2], _) => _ :-
  not(isPartitioned([Arg1S, P1])),
  not(isPartitioned([Arg2S, P2])),
  write('A potential plan edge could not be generated because '),
  write('queries with two replicated arguments '),
  write('cannot be formulated using DistributedAlgebra as of now.\n'),
  fail.
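/*
The definition of ~substituteSubterm~ is not shown in this section. As a rough
illustration of how the placeholders param1 and param2 could be replaced inside
a query scheme, here is a minimal stand-alone sketch. The predicate name
~subst~ and the example scheme are assumptions made for illustration only; they
are not the optimizer's actual code.

----
% subst(+Old, +New, +Term, -Result): hypothetical helper that replaces every
% subterm of Term that is identical (==) to Old by New.
subst(Old, New, Term, New) :-
  Term == Old, !.
subst(_, _, Term, Term) :-
  \+ compound(Term), !.
subst(Old, New, Term, Result) :-
  Term =.. [Functor | Args],
  maplist(subst(Old, New), Args, NewArgs),
  Result =.. [Functor | NewArgs].

% ?- subst(param1, rel('.', *, u), filter(feed(param1), cond), Q).
% Q = filter(feed(rel('.', *, u)), cond).
----

Using identity rather than unification in the first clause keeps unbound
variables inside the scheme from being bound to the placeholder by accident.

*/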
%Equijoin distributedjoin(ObjName1, ObjName2, pr(attr(X1,X2,X3)=attr(Y1,Y2,Y3), Rel1, Rel2)) => [SecondoPlan, [none]] :- X=attr(X1,X2,X3), Y=attr(Y1,Y2,Y3), Rel1 = rel(_, _, _), Rel2 = rel(_, _, _), isOfFirst(_, X, Y), isOfSecond(_, X, Y), buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2), SecondoPlan, false). %Standard Join distributedjoin(ObjName1, ObjName2, pr(Pred,Rel1, Rel2)) => [SecondoPlan, [none]] :- Rel1 = rel(_, _, _), Rel2 = rel(_, _, _), buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2), SecondoPlan, false). /* It is assumed that if "function" is specified in the system relation "SEC2DISTRIBUTED", then a deterministic function using the specified attribute was used. The functions used for partitioning both used relations are assumed to result in the same values if given the same attribute value. E.g. both used the same hashvalue. */ /* Equijoin Secondo Plan for both are partitioned by join attribute using modulo. Modulo is the most efficient compared to the other options, because we do not need to repartition and also there is no need to calculate the worker, on which a tuple is located, the worker number is already the modulo value. Thus it is slightly more efficient than any other function (i.e. hash). In case it is possible in the future to deploy different secondo plans to different workers (i.e. tell each worker which part of the shared relation it should use), having 2 replicated relations is the most efficient solution. 
*/ buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2), SecondoPlan, _):- plan_to_atom(simple_attrname(X), X2), plan_to_atom(simple_attrname(Y), Y2), distributedRels(_, ObjName1, _, 'modulo', X2), distributedRels(_, ObjName2, _, 'modulo', Y2), Rel1 = rel(_, Rel1Var, _), Rel2 = rel(_, Rel2Var, _), % rename the parameter relations of the dmapped plan if needed feedRenameRelation(rel('.', *, u), Rel1Var, Feed1), feedRenameRelation(rel('..', *, u), Rel2Var, Feed2), !, SecondoPlan = dmap2(ObjName1, ObjName2, " ", hashjoin(Feed1, Feed2,attrname(X), attrname(Y), 999997), 1238). %Equijoin Secondo Plan for both are partitioned by join attribute %using a function buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2), SecondoPlan, _):- plan_to_atom(simple_attrname(X), X2), plan_to_atom(simple_attrname(Y), Y2), distributedRels(_, ObjName1, _, 'function', X2), distributedRels(_, ObjName2, _, 'function', Y2), Rel1 = rel(_, Rel1Var, _), Rel2 = rel(_, Rel2Var, _), % rename the parameter relations of the dmapped plan if needed feedRenameRelation(rel('.', *, u), Rel1Var, Feed1), feedRenameRelation(rel('..', *, u), Rel2Var, Feed2), !, SecondoPlan = dmap2(ObjName1, ObjName2, " ", hashjoin(Feed1, Feed2,attrname(X), attrname(Y), 999997), 1238). %Equijoin Secondo Plan for one replicated (relation) and %one partitioned (darray/dfarray) buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2), SecondoPlan, _):- distributedRels(_ ,ObjName1,_ ,'share',_ ), isPartitioned(ObjName2), Rel1 = rel(_, Rel1Var, _), Rel2 = rel(_, Rel2Var, _), % rename the parameter relations of the dmapped plan if needed feedRenameRelation(ObjName1, Rel1Var, Feed1), feedRenameRelation(rel('.', *, u), Rel2Var, Feed2), !, SecondoPlan = dmap(ObjName2, " ", hashjoin(Feed1, Feed2, attrname(X), attrname(Y), 999997)). 
%Commutativity for Equijoin & Standard Join buildSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2), SecondoPlan, false):- buildSecondoPlan(ObjName2, ObjName1, pr(Pred, Rel1, Rel2), SecondoPlan, true). %Equijoin Secondo Plan for repartitioning 2 "wrongly" %partitioned relations (darray/dfarray) buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2), SecondoPlan, _):- isPartitioned(ObjName1), isPartitioned(ObjName2), Rel1 = rel(_, Rel1Var, _), Rel2 = rel(_, Rel2Var, _), % rename the parameter relations of the dmapped plan if needed feedRenameRelation(rel('.', *, u), Rel1Var, Feed1), feedRenameRelation(rel('..', *, u), Rel2Var, Feed2), !, SecondoPlan = dmap2( collect2( partitionF(ObjName1, "LeftPartOfJoin", feed(rel('.',*,u)), hashvalue(our_attrname(X), 999997), 0), "L", 1238), collect2( partitionF(ObjName2, "RightPartOfJoin", feed(rel('.',*,u)), hashvalue(our_attrname(Y), 999997), 0), "R", 1238), " ", hashjoin(Feed1, Feed2, attrname(X), attrname(Y), 999997), 1238). %Equijoin Secondo Plan for repartitioning 2 replicated rels buildSecondoPlan(ObjName1, ObjName2, pr(attr(_,_,_)=attr(_,_,_), _, _), _, true):- distributedRels(_ ,ObjName1,_ ,'share',_ ), distributedRels(_, ObjName2, _,'share', _), !, write('Both relations are replicated, the query cannot be executed!'), false. % Plan yields a dfmatrix isDfmatrix([_, P]) :- member(distributedobjecttype(dfmatrix), P). % Plan yields a partitioned distribution. isPartitioned([_, P]):- is_list(P), !,( member(distribution('function', _, _), P); member(distribution('modulo', _, _), P); member(distribution('random', _, _), P); member(distribution('spatial', _, _), P)). % Secondo object represents a partitioned distribution. isPartitioned(ObjName):- distributedRels(_, ObjName,_ ,'function', _); distributedRels(_, ObjName,_ ,'modulo', _); distributedRels(_, ObjName,_ ,'random', _); distributedRels(_, ObjName,_ ,'spatial', _). 
%Standard Join Secondo Plan (one replicated, one partitioned) buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2), SecondoPlan, _):- (DistArgrel = ObjName2, ReplArgrel = ObjName1; DistArgrel = ObjName1, ReplArgrel = ObjName2), distributedRels(_, ReplArgrel, _ , 'share', _), isPartitioned(DistArgrel), Rel1 = rel(_, Rel1Var, _), Rel2 = rel(_, Rel2Var, _), % rename the parameter relations of the dmapped plan if needed feedRenameRelation(rel('.', *, u), Rel2Var, Feed2), feedRenameRelation(ReplArgrel, Rel1Var, Feed1), !, SecondoPlan = dmap(DistArgrel, " ", filter(product(Feed2,Feed1), Pred)). %Standard Join Secondo Plan, both are partitioned buildStdSecondoPlan(ObjName1, ObjName2, pr(_, _, _), _, true):- isPartitioned(ObjName1), isPartitioned(ObjName2), !, write('The joined relations are both partitioned and thus'), write(' not distributed correctly for standard join.'), false. %Standard Join Secondo Plan, if repartitioning is needed buildStdSecondoPlan(_, _, pr(_, _, _), _, true):- !, write('The joined relations are not distributed correctly '), write('for standard join.'), false. %end fapra 2015/16 /* ---- isOfFirst(Attr, X, Y) isOfSecond(Attr, X, Y) ---- ~Attr~ equal to either ~X~ or ~Y~ is an attribute of the first(second) relation. */ isOfFirst(X, X, _) :- X = attr(_, 1, _). isOfFirst(Y, _, Y) :- Y = attr(_, 1, _). isOfSecond(X, X, _) :- X = attr(_, 2, _). isOfSecond(Y, _, Y) :- Y = attr(_, 2, _). isNotOfFirst(Y, X, Y) :- X = attr(_, 1, _). isNotOfFirst(X, X, Y) :- Y = attr(_, 1, _). isNotOfSecond(Y, X, Y) :- X = attr(_, 2, _). isNotOfSecond(X, X, Y) :- Y = attr(_, 2, _). /* 6 Creating Query Plan Edges */ % RHG 2014 planEdge(Source, Target, Plan, Result) :- edge(Source, Target, Term, Result, _, _), Term => PlanExpr, getProperties(PlanExpr, Plan, _). 
% Version with properties % Selection Edges planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :- edge(Source, Target, select(res(N), Pred), Result, _, _), select([N, P], PropertiesIn, PRest), select([res(N), P], Pred) => PlanExpr, getProperties(PlanExpr, Plan, P2). % Join Edges planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :- edge(Source, Target, join(arg(N), res(M), Pred), Result, _, _), select([M, P], PropertiesIn, PRest), join(arg(N), [res(M), P], Pred) => PlanExpr, getProperties(PlanExpr, Plan, P2). planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :- edge(Source, Target, join(res(M), arg(N), Pred), Result, _, _), select([M, P], PropertiesIn, PRest), join([res(M), P], arg(N), Pred) => PlanExpr, getProperties(PlanExpr, Plan, P2). planEdge(Source, Target, PropertiesIn, Plan, [[Result, P3] | PRest], Result) :- edge(Source, Target, join(res(N), res(M), Pred), Result, _, _), select([N, P], PropertiesIn, PIn2), select([M, P2], PIn2, PRest), join([res(N), P], [res(M), P2], Pred) => PlanExpr, getProperties(PlanExpr, Plan, P3). % Remaining edges without intermediate results planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn], Result) :- edge(Source, Target, Term, Result, _, _), Term = select(arg(_), _), Term => PlanExpr, getProperties(PlanExpr, Plan, P). planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn], Result) :- edge(Source, Target, Term, Result, _, _), Term = join(arg(_), arg(_), _), Term => PlanExpr, getProperties(PlanExpr, Plan, P). getProperties([Plan, P], Plan, P) :- !. getProperties(Plan, Plan, none). % end RHG 2014 createPlanEdge :- edge(Source, Target, Term, Result, _, _), Term => Plan, assert(planEdge(Source, Target, Plan, Result)), fail. createPlanEdges :- not(createPlanEdge). deletePlanEdge :- retract(planEdge(_, _, _, _)), fail. deletePlanEdges :- not(deletePlanEdge). 
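/*
The predicates createPlanEdges and deletePlanEdges follow a common PROLOG
idiom, the failure-driven loop: a helper enumerates all solutions by
backtracking and always fails at the end, and the caller succeeds by negating
it. The following stand-alone sketch shows the idiom on invented facts;
~demoEdge~, ~demoPlanEdge~, ~copyEdge~ and ~copyEdges~ are hypothetical names
that do not occur in the optimizer.

----
:- dynamic demoPlanEdge/2.

demoEdge(0, 1).
demoEdge(0, 2).
demoEdge(1, 3).

% Visits every demoEdge fact by backtracking; never succeeds.
copyEdge :-
  demoEdge(Source, Target),
  assert(demoPlanEdge(Source, Target)),
  fail.

% Succeeds once copyEdge has exhausted all solutions.
copyEdges :- not(copyEdge).

% ?- copyEdges, findall(S-T, demoPlanEdge(S, T), L).
% L = [0-1, 0-2, 1-3].
----

The side effects of assert survive backtracking, which is why the copied facts
are still present after the loop has failed its way through all edges.

*/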
writePlanEdge :- planEdge(Source, Target, Plan, Result), write('Source: '), write(Source), nl, write('Target: '), write(Target), nl, write('Plan: '), wp(Plan), nl, % write(Plan), nl, write('Result: '), write(Result), nl, nl, pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges fail. writePlanEdgesProp :- planEdge(Source, Target, _, Plan, Prop, Result), write('Source: '), write(Source), nl, write('Target: '), write(Target), nl, write('Plan: '), wp(Plan), nl, write(Prop), nl, % write(Plan), nl, write('Result: '), write(Result), nl, nl, pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges fail. writePlanEdges :- assert(pe(0)), not(writePlanEdge), not(writePlanEdgesProp), pe(N), write('The total number of plan edges is '), write(N), write('.'), nl. wpe :- writePlanEdges. /* 7 Assigning Sizes and Selectivities to the Nodes and Edges of the POG ---- assignSizes. deleteSizes. ---- Assign sizes (numbers of tuples) to all nodes in the pog, based on the cardinalities of the argument relations and the selectivities of the predicates. Store sizes as facts of the form resultSize(Result, Size). Store selectivities as facts of the form edgeSelectivity(Source, Target, Sel). Delete sizes from memory. 7.1 Assigning Sizes and Selectivities It is important that edges are processed in the order in which they have been created. This will ensure that for an edge the size of its argument nodes are available. */ assignSizes :- not(assignSizes1). assignSizes1 :- edge(Source, Target, Term, Result, _, _), assignSize(Source, Target, Term, Result), fail. %assignSize(Source, Target, select(Arg, Pred), Result) :- % Pred = pr(attr(original, *, u), _), % !, % predicate used for eliminating one of many spatially overlapping tuples % resSize(Arg, Size), % setNodeSize(Result, Size), % % assume overlap is rather small % assert(edgeSelectivity(Source, Target, 1)). 
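
/*
As a hedged illustration of the size computation performed by ~assignSize~,
the following sketch multiplies a start cardinality by the selectivities
along one path of the POG, assuming independent predicates. The name
~sketchPathSize~ is hypothetical and not part of the optimizer; it does not
touch the facts ~resultSize~ or ~edgeSelectivity~.

*/

% sketchPathSize(+Card, +Selectivities, -Size)
sketchPathSize(Card, [], Card).
sketchPathSize(Card, [Sel | Sels], Size) :-
  Card1 is Card * Sel,
  sketchPathSize(Card1, Sels, Size).

/*
For the introductory example (1000 tuples, selectivities 1/2, 1/10 and 1/5)
the query

----    sketchPathSize(1000, [0.5, 0.1, 0.2], Size).
----

yields a size of about 10 tuples, regardless of the order in which the
selectivities are listed.

*/
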
assignSize(Source, Target, select(Arg, Pred), Result) :- resSize(Arg, Card), selectivity(Pred, Sel), Size is Card * Sel, setNodeSize(Result, Size), assert(edgeSelectivity(Source, Target, Sel)). assignSize(Source, Target, join(Arg1, Arg2, Pred), Result) :- resSize(Arg1, Card1), resSize(Arg2, Card2), selectivity(Pred, Sel), Size is Card1 * Card2 * Sel, setNodeSize(Result, Size), assert(edgeSelectivity(Source, Target, Sel)). /* ---- setNodeSize(Node, Size) :- ---- Set the size of node ~Node~ to ~Size~ if no size has been assigned before. */ setNodeSize(Node, _) :- resultSize(Node, _), !. setNodeSize(Node, Size) :- assert(resultSize(Node, Size)). /* ---- resSize(Arg, Size) :- ---- Argument ~Arg~ has size ~Size~. */ resSize(arg(N), Size) :- argument(N, rel(Rel, _, _)), card(Rel, Size), !. resSize(arg(N), _) :- write('Error in optimizer: cannot find cardinality for '), argument(N, Rel), wp(Rel), nl, fail. resSize(res(N), Size) :- resultSize(N, Size), !. /* ---- writeSizes :- ---- Write sizes and selectivities. */ writeSize :- resultSize(Node, Size), write('Node: '), write(Node), nl, write('Size: '), write(Size), nl, nl, fail. writeSize :- edgeSelectivity(Source, Target, Sel), write('Source: '), write(Source), nl, write('Target: '), write(Target), nl, write('Selectivity: '), write(Sel), nl, nl, fail. writeSizes :- not(writeSize). /* ---- deleteSizes :- ---- Delete node sizes and selectivities of edges. */ deleteSize :- retract(resultSize(_, _)), fail. deleteSize :- retract(edgeSelectivity(_, _, _)), fail. deleteSizes :- not(deleteSize). /* 8 Computing Edge Costs for Plan Edges 8.1 The Costs of Terms ---- cost(Term, Sel, Size, Cost) :- ---- The cost of an executable ~Term~ representing a predicate with selectivity ~Sel~ is ~Cost~ and the size of the result is ~Size~. This is evaluated recursively descending into the term. When the operator realizing the predicate (e.g. ~filter~) is encountered, the selectivity ~Sel~ is used to determine the size of the result. 
It is assumed that only a single operator of this kind occurs within the term.

8.1.1 Arguments

*/

cost(Obj, Sel, Size, Cost) :-
  distributedRels(Rel, Obj, _, DistType, _, _),
  not(DistType = share),
  cost(Rel, Sel, Size, Cost).

cost(rel(Rel, _, _), _, Size, 0) :-
  card(Rel, Size).

cost(res(N), _, Size, 0) :-
  resultSize(N, Size).

/*
8.1.2 Operators

*/

cost(feed(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  feedTC(A),
  C is C1 + A * S.

/*
Here ~feedTC~ means ``feed tuple cost'', i.e., the cost per tuple, a constant
to be determined in experiments. These constants are kept in file
``Operators.pl''.

*/

cost(consume(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  consumeTC(A),
  C is C1 + A * S.

cost(filter(X, Pred), _, S, C) :-
  % This is a special case for spatially distributed relations:
  % we cannot determine the selectivity for the predicate because
  % it does not exist as a local relation on the master.
  % We assume very little overlap in the spatial distribution.
  Pred=attr(original, l, u),
  !,
  cost(X, 1, SizeX, CostX),
  filterTC(A),
  S is SizeX * 0.9,
  C is CostX + A * SizeX.

cost(filter(X, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  filterTC(A),
  S is SizeX * Sel,
  C is CostX + A * SizeX.

/*
For the moment we assume a cost of 1 for evaluating a predicate; this should
be changed shortly.

*/

cost(product(X, Y), _, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  productTC(A, B),
  S is SizeX * SizeY,
  C is CostX + CostY + SizeY * A + S * B.

cost(leftrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

cost(rightrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

/*
Simplistic cost estimation for loop joins.

If attribute values are assumed independent, then the selectivity of a
subquery appearing in an index join equals the overall join selectivity.
Therefore it is possible to estimate the result size and cost of a subquery
(i.e.
~exactmatch~ and ~exactmatchfun~). As a subquery in an index join is executed
as often as a tuple from the left input stream arrives, it is also possible
to estimate the overall index join cost.

*/

cost(exactmatchfun(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(A, B, C, D),
  Size is Sel * RelSize,
  Cost is A + B * (log10(RelSize) - C) +  % query cost
    Sel * RelSize * D.                    % size of result

cost(exactmatch(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(A, B, C, D),
  Size is Sel * RelSize,
  Cost is A + B * (log10(RelSize) - C) +  % query cost
    Sel * RelSize * D.                    % size of result

cost(loopjoin(X, Y), Sel, S, Cost) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, Sel, SizeY, CostY),
  S is SizeX * SizeY,
  loopjoinTC(A),
  Cost is CostX +                         % producing the first argument
    SizeX * A +                           % base cost for loopjoin
    SizeX * CostY.                        % sum of query costs

cost(fun(_, X), Sel, Size, Cost) :-
  cost(X, Sel, Size, Cost).

cost(hashjoin(X, Y, _, _, 999997), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  hashjoinTC(A, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +                    % producing the arguments
    A * SizeY +                           % A - time [microsecond] per build
    B * SizeX +                           % B - time per probe
    D * S.                                % D - time per result tuple
  % it is assumed that the hash table fits in memory

cost(sortmergejoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  sortmergejoinTC(A, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +                    % producing the arguments
    A * (SizeX + SizeY) +                 % sorting the arguments
    B * (SizeX + SizeY) +                 % merge step
    D * S.                                % cost of results

cost(mergejoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  sortmergejoinTC(_, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +                    % producing the arguments
    B * (SizeX + SizeY) +                 % merge step
    D * S.                                % cost of results

cost(extend(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  extendTC(A),
  C is C1 + A * S.

cost(remove(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  removeTC(A),
  C is C1 + A * S.
cost(project(X, _), Sel, S, C) :- cost(X, Sel, S, C1), projectTC(A), C is C1 + A * S. cost(rename(X, _), Sel, S, C) :- cost(X, Sel, S, C1), renameTC(A), C is C1 + A * S. %fapra 2015/16 % Taken from standard optimizer. cost(itSpatialJoin(X, Y, _, _), Sel, S, C) :- cost(X, 1, SizeX, CostX), cost(Y, 1, SizeY, CostY), itSpatialJoinTC(A, B), S is SizeX * SizeY * Sel, C is CostX + CostY + A * (SizeX + SizeY) + B * S. cost(windowintersects(_, Rel, _), Sel, Size, Cost) :- cost(Rel, 1, RelSize, _), windowintersectsTC(A), Size is Sel * RelSize, Cost is Size * A. cost(hashvalue(_,_), _, 1, 0). cost(dmap(Obj, _, InnerPlan), Sel, S, C) :- distributedRels(LocalMasterRel, Obj, _, _, _), substituteSubterm(rel('.', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan), cost(LocalInnerPlan, Sel, S, InnerC), !, C is InnerC * S. cost(dmap(Obj, _, InnerPlan), Sel, S, C) :- substituteSubterm(rel('.', *, u), Obj, InnerPlan, LocalInnerPlan), cost(LocalInnerPlan, Sel, S, InnerC), !, C is InnerC * S. % if we cannot determine cost of first dmap-argument cost(dmap(_, _, X), Sel, S, C) :- cost(X, 1, SizeX, CostX), dmapTC(A), S is SizeX * Sel, C is CostX + A * SizeX. cost(dmap2(_, RelObj, _, InnerPlan, _), Sel, S, C) :- distributedRels(LocalMasterRel, RelObj, _, _, _), substituteSubterm(rel('..', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan), dmap2TC(A), cost(LocalMasterRel, 1, Card, _), cost(LocalInnerPlan, Sel, _, InnerCost), !, S is Sel * Card, C is InnerCost + A * S. 
% we have two d/farray-objects as arguments cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :- distributedRels(LocalMasterRel1, RelObj1, _, _, _), distributedRels(LocalMasterRel2, RelObj2, _, _, _), substituteSubterm(rel('.', *, u), LocalMasterRel1, InnerPlan, LocalInnerPlan1), substituteSubterm(rel('..', *, u), LocalMasterRel2, LocalInnerPlan1, LocalInnerPlan), dmap2TC(A), cost(LocalMasterRel1, 1, Card1, _), cost(LocalMasterRel2, 1, Card2, _), cost(LocalInnerPlan, Sel, _, InnerCost), !, S1 is Sel * Card1, S2 is Sel * Card2, C is InnerCost + A * S1 + A * S2. % we have two d/farray-values as arguments cost(dmap2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :- cost(Arg1, _, _, C1), cost(Arg2, _, _, C2), substituteSubterm(rel('.', *, u), Arg1, InnerPlan, LocalInnerPlan1), substituteSubterm(rel('..', *, u), Arg2, LocalInnerPlan1, LocalInnerPlan), cost(LocalInnerPlan, Sel, _, InnerCost), dmap2TC(A), !, ArgS1 is Sel * C1, ArgS2 is Sel * C2, C is InnerCost + A * ArgS1 + A * ArgS2. cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :- substituteSubterm(rel('.', *, u), "#!SUBST1!#", RelObj1, RelObj_Mod1), substituteSubterm(rel('.', *, u), "#!SUBST2!#", RelObj2, RelObj_Mod2), substituteSubterm(rel('.', *, u), RelObj_Mod1, InnerPlan, TempPlan1), substituteSubterm(rel('..', *, u), RelObj_Mod2, TempPlan1, TempPlan2), substituteSubterm( "#!SUBST1!#", rel('.',*,u),TempPlan2, TempPlan3), substituteSubterm( "#!SUBST2!#", rel('.',*,u),TempPlan3, FinallyGoodPlan), dmap2TC(A), cost(RelObj1, 1, Card1, _), cost(RelObj2, 1, Card2, _), cost(FinallyGoodPlan, Sel, _, InnerCost), !, S1 is Sel * Card1, S2 is Sel * Card2, C is InnerCost + A * S1 + A * S2. 
% we have two d/fmatrix-values as arguments cost(areduce2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :- cost(Arg1, _, _, C1), cost(Arg2, _, _, C2), substituteSubterm(rel('.', *, u), Arg1, InnerPlan, LocalInnerPlan1), substituteSubterm(rel('..', *, u), Arg2, LocalInnerPlan1, LocalInnerPlan), cost(LocalInnerPlan, Sel, _, InnerCost), areduce2TC(A), !, ArgS1 is Sel * C1, ArgS2 is Sel * C2, C is InnerCost + A * ArgS1 + A * ArgS2. cost(collect2(InnerPlan, _ , _), Sel, S, C) :- cost(InnerPlan, Sel, S, InnerCost), collect2TC(A), C is InnerCost + A * S. cost(partitionF(RelObj, _, InnerPlan, _, _), Sel, S, C) :- distributedRels(LocalMasterRel, RelObj, _, _, _), substituteSubterm(rel('.', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan), partitionFTC(A), cost(LocalMasterRel, 1, S, _), cost(LocalInnerPlan, Sel, _, InnerCost), !, C is (InnerCost + A) * S. % generic case cost(partitionF(RelObj, _, _, _), _, S, C) :- cost(RelObj, 1, RS, RC), partitionFTC(A), S is RS, C is RC + S * A. cost(extendstream(Stream, _, cellnumber(bbox(_), _)), _, S, C) :- cost(Stream, 1, S, StreamC), extendstreamTC(ETC), bboxTC(BTC), cellnumberTC(CTC), TC is ETC + BTC + CTC, C is S * TC + StreamC. cost(range(_, Rel, _, _), Sel, S, C) :- cost(Rel, 1, Card, _), S is Sel * Card, leftrangeTC(A), C is A * S. cost(dloop2(_, RelObj, _, InnerPlan), Sel, S, C) :- distributedRels(LocalMasterRel, RelObj, _, _, _), substituteSubterm(rel('..', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan), dloopTC(A), cost(LocalMasterRel, 1, Card, _), cost(LocalInnerPlan, Sel, _, InnerCost), !, S is Sel * Card, C is InnerCost + A * S. /* dummy for dsummarize */ cost(dsummarize(_), _, _, 0). cost(dsummarize(X), Sel, S, C) :- cost(X, Sel, S, C1), dsummarizeTC(A), C is C1 + A * S. %end fapra 2015/16 /* 8.2 Creating Cost Edges These are plan edges extended by a cost measure. 
*/ % RHG 2014 costEdge(Source, Target, Term, Result, Size, Cost) :- planEdge(Source, Target, Term, Result), edgeSelectivity(Source, Target, Sel), cost(Term, Sel, Size, Cost). % Version with properties costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result, Size, Cost) :- planEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result), edgeSelectivity(Source, Target, Sel), cost(Plan, Sel, Size, Cost). % end RHG 2014 createCostEdge :- planEdge(Source, Target, Term, Result), edgeSelectivity(Source, Target, Sel), cost(Term, Sel, Size, Cost), assert(costEdge(Source, Target, Term, Result, Size, Cost)), fail. createCostEdges :- not(createCostEdge). deleteCostEdge :- retract(costEdge(_, _, _, _, _, _)), fail. deleteCostEdges :- not(deleteCostEdge). writeCostEdge :- costEdge(Source, Target, Plan, Result, Size, Cost), write('Source: '), write(Source), nl, write('Target: '), write(Target), nl, write('Plan: '), wp(Plan), nl, write('Result: '), write(Result), nl, write('Size: '), write(Size), nl, write('Cost: '), write(Cost), nl, nl, ce(N), retract(ce(_)), N1 is N + 1, assert(ce(N1)), % count edges fail. writeCostEdges :- assert(ce(0)), not(writeCostEdge), ce(N), write('The total number of cost edges is '), write(N), write('.'), nl. wce :- writeCostEdges. writeCostEdgeUsed :- costEdgeUsed(Source, Version, Target, PropertiesIn, Plan, PropertiesOut, Result, Size, Cost), write('Source: ('), write(Source), write(', '), write(Version), write(')'), nl, write('Target: '), write(Target), nl, write('PropertiesIn: '), write(PropertiesIn), nl, write('Plan: '), wp(Plan), nl, write('PropertiesOut: '), write(PropertiesOut), nl, write('Result: '), write(Result), nl, write('Size: '), write(Size), nl, write('Cost: '), write(Cost), nl, nl, ceu(N), retract(ceu(_)), N1 is N + 1, assert(ceu(N1)), % count edges fail. writeCostEdgesUsed :- assert(ceu(0)), not(writeCostEdgeUsed), ceu(N), write('The total number of cost edges used is '), write(N), write('.'), nl. 
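
/*
As a hedged illustration of how a cost edge obtains its ~Size~ and ~Cost~
from a plan term and the edge selectivity, the following sketch mirrors the
recursive structure of the ~cost~ clauses for ~rel~, ~feed~ and ~filter~ in
Section 8.1. All names starting with ~sketch~, the relation ~sketchR~, its
cardinality and the per-tuple constants 1 and 2 are hypothetical; the real
constants are the ones kept in ``Operators.pl''.

*/

sketchCard(sketchR, 1000).   % assumed cardinality of a made-up relation
sketchFeedTC(1).             % assumed cost per tuple for feed
sketchFilterTC(2).           % assumed cost per tuple for filter

% sketchCost(+Term, +Sel, -Size, -Cost)
sketchCost(rel(Rel), _, Size, 0) :-
  sketchCard(Rel, Size).

sketchCost(feed(X), Sel, S, C) :-
  sketchCost(X, Sel, S, C1),
  sketchFeedTC(A),
  C is C1 + A * S.

% The selectivity is applied at the filter only; the argument is costed
% with selectivity 1.
sketchCost(filter(X, _), Sel, S, C) :-
  sketchCost(X, 1, SizeX, CostX),
  sketchFilterTC(A),
  S is SizeX * Sel,
  C is CostX + A * SizeX.

/*
Under these assumptions the query

----    sketchCost(filter(feed(rel(sketchR)), somePred), 0.5, Size, Cost).
----

gives a size of 500 tuples and a cost of 3000: 1000 for feeding the
relation plus 2 * 1000 for filtering it.

*/
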
wceu :- writeCostEdgesUsed. deleteCostEdgeUsed :- retract(costEdgeUsed(_, _, _, _, _, _, _, _, _)), fail. deleteCostEdgesUsed :- not(deleteCostEdgeUsed). /* ---- assignCosts ---- This just puts together creation of sizes and cost edges. */ assignCosts :- assignSizes. % RHG 2014 % createCostEdges. /* 9 Finding Shortest Paths = Cheapest Plans The cheapest plan corresponds to the shortest path through the predicate order graph. 9.1 Shortest Path Algorithm by Dijkstra We implement the shortest path algorithm by Dijkstra. There are two relevant sets of nodes: * center: the nodes for which shortest paths have already been computed * boundary: the nodes that have been seen, but that have not yet been expanded. These need to be kept in a priority queue. A node, as used during shortest path computation, is represented as a term ---- node(n(Name, Version), Distance, [Path, Properties]) ---- where ~Name~ is the node number, ~Version~ a version number of this node, ~Distance~ the distance along the shortest path to this node, ~Path~ is the list of edges forming the shortest path, and ~Properties~ the physical properties (such as order) for the result obtained at this node version. The graph is represented by the set of ~costEdges~. The center is represented as a set of facts of the form ---- center(n(Name, Version), node(n(Name, Version), Distance, [Path, Properties])) ---- Since predicates are generally indexed by their first argument, finding a node in the center via the node number should be very efficient. We assume it is possible in constant time. The boundary is represented by an abstract data type as described in the interface below. Essentially it is a priority queue implementation. ---- successor(Node, Succ) :- ---- ~Succ~ is a successor of node ~Node~ via some edge. This includes computation of the distance and path of the successor. 
*/

% RHG 2014

% successor(node(Source, Distance, Path), node(Target, Distance2, Path2)) :-
%   costEdge(Source, Target, Term, Result, Size, Cost),
%   assert(costEdgeUsed(Source, Target, Term, Result, Size, Cost)),
%   Distance2 is Distance + Cost,
%   append(Path, [costEdge(Source, Target, Term, Result, Size, Cost)], Path2).

% Version with properties

successor(node(n(Source, Version), Distance, [Path, PropertiesIn]),
    simplenode(Target, Distance2, [Path2, PropertiesOut])) :-
  costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut,
    Result, Size, Cost),
  assert(costEdgeUsed(Source, Version, Target, PropertiesIn, Plan,
    PropertiesOut, Result, Size, Cost)),
  Distance2 is Distance + Cost,
  append(Path, [costEdge(Source, Target, Plan, Result, Size, Cost)], Path2).

% end RHG 2014

/*

----    dijkstra(Source, Dest, Path, Length) :-
----

The shortest path from ~Source~ to ~Dest~ is ~Path~ of length ~Length~.

*/

dijkstra(Source, Dest, Path, Length) :-
  emptyCenter,
  b_empty(Boundary),
  deleteCostEdgesUsed,                                        % RHG
  b_insert(Boundary, node(n(Source, 1), 0, [[], []]), Boundary1),
  dijkstra1(Boundary1, n(Dest, 1), 0, notfound),
  center(n(Dest, _), node(n(Dest, _), Length, [Path, _])).

emptyCenter :- not(emptyCenter1).

emptyCenter1 :- retract(center(_, _)), fail.

/*

----    dijkstra1(Boundary, Dest, NoOfCalls, Found) :-
----

Compute shortest paths and store them as facts ~center~, until the
destination ~Dest~ has been reached or the boundary is empty. Initially to be
called with no fact ~center~ asserted, ~Boundary~ containing just the start
node, and ~Found~ set to ~notfound~. For testing we report at which iteration
the destination ~Dest~ is reached.

*/

dijkstra1(Boundary, _, _, found) :- !,
  tree_height(Boundary, H),
  write('Height of search tree for boundary is '), write(H), nl.

dijkstra1(Boundary, _, _, _) :- b_isEmpty(Boundary).
dijkstra1(Boundary, Dest, N, _) :- % nl, nl, % write('dijkstra1 called.'), nl, % write('Boundary = '), write(Boundary), nl, write('====='), nl, b_removemin(Boundary, Node, Bound2), Node = node(Name, _, _), % write('Node = '), write(Name), nl, assert(center(Name, Node)), % write('Center = '), writeCenter, nl, write('====='), nl, checkDest(Name, Dest, N, Found), putsuccessors(Bound2, Node, Bound3), % write('putsuccessors succeeded.'), nl, N1 is N+1, dijkstra1(Bound3, Dest, N1, Found). checkDest(n(Name, _), n(Name, _), N, found) :- write('Destination node '), write(Name), write(' reached at iteration '), write(N), nl. checkDest(_, _, _, notfound). /* Some auxiliary functions for testing: */ writeList([]). writeList([X | Rest]) :- nl, nl, write('-----'), nl, write(X), writeList(Rest). writeCenter :- not(writeCenter1). writeCenter1 :- center(_, node(Name, Distance, Path)), write('Node: '), write(Name), nl, write('Cost: '), write(Distance), nl, write('Path: '), nl, write(Path), nl, fail. writePath([]). writePath([costEdge(Source, Target, Term, Result, Size, Cost) | Path]) :- write(costEdge(Source, Target, Result, Size, Cost)), nl, write(' '), wp(Term), nl, writePath(Path). /* ---- putsuccessors(Boundary, Node, BoundaryNew) :- ---- Insert into ~Boundary~ all successors of node ~Node~ not yet present in the center, updating their distance if they are already present, to obtain ~BoundaryNew~. */ putsuccessors(Boundary, Node, BoundaryNew) :- findall(Succ, successor(Node, Succ), Successors), % write('successors of '), write(Node), nl, % writeList(Successors), nl, nl, putsucc1(Boundary, Successors, BoundaryNew). % write('the new boundary is: '), write(BoundaryNew), % nl, write('====='), nl. /* ---- putsucc1(Boundary, Successors, BoundaryNew) :- ---- put all successors not yet in the center from the list ~Successors~ into the ~Boundary~ to get ~BoundaryNew~. The cases to be distinguished are: * The list of successors is empty. 
* The first successor simplenode(N, \_, \_) is already in the center, hence the shortest path to it is already known and it does not need to be inserted into the boundary. * The first successor X = simplenode(N, \_, \_) exists in the boundary. That means, there exists a non-empty set V(N) with versions of N in the boundary. We say, X dominates Y iff the distance of X is less than or equal to that of Y and the properties of X include those of Y. * If X is not dominated by any Y in V(N), then insert X into the boundary. * If X dominates any Y in V(N), then remove Y from the boundary. * The first successor does not exist in the boundary. It is inserted. */ putsucc1(Boundary, [], Boundary). putsucc1(Boundary, [simplenode(N, _, _) | Successors], BNew) :- center(n(N, 1), _), !, putsucc1(Boundary, Successors, BNew). putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :- findall(Node, b_memberByName(Boundary, n(N, _), Node), Nodes), insertIfNotDominated(Boundary, simplenode(N, D, P), Nodes, 1, Boundary2), removeThoseDominated(Boundary2, simplenode(N, D, P), Nodes, Boundary3), putsucc1(Boundary3, Successors, BNew). % putsucc1(Boundary, [simplenode(N, D, [_, Properties]) | Successors], % BNew) :- % b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, [_, Properties)), % DistOld =< D, !, % putsucc1(Boundary, Successors, BNew). % putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :- % b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, _)), % D < DistOld, !, % b_deleteByName(Boundary, n(N, 1), Bound2), % b_insert(Bound2, node(n(N, 1), D, P), Bound3), % putsucc1(Bound3, Successors, BNew). % the following not needed % putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :- % nl, % write('putsucc1 called with final case'), nl, % write(simplenode(N, D, P)), nl, % b_insert(Boundary, node(n(N, 1), D, P), Bound2), % putsucc1(Bound2, Successors, BNew). 
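
/*
As a hedged illustration of the dominance rule stated for ~putsucc1~, the
following sketch tests whether one candidate dominates another. It is a
simplification: properties are flat lists here, whereas the optimizer keeps
a list of [Node, PropertyList] pairs. The names ~sketchDominates~,
~sketchIncludes~ and the term ~cand(Distance, Properties)~ are hypothetical.

*/

% X dominates Y iff X is not more expensive than Y and offers at least
% the properties of Y.
sketchDominates(cand(DistX, PropsX), cand(DistY, PropsY)) :-
  DistX =< DistY,
  sketchIncludes(PropsX, PropsY).

% sketchIncludes(+Props, +Required): every element of Required is in Props.
sketchIncludes(_, []).
sketchIncludes(Props, [P | Rest]) :-
  memberchk(P, Props),
  sketchIncludes(Props, Rest).

/*
For example, ~cand(10, [order(a)])~ dominates ~cand(12, [])~, whereas
~cand(10, [])~ does not dominate ~cand(12, [order(a)])~: it is cheaper but
lacks the property, so both versions would have to be kept.

*/
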
insertIfNotDominated(Boundary, simplenode(N, D, P), [], Version, BoundaryOut) :- b_insert(Boundary, node(n(N, Version), D, P), BoundaryOut). % nl, write('***** inserted '), write(node(n(N, Version), D, P)), nl. insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]), [node(n(N, V), DistOld, [_, PropOld]) | Nodes], Version, BoundaryOut) :- ( D < DistOld ; otherProperties(Prop, PropOld) ), % not dominated ( V > Version -> Version2 is V + 1 ; Version2 is Version + 1 ), insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes, Version2, BoundaryOut). insertIfNotDominated(Boundary, simplenode(N, D, [_, Prop]), [node(n(N, _), DistOld, [_, PropOld]) | _], _, Boundary) :- % nl, write('***** NOT inserted '), write(simplenode(N, D, [Path, Prop])), nl, D >= DistOld, included(Prop, PropOld). % is dominated and can be ignored. removeThoseDominated(Boundary, simplenode(_, _, [_, _]), [], Boundary). removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), [node(n(N, _), DistOld, [_, PropOld]) | Nodes], Boundary2) :- ( DistOld =< D ; otherProperties(PropOld, Prop) ), !, % not dominated removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes, Boundary2). removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), [node(n(N, V), _, [_, _]) | Nodes], Boundary3) :- b_deleteByName(Boundary, n(N, V), Boundary2), % nl, write('***** deleted '), write(n(N, V)), nl, removeThoseDominated(Boundary2, simplenode(N, D, [Path, Prop]), Nodes, Boundary3). :-dynamic noProperties/0. included(_, _) :- noProperties, !. included([[Node, List1] | Props1], Props2) :- select([Node, List2], Props2, Props2Rest), included2(List1, List2), included(Props1, Props2Rest). included([], _). included2([], _). included2([P1 | Props1], Props2) :- select(P1, Props2, Props2Rest), included2(Props1, Props2Rest). included2([none], _). otherProperties(Props1, Props2) :- not(included(Props1, Props2)). 
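
/*
As a hedged illustration of the search described in this section, the
following sketch runs Dijkstra's algorithm on a tiny made-up graph. It is
deliberately simpler than the implementation above: the boundary is a list
of Distance-Node pairs kept sorted with ~keysort~ instead of a priority
queue, outdated boundary entries are skipped rather than deleted, and paths,
versions and properties are not tracked. The names ~sketchEdge~,
~sketchDijkstra~ and ~sketchSearch~ and the edge costs are hypothetical.

*/

% sketchEdge(Source, Target, Cost)
sketchEdge(0, 1, 4).
sketchEdge(0, 2, 1).
sketchEdge(2, 1, 2).
sketchEdge(1, 3, 5).
sketchEdge(2, 3, 8).

sketchDijkstra(Source, Dest, Length) :-
  sketchSearch([0-Source], [], Dest, Length).

% sketchSearch(+Boundary, +Center, +Dest, -Length)
% The destination is the closest boundary node: its distance is final.
sketchSearch([D-Dest | _], _, Dest, D) :- !.
% The closest boundary node is already in the center: skip the stale entry.
sketchSearch([_-N | Boundary], Center, Dest, Length) :-
  memberchk(N, Center), !,
  sketchSearch(Boundary, Center, Dest, Length).
% Otherwise move the node to the center and add its successors.
sketchSearch([D-N | Boundary], Center, Dest, Length) :-
  findall(D2-M, (sketchEdge(N, M, Cost), D2 is D + Cost), Succs),
  append(Boundary, Succs, Boundary2),
  keysort(Boundary2, Boundary3),
  sketchSearch(Boundary3, [N | Center], Dest, Length).

/*
On this graph ~sketchDijkstra(0, 3, Length)~ finds the path 0, 2, 1, 3 with
~Length~ = 8, which is shorter than the direct alternatives via 0, 1, 3 or
0, 2, 3 (both of length 9).

*/
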
/* 9.2 Interface ~Boundary~ The boundary is represented in a data structure with the following operations: ---- b_empty(-Boundary) :- ---- Creates an empty boundary and returns it. ---- b_isEmpty(+Boundary) :- ---- Checks whether the boundary is empty. ---- b_removemin(+Boundary, -Node, -BoundaryOut) :- ---- Returns the node ~Node~ with minimal distance from the set ~Boundary~ and returns also ~BoundaryOut~ where this node is removed. ---- b_insert(+Boundary, +Node, -BoundaryOut) :- ---- Inserts a node that must not yet be present (i.e., no other node of that name). ---- b_memberByName(+Boundary, +Name, -Node) :- ---- If a node ~Node~ with name ~Name~ is present, it is returned. ---- b_deleteByName(+Boundary, +Name, -BoundaryOut) :- ---- Returns the boundary, where the node with name ~Name~ is deleted. */ /* 9.3 Constructing the Plan from the Shortest Path ---- plan(Path, Plan) ---- The plan corresponding to ~Path~ is ~Plan~. */ %fapra 15/16 plan(Path, Plan) :- isDistributedQuery, !, deleteNodePlans, mergePlanEdges(Path, MergedPath), traversePath(MergedPath), highNode(N), nodePlan(N, Plan). %end fapra 15/16 plan(Path, Plan) :- deleteNodePlans, traversePath(Path), highNode(N), nodePlan(N, Plan). deleteNodePlans :- not(deleteNodePlan). deleteNodePlan :- retract(nodePlan(_, _)), fail. traversePath([]). traversePath([costEdge(_, _, Term, Result, _, _) | Path]) :- embedSubPlans(Term, Term2), assert(nodePlan(Result, Term2)), traversePath(Path). embedSubPlans(res(N), Term) :- nodePlan(N, Term), !. embedSubPlans(Term, Term2) :- compound(Term), !, Term =.. [Functor | Args], embedded(Args, Args2), Term2 =.. [Functor | Args2]. embedSubPlans(Term, Term). embedded([], []). embedded([Arg | Args], [Arg2 | Args2]) :- embedSubPlans(Arg, Arg2), embedded(Args, Args2). %fapra 15/16 /* ---- mergePlanEdges(PlanEdgeList, MergedEdgesList) ---- Merge the distribution of a query on a distributed query result to the distribution of the query on a query result. Example: dmap(... 
filter(.,bla1)) dmap filter(., bla2)
...becomes: dmap(... filter(filter(., bla1), bla2))

*/

mergePlanEdges([], []).

mergePlanEdges([X], [X]).

/*
Merge rule for two successive dmaps with filtrations as their parameters;
this should be the most common case.

*/

mergePlanEdges([Edge1, Edge2|Edges], MergedEdges) :-
  Edge1 = costEdge(Source, _, Plan1, Res1, _, C1),
  Edge2 = costEdge(_, Target, Plan2, Res2, S2, C2),
  Plan1 = dmap(Arg, _, filter(FilterArg, Pred1)),
  successiveFilterOnParam(FilterArg, ArgTerm),
  Plan2 = dmap(res(Res1), ResName, filter(ArgTerm, Pred2)),
  MergedPlan = dmap(Arg, ResName, filter(filter(FilterArg, Pred1), Pred2)),
  % the plan is already chosen at this point, so costs will have no influence
  MergedCosts is C1 + C2,
  MergedHead = costEdge(Source, Target, MergedPlan, Res2, S2, MergedCosts),
  mergePlanEdges([MergedHead|Edges], MergedEdges).

% The first two edges cannot be merged according to the above rules.
mergePlanEdges([X|Tail], [X|MergedTail]) :-
  mergePlanEdges(Tail, MergedTail).

% Term is a dot or a nested filtration on a dot.
successiveFilterOnParam(Term, ArgTerm) :-
  functor(Term, filter, 2),
  arg(1, Term, FirstArg),
  successiveFilterOnParam(FirstArg, ArgTerm).

successiveFilterOnParam(Term, Term) :-
  Term = feed(rel('.', _, _)).

successiveFilterOnParam(Term, Term) :-
  Term = rename(feed(rel('.', _, _)), _).

%end fapra 15/16

% highestNode(Path, N) :-
%   reverse(Path, Path2),
%   Path2 = [costEdge(_, N, _, _, _, _) | _].

/*
9.4 Computing the Best Plan for a Given Predicate Order Graph

*/

bestPlan :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan),
  write('The best plan is:'), nl, nl,
  wp(Plan),
  nl, nl,
  write('The cost is: '), write(Cost), nl.

bestPlan(Plan, Cost) :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan).

/*
10 A Larger Example

It is now time to test efficiency with a larger example.
We consider the query: ---- select * from Staedte, plz as p1, plz as p2, plz as p3, where SName = p1.Ort and p1.PLZ = p2.PLZ + 1 and p2.PLZ = p3.PLZ * 5 and Bev > 300000 and Bev < 500000 and p2.PLZ > 50000 and p2.PLZ < 60000 and Kennzeichen starts "W" and p3.Ort contains "burg" and p3.Ort starts "M" ---- This translates to: */ example6 :- pog( [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)], [ pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u), rel(plz, p1, l)), pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l), rel(plz, p2, l)), pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), rel(plz, p2, l), rel(plz, p3, l)), pr(attr(bev, 1, u) > 300000, rel(staedte, *, u)), pr(attr(bev, 1, u) < 500000, rel(staedte, *, u)), pr(attr(p2:pLZ, 1, u) > 50000, rel(plz, p2, l)), pr(attr(p2:pLZ, 1, u) < 60000, rel(plz, p2, l)), pr(attr(kennzeichen, 1, u) starts "W", rel(staedte, *, u)), pr(attr(p3:ort, 1, u) contains "burg", rel(plz, p3, l)), pr(attr(p3:ort, 1, u) starts "M", rel(plz, p3, l)) ], _, _). /* This doesn't work (initially, now it works). Let's keep the numbers a bit smaller and avoid too many big joins first. */ example7 :- pog( [rel(staedte, *, u), rel(plz, p1, l)], [ pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u), rel(plz, p1, l)), pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)), pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)), pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)), pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)), pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)), pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)), pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l)) ], _, _). 
example8 :- pog( [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l)], [ pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u), rel(plz, p1, l)), pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l), rel(plz, p2, l)), pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)), pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)), pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)), pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)), pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)), pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)), pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l)) ], _, _). /* Let's study a small example again with two independent conditions. */ example9 :- pog([rel(staedte, s, u), rel(plz, p, l)], [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u), rel(staedte, s, u), rel(plz, p, l) ), pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l)), pr(attr(s:bev, 0, u) > 300000, rel(staedte, s, u))], _, _). example10 :- pog( [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)], [ pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u), rel(plz, p1, l)), pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l), rel(plz, p2, l)), pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), rel(plz, p2, l), rel(plz, p3, l)) ], _, _). /* 11 A User Level Language We have started to construct the optimizer by building the predicate order graph, using a notation for relations and predicates as useful for that purpose. Later, in [Section Translation], we have adapted the notation to be able to translate and construct query plans as needed in Secondo. In this section we will introduce a more user friendly notation for queries, pretty similar to SQL, but suitable for being written directly in PROLOG. 

11.1 The Language

The basic select-from-where statement will be written as

----    select <attr-list>
        from <rel-list>
        where <pred-list>
----

The first example query from [Section 4.1.1] can then be written as:

----    select [sname, bev]
        from [staedte]
        where [bev > 500000]
----

Instead of lists consisting of a single element we will also support writing
just the element, hence the query can also be written:

----    select [sname, bev]
        from staedte
        where bev > 500000
----

The second query can be written as:

----    select *
        from [staedte as s, plz as p]
        where [sname = p:ort, p:plz > 40000]
----

Note that all relation names and attribute names are written in lower case
only; the system will look up the spelling in a table.

Furthermore, it will be possible to add a groupby- and an orderby-clause:

  * groupby

----    select <aggr-list>
        from <rel-list>
        where <pred-list>
        groupby <group-attr-list>
----

Example:

----    select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
        from plz
        where plz > 40000
        groupby ort
----

  * orderby

----    select <attr-list>
        from <rel-list>
        where <pred-list>
        orderby <order-attr-list>
----

Example:

----    select [ort, plz]
        from plz
        orderby [ort asc, plz desc]
----

This example also shows that the where-clause may be omitted. It is also
possible to combine grouping and ordering:

----    select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
        from plz
        where plz > 40000
        groupby ort
        orderby cntplz desc
----

Currently only a basic part of this language has been implemented.

11.2 Structure

We introduce ~select~, ~from~, ~where~, and ~as~ as PROLOG operators:

*/

:- op(990, fx, sql).
:- op(985, xfx, >>).
:- op(950, fx, select).
:- op(960, xfx, from).
:- op(950, xfx, where).
:- op(930, xfx, as).
:- op(970, xfx, groupby).
:- op(980, xfx, orderby).
:- op(930, xf, asc).
:- op(930, xf, desc).
/*
This ensures that the select-from-where statement is viewed as a term with the
structure:

----    from(select(AttrList), where(RelList, PredList))
----

That this works can be tested with:

----    P = (select s:sname from staedte as s where s:bev > 500000),
        P = (X from Y), X = (select AttrList), Y = (RelList where PredList),
        RelList = (Rel as Var).
----

The result is:

----    P = select s:sname from staedte as s where s:bev>500000
        X = select s:sname
        Y = staedte as s where s:bev>500000
        AttrList = s:sname
        RelList = staedte as s
        PredList = s:bev>500000
        Rel = staedte
        Var = s
----

11.3 Schema Lookup

The second task is to look up attribute names in order to build the input
notation for the construction of the predicate order graph.

11.3.1 Tables

In the file ~database~ we maintain the following tables.

Relation schemas are written as:

----    relation(staedte, [sname, bev, plz, vorwahl, kennzeichen]).
        relation(plz, [plz, ort]).
----

The spelling of relation or attribute names is given in a table

----    spelling(staedte:plz, pLZ).
        spelling(staedte:sname, sName).
        spelling(plz, lc(plz)).
        spelling(plz:plz, pLZ).
----

The default assumption is that the first letter of a name is upper case and
all others are lower case. If this is true, then no entry in the table
~spelling~ is needed. If a name starts with a lower case letter, then this is
expressed by the functor ~lc~.

11.3.2 Looking up Relation and Attribute Names

*/

callLookup(Query, Query2) :-
  newQuery,
  lookup(Query, Query2), !.

%fapra 2015/16
/*
Added ~clearIsDistributedQuery~ and ~clearIsLocalQuery~.

*/
newQuery :-
  not(clearVariables),
  not(clearQueryRelations),
  not(clearQueryAttributes),
  not(clearIsDistributedQuery),
  not(clearIsLocalQuery).

clearVariables :- retract(variable(_, _)), fail.

clearQueryRelations :- retract(queryRel(_, _)), fail.

clearQueryAttributes :- retract(queryAttr(_)), fail.

clearIsDistributedQuery :- retract(isDistributedQuery), fail.

clearIsLocalQuery :- retract(isLocalQuery), fail.
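/*
The clearing predicates above are failure-driven loops: each one retracts a
fact and then fails, so that backtracking into ~retract~ removes the next fact.
The loop itself never succeeds, which is why ~newQuery~ wraps every call in
~not~. The following is only a minimal sketch of that idiom; the table
~demoVar~ and the predicates ~clearDemoVars~, ~resetDemo~ and ~demoClear~ are
hypothetical names chosen for illustration and are not part of the optimizer.

```prolog
% Hypothetical fact table standing in for variable/2.
:- dynamic demoVar/2.

% Failure-driven loop: retract one fact, then fail in order to backtrack
% into retract/1 again. Once no fact is left, the predicate fails.
clearDemoVars :- retract(demoVar(_, _)), fail.

% Negation turns the final failure of the loop into success.
resetDemo :- \+ clearDemoVars.

% Store two facts, clear the table, and collect what is left.
demoClear(Remaining) :-
  assertz(demoVar(s, staedte)),
  assertz(demoVar(p, plz)),
  resetDemo,
  findall(V, demoVar(V, _), Remaining).
```

Calling ~demoClear(R)~ should bind ~R~ to the empty list. In SWI-Prolog the
built-in ~retractall~ would presumably achieve the same effect in one call.

*/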
%end fapra 2015/16 /* ---- lookup(Query, Query2) :- ---- ~Query2~ is a modified version of ~Query~ where all relation names and attribute names have the form as required in [Section Translation]. */ lookup(select Attrs from Rels where Preds, select Attrs2 from Rels2List where Preds2List) :- lookupRels(Rels, Rels2), checkDistributedQuery, lookupAttrs(Attrs, Attrs2), lookupPreds(Preds, Preds2), makeList(Rels2, Rels2List), makeList(Preds2, Preds2List). lookup(select Attrs from Rels, select Attrs2 from Rels2) :- lookupRels(Rels, Rels2), checkDistributedQuery, lookupAttrs(Attrs, Attrs2). lookup(Query orderby Attrs, Query2 orderby Attrs3) :- lookup(Query, Query2), makeList(Attrs, Attrs2), lookupAttrs(Attrs2, Attrs3). lookup(Query groupby Attrs, Query2 groupby Attrs3) :- lookup(Query, Query2), makeList(Attrs, Attrs2), lookupAttrs(Attrs2, Attrs3). makeList(L, L) :- is_list(L). makeList(L, [L]) :- not(is_list(L)). /* 11.3.3 Modification of the From-Clause ---- lookupRels(Rels, Rels2) ---- Modify the list of relation names. If there are relations without variables, store them in a table ~queryRel~. Any two such relations must have distinct sets of attribute names. Also, any two variables must be distinct. */ lookupRels([], []). lookupRels([R | Rs], [R2 | R2s]) :- lookupRel(R, R2), lookupRels(Rs, R2s). lookupRels(Rel, Rel2) :- not(is_list(Rel)), lookupRel(Rel, Rel2). /* ---- lookupRel(Rel, Rel2) :- ---- Translate and store a single relation definition. */ :- dynamic variable/2, queryRel/2, queryAttr/1. lookupRel(Rel as Var, rel(Rel2, Var, Case)) :- removeDistributedSuffix(Rel,DRel), relation(DRel, _), !, spelled(DRel, Rel2, Case), not(defined(Var)), assert(variable(Var, rel(Rel2, Var, Case))). lookupRel(Rel, rel(Rel2, *, Case)) :- removeDistributedSuffix(Rel,DRel), relation(DRel, _), !, spelled(DRel, Rel2, Case), not(duplicateAttrs(Rel)), assert(queryRel(DRel, rel(Rel2, *, Case))). 
lookupRel(Term, Term) :-
  write('Error in query: relation '), write(Term), write(' not known'),
  nl, fail.

defined(Var) :-
  variable(Var, _),
  write('Error in query: doubly defined variable '), write(Var), write('.'),
  nl.

%fapra 2015/16
/*
Checks whether all relations of the query are distributed. Currently the
optimizer can only handle queries whose relations are either all local or all
distributed. Queries that mix both relation types are rejected.

*/

%handle not distributed queries
checkDistributedQuery :-
  not(isDistributedQuery),
  isLocalQuery, !.

checkDistributedQuery :-
  isDistributedQuery,
  not(isLocalQuery), !.

checkDistributedQuery :-
  write('Error in query: not all relations distributed '),
  fail, !.

%end fapra 2015/16

/*
----    duplicateAttrs(Rel) :-
----

There is a relation stored in ~queryRel~ that has attribute names also
occurring in ~Rel~.

*/

duplicateAttrs(Rel) :-
  queryRel(Rel2, _),
  relation(Rel2, Attrs2),
  member(Attr, Attrs2),
  relation(Rel, Attrs),
  member(Attr, Attrs),
  write('Error in query: duplicate attribute names in relations '),
  write(Rel2), write(' and '), write(Rel), write('.'), nl.

/*
11.3.4 Modification of the Select-Clause

*/

lookupAttrs([], []).

lookupAttrs([A | As], [A2 | A2s]) :-
  lookupAttr(A, A2),
  lookupAttrs(As, A2s).

lookupAttrs(Attr, Attr2) :-
  not(is_list(Attr)),
  lookupAttr(Attr, Attr2).

lookupAttr(Var:Attr, attr(Var:Attr2, 0, Case)) :- !,
  variable(Var, Rel2),
  Rel2 = rel(Rel, _, _),
  spelled(Rel:Attr, attr(Attr2, _, Case)).

lookupAttr(Attr asc, Attr2 asc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr desc, Attr2 desc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr, Attr2) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, Attr2).

lookupAttr(*, *) :- !.

lookupAttr(count(*), count(*)) :- !.

lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
  lookupAttr(Expr, Expr2),
  not(queryAttr(attr(Name, 0, u))),
  !,
  assert(queryAttr(attr(Name, 0, u))).
lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :- lookupAttr(Expr, Expr2), queryAttr(attr(Name, 0, u)), !, write('***** Error: attribute name '), write(Name), write(' doubly defined in query.'), nl. lookupAttr(Term, Term2) :- compound(Term), functor(Term, Op, 1), arg(1, Term, Arg1), lookupAttr(Arg1, Res1), functor(Term2, Op, 1), arg(1, Term2, Res1). lookupAttr(Name, attr(Name, 0, u)) :- queryAttr(attr(Name, 0, u)), !. lookupAttr(Name, Name) :- write('Error in attribute list: could not recognize '), write(Name), nl, fail. isAttribute(Name, Rel) :- queryRel(Rel, _), relation(Rel, List), member(Name, List). /* 11.3.5 Modification of the Where-Clause */ lookupPreds([], []). lookupPreds([P | Ps], [P2 | P2s]) :- !, lookupPred(P, P2), lookupPreds(Ps, P2s). lookupPreds(Pred, Pred2) :- not(is_list(Pred)), lookupPred(Pred, Pred2). lookupPred(Pred, pr(Pred2, Rel)) :- lookupPred1(Pred, Pred2, 0, [], 1, [Rel]), !. lookupPred(Pred, pr(Pred2, Rel1, Rel2)) :- lookupPred1(Pred, Pred2, 0, [], 2, [Rel1, Rel2]), !. lookupPred(Pred, _) :- lookupPred1(Pred, _, 0, [], 0, []), write('Error in query: constant predicate is not allowed.'), nl, fail, !. lookupPred(Pred, _) :- lookupPred1(Pred, _, 0, [], N, _), N > 2, write('Error in query: predicate involving more than two relations '), write('is not allowed.'), nl, fail. /* ---- lookupPred1(+Pred, Pred2, +N, +RelsBefore, -M, -RelsAfter) :- ---- ~Pred2~ is the transformed version of ~Pred~; before this is called, ~N~ attributes in list ~RelsBefore~ have been found; after the transformation in total ~M~ attributes referring to the relations in list ~RelsAfter~ have been found. */ lookupPred1(Var:Attr, attr(Var:Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :- variable(Var, Rel2), !, Rel2 = rel(Rel, _, _), spelled(Rel:Attr, attr(Attr2, _, Case)), N1 is N + 1, append(RelsBefore, [Rel2], RelsAfter). 
lookupPred1(Attr, attr(Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, attr(Attr2, _, Case)),
  queryRel(Rel, Rel2),
  N1 is N + 1,
  append(RelsBefore, [Rel2], RelsAfter).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 1), !,
  arg(1, Term, Arg1),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M, RelsAfter),
  functor(Term2, F, 1),
  arg(1, Term2, Arg1Out).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 2), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
  lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M, RelsAfter),
  functor(Term2, F, 2),
  arg(1, Term2, Arg1Out),
  arg(2, Term2, Arg2Out).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 3), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
  lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M2, RelsAfter2),
  lookupPred1(Arg3, Arg3Out, M2, RelsAfter2, M, RelsAfter),
  functor(Term2, F, 3),
  arg(1, Term2, Arg1Out),
  arg(2, Term2, Arg2Out),
  arg(3, Term2, Arg3Out).

% may need to be extended to operators with more than three arguments.

%fapra 2015/16
/*
Look up generic, non-relation objects. If ~Term~ is a Secondo object, it is
marked with the functor ~obj(Term, Type, Case)~, where ~Term~ is the identifier
starting with a lower case character and ~Type~ is the kind of object. ~Case~
indicates whether the first letter of the object name is written with a
capital letter (~u~) or not (~l~).

*/

lookupPred1(Term, ObjTerm, N, Rels, N, Rels) :-
  atom(Term),
  not(is_list(Term)),
  spelledObj(Term,Obj,Type,Case),
  ObjTerm = obj(Obj,Type,Case), !.

lookupPred1(Term, Term, N, Rels, N, Rels) :-
  atom(Term),
  not(is_list(Term)),
  write('Symbol '), write(Term),
  write(' not recognized, supposed to be a Secondo object.'), nl, !.

lookupPred1(Term, Term, N, Rels, N, Rels).
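/*
The clauses of ~lookupPred1~ thread a counter and a relation list through the
predicate term: each subterm receives the state left by its left neighbour.
The following is only a simplified sketch of that traversal, not the
optimizer's own code. The predicates ~countRefs~ and ~countArgs~ are
hypothetical; they treat every subterm of the form ~Var:Attr~ as one attribute
reference and skip the spelling lookup entirely.

```prolog
% countRefs(+Term, +N0, +Rels0, -N, -Rels): hypothetical, simplified
% analogue of lookupPred1/6. N0 and Rels0 describe the state before
% Term is visited, N and Rels the state afterwards.
countRefs(Var:_, N0, Rels0, N, Rels) :-
  !,
  N is N0 + 1,
  append(Rels0, [Var], Rels).
countRefs(Term, N0, Rels0, N, Rels) :-
  compound(Term),
  !,
  Term =.. [_ | Args],
  countArgs(Args, N0, Rels0, N, Rels).
countRefs(_, N, Rels, N, Rels).   % constants leave the state unchanged

% Visit the arguments from left to right, passing the state along.
countArgs([], N, Rels, N, Rels).
countArgs([A | As], N0, Rels0, N, Rels) :-
  countRefs(A, N0, Rels0, N1, Rels1),
  countArgs(As, N1, Rels1, N, Rels).
```

With this sketch, ~countRefs(s:sname = p:ort, 0, \[\], N, Rels)~ should yield
~N = 2~ and ~Rels = \[s, p\]~, which corresponds to the join case of
~lookupPred~, while a predicate such as ~p:plz > 40000~ yields a count of 1.

*/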
%end fapra 2015/16

/*
11.3.6 Check the Spelling of Relation and Attribute Names

*/

spelled(Rel:Attr, attr(Attr2, 0, l)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr3),
  Attr3 = lc(Attr2),
  !.

spelled(Rel:Attr, attr(Attr2, 0, u)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr2),
  !.

spelled(_:_, attr(_, 0, _)) :- !, fail. % no attr entry in spelling table

spelled(Rel, Rel2, l) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel3),
  Rel3 = lc(Rel2),
  !.

spelled(Rel, Rel2, u) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel2), !.

% if we do not get a spelling hint,
% assume it was spelled correctly
spelled(Rel, Rel, u) :-
  atom_chars(Rel, [FirstChar|_]),
  char_type(FirstChar, upper),
  write('spelling of '), write(Rel),
  write(' could not be determined. Assume it is spelled uppercase'), !.

spelled(Rel, Rel, l) :-
  atom_chars(Rel, [FirstChar|_]),
  char_type(FirstChar, lower),
  write('spelling of '), write(Rel),
  write(' could not be determined. Assume it is spelled lowercase'), !.

spelled(_, _, _) :- !, fail. % no rel entry in spelling table.

%fapra 2015/16
/*
11.3.7 Check the Spelling of Non-Relation Objects

*/
spelledObj(Term, Obj, Type, l) :-
  downcase_atom(Term, DcObj),
  objectCatalog(DcObj, LcObj, Type),
  LcObj = lc(Obj),
  !.

spelledObj(Term, Obj, Type, u) :-
  downcase_atom(Term, DcObj),
  objectCatalog(DcObj, Obj, Type),
  !.

spelledObj(_, _, _, _) :- !, fail. % no entry, avoid backtracking.
%end fapra 2015/16

/*
11.3.8 Examples

We can now formulate several of the previous queries at the user level.

*/

example11 :- showTranslate(select [sname, bev] from staedte where bev > 500000).

showTranslate(Query) :-
  callLookup(Query, Query2),
  write(Query), nl,
  write(Query2), nl.

example12 :- showTranslate(
  select * from [staedte as s, plz as p]
  where [s:sname = p:ort, p:plz > 40000]
  ).
example13 :- showTranslate( select * from [staedte, plz as p1, plz as p2, plz as p3] where [ sname = p1:ort, p1:plz = p2:plz + 1, p2:plz = p3:plz * 5, bev > 300000, bev < 500000, p2:plz > 50000, p2:plz < 60000, kennzeichen starts "W", p3:ort contains "burg", p3:ort starts "M"] ). /* 11.4 Translating a Query to a Plan ---- translate(Query, Stream, SelectClause, Cost) :- ---- ~Query~ is translated into a ~Stream~ to which still the translation of the ~SelectClause~ needs to be applied. A ~Cost~ is returned which currently is only the cost for evaluating the essential part, the conjunctive query. */ translate(Query orderby Attrs, sortby(Stream, AttrNames), Select, 0) :- !, translate(Query, Stream, Select, _), attrnamesSort(Attrs, AttrNames). translate(Query groupby Attrs, groupby(sortby(Stream, AttrNamesSort), AttrNamesGroup, Fields), select Select2, Cost) :- translate(Query, Stream, SelectClause, Cost), makeList(Attrs, Attrs2), attrnames(Attrs2, AttrNamesGroup), attrnamesSort(Attrs2, AttrNamesSort), SelectClause = (select Select), makeList(Select, SelAttrs), translateFields(SelAttrs, Attrs2, Fields, Select2), !. translate(Select from Rels where Preds, Stream, Select, Cost) :- pog(Rels, Preds, _, _), bestPlan(Stream, Cost), !. %fapra 2015/16 translate(Select from Rel, feed(Rel), Select, 0) :- not(isDistributedQuery), not(is_list(Rel)), !. translate(Select from Rel, ObjName,Select, 0) :- isDistributedQuery, distributedRels(Rel, ObjName, _, _, _), not(is_list(Rel)), !. translate(Select from Rel, dist(Rel,ObjName),Select, 0) :- isDistributedQuery, distributedRels(Rel, ObjName, _, _, _), not(is_list(Rel)), !. translate(Select from [Rel], feed(Rel), Select, 0). translate(Select from [Rel | Rels], product(feed(Rel), Stream), Select, 0) :- not(isDistributedQuery), translate(Select from Rels, Stream, Select, _). %end fapra 2015/16 /* ---- translateFields(Select, GroupAttrs, Fields, Select2) :- ---- Translate the ~Select~ clause of a query containing ~groupby~. 
Grouping was done by the attributes ~GroupAttrs~. Return a list ~Fields~ of terms of the form ~field(Name, Expr)~; such a list can be used as an argument to the groupby operator. Also, return a modified select clause ~Select2~, which will translate to a corresponding projection operation. */ translateFields([], _, [], []). translateFields([count(*) as NewAttr | Select], GroupAttrs, [field(NewAttr , count(feed(group))) | Fields], [NewAttr | Select2]) :- translateFields(Select, GroupAttrs, Fields, Select2), !. translateFields([sum(Attr) as NewAttr | Select], GroupAttrs, [field(NewAttr, sum(feed(group), attrname(Attr))) | Fields], [NewAttr| Select2]) :- translateFields(Select, GroupAttrs, Fields, Select2), !. translateFields([Attr | Select], GroupAttrs, Fields, [Attr | Select2]) :- member(Attr, GroupAttrs), !, translateFields(Select, GroupAttrs, Fields, Select2). /* Generic rule for aggregate functions, similar to sum. */ translateFields([Term as NewAttr | Select], GroupAttrs, [field(NewAttr, Term2) | Fields], [NewAttr| Select2]) :- compound(Term), functor(Term, AggrOp, 1), arg(1, Term, Attr), member(AggrOp, [min, max, avg]), functor(Term2, AggrOp, 2), arg(1, Term2, feed(group)), arg(2, Term2, attrname(Attr)), translateFields(Select, GroupAttrs, Fields, Select2), !. translateFields([Term | Select], GroupAttrs, Fields, Select2) :- compound(Term), functor(Term, AggrOp, 1), arg(1, Term, Attr), member(AggrOp, [count, sum, min, max, avg]), functor(Term2, AggrOp, 2), arg(1, Term2, feed(group)), arg(2, Term2, attrname(Attr)), translateFields(Select, GroupAttrs, Fields, Select2), write('*****'), nl, write('***** Error in groupby: missing name for new attribute'), nl, write('*****'), nl, !. 
translateFields([Attr | Select], GroupAttrs, Fields, Select2) :-
  not(member(Attr, GroupAttrs)),
  !,
  translateFields(Select, GroupAttrs, Fields, Select2),
  write('*****'), nl,
  write('***** Error in groupby: '),
  write(Attr),
  write(' is neither a grouping attribute'), nl,
  write('      nor an aggregate expression.'), nl,
  write('*****'), nl.

%fapra 15/16

% Extract parts from a query
destructureQuery(Select from Rel where Pred, Select, Rel, Pred).

% Pred is a predicate stating that an attribute is equal to a given value
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
  Pred = pr(Value = Attr, Rel),
  Attr = attr(_, _, _).

attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
  Pred = pr(Attr = Value, Rel),
  Attr = attr(_, _, _).

/*
----    substituteSubterm(Substituted, Substitute, OriginalTerm,
          TermWithSubstitution)
----

Replacing every occurrence of ~Substituted~ in ~OriginalTerm~ by ~Substitute~
yields ~TermWithSubstitution~.
We have a cut in every clause to remove unnecessary choice points during the
search for plan edges, which is driven by meta predicates.

*/

% The whole term is to be substituted:
substituteSubterm(Substituted, Substitute, Substituted, Substitute):- !.

% The whole term doesn't match and it's not compound:
substituteSubterm(Substituted, _, OriginalTerm, OriginalTerm) :-
  functor(OriginalTerm, _, 0),
  OriginalTerm \= Substituted, !.

% The whole term doesn't match and it's compound - dive into its subterms:
substituteSubterm(Substituted, Substitute, OriginalTerm,
    TermWithSubstitution) :-
  functor(OriginalTerm, Functor, Arity),
  functor(TermWithSubstitution, Functor, Arity),
  substituteSubtermInNthSubterm(Arity, Substituted, Substitute,
    OriginalTerm, TermWithSubstitution), !.

% Terminal case. All subterms have been processed.
substituteSubtermInNthSubterm(0, _, _, _, _):- !.

% Generic case. Process nth subterm.
substituteSubtermInNthSubterm(N, Substituted, Substitute, OriginalTerm,
    TermWithSubstitution) :-
  not(N = 0),
  arg(N, OriginalTerm, OriginalNthTerm),
  substituteSubterm(Substituted, Substitute, OriginalNthTerm,
    NthTermWithSubstitution),
  arg(N, TermWithSubstitution, NthTermWithSubstitution),
  Next is N - 1,
  substituteSubtermInNthSubterm(Next, Substituted, Substitute,
    OriginalTerm, TermWithSubstitution), !.

/*
----    queryToPlan(Query, Plan, Cost) :-
----

Translate the ~Query~ into a ~Plan~. The ~Cost~ for evaluating the conjunctive
query is also returned. The ~Query~ must be such that relation and attribute
names have been looked up already.

fapra 15/16:
Each non-distributed clause has a duplicate that treats the distributed case.
These duplicates are guarded by an ~isDistributedQuery~ goal.
end fapra 15/16

*/

queryToPlan(Query, consume(dsummarize(Stream)), Cost) :-
  selectClause(Query, *),
  isDistributedQuery,
  !,
  translate(Query, Stream, select *, Cost).

queryToPlan(Query, consume(Stream), Cost) :-
  selectClause(Query, *),
  !,
  translate(Query, Stream, select *, Cost).

queryToPlan(Query, count(dsummarize(Stream)), Cost) :-
  selectClause(Query, count(*)),
  isDistributedQuery,
  !,
  translate(Query, Stream, select count(*), Cost).

queryToPlan(Query, count(Stream), Cost) :-
  selectClause(Query, count(*)),
  !,
  translate(Query, Stream, select count(*), Cost).

%TF: changed to execute projection in dmap operator
queryToPlan(Query, consume(dsummarize(dmap(Stream," ", project(Plan,AttrNames)))), Cost) :-
  isDistributedQuery, !,
  translate(Query, dist(rel(_,Var,_),Stream), select Attrs, Cost), !,
  feedRenameRelation(rel(dot,Var,_),Plan),
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNames).

queryToPlan(Query, consume(project(Stream, AttrNames)), Cost) :-
  translate(Query, Stream, select Attrs, Cost), !,
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNames).

%end fapra 15/16

/*
----    queryToStream(Query, Plan, Cost) :-
----

Same as ~queryToPlan~, but returns a stream plan, if possible.
To be used for ``mixed queries'' that add Secondo operators to the plan built by the optimizer. */ queryToStream(Query, Stream, Cost) :- selectClause(Query, *), translate(Query, Stream, select *, Cost), !. queryToStream(Query, count(Stream), Cost) :- selectClause(Query, count(*)), translate(Query, Stream, select count(*), Cost), !. queryToStream(Query, project(Stream, AttrNames), Cost) :- translate(Query, Stream, select Attrs, Cost), !, makeList(Attrs, Attrs2), attrnames(Attrs2, AttrNames). /* ---- selectClause(Query, C) :- ---- The select-clause of the ~Query~ is ~C~. */ % allows select [count(*)] to succeed. Activate later on in development. %selectClause(select [X] from Y, Z) :- % selectClause(select X from Y, Z). selectClause(select * from _, *) :- !. selectClause(select count(*) from _, count(*)) :- !. selectClause(select Attrs from _, Attrs) :- !. selectClause(Query groupby _, C) :- !, selectClause(Query, C). selectClause(Query orderby _, C) :- !, selectClause(Query, C). /* ---- attrnames(Attrs, AttrNames) :- ---- Transform each attribute X into attrname(X). */ attrnames([], []). attrnames([Attr | Attrs], [attrname(Attr) | AttrNames]) :- attrnames(Attrs, AttrNames). /* ---- attrnamesSort(Attrs, AttrNames) :- ---- Transform attribute names of orderby clause. */ attrnamesSort([], []). attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :- attrnameSort(Attr, Attr2), attrnamesSort(Attrs, Attrs2). attrnameSort(Attr asc, attrname(Attr) asc) :- !. attrnameSort(Attr desc, attrname(Attr) desc) :- !. attrnameSort(Attr, attrname(Attr) asc). /* 11.3.8 Integration with Optimizer ---- optimize(Query). ---- Optimize ~Query~ and print the best ~Plan~. */ optimize(Query) :- callLookup(Query, Query2), queryToPlan(Query2, Plan, Cost), writeln(Plan), plan_to_atom_string(Plan, SecondoQuery), write('The plan is: '), nl, nl, write(SecondoQuery), nl, nl, write('Estimated Cost: '), write(Cost), nl, nl. 
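/*
The handling of the orderby-clause in ~attrnamesSort~ above can be tried in
isolation. The following sketch repeats the two operator declarations and the
clauses so that it runs on its own; ~demoSort~ is a hypothetical helper, and
plain atoms stand in for the ~attr~ terms that the lookup phase would normally
produce.

```prolog
:- op(930, xf, asc).
:- op(930, xf, desc).

% Copies of the clauses above, repeated so that the sketch is self-contained.
attrnamesSort([], []).
attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :-
  attrnameSort(Attr, Attr2),
  attrnamesSort(Attrs, Attrs2).

attrnameSort(Attr asc, attrname(Attr) asc) :- !.
attrnameSort(Attr desc, attrname(Attr) desc) :- !.
attrnameSort(Attr, attrname(Attr) asc).   % no direction given: ascending

% Hypothetical helper: one attribute per case.
demoSort(Out) :- attrnamesSort([ort asc, plz desc, bev], Out).
```

Here ~demoSort(Out)~ should wrap each attribute in ~attrname~ and supply ~asc~
for ~bev~, the only attribute written without a direction.

*/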
optimize(Query, QueryOut, CostOut) :-
  callLookup(Query, Query2),
  queryToPlan(Query2, Plan, CostOut),
  plan_to_atom_string(Plan, QueryOut).

/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom
that starts with the prefix operator ~sql~.

*/

sqlToPlan(QueryText, Plan) :-
  term_to_atom(sql Query, QueryText),
  optimize(Query, Plan, _).

/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
In this version ~QueryText~ does not start with ~sql~.

*/

sqlToPlan(QueryText, Plan) :-
  term_to_atom(Query, QueryText),
  optimize(Query, Plan, _).

/*
11.3.8 Examples

We can now formulate the previous example queries in the user level language.

Example3:

*/

example14 :- optimize(
  select * from [staedte as s, plz as p]
  where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0]
  ).

example14(Query, Cost) :- optimize(
  select * from [staedte as s, plz as p]
  where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0],
  Query, Cost
  ).

/*
Example4:

*/

example15 :- optimize(
  select * from staedte where bev > 500000
  ).

example15(Query, Cost) :- optimize(
  select * from staedte where bev > 500000,
  Query, Cost
  ).

/*
Example5:

*/

example16 :- optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
  ).

example16(Query, Cost) :- optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000],
  Query, Cost
  ).

/*
Example6. This may need a larger local stack size. Start Prolog as

----    pl -L4M
----

which initializes the local stack to 4 MB.

*/

example17 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"]
  ).
example17(Query, Cost) :- optimize( select * from [staedte, plz as p1, plz as p2, plz as p3] where [ sname = p1:ort, p1:plz = p2:plz + 1, p2:plz = p3:plz * 5, bev > 300000, bev < 500000, p2:plz > 50000, p2:plz < 60000, kennzeichen starts "W", p3:ort contains "burg", p3:ort starts "M"], Query, Cost ). /* Example 18: */ example18 :- optimize( select * from [staedte, plz as p1] where [ sname = p1:ort, bev > 300000, bev < 500000, p1:plz > 50000, p1:plz < 60000, kennzeichen starts "W", p1:ort contains "burg", p1:ort starts "M"] ). example18(Query, Cost) :- optimize( select * from [staedte, plz as p1] where [ sname = p1:ort, bev > 300000, bev < 500000, p1:plz > 50000, p1:plz < 60000, kennzeichen starts "W", p1:ort contains "burg", p1:ort starts "M"], Query, Cost ). /* Example 19: */ example19 :- optimize( select * from [staedte, plz as p1, plz as p2] where [ sname = p1:ort, p1:plz = p2:plz + 1, bev > 300000, bev < 500000, p1:plz > 50000, p1:plz < 60000, kennzeichen starts "W", p1:ort contains "burg", p1:ort starts "M"] ). example19(Query, Cost) :- optimize( select * from [staedte, plz as p1, plz as p2] where [ sname = p1:ort, p1:plz = p2:plz + 1, bev > 300000, bev < 500000, p1:plz > 50000, p1:plz < 60000, kennzeichen starts "W", p1:ort contains "burg", p1:ort starts "M"], Query, Cost ). /* Example 20: */ example20 :- optimize( select * from [staedte as s, plz as p] where [ p:ort = s:sname, p:plz > 40000, s:bev > 300000] ). example20(Query, Cost) :- optimize( select * from [staedte as s, plz as p] where [ p:ort = s:sname, p:plz > 40000, s:bev > 300000], Query, Cost ). /* Example 21: */ example21 :- optimize( select * from [staedte, plz as p1, plz as p2, plz as p3] where [ sname = p1:ort, p1:plz = p2:plz + 1, p2:plz = p3:plz * 5] ). example21(Query, Cost) :- optimize( select * from [staedte, plz as p1, plz as p2, plz as p3] where [ sname = p1:ort, p1:plz = p2:plz + 1, p2:plz = p3:plz * 5], Query, Cost ). 
/* 12 Optimizing and Calling Secondo ---- sql Term sql(Term, SecondoQueryRest) let(X, Term) let(X, Term, SecondoQueryRest) ---- ~Term~ must be one of the available select-from-where statements. It is optimized and Secondo is called to execute it. ~SecondoQueryRest~ is a character string (atom) containing a sequence of Secondo operators that can be appended to a given plan found by the optimizer; in this case the optimizer returns a plan producing a stream. The two versions of ~let~ allow one to assign the result of a query to a new object ~X~, using the optimizer. */ sql Term :- mOptimize(Term, Query, Cost), nl, write('The best plan is: '), nl, nl, write(Query), nl, nl, write('Estimated Cost: '), write(Cost), nl, nl, query(Query). sql(Term, SecondoQueryRest) :- mStreamOptimize(Term, SecondoQuery, Cost), concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query), nl, write('The best plan is: '), nl, nl, write(Query), nl, nl, write('Estimated Cost: '), write(Cost), nl, nl, query(Query). let(X, Term) :- mOptimize(Term, Query, Cost), nl, write('The best plan is: '), nl, nl, write(Query), nl, nl, write('Estimated Cost: '), write(Cost), nl, nl, concat_atom(['let ', X, ' = ', Query], '', Command), secondo(Command). let(X, Term, SecondoQueryRest) :- mStreamOptimize(Term, SecondoQuery, Cost), concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query), nl, write('The best plan is: '), nl, nl, write(Query), nl, nl, write('Estimated Cost: '), write(Cost), nl, nl, concat_atom(['let ', X, ' = ', Query], '', Command), secondo(Command). /* ---- streamOptimize(Term, Query, Cost) :- ---- Optimize the ~Term~ producing an incomplete Secondo query plan ~Query~ returning a stream. */ streamOptimize(Term, Query, Cost) :- callLookup(Term, Term2), queryToStream(Term2, Plan, Cost), plan_to_atom_string(Plan, Query). /* ---- mOptimize(Term, Query, Cost) :- mStreamOptimize(union [Term], Query, Cost) :- ---- Means ``multi-optimize''. 
Optimize a ~Term~ possibly consisting of several subexpressions to be independently optimized, as in union and intersection queries. ~mStreamOptimize~ is a variant returning a stream. */ :-op(800, fx, union). :-op(800, fx, intersection). mOptimize(union Terms, Query, Cost) :- mStreamOptimize(union Terms, Plan, Cost), concat_atom([Plan, 'consume'], '', Query). mOptimize(intersection Terms, Query, Cost) :- mStreamOptimize(intersection Terms, Plan, Cost), concat_atom([Plan, 'consume'], '', Query). mOptimize(Term, Query, Cost) :- optimize(Term, Query, Cost). mStreamOptimize(union [Term], Query, Cost) :- streamOptimize(Term, QueryPart, Cost), concat_atom([QueryPart, 'sort rdup '], '', Query). mStreamOptimize(union [Term | Terms], Query, Cost) :- streamOptimize(Term, Plan1, Cost1), mStreamOptimize(union Terms, Plan2, Cost2), concat_atom([Plan1, 'sort rdup ', Plan2, 'mergeunion '], '', Query), Cost is Cost1 + Cost2. mStreamOptimize(intersection [Term], Query, Cost) :- streamOptimize(Term, QueryPart, Cost), concat_atom([QueryPart, 'sort rdup '], '', Query). mStreamOptimize(intersection [Term | Terms], Query, Cost) :- streamOptimize(Term, Plan1, Cost1), mStreamOptimize(intersection Terms, Plan2, Cost2), concat_atom([Plan1, 'sort rdup ', Plan2, 'mergesec '], '', Query), Cost is Cost1 + Cost2. mStreamOptimize(Term, Query, Cost) :- streamOptimize(Term, Query, Cost). /* Some auxiliary stuff. */ bestPlanCount :- bestPlan(P, _), plan_to_atom_string(P, S), atom_concat(S, ' count', Q), nl, write(Q), nl, query(Q). bestPlanConsume :- bestPlan(P, _), plan_to_atom_string(P, S), atom_concat(S, ' consume', Q), nl, write(Q), nl, query(Q). %fapra 15/16 /* Rename an attribute to match the renaming of its relation. */ % No renaming needed. renamedRelAttr(RelAttr, Var, RelAttr) :- Var = *, !. renamedRelAttr(attr(Name, N, C), Var, attr(Var:Name, N, C)). % Extract the down case name from an attr term. 
attrnameDCAtom(Attr, DCAttrName) :-
  Attr = attr(_:Name, _, _),
  !,
  atom_string(AName, Name),
  downcase_atom(AName, DCAttrName).

attrnameDCAtom(Attr, DCAttrName) :-
  Attr = attr(Name, _, _),
  atom_string(AName, Name),
  downcase_atom(AName, DCAttrName).

/*
Rename a tuple stream.

*/

% No renaming needed.
renameStream(Stream, Var, Plan) :-
  Var = *,
  !,
  Plan = Stream.

renameStream(Stream, Var, rename(Stream, Var)).

/*
Transform a relation to a tuple stream and rename it.

*/

% No renaming needed.
feedRenameRelation(Rel, Var, Plan) :-
  Var = *,
  !,
  Plan = feed(Rel).

feedRenameRelation(Rel, Var, Plan) :-
  Plan = rename(feed(Rel), Var).

feedRenameRelation(rel(Rel, Var,_), Plan) :-
  feedRenameRelation(Rel, Var, Plan),!.

%end fapra 15/16
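/*
The renaming helpers can be exercised without the rest of the optimizer. The
sketch below repeats ~feedRenameRelation~ (three arguments) and
~renamedRelAttr~ so that it is self-contained; ~demoRename~ is a hypothetical
helper, and the relation and attribute names are taken from the example
database used throughout this document.

```prolog
% Copies of the clauses above, repeated so that the sketch runs on its own.
feedRenameRelation(Rel, Var, Plan) :- Var = *, !, Plan = feed(Rel).
feedRenameRelation(Rel, Var, Plan) :- Plan = rename(feed(Rel), Var).

renamedRelAttr(RelAttr, Var, RelAttr) :- Var = *, !.
renamedRelAttr(attr(Name, N, C), Var, attr(Var:Name, N, C)).

% Hypothetical helper: one relation without and one with a variable,
% plus an attribute renamed to match the variable p.
demoRename(Plan1, Plan2, Attr) :-
  feedRenameRelation(staedte, *, Plan1),
  feedRenameRelation(plz, p, Plan2),
  renamedRelAttr(attr(ort, 0, u), p, Attr).
```

A relation used without a variable is only fed, so ~Plan1~ should be
~feed(staedte)~, whereas ~Plan2~ should be ~rename(feed(plz), p)~ and ~Attr~
should carry the same prefix as ~attr(p:ort, 0, u)~.

*/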