/*
|
|
//paragraph [10] title: [{\Large \bf ] [}]
|
|
//characters [1] formula: [$] [$]
|
|
//[ae] [\"{a}]
|
|
//[oe] [\"{o}]
|
|
//[ue] [\"{u}]
|
|
//[ss] [{\ss}]
|
|
//[Ae] [\"{A}]
|
|
//[Oe] [\"{O}]
|
|
//[Ue] [\"{U}]
|
|
//[**] [$**$]
|
|
//[toc] [\tableofcontents]
|
|
//[=>] [\verb+=>+]
|
|
//[:Section Translation] [\label{sec:translation}]
|
|
//[Section Translation] [Section~\ref{sec:translation}]
|
|
//[:Section 4.1.1] [\label{sec:4.1.1}]
|
|
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
|
|
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
|
|
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
|
|
//[newpage] [\newpage]
|
|
|
|
[10] A Query Optimizer for Secondo
|
|
|
|
Ralf Hartmut G[ue]ting, November - December 2002
|
|
|
|
[toc]
|
|
|
|
[newpage]
|
|
|
|
1 Introduction
|
|
|
|
1.1 Overview
|
|
|
|
This document not only describes, but ~is~ an optimizer for Secondo database
|
|
systems. It contains the current source code for the optimizer, written in
|
|
PROLOG. It can be compiled by a PROLOG system (SWI-Prolog 5.0 or higher)
|
|
directly.
|
|
|
|
The current version of the optimizer is capable of handling conjunctive queries,
|
|
formulated in a relational environment. That is, it takes a set of
|
|
relations together with a set of selection or join predicates over these
|
|
relations and produces a query plan that can be executed by (the current
|
|
relational system implemented in) Secondo.
|
|
|
|
The selection of the query plan is based on cost estimates which in turn are
|
|
based on given selectivities of predicates. Selectivities of predicates are
|
|
maintained in a table (a set of PROLOG facts). If the selectivity of a predicate
|
|
is not available from that table, then an interaction with the Secondo system
|
|
should take place to determine the selectivity. There are various strategies
|
|
conceivable for doing this which will be described elsewhere. However, the
|
|
current version of the optimizer just emits a message that the selectivity is
|
|
missing and quits.
|
|
|
|
The optimizer also implements a simple SQL-like language for entering queries.
|
|
The notation is pretty much like SQL except that the lists occurring (lists of
|
|
attributes, relations, predicates) are written in PROLOG notation. Also note
|
|
that the where-clause is a list of predicates rather than an arbitrary boolean
|
|
expression and hence allows one to formulate conjunctive queries only.
|
|
|
|
|
|
1.2 Optimization Algorithm
|
|
|
|
The optimizer employs an optimization algorithm, novel as far as we know, which is
|
|
based on ~shortest path search in a predicate order graph~. This technique is
|
|
remarkably simple to implement, yet efficient.
|
|
|
|
A predicate order graph (POG) is the graph whose nodes represent sets of
|
|
evaluated predicates and whose edges represent predicates, containing all
|
|
possible orders of predicates. Such a graph for three predicates ~p~, ~q~, and
|
|
~r~ is shown in [Figure pog1].
|
|
|
|
Figure 1: A predicate order graph for three predicates ~p~, ~q~
|
|
and ~r~ [pog1.eps]
|
|
|
|
Here the bottom node has no predicate evaluated and the top node has all
|
|
predicates evaluated. The example illustrates, more precisely, possible
|
|
sequences of selections on an argument relation of size 1000. If selectivities
|
|
of predicates are given (for ~p~ it is 1/2, for ~q~ 1/10, and for ~r~ 1/5),
|
|
then we can annotate the POG with sizes of intermediate results as shown,
|
|
assuming that all predicates are independent (not ~correlated~). This means that
|
|
the selectivity of a predicate is the same regardless of the order of
|
|
evaluation, which of course does not need to be true.
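
The size annotations can be reproduced with a few lines of PROLOG. The following is a minimal sketch and not part of the optimizer: ~exampleSel~ and ~exampleSize~ are hypothetical names, and selectivities are stored as integer fractions (an assumption made here) so that the arithmetic stays exact.

```prolog
% Hypothetical facts (assumption): exampleSel(Pred, Num, Den) states that
% predicate Pred has selectivity Num/Den, using the values from the text.
exampleSel(p, 1, 2).
exampleSel(q, 1, 10).
exampleSel(r, 1, 5).

% exampleSize(+Preds, +Size0, -Size): size of the result after applying
% the predicates in Preds, in the given order, to an argument of size
% Size0, assuming that all predicates are independent.
exampleSize([], Size, Size).
exampleSize([P | Ps], Size0, Size) :-
    exampleSel(P, Num, Den),
    Size1 is Size0 * Num // Den,      % integer division truncates
    exampleSize(Ps, Size1, Size).
```

Under these assumptions the goal exampleSize([p, q, r], 1000, S) and the goal exampleSize([r, q, p], 1000, S) both yield S = 10, which illustrates that the order of evaluation does not change the final size when predicates are independent.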
|
|
|
|
If we can further compute for each edge of the POG possible evaluation
|
|
methods, adding a new ``executable'' edge for each method, and mark the
|
|
edge with estimated costs for this method, then finding a shortest path through
|
|
the POG corresponds to finding the cheapest query plan. [Figure pog2] shows an
|
|
example of a POG annotated with evaluation methods.
|
|
|
|
Figure 2: A POG annotated with evaluation methods [pog2.eps]
|
|
|
|
In this example, there is only a single method associated with each edge. In
|
|
general, however, there will be several methods. The example represents the
|
|
query:
|
|
|
|
---- select *
|
|
from Staedte, Laender, Regiert
|
|
where Land = LName and PName = 'CDU' and LName = PLand
|
|
----
|
|
|
|
for relation schemas
|
|
|
|
---- Staedte(SName, Bev, Land)
|
|
Laender(LName, LBev)
|
|
Regiert(PName, PLand)
|
|
----
|
|
|
|
Hence the optimization algorithm described and implemented in the following
|
|
sections proceeds in the following steps:
|
|
|
|
1 For given relations and predicates, construct the predicate order graph and
|
|
store it as a set of facts in memory (Sections 2 through 4).
|
|
|
|
2 For each edge, construct corresponding executable edges (called ~plan edges~
|
|
below). This is controlled by optimization rules describing how selections or
|
|
joins can be translated (Sections 5 and 6).
|
|
|
|
3 Based on sizes of arguments and selectivities (stored in the file
|
|
~database.pl~) compute the sizes of all intermediate results. Also annotate
|
|
edges of the POG with selectivities (Section 7).
|
|
|
|
4 For each plan edge, compute its cost and store it in memory (as a set of
|
|
facts). This is based on sizes of arguments and the selectivity associated with
|
|
the edge and on a cost function (predicate) written for each operator that may
|
|
occur in a query plan (Section 8).
|
|
|
|
5 Dijkstra's algorithm is employed to find a
|
|
shortest path through the graph of plan edges annotated with costs (called ~cost
|
|
edges~). This path is transformed into a Secondo query plan and returned
|
|
(Section 9).
|
|
|
|
6 Finally, a simple subset of SQL in a PROLOG notation is implemented. So it
|
|
is possible to enter queries in this language. The optimizer determines from it
|
|
the lists of relations and predicates in the form needed for constructing the
|
|
POG, and then invokes step 1 (Section 11).
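
The correspondence between shortest paths and cheapest plans in step 5 can be tried out on the three-selection example from Section 1.2. The sketch below is only an illustration under stated assumptions: the cost of an edge is taken to be the size of its input (a made-up cost model), the names ~exampleCostEdge~, ~examplePath~ and ~exampleCheapest~ are hypothetical, and exhaustive enumeration of all paths stands in for Dijkstra's algorithm.

```prolog
% Hypothetical cost edges: exampleCostEdge(Source, Target, Cost).
% Node numbers encode evaluated predicates (p = 1, q = 2, r = 4).
% Assumption: the cost of an edge equals the size of its input.
exampleCostEdge(0, 1, 1000).  exampleCostEdge(0, 2, 1000).
exampleCostEdge(0, 4, 1000).
exampleCostEdge(1, 3, 500).   exampleCostEdge(1, 5, 500).
exampleCostEdge(2, 3, 100).   exampleCostEdge(2, 6, 100).
exampleCostEdge(4, 5, 200).   exampleCostEdge(4, 6, 200).
exampleCostEdge(3, 7, 50).    exampleCostEdge(5, 7, 100).
exampleCostEdge(6, 7, 20).

% examplePath(+Source, +Target, -Path, -Cost): enumerate paths with costs.
examplePath(N, N, [N], 0).
examplePath(S, T, [S | Path], Cost) :-
    exampleCostEdge(S, M, C),
    examplePath(M, T, Path, Cost0),
    Cost is C + Cost0.

% exampleCheapest(+Source, +Target, -Path, -Cost): brute-force minimum.
exampleCheapest(S, T, Path, Cost) :-
    findall(C-P, examplePath(S, T, P, C), Pairs),
    keysort(Pairs, [Cost-Path | _]).
```

With these facts, exampleCheapest(0, 7, Path, Cost) yields Path = [0, 2, 6, 7] and Cost = 1120, that is, the order ~q~, ~r~, ~p~. Enumeration is exponential in the number of predicates, which is why a proper shortest path algorithm is preferable on larger graphs.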
|
|
|
|
|
|
|
|
2 Data Structures
|
|
|
|
In the construction of the predicate order graph, the following data structures
|
|
are used.
|
|
|
|
---- pr(P, A)
|
|
pr(P, B, C)
|
|
----
|
|
|
|
A selection or join predicate, e.g. pr(p, a), pr(q, b, c). These denote a
|
|
selection predicate p on relation a, and a join predicate q on relations
|
|
b and c.
|
|
|
|
---- arp(Arg, Rels, Preds)
|
|
----
|
|
|
|
An argument, relations, predicate triple. It describes a set of relations
|
|
~Rels~ on which the predicates ~Preds~ have been evaluated. To access the
|
|
result of this evaluation one needs to refer to ~Arg~.
|
|
|
|
Arg is either arg(N) or res(N), N an integer. Examples: arg(5), res(1)
|
|
|
|
Rels is a list of relation names, e.g. [a, b, c]
|
|
|
|
Preds is a list of predicate names, e.g. [p, q, r]
|
|
|
|
|
|
---- node(No, Preds, Partition)
|
|
----
|
|
|
|
A node.
|
|
|
|
~No~ is the number of the node into which the evaluated predicates
|
|
are encoded (each bit corresponds to a predicate number, e.g. node number
|
|
5 = 101 (binary) says that the first predicate (no 1) and the third
|
|
predicate (no 4) have been evaluated in this node). For predicate i,
|
|
its predicate number is "2^{i-1}"[1].
|
|
|
|
~Preds~ is the list of names of evaluated predicates, e.g. [p, q].
|
|
|
|
~Partition~ is a list of arp elements, see above.
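
The bit encoding of node numbers can be decoded mechanically. The following is a small sketch; ~exampleNodePreds~ is a hypothetical helper that does not occur in the optimizer, and it returns the indices i of the predicates whose bit "2^{i-1}"[1] is set.

```prolog
% exampleNodePreds(+NodeNo, +NoOfPreds, -Indices)
% Indices is the list of predicate indices i (1-based) such that
% bit 2^(i-1) is set in NodeNo.
exampleNodePreds(NodeNo, NoOfPreds, Indices) :-
    findall(I,
            ( between(1, NoOfPreds, I),
              PredNo is 1 << (I - 1),      % predicate number of predicate I
              NodeNo /\ PredNo =\= 0 ),    % is this bit set in the node?
            Indices).
```

For node number 5 and three predicates, exampleNodePreds(5, 3, L) gives L = [1, 3], in line with the example above.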
|
|
|
|
|
|
---- edge(Source, Target, Term, Result, Node, PredNo)
|
|
----
|
|
|
|
An edge, representing a predicate.
|
|
|
|
~Source~ and ~Target~ are the numbers of source and target nodes in the
|
|
predicate order graph, e.g. 0 and 1.
|
|
|
|
~Term~ is either a selection or a join, for example,
|
|
select(arg(0), pr(p, a)) or join(res(4), res(1), pr(q, a, b))
|
|
|
|
~Result~ is the number of the node into which the result of this predicate
|
|
application should be written. Normally it is the same as Target,
|
|
but for an edge leading to a node combining several independent results,
|
|
it is the number of the ``real'' node to obtain this result. An example of this can
|
|
be found in [Figure pog2] where the join edge leading from node 3 to node 7 does
|
|
not use the result of node 3 (there is none) but rather the two independent
|
|
results from nodes 1 and 2 (this pair is conceptually the result available in
|
|
node 3).
|
|
|
|
~Node~ is the source node for this edge, in the form node(...) as
|
|
described above.
|
|
|
|
~PredNo~ is the predicate number for the predicate represented by this
|
|
edge. Predicate numbers are of the form "2^i" as explained
|
|
for nodes.
|
|
|
|
3 Construction of the Predicate Order Graph
|
|
|
|
3.1 pog
|
|
|
|
---- pog(Rels, Preds, Nodes, Edges) :-
|
|
----
|
|
|
|
For a given list of relations ~Rels~ and predicates ~Preds~, ~Nodes~ and
|
|
~Edges~ are the predicate order graph where edges are annotated with selection
|
|
and join operations applied to the correct arguments.
|
|
|
|
Example call:
|
|
|
|
---- pog([staedte, laender], [pr(p, staedte), pr(q, laender), pr(r, staedte,
|
|
laender)], N, E).
|
|
----
|
|
|
|
*/
|
|
|
|
pog(Rels, Preds, Nodes, Edges) :-
|
|
length(Rels, N), reverse(Rels, Rels2), deleteArguments,
|
|
partition(Rels2, N, Partition0),
|
|
length(Preds, M), reverse(Preds, Preds2),
|
|
pog2(Partition0, M, Preds2, Nodes, Edges),
|
|
deleteNodes, storeNodes(Nodes),
|
|
deleteEdges, storeEdges(Edges),
|
|
% RHG 2014 Create plan and cost edges during shortest path search.
|
|
% deletePlanEdges,
|
|
deleteVariables,
|
|
% createPlanEdges,
|
|
HighNode is 2**M -1,
|
|
retract(highNode(_)), assert(highNode(HighNode)),
|
|
deleteSizes.
|
|
% deleteCostEdges.
|
|
% end RHG 2014
|
|
/*
|
|
|
|
3.2 partition
|
|
|
|
---- partition(Rels, N, Partition0) :-
|
|
----
|
|
|
|
Given a list of ~N~ relations ~Rels~, return an initial partition such that
|
|
each relation r is packed into the form arp(arg(i), [r], []).
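
To see which argument numbers result, the construction can be replayed in isolation. This is a sketch under the assumption that the list is reversed and numbered downwards from ~N~, as ~pog~ does; ~examplePartition~ and ~exampleInitialPartition~ are hypothetical names, and the side effect of recording argument(N, Rel) facts is left out.

```prolog
:- use_module(library(lists)).   % reverse/2

% examplePartition(+Rels, +N, -Partition): side-effect free variant of
% the construction described above.
examplePartition([], _, []).
examplePartition([Rel | Rels], N, [arp(arg(N), [Rel], []) | Arps]) :-
    N1 is N - 1,
    examplePartition(Rels, N1, Arps).

% exampleInitialPartition(+Rels, -Partition): reverse the relation list
% first, so that the i-th relation of the query becomes arg(i).
exampleInitialPartition(Rels, Partition) :-
    length(Rels, N),
    reverse(Rels, Reversed),
    examplePartition(Reversed, N, Partition).
```

For the relations [staedte, laender] this yields [arp(arg(2), [laender], []), arp(arg(1), [staedte], [])], so the first relation of the query is referred to as arg(1).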
|
|
|
|
*/
|
|
|
|
partition([], _, []).
|
|
|
|
partition([Rel | Rels], N, [Arp | Arps]) :-
|
|
N1 is N-1,
|
|
Arp = arp(arg(N), [Rel], []),
|
|
assert(argument(N, Rel)),
|
|
partition(Rels, N1, Arps).
|
|
|
|
|
|
/*
|
|
|
|
3.3 pog2
|
|
|
|
---- pog2(Partition0, NoOfPreds, Preds, Nodes, Edges) :-
|
|
----
|
|
|
|
For the given start partition ~Partition0~, a list of predicates ~Preds~
|
|
containing ~NoOfPreds~ predicates, return the ~Nodes~ and ~Edges~ of the
|
|
predicate order graph.
|
|
|
|
*/
|
|
|
|
pog2(Part0, _, [], [node(0, [], Part0)], []).
|
|
|
|
pog2(Part0, NoOfPreds, [Pred | Preds], Nodes, Edges) :-
|
|
N1 is NoOfPreds-1,
|
|
PredNo is 2**N1,
|
|
pog2(Part0, N1, Preds, NodesOld, EdgesOld),
|
|
newNodes(Pred, PredNo, NodesOld, NodesNew),
|
|
newEdges(Pred, PredNo, NodesOld, EdgesNew),
|
|
copyEdges(Pred, PredNo, EdgesOld, EdgesCopy),
|
|
append(NodesOld, NodesNew, Nodes),
|
|
append(EdgesOld, EdgesNew, Edges2),
|
|
append(Edges2, EdgesCopy, Edges).
|
|
|
|
/*
|
|
3.4 newNodes
|
|
|
|
---- newNodes(Pred, PredNo, NodesOld, NodesNew) :-
|
|
----
|
|
|
|
Given a predicate ~Pred~ with number ~PredNo~ and a list of nodes ~NodesOld~
|
|
resulting from evaluating all predicates with lower numbers, construct
|
|
a list of nodes which result from applying to each of the existing nodes
|
|
the predicate ~Pred~.
|
|
|
|
*/
|
|
|
|
newNodes(_, _, [], []).
|
|
|
|
newNodes(Pred, PNo, [Node | Nodes], [NodeNew | NodesNew]) :-
|
|
newNode(Pred, PNo, Node, NodeNew),
|
|
newNodes(Pred, PNo, Nodes, NodesNew).
|
|
|
|
newNode(Pred, PNo, node(No, Preds, Part), node(No2, [Pred | Preds], Part2)) :-
|
|
No2 is No + PNo,
|
|
copyPart(Pred, PNo, Part, Part2).
|
|
|
|
/*
|
|
3.5 copyPart
|
|
|
|
---- copyPart(Pred, PNo, Part, Part2) :-
|
|
----
|
|
|
|
Copy the partition ~Part~ of a node so that the new partition ~Part2~
|
|
after applying the predicate ~Pred~ with number ~PNo~ results.
|
|
|
|
This means that for a selection predicate we have to find the arp
|
|
containing its relation and modify it accordingly, the other arps
|
|
in the partition are copied unchanged.
|
|
|
|
For a join predicate we have to find the two arps containing its
|
|
two relations and to merge them into a single arp; the remaining
|
|
arps are copied unchanged.
|
|
|
|
Or a join predicate may find its two relations in the same arp which means
|
|
another join on the same two relations has already been performed.
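
The merging case for a join predicate can be illustrated separately. The sketch below mirrors the idea with hypothetical names (~exampleMerge~, ~exampleNodeNo~); it covers only the case where the two relations are found in different arps and is not meant to replace the clauses that follow.

```prolog
:- use_module(library(lists)).   % select/3, append/3, memberchk/2

% exampleNodeNo(+Arg, -No): base arguments contribute 0 to a result
% number, intermediate results contribute their own number.
exampleNodeNo(arg(_), 0).
exampleNodeNo(res(N), N).

% exampleMerge(+JoinPred, +PNo, +Arps, -Arps2): merge the two arps that
% contain the relations of the join predicate into a single arp; all
% other arps are kept unchanged.
exampleMerge(pr(P, R1, R2), PNo, Arps, [Merged | Others]) :-
    select(arp(ArgX, RelsX, PredsX), Arps, Rest),
    memberchk(R1, RelsX),
    select(arp(ArgY, RelsY, PredsY), Rest, Others),
    memberchk(R2, RelsY), !,
    exampleNodeNo(ArgX, NoX),
    exampleNodeNo(ArgY, NoY),
    ResNo is NoX + NoY + PNo,
    append(RelsX, RelsY, Rels),
    append(PredsX, PredsY, Preds),
    Merged = arp(res(ResNo), Rels, [P | Preds]).
```

Applied to the partition [arp(res(2), [laender], [q]), arp(res(1), [staedte], [p])] with the join pr(r, staedte, laender) and predicate number 4, the result is [arp(res(7), [staedte, laender], [r, p, q])]: the two independent results 1 and 2 are combined into result 7.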
|
|
|
|
*/
|
|
|
|
copyPart(_, _, [], []).
|
|
|
|
copyPart(pr(P, Rel), PNo, Arps, [Arp2 | Arps2]) :-
|
|
select(X, Arps, Arps2),
|
|
X = arp(Arg, Rels, Preds),
|
|
member(Rel, Rels), !,
|
|
nodeNo(Arg, No),
|
|
ResNo is No + PNo,
|
|
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
|
|
|
|
copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
|
|
select(X, Arps, Arps2),
|
|
X = arp(Arg, Rels, Preds),
|
|
member(R1, Rels),
|
|
member(R2, Rels), !,
|
|
nodeNo(Arg, No),
|
|
ResNo is No + PNo,
|
|
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
|
|
|
|
copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
|
|
select(X, Arps, Rest),
|
|
X = arp(ArgX, RelsX, PredsX),
|
|
member(R1, RelsX),
|
|
select(Y, Rest, Arps2),
|
|
Y = arp(ArgY, RelsY, PredsY),
|
|
member(R2, RelsY), !,
|
|
nodeNo(ArgX, NoX),
|
|
nodeNo(ArgY, NoY),
|
|
ResNo is NoX + NoY + PNo,
|
|
append(RelsX, RelsY, Rels),
|
|
append(PredsX, PredsY, Preds),
|
|
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
|
|
|
|
nodeNo(arg(_), 0).
|
|
nodeNo(res(N), N).
|
|
|
|
/*
|
|
3.6 newEdges
|
|
|
|
---- newEdges(Pred, PredNo, NodesOld, EdgesNew) :-
|
|
----
|
|
|
|
For each of the nodes in ~NodesOld~ return a new edge in ~EdgesNew~
|
|
built by applying the predicate ~Pred~ with number ~PredNo~.
|
|
|
|
*/
|
|
|
|
newEdges(_, _, [], []).
|
|
|
|
newEdges(Pred, PNo, [Node | Nodes], [Edge | Edges]) :-
|
|
newEdge(Pred, PNo, Node, Edge),
|
|
newEdges(Pred, PNo, Nodes, Edges).
|
|
|
|
newEdge(pr(P, Rel), PNo, Node, Edge) :-
|
|
findRel(Rel, Node, Source, Arg),
|
|
Target is Source + PNo,
|
|
nodeNo(Arg, ArgNo),
|
|
Result is ArgNo + PNo,
|
|
Edge = edge(Source, Target, select(Arg, pr(P, Rel)), Result, Node, PNo).
|
|
|
|
newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
|
|
findRels(R1, R2, Node, Source, Arg),
|
|
Target is Source + PNo,
|
|
nodeNo(Arg, ArgNo),
|
|
Result is ArgNo + PNo,
|
|
Edge = edge(Source, Target, select(Arg, pr(P, R1, R2)), Result, Node, PNo).
|
|
|
|
newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
|
|
findRels(R1, R2, Node, Source, Arg1, Arg2),
|
|
Target is Source + PNo,
|
|
nodeNo(Arg1, Arg1No),
|
|
nodeNo(Arg2, Arg2No),
|
|
Result is Arg1No + Arg2No + PNo,
|
|
Edge = edge(Source, Target, join(Arg1, Arg2, pr(P, R1, R2)), Result,
|
|
Node, PNo).
|
|
|
|
|
|
/*
|
|
3.7 findRel
|
|
|
|
---- findRel(Rel, Node, Source, Arg):-
|
|
----
|
|
|
|
Find the relation ~Rel~ within a node description ~Node~ and return the
|
|
node number ~Source~ and the description ~Arg~ of the argument (e.g. res(3)) found
|
|
within the arp containing Rel.
|
|
|
|
---- findRels(Rel1, Rel2, Node, Source, Arg1, Arg2):-
|
|
----
|
|
|
|
Similarly for two relations.
|
|
|
|
*/
|
|
|
|
findRel(Rel, node(No, _, Arps), No, ArgX) :-
|
|
select(X, Arps, _),
|
|
X = arp(ArgX, RelsX, _),
|
|
member(Rel, RelsX).
|
|
|
|
|
|
findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX) :-
|
|
select(X, Arps, _),
|
|
X = arp(ArgX, RelsX, _),
|
|
member(Rel1, RelsX),
|
|
member(Rel2, RelsX).
|
|
|
|
findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX, ArgY) :-
|
|
select(X, Arps, Rest),
|
|
X = arp(ArgX, RelsX, _),
|
|
member(Rel1, RelsX), !,
|
|
select(Y, Rest, _),
|
|
Y = arp(ArgY, RelsY, _),
|
|
member(Rel2, RelsY).
|
|
|
|
|
|
|
|
/*
|
|
3.8 copyEdges
|
|
|
|
---- copyEdges(Pred, PredNo, EdgesOld, EdgesCopy):-
|
|
----
|
|
|
|
Given a set of edges ~EdgesOld~ and a predicate ~Pred~ with number ~PredNo~,
|
|
return a copy of each edge in ~EdgesOld~ in ~EdgesCopy~ such that the
|
|
copied version reflects a previous application of predicate ~Pred~.
|
|
|
|
This is implemented by retrieving from each old edge its start node,
|
|
constructing for this start node and predicate ~Pred~ a target node to
|
|
which then the predicate associated with the old edge is applied.
|
|
|
|
*/
|
|
|
|
copyEdges(_, _, [], []).
|
|
|
|
copyEdges(Pred, PNo, [Edge | Edges], [Edge2 | Edges2]) :-
|
|
Edge = edge(_, _, Term, _, Node, PNo2),
|
|
pred(Term, Pred2),
|
|
newNode(Pred, PNo, Node, NodeNew),
|
|
newEdge(Pred2, PNo2, NodeNew, Edge2),
|
|
copyEdges(Pred, PNo, Edges, Edges2).
|
|
|
|
pred(select(_, P), P).
|
|
pred(join(_, _, P), P).
|
|
|
|
/*
|
|
3.9 writeEdgeList
|
|
|
|
---- writeEdgeList(List):-
|
|
----
|
|
|
|
Write the list of edges ~List~.
|
|
|
|
*/
|
|
|
|
% Base case: nothing is left to write, so succeed on the empty list.
writeEdgeList([]).

writeEdgeList([edge(Source, Target, Term, _, _, _) | Edges]) :-
|
|
write(Source), write('-'), write(Target), write(':'), write(Term), nl,
|
|
writeEdgeList(Edges).
|
|
|
|
/*
|
|
4 Managing the Graph in Memory
|
|
|
|
4.1 Storing and Deleting Nodes and Edges
|
|
|
|
---- storeNodes(NodeList).
|
|
storeEdges(EdgeList).
|
|
deleteNodes.
|
|
deleteEdges.
|
|
----
|
|
|
|
Just as the names say. Store a list of nodes or edges, respectively, as facts;
|
|
and delete them from memory again.
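
The store and delete pattern can be tried in isolation. The following sketch uses a separate, hypothetical fact name ~exampleNode~ so that it does not interfere with the graph in memory, and it uses retractall/1 for deletion, which is an alternative to the failure-driven loops used below rather than the technique of this file.

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3
:- dynamic exampleNode/3.

% exampleStoreNodes(+Nodes): assert each element of the list as a fact.
exampleStoreNodes([]).
exampleStoreNodes([Node | Nodes]) :-
    assertz(Node),
    exampleStoreNodes(Nodes).

% exampleDeleteNodes: remove all exampleNode/3 facts; succeeds even if
% there are none.
exampleDeleteNodes :-
    retractall(exampleNode(_, _, _)).

% exampleNodeCount(-Count): number of facts currently stored.
exampleNodeCount(Count) :-
    aggregate_all(count, exampleNode(_, _, _), Count).
```

After storing [exampleNode(0, [], []), exampleNode(1, [p], [])] the count is 2, and after exampleDeleteNodes it is 0 again.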
|
|
|
|
*/
|
|
|
|
storeNodes([Node | Nodes]) :- assert(Node), storeNodes(Nodes).
|
|
storeNodes([]).
|
|
|
|
storeEdges([Edge | Edges]) :- assert(Edge), storeEdges(Edges).
|
|
storeEdges([]).
|
|
|
|
deleteNode :- retract(node(_, _, _)), fail.
|
|
deleteNodes :- not(deleteNode).
|
|
|
|
deleteEdge :- retract(edge(_, _, _, _, _, _)), fail.
|
|
deleteEdges :- not(deleteEdge).
|
|
|
|
deleteArgument :- retract(argument(_, _)), fail.
|
|
deleteArguments :- not(deleteArgument).
|
|
|
|
|
|
/*
|
|
4.2 Writing Nodes and Edges
|
|
|
|
---- writeNodes.
|
|
writeEdges.
|
|
----
|
|
|
|
Write the currently stored nodes and edges, respectively.
|
|
|
|
*/
|
|
writeNode :-
|
|
node(No, Preds, Partition),
|
|
write('Node: '), write(No), nl,
|
|
write('Preds: '), write(Preds), nl,
|
|
write('Partition: '), write(Partition), nl, nl,
|
|
fail.
|
|
writeNodes :- not(writeNode).
|
|
|
|
writeEdge :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
write('Source: '), write(Source), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('Term: '), write(Term), nl,
|
|
write('Result: '), write(Result), nl, nl,
|
|
fail.
|
|
|
|
writeEdges :- not(writeEdge).
|
|
|
|
/*
|
|
5 Rule-Based Translation of Selections and Joins
|
|
[:Section Translation]
|
|
|
|
5.1 Precise Notation for Input
|
|
|
|
Since we now have to look into the structure of predicates, and need to be
|
|
able to generate Secondo executable expressions in their precise format, we
|
|
need to define the input notation precisely.
|
|
|
|
5.1.1 The Source Language
|
|
[:Section 4.1.1]
|
|
|
|
We assume the queries can be entered basically as select-from-where
|
|
structures, as follows. Let schemas be given as:
|
|
|
|
---- plz(PLZ:string, Ort:string)
|
|
Staedte(SName:string, Bev:int, PLZ:int, Vorwahl:string, Kennzeichen:string)
|
|
----
|
|
|
|
Then we should be able to enter queries:
|
|
|
|
---- select SName, Bev
|
|
from Staedte
|
|
where Bev > 500000
|
|
----
|
|
|
|
In the next example we need to avoid the name conflict for PLZ:
|
|
|
|
---- select *
|
|
from Staedte as s, plz as p
|
|
where s.SName = p.Ort and p.PLZ > 40000
|
|
----
|
|
|
|
In the PROLOG version, we will use the following notations:
|
|
|
|
---- rel(Name, Var, Case)
|
|
----
|
|
|
|
For example
|
|
|
|
---- rel(staedte, *, u)
|
|
----
|
|
|
|
is a term denoting the ~Staedte~ relation; ~u~ says that it is actually to be
|
|
written in upper case whereas
|
|
|
|
---- rel(plz, *, l)
|
|
----
|
|
|
|
denotes the ~plz~ relation to be written in lower case. The second argument
|
|
~Var~ contains an explicit variable if it has been assigned, otherwise the
|
|
symbol [*]. If an explicit variable has been used in the query, we need to
|
|
perform renaming in the plan. For example, in the second query above, the
|
|
relations would be denoted as
|
|
|
|
---- rel(staedte, s, u)
|
|
rel(plz, p, l)
|
|
----
|
|
|
|
Within predicates, attributes are annotated as follows:
|
|
|
|
---- attr(Name, Arg, Case)
|
|
|
|
attr(ort, 2, u)
|
|
----
|
|
|
|
This says that ~ort~ is an attribute of the second argument within a join
|
|
condition, to be written in upper case. For a selection condition, the second
|
|
argument is ignored; it can be set to 0 or 1.
|
|
|
|
Hence for the two queries above, the translation would be
|
|
|
|
---- fromwhere(
|
|
[rel(staedte, *, u)],
|
|
[pr(attr(bev, 0, u) > 500000, rel(staedte, *, u))]
|
|
)
|
|
|
|
fromwhere(
|
|
[rel(staedte, s, u), rel(plz, p, l)],
|
|
[pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
|
|
rel(staedte, s, u), rel(plz, p, l)),
|
|
pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l))]
|
|
)
|
|
----
|
|
|
|
Note that the upper or lower case distinction refers only to the first letter
|
|
of a relation or attribute name. Other letters are written on the PROLOG side
|
|
in the same way as in Secondo.
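
The case convention can be made concrete with a small sketch. The helper names ~exampleUpper~ and ~exampleName~ are hypothetical; the sketch relies only on the built-in predicates sub\_atom/5, upcase\_atom/2 and atom\_concat/3 and is independent of the conversion code used later in this file.

```prolog
% exampleUpper(+Lower, -Upper): Upper is Lower with its first letter
% converted to upper case; all other letters are left untouched.
exampleUpper(Lower, Upper) :-
    sub_atom(Lower, 0, 1, After, First),
    sub_atom(Lower, 1, After, 0, Rest),
    upcase_atom(First, FirstUp),
    atom_concat(FirstUp, Rest, Upper).

% exampleName(+Name, +Case, -SecondoName): apply the case flag
% (l = lower case, u = upper case) to a relation or attribute name.
exampleName(Name, l, Name).
exampleName(Name, u, Upper) :-
    exampleUpper(Name, Upper).
```

Thus exampleName(staedte, u, X) gives X = 'Staedte', exampleName(pLZ, u, X) gives X = 'PLZ', and exampleName(plz, l, X) leaves the name as plz.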
|
|
|
|
Note further that if explicit variables are used, the attribute name will
|
|
include them, e.g. s:sName.
|
|
|
|
The projection occurring in the select-from-where statement is for the moment
|
|
not passed to the optimizer; it is treated outside.
|
|
|
|
So example 2 is rewritten as:
|
|
|
|
*/
|
|
|
|
example3 :- pog([rel(staedte, s, u), rel(plz, p, l)],
|
|
[pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
|
|
rel(staedte, s, u), rel(plz, p, l) ),
|
|
pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l)),
|
|
pr((attr(p:pLZ, 1, u) mod 5) = 0, rel(plz, p, l))], _, _).
|
|
|
|
/*
|
|
|
|
The two queries mentioned above are:
|
|
|
|
*/
|
|
|
|
example4 :- pog(
|
|
[rel(staedte, *, u)],
|
|
[pr(attr(bev, 1, u) > 500000, rel(staedte, *, u))],
|
|
_, _).
|
|
|
|
example5 :- pog(
|
|
[rel(staedte, s, u), rel(plz, p, l)],
|
|
[pr(attr(s:sName, 1, u) = attr(p:ort, 2, u), rel(staedte, s, u), rel(plz, p,
|
|
l)),
|
|
pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))],
|
|
_, _).
|
|
|
|
/*
|
|
|
|
5.1.2 The Target Language
|
|
|
|
In the target language, we use the following operators:
|
|
|
|
---- feed: rel(Tuple) -> stream(Tuple)
|
|
consume: stream(Tuple) -> rel(Tuple)
|
|
|
|
filter: stream(Tuple) x (Tuple -> bool) -> stream(Tuple)
|
|
product: stream(Tuple1) x stream(Tuple2) -> stream(Tuple3)
|
|
|
|
where Tuple3 = Tuple1 o Tuple2
|
|
|
|
hashjoin: stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
|
|
x nbuckets -> stream(Tuple3)
|
|
|
|
where Tuple3 = Tuple1 o Tuple2
|
|
attrname1 occurs in Tuple1
|
|
attrname2 occurs in Tuple2
|
|
nbuckets is the number of hash buckets
|
|
to be used
|
|
|
|
sortmergejoin: stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
|
|
-> stream(Tuple3)
|
|
|
|
where Tuple3 = Tuple1 o Tuple2
|
|
attrname1 occurs in Tuple1
|
|
attrname2 occurs in Tuple2
|
|
|
|
loopjoin: stream(Tuple1) x (Tuple1 -> stream(Tuple2))
|
|
-> stream(Tuple3)
|
|
|
|
where Tuple3 = Tuple1 o Tuple2
|
|
|
|
exactmatch: btree(Tuple, AttrType) x rel(Tuple) x AttrType
|
|
-> stream(Tuple)
|
|
|
|
extend: stream(Tuple1) x (Newname x (Tuple1 -> Attrtype))+
|
|
-> stream(Tuple2)
|
|
|
|
where Tuple2 is Tuple1 to which pairs
|
|
(Newname, Attrtype) have been appended
|
|
|
|
remove: stream(Tuple1) x Attrname+ -> stream(Tuple2)
|
|
|
|
where Tuple2 is Tuple1 from which the mentioned
|
|
attributes have been removed.
|
|
|
|
project: stream(Tuple1) x Attrname+ -> stream(Tuple2)
|
|
|
|
where Tuple2 is Tuple1 projected on the
|
|
mentioned attributes.
|
|
|
|
rename stream(Tuple1) x NewName -> stream(Tuple2)
|
|
|
|
where Tuple2 is Tuple1 modified by appending
|
|
"_newname" to each attribute name
|
|
|
|
count stream(Tuple) -> int
|
|
|
|
count the number of tuples in a stream
|
|
|
|
sortby stream(Tuple) x (Attrname, asc/desc)+ -> stream(Tuple)
|
|
|
|
sort stream lexicographically by the given
|
|
attribute names
|
|
|
|
groupby stream(Tuple) x GroupAttrs x NewFields -> stream(Tuple2)
|
|
|
|
group stream by the grouping attributes; for each group
|
|
compute new fields each of which is specified in the
|
|
form Attrname : Expr. The argument stream must already
|
|
be sorted by the grouping attributes.
|
|
|
|
dloop darray(X) x string x (X->Y) -> darray(Y)
|
|
|
|
Performs a function on each element of a darray instance. The
|
|
string argument specifies the name of the result. If the
|
|
name is undefined or an empty string, a name is generated
|
|
automatically.
|
|
|
|
dloop2 darray(X) x darray(Y) x string x (fun : X x Y -> Z) -> darray(Z)
|
|
|
|
Performs a function on the elements of two darray instances.
|
|
The string argument specifies the name of the resulting
|
|
darray. If the string is undefined or empty, a name is
|
|
generated automatically.
|
|
|
|
dmap d[f]array x string x fun -> d[f]array
|
|
|
|
Performs a function on a distributed file array. If the
|
|
string argument is empty or undefined, a name for the result
|
|
is chosen automatically. If not, the string specifies the
|
|
name. The result is of type dfarray if the function produces
|
|
a tuple stream or a relation; otherwise the result is a
|
|
darray.
|
|
|
|
dmap2 d[f]array x d[f]array x string x fun -> d[f]array
|
|
|
|
Joins the slots of two distributed arrays.
|
|
|
|
partition d[f]array(rel(tuple)) x string x (tuple->int) x int-> dfmatrix
|
|
|
|
Redistributes the contents of a dfarray value. The new slot
|
|
contents are kept on the worker where the values were stored
|
|
before redistributing them. The last argument (int)
|
|
determines the number of slots of the redistribution. If
|
|
this value is smaller than or equal to zero, the number of slots
|
|
is taken from the array argument.
|
|
|
|
partitionF d[f]array(rel(X)) x string x ([fs]rel(X)->stream(Y)) x (Y ->
|
|
int) x int -> dfmatrix(rel(Y))
|
|
|
|
Repartitions a distributed [file] array. Before repartition,
|
|
a function is applied to the slots.
|
|
|
|
collect2 dfmatrix x string x int -> dfarray
|
|
|
|
Collects the slots of a matrix into a dfarray. The string
|
|
is the name of the resulting array, the int value specifies
|
|
a port for file transfer. The port value can be any port
|
|
usable on all workers. A corresponding file transfer server
|
|
is started automatically.
|
|
|
|
areduce dfmatrix(rel(t)) x string x (fsrel(t)->Y) x int -> d[f]array(Y)
|
|
|
|
Performs a function on the distributed slots of an array.
|
|
The task distribution is dynamic, meaning that a fast
|
|
worker will handle more slots than a slower one. The result
|
|
type depends on the result of the function. For a relation
|
|
or a tuple stream, a dfarray will be created. For other non-
|
|
stream results, a darray is the resulting type.
|
|
|
|
dsummarize darray(DATA) -> stream(DATA) , d[f]array(rel(X)) -> stream(X)
|
|
|
|
Produces a stream of the darray elements.
|
|
|
|
getValue {darray(T),dfarray(T)} -> array(T)
|
|
|
|
Converts a distributed array into a normal one.
|
|
|
|
tie ((array t) (map t t t)) -> t
|
|
|
|
Calculates the "value" of an array evaluating the elements
|
|
of the array with a given function from left to right.
|
|
|
|
|
|
|
|
----
|
|
|
|
In PROLOG, all expressions involving such operators are written in prefix
|
|
notation.
|
|
|
|
Parameter functions are written as
|
|
|
|
---- fun([param(Var1, Type1), ..., param(VarN, TypeN)], Expr)
|
|
----
|
|
|
|
|
|
5.1.3 Converting Plans to Atoms and Writing them.
|
|
|
|
Predicate ~plan\_to\_atom~ converts a plan to a string atom, which represents
|
|
the plan as a SECONDO query in text syntax. For attributes we have to
|
|
distinguish whether a leading ``.'' needs to be written (if the attribute occurs
|
|
within a parameter function) or whether just the attribute name is needed as in
|
|
the arguments for hashjoin, for example. Predicate ~wp~ (``write plan'') uses
|
|
predicate ~plan\_to\_atom~ to convert its argument to an atom and then writes
|
|
that atom to standard output.
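
The principle of the conversion can be shown on a tiny subset of plans. The following sketch is not the conversion implemented below: ~exampleAtom~ is a hypothetical predicate that handles only relations, ~feed~ and ~rename~, and writing ~feed~ in postfix position is an assumption about the Secondo text syntax.

```prolog
% exampleAtom(+Plan, -Atom): convert a very small subset of plans into
% Secondo text syntax. Every operand is followed by a blank.
exampleAtom(rel(Name, _, l), Atom) :-
    atom_concat(Name, ' ', Atom).
exampleAtom(rel(Name, _, u), Atom) :-
    sub_atom(Name, 0, 1, After, First),       % first letter
    sub_atom(Name, 1, After, 0, Rest),        % remaining letters
    upcase_atom(First, FirstUp),
    atomic_list_concat([FirstUp, Rest, ' '], Atom).
exampleAtom(feed(X), Atom) :-                 % assumption: postfix operator
    exampleAtom(X, XAtom),
    atom_concat(XAtom, 'feed ', Atom).
exampleAtom(rename(X, Var), Atom) :-          % renaming is written {Var}
    exampleAtom(X, XAtom),
    atomic_list_concat([XAtom, '{', Var, '} '], Atom).
```

For example, exampleAtom(rename(feed(rel(staedte, s, u)), s), A) gives A = 'Staedte feed {s} '. Each clause converts its arguments recursively and then concatenates the pieces, which is the structure followed by the clauses for the individual operators.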
|
|
|
|
*/
|
|
|
|
upper(Lower, Upper) :-
|
|
atom_codes(Lower, [First | Rest]),
|
|
to_upper(First, First2),
|
|
UpperList = [First2 | Rest],
|
|
atom_codes(Upper, UpperList).
|
|
|
|
wp(Plan) :-
|
|
plan_to_atom_string(Plan, PlanAtom),
|
|
write(PlanAtom).
|
|
|
|
/*
|
|
|
|
Predicate ~newVariable~ returns a new unique variable name.
|
|
The variable name is unique in the sense that ~newVariable~ never
|
|
outputs the same name twice (in a PROLOG session).
|
|
It should be emphasized that the output
|
|
is not a PROLOG variable but a variable name to be used for defining
|
|
abstractions in the Secondo system.
|
|
|
|
*/
|
|
|
|
:-
|
|
dynamic(varDefined/1).
|
|
|
|
newVariable(Var) :-
|
|
varDefined(N),
|
|
!,
|
|
N1 is N + 1,
|
|
retract(varDefined(N)),
|
|
assert(varDefined(N1)),
|
|
atom_concat('var', N1, Var).
|
|
|
|
newVariable(Var) :-
|
|
assert(varDefined(1)),
|
|
Var = 'var1'.
|
|
|
|
deleteVariable :- retract(varDefined(_)), fail.
|
|
|
|
deleteVariables :- not(deleteVariable).
|
|
|
|
/*
|
|
Arguments:
|
|
|
|
*/
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
To consider distributed queries with predicates containing non-relation
|
|
objects, it's necessary to replicate the objects to the
|
|
involved workers.
|
|
|
|
For now we assume that every found object is contained in the distributed
|
|
part of the query (function of dmap or dmap2).
|
|
|
|
A possible later extension is to examine the distributed relations and
|
|
to share the objects only to workers containing parts of those relations.
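
The commands generated for replication have the form share("Obj",TRUE), one per object. As a minimal sketch of how such commands can be built, with the hypothetical names ~exampleShareCommand~ and ~exampleShareCommands~ and without touching the ~replicatedObject~ facts:

```prolog
:- use_module(library(apply)).   % maplist/3

% exampleShareCommand(+Obj, -Command): build the share command for a
% single object name.
exampleShareCommand(Obj, Command) :-
    atomic_list_concat(['share("', Obj, '",TRUE)'], Command).

% exampleShareCommands(+Objects, -Commands): one command per object,
% in the same order.
exampleShareCommands(Objects, Commands) :-
    maplist(exampleShareCommand, Objects, Commands).
```

For instance, exampleShareCommands(['Roads', center], L) gives L = ['share("Roads",TRUE)', 'share("center",TRUE)']. The object names used here are invented for illustration.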
|
|
|
|
*/
|
|
|
|
:-
|
|
dynamic(replicatedObject/1).
|
|
|
|
%distributed query without objects
|
|
replicateObjects(QueryPart, QueryPart) :-
|
|
findall(X,replicatedObject(X), ObjectList),
|
|
length(ObjectList,0),!.
|
|
|
|
%distributed query using objects in predicate
|
|
replicateObjects(QueryPart, Result) :-
|
|
findall(X,replicatedObject(X), ObjectList),
|
|
length(ObjectList,Length),
|
|
Length >0,
|
|
maplist(createSharedClause,ObjectList,CommandList),
|
|
append(CommandList,[QueryPart], Result).
|
|
|
|
createSharedClause(Obj, SharedCommand) :-
|
|
atom_concat('share("',Obj,StrObj),
|
|
atom_concat(StrObj,'",TRUE)',SharedCommand).
|
|
|
|
plan_to_atom_string(X, Result) :-
|
|
isDistributedQuery,
|
|
retractall(replicatedObject(_)),
|
|
plan_to_atom(X,QueryPart),
|
|
replicateObjects(QueryPart, Result),
|
|
!.
|
|
|
|
plan_to_atom_string(X, Result) :-
|
|
not(isDistributedQuery),
|
|
plan_to_atom(X,Result),
|
|
!.
|
|
|
|
plan_to_atom(obj(Object,_,u), Result) :-
|
|
isDistributedQuery,
|
|
upper(Object, UpperObject),
|
|
atom_concat(UpperObject, ' ', Result),
|
|
assertOnce(replicatedObject(UpperObject)),
|
|
!.
|
|
|
|
plan_to_atom(obj(Object,_,l), Result) :-
|
|
isDistributedQuery,
|
|
atom_concat(Object, ' ', Result),
|
|
assertOnce(replicatedObject(Object)),
|
|
!.
|
|
|
|
|
|
plan_to_atom(obj(Object,_,u), Result) :-
|
|
upper(Object, UpperObject),
|
|
atom_concat(UpperObject, ' ', Result),
|
|
!.
|
|
|
|
plan_to_atom(obj(Object,_,l), Result) :-
|
|
atom_concat(Object, ' ', Result),
|
|
!.
|
|
|
|
plan_to_atom(dot, Result) :-
|
|
atom_concat('.', ' ', Result),
|
|
!.
|
|
|
|
%end fapra 2015/16
|
|
|
|
plan_to_atom(rel(Name, _, l), Result) :-
|
|
atom_concat(Name, ' ', Result),
|
|
!.
|
|
|
|
plan_to_atom(rel(Name, _, u), Result) :-
|
|
upper(Name, Name2),
|
|
atom_concat(Name2, ' ', Result),
|
|
!.
|
|
|
|
plan_to_atom(res(N), Result) :-
|
|
atom_concat('res(', N, Res1),
|
|
atom_concat(Res1, ') ', Result),
|
|
!.
|
|
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
is_list(Term), Term = [First | _], atomic(First), !,
|
|
atom_codes(TermRes, Term),
|
|
normalize_space(atom(Out),TermRes),
|
|
concat_atom(['"', Out, '"'], '', Result).
|
|
|
|
/*
|
|
Lists:
|
|
|
|
*/
|
|
|
|
|
|
plan_to_atom([X], AtomX) :-
|
|
plan_to_atom(X, AtomX),
|
|
!.
|
|
|
|
plan_to_atom([X | Xs], Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
plan_to_atom(Xs, XsAtom),
|
|
concat_atom([XAtom, ', ', XsAtom], '', Result),
|
|
!.
|
|
|
|
|
|
/*
|
|
Operators: only special syntax. General rules for standard syntax
|
|
see below.
|
|
|
|
*/
|
|
|
|
|
|
plan_to_atom(sample(Rel, S, T), Result) :-
|
|
plan_to_atom(Rel, ResRel),
|
|
concat_atom([ResRel, 'sample[', S, ', ', T, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(hashjoin(X, Y, A, B, C), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
plan_to_atom(Y, YAtom),
|
|
plan_to_atom(A, AAtom),
|
|
plan_to_atom(B, BAtom),
|
|
concat_atom([XAtom, YAtom, 'hashjoin[',
|
|
AAtom, ', ', BAtom, ', ', C, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(sortmergejoin(X, Y, A, B), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
plan_to_atom(Y, YAtom),
|
|
plan_to_atom(A, AAtom),
|
|
plan_to_atom(B, BAtom),
|
|
concat_atom([XAtom, YAtom, 'sortmergejoin[',
|
|
AAtom, ', ', BAtom, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(mergejoin(X, Y, A, B), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
plan_to_atom(Y, YAtom),
|
|
plan_to_atom(A, AAtom),
|
|
plan_to_atom(B, BAtom),
|
|
concat_atom([XAtom, YAtom, 'mergejoin[',
|
|
AAtom, ', ', BAtom, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(groupby(Stream, GroupAttrs, Fields), Result) :-
|
|
plan_to_atom(Stream, SAtom),
|
|
plan_to_atom(GroupAttrs, GAtom),
|
|
plan_to_atom(Fields, FAtom),
|
|
concat_atom([SAtom, 'groupby[', GAtom, '; ', FAtom, ']'], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(field(NewAttr, Expr), Result) :-
|
|
plan_to_atom(attrname(NewAttr), NAtom),
|
|
plan_to_atom(Expr, EAtom),
|
|
concat_atom([NAtom, ': ', EAtom], '', Result).
|
|
|
|
plan_to_atom(exactmatchfun(IndexName, Rel, attr(Name, R, Case)), Result) :-
|
|
plan_to_atom(Rel, RelAtom),
|
|
plan_to_atom(a(Name, R, Case), AttrAtom),
|
|
newVariable(T),
|
|
concat_atom(['fun(', T, ' : TUPLE) ', IndexName,
|
|
' ', RelAtom, 'exactmatch[attr(', T, ', ', AttrAtom, ')] '], Result),
|
|
!.
|
|
|
|
|
|
plan_to_atom(newattr(Attr, Expr), Result) :-
|
|
plan_to_atom(Attr, AttrAtom),
|
|
plan_to_atom(Expr, ExprAtom),
|
|
concat_atom([AttrAtom, ': ', ExprAtom], '', Result),
|
|
!.
|
|
|
|
|
|
plan_to_atom(rename(X, Y), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
concat_atom([XAtom, '{', Y, '} '], '', Result),
|
|
!.
|
|
|
|
|
|
plan_to_atom(fun(Params, Expr), Result) :-
|
|
params_to_atom(Params, ParamAtom),
|
|
plan_to_atom(Expr, ExprAtom),
|
|
concat_atom(['fun ', ParamAtom, ExprAtom], '', Result),
|
|
!.
|
|
|
|
|
|
plan_to_atom(attribute(X, Y), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
plan_to_atom(Y, YAtom),
|
|
concat_atom(['attr(', XAtom, ', ', YAtom, ')'], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(increment(X), Result) :-
|
|
plan_to_atom(X, XAtom),
|
|
concat_atom([XAtom, '++'], '', Result),
|
|
!.
|
|
|
|
%fapra 2015/16
|
|
|
|
plan_to_atom(dloop2(PreArg1, PreArg2, PostArg1, PostArg2), Result) :-
|
|
plan_to_atom(PreArg1, PreArg1Atom),
|
|
plan_to_atom(PreArg2, PreArg2Atom),
|
|
plan_to_atom(PostArg1, PostArg1Atom),
|
|
plan_to_atom(PostArg2, PostArg2Atom),
|
|
concat_atom(
|
|
[PreArg1Atom,
|
|
PreArg2Atom,
|
|
'dloop2[',
|
|
PostArg1Atom, ', ',
|
|
PostArg2Atom, ']'], '', Result),
|
|
!.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
Sort orders and attribute names.
|
|
|
|
*/
|
|
|
|
plan_to_atom(asc(Attr), Result) :-
|
|
plan_to_atom(Attr, AttrAtom),
|
|
atom_concat(AttrAtom, ' asc', Result).
|
|
|
|
plan_to_atom(desc(Attr), Result) :-
|
|
plan_to_atom(Attr, AttrAtom),
|
|
atom_concat(AttrAtom, ' desc', Result).
|
|
|
|
plan_to_atom(attr(Name, Arg, Case), Result) :-
|
|
plan_to_atom(a(Name, Arg, Case), ResA),
|
|
atom_concat('.', ResA, Result).
|
|
|
|
plan_to_atom(attrname(attr(Name, Arg, Case)), Result) :-
|
|
plan_to_atom(a(Name, Arg, Case), Result).
|
|
|
|
plan_to_atom(a(A:B, _, _), Result) :-
|
|
upper(B, B2),
|
|
concat_atom([B2, '_', A], Result),
|
|
!.
|
|
|
|
plan_to_atom(a(X, _, _), X2) :-
|
|
upper(X, X2),
|
|
!.
|
|
|
|
%fapra 2015/16
|
|
|
|
plan_to_atom(our_attrname(attr(Name, Arg, Case)), Result) :-
|
|
plan_to_atom(our_a(Name, Arg, Case), Result).
|
|
|
|
plan_to_atom(our_a(_:B, _, _), Result) :-
|
|
upper(B, B2),
|
|
concat_atom(['..', B2], Result),
|
|
!.
|
|
|
|
plan_to_atom(our_a(X, _, _), Result) :-
|
|
upper(X, X2),
|
|
concat_atom(['..', X2], Result),
|
|
!.
|
|
|
|
plan_to_atom(simple_attrname(attr(Name, Arg, Case)), Result) :-
|
|
plan_to_atom(simple_a(Name, Arg, Case), Result), !.
|
|
|
|
plan_to_atom(simple_a(_:B, _, _), B2) :-
|
|
upper(B, B2),
|
|
!.
|
|
|
|
plan_to_atom(simple_a(X, _, _), X2) :-
|
|
upper(X, X2),
|
|
!.
|
|
|
|
plan_to_atom(extendstream(A, B, C), Plan) :-
|
|
plan_to_atom(A, PlanA),
|
|
plan_to_atom(B, PlanB),
|
|
plan_to_atom(C, PlanC),
|
|
concat_atom([PlanA, ' ', 'extendstream(',
|
|
PlanB, ': ', PlanC, ')'], Plan).
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
Translation of operators driven by predicate ~secondoOp~ in
|
|
file ~opSyntax~. There are rules for
|
|
|
|
* postfix, 1 or 2 arguments
|
|
|
|
* postfix followed by one argument in square brackets, in total 2
|
|
or 3 arguments
|
|
|
|
* prefix, 2 arguments
|
|
|
|
Other syntax, if not default (see below), needs to be coded explicitly.
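
The table-driven idea can be sketched in isolation as follows. This is only an
illustration: ~demoOp~ and ~demoToAtom~ are hypothetical names, the three
~demoOp~ facts are assumptions standing in for entries of ~opSyntax~, and only
two of the syntax classes are covered.

----    % assumed syntax table (stand-in for secondoOp)
        demoOp(feed, postfix, 1).
        demoOp(consume, postfix, 1).
        demoOp(filter, postfixbrackets, 2).

        % postfix operator with one argument: Arg Op
        demoToAtom(Term, Atom) :-
            compound(Term), Term =.. [Op, Arg],
            demoOp(Op, postfix, 1), !,
            demoToAtom(Arg, A1),
            atomic_list_concat([A1, ' ', Op], Atom).
        % postfix operator with one bracketed argument: Arg1 Op[Arg2]
        demoToAtom(Term, Atom) :-
            compound(Term), Term =.. [Op, Arg1, Arg2],
            demoOp(Op, postfixbrackets, 2), !,
            demoToAtom(Arg1, A1),
            demoToAtom(Arg2, A2),
            atomic_list_concat([A1, ' ', Op, '[', A2, ']'], Atom).
        % constants are written as they are
        demoToAtom(X, Atom) :-
            atomic(X),
            term_to_atom(X, Atom).

        % ?- demoToAtom(consume(filter(feed(plz), cond)), A).
        % A = 'plz feed filter[cond] consume'.
----

Adding a further operator of a known syntax class then only requires a new
fact in the table, not a new translation rule.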
|
|
|
|
*/
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 1),
|
|
secondoOp(Op, postfix, 1),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
concat_atom([Res1, ' ', Op, ' '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 2),
|
|
secondoOp(Op, postfix, 2),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
concat_atom([Res1, ' ', Res2, ' ', Op, ' '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 2),
|
|
secondoOp(Op, postfixbrackets, 2),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
concat_atom([Res1, ' ', Op, '[', Res2, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 3),
|
|
secondoOp(Op, postfixbrackets, 3),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 2),
|
|
secondoOp(Op, prefix, 2),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
concat_atom([Op, '(', Res1, ',', Res2, ') '], '', Result),
|
|
!.
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
Additional plan\_to\_atom rules to map Distributed2-operators.
|
|
|
|
*/
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 1),
|
|
secondoOp(Op, prefix, 1),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
concat_atom([Op, '(', Res1, ') '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 4),
|
|
secondoOp(Op, prefix, 4),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
concat_atom([Op, '(', Res1, ',', Res2, ', ', Res3,
|
|
', ', Res4, ') '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 4),
|
|
secondoOp(Op, postfixbrackets, 4),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
|
|
Res4, ']'], '' , Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 3),
|
|
secondoOp(Op, postfixbrackets2, 3),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 4),
|
|
secondoOp(Op, postfixbrackets3, 4),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3,', ',
|
|
Res4, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 5),
|
|
secondoOp(Op, postfixbrackets3, 5),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
arg(5, Term, Arg5),
|
|
plan_to_atom(Arg5, Res5),
|
|
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
|
|
Res4,', ',Res5, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 5),
|
|
secondoOp(Op, postfixbrackets4, 5),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
arg(5, Term, Arg5),
|
|
plan_to_atom(Arg5, Res5),
|
|
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ',
|
|
Res4,', ',Res5, '] '], '', Result),
|
|
!.
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 6),
|
|
secondoOp(Op, postfixbrackets5, 6),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg2, Res2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg3, Res3),
|
|
arg(4, Term, Arg4),
|
|
plan_to_atom(Arg4, Res4),
|
|
arg(5, Term, Arg5),
|
|
plan_to_atom(Arg5, Res5),
|
|
arg(6, Term, Arg6),
|
|
plan_to_atom(Arg6, Res6),
|
|
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ', Res4,', ',
|
|
Res5,', ',Res6, '] '], '', Result),
|
|
!.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
Generic rules. Operators that are not
|
|
recognized are assumed to be:
|
|
|
|
* 1 argument: prefix
|
|
|
|
* 2 arguments: infix
|
|
|
|
* 3 arguments: prefix
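
The defaults can be sketched standalone as shown next. The predicate
~demoDefault~ is hypothetical; unlike the rules below it does not consult any
operator table first, so it shows the fallback behaviour only.

----    % constants are written as they are
        demoDefault(X, A) :-
            atomic(X), !,
            term_to_atom(X, A).
        % 1 argument: prefix
        demoDefault(Term, A) :-
            Term =.. [Op, X], !,
            demoDefault(X, XA),
            atomic_list_concat([Op, '(', XA, ')'], A).
        % 2 arguments: infix, enclosed in parentheses
        demoDefault(Term, A) :-
            Term =.. [Op, X, Y], !,
            demoDefault(X, XA),
            demoDefault(Y, YA),
            atomic_list_concat(['(', XA, ' ', Op, ' ', YA, ')'], A).
        % 3 arguments: prefix
        demoDefault(Term, A) :-
            Term =.. [Op, X, Y, Z],
            demoDefault(X, XA),
            demoDefault(Y, YA),
            demoDefault(Z, ZA),
            atomic_list_concat([Op, '(', XA, ', ', YA, ', ', ZA, ')'], A).

        % ?- demoDefault(a + 1 < f(b), A).
        % A = '((a + 1) < f(b))'.
----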
|
|
|
|
*/
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 1),
|
|
arg(1, Term, Arg1),
|
|
plan_to_atom(Arg1, Res1),
|
|
concat_atom([Op, '(', Res1, ')'], '', Result).
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 2),
|
|
arg(1, Term, Arg1),
|
|
arg(2, Term, Arg2),
|
|
plan_to_atom(Arg1, Res1),
|
|
plan_to_atom(Arg2, Res2),
|
|
concat_atom(['(', Res1, ' ', Op, ' ', Res2, ')'], '', Result).
|
|
|
|
plan_to_atom(Term, Result) :-
|
|
functor(Term, Op, 3),
|
|
arg(1, Term, Arg1),
|
|
arg(2, Term, Arg2),
|
|
arg(3, Term, Arg3),
|
|
plan_to_atom(Arg1, Res1),
|
|
plan_to_atom(Arg2, Res2),
|
|
plan_to_atom(Arg3, Res3),
|
|
concat_atom([Op, '(', Res1, ', ', Res2, ', ', Res3, ')'], '', Result).
|
|
|
|
plan_to_atom(X, Result) :-
|
|
atomic(X),
|
|
term_to_atom(X, Result),
|
|
!.
|
|
|
|
plan_to_atom(X, _) :-
|
|
write('Error while converting term: '),
|
|
write(X),
|
|
nl.
|
|
|
|
|
|
params_to_atom([], ' ').
|
|
|
|
params_to_atom([param(Var, Type) | Params], Result) :-
|
|
type_to_atom(Type, TypeAtom),
|
|
params_to_atom(Params, ParamsAtom),
|
|
concat_atom(['(', Var, ': ', TypeAtom, ') ', ParamsAtom], '', Result),
|
|
!.
|
|
|
|
type_to_atom(tuple, 'TUPLE').
|
|
type_to_atom(tuple2, 'TUPLE2').
|
|
type_to_atom(group, 'GROUP').
|
|
|
|
|
|
/*
|
|
|
|
5.2 Optimization Rules
|
|
|
|
We introduce a predicate [=>] which can be read as ``translates into''.
|
|
|
|
5.2.1 Translation of the Arguments of an Edge of the POG
|
|
|
|
If the argument is of the form res(N), then it is a stream already and can be
|
|
used unchanged. If it is of the form arg(N), then it is a base relation; a
|
|
~feed~ must be applied and possibly a ~rename~.
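
A simplified standalone sketch of this case distinction is given below. It
uses an ordinary predicate ~demoTranslate~ instead of the operator, omits the
property lists, and relies on two assumed ~demoArgument~ facts; all of these
names are hypothetical.

----    % assumed query arguments: plz without and with a variable p2
        demoArgument(1, rel(plz, *, l)).
        demoArgument(2, rel(plz, p2, l)).

        % an intermediate result is a stream already
        demoTranslate(res(N), res(N)).
        % base relation without variable: feed only
        demoTranslate(arg(N), feed(rel(Name, *, Case))) :-
            demoArgument(N, rel(Name, *, Case)), !.
        % base relation with variable: feed, then rename
        demoTranslate(arg(N), rename(feed(rel(Name, Var, Case)), Var)) :-
            demoArgument(N, rel(Name, Var, Case)).

        % ?- demoTranslate(arg(1), P).
        % binds P to feed(rel(plz, *, l))
        % ?- demoTranslate(arg(2), P).
        % binds P to rename(feed(rel(plz, p2, l)), p2)
----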
|
|
|
|
*/
|
|
|
|
ordered(plz, ort).
|
|
|
|
ordered(orte, ort).
|
|
|
|
ordered(staedte, sName).
|
|
|
|
ordered(thousand, no).
|
|
|
|
ordered(ten, no).
|
|
|
|
order(Name, Attr) :-
|
|
ordered(Name, Attr), !.
|
|
|
|
order(_, none).
|
|
|
|
% The following rule is needed for listing all plan edges or cost edges,
|
|
% not for optimization as such.
|
|
|
|
res(N) => [res(N), none].
|
|
|
|
% arg(N) => feed(rel(Name, *, Case)) :-
|
|
% argument(N, rel(Name, *, Case)), !.
|
|
|
|
% arg(N) => rename(feed(rel(Name, Var, Case)), Var) :-
|
|
% argument(N, rel(Name, Var, Case)).
|
|
|
|
[res(N), P] => [res(N), P].
|
|
|
|
|
|
% Translate into distributed argument
|
|
arg(N) => [Plan, Properties] :-
|
|
isDistributedQuery,
|
|
!,
|
|
distributedarg(N) => [Plan, Properties].
|
|
|
|
/*
|
|
Translation into distributed arguments. The properties we use are:
|
|
|
|
~distribution~(DistributionType, DistributionAttribute, DistributionParameter):
|
|
DistributionType is share, spatial, modulo, function or random,
|
|
DistributionAttribute is the attribute of the relation used to determine
|
|
on which partition(s) to put a given tuple (in theory this could also be a list),
|
|
DistributionParameter is the parameter used for the distribution (like grid or
|
|
function object / operator).
|
|
|
|
~distributedobjecttype~(Type) (Type is darray, dfarray or dfmatrix).
|
|
|
|
~disjointpartitioning~ signals that, if we treat a partition as the multiset
|
|
of the tuples it contains, the union of all partitions is the original relation
|
|
(put differently, insofar as duplicates exist, they were already present in the
|
|
original relation).
|
|
|
|
Since some Secondo plans eliminate duplicates anyway, they can do without their
|
|
arguments having this property (e.g. spatial join).
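
As an illustration of how such a property list can be inspected, consider the
following standalone sketch. The names ~demoProps~ and ~demoCanFilter~ are
hypothetical, and the concrete property values (in particular the parameter
none) are assumptions made for the example only.

----    % assumed property list of a modulo-partitioned darray
        demoProps([distribution(modulo, plz, none),
                   distributedobjecttype(darray),
                   disjointpartitioning]).

        % a plain distributed filter needs a darray or dfarray whose
        % partitions are disjoint
        demoCanFilter(Props) :-
            ( memberchk(distributedobjecttype(dfarray), Props)
            ; memberchk(distributedobjecttype(darray), Props)
            ),
            memberchk(disjointpartitioning, Props).

        % ?- demoProps(P), demoCanFilter(P).
        % succeeds; without disjointpartitioning in P it would fail
----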
|
|
|
|
*/
|
|
|
|
% Translate into object found in SEC2DISTRIBUTED.
|
|
distributedarg(N) => [ObjName, X] :-
|
|
X =[distribution(DistType, DCDistAttr, DistParam),
|
|
distributedobjecttype(DistObjType),disjointpartitioning],
|
|
argument(N, Rel),
|
|
Rel = rel(Name, _, _),
|
|
distributedRels(rel(Name, _, _), ObjName, DistObjType,
|
|
DistType, DistAttr, DistParam),
|
|
not(DistType = spatial),
|
|
downcase_atom(DistAttr, DCDistAttr).
|
|
|
|
% Spatial partitioning with filtering on original attribute
|
|
% does not in general yield disjoint partitions
|
|
distributedarg(N) => [ObjName,
|
|
[distribution(DistType, DCDistAttr, DistParam),
|
|
distributedobjecttype(DistObjType)]] :-
|
|
argument(N, Rel),
|
|
Rel = rel(Name, _, _),
|
|
distributedRels(rel(Name, _, _), ObjName, DistObjType,
|
|
DistType, DistAttr, DistParam),
|
|
DistType = spatial,
|
|
downcase_atom(DistAttr, DCDistAttr).
|
|
|
|
% Filter spatially distributed argument on attribute original.
|
|
distributedarg(N) => [Plan,
|
|
[distribution(spatial, DCDistAttr, DistParam),
|
|
distributedobjecttype(DistObjType), disjointpartitioning]] :-
|
|
argument(N, Rel),
|
|
Rel = rel(Name, _, _),
|
|
distributedRels(rel(Name, _, _), ObjName, DistObjType,
|
|
spatial, DistAttr, DistParam),
|
|
downcase_atom(DistAttr, DCDistAttr),
|
|
Plan = dmap(ObjName, " ", filter(feed(rel('.', *, u)), attr(original, l, u))).
|
|
|
|
/*
|
|
Redistribute an argument relation spatially by the provided attribute.
The distribution type must be spatial and the attribute must be given as a
ground term. A grid to be used for the distribution may be provided; if it is
not, we fall back to the grid object called grid, which must then exist in
your database. Yields a dfarray or a dfmatrix.
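
The fallback to the default grid is an if-then-else on groundness. A minimal
standalone sketch, using the hypothetical predicate ~demoDefaultGrid~:

----    % keep a grid that was supplied, otherwise bind the default name
        demoDefaultGrid(Grid) :-
            ( ground(Grid) -> true ; Grid = grid ).

        % ?- demoDefaultGrid(G).
        % G = grid.
        % ?- demoDefaultGrid(myGrid).
        % true.
----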
|
|
|
|
*/
|
|
|
|
distributedarg(N) => [Plan, [distribution(DistType,DistAttr,Grid),
|
|
distributedobjecttype(DistObjType)]] :-
|
|
% only use this in one direction. Might be generalized in the future.
|
|
ground(DistAttr),
|
|
ground(DistType),
|
|
% if we do not have a grid specified, use the grid-object
|
|
(ground(Grid) -> true; Grid = grid),
|
|
DistType = spatial,
|
|
argument(N, Rel),
|
|
Rel = rel(Name, _, _),
|
|
distributedRels(rel(Name, _, _), ObjName, _, OriginalDistType, _, _),
|
|
% cannot redistribute replicated relations
|
|
not(OriginalDistType = share),
|
|
spelled(Name:DistAttr, AttrTerm),
|
|
InnerPlan = partitionF(ObjName, " ", extendstream(feed(rel('.', *, u)),
|
|
attrname(attr(cell, *, u)), cellnumber(bbox(AttrTerm), Grid)),
|
|
attr('.Cell', *, u), 0), %there should be another option to add the 2nd dot
|
|
% collect into dfarray or simply be content with the dfmatrix
|
|
(DistObjType = dfarray,
|
|
Plan = collect2(InnerPlan, " ", 1238);
|
|
DistObjType = dfmatrix,
|
|
Plan = InnerPlan).
|
|
|
|
arg(N) => [feed(rel(Name, *, Case)), [order(X)]] :-
|
|
argument(N, rel(Name, *, Case)), !,
|
|
order(Name, X).
|
|
|
|
arg(N) => [rename(feed(rel(Name, Var, Case)), Var), [order(Var:X)]] :-
|
|
argument(N, rel(Name, Var, Case)), !,
|
|
order(Name, X).
|
|
|
|
/*
|
|
5.2.2 Translation of Selections
|
|
|
|
*/
|
|
|
|
%fapra 2015/16
|
|
|
|
% Translate selection into distributed selection.
|
|
select(Arg, Y) => X :-
|
|
isDistributedQuery,
|
|
!, /* Operand is distributed. Do not translate into local selection. */
|
|
distributedselect(Arg, Y) => X.
|
|
|
|
%end fapra 2015/16
|
|
|
|
% select(Arg, pr(Pred, _)) => filter(ArgS, Pred) :-
|
|
% Arg => ArgS.
|
|
|
|
% select(Arg, pr(Pred, _, _)) => filter(ArgS, Pred) :-
|
|
% Arg => ArgS.
|
|
|
|
select(Arg, pr(Pred, _)) => [filter(ArgS, Pred), P] :-
|
|
Arg => [ArgS, P].
|
|
|
|
|
|
select(Arg, pr(Pred, _, _)) => [filter(ArgS, Pred), P] :-
|
|
Arg => [ArgS, P].
|
|
|
|
|
|
/*
|
|
|
|
Translation of selections using indices.
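
The rules first bring the attribute to the right-hand side of the comparison
and then pick the index operator. The following standalone sketch shows only
this normalization. The predicates ~demoFlip~, ~demoIndexOp~ and ~demoAccess~
are hypothetical, attributes are simplified to one argument, and the operator
declaration is included only so that the sketch is self-contained.

----    :- op(700, xfx, <=).

        % bring the attribute to the right-hand side
        demoFlip(attr(A) = Y,  Y = attr(A)).
        demoFlip(attr(A) <= Y, Y >= attr(A)).
        demoFlip(attr(A) >= Y, Y <= attr(A)).

        % index operator for a normalized predicate
        demoIndexOp(_ = attr(_),  exactmatch).
        demoIndexOp(_ >= attr(_), leftrange).
        demoIndexOp(_ <= attr(_), rightrange).

        demoAccess(Pred, Op) :-
            demoFlip(Pred, Flipped), !,
            demoIndexOp(Flipped, Op).
        demoAccess(Pred, Op) :-
            demoIndexOp(Pred, Op).

        % ?- demoAccess(attr(plz) <= 50000, Op).
        % Op = leftrange.
----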
|
|
|
|
*/
|
|
|
|
select(arg(N), Y) => [X, P] :-
|
|
indexselect(arg(N), Y) => [X, P], !.
|
|
|
|
select(arg(N), Y) => [X, [none]] :-
|
|
indexselect(arg(N), Y) => X.
|
|
|
|
indexselect(arg(N), pr(attr(AttrName, Arg, Case) = Y, Rel)) => X :-
|
|
indexselect(arg(N), pr(Y = attr(AttrName, Arg, Case), Rel)) => X.
|
|
|
|
indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
|
|
[exactmatch(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, *, Case)),
|
|
!,
|
|
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
|
|
[rename(exactmatch(IndexName, rel(Name, Var, Case), Y), Var),
|
|
[order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, Var, Case)),
|
|
!,
|
|
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
indexselect(arg(N), pr(attr(AttrName, Arg, Case) <= Y, Rel)) => X :-
|
|
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, Case), Rel)) => X.
|
|
|
|
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
|
|
[leftrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, *, Case)),
|
|
!,
|
|
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
|
|
[rename(leftrange(IndexName, rel(Name, Var, Case), Y), Var),
|
|
[order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, Var, Case)),
|
|
!,
|
|
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
indexselect(arg(N), pr(attr(AttrName, Arg, Case) >= Y, Rel)) => X :-
|
|
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, Case), Rel)) => X.
|
|
|
|
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
|
|
[rightrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, *, Case)),
|
|
!,
|
|
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
|
|
[rename(rightrange(IndexName, rel(Name, Var, Case), Y), Var),
|
|
[order(AttrName)]]
|
|
:-
|
|
argument(N, rel(Name, Var, Case)),
|
|
!,
|
|
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
Translation of selections that concern distributed relations.
|
|
|
|
*/
|
|
|
|
% Commutativity of intersects.
|
|
distributedselect(ObjName,
|
|
pr(Val intersects attr(Attr, Arg, Case), Rel)) => X :-
|
|
distributedselect(ObjName, pr(attr(Attr, Arg, Case) intersects Val, Rel))
|
|
=> X.
|
|
|
|
% Use spatial index for an intersection predicate.
|
|
distributedselect(arg(N), Pred)
|
|
=> [dmap2(IndexObj, RelObj, " ",
|
|
filter(filter(Intersection, InnerPred), attr(original, l, u)), 1238),
|
|
[distributedobjecttype(dfarray), disjointpartitioning]] :-
|
|
argument(N, Rel),
|
|
Pred = pr(Attr intersects Val, rel(_, Var, _)),
|
|
Pred = pr(InnerPred, _),
|
|
% We need a materialized argument relation to use the index
|
|
distributedRels(Rel, RelObj, _, _, _),
|
|
RelObj = rel(RelObjName, _, _),
|
|
% Lookup an rtree index for the relation + attribute
|
|
downcase_atom(RelObjName, DCRelObjName),
|
|
attrnameDCAtom(Attr, DCAttr),
|
|
distributedIndex(DCRelObjName, DCAttr, rtree, DCIndexObjName),
|
|
% Check the database object for the correct spelling
|
|
spelledObj(DCIndexObjName, IndexObjName,_, Case),
|
|
IndexObj = rel(IndexObjName, *, Case),
|
|
IndParam = rel('.', *, u),
|
|
RelParam = rel('..', *, u),
|
|
renameStream(windowintersects(IndParam, RelParam, Val),
|
|
Var, Intersection).
|
|
|
|
% Use btree index for a starts predicate.
|
|
distributedselect(arg(N), pr(Attr starts Val, rel(_, Var, _)))
|
|
=> [dmap2(IndexObj, RelObj, " ",
|
|
Range, 1238), [distributedobjecttype(dfarray), disjointpartitioning]] :-
|
|
argument(N, Rel),
|
|
distributedRels(Rel, RelObj, _, _, _),
|
|
RelObj = rel(RelObjName, _, _),
|
|
downcase_atom(RelObjName, DCRelObjName),
|
|
attrnameDCAtom(Attr, DCAttr),
|
|
% Lookup a btree index for the relation + attribute
|
|
distributedIndex(DCRelObjName, DCAttr, btree, DCIndexObjName),
|
|
spelledObj(DCIndexObjName, IndexObjName,_, Case),
|
|
IndexObj = rel(IndexObjName, *, Case),
|
|
IndParam = rel('.', *, u),
|
|
RelParam = rel('..', *, u),
|
|
renameStream(range(IndParam, RelParam, Val, increment(Val)),
|
|
Var, Range).
|
|
|
|
|
|
% Generic case.
|
|
distributedselect(Arg, pr(Cond, rel(_,Var,_))) =>
|
|
[dmap(ArgS," ", filter(Param,Cond)), P] :-
|
|
Arg => [ArgS, P],
|
|
% we accept darrays and dfarrays
|
|
(member(distributedobjecttype(dfarray), P) ;
|
|
member(distributedobjecttype(darray), P)),
|
|
% partitions of the argument relation need to be disjoint
|
|
member(disjointpartitioning, P),
|
|
% rename if needed
|
|
feedRenameRelation(rel('.',*, u), Var, Param).
|
|
|
|
%end fapra 2015/16
|
|
|
|
|
|
/*
|
|
Here ~ArgS~ is meant to indicate ``argument stream''.
|
|
|
|
5.2.3 Translation of Joins
|
|
|
|
A join can always be translated to filtering the Cartesian product.
|
|
|
|
*/
|
|
|
|
%fapra 2015/16
|
|
|
|
% We have two variants of joins in place; see if the first one can
|
|
% handle the join. If yes, cut and use its result.
|
|
join(Arg1, Arg2, Pred) => SecondoPlan:-
|
|
isDistributedQuery,
|
|
distributedjoin(Arg1, Arg2, Pred) => _, !,
|
|
distributedjoin(Arg1, Arg2, Pred) => SecondoPlan.
|
|
|
|
join(Arg1, Arg2, Pred) => SecondoPlan:-
|
|
isDistributedQuery, !,
|
|
Arg1 = arg(N1),
|
|
Arg2 = arg(N2),
|
|
not(N1=N2),
|
|
Arg1 => [ObjName1, _],
|
|
Arg2 => [ObjName2, _],
|
|
distributedRels(_, ObjName1, _, _, _),
|
|
distributedRels(_, ObjName2, _, _, _),
|
|
distributedjoin(ObjName1, ObjName2, Pred) => SecondoPlan.
|
|
|
|
%end fapra 2015/16
|
|
|
|
join(Arg1, Arg2, pr(Pred, _, _)) => [filter(product(Arg1S, Arg2S), Pred), P1] :-
|
|
Arg1 => [Arg1S, P1],
|
|
Arg2 => [Arg2S, _].
|
|
|
|
|
|
/*
|
|
|
|
Index joins:
|
|
|
|
*/
|
|
|
|
|
|
join(Arg1, arg(N), pr(X=Y, _, _)) => [loopjoin(Arg1S, MatchExpr), P1] :-
|
|
isOfSecond(Attr2, X, Y),
|
|
isNotOfSecond(Expr1, X, Y),
|
|
argument(N, RelDescription),
|
|
hasIndex(RelDescription, Attr2, IndexName),
|
|
Arg1 => [Arg1S, P1],
|
|
exactmatch(IndexName, arg(N), Expr1) => MatchExpr.
|
|
|
|
join(arg(N), Arg2, pr(X=Y, _, _)) => [loopjoin(Arg2S, MatchExpr), P2] :-
|
|
isOfFirst(Attr1, X, Y),
|
|
isNotOfFirst(Expr2, X, Y),
|
|
argument(N, RelDescription),
|
|
hasIndex(RelDescription, Attr1, IndexName),
|
|
Arg2 => [Arg2S, P2],
|
|
exactmatch(IndexName, arg(N), Expr2) => MatchExpr.
|
|
|
|
|
|
exactmatch(IndexName, arg(N), Expr) =>
|
|
exactmatch(IndexName, rel(Name, *, Case), Expr) :-
|
|
argument(N, rel(Name, *, Case)),
|
|
!.
|
|
|
|
exactmatch(IndexName, arg(N), Expr) =>
|
|
rename(exactmatch(IndexName, rel(Name, Var, Case), Expr), Var) :-
|
|
argument(N, rel(Name, Var, Case)),
|
|
!.
|
|
|
|
|
|
|
|
/*
|
|
|
|
For a join with a predicate of the form X = Y we can distinguish four cases
|
|
depending on whether X and Y are attributes or more complex expressions. For
|
|
example, a query condition might be ``PLZA = PLZB'' in which case we have just
|
|
attribute names on both sides of the predicate operator, or it could be ``PLZA =
|
|
PLZB + 1''. In the latter case we have an expression on the right hand side.
|
|
This can still be translated to a hashjoin, for example, by first extending the
|
|
second argument by a new attribute containing the value of the expression. For
|
|
example, the query
|
|
|
|
---- select *
|
|
from plz as p1, plz as p2
|
|
where p1.PLZ = p2.PLZ + 1
|
|
----
|
|
|
|
can be translated to
|
|
|
|
---- plz feed {p1} plz feed {p2} extend[newPLZ: PLZ_p2 + 1]
|
|
hashjoin[PLZ_p1, newPLZ, 997]
|
|
remove[newPLZ]
|
|
consume
|
|
----
|
|
|
|
This technique is built into the optimizer as follows. We first define the four
|
|
cases (at the moment for equijoin only; this may later be extended) which also
|
|
translate the arguments into streams. Then the rules translating to join
|
|
methods can be formulated independently from this general technique. They
|
|
translate terms of the form join00(Arg1Stream, Arg2Stream, Pred).
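
The core of the technique, replacing an expression by a new attribute of an
extended stream, can be sketched standalone for the right-hand side as follows.
The predicate ~demoExtendRight~ is hypothetical; the term shapes follow the
rules below.

----    % demoExtendRight(Y, Stream, NewY, NewStream)
        % a plain attribute is left alone
        demoExtendRight(Y, Arg2S, Y, Arg2S) :-
            Y = attr(_, _, _), !.
        % an expression is computed into the new attribute r_expr
        demoExtendRight(Y, Arg2S, attr(r_expr, 2, l),
                extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)])).

        % ?- demoExtendRight(attr(plz, 2, u) + 1, s2, NewY, S).
        % binds NewY to attr(r_expr, 2, l) and S to an extend term
        % around s2 that defines r_expr as the original expression
----

The join method then only ever sees a predicate with attributes on both sides,
and the helper attribute is removed again after the join.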
|
|
|
|
*/
|
|
|
|
join(Arg1, Arg2, pr(X=Y, R1, R2)) => [JoinPlan, P] :-
|
|
X = attr(_, _, _),
|
|
Y = attr(_, _, _), !,
|
|
Arg1 => [Arg1S, P1],
|
|
Arg2 => [Arg2S, P2],
|
|
join00([Arg1S, P1], [Arg2S, P2], pr(X=Y, R1, R2)) => [JoinPlan, P].
|
|
|
|
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
|
|
[remove(JoinPlan, [attrname(attr(r_expr, 2, l))]), P] :-
|
|
X = attr(_, _, _),
|
|
not(Y = attr(_, _, _)), !,
|
|
Arg1 => [Arg1S, P1],
|
|
Arg2 => [Arg2S, _],
|
|
Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
|
|
join00([Arg1S, P1], [Arg2Extend, none], pr(X=attr(r_expr, 2, l), R1, R2))
|
|
=> [JoinPlan, P].
|
|
|
|
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
|
|
[remove(JoinPlan, [attrname(attr(l_expr, 2, l))]), P] :-
|
|
not(X = attr(_, _, _)),
|
|
Y = attr(_, _, _), !,
|
|
Arg1 => [Arg1S, _],
|
|
Arg2 => [Arg2S, P2],
|
|
Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
|
|
join00([Arg1Extend, none], [Arg2S, P2], pr(attr(l_expr, 1, l)=Y, R1, R2))
|
|
=> [JoinPlan, P].
|
|
|
|
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
|
|
[remove(JoinPlan, [attrname(attr(l_expr, 1, l)),
|
|
attrname(attr(r_expr, 2, l))]), P] :-
|
|
not(X = attr(_, _, _)),
|
|
not(Y = attr(_, _, _)), !,
|
|
Arg1 => [Arg1S, _],
|
|
Arg2 => [Arg2S, _],
|
|
Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
|
|
Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
|
|
join00([Arg1Extend, none], [Arg2Extend, none],
|
|
pr(attr(l_expr, 1, l)=attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P].
|
|
|
|
|
|
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [sortmergejoin(Arg1S, Arg2S,
|
|
attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
|
|
isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
|
|
isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _).
|
|
|
|
% use order property
|
|
|
|
join00([Arg1S, P1], [Arg2S, P2], pr(X = Y, _, _)) => [mergejoin(Arg1S, Arg2S,
|
|
attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
|
|
isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
|
|
isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _),
|
|
select(order(Name1), P1, _),
|
|
select(order(Name2), P2, _).
|
|
|
|
% hashjoin has asymmetric cost, therefore consider both orders
|
|
|
|
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg1S, Arg2S,
|
|
attrname(Attr1), attrname(Attr2), 999997), [none]] :-
|
|
isOfFirst(Attr1, X, Y),
|
|
isOfSecond(Attr2, X, Y).
|
|
|
|
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg2S, Arg1S,
|
|
attrname(Attr2), attrname(Attr1), 999997), [none]] :-
|
|
isOfFirst(Attr1, X, Y),
|
|
isOfSecond(Attr2, X, Y).
|
|
|
|
%fapra 2015/16
|
|
|
|
% Translate a distributed spatial join with an intersection predicate.
|
|
distributedjoin(Arg1, Arg2, Pred)
|
|
=> [SecondoPlan, [DistAttr1, distributedobjecttype(dfarray),
|
|
disjointpartitioning]]:-
|
|
Pred = pr(Attr1 intersects Attr2, rel(_, Rel1Var, _), rel(_, Rel2Var, _)),
|
|
isOfFirst(Attr1, Rel1, Rel2),
|
|
isOfSecond(Attr2, Rel1, Rel2),
|
|
attrnameDCAtom(Attr1, Attr1Name),
|
|
attrnameDCAtom(Attr2, Attr2Name),
|
|
% allow using replicated + any distribution or both distributed by
|
|
% join predicate
|
|
((DistAttr1 = distribution(_, _, _),
|
|
DistAttr2 = distribution(share, _, _));
|
|
(DistAttr1 = distribution(spatial, Attr2Name, GridObj),
|
|
DistAttr2 = distribution(spatial, Attr1Name, GridObj))),
|
|
Arg1 => [ObjName1, [DistAttr1| Props1]],
|
|
Arg2 => [ObjName2, [DistAttr2| Props2]],
|
|
% rename the parameter relations if needed
|
|
feedRenameRelation(param1, Rel1Var, Param1Plan),
|
|
feedRenameRelation(param2, Rel2Var, Param2Plan),
|
|
% rename the cell attribute if needed
|
|
renamedRelAttr(attr(cell, 1, u), Rel1Var, CellAttr1),
|
|
renamedRelAttr(attr(cell, 2, u), Rel2Var, CellAttr2),
|
|
Scheme =
|
|
filter(
|
|
filter(
|
|
filter(
|
|
itSpatialJoin(
|
|
Param1Plan,
|
|
Param2Plan,
|
|
attrname(Attr1),
|
|
attrname(Attr2)
|
|
),
|
|
CellAttr1 = CellAttr2
|
|
),
|
|
gridintersects(
|
|
GridObj,
|
|
bbox(Attr1),
|
|
bbox(Attr2),
|
|
CellAttr1
|
|
)
|
|
),
|
|
Attr1 intersects Attr2
|
|
),
|
|
% We have the actual query now. Distribute it to the workers.
|
|
distributedquery([ObjName1, [DistAttr1| Props1]],
|
|
[ObjName2, [DistAttr2| Props2]], Scheme)
|
|
=> SecondoPlan.
|
|
|
|
/*
|
|
----
|
|
distributedquery(Arg1, Arg2, QueryScheme) =>
|
|
----
|
|
|
|
Distribute the query given by QueryScheme to the workers. The scheme has
|
|
the placeholders param1 and param2 for its arguments. The actual arguments
|
|
are given in Arg1 and Arg2 as a pair of a plan and a property list.
|
|
Several cases might arise depending on Arg1's and
|
|
Arg2's distribution type (replicated vs partitioned) and their distributed object
|
|
type (d(f)array vs dfmatrix).
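
The case analysis implemented by the rules below can be summarized as a small
decision table. The sketch is standalone; ~demoOperator~ and the descriptors
replicated, partitioned(array) and partitioned(matrix) are hypothetical
shorthands for the property tests used in the actual rules.

----    % demoOperator(KindOfArg1, KindOfArg2, DistributingOperator)
        demoOperator(replicated,          partitioned(array),  dmap).
        demoOperator(partitioned(array),  replicated,          dmap).
        demoOperator(partitioned(array),  partitioned(array),  dmap2).
        demoOperator(partitioned(matrix), partitioned(matrix), areduce2).

        % ?- demoOperator(partitioned(array), partitioned(array), Op).
        % Op = dmap2.
        % ?- demoOperator(replicated, replicated, Op).
        % false.
----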
|
|
|
|
*/
|
|
|
|
% Arg1 replicated, Arg2 partitioned, Arg2 is a d(f)array
|
|
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
|
|
not(isPartitioned([Arg1S, P1])),
|
|
isPartitioned([Arg2S, P2]),
|
|
not(isDfmatrix([Arg2S, P2])),
|
|
substituteSubterm(param2, rel('.', *, u), QueryScheme, QueryScheme1),
|
|
substituteSubterm(param1, Arg2S, QueryScheme1, QueryScheme2),
|
|
Query = dmap(Arg2S, " ", QueryScheme2), !.
|
|
|
|
% Arg2 replicated, Arg1 partitioned, Arg1 is a d(f)array
|
|
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
|
|
isPartitioned([Arg1S, P1]),
|
|
not(isPartitioned([Arg2S, P2])),
|
|
not(isDfmatrix([Arg1S, P1])),
|
|
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
|
|
substituteSubterm(param2, Arg2S, QueryScheme1, QueryScheme2),
|
|
Query = dmap(Arg1S, " ", QueryScheme2), !.
|
|
|
|
% Arg1 partitioned, Arg2 partitioned, both are d(f)arrays
|
|
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
|
|
isPartitioned([Arg1S, P1]),
|
|
isPartitioned([Arg2S, P2]),
|
|
not(isDfmatrix([Arg2S, P2])),
|
|
not(isDfmatrix([Arg1S, P1])),
|
|
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
|
|
substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
|
|
Query = dmap2(Arg1S, Arg2S, " ", QueryScheme2, 1238), !.
|
|
|
|
% Arg1 partitioned, Arg2 partitioned, both dfmatrices
|
|
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
|
|
isPartitioned([Arg1S, P1]),
|
|
isPartitioned([Arg2S, P2]),
|
|
isDfmatrix([Arg2S, P2]),
|
|
isDfmatrix([Arg1S, P1]),
|
|
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
|
|
substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
|
|
Query = areduce2(Arg1S, Arg2S, "", QueryScheme2, 1238), !.
|
|
|
|
% Arg1 replicated, Arg2 replicated
|
|
distributedquery([Arg1S, P1], [Arg2S, P2], _) => _ :-
|
|
not(isPartitioned([Arg1S, P1])),
|
|
not(isPartitioned([Arg2S, P2])),
|
|
write('A potential plan edge could not be generated because '),
|
|
write('queries with two replicated arguments '),
|
|
write('cannot be formulated using DistributedAlgebra as of now.\n'),
|
|
fail.
|
|
|
|
%Equijoin
|
|
distributedjoin(ObjName1, ObjName2, pr(attr(X1,X2,X3)=attr(Y1,Y2,Y3),
|
|
Rel1, Rel2))
|
|
=> [SecondoPlan, [none]] :-
|
|
X=attr(X1,X2,X3),
|
|
Y=attr(Y1,Y2,Y3),
|
|
Rel1 = rel(_, _, _),
|
|
Rel2 = rel(_, _, _),
|
|
isOfFirst(_, X, Y),
|
|
isOfSecond(_, X, Y),
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
|
|
SecondoPlan, false).
|
|
|
|
%Standard Join
|
|
distributedjoin(ObjName1, ObjName2, pr(Pred,Rel1, Rel2))
|
|
=> [SecondoPlan, [none]] :-
|
|
Rel1 = rel(_, _, _),
|
|
Rel2 = rel(_, _, _),
|
|
buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
|
|
SecondoPlan, false).
|
|
|
|
/*
|
|
It is assumed that if "function" is specified in
|
|
the system relation "SEC2DISTRIBUTED", then a deterministic
|
|
function using the specified attribute was used.
|
|
The functions used for partitioning the two relations are assumed
|
|
to result in the same values if given the same attribute value. E.g.
|
|
both used the same hashvalue.
|
|
|
|
*/
|
|
|
|
/*
|
|
|
|
Equijoin Secondo plan for the case that both relations are partitioned by the join attribute
|
|
using modulo.
|
|
Modulo is the most efficient compared to the other options,
|
|
because we do not need to repartition and also there is no
|
|
need to calculate the worker on which a tuple is located:
|
|
the worker number is already the modulo value. Thus it is
|
|
slightly more efficient than any other function (e.g. hash).
|
|
In case it is possible in the future to deploy different secondo plans
|
|
to different workers (i.e. tell each worker which part of the shared
|
|
relation it should use), having 2 replicated relations
|
|
is the most efficient solution.
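
As a minimal sketch of the modulo argument above (the predicate ~demoWorker~ is
hypothetical and not part of the optimizer; it assumes an integer join attribute
and a fixed number of workers), the worker holding a tuple can be read off
directly from the attribute value:

----    demoWorker(Value, NoOfWorkers, Worker) :-
          Worker is Value mod NoOfWorkers.
----

With four workers, a tuple with join attribute value 17 resides on worker 1, and
so does every tuple of the other relation with the same value, which is why no
repartitioning is needed before the join.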
|
|
|
|
*/
|
|
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
|
|
SecondoPlan, _):-
|
|
plan_to_atom(simple_attrname(X), X2),
|
|
plan_to_atom(simple_attrname(Y), Y2),
|
|
distributedRels(_, ObjName1, _, 'modulo', X2),
|
|
distributedRels(_, ObjName2, _, 'modulo', Y2),
|
|
Rel1 = rel(_, Rel1Var, _),
|
|
Rel2 = rel(_, Rel2Var, _),
|
|
% rename the parameter relations of the dmapped plan if needed
|
|
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
|
|
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
|
|
!,
|
|
SecondoPlan = dmap2(ObjName1, ObjName2, " ",
|
|
hashjoin(Feed1, Feed2,attrname(X),
|
|
attrname(Y), 999997), 1238).
|
|
|
|
%Equijoin Secondo Plan for the case that both are partitioned by the join attribute
|
|
%using a function
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
|
|
SecondoPlan, _):-
|
|
plan_to_atom(simple_attrname(X), X2),
|
|
plan_to_atom(simple_attrname(Y), Y2),
|
|
distributedRels(_, ObjName1, _, 'function', X2),
|
|
distributedRels(_, ObjName2, _, 'function', Y2),
|
|
Rel1 = rel(_, Rel1Var, _),
|
|
Rel2 = rel(_, Rel2Var, _),
|
|
% rename the parameter relations of the dmapped plan if needed
|
|
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
|
|
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
|
|
!,
|
|
SecondoPlan = dmap2(ObjName1, ObjName2, " ",
|
|
hashjoin(Feed1, Feed2,attrname(X),
|
|
attrname(Y), 999997), 1238).
|
|
|
|
%Equijoin Secondo Plan for one replicated (relation) and
|
|
%one partitioned (darray/dfarray)
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
|
|
SecondoPlan, _):-
|
|
distributedRels(_ ,ObjName1,_ ,'share',_ ),
|
|
isPartitioned(ObjName2),
|
|
Rel1 = rel(_, Rel1Var, _),
|
|
Rel2 = rel(_, Rel2Var, _),
|
|
% rename the parameter relations of the dmapped plan if needed
|
|
feedRenameRelation(ObjName1, Rel1Var, Feed1),
|
|
feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
|
|
!,
|
|
SecondoPlan = dmap(ObjName2, " ",
|
|
hashjoin(Feed1,
|
|
Feed2,
|
|
attrname(X), attrname(Y), 999997)).
|
|
|
|
%Commutativity for Equijoin & Standard Join
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
|
|
SecondoPlan, false):-
|
|
buildSecondoPlan(ObjName2, ObjName1, pr(Pred, Rel1, Rel2),
|
|
SecondoPlan, true).
|
|
|
|
|
|
%Equijoin Secondo Plan for repartitioning 2 "wrongly"
|
|
%partitioned relations (darray/dfarray)
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
|
|
SecondoPlan, _):-
|
|
isPartitioned(ObjName1),
|
|
isPartitioned(ObjName2),
|
|
Rel1 = rel(_, Rel1Var, _),
|
|
Rel2 = rel(_, Rel2Var, _),
|
|
% rename the parameter relations of the dmapped plan if needed
|
|
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
|
|
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
|
|
!,
|
|
SecondoPlan = dmap2(
|
|
collect2(
|
|
partitionF(ObjName1, "LeftPartOfJoin", feed(rel('.',*,u)),
|
|
hashvalue(our_attrname(X), 999997), 0),
|
|
"L", 1238),
|
|
collect2(
|
|
partitionF(ObjName2, "RightPartOfJoin", feed(rel('.',*,u)),
|
|
hashvalue(our_attrname(Y), 999997), 0),
|
|
"R", 1238),
|
|
" ",
|
|
hashjoin(Feed1,
|
|
Feed2,
|
|
attrname(X), attrname(Y), 999997),
|
|
1238).
|
|
|
|
%Equijoin Secondo Plan for 2 replicated rels: not supported, report and fail
|
|
buildSecondoPlan(ObjName1, ObjName2, pr(attr(_,_,_)=attr(_,_,_), _, _),
|
|
_, true):-
|
|
distributedRels(_ ,ObjName1,_ ,'share',_ ),
|
|
distributedRels(_, ObjName2, _,'share', _),
|
|
!,
|
|
write('Both relations are replicated, the query cannot be executed!'),
|
|
false.
|
|
|
|
% Plan yields a dfmatrix
|
|
isDfmatrix([_, P]) :-
|
|
member(distributedobjecttype(dfmatrix), P).
|
|
|
|
% Plan yields a partitioned distribution.
|
|
isPartitioned([_, P]):-
|
|
is_list(P), !,(
|
|
member(distribution('function', _, _), P);
|
|
member(distribution('modulo', _, _), P);
|
|
member(distribution('random', _, _), P);
|
|
member(distribution('spatial', _, _), P)).
|
|
|
|
% Secondo object represents a partitioned distribution.
|
|
isPartitioned(ObjName):-
|
|
distributedRels(_, ObjName,_ ,'function', _);
|
|
distributedRels(_, ObjName,_ ,'modulo', _);
|
|
distributedRels(_, ObjName,_ ,'random', _);
|
|
distributedRels(_, ObjName,_ ,'spatial', _).
|
|
|
|
%Standard Join Secondo Plan (one replicated, one partitioned)
|
|
buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
|
|
SecondoPlan, _):-
|
|
(DistArgrel = ObjName2, ReplArgrel = ObjName1;
|
|
DistArgrel = ObjName1, ReplArgrel = ObjName2),
|
|
distributedRels(_, ReplArgrel, _ , 'share', _),
|
|
isPartitioned(DistArgrel),
|
|
Rel1 = rel(_, Rel1Var, _),
|
|
Rel2 = rel(_, Rel2Var, _),
|
|
% rename the parameter relations of the dmapped plan if needed
|
|
feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
|
|
feedRenameRelation(ReplArgrel, Rel1Var, Feed1),
|
|
!,
|
|
SecondoPlan = dmap(DistArgrel, " ",
|
|
filter(product(Feed2,Feed1), Pred)).
|
|
|
|
%Standard Join Secondo Plan, both are partitioned
|
|
buildStdSecondoPlan(ObjName1, ObjName2, pr(_, _, _),
|
|
_, true):-
|
|
isPartitioned(ObjName1),
|
|
isPartitioned(ObjName2),
|
|
!,
|
|
write('The joined relations are both partitioned and thus'),
|
|
write(' not distributed correctly for standard join.'),
|
|
false.
|
|
|
|
%Standard Join Secondo Plan, if repartitioning is needed
|
|
buildStdSecondoPlan(_, _, pr(_, _, _), _, true):-
|
|
!,
|
|
write('The joined relations are not distributed correctly '),
|
|
write('for standard join.'),
|
|
false.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
|
|
---- isOfFirst(Attr, X, Y)
|
|
isOfSecond(Attr, X, Y)
|
|
----
|
|
|
|
~Attr~, which is equal to either ~X~ or ~Y~, is an attribute of the first (second) relation.
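
For illustration, assuming two hypothetical attribute terms whose second
component gives the argument number, the following queries should each have
exactly one solution:

----    ?- isOfFirst(A, attr(no, 1, u), attr(plz, 2, u)).
        A = attr(no, 1, u)
        ?- isOfSecond(B, attr(no, 1, u), attr(plz, 2, u)).
        B = attr(plz, 2, u)
----

Used with an anonymous first argument, as in ~distributedjoin~ above, the pair of
calls merely checks that one attribute belongs to each relation.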
|
|
|
|
*/
|
|
|
|
|
|
isOfFirst(X, X, _) :- X = attr(_, 1, _).
|
|
isOfFirst(Y, _, Y) :- Y = attr(_, 1, _).
|
|
isOfSecond(X, X, _) :- X = attr(_, 2, _).
|
|
isOfSecond(Y, _, Y) :- Y = attr(_, 2, _).
|
|
|
|
isNotOfFirst(Y, X, Y) :- X = attr(_, 1, _).
|
|
isNotOfFirst(X, X, Y) :- Y = attr(_, 1, _).
|
|
isNotOfSecond(Y, X, Y) :- X = attr(_, 2, _).
|
|
isNotOfSecond(X, X, Y) :- Y = attr(_, 2, _).
|
|
|
|
|
|
/*
|
|
6 Creating Query Plan Edges
|
|
|
|
*/
|
|
|
|
% RHG 2014
|
|
|
|
planEdge(Source, Target, Plan, Result) :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
Term => PlanExpr,
|
|
getProperties(PlanExpr, Plan, _).
|
|
|
|
% Version with properties
|
|
|
|
% Selection Edges
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
|
|
edge(Source, Target, select(res(N), Pred), Result, _, _),
|
|
select([N, P], PropertiesIn, PRest),
|
|
select([res(N), P], Pred) => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P2).
|
|
|
|
% Join Edges
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
|
|
edge(Source, Target, join(arg(N), res(M), Pred), Result, _, _),
|
|
select([M, P], PropertiesIn, PRest),
|
|
join(arg(N), [res(M), P], Pred) => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P2).
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
|
|
edge(Source, Target, join(res(M), arg(N), Pred), Result, _, _),
|
|
select([M, P], PropertiesIn, PRest),
|
|
join([res(M), P], arg(N), Pred) => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P2).
|
|
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P3] | PRest], Result) :-
|
|
edge(Source, Target, join(res(N), res(M), Pred), Result, _, _),
|
|
select([N, P], PropertiesIn, PIn2),
|
|
select([M, P2], PIn2, PRest),
|
|
join([res(N), P], [res(M), P2], Pred) => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P3).
|
|
|
|
% Remaining edges without intermediate results
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
|
|
Result) :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
Term = select(arg(_), _),
|
|
Term => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P).
|
|
|
|
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
|
|
Result) :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
Term = join(arg(_), arg(_), _),
|
|
Term => PlanExpr,
|
|
getProperties(PlanExpr, Plan, P).
|
|
|
|
|
|
|
|
|
|
getProperties([Plan, P], Plan, P) :- !.
|
|
|
|
getProperties(Plan, Plan, none).
|
|
|
|
% end RHG 2014
|
|
|
|
|
|
|
|
createPlanEdge :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
Term => Plan,
|
|
assert(planEdge(Source, Target, Plan, Result)),
|
|
fail.
|
|
|
|
createPlanEdges :- not(createPlanEdge).
|
|
|
|
deletePlanEdge :-
|
|
retract(planEdge(_, _, _, _)), fail.
|
|
|
|
deletePlanEdges :- not(deletePlanEdge).
|
|
|
|
writePlanEdge :-
|
|
planEdge(Source, Target, Plan, Result),
|
|
write('Source: '), write(Source), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('Plan: '), wp(Plan), nl,
|
|
% write(Plan), nl,
|
|
write('Result: '), write(Result), nl, nl,
|
|
pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges
|
|
fail.
|
|
|
|
|
|
writePlanEdgesProp :-
|
|
planEdge(Source, Target, _, Plan, Prop, Result),
|
|
write('Source: '), write(Source), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('Plan: '), wp(Plan), nl,
|
|
write(Prop), nl,
|
|
% write(Plan), nl,
|
|
write('Result: '), write(Result), nl, nl,
|
|
pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges
|
|
fail.
|
|
|
|
writePlanEdges :-
|
|
assert(pe(0)),
|
|
not(writePlanEdge),
|
|
not(writePlanEdgesProp),
|
|
pe(N),
|
|
write('The total number of plan edges is '), write(N), write('.'), nl.
|
|
|
|
wpe :- writePlanEdges.
|
|
|
|
|
|
|
|
/*
|
|
7 Assigning Sizes and Selectivities to the Nodes and Edges of the POG
|
|
|
|
---- assignSizes.
|
|
deleteSizes.
|
|
----
|
|
|
|
Assign sizes (numbers of tuples) to all nodes in the pog, based on the
|
|
cardinalities of the argument relations and the selectivities of the
|
|
predicates. Store sizes as facts of the form resultSize(Result, Size). Store
|
|
selectivities as facts of the form edgeSelectivity(Source, Target, Sel).
|
|
|
|
Delete sizes from memory.
|
|
|
|
7.1 Assigning Sizes and Selectivities
|
|
|
|
It is important that edges are processed in the order in which they have been
|
|
created. This ensures that for each edge the sizes of its argument nodes are
|
|
available.
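
A small worked example, with hypothetical cardinalities and selectivities (the
predicate ~demoSizes~ is not part of the optimizer), mirrors the arithmetic of
the two ~assignSize~ rules below:

----    demoSizes(SelSize, JoinSize) :-
          Card1 = 1000, Card2 = 40,
          SelSize is Card1 * 0.25,            % selection with Sel = 0.25
          JoinSize is Card1 * Card2 * 0.125.  % join with Sel = 0.125
----

The query ~demoSizes(S, J)~ binds ~S~ to 250.0 and ~J~ to 5000.0.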
|
|
|
|
*/
|
|
|
|
assignSizes :- not(assignSizes1).
|
|
|
|
assignSizes1 :-
|
|
edge(Source, Target, Term, Result, _, _),
|
|
assignSize(Source, Target, Term, Result),
|
|
fail.
|
|
|
|
%assignSize(Source, Target, select(Arg, Pred), Result) :-
|
|
% Pred = pr(attr(original, *, u), _),
|
|
% !, % predicate used for eliminating one of many spatially overlapping tuples
|
|
% resSize(Arg, Size),
|
|
% setNodeSize(Result, Size),
|
|
% % assume overlap is rather small
|
|
% assert(edgeSelectivity(Source, Target, 1)).
|
|
|
|
assignSize(Source, Target, select(Arg, Pred), Result) :-
|
|
resSize(Arg, Card),
|
|
selectivity(Pred, Sel),
|
|
Size is Card * Sel,
|
|
setNodeSize(Result, Size),
|
|
assert(edgeSelectivity(Source, Target, Sel)).
|
|
|
|
assignSize(Source, Target, join(Arg1, Arg2, Pred), Result) :-
|
|
resSize(Arg1, Card1),
|
|
resSize(Arg2, Card2),
|
|
selectivity(Pred, Sel),
|
|
Size is Card1 * Card2 * Sel,
|
|
setNodeSize(Result, Size),
|
|
assert(edgeSelectivity(Source, Target, Sel)).
|
|
|
|
/*
|
|
---- setNodeSize(Node, Size) :-
|
|
----
|
|
|
|
Set the size of node ~Node~ to ~Size~ if no size has been assigned before.
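
A sketch of the intended behaviour, assuming ~resultSize~ is declared dynamic and
no size has been stored for node 3 yet:

----    ?- setNodeSize(3, 100), setNodeSize(3, 200), resultSize(3, Size).
        Size = 100
----

The second call succeeds without changing the stored fact.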
|
|
|
|
*/
|
|
|
|
setNodeSize(Node, _) :- resultSize(Node, _), !.
|
|
setNodeSize(Node, Size) :- assert(resultSize(Node, Size)).
|
|
|
|
/*
|
|
---- resSize(Arg, Size) :-
|
|
----
|
|
|
|
Argument ~Arg~ has size ~Size~.
|
|
|
|
*/
|
|
|
|
resSize(arg(N), Size) :- argument(N, rel(Rel, _, _)), card(Rel, Size), !.
|
|
resSize(arg(N), _) :- write('Error in optimizer: cannot find cardinality for '),
|
|
argument(N, Rel), wp(Rel), nl, fail.
|
|
resSize(res(N), Size) :- resultSize(N, Size), !.
|
|
|
|
/*
|
|
---- writeSizes :-
|
|
----
|
|
|
|
Write sizes and selectivities.
|
|
|
|
*/
|
|
|
|
writeSize :-
|
|
resultSize(Node, Size),
|
|
write('Node: '), write(Node), nl,
|
|
write('Size: '), write(Size), nl, nl,
|
|
fail.
|
|
writeSize :-
|
|
edgeSelectivity(Source, Target, Sel),
|
|
write('Source: '), write(Source), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('Selectivity: '), write(Sel), nl, nl,
|
|
fail.
|
|
writeSizes :- not(writeSize).
|
|
|
|
/*
|
|
---- deleteSizes :-
|
|
----
|
|
|
|
Delete node sizes and selectivities of edges.
|
|
|
|
*/
|
|
|
|
deleteSize :- retract(resultSize(_, _)), fail.
|
|
deleteSize :- retract(edgeSelectivity(_, _, _)), fail.
|
|
deleteSizes :- not(deleteSize).
|
|
|
|
/*
|
|
8 Computing Edge Costs for Plan Edges
|
|
|
|
8.1 The Costs of Terms
|
|
|
|
---- cost(Term, Sel, Size, Cost) :-
|
|
----
|
|
|
|
The cost of an executable ~Term~ representing a predicate with selectivity ~Sel~
|
|
is ~Cost~ and the size of the result is ~Size~.
|
|
|
|
This is evaluated recursively descending into the term. When the operator
|
|
realizing the predicate (e.g. ~filter~) is encountered, the selectivity ~Sel~ is
|
|
used to determine the size of the result. It is assumed that only a single
|
|
operator of this kind occurs within the term.
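
The recursive descent can be illustrated by a reduced, self-contained sketch.
All names starting with ~demo~ and all constants are hypothetical; the real
constants are kept in the operator cost tables, and the real rules follow below.

----    demoCard(staedte, 1000).
        demoFeedTC(0.5).
        demoFilterTC(2.0).
        demoCost(rel(Rel, _, _), _, Size, 0) :- demoCard(Rel, Size).
        demoCost(feed(X), Sel, S, C) :-
          demoCost(X, Sel, S, C1), demoFeedTC(A),
          C is C1 + A * S.
        demoCost(filter(X, _), Sel, S, C) :-
          demoCost(X, 1, SizeX, CostX), demoFilterTC(A),
          S is SizeX * Sel,        % the selectivity is applied here only
          C is CostX + A * SizeX.
----

The query ~demoCost(filter(feed(rel(staedte, [**], u)), p), 0.25, S, C)~ binds ~S~ to
250.0 and ~C~ to 2500.0: feeding 1000 tuples costs 500.0, filtering them adds
2000.0, and the selectivity reduces the result size to a quarter.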
|
|
|
|
8.1.1 Arguments
|
|
|
|
*/
|
|
|
|
cost(Obj, Sel, Size, Cost) :-
|
|
distributedRels(Rel, Obj, _, DistType, _, _),
|
|
not(DistType = share),
|
|
cost(Rel, Sel, Size, Cost).
|
|
|
|
|
|
cost(rel(Rel, _, _), _, Size, 0) :-
|
|
card(Rel, Size).
|
|
|
|
cost(res(N), _, Size, 0) :-
|
|
resultSize(N, Size).
|
|
|
|
/*
|
|
8.1.2 Operators
|
|
|
|
*/
|
|
|
|
cost(feed(X), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
feedTC(A),
|
|
C is C1 + A * S.
|
|
|
|
/*
|
|
Here ~feedTC~ means ``feed tuple cost'', i.e., the cost per tuple, a constant to
|
|
be determined in experiments. These constants are kept in file ``Operators.pl''.
|
|
|
|
*/
|
|
|
|
cost(consume(X), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
consumeTC(A),
|
|
C is C1 + A * S.
|
|
|
|
cost(filter(X, Pred), _, S, C) :-
|
|
% This is a special case for spatially distributed relations:
|
|
% we cannot determine the selectivity for the predicate because
|
|
% it does not exist as a local relation on the master.
|
|
% We assume very little overlap in the spatial distribution.
|
|
Pred=attr(original, l, u), !,
|
|
cost(X, 1, SizeX, CostX),
|
|
filterTC(A),
|
|
S is SizeX * 0.9,
|
|
C is CostX + A * SizeX.
|
|
|
|
cost(filter(X, _), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
filterTC(A),
|
|
S is SizeX * Sel,
|
|
C is CostX + A * SizeX.
|
|
|
|
|
|
/*
|
|
For the moment we assume a cost of 1 for evaluating a predicate; this should be
|
|
changed shortly.
|
|
|
|
*/
|
|
|
|
cost(product(X, Y), _, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, 1, SizeY, CostY),
|
|
productTC(A, B),
|
|
S is SizeX * SizeY,
|
|
C is CostX + CostY + SizeY * A + S * B.
|
|
|
|
cost(leftrange(_, Rel, _), Sel, Size, Cost) :-
|
|
cost(Rel, 1, RelSize, _),
|
|
leftrangeTC(C),
|
|
Size is Sel * RelSize,
|
|
Cost is Sel * RelSize * C.
|
|
|
|
cost(rightrange(_, Rel, _), Sel, Size, Cost) :-
|
|
cost(Rel, 1, RelSize, _),
|
|
leftrangeTC(C),
|
|
Size is Sel * RelSize,
|
|
Cost is Sel * RelSize * C.
|
|
|
|
/*
|
|
|
|
Simplistic cost estimation for loop joins.
|
|
|
|
If attribute values are assumed independent, then the selectivity
|
|
of a subquery appearing in an index join equals the overall
|
|
join selectivity. Therefore it is possible to estimate
|
|
the result size and cost of a subquery
|
|
(i.e. ~exactmatch~ and ~exactmatchfun~). As a subquery in an
|
|
index join is executed as often as a tuple from the left
|
|
input stream arrives, it is also possible to estimate the
|
|
overall index join cost.
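
The composition can be sketched as follows; ~demoLoopjoinCost~ is a hypothetical
helper that restates the formula of the ~loopjoin~ rule below, with ~A~ standing
for the per-tuple constant:

----    demoLoopjoinCost(CostX, SizeX, CostY, A, Cost) :-
          Cost is CostX            % producing the first argument
                + SizeX * A        % base cost for loopjoin
                + SizeX * CostY.   % one index query per left tuple
----

For example, ~demoLoopjoinCost(500, 1000, 3, 1, C)~ yields ~C~ = 4500.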
|
|
|
|
*/
|
|
|
|
cost(exactmatchfun(_, Rel, _), Sel, Size, Cost) :-
|
|
cost(Rel, 1, RelSize, _),
|
|
exactmatchTC(A, B, C, D),
|
|
Size is Sel * RelSize,
|
|
Cost is A + B * (log10(RelSize) - C) + % query cost
|
|
Sel * RelSize * D. % size of result
|
|
|
|
cost(exactmatch(_, Rel, _), Sel, Size, Cost) :-
|
|
cost(Rel, 1, RelSize, _),
|
|
exactmatchTC(A, B, C, D),
|
|
Size is Sel * RelSize,
|
|
Cost is A + B * (log10(RelSize) - C) + % query cost
|
|
Sel * RelSize * D. % size of result
|
|
|
|
cost(loopjoin(X, Y), Sel, S, Cost) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, Sel, SizeY, CostY),
|
|
S is SizeX * SizeY,
|
|
loopjoinTC(A),
|
|
Cost is CostX + % producing the first argument
|
|
SizeX * A + % base cost for loopjoin
|
|
SizeX * CostY. % sum of query costs
|
|
|
|
cost(fun(_, X), Sel, Size, Cost) :-
|
|
cost(X, Sel, Size, Cost).
|
|
|
|
|
|
cost(hashjoin(X, Y, _, _, 999997), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, 1, SizeY, CostY),
|
|
hashjoinTC(A, B, D),
|
|
S is SizeX * SizeY * Sel,
|
|
C is CostX + CostY + % producing the arguments
|
|
A * SizeY + % A - time [microsecond] per build
|
|
B * SizeX + % B - time per probe
|
|
D * S. % D - time per result tuple
|
|
% table fits in memory assumed
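
/*
The hash join rule above can be read as build, probe and result cost. A minimal
sketch with hypothetical constants (~demoHashjoinCost~ is not part of the
optimizer and leaves out the cost of producing the arguments):

----    demoHashjoinCost(SizeX, SizeY, Sel, tc(A, B, D), S, C) :-
          S is SizeX * SizeY * Sel,
          C is A * SizeY       % build the hash table on the second argument
             + B * SizeX       % probe with the first argument
             + D * S.          % emit the result tuples
----

For example, ~demoHashjoinCost(1000, 40, 0.125, tc(2, 1, 0.5), S, C)~ yields
~S~ = 5000.0 and ~C~ = 3580.0.

*/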
|
|
|
|
cost(sortmergejoin(X, Y, _, _), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, 1, SizeY, CostY),
|
|
sortmergejoinTC(A, B, D),
|
|
S is SizeX * SizeY * Sel,
|
|
C is CostX + CostY + % producing the arguments
|
|
A * (SizeX + SizeY) + % sorting the arguments
|
|
B * (SizeX + SizeY) + % merge step
|
|
D * S. % cost of results
|
|
|
|
cost(mergejoin(X, Y, _, _), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, 1, SizeY, CostY),
|
|
sortmergejoinTC(_, B, D),
|
|
S is SizeX * SizeY * Sel,
|
|
C is CostX + CostY + % producing the arguments
|
|
B * (SizeX + SizeY) + % merge step
|
|
D * S. % cost of results
|
|
|
|
cost(extend(X, _), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
extendTC(A),
|
|
C is C1 + A * S.
|
|
|
|
cost(remove(X, _), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
removeTC(A),
|
|
C is C1 + A * S.
|
|
|
|
cost(project(X, _), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
projectTC(A),
|
|
C is C1 + A * S.
|
|
|
|
cost(rename(X, _), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
renameTC(A),
|
|
C is C1 + A * S.
|
|
|
|
%fapra 2015/16
|
|
|
|
% Taken from standard optimizer.
|
|
cost(itSpatialJoin(X, Y, _, _), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
cost(Y, 1, SizeY, CostY),
|
|
itSpatialJoinTC(A, B),
|
|
S is SizeX * SizeY * Sel,
|
|
C is CostX + CostY +
|
|
A * (SizeX + SizeY) +
|
|
B * S.
|
|
|
|
cost(windowintersects(_, Rel, _), Sel, Size, Cost) :-
|
|
cost(Rel, 1, RelSize, _),
|
|
windowintersectsTC(A),
|
|
Size is Sel * RelSize,
|
|
Cost is Size * A.
|
|
|
|
cost(hashvalue(_,_), _, 1, 0).
|
|
|
|
cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
|
|
distributedRels(LocalMasterRel, Obj, _, _, _),
|
|
substituteSubterm(rel('.', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan),
|
|
cost(LocalInnerPlan, Sel, S, InnerC),
|
|
!,
|
|
C is InnerC * S.
|
|
|
|
cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
|
|
substituteSubterm(rel('.', *, u), Obj, InnerPlan, LocalInnerPlan),
|
|
cost(LocalInnerPlan, Sel, S, InnerC),
|
|
!,
|
|
C is InnerC * S.
|
|
|
|
% if we cannot determine cost of first dmap-argument
|
|
cost(dmap(_, _, X), Sel, S, C) :-
|
|
cost(X, 1, SizeX, CostX),
|
|
dmapTC(A),
|
|
S is SizeX * Sel,
|
|
C is CostX + A * SizeX.
|
|
|
|
cost(dmap2(_, RelObj, _, InnerPlan, _), Sel, S, C) :-
|
|
distributedRels(LocalMasterRel, RelObj, _, _, _),
|
|
substituteSubterm(rel('..', *, u), LocalMasterRel,
|
|
InnerPlan, LocalInnerPlan),
|
|
dmap2TC(A),
|
|
cost(LocalMasterRel, 1, Card, _),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
!,
|
|
S is Sel * Card,
|
|
C is InnerCost + A * S.
|
|
|
|
% we have two d/farray-objects as arguments
|
|
cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
|
|
distributedRels(LocalMasterRel1, RelObj1, _, _, _),
|
|
distributedRels(LocalMasterRel2, RelObj2, _, _, _),
|
|
substituteSubterm(rel('.', *, u), LocalMasterRel1,
|
|
InnerPlan, LocalInnerPlan1),
|
|
substituteSubterm(rel('..', *, u), LocalMasterRel2,
|
|
LocalInnerPlan1, LocalInnerPlan),
|
|
dmap2TC(A),
|
|
cost(LocalMasterRel1, 1, Card1, _),
|
|
cost(LocalMasterRel2, 1, Card2, _),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
!,
|
|
S1 is Sel * Card1,
|
|
S2 is Sel * Card2,
|
|
C is InnerCost + A * S1 + A * S2.
|
|
|
|
% we have two d/farray-values as arguments
|
|
cost(dmap2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
|
|
cost(Arg1, _, _, C1),
|
|
cost(Arg2, _, _, C2),
|
|
substituteSubterm(rel('.', *, u), Arg1,
|
|
InnerPlan, LocalInnerPlan1),
|
|
substituteSubterm(rel('..', *, u), Arg2,
|
|
LocalInnerPlan1, LocalInnerPlan),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
dmap2TC(A),
|
|
!,
|
|
ArgS1 is Sel * C1,
|
|
ArgS2 is Sel * C2,
|
|
C is InnerCost + A * ArgS1 + A * ArgS2.
|
|
|
|
cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
|
|
substituteSubterm(rel('.', *, u), "#!SUBST1!#", RelObj1, RelObj_Mod1),
|
|
substituteSubterm(rel('.', *, u), "#!SUBST2!#", RelObj2, RelObj_Mod2),
|
|
substituteSubterm(rel('.', *, u), RelObj_Mod1, InnerPlan, TempPlan1),
|
|
substituteSubterm(rel('..', *, u), RelObj_Mod2, TempPlan1, TempPlan2),
|
|
substituteSubterm( "#!SUBST1!#", rel('.',*,u),TempPlan2, TempPlan3),
|
|
substituteSubterm( "#!SUBST2!#", rel('.',*,u),TempPlan3, FinallyGoodPlan),
|
|
dmap2TC(A),
|
|
cost(RelObj1, 1, Card1, _),
|
|
cost(RelObj2, 1, Card2, _),
|
|
cost(FinallyGoodPlan, Sel, _, InnerCost),
|
|
!,
|
|
S1 is Sel * Card1,
|
|
S2 is Sel * Card2,
|
|
C is InnerCost + A * S1 + A * S2.
|
|
|
|
% we have two d/fmatrix-values as arguments
|
|
cost(areduce2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
|
|
cost(Arg1, _, _, C1),
|
|
cost(Arg2, _, _, C2),
|
|
substituteSubterm(rel('.', *, u), Arg1,
|
|
InnerPlan, LocalInnerPlan1),
|
|
substituteSubterm(rel('..', *, u), Arg2,
|
|
LocalInnerPlan1, LocalInnerPlan),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
areduce2TC(A),
|
|
!,
|
|
ArgS1 is Sel * C1,
|
|
ArgS2 is Sel * C2,
|
|
C is InnerCost + A * ArgS1 + A * ArgS2.
|
|
|
|
cost(collect2(InnerPlan, _ , _), Sel, S, C) :-
|
|
cost(InnerPlan, Sel, S, InnerCost),
|
|
collect2TC(A),
|
|
C is InnerCost + A * S.
|
|
|
|
cost(partitionF(RelObj, _, InnerPlan, _, _), Sel, S, C) :-
|
|
distributedRels(LocalMasterRel, RelObj, _, _, _),
|
|
substituteSubterm(rel('.', *, u), LocalMasterRel,
|
|
InnerPlan, LocalInnerPlan),
|
|
partitionFTC(A),
|
|
cost(LocalMasterRel, 1, S, _),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
!,
|
|
C is (InnerCost + A) * S.
|
|
|
|
% generic case
|
|
cost(partitionF(RelObj, _, _, _), _, S, C) :-
|
|
cost(RelObj, 1, RS, RC),
|
|
partitionFTC(A),
|
|
S is RS,
|
|
C is RC + S * A.
|
|
|
|
cost(extendstream(Stream, _, cellnumber(bbox(_), _)), _, S, C) :-
|
|
cost(Stream, 1, S, StreamC),
|
|
extendstreamTC(ETC),
|
|
bboxTC(BTC),
|
|
cellnumberTC(CTC),
|
|
TC is ETC + BTC + CTC,
|
|
C is S * TC + StreamC.
|
|
|
|
cost(range(_, Rel, _, _), Sel, S, C) :-
|
|
cost(Rel, 1, Card, _),
|
|
S is Sel * Card,
|
|
leftrangeTC(A),
|
|
C is A * S.
|
|
|
|
cost(dloop2(_, RelObj, _, InnerPlan), Sel, S, C) :-
|
|
distributedRels(LocalMasterRel, RelObj, _, _, _),
|
|
substituteSubterm(rel('..', *, u), LocalMasterRel,
|
|
InnerPlan, LocalInnerPlan),
|
|
dloopTC(A),
|
|
cost(LocalMasterRel, 1, Card, _),
|
|
cost(LocalInnerPlan, Sel, _, InnerCost),
|
|
!,
|
|
S is Sel * Card,
|
|
C is InnerCost + A * S.
|
|
|
|
|
|
/* dummy for dsummarize */
|
|
cost(dsummarize(_), _, _, 0).
|
|
|
|
cost(dsummarize(X), Sel, S, C) :-
|
|
cost(X, Sel, S, C1),
|
|
dsummarizeTC(A),
|
|
C is C1 + A * S.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
8.2 Creating Cost Edges
|
|
|
|
These are plan edges extended by a cost measure.
|
|
|
|
*/
|
|
|
|
% RHG 2014
|
|
|
|
costEdge(Source, Target, Term, Result, Size, Cost) :-
|
|
planEdge(Source, Target, Term, Result),
|
|
edgeSelectivity(Source, Target, Sel),
|
|
cost(Term, Sel, Size, Cost).
|
|
|
|
% Version with properties
|
|
|
|
costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
|
|
Size, Cost) :-
|
|
planEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result),
|
|
edgeSelectivity(Source, Target, Sel),
|
|
cost(Plan, Sel, Size, Cost).
|
|
|
|
|
|
% end RHG 2014
|
|
|
|
createCostEdge :-
|
|
planEdge(Source, Target, Term, Result),
|
|
edgeSelectivity(Source, Target, Sel),
|
|
cost(Term, Sel, Size, Cost),
|
|
assert(costEdge(Source, Target, Term, Result, Size, Cost)),
|
|
fail.
|
|
|
|
createCostEdges :- not(createCostEdge).
|
|
|
|
deleteCostEdge :-
|
|
retract(costEdge(_, _, _, _, _, _)), fail.
|
|
|
|
deleteCostEdges :- not(deleteCostEdge).
|
|
|
|
writeCostEdge :-
|
|
costEdge(Source, Target, Plan, Result, Size, Cost),
|
|
write('Source: '), write(Source), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('Plan: '), wp(Plan), nl,
|
|
write('Result: '), write(Result), nl,
|
|
write('Size: '), write(Size), nl,
|
|
write('Cost: '), write(Cost), nl,
|
|
nl,
|
|
ce(N), retract(ce(_)), N1 is N + 1, assert(ce(N1)), % count edges
|
|
fail.
|
|
|
|
writeCostEdges :-
|
|
assert(ce(0)),
|
|
not(writeCostEdge),
|
|
ce(N),
|
|
write('The total number of cost edges is '), write(N), write('.'), nl.
|
|
|
|
wce :- writeCostEdges.
|
|
|
|
|
|
writeCostEdgeUsed :-
|
|
costEdgeUsed(Source, Version, Target, PropertiesIn, Plan, PropertiesOut,
|
|
Result, Size, Cost),
|
|
write('Source: ('), write(Source), write(', '), write(Version),
|
|
write(')'), nl,
|
|
write('Target: '), write(Target), nl,
|
|
write('PropertiesIn: '), write(PropertiesIn), nl,
|
|
write('Plan: '), wp(Plan), nl,
|
|
write('PropertiesOut: '), write(PropertiesOut), nl,
|
|
write('Result: '), write(Result), nl,
|
|
write('Size: '), write(Size), nl,
|
|
write('Cost: '), write(Cost), nl,
|
|
nl,
|
|
ceu(N), retract(ceu(_)), N1 is N + 1, assert(ceu(N1)), % count edges
|
|
fail.
|
|
|
|
writeCostEdgesUsed :-
|
|
assert(ceu(0)),
|
|
not(writeCostEdgeUsed),
|
|
ceu(N),
|
|
write('The total number of cost edges used is '), write(N), write('.'), nl.
|
|
|
|
wceu :- writeCostEdgesUsed.
|
|
|
|
deleteCostEdgeUsed :-
|
|
retract(costEdgeUsed(_, _, _, _, _, _, _, _, _)), fail.
|
|
|
|
deleteCostEdgesUsed :- not(deleteCostEdgeUsed).
|
|
|
|
|
|
|
|
/*
|
|
---- assignCosts
|
|
----
|
|
|
|
This just puts together creation of sizes and cost edges.
|
|
|
|
*/
|
|
|
|
assignCosts :-
|
|
assignSizes.
|
|
% RHG 2014
|
|
% createCostEdges.
|
|
|
|
|
|
/*
|
|
9 Finding Shortest Paths = Cheapest Plans
|
|
|
|
The cheapest plan corresponds to the shortest path through the predicate order
|
|
graph.
|
|
|
|
9.1 Shortest Path Algorithm by Dijkstra
|
|
|
|
We implement the shortest path algorithm by Dijkstra. There are two
|
|
relevant sets of nodes:
|
|
|
|
* center: the nodes for which shortest paths have already been
|
|
computed
|
|
|
|
* boundary: the nodes that have been seen, but that have not yet been
|
|
expanded. These need to be kept in a priority queue.
|
|
|
|
A node, as used during shortest path computation, is represented as a term
|
|
|
|
---- node(n(Name, Version), Distance, [Path, Properties])
|
|
----
|
|
|
|
where ~Name~ is the node number, ~Version~ a version number of this node, ~Distance~ the distance along the shortest path to this node, ~Path~ is the list of edges forming the shortest path, and ~Properties~ the physical properties (such as order) for the result obtained at this node version.
|
|
|
|
The graph is represented by the set of ~costEdges~.
|
|
|
|
The center is represented as a set of facts of the form
|
|
|
|
---- center(n(Name, Version), node(n(Name, Version), Distance, [Path, Properties]))
|
|
----
|
|
|
|
Since predicates are generally indexed by their first argument, finding a node
|
|
in the center via the node number should be very efficient. We assume it is
|
|
possible in constant time.
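
As an illustration with hypothetical values (~demoCenter~ stands in for ~center~
so that the real center is not touched), a node version is stored and looked up
by its key as follows:

----    ?- assert(demoCenter(n(3, 1), node(n(3, 1), 2500.0, [[], []]))),
           demoCenter(n(3, 1), node(_, Distance, _)).
        Distance = 2500.0
----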
|
|
|
|
The boundary is represented by an abstract data type as described in the
|
|
interface below. Essentially it is a priority queue implementation.
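
The following is a simplified sketch of the same idea, not the implementation
used here: it keeps the boundary as a list sorted by distance instead of a
priority queue, passes the center as a list argument instead of asserting facts,
and ignores node versions and properties. The toy graph ~demoEdge~ and all
predicate names are hypothetical.

----    demoEdge(0, 1, 4).
        demoEdge(0, 2, 1).
        demoEdge(2, 1, 2).
        demoEdge(1, 3, 5).
        % demoDijkstra(+Source, +Dest, -Path, -Length)
        demoDijkstra(Source, Dest, Path, Length) :-
          demoLoop([0-node(Source, [])], [], Dest, RevPath, Length),
          reverse(RevPath, Path).
        % the closest boundary node is the destination: done
        demoLoop([D-node(Dest, RevPath) | _], _, Dest, RevPath, D) :- !.
        % the closest boundary node is already in the center: skip it
        demoLoop([_-node(N, _) | Rest], Center, Dest, P, L) :-
          memberchk(N, Center), !,
          demoLoop(Rest, Center, Dest, P, L).
        % otherwise expand it and insert its successors into the boundary
        demoLoop([D-node(N, RP) | Rest], Center, Dest, P, L) :-
          findall(D2-node(T, [edge(N, T, C) | RP]),
                  ( demoEdge(N, T, C), D2 is D + C ),
                  Succs),
          append(Rest, Succs, B1),
          keysort(B1, B2),
          demoLoop(B2, [N | Center], Dest, P, L).
----

On the toy graph, ~demoDijkstra(0, 3, Path, Length)~ finds the path via nodes 2
and 1 with ~Length~ = 8 rather than the direct edge from 0 to 1, which would give
a total of 9. Re-sorting the list on every step is acceptable only for a sketch;
the priority queue used below avoids it.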
|
|
|
|
|
|
---- successor(Node, Succ) :-
|
|
----
|
|
|
|
~Succ~ is a successor of node ~Node~ via some edge. This includes computation
|
|
of the distance and path of the successor.
|
|
|
|
*/
|
|
|
|
% RHG 2014
|
|
|
|
% successor(node(Source,Distance, Path), node(Target, Distance2, Path2)) :-
|
|
% costEdge(Source, Target, Term, Result, Size, Cost),
|
|
% assert(costEdgeUsed(Source, Target, Term, Result, Size, Cost)),
|
|
% Distance2 is Distance + Cost,
|
|
% append(Path, [costEdge(Source, Target, Term, Result, Size, Cost)], Path2).
|
|
|
|
% Version with properties
|
|
|
|
successor(node(n(Source, Version), Distance, [Path, PropertiesIn]),
|
|
simplenode(Target, Distance2, [Path2, PropertiesOut])) :-
|
|
costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
|
|
Size, Cost),
|
|
assert(costEdgeUsed(Source, Version, Target, PropertiesIn, Plan,
|
|
PropertiesOut, Result, Size, Cost)),
|
|
Distance2 is Distance + Cost,
|
|
append(Path, [costEdge(Source, Target, Plan, Result, Size, Cost)], Path2).
|
|
|
|
% end RHG 2014
|
|
|
|
|
|
/*
|
|
|
|
---- dijkstra(Source, Dest, Path, Length) :-
|
|
----
|
|
|
|
The shortest path from ~Source~ to ~Dest~ is ~Path~ of length ~Length~.
|
|
|
|
*/
|
|
|
|
dijkstra(Source, Dest, Path, Length) :-
|
|
emptyCenter,
|
|
b_empty(Boundary),
|
|
deleteCostEdgesUsed, % RHG
|
|
b_insert(Boundary, node(n(Source, 1), 0, [[], []]), Boundary1),
|
|
dijkstra1(Boundary1, n(Dest, 1), 0, notfound),
|
|
center(n(Dest, _), node(n(Dest, _), Length, [Path, _])).
|
|
|
|
emptyCenter :- not(emptyCenter1).
|
|
|
|
emptyCenter1 :- retract(center(_, _)), fail.
|
|
|
|
|
|
/*
|
|
---- dijkstra1(Boundary, Dest, NoOfCalls, Found) :-
|
|
----
|
|
|
|
Compute the shortest paths to all nodes and store them in a predicate
|
|
~center~. Initially to be called with no fact ~center~ asserted, and ~Boundary~
|
|
just containing the start node.
|
|
|
|
For testing we check at which iteration the destination ~Dest~ is reached.
|
|
|
|
*/
|
|
|
|
dijkstra1(Boundary, _, _, found) :- !,
|
|
tree_height(Boundary, H),
|
|
write('Height of search tree for boundary is '), write(H), nl.
|
|
|
|
dijkstra1(Boundary, _, _, _) :- b_isEmpty(Boundary).
|
|
|
|
dijkstra1(Boundary, Dest, N, _) :-
|
|
% nl, nl,
|
|
% write('dijkstra1 called.'), nl,
|
|
% write('Boundary = '), write(Boundary), nl, write('====='), nl,
|
|
b_removemin(Boundary, Node, Bound2),
|
|
Node = node(Name, _, _),
|
|
% write('Node = '), write(Name), nl,
|
|
assert(center(Name, Node)),
|
|
% write('Center = '), writeCenter, nl, write('====='), nl,
|
|
checkDest(Name, Dest, N, Found),
|
|
putsuccessors(Bound2, Node, Bound3),
|
|
% write('putsuccessors succeeded.'), nl,
|
|
N1 is N+1,
|
|
dijkstra1(Bound3, Dest, N1, Found).
|
|
|
|
checkDest(n(Name, _), n(Name, _), N, found) :- write('Destination node '),
|
|
write(Name), write(' reached at iteration '), write(N), nl.
|
|
|
|
checkDest(_, _, _, notfound).
|
|
|
|
|
|
/*
|
|
Some auxiliary predicates for testing:
|
|
|
|
*/
|
|
|
|
writeList([]).
|
|
writeList([X | Rest]) :- nl, nl, write('-----'), nl, write(X), writeList(Rest).
|
|
|
|
writeCenter :- not(writeCenter1).
|
|
writeCenter1 :-
|
|
center(_, node(Name, Distance, Path)),
|
|
write('Node: '), write(Name), nl,
|
|
write('Cost: '), write(Distance), nl,
|
|
write('Path: '), nl, write(Path), nl, fail.
|
|
|
|
writePath([]).
|
|
writePath([costEdge(Source, Target, Term, Result, Size, Cost) | Path]) :-
|
|
write(costEdge(Source, Target, Result, Size, Cost)), nl,
|
|
write(' '), wp(Term), nl,
|
|
writePath(Path).
|
|
|
|
/*
|
|
---- putsuccessors(Boundary, Node, BoundaryNew) :-
|
|
----
|
|
|
|
Insert into ~Boundary~ all successors of node ~Node~ not yet present in
|
|
the center, updating their distance if they are already present, to obtain
|
|
~BoundaryNew~.
|
|
|
|
*/
|
|
putsuccessors(Boundary, Node, BoundaryNew) :-
|
|
findall(Succ, successor(Node, Succ), Successors),
|
|
|
|
% write('successors of '), write(Node), nl,
|
|
% writeList(Successors), nl, nl,
|
|
|
|
putsucc1(Boundary, Successors, BoundaryNew).
|
|
|
|
% write('the new boundary is: '), write(BoundaryNew),
|
|
% nl, write('====='), nl.
|
|
|
|
/*
|
|
---- putsucc1(Boundary, Successors, BoundaryNew) :-
|
|
----
|
|
|
|
Put all successors not yet in the center from the list ~Successors~ into the
|
|
~Boundary~ to get ~BoundaryNew~. The cases to be distinguished are:
|
|
|
|
* The list of successors is empty.
|
|
|
|
* The first successor simplenode(N, \_, \_) is already in the center, hence the shortest path to it is already known and it does not need to be inserted into the boundary.
|
|
|
|
* The first successor X = simplenode(N, \_, \_) exists in the boundary. That means, there exists a non-empty set V(N) with versions of N in the boundary. We say, X dominates Y iff the distance of X is less than or equal to that of Y and the properties of X include those of Y.
|
|
|
|
* If X is not dominated by any Y in V(N), then insert X into the boundary.
|
|
|
|
* If X dominates any Y in V(N), then remove Y from the boundary.
|
|
|
|
* The first successor does not exist in the boundary. It is inserted.
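
The dominance test described in the cases above can be sketched in isolation.
This is a simplified illustration, not the code used below: the term
~cand(Distance, Properties)~ with a flat property list and the predicates
~dominates~, ~keep\_new~ and ~survivors~ are assumptions made for this example,
whereas the optimizer works on ~node~ terms with nested property lists.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical candidate representation: cand(Distance, Properties),
% where Properties is a flat list of atoms.

% dominates(+X, +Y): X is at least as cheap as Y and offers all of Y's
% properties.
dominates(cand(DX, PropsX), cand(DY, PropsY)) :-
    DX =< DY,
    subset(PropsY, PropsX).

% keep_new(+X, +Versions): X is worth inserting if no existing version
% dominates it.
keep_new(X, Versions) :-
    \+ ( member(Y, Versions), dominates(Y, X) ).

% survivors(+X, +Versions, -Kept): drop the versions dominated by X.
% Meant to be applied only after keep_new(X, Versions) has succeeded.
survivors(X, Versions, Kept) :-
    exclude(dominates(X), Versions, Kept).
```

For instance, a cheaper candidate that additionally carries a property replaces
a more expensive version without it, while an even cheaper existing version
stays in the boundary.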
|
|
|
|
*/
|
|
|
|
putsucc1(Boundary, [], Boundary).
|
|
|
|
putsucc1(Boundary, [simplenode(N, _, _) | Successors], BNew) :-
|
|
center(n(N, 1), _), !,
|
|
putsucc1(Boundary, Successors, BNew).
|
|
|
|
putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
|
|
findall(Node, b_memberByName(Boundary, n(N, _), Node), Nodes),
|
|
insertIfNotDominated(Boundary, simplenode(N, D, P), Nodes, 1, Boundary2),
|
|
removeThoseDominated(Boundary2, simplenode(N, D, P), Nodes, Boundary3),
|
|
putsucc1(Boundary3, Successors, BNew).
|
|
|
|
% putsucc1(Boundary, [simplenode(N, D, [_, Properties]) | Successors],
|
|
% BNew) :-
|
|
% b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, [_, Properties])),
|
|
% DistOld =< D, !,
|
|
% putsucc1(Boundary, Successors, BNew).
|
|
|
|
% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
|
|
% b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, _)),
|
|
% D < DistOld, !,
|
|
% b_deleteByName(Boundary, n(N, 1), Bound2),
|
|
% b_insert(Bound2, node(n(N, 1), D, P), Bound3),
|
|
% putsucc1(Bound3, Successors, BNew).
|
|
|
|
% the following not needed
|
|
|
|
% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
|
|
% nl,
|
|
% write('putsucc1 called with final case'), nl,
|
|
% write(simplenode(N, D, P)), nl,
|
|
% b_insert(Boundary, node(n(N, 1), D, P), Bound2),
|
|
% putsucc1(Bound2, Successors, BNew).
|
|
|
|
|
|
insertIfNotDominated(Boundary, simplenode(N, D, P), [], Version, BoundaryOut) :-
|
|
b_insert(Boundary, node(n(N, Version), D, P), BoundaryOut).
|
|
% nl, write('***** inserted '), write(node(n(N, Version), D, P)), nl.
|
|
|
|
insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]),
|
|
[node(n(N, V), DistOld, [_, PropOld]) | Nodes], Version, BoundaryOut) :-
|
|
( D < DistOld ; otherProperties(Prop, PropOld) ), % not dominated
|
|
( V > Version
|
|
-> Version2 is V + 1
|
|
; Version2 is Version + 1
|
|
),
|
|
insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
|
|
Version2, BoundaryOut).
|
|
|
|
|
|
insertIfNotDominated(Boundary, simplenode(N, D, [_, Prop]),
|
|
[node(n(N, _), DistOld, [_, PropOld]) | _], _, Boundary) :-
|
|
% nl, write('***** NOT inserted '), write(simplenode(N, D, [Path, Prop])), nl,
|
|
D >= DistOld,
|
|
included(Prop, PropOld). % is dominated and can be ignored.
|
|
|
|
|
|
|
|
|
|
removeThoseDominated(Boundary, simplenode(_, _, [_, _]), [], Boundary).
|
|
|
|
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
|
|
[node(n(N, _), DistOld, [_, PropOld]) | Nodes], Boundary2) :-
|
|
( DistOld =< D ; otherProperties(PropOld, Prop) ), !, % not dominated
|
|
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
|
|
Boundary2).
|
|
|
|
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
|
|
[node(n(N, V), _, [_, _]) | Nodes], Boundary3) :-
|
|
b_deleteByName(Boundary, n(N, V), Boundary2),
|
|
% nl, write('***** deleted '), write(n(N, V)), nl,
|
|
removeThoseDominated(Boundary2, simplenode(N, D, [Path, Prop]), Nodes,
|
|
Boundary3).
|
|
|
|
|
|
|
|
:-dynamic noProperties/0.
|
|
|
|
included(_, _) :- noProperties, !.
|
|
|
|
included([[Node, List1] | Props1], Props2) :-
|
|
select([Node, List2], Props2, Props2Rest),
|
|
included2(List1, List2),
|
|
included(Props1, Props2Rest).
|
|
|
|
included([], _).
|
|
|
|
|
|
included2([], _).
|
|
|
|
included2([P1 | Props1], Props2) :-
|
|
select(P1, Props2, Props2Rest),
|
|
included2(Props1, Props2Rest).
|
|
|
|
included2([none], _).
|
|
|
|
|
|
otherProperties(Props1, Props2) :-
|
|
not(included(Props1, Props2)).
/*
|
|
|
|
9.2 Interface ~Boundary~
|
|
|
|
The boundary is represented in a data structure with the following
|
|
operations:
|
|
|
|
---- b_empty(-Boundary) :-
|
|
----
|
|
|
|
Creates an empty boundary and returns it.
|
|
|
|
---- b_isEmpty(+Boundary) :-
|
|
----
|
|
|
|
Checks whether the boundary is empty.
|
|
|
|
|
|
---- b_removemin(+Boundary, -Node, -BoundaryOut) :-
|
|
----
|
|
|
|
Returns the node ~Node~ with minimal distance in ~Boundary~, together with
~BoundaryOut~, the boundary without this node.
|
|
|
|
---- b_insert(+Boundary, +Node, -BoundaryOut) :-
|
|
----
|
|
|
|
Inserts a node that must not yet be present (i.e., no other node of that
|
|
name).
|
|
|
|
---- b_memberByName(+Boundary, +Name, -Node) :-
|
|
----
|
|
|
|
If a node ~Node~ with name ~Name~ is present, it is returned.
|
|
|
|
---- b_deleteByName(+Boundary, +Name, -BoundaryOut) :-
|
|
----
|
|
|
|
Returns the boundary, where the node with name ~Name~ is deleted.
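
To make the contract of these operations concrete, here is a minimal sketch
that implements them on a plain list. The ~lb\_~ prefix and the list
representation are assumptions chosen for this illustration only; the actual
boundary is a search tree (its height is reported by ~dijkstra1~), which makes
~removemin~ and lookups cheaper than in this linear version.

```prolog
:- use_module(library(lists)).

% Hypothetical list-based boundary; entries are node(Name, Distance, Info).

lb_empty([]).

lb_isEmpty([]).

% Insert without checking for duplicates (the caller guarantees absence).
lb_insert(Boundary, Node, [Node | Boundary]).

% Remove the entry with minimal distance; fails on an empty boundary.
lb_removemin([Node], Node, []).
lb_removemin([Node | Nodes], Min, Rest) :-
    Nodes = [_ | _],
    lb_removemin(Nodes, Min0, Rest0),
    Node = node(_, D, _),
    Min0 = node(_, D0, _),
    (   D =< D0
    ->  Min = Node, Rest = Nodes
    ;   Min = Min0, Rest = [Node | Rest0]
    ).

% Enumerate entries whose name unifies with Name.
lb_memberByName(Boundary, Name, node(Name, D, Info)) :-
    member(node(Name, D, Info), Boundary).

% Delete the first entry with the given name; fails if there is none.
lb_deleteByName(Boundary, Name, BoundaryOut) :-
    selectchk(node(Name, _, _), Boundary, BoundaryOut).
```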
|
|
|
|
*/
|
|
|
|
/*
|
|
9.3 Constructing the Plan from the Shortest Path
|
|
|
|
---- plan(Path, Plan)
|
|
----
|
|
|
|
The plan corresponding to ~Path~ is ~Plan~.
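
The central step is replacing each placeholder ~res(N)~ by the plan already
built for node ~N~. The sketch below shows this substitution on its own; the
predicate ~embed~ and the list of ~N-Plan~ pairs are assumptions standing in
for ~embedSubPlans~ and the asserted facts ~nodePlan~ used by the code that
follows.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% embed(+Plans, +Term, -Term2)
% Plans is a list of N-Plan pairs (a stand-in for nodePlan facts).
% Every subterm res(N) with a known plan is replaced by that plan.
embed(Plans, res(N), Plan) :-
    memberchk(N-Plan, Plans), !.
embed(Plans, Term, Term2) :-
    compound(Term), !,
    Term =.. [Functor | Args],
    maplist(embed(Plans), Args, Args2),
    Term2 =.. [Functor | Args2].
embed(_, Term, Term).
```

Because the edges of the path are processed in order, the plan for a node is
available before any later edge refers to it through ~res~.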
|
|
|
|
*/
|
|
|
|
%fapra 15/16
|
|
|
|
plan(Path, Plan) :-
|
|
isDistributedQuery,
|
|
!,
|
|
deleteNodePlans,
|
|
mergePlanEdges(Path, MergedPath),
|
|
traversePath(MergedPath),
|
|
highNode(N),
|
|
nodePlan(N, Plan).
|
|
|
|
%end fapra 15/16
|
|
|
|
plan(Path, Plan) :-
|
|
deleteNodePlans,
|
|
traversePath(Path),
|
|
highNode(N),
|
|
nodePlan(N, Plan).
|
|
|
|
|
|
deleteNodePlans :- not(deleteNodePlan).
|
|
|
|
deleteNodePlan :- retract(nodePlan(_, _)), fail.
|
|
|
|
traversePath([]).
|
|
|
|
traversePath([costEdge(_, _, Term, Result, _, _) | Path]) :-
|
|
embedSubPlans(Term, Term2),
|
|
assert(nodePlan(Result, Term2)),
|
|
traversePath(Path).
|
|
|
|
embedSubPlans(res(N), Term) :-
|
|
nodePlan(N, Term), !.
|
|
|
|
embedSubPlans(Term, Term2) :-
|
|
compound(Term), !,
|
|
Term =.. [Functor | Args],
|
|
embedded(Args, Args2),
|
|
Term2 =.. [Functor | Args2].
|
|
|
|
embedSubPlans(Term, Term).
|
|
|
|
|
|
embedded([], []).
|
|
|
|
embedded([Arg | Args], [Arg2 | Args2]) :-
|
|
embedSubPlans(Arg, Arg2),
|
|
embedded(Args, Args2).
|
|
|
|
%fapra 15/16
|
|
|
|
/*
|
|
|
|
---- mergePlanEdges(PlanEdgeList, MergedEdgesList)
|
|
----
|
|
|
|
Merge successive ~dmap~ edges: a filter applied to the result of a preceding
distributed filter is folded into the ~dmap~ of that filter. Example:
|
|
dmap(... filter(.,bla1)) dmap filter(., bla2)
|
|
...becomes: dmap(... filter(filter(., bla1), bla2))
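
Reduced to two plans, the rewriting can be sketched as a single fact. The
predicate ~merge\_dmaps~ is hypothetical and deliberately simplified: unlike
the rule below, it does not check that the second filter reads exactly the
result of the first one, and it ignores the cost bookkeeping.

```prolog
% merge_dmaps(+Plan1, +Plan2, -Merged)
% Plan1 filters the distributed argument Arg; Plan2 filters the result of
% Plan1. The merged plan applies both predicates within one dmap and keeps
% the result name of the second plan.
merge_dmaps(dmap(Arg, _, filter(In, Pred1)),
            dmap(res(_), Name, filter(_, Pred2)),
            dmap(Arg, Name, filter(filter(In, Pred1), Pred2))).
```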
|
|
|
|
*/
|
|
|
|
mergePlanEdges([], []).
|
|
mergePlanEdges([X], [X]).
|
|
|
|
/*
|
|
Merge rule for two successive dmaps with filters as their parameters; this
should be the most common case.
|
|
|
|
*/
|
|
|
|
mergePlanEdges([Edge1, Edge2|Edges], MergedEdges) :-
|
|
Edge1 = costEdge(Source, _, Plan1, Res1, _, C1),
|
|
Edge2 = costEdge(_, Target, Plan2, Res2, S2, C2),
|
|
Plan1 = dmap(Arg, _, filter(FilterArg, Pred1)),
|
|
successiveFilterOnParam(FilterArg, ArgTerm),
|
|
Plan2 = dmap(res(Res1), ResName, filter(ArgTerm, Pred2)),
|
|
MergedPlan = dmap(Arg, ResName,
|
|
filter(filter(FilterArg, Pred1), Pred2)),
|
|
% the plan is already chosen at this point, so costs will have no influence
|
|
MergedCosts is C1 + C2,
|
|
MergedHead = costEdge(Source, Target, MergedPlan, Res2, S2, MergedCosts),
|
|
mergePlanEdges([MergedHead|Edges], MergedEdges).
|
|
|
|
% The first two edges cannot be merged according to the above rule.
|
|
mergePlanEdges([X|Tail], [X|MergedTail]) :-
|
|
mergePlanEdges(Tail, MergedTail).
|
|
|
|
% Term is a dot or a nested filtration on a dot.
|
|
successiveFilterOnParam(Term, ArgTerm) :-
|
|
functor(Term, filter, 2),
|
|
arg(1, Term, FirstArg),
|
|
successiveFilterOnParam(FirstArg, ArgTerm).
|
|
|
|
successiveFilterOnParam(Term, Term) :-
|
|
Term = feed(rel('.', _, _)).
|
|
|
|
successiveFilterOnParam(Term, Term) :-
|
|
Term = rename(feed(rel('.', _, _)), _).
|
|
|
|
%end fapra 15/16
|
|
|
|
% highestNode(Path, N) :-
|
|
% reverse(Path, Path2),
|
|
% Path2 = [costEdge(_, N, _, _, _, _) | _].
|
|
|
|
|
|
/*
|
|
9.4 Computing the Best Plan for a Given Predicate Order Graph
|
|
|
|
*/
|
|
|
|
bestPlan :-
|
|
assignCosts,
|
|
highNode(N),
|
|
dijkstra(0, N, Path, Cost),
|
|
plan(Path, Plan),
|
|
write('The best plan is:'), nl, nl,
|
|
wp(Plan),
|
|
nl, nl,
|
|
write('The cost is: '), write(Cost), nl.
|
|
|
|
bestPlan(Plan, Cost) :-
|
|
assignCosts,
|
|
highNode(N),
|
|
dijkstra(0, N, Path, Cost),
|
|
plan(Path, Plan).
|
|
|
|
/*
|
|
10 A Larger Example
|
|
|
|
It is now time to test efficiency with a larger example. We consider the query:
|
|
|
|
---- select *
|
|
from Staedte, plz as p1, plz as p2, plz as p3
|
|
where SName = p1.Ort
|
|
and p1.PLZ = p2.PLZ + 1
|
|
and p2.PLZ = p3.PLZ * 5
|
|
and Bev > 300000
|
|
and Bev < 500000
|
|
and p2.PLZ > 50000
|
|
and p2.PLZ < 60000
|
|
and Kennzeichen starts "W"
|
|
and p3.Ort contains "burg"
|
|
and p3.Ort starts "M"
|
|
----
|
|
|
|
This translates to:
|
|
|
|
*/
|
|
|
|
example6 :- pog(
|
|
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
|
|
[
|
|
pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
|
|
rel(staedte, *, u), rel(plz, p1, l)),
|
|
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1),
|
|
rel(plz, p1, l), rel(plz, p2, l)),
|
|
pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5),
|
|
rel(plz, p2, l), rel(plz, p3, l)),
|
|
pr(attr(bev, 1, u) > 300000, rel(staedte, *, u)),
|
|
pr(attr(bev, 1, u) < 500000, rel(staedte, *, u)),
|
|
pr(attr(p2:pLZ, 1, u) > 50000, rel(plz, p2, l)),
|
|
pr(attr(p2:pLZ, 1, u) < 60000, rel(plz, p2, l)),
|
|
pr(attr(kennzeichen, 1, u) starts "W", rel(staedte, *, u)),
|
|
pr(attr(p3:ort, 1, u) contains "burg", rel(plz, p3, l)),
|
|
pr(attr(p3:ort, 1, u) starts "M", rel(plz, p3, l))
|
|
],
|
|
_, _).
|
|
|
|
/*
|
|
Initially this did not work; now it does. Let's first keep the numbers a bit
smaller and avoid too many big joins.
|
|
|
|
*/
|
|
example7 :- pog(
|
|
[rel(staedte, *, u), rel(plz, p1, l)],
|
|
[
|
|
pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
|
|
rel(staedte, *, u), rel(plz, p1, l)),
|
|
pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)),
|
|
pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)),
|
|
pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)),
|
|
pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)),
|
|
pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)),
|
|
pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)),
|
|
pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l))
|
|
],
|
|
_, _).
|
|
|
|
example8 :- pog(
|
|
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l)],
|
|
[
|
|
pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
|
|
rel(plz, p1, l)),
|
|
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
|
|
rel(plz, p2, l)),
|
|
pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)),
|
|
pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)),
|
|
pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)),
|
|
pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)),
|
|
pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)),
|
|
pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)),
|
|
pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l))
|
|
],
|
|
_, _).
|
|
|
|
/*
|
|
Let's study a small example again with two independent conditions.
|
|
|
|
*/
|
|
|
|
example9 :- pog([rel(staedte, s, u), rel(plz, p, l)],
|
|
[pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
|
|
rel(staedte, s, u), rel(plz, p, l) ),
|
|
pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l)),
|
|
pr(attr(s:bev, 0, u) > 300000, rel(staedte, s, u))], _, _).
|
|
|
|
example10 :- pog(
|
|
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
|
|
[
|
|
pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
|
|
rel(plz, p1, l)),
|
|
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
|
|
rel(plz, p2, l)),
|
|
pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), rel(plz, p2, l),
|
|
rel(plz, p3, l))
|
|
],
|
|
_, _).
|
|
|
|
/*
|
|
11 A User Level Language
|
|
|
|
We have started to construct the optimizer by building the predicate order
|
|
graph, using a notation for relations and predicates as useful for that
|
|
purpose. Later, in [Section Translation], we have adapted the notation to be
|
|
able to translate and construct query plans as needed in Secondo. In this
|
|
section we will introduce a more user friendly notation for queries, pretty
|
|
similar to SQL, but suitable for being written directly in PROLOG.
|
|
|
|
11.1 The Language
|
|
|
|
The basic select-from-where statement will be written as
|
|
|
|
---- select <attr-list>
|
|
from <rel-list>
|
|
where <pred-list>
|
|
----
|
|
|
|
The first example query from [Section 4.1.1] can then be written as:
|
|
|
|
---- select [sname, bev]
|
|
from [staedte]
|
|
where [bev > 500000]
|
|
----
|
|
|
|
Instead of lists consisting of a single element we will also support writing
|
|
just the element, hence the query can also be written:
|
|
|
|
---- select [sname, bev]
|
|
from staedte
|
|
where bev > 500000
|
|
----
|
|
|
|
The second query can be written as:
|
|
|
|
---- select *
|
|
from [staedte as s, plz as p]
|
|
where [sname = p:ort, p:plz > 40000]
|
|
----
|
|
|
|
Note that all relation names and attribute names are written just in lower
|
|
case; the system will look up the spelling in a table.
|
|
|
|
Furthermore, it will be possible to add a groupby- and an orderby-clause:
|
|
|
|
* groupby
|
|
|
|
---- select <aggr-list>
|
|
from <rel-list>
|
|
where <pred-list>
|
|
groupby <group-attr-list>
|
|
----
|
|
|
|
Example:
|
|
|
|
----
|
|
select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
|
|
from plz
|
|
where plz > 40000
|
|
groupby ort
|
|
----
|
|
|
|
* orderby
|
|
|
|
---- select <attr-list>
|
|
from <rel-list>
|
|
where <pred-list>
|
|
orderby <order-attr-list>
|
|
----
|
|
|
|
Example:
|
|
|
|
---- select [ort, plz]
|
|
from plz
|
|
orderby [ort asc, plz desc]
|
|
----
|
|
|
|
This example also shows that the where-clause may be omitted. It is also
|
|
possible to combine grouping and ordering:
|
|
|
|
----
|
|
select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
|
|
from plz
|
|
where plz > 40000
|
|
groupby ort
|
|
orderby cntplz desc
|
|
----
|
|
|
|
Currently only a basic part of this language has been implemented.
|
|
|
|
|
|
11.2 Structure
|
|
|
|
We introduce ~select~, ~from~, ~where~, and ~as~ as PROLOG operators:
|
|
|
|
*/
|
|
|
|
:- op(990, fx, sql).
|
|
:- op(985, xfx, >>).
|
|
:- op(950, fx, select).
|
|
:- op(960, xfx, from).
|
|
:- op(950, xfx, where).
|
|
:- op(930, xfx, as).
|
|
:- op(970, xfx, groupby).
|
|
:- op(980, xfx, orderby).
|
|
:- op(930, xf, asc).
|
|
:- op(930, xf, desc).
|
|
|
|
/*
|
|
This ensures that the select-from-where statement is viewed as a term with the
|
|
structure:
|
|
|
|
---- from(select(AttrList), where(RelList, PredList))
|
|
----
|
|
|
|
That this works can be tested with:
|
|
|
|
---- P = (select s:sname from staedte as s where s:bev > 500000),
|
|
P = (X from Y), X = (select AttrList), Y = (RelList where PredList),
|
|
RelList = (Rel as Var).
|
|
----
|
|
|
|
The result is:
|
|
|
|
---- P = select s:sname from staedte as s where s:bev>500000
|
|
X = select s:sname
|
|
Y = staedte as s where s:bev>500000
|
|
AttrList = s:sname
|
|
RelList = staedte as s
|
|
PredList = s:bev>500000
|
|
Rel = staedte
|
|
Var = s
|
|
----
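
The same decomposition can be packaged as a small stand-alone program. This is
a sketch for experimentation: it repeats only the four operator declarations
needed here, and the predicate ~decompose~ is an assumption introduced for the
example.

```prolog
% Operator declarations copied from above (only those needed here).
:- op(950, fx, select).
:- op(960, xfx, from).
:- op(950, xfx, where).
:- op(930, xfx, as).

% decompose(+Query, -Attrs, -Rels, -Preds)
% Splits a select-from-where term into its three clauses by unification.
decompose((select Attrs from Rels where Preds), Attrs, Rels, Preds).
```

Since ~from~ has the highest priority of the four operators, it becomes the
principal functor, so a single head unification is enough to take the query
apart.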
|
|
|
|
11.3 Schema Lookup
|
|
|
|
The second task is to look up attribute names in order to build the input
|
|
notation for the construction of the predicate order graph.
|
|
|
|
11.3.1 Tables
|
|
|
|
In the file ~database~ we maintain the following tables.
|
|
|
|
Relation schemas are written as:
|
|
|
|
---- relation(staedte, [sname, bev, plz, vorwahl, kennzeichen]).
|
|
relation(plz, [plz, ort]).
|
|
----
|
|
|
|
The spelling of relation or attribute names is given in a table:
|
|
|
|
---- spelling(staedte:plz, pLZ).
|
|
spelling(staedte:sname, sName).
|
|
spelling(plz, lc(plz)).
|
|
spelling(plz:plz, pLZ).
|
|
----
|
|
|
|
The default assumption is that the first letter of a name is upper case and all
|
|
others are lower case. If this is true, then no entry in the table ~spelling~
|
|
is needed. If a name starts with a lower case letter, then this is expressed by
|
|
the functor ~lc~.
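
How an entry of the spelling table leads to the name finally written can be
sketched as follows. The predicates ~demo\_spelling~, ~demo\_spelled~ and
~external\_name~ are hypothetical and only mirror the convention described
here; that the case flag ~u~ means "write the first letter in upper case" is
taken from the examples in this document (~sName~ with flag ~u~ stands for
~SName~).

```prolog
% Hypothetical excerpt of a spelling table.
demo_spelling(staedte:sname, sName).
demo_spelling(plz, lc(plz)).

% demo_spelled(+Key, -Name, -Case): internal name and case flag for Key.
demo_spelled(Key, Name, l) :- demo_spelling(Key, lc(Name)), !.
demo_spelled(Key, Name, u) :- demo_spelling(Key, Name), !.

% external_name(+Name, +Case, -External): the name as written in Secondo.
external_name(Name, l, Name).
external_name(Name, u, External) :-
    sub_atom(Name, 0, 1, _, First),      % first character
    sub_atom(Name, 1, _, 0, Rest),       % remaining characters
    upcase_atom(First, Upper),
    atom_concat(Upper, Rest, External).
```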
|
|
|
|
11.3.2 Looking up Relation and Attribute Names
|
|
|
|
*/
|
|
|
|
callLookup(Query, Query2) :-
|
|
newQuery,
|
|
lookup(Query, Query2), !.
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
added clearIsDistributedQuery
|
|
|
|
*/
|
|
|
|
newQuery :- not(clearVariables), not(clearQueryRelations),
|
|
not(clearQueryAttributes), not(clearIsDistributedQuery),
|
|
not(clearIsLocalQuery).
|
|
|
|
clearVariables :- retract(variable(_, _)), fail.
|
|
|
|
clearQueryRelations :- retract(queryRel(_, _)), fail.
|
|
|
|
clearQueryAttributes :- retract(queryAttr(_)), fail.
|
|
|
|
clearIsDistributedQuery :- retract(isDistributedQuery), fail.
|
|
|
|
clearIsLocalQuery :- retract(isLocalQuery), fail.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
|
|
---- lookup(Query, Query2) :-
|
|
----
|
|
|
|
~Query2~ is a modified version of ~Query~ where all relation names and
|
|
attribute names have the form as required in [Section Translation].
|
|
|
|
*/
|
|
|
|
lookup(select Attrs from Rels where Preds,
|
|
select Attrs2 from Rels2List where Preds2List) :-
|
|
lookupRels(Rels, Rels2),
|
|
checkDistributedQuery,
|
|
lookupAttrs(Attrs, Attrs2),
|
|
lookupPreds(Preds, Preds2),
|
|
makeList(Rels2, Rels2List),
|
|
makeList(Preds2, Preds2List).
|
|
|
|
lookup(select Attrs from Rels,
|
|
select Attrs2 from Rels2) :-
|
|
lookupRels(Rels, Rels2),
|
|
checkDistributedQuery,
|
|
lookupAttrs(Attrs, Attrs2).
|
|
|
|
lookup(Query orderby Attrs, Query2 orderby Attrs3) :-
|
|
lookup(Query, Query2),
|
|
makeList(Attrs, Attrs2),
|
|
lookupAttrs(Attrs2, Attrs3).
|
|
|
|
lookup(Query groupby Attrs, Query2 groupby Attrs3) :-
|
|
lookup(Query, Query2),
|
|
makeList(Attrs, Attrs2),
|
|
lookupAttrs(Attrs2, Attrs3).
|
|
|
|
|
|
makeList(L, L) :- is_list(L).
|
|
|
|
makeList(L, [L]) :- not(is_list(L)).
|
|
|
|
/*
|
|
|
|
11.3.3 Modification of the From-Clause
|
|
|
|
---- lookupRels(Rels, Rels2)
|
|
----
|
|
|
|
Modify the list of relation names. If there are relations without variables,
|
|
store them in a table ~queryRel~. Any two such relations must have distinct
|
|
sets of attribute names. Also, any two variables must be distinct.
|
|
|
|
*/
|
|
|
|
lookupRels([], []).
|
|
|
|
lookupRels([R | Rs], [R2 | R2s]) :-
|
|
lookupRel(R, R2),
|
|
lookupRels(Rs, R2s).
|
|
|
|
lookupRels(Rel, Rel2) :-
|
|
not(is_list(Rel)),
|
|
lookupRel(Rel, Rel2).
|
|
|
|
/*
|
|
---- lookupRel(Rel, Rel2) :-
|
|
----
|
|
|
|
Translate and store a single relation definition.
|
|
|
|
*/
|
|
|
|
:- dynamic
|
|
variable/2,
|
|
queryRel/2,
|
|
queryAttr/1.
|
|
|
|
lookupRel(Rel as Var, rel(Rel2, Var, Case)) :-
|
|
removeDistributedSuffix(Rel,DRel),
|
|
relation(DRel, _), !,
|
|
spelled(DRel, Rel2, Case),
|
|
not(defined(Var)),
|
|
assert(variable(Var, rel(Rel2, Var, Case))).
|
|
|
|
lookupRel(Rel, rel(Rel2, *, Case)) :-
|
|
removeDistributedSuffix(Rel,DRel),
|
|
relation(DRel, _), !,
|
|
spelled(DRel, Rel2, Case),
|
|
not(duplicateAttrs(Rel)),
|
|
assert(queryRel(DRel, rel(Rel2, *, Case))).
|
|
|
|
lookupRel(Term, Term) :-
|
|
write('Error in query: relation '), write(Term), write(' not known'),
|
|
nl, fail.
|
|
|
|
defined(Var) :-
|
|
variable(Var, _),
|
|
write('Error in query: doubly defined variable '), write(Var), write('.'), nl.
|
|
|
|
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
Check whether the relations of the query are all distributed or all local.
Currently the optimizer can only handle queries whose relations are either
all local or all distributed; queries mixing both kinds of relations are
rejected.
|
|
|
|
*/
|
|
|
|
%handle not distributed queries
|
|
checkDistributedQuery :-
|
|
not(isDistributedQuery),
|
|
isLocalQuery,
|
|
!.
|
|
|
|
checkDistributedQuery :-
|
|
isDistributedQuery,
|
|
not(isLocalQuery),
|
|
!.
|
|
|
|
checkDistributedQuery :-
  write('Error in query: not all relations distributed.'),
  nl, fail.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
---- duplicateAttrs(Rel) :-
|
|
----
|
|
|
|
There is a relation stored in ~queryRel~ that has attribute names also
|
|
occurring in ~Rel~.
|
|
|
|
*/
|
|
|
|
duplicateAttrs(Rel) :-
|
|
queryRel(Rel2, _),
|
|
relation(Rel2, Attrs2),
|
|
member(Attr, Attrs2),
|
|
relation(Rel, Attrs),
|
|
member(Attr, Attrs),
|
|
write('Error in query: duplicate attribute names in relations '),
|
|
write(Rel2), write(' and '), write(Rel), write('.'), nl.
|
|
|
|
/*
|
|
11.3.4 Modification of the Select-Clause
|
|
|
|
*/
|
|
|
|
lookupAttrs([], []).
|
|
|
|
lookupAttrs([A | As], [A2 | A2s]) :-
|
|
lookupAttr(A, A2),
|
|
lookupAttrs(As, A2s).
|
|
|
|
lookupAttrs(Attr, Attr2) :-
|
|
not(is_list(Attr)),
|
|
lookupAttr(Attr, Attr2).
|
|
|
|
lookupAttr(Var:Attr, attr(Var:Attr2, 0, Case)) :- !,
|
|
variable(Var, Rel2),
|
|
Rel2 = rel(Rel, _, _),
|
|
spelled(Rel:Attr, attr(Attr2, _, Case)).
|
|
|
|
lookupAttr(Attr asc, Attr2 asc) :- !,
|
|
lookupAttr(Attr, Attr2).
|
|
|
|
lookupAttr(Attr desc, Attr2 desc) :- !,
|
|
lookupAttr(Attr, Attr2).
|
|
|
|
lookupAttr(Attr, Attr2) :-
|
|
isAttribute(Attr, Rel), !,
|
|
spelled(Rel:Attr, Attr2).
|
|
|
|
lookupAttr(*, *) :- !.
|
|
|
|
lookupAttr(count(*), count(*)) :- !.
|
|
|
|
lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
|
|
lookupAttr(Expr, Expr2),
|
|
not(queryAttr(attr(Name, 0, u))),
|
|
!,
|
|
assert(queryAttr(attr(Name, 0, u))).
|
|
|
|
lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
|
|
lookupAttr(Expr, Expr2),
|
|
queryAttr(attr(Name, 0, u)),
|
|
!,
|
|
write('***** Error: attribute name '), write(Name),
|
|
write(' doubly defined in query.'),
|
|
nl.
|
|
|
|
lookupAttr(Term, Term2) :-
|
|
compound(Term),
|
|
functor(Term, Op, 1),
|
|
arg(1, Term, Arg1),
|
|
lookupAttr(Arg1, Res1),
|
|
functor(Term2, Op, 1),
|
|
arg(1, Term2, Res1).
|
|
|
|
lookupAttr(Name, attr(Name, 0, u)) :-
|
|
queryAttr(attr(Name, 0, u)),
|
|
!.
|
|
|
|
lookupAttr(Name, Name) :-
|
|
write('Error in attribute list: could not recognize '), write(Name), nl, fail.
|
|
|
|
isAttribute(Name, Rel) :-
|
|
queryRel(Rel, _),
|
|
relation(Rel, List),
|
|
member(Name, List).
|
|
|
|
|
|
/*
|
|
11.3.5 Modification of the Where-Clause
|
|
|
|
*/
|
|
|
|
lookupPreds([], []).
|
|
|
|
lookupPreds([P | Ps], [P2 | P2s]) :- !,
|
|
lookupPred(P, P2),
|
|
lookupPreds(Ps, P2s).
|
|
|
|
lookupPreds(Pred, Pred2) :-
|
|
not(is_list(Pred)),
|
|
lookupPred(Pred, Pred2).
|
|
|
|
|
|
lookupPred(Pred, pr(Pred2, Rel)) :-
|
|
lookupPred1(Pred, Pred2, 0, [], 1, [Rel]), !.
|
|
|
|
lookupPred(Pred, pr(Pred2, Rel1, Rel2)) :-
|
|
lookupPred1(Pred, Pred2, 0, [], 2, [Rel1, Rel2]), !.
|
|
|
|
lookupPred(Pred, _) :-
|
|
lookupPred1(Pred, _, 0, [], 0, []),
|
|
  write('Error in query: constant predicate is not allowed.'), nl, fail.
|
|
|
|
lookupPred(Pred, _) :-
|
|
lookupPred1(Pred, _, 0, [], N, _),
|
|
N > 2,
|
|
write('Error in query: predicate involving more than two relations '),
|
|
write('is not allowed.'), nl, fail.
|
|
|
|
/*
|
|
---- lookupPred1(+Pred, Pred2, +N, +RelsBefore, -M, -RelsAfter) :-
|
|
----
|
|
|
|
~Pred2~ is the transformed version of ~Pred~; before this is called, ~N~
|
|
attributes in list ~RelsBefore~ have been found; after the transformation in
|
|
total ~M~ attributes referring to the relations in list ~RelsAfter~ have been
|
|
found.
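
The threading of the counter and the relation list through a term can be
sketched independently of the catalog. In this simplified illustration the
schema is passed as a list of ~Attr-Rel~ pairs, and the predicates ~collect~,
~collect\_args~ and ~classify~ are assumptions for the example. Decomposing
with ~=..~ handles operators of any arity, which is one possible way to avoid
a separate clause per arity.

```prolog
:- use_module(library(lists)).

% collect(+Term, +Schema, +N, +RelsBefore, -M, -RelsAfter)
% Schema is a hypothetical list of Attr-Rel pairs.
collect(Term, Schema, N, Rels, N1, Rels1) :-
    atom(Term),
    memberchk(Term-Rel, Schema), !,      % Term is an attribute of Rel
    N1 is N + 1,
    append(Rels, [Rel], Rels1).
collect(Term, Schema, N, Rels, M, RelsOut) :-
    compound(Term), !,
    Term =.. [_ | Args],
    collect_args(Args, Schema, N, Rels, M, RelsOut).
collect(_, _, N, Rels, N, Rels).         % constants contribute nothing

collect_args([], _, N, Rels, N, Rels).
collect_args([Arg | Args], Schema, N, Rels, M, RelsOut) :-
    collect(Arg, Schema, N, Rels, N1, Rels1),
    collect_args(Args, Schema, N1, Rels1, M, RelsOut).

% classify(+Pred, +Schema, -Kind): one attribute found means a selection,
% two mean a join.
classify(Pred, Schema, selection(Rel)) :-
    collect(Pred, Schema, 0, [], 1, [Rel]), !.
classify(Pred, Schema, join(Rel1, Rel2)) :-
    collect(Pred, Schema, 0, [], 2, [Rel1, Rel2]).
```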
|
|
|
|
*/
|
|
|
|
lookupPred1(Var:Attr, attr(Var:Attr2, N1, Case), N, RelsBefore, N1, RelsAfter)
|
|
:-
|
|
variable(Var, Rel2), !, Rel2 = rel(Rel, _, _),
|
|
spelled(Rel:Attr, attr(Attr2, _, Case)),
|
|
N1 is N + 1,
|
|
append(RelsBefore, [Rel2], RelsAfter).
|
|
|
|
lookupPred1(Attr, attr(Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :-
|
|
isAttribute(Attr, Rel), !,
|
|
spelled(Rel:Attr, attr(Attr2, _, Case)),
|
|
queryRel(Rel, Rel2),
|
|
N1 is N + 1,
|
|
append(RelsBefore, [Rel2], RelsAfter).
|
|
|
|
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
|
|
compound(Term),
|
|
functor(Term, F, 1), !,
|
|
arg(1, Term, Arg1),
|
|
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M, RelsAfter),
|
|
functor(Term2, F, 1),
|
|
arg(1, Term2, Arg1Out).
|
|
|
|
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
|
|
compound(Term),
|
|
functor(Term, F, 2), !,
|
|
arg(1, Term, Arg1),
|
|
arg(2, Term, Arg2),
|
|
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
|
|
lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M, RelsAfter),
|
|
functor(Term2, F, 2),
|
|
arg(1, Term2, Arg1Out),
|
|
arg(2, Term2, Arg2Out).
|
|
|
|
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
|
|
compound(Term),
|
|
functor(Term, F, 3), !,
|
|
arg(1, Term, Arg1),
|
|
arg(2, Term, Arg2),
|
|
arg(3, Term, Arg3),
|
|
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
|
|
lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M2, RelsAfter2),
|
|
lookupPred1(Arg3, Arg3Out, M2, RelsAfter2, M, RelsAfter),
|
|
functor(Term2, F, 3),
|
|
arg(1, Term2, Arg1Out),
|
|
arg(2, Term2, Arg2Out),
|
|
arg(3, Term2, Arg3Out).
|
|
|
|
% may need to be extended to operators with more than three arguments.
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
Look up generic, non-relation objects.

If ~Term~ is a Secondo object, mark it with the functor
~obj(Term, Type, Case)~, where ~Term~ is the identifier starting with
a lower case character and ~Type~ is the kind of object. ~Case~ indicates
whether the first letter of the object name is written as a capital letter
or not (u, l).
|
|
|
|
*/
|
|
|
|
lookupPred1(Term, ObjTerm, N, Rels, N, Rels) :-
|
|
atom(Term),
|
|
not(is_list(Term)),
|
|
spelledObj(Term,Obj,Type,Case),
|
|
ObjTerm = obj(Obj,Type,Case),
|
|
!.
|
|
|
|
lookupPred1(Term, Term, N, Rels, N, Rels) :-
|
|
atom(Term),
|
|
not(is_list(Term)),
|
|
write('Symbol '), write(Term),
|
|
write(' not recognized, supposed to be a Secondo object.'), nl, !.
|
|
|
|
lookupPred1(Term, Term, N, Rels, N, Rels).
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
11.3.6 Check the Spelling of Relation and Attribute Names
|
|
|
|
*/
|
|
|
|
spelled(Rel:Attr, attr(Attr2, 0, l)) :-
|
|
downcase_atom(Rel, DCRel),
|
|
downcase_atom(Attr, DCAttr),
|
|
spelling(DCRel:DCAttr, Attr3),
|
|
Attr3 = lc(Attr2),
|
|
!.
|
|
|
|
spelled(Rel:Attr, attr(Attr2, 0, u)) :-
|
|
downcase_atom(Rel, DCRel),
|
|
downcase_atom(Attr, DCAttr),
|
|
spelling(DCRel:DCAttr, Attr2),
|
|
!.
|
|
|
|
spelled(_:_, attr(_, 0, _)) :- !, fail. % no attr entry in spelling table
|
|
|
|
spelled(Rel, Rel2, l) :-
|
|
downcase_atom(Rel, DCRel),
|
|
spelling(DCRel, Rel3),
|
|
Rel3 = lc(Rel2),
|
|
!.
|
|
|
|
spelled(Rel, Rel2, u) :-
|
|
downcase_atom(Rel, DCRel),
|
|
spelling(DCRel, Rel2), !.
|
|
|
|
% if we do not get a spelling hint,
|
|
% assume it was spelled correctly
|
|
|
|
spelled(Rel, Rel, u) :-
|
|
atom_chars(Rel, [FirstChar|_]),
|
|
char_type(FirstChar, upper),
|
|
write('spelling of '),
|
|
write(Rel),
|
|
write(' could not be determined. Assume it is spelled uppercase'), !.
|
|
|
|
spelled(Rel, Rel, l) :-
|
|
atom_chars(Rel, [FirstChar|_]),
|
|
char_type(FirstChar, lower),
|
|
write('spelling of '),
|
|
write(Rel),
|
|
  write(' could not be determined. Assume it is spelled lowercase'), !.
|
|
|
|
spelled(_, _, _) :- !, fail. % no rel entry in spelling table.
|
|
|
|
%fapra 2015/16
|
|
|
|
/*
|
|
11.3.7 Check the spelling of non-relation objects
|
|
|
|
*/
|
|
|
|
spelledObj(Term, Obj, Type, l) :-
|
|
downcase_atom(Term, DcObj),
|
|
objectCatalog(DcObj, LcObj, Type),
|
|
LcObj = lc(Obj),
|
|
!.
|
|
|
|
spelledObj(Term, Obj, Type, u) :-
|
|
downcase_atom(Term, DcObj),
|
|
objectCatalog(DcObj, Obj, Type),
|
|
!.
|
|
|
|
spelledObj(_, _, _, _) :- !, fail. % no entry, avoid backtracking.
|
|
|
|
%end fapra 2015/16
|
|
|
|
/*
|
|
11.3.8 Examples
|
|
|
|
We can now formulate several of the previous queries at the user level.
|
|
|
|
*/
|
|
|
|
example11 :- showTranslate(select [sname, bev] from staedte where bev > 500000).
|
|
|
|
showTranslate(Query) :-
|
|
callLookup(Query, Query2),
|
|
write(Query), nl,
|
|
write(Query2), nl.
|
|
|
|
example12 :- showTranslate(
|
|
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
|
|
).
|
|
|
|
example13 :- showTranslate(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2, plz as p3]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
p2:plz = p3:plz * 5,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p2:plz > 50000,
|
|
p2:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p3:ort contains "burg",
|
|
p3:ort starts "M"]
|
|
).
|
|
|
|
/*
|
|
11.4 Translating a Query to a Plan
|
|
|
|
---- translate(Query, Stream, SelectClause, Cost) :-
|
|
----
|
|
|
|
~Query~ is translated into a ~Stream~ to which the translation of the
~SelectClause~ still needs to be applied. The ~Cost~ returned currently covers
only the essential part, the conjunctive query.
|
|
|
|
*/
|
|
|
|
translate(Query orderby Attrs, sortby(Stream, AttrNames), Select, 0) :-
|
|
!,
|
|
translate(Query, Stream, Select, _),
|
|
attrnamesSort(Attrs, AttrNames).
|
|
|
|
translate(Query groupby Attrs,
|
|
groupby(sortby(Stream, AttrNamesSort), AttrNamesGroup, Fields),
|
|
select Select2, Cost) :-
|
|
translate(Query, Stream, SelectClause, Cost),
|
|
makeList(Attrs, Attrs2),
|
|
attrnames(Attrs2, AttrNamesGroup),
|
|
attrnamesSort(Attrs2, AttrNamesSort),
|
|
SelectClause = (select Select),
|
|
makeList(Select, SelAttrs),
|
|
translateFields(SelAttrs, Attrs2, Fields, Select2),
|
|
!.
|
|
|
|
translate(Select from Rels where Preds, Stream, Select, Cost) :-
|
|
pog(Rels, Preds, _, _),
|
|
bestPlan(Stream, Cost),
|
|
!.
|
|
|
|
%fapra 2015/16
|
|
|
|
translate(Select from Rel, feed(Rel), Select, 0) :-
|
|
not(isDistributedQuery),
|
|
not(is_list(Rel)),
|
|
!.
|
|
|
|
translate(Select from Rel, ObjName,Select, 0) :-
|
|
isDistributedQuery,
|
|
distributedRels(Rel, ObjName, _, _, _),
|
|
not(is_list(Rel)),
|
|
!.
|
|
|
|
translate(Select from Rel, dist(Rel,ObjName),Select, 0) :-
|
|
isDistributedQuery,
|
|
distributedRels(Rel, ObjName, _, _, _),
|
|
not(is_list(Rel)),
|
|
!.
|
|
|
|
translate(Select from [Rel], feed(Rel), Select, 0).
|
|
|
|
translate(Select from [Rel | Rels], product(feed(Rel), Stream), Select, 0) :-
|
|
not(isDistributedQuery),
|
|
translate(Select from Rels, Stream, Select, _).
|
|
|
|
%end fapra 2015/16
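
/*
Usage sketch for the clauses handling queries without a ~where~ clause. This is
an illustrative addition, not part of the original optimizer: the predicate name
~sketchTranslateNoWhere~ is hypothetical, and the two relation terms are
placeholders for whatever the lookup phase produces. Assuming that
~isDistributedQuery~ fails, the goal

----    sketchTranslateNoWhere(Stream, Cost)
----

should bind ~Cost~ to 0 and ~Stream~ to a product of two feeds:

----    Stream = product(feed(rel(staedte, *, u)), feed(rel(plz, *, u)))
----

*/

sketchTranslateNoWhere(Stream, Cost) :-
  translate(select * from [rel(staedte, *, u), rel(plz, *, u)],
    Stream, _, Cost).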
|
|
|
|
/*
|
|
---- translateFields(Select, GroupAttrs, Fields, Select2) :-
|
|
----
|
|
|
|
Translate the ~Select~ clause of a query containing ~groupby~. Grouping
|
|
was done by the attributes ~GroupAttrs~. Return a list ~Fields~ of terms
|
|
of the form ~field(Name, Expr)~; such a list can be used as an argument to the
|
|
groupby operator. Also, return a modified select clause ~Select2~,
|
|
which will translate to a corresponding projection operation.
|
|
|
|
*/
|
|
|
|
translateFields([], _, [], []).
|
|
|
|
translateFields([count(*) as NewAttr | Select], GroupAttrs,
|
|
[field(NewAttr , count(feed(group))) | Fields], [NewAttr | Select2]) :-
|
|
translateFields(Select, GroupAttrs, Fields, Select2),
|
|
!.
|
|
|
|
translateFields([sum(Attr) as NewAttr | Select], GroupAttrs,
|
|
[field(NewAttr, sum(feed(group), attrname(Attr))) | Fields],
|
|
[NewAttr| Select2]) :-
|
|
translateFields(Select, GroupAttrs, Fields, Select2),
|
|
!.
|
|
|
|
translateFields([Attr | Select], GroupAttrs, Fields, [Attr | Select2]) :-
|
|
member(Attr, GroupAttrs),
|
|
!,
|
|
translateFields(Select, GroupAttrs, Fields, Select2).
|
|
|
|
|
|
/*
|
|
Generic rule for the aggregate functions ~min~, ~max~, and ~avg~, analogous to
the rule for ~sum~.
|
|
|
|
*/
|
|
|
|
translateFields([Term as NewAttr | Select], GroupAttrs,
|
|
[field(NewAttr, Term2) | Fields],
|
|
[NewAttr| Select2]) :-
|
|
compound(Term),
|
|
functor(Term, AggrOp, 1),
|
|
arg(1, Term, Attr),
|
|
member(AggrOp, [min, max, avg]),
|
|
functor(Term2, AggrOp, 2),
|
|
arg(1, Term2, feed(group)),
|
|
arg(2, Term2, attrname(Attr)),
|
|
translateFields(Select, GroupAttrs, Fields, Select2),
|
|
!.
|
|
|
|
translateFields([Term | Select], GroupAttrs,
|
|
Fields,
|
|
Select2) :-
|
|
compound(Term),
|
|
functor(Term, AggrOp, 1),
|
|
arg(1, Term, Attr),
|
|
member(AggrOp, [count, sum, min, max, avg]),
|
|
functor(Term2, AggrOp, 2),
|
|
arg(1, Term2, feed(group)),
|
|
arg(2, Term2, attrname(Attr)),
|
|
translateFields(Select, GroupAttrs, Fields, Select2),
|
|
write('*****'), nl,
|
|
write('***** Error in groupby: missing name for new attribute'), nl,
|
|
write('*****'), nl,
|
|
!.
|
|
|
|
|
|
translateFields([Attr | Select], GroupAttrs, Fields, Select2) :-
|
|
not(member(Attr, GroupAttrs)),
|
|
!,
|
|
translateFields(Select, GroupAttrs, Fields, Select2),
|
|
write('*****'), nl,
|
|
write('***** Error in groupby: '),
|
|
write(Attr),
|
|
write(' is neither a grouping attribute'), nl,
|
|
write(' nor an aggregate expression.'), nl,
|
|
write('*****'), nl.
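
/*
Usage sketch for ~translateFields~. This is an illustrative addition: the
predicate name ~sketchTranslateFields~ and the attribute names are assumptions,
and plain atoms stand in for the attribute terms produced by the lookup phase.
Grouping by ~ort~ and selecting the grouping attribute together with two named
aggregates, the goal

----    sketchTranslateFields(Fields, Select2)
----

should yield

----    Fields  = [field(cnt, count(feed(group))),
                   field(total, sum(feed(group), attrname(plz)))]
        Select2 = [ort, cnt, total]
----

The grouping attribute contributes no field; it only appears in ~Select2~.

*/

sketchTranslateFields(Fields, Select2) :-
  translateFields([ort, count(*) as cnt, sum(plz) as total], [ort],
    Fields, Select2).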
|
|
|
|
%fapra 15/16
|
|
|
|
% Extract parts from a query
|
|
destructureQuery(Select from Rel where Pred, Select, Rel, Pred).
|
|
|
|
% Pred is a predicate stating that the value of an attribute equals a given value
|
|
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
|
|
Pred = pr(Value = Attr, Rel),
|
|
Attr = attr(_, _, _).
|
|
|
|
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
|
|
Pred = pr(Attr = Value, Rel),
|
|
Attr = attr(_, _, _).
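
/*
Usage sketch for ~attrValueEqualityPredicate~. This is an illustrative addition:
the predicate name ~sketchAttrValueEquality~ is hypothetical and the ~attr~ and
~rel~ terms are placeholders. The goal is expected to succeed, because the two
clauses above accept the attribute on either side of the equality and extract
the same value and attribute in both cases.

----    sketchAttrValueEquality
----

*/

sketchAttrValueEquality :-
  Rel = rel(staedte, *, u),
  Attr = attr(bev, 1, u),
  % attribute on the left-hand side
  attrValueEqualityPredicate(pr(Attr = 500000, Rel), V1, A1, _),
  % attribute on the right-hand side
  attrValueEqualityPredicate(pr(500000 = Attr, Rel), V2, A2, _),
  V1 == V2,
  A1 == A2.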
|
|
|
|
/*
|
|
|
|
---- substituteSubterm(Substituted, Substitute, OriginalTerm, TermWithSubstitution)
|
|
----
|
|
|
|
Replacing every occurrence of ~Substituted~ in ~OriginalTerm~ by ~Substitute~
yields ~TermWithSubstitution~. Every clause ends with a cut to remove
unnecessary choice points during the search for plan edges, which is driven by
meta predicates.
|
|
|
|
*/
|
|
|
|
% The whole term is to be substituted:
|
|
substituteSubterm(Substituted, Substitute, Substituted, Substitute):- !.
|
|
|
|
% The whole term doesn't match and it's not compound:
|
|
substituteSubterm(Substituted, _, OriginalTerm, OriginalTerm) :-
|
|
functor(OriginalTerm, _, 0),
|
|
OriginalTerm \= Substituted, !.
|
|
|
|
% The whole term doesn't match and it's compound - dive into its subterms:
|
|
substituteSubterm(Substituted, Substitute, OriginalTerm,
|
|
TermWithSubstitution) :-
|
|
functor(OriginalTerm, Functor, Arity),
|
|
functor(TermWithSubstitution, Functor, Arity),
|
|
substituteSubtermInNthSubterm(Arity, Substituted,
|
|
Substitute, OriginalTerm, TermWithSubstitution), !.
|
|
|
|
% Terminal case. All subterms have been processed.
|
|
substituteSubtermInNthSubterm(0, _, _, _, _):- !.
|
|
|
|
% Generic case. Process nth subterm.
|
|
substituteSubtermInNthSubterm(N, Substituted, Substitute,
|
|
OriginalTerm, TermWithSubstitution) :-
|
|
not(N = 0),
|
|
arg(N, OriginalTerm, OriginalNthTerm),
|
|
substituteSubterm(Substituted, Substitute,
|
|
OriginalNthTerm, NthTermWithSubstitution),
|
|
arg(N, TermWithSubstitution, NthTermWithSubstitution),
|
|
Next is N - 1,
|
|
substituteSubtermInNthSubterm(Next, Substituted,
|
|
Substitute, OriginalTerm, TermWithSubstitution), !.
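
/*
Usage sketch for ~substituteSubterm~. This is an illustrative addition; the
predicate name ~sketchSubstitute~ and the sample term are assumptions. Every
occurrence of ~a~, at any nesting depth, is replaced by ~b~, so the goal

----    sketchSubstitute(Result)
----

should bind ~Result~ as follows:

----    Result = f(b, g(b, c))
----

*/

sketchSubstitute(Result) :-
  substituteSubterm(a, b, f(a, g(a, c)), Result).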
|
|
|
|
|
|
/*
|
|
|
|
---- queryToPlan(Query, Plan, Cost) :-
|
|
----
|
|
|
|
Translate the ~Query~ into a ~Plan~. The ~Cost~ for evaluating the conjunctive
|
|
query is also returned. The ~Query~ must be such that relation and attribute
|
|
names have been looked up already.
|
|
|
|
fapra 15/16:
|
|
Each non-distributed clause has a counterpart that handles the distributed case.
These counterparts are guarded by an ~isDistributedQuery~ goal.
|
|
end fapra 15/16
|
|
|
|
*/
|
|
|
|
queryToPlan(Query, consume(dsummarize(Stream)), Cost) :-
|
|
selectClause(Query, *),
|
|
isDistributedQuery,
|
|
!,
|
|
translate(Query, Stream, select *, Cost).
|
|
|
|
queryToPlan(Query, consume(Stream), Cost) :-
|
|
selectClause(Query, *),
|
|
!,
|
|
translate(Query, Stream, select *, Cost).
|
|
|
|
queryToPlan(Query, count(dsummarize(Stream)), Cost) :-
|
|
selectClause(Query, count(*)),
|
|
isDistributedQuery,
|
|
!,
|
|
translate(Query, Stream, select count(*), Cost).
|
|
|
|
queryToPlan(Query, count(Stream), Cost) :-
|
|
selectClause(Query, count(*)),
|
|
!,
|
|
translate(Query, Stream, select count(*), Cost).
|
|
|
|
%TF: changed to execute projection in dmap operator
|
|
queryToPlan(Query, consume(dsummarize(dmap(Stream," ",
|
|
project(Plan,AttrNames)))), Cost) :-
|
|
isDistributedQuery,
|
|
!,
|
|
translate(Query, dist(rel(_,Var,_),Stream), select Attrs, Cost), !,
|
|
feedRenameRelation(rel(dot,Var,_),Plan),
|
|
makeList(Attrs, Attrs2),
|
|
attrnames(Attrs2, AttrNames).
|
|
|
|
queryToPlan(Query, consume(project(Stream, AttrNames)), Cost) :-
|
|
translate(Query, Stream, select Attrs, Cost), !,
|
|
makeList(Attrs, Attrs2),
|
|
attrnames(Attrs2, AttrNames).
|
|
|
|
%end fapra 15/16
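
/*
The non-distributed clauses of ~queryToPlan~ differ only in how they wrap the
stream returned by ~translate~. The following sketch restates that mapping in
isolation. It is an illustrative addition: the predicate name ~sketchPlanShape~
is hypothetical, the distributed clauses are ignored, and the attribute list is
assumed to be a proper list already. The goal

----    sketchPlanShape([sname, bev], feed(r), Plan)
----

should bind ~Plan~ to

----    consume(project(feed(r), [attrname(sname), attrname(bev)]))
----

*/

sketchPlanShape(*, Stream, consume(Stream)) :- !.
sketchPlanShape(count(*), Stream, count(Stream)) :- !.
sketchPlanShape(Attrs, Stream, consume(project(Stream, AttrNames))) :-
  is_list(Attrs),
  attrnames(Attrs, AttrNames).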
|
|
|
|
/*
|
|
|
|
---- queryToStream(Query, Plan, Cost) :-
|
|
----
|
|
|
|
Same as ~queryToPlan~, but returns a stream plan, if possible. To be used for
|
|
``mixed queries'' that add Secondo operators to the plan built by the optimizer.
|
|
|
|
*/
|
|
|
|
queryToStream(Query, Stream, Cost) :-
|
|
selectClause(Query, *),
|
|
translate(Query, Stream, select *, Cost), !.
|
|
|
|
queryToStream(Query, count(Stream), Cost) :-
|
|
selectClause(Query, count(*)),
|
|
translate(Query, Stream, select count(*), Cost), !.
|
|
|
|
queryToStream(Query, project(Stream, AttrNames), Cost) :-
|
|
translate(Query, Stream, select Attrs, Cost), !,
|
|
makeList(Attrs, Attrs2),
|
|
attrnames(Attrs2, AttrNames).
|
|
|
|
|
|
|
|
/*
|
|
---- selectClause(Query, C) :-
|
|
----
|
|
|
|
The select-clause of the ~Query~ is ~C~.
|
|
|
|
*/
|
|
% allows select [count(*)] to succeed. Activate later on in development.
|
|
%selectClause(select [X] from Y, Z) :-
|
|
% selectClause(select X from Y, Z).
|
|
|
|
selectClause(select * from _, *) :- !.
|
|
|
|
selectClause(select count(*) from _, count(*)) :- !.
|
|
|
|
selectClause(select Attrs from _, Attrs) :- !.
|
|
|
|
selectClause(Query groupby _, C) :- !,
|
|
selectClause(Query, C).
|
|
|
|
selectClause(Query orderby _, C) :- !,
|
|
selectClause(Query, C).
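
/*
Usage sketch for ~selectClause~. This is an illustrative addition; the predicate
name ~sketchSelectClause~ is hypothetical and the query is modelled on the
examples in this file. The ~orderby~ wrapper is stripped first, then the
attribute list of the inner query is returned, so the goal

----    sketchSelectClause(C)
----

should bind ~C~ as follows:

----    C = [sname, bev]
----

*/

sketchSelectClause(C) :-
  selectClause(
    (select [sname, bev] from staedte where bev > 500000) orderby sname, C).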
|
|
|
|
|
|
|
|
/*
|
|
|
|
---- attrnames(Attrs, AttrNames) :-
|
|
----
|
|
|
|
Transform each attribute X into attrname(X).
|
|
|
|
*/
|
|
|
|
attrnames([], []).
|
|
|
|
attrnames([Attr | Attrs], [attrname(Attr) | AttrNames]) :-
|
|
attrnames(Attrs, AttrNames).
|
|
|
|
/*
|
|
|
|
---- attrnamesSort(Attrs, AttrNames) :-
|
|
----
|
|
|
|
Transform the attribute names of an ~orderby~ clause. An attribute given without
an explicit direction is sorted in ascending order.
|
|
|
|
*/
|
|
|
|
attrnamesSort([], []).
|
|
|
|
attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :-
|
|
attrnameSort(Attr, Attr2),
|
|
attrnamesSort(Attrs, Attrs2).
|
|
|
|
attrnameSort(Attr asc, attrname(Attr) asc) :- !.
|
|
|
|
attrnameSort(Attr desc, attrname(Attr) desc) :- !.
|
|
|
|
attrnameSort(Attr, attrname(Attr) asc).
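
/*
Usage sketch for ~attrnamesSort~. This is an illustrative addition; the
predicate name ~sketchAttrnamesSort~ and the attribute names are assumptions.
An explicit direction is kept, a missing one defaults to ~asc~, so the goal

----    sketchAttrnamesSort(AttrNames)
----

should bind ~AttrNames~ as follows:

----    AttrNames = [attrname(sname) desc, attrname(bev) asc]
----

*/

sketchAttrnamesSort(AttrNames) :-
  attrnamesSort([sname desc, bev], AttrNames).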
|
|
|
|
|
|
/*
|
|
|
|
|
|
11.3.8 Integration with Optimizer
|
|
|
|
---- optimize(Query).
|
|
----
|
|
|
|
Optimize ~Query~ and print the best ~Plan~.
|
|
|
|
*/
|
|
|
|
optimize(Query) :-
|
|
callLookup(Query, Query2),
|
|
queryToPlan(Query2, Plan, Cost),
|
|
writeln(Plan),
|
|
plan_to_atom_string(Plan, SecondoQuery),
|
|
write('The plan is: '), nl, nl,
|
|
write(SecondoQuery), nl, nl,
|
|
write('Estimated Cost: '), write(Cost), nl, nl.
|
|
|
|
|
|
optimize(Query, QueryOut, CostOut) :-
|
|
callLookup(Query, Query2),
|
|
queryToPlan(Query2, Plan, CostOut),
|
|
plan_to_atom_string(Plan, QueryOut).
|
|
|
|
/*
|
|
---- sqlToPlan(QueryText, Plan)
|
|
----
|
|
|
|
Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
In this version, ~QueryText~ must start with ~sql~.
|
|
|
|
*/
|
|
sqlToPlan(QueryText, Plan) :-
|
|
term_to_atom(sql Query, QueryText),
|
|
optimize(Query, Plan, _).
|
|
|
|
|
|
/*
|
|
---- sqlToPlan(QueryText, Plan)
|
|
----
|
|
|
|
Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
In this version, ~QueryText~ does not start with ~sql~.
|
|
|
|
*/
|
|
sqlToPlan(QueryText, Plan) :-
|
|
term_to_atom(Query, QueryText),
|
|
optimize(Query, Plan, _).
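
/*
Usage sketch for ~sqlToPlan~. This is an illustrative addition: the predicate
name ~sketchSqlToPlan~ is hypothetical, and the call assumes an open database
containing the relation ~staedte~ used in the examples of this file. The text
does not start with ~sql~, so the first clause fails when unifying the parsed
term and the second clause applies. The resulting plan depends on the database
and on the stored selectivities, so no output is shown here.

----    sketchSqlToPlan(Plan)
----

*/

sketchSqlToPlan(Plan) :-
  sqlToPlan('select count(*) from staedte where bev > 500000', Plan).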
|
|
|
|
|
|
|
|
|
|
/*
|
|
11.3.8 Examples
|
|
|
|
We can now formulate the previous example queries in the user level language.
|
|
|
|
|
|
Example3:
|
|
|
|
*/
|
|
|
|
example14 :- optimize(
|
|
select * from [staedte as s, plz as p]
|
|
where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0]
|
|
).
|
|
|
|
example14(Query, Cost) :- optimize(
|
|
select * from [staedte as s, plz as p]
|
|
where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0],
|
|
Query, Cost
|
|
).
|
|
|
|
|
|
/*
|
|
Example4:
|
|
|
|
*/
|
|
example15 :- optimize(
|
|
select * from staedte where bev > 500000
|
|
).
|
|
|
|
example15(Query, Cost) :- optimize(
|
|
select * from staedte where bev > 500000,
|
|
Query, Cost
|
|
).
|
|
|
|
/*
|
|
Example5:
|
|
|
|
*/
|
|
example16 :- optimize(
|
|
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
|
|
).
|
|
|
|
example16(Query, Cost) :- optimize(
|
|
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000],
|
|
Query, Cost
|
|
).
|
|
|
|
|
|
/*
|
|
Example6. This may need a larger local stack size. Start Prolog as
|
|
|
|
---- pl -L4M
|
|
----
|
|
|
|
which initializes the local stack to 4 MB.
|
|
|
|
*/
|
|
example17 :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2, plz as p3]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
p2:plz = p3:plz * 5,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p2:plz > 50000,
|
|
p2:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p3:ort contains "burg",
|
|
p3:ort starts "M"]
|
|
).
|
|
|
|
example17(Query, Cost) :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2, plz as p3]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
p2:plz = p3:plz * 5,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p2:plz > 50000,
|
|
p2:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p3:ort contains "burg",
|
|
p3:ort starts "M"],
|
|
Query, Cost
|
|
).
|
|
|
|
|
|
/*
|
|
Example 18:
|
|
|
|
*/
|
|
example18 :- optimize(
|
|
select *
|
|
from [staedte, plz as p1]
|
|
where [
|
|
sname = p1:ort,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p1:plz > 50000,
|
|
p1:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p1:ort contains "burg",
|
|
p1:ort starts "M"]
|
|
).
|
|
|
|
example18(Query, Cost) :- optimize(
|
|
select *
|
|
from [staedte, plz as p1]
|
|
where [
|
|
sname = p1:ort,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p1:plz > 50000,
|
|
p1:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p1:ort contains "burg",
|
|
p1:ort starts "M"],
|
|
Query, Cost
|
|
).
|
|
|
|
/*
|
|
Example 19:
|
|
|
|
*/
|
|
example19 :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p1:plz > 50000,
|
|
p1:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p1:ort contains "burg",
|
|
p1:ort starts "M"]
|
|
).
|
|
|
|
example19(Query, Cost) :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
bev > 300000,
|
|
bev < 500000,
|
|
p1:plz > 50000,
|
|
p1:plz < 60000,
|
|
kennzeichen starts "W",
|
|
p1:ort contains "burg",
|
|
p1:ort starts "M"],
|
|
Query, Cost
|
|
).
|
|
|
|
|
|
/*
|
|
Example 20:
|
|
|
|
*/
|
|
example20 :- optimize(
|
|
select *
|
|
from [staedte as s, plz as p]
|
|
where [
|
|
p:ort = s:sname,
|
|
p:plz > 40000,
|
|
s:bev > 300000]
|
|
).
|
|
|
|
example20(Query, Cost) :- optimize(
|
|
select *
|
|
from [staedte as s, plz as p]
|
|
where [
|
|
p:ort = s:sname,
|
|
p:plz > 40000,
|
|
s:bev > 300000],
|
|
Query, Cost
|
|
).
|
|
|
|
/*
|
|
Example 21:
|
|
|
|
*/
|
|
example21 :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2, plz as p3]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
p2:plz = p3:plz * 5]
|
|
).
|
|
|
|
example21(Query, Cost) :- optimize(
|
|
select *
|
|
from [staedte, plz as p1, plz as p2, plz as p3]
|
|
where [
|
|
sname = p1:ort,
|
|
p1:plz = p2:plz + 1,
|
|
p2:plz = p3:plz * 5],
|
|
Query, Cost
|
|
).
|
|
|
|
/*
|
|
|
|
12 Optimizing and Calling Secondo
|
|
|
|
---- sql Term
|
|
sql(Term, SecondoQueryRest)
|
|
let(X, Term)
|
|
let(X, Term, SecondoQueryRest)
|
|
----
|
|
|
|
~Term~ must be one of the available select-from-where statements.
|
|
It is optimized and Secondo is called to execute it. ~SecondoQueryRest~
|
|
is a character string (atom) containing a sequence of Secondo
|
|
operators that can be appended to a given
|
|
plan found by the optimizer; in this case the optimizer returns a
|
|
plan producing a stream.
|
|
|
|
The two versions of ~let~ allow one to assign the result of a query
|
|
to a new object ~X~, using the optimizer.
|
|
|
|
*/
|
|
|
|
sql Term :-
|
|
mOptimize(Term, Query, Cost),
|
|
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
|
|
write('Estimated Cost: '), write(Cost), nl, nl,
|
|
query(Query).
|
|
|
|
sql(Term, SecondoQueryRest) :-
|
|
mStreamOptimize(Term, SecondoQuery, Cost),
|
|
concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
|
|
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
|
|
write('Estimated Cost: '), write(Cost), nl, nl,
|
|
query(Query).
|
|
|
|
let(X, Term) :-
|
|
mOptimize(Term, Query, Cost),
|
|
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
|
|
write('Estimated Cost: '), write(Cost), nl, nl,
|
|
concat_atom(['let ', X, ' = ', Query], '', Command),
|
|
secondo(Command).
|
|
|
|
let(X, Term, SecondoQueryRest) :-
|
|
mStreamOptimize(Term, SecondoQuery, Cost),
|
|
concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
|
|
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
|
|
write('Estimated Cost: '), write(Cost), nl, nl,
|
|
concat_atom(['let ', X, ' = ', Query], '', Command),
|
|
secondo(Command).
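
/*
Usage sketches for the two-argument ~sql~ and for ~let~. These are illustrative
additions: the predicate names ~sketchSqlCount~ and ~sketchLetBigCities~ and the
object name ~bigCities~ are hypothetical, and both goals assume an open database
containing the relation ~staedte~. The first goal appends the Secondo operator
~count~ to the stream plan found by the optimizer; the second one stores the
query result as a new object, which changes the database.

----    sketchSqlCount
        sketchLetBigCities
----

*/

sketchSqlCount :-
  sql(select * from staedte where bev > 500000, 'count').

sketchLetBigCities :-
  let(bigCities, select * from staedte where bev > 500000).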
|
|
|
|
|
|
/*
|
|
---- streamOptimize(Term, Query, Cost) :-
|
|
----
|
|
|
|
Optimize the ~Term~ producing an incomplete Secondo query plan ~Query~
|
|
returning a stream.
|
|
|
|
*/
|
|
streamOptimize(Term, Query, Cost) :-
|
|
callLookup(Term, Term2),
|
|
queryToStream(Term2, Plan, Cost),
|
|
plan_to_atom_string(Plan, Query).
|
|
|
|
/*
|
|
---- mOptimize(Term, Query, Cost) :-
|
|
mStreamOptimize(Term, Query, Cost) :-
|
|
----
|
|
|
|
Means ``multi-optimize''. Optimize a ~Term~ possibly consisting of several
|
|
subexpressions to be independently optimized, as in union and intersection
|
|
queries. ~mStreamOptimize~ is a variant returning a stream.
|
|
|
|
*/
|
|
|
|
:-op(800, fx, union).
|
|
:-op(800, fx, intersection).
|
|
|
|
mOptimize(union Terms, Query, Cost) :-
|
|
mStreamOptimize(union Terms, Plan, Cost),
|
|
concat_atom([Plan, 'consume'], '', Query).
|
|
|
|
mOptimize(intersection Terms, Query, Cost) :-
|
|
mStreamOptimize(intersection Terms, Plan, Cost),
|
|
concat_atom([Plan, 'consume'], '', Query).
|
|
|
|
mOptimize(Term, Query, Cost) :-
|
|
optimize(Term, Query, Cost).
|
|
|
|
|
|
mStreamOptimize(union [Term], Query, Cost) :-
|
|
streamOptimize(Term, QueryPart, Cost),
|
|
concat_atom([QueryPart, 'sort rdup '], '', Query).
|
|
|
|
mStreamOptimize(union [Term | Terms], Query, Cost) :-
|
|
streamOptimize(Term, Plan1, Cost1),
|
|
mStreamOptimize(union Terms, Plan2, Cost2),
|
|
concat_atom([Plan1, 'sort rdup ', Plan2, 'mergeunion '], '', Query),
|
|
Cost is Cost1 + Cost2.
|
|
|
|
mStreamOptimize(intersection [Term], Query, Cost) :-
|
|
streamOptimize(Term, QueryPart, Cost),
|
|
concat_atom([QueryPart, 'sort rdup '], '', Query).
|
|
|
|
mStreamOptimize(intersection [Term | Terms], Query, Cost) :-
|
|
streamOptimize(Term, Plan1, Cost1),
|
|
mStreamOptimize(intersection Terms, Plan2, Cost2),
|
|
concat_atom([Plan1, 'sort rdup ', Plan2, 'mergesec '], '', Query),
|
|
Cost is Cost1 + Cost2.
|
|
|
|
mStreamOptimize(Term, Query, Cost) :-
|
|
streamOptimize(Term, Query, Cost).
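
/*
Usage sketch for a union query. This is an illustrative addition: the predicate
name ~sketchUnionPlan~ is hypothetical, and the call assumes an open database
containing the relation ~staedte~. According to the clauses above, each subquery
is optimized separately and followed by ~sort rdup~, the two streams are
combined by ~mergeunion~, and ~consume~ is appended at the end. ~Cost~ is the
sum of the costs of the two subqueries.

----    sketchUnionPlan(Plan, Cost)
----

*/

sketchUnionPlan(Plan, Cost) :-
  mOptimize(union [select * from staedte where bev > 500000,
                   select * from staedte where kennzeichen starts "W"],
    Plan, Cost).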
|
|
|
|
|
|
|
|
/*
|
|
Some auxiliary stuff.
|
|
|
|
*/
|
|
|
|
bestPlanCount :-
|
|
bestPlan(P, _),
|
|
plan_to_atom_string(P, S),
|
|
atom_concat(S, ' count', Q),
|
|
nl, write(Q), nl,
|
|
query(Q).
|
|
|
|
bestPlanConsume :-
|
|
bestPlan(P, _),
|
|
plan_to_atom_string(P, S),
|
|
atom_concat(S, ' consume', Q),
|
|
nl, write(Q), nl,
|
|
query(Q).
|
|
|
|
|
|
%fapra 15/16
|
|
|
|
/*
|
|
Rename an attribute to match the renaming of its relation.
|
|
|
|
*/
|
|
|
|
% No renaming needed.
|
|
renamedRelAttr(RelAttr, Var, RelAttr) :-
|
|
Var = *, !.
|
|
|
|
renamedRelAttr(attr(Name, N, C), Var, attr(Var:Name, N, C)).
|
|
|
|
|
|
% Extract the lower-case attribute name from an attr term.
|
|
attrnameDCAtom(Attr, DCAttrName) :-
|
|
Attr = attr(_:Name, _, _),
|
|
!,
|
|
atom_string(AName, Name),
|
|
downcase_atom(AName, DCAttrName).
|
|
|
|
attrnameDCAtom(Attr, DCAttrName) :-
|
|
Attr = attr(Name, _, _),
|
|
atom_string(AName, Name),
|
|
downcase_atom(AName, DCAttrName).
|
|
|
|
|
|
/*
|
|
Rename a tuple stream.
|
|
|
|
*/
|
|
|
|
% No renaming needed.
|
|
renameStream(Stream, Var, Plan) :-
|
|
Var = *,
|
|
!,
|
|
Plan = Stream.
|
|
|
|
renameStream(Stream, Var, rename(Stream, Var)).
|
|
|
|
/*
|
|
Transform a relation to a tuple stream and rename it.
|
|
|
|
*/
|
|
|
|
% No renaming needed.
|
|
feedRenameRelation(Rel, Var, Plan) :-
|
|
Var = *,
|
|
!,
|
|
Plan = feed(Rel).
|
|
|
|
feedRenameRelation(Rel, Var, Plan) :-
|
|
Plan = rename(feed(Rel), Var).
|
|
|
|
feedRenameRelation(rel(Rel, Var,_), Plan) :-
|
|
feedRenameRelation(Rel, Var, Plan),!.
|
|
%end fapra 15/16
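
/*
Usage sketch for the renaming helpers. This is an illustrative addition: the
predicate name ~sketchRenameHelpers~ is hypothetical and the ~attr~ and ~rel~
terms are placeholders. A relation variable ~s~ leads to renaming, whereas the
star used as variable leaves the plan as a plain feed. The goal

----    sketchRenameHelpers(Attr, Plan1, Plan2)
----

should produce these bindings:

----    Attr  = attr(s:sname, 1, u)
        Plan1 = rename(feed(staedte), s)
        Plan2 = feed(plz)
----

*/

sketchRenameHelpers(Attr, Plan1, Plan2) :-
  renamedRelAttr(attr(sname, 1, u), s, Attr),
  feedRenameRelation(rel(staedte, s, u), Plan1),
  feedRenameRelation(rel(plz, *, u), Plan2).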