/*
//paragraph [10] title: [{\Large \bf ] [}]
//characters [1] formula: [$] [$]
//[ae] [\"{a}]
//[oe] [\"{o}]
//[ue] [\"{u}]
//[ss] [{\ss}]
//[Ae] [\"{A}]
//[Oe] [\"{O}]
//[Ue] [\"{U}]
//[**] [$**$]
//[toc] [\tableofcontents]
//[=>] [\verb+=>+]
//[:Section Translation] [\label{sec:translation}]
//[Section Translation] [Section~\ref{sec:translation}]
//[:Section 4.1.1] [\label{sec:4.1.1}]
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
//[newpage] [\newpage]
[10] A Query Optimizer for Secondo
Ralf Hartmut G[ue]ting, November - December 2002
[toc]
[newpage]
1 Introduction
1.1 Overview
This document not only describes, but ~is~ an optimizer for Secondo database
systems. It contains the current source code for the optimizer, written in
PROLOG. It can be compiled by a PROLOG system (SWI-Prolog 5.0 or higher)
directly.
The current version of the optimizer is capable of handling conjunctive queries,
formulated in a relational environment. That is, it takes a set of
relations together with a set of selection or join predicates over these
relations and produces a query plan that can be executed by (the current
relational system implemented in) Secondo.
The selection of the query plan is based on cost estimates which in turn are
based on given selectivities of predicates. Selectivities of predicates are
maintained in a table (a set of PROLOG facts). If the selectivity of a predicate
is not available from that table, then an interaction with the Secondo system
should take place to determine the selectivity. There are various strategies
conceivable for doing this which will be described elsewhere. However, the
current version of the optimizer just emits a message that the selectivity is
missing and quits.
The optimizer also implements a simple SQL-like language for entering queries.
The notation is pretty much like SQL except that the lists occurring (lists of
attributes, relations, predicates) are written in PROLOG notation. Also note
that the where-clause is a list of predicates rather than an arbitrary boolean
expression and hence allows one to formulate conjunctive queries only.
1.2 Optimization Algorithm
The optimizer employs an optimization algorithm, novel as far as we know, which
is based on ~shortest path search in a predicate order graph~. This technique is
remarkably simple to implement, yet efficient.
A predicate order graph (POG) is the graph whose nodes represent sets of
evaluated predicates and whose edges represent predicates, containing all
possible orders of predicates. Such a graph for three predicates ~p~, ~q~, and
~r~ is shown in [Figure pog1].
Figure 1: A predicate order graph for three predicates ~p~, ~q~
and ~r~ [pog1.eps]
Here the bottom node has no predicate evaluated and the top node has all
predicates evaluated. The example illustrates, more precisely, possible
sequences of selections on an argument relation of size 1000. If selectivities
of predicates are given (for ~p~ it is 1/2, for ~q~ 1/10, and for ~r~ 1/5),
then we can annotate the POG with sizes of intermediate results as shown,
assuming that all predicates are independent (not ~correlated~). This means that
the selectivity of a predicate is the same regardless of the order of
evaluation, which of course does not need to be true.
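As a minimal sketch of this size annotation (not part of the optimizer itself),
the following fragment multiplies the base cardinality by the selectivities of
the predicates evaluated so far. The predicate names ~selectivity~ and
~result\_size~ are assumptions made for this illustration only; the figures are
those of the example above.

```prolog
% Illustrative selectivities from the example (hypothetical facts).
selectivity(p, 0.5).
selectivity(q, 0.1).
selectivity(r, 0.2).

% result_size(+BaseSize, +Preds, -Size): size after applying all Preds,
% assuming the predicates are independent.
result_size(Size, [], Size).
result_size(Base, [Pred | Preds], Size) :-
    selectivity(Pred, Sel),
    Base1 is Base * Sel,
    result_size(Base1, Preds, Size).
```

For instance, result_size(1000, [p, q], S) yields a size of about 50, and every
ordering of p, q and r leads to about 10 tuples at the top node (up to floating
point rounding).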
If we can further compute for each edge of the POG possible evaluation
methods, adding a new ``executable'' edge for each method, and mark the
edge with estimated costs for this method, then finding a shortest path through
the POG corresponds to finding the cheapest query plan. [Figure pog2] shows an
example of a POG annotated with evaluation methods.
Figure 2: A POG annotated with evaluation methods [pog2.eps]
In this example, there is only a single method associated with each edge. In
general, however, there will be several methods. The example represents the
query:
---- select *
from Staedte, Laender, Regiert
where Land = LName and PName = 'CDU' and LName = PLand
----
for relation schemas
---- Staedte(SName, Bev, Land)
Laender(LName, LBev)
Regiert(PName, PLand)
----
Hence the optimization algorithm described and implemented in the following
sections proceeds in the following steps:
1 For given relations and predicates, construct the predicate order graph and
store it as a set of facts in memory (Sections 2 through 4).
2 For each edge, construct corresponding executable edges (called ~plan edges~
below). This is controlled by optimization rules describing how selections or
joins can be translated (Sections 5 and 6).
3 Based on sizes of arguments and selectivities (stored in the file
~database.pl~) compute the sizes of all intermediate results. Also annotate
edges of the POG with selectivities (Section 7).
4 For each plan edge, compute its cost and store it in memory (as a set of
facts). This is based on sizes of arguments and the selectivity associated with
the edge and on a cost function (predicate) written for each operator that may
occur in a query plan (Section 8).
5 Dijkstra's shortest path algorithm is employed to find a shortest path
through the graph of plan edges annotated with costs (called ~cost edges~).
This path is transformed into a Secondo query plan and returned
(Section 9).
6 Finally, a simple subset of SQL in a PROLOG notation is implemented. So it
is possible to enter queries in this language. The optimizer determines from it
the lists of relations and predicates in the form needed for constructing the
POG, and then invokes step 1 (Section 11).
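The idea behind step 5 can be sketched on a tiny graph. The fragment below is
only an illustration: instead of Dijkstra's algorithm it simply enumerates all
paths of a small acyclic graph and takes the minimum, which is feasible only
for very few predicates. The facts ~cost\_edge~, the method names and the costs
are hypothetical.

```prolog
:- use_module(library(aggregate)).

% cost_edge(Source, Target, Method, Cost): hypothetical cost edges of a
% POG for two predicates p and q (node 3 = both evaluated).
cost_edge(0, 1, filter_p, 100).
cost_edge(0, 2, filter_q, 100).
cost_edge(1, 3, filter_q,  50).
cost_edge(2, 3, filter_p,  10).

% path(+From, +To, -Methods, -Cost): any path with its total cost.
path(Node, Node, [], 0).
path(From, To, [Method | Methods], Cost) :-
    cost_edge(From, Next, Method, Cost1),
    path(Next, To, Methods, Cost2),
    Cost is Cost1 + Cost2.

% cheapest_plan(+From, +To, -Methods, -Cost): minimum over all paths
% (exhaustive enumeration, not Dijkstra).
cheapest_plan(From, To, Methods, Cost) :-
    aggregate_all(min(C, Ms), path(From, To, Ms, C), min(Cost, Methods)).
```

With these facts, cheapest_plan(0, 3, Methods, Cost) selects the path through
node 2, i.e. evaluating q first and p second, at a total cost of 110.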
2 Data Structures
In the construction of the predicate order graph, the following data structures
are used.
---- pr(P, A)
pr(P, B, C)
----
A selection or join predicate, e.g. pr(p, a) or pr(q, b, c). These denote a
selection predicate p on relation a, and a join predicate q on relations
b and c, respectively.
---- arp(Arg, Rels, Preds)
----
An argument, relations, predicate triple. It describes a set of relations
~Rels~ on which the predicates ~Preds~ have been evaluated. To access the
result of this evaluation one needs to refer to ~Arg~.
Arg is either arg(N) or res(N), N an integer. Examples: arg(5), res(1)
Rels is a list of relation names, e.g. [a, b, c]
Preds is a list of predicate names, e.g. [p, q, r]
---- node(No, Preds, Partition)
----
A node.
~No~ is the number of the node into which the evaluated predicates
are encoded: each bit corresponds to a predicate number. For example, node
number 5 = 101 (binary) says that the first predicate (no 1) and the third
predicate (no 4) have been evaluated in this node. For predicate i,
its predicate number is "2^{i-1}"[1].
~Preds~ is the list of names of evaluated predicates, e.g. [p, q].
~Partition~ is a list of arp elements, see above.
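The bit encoding of node numbers can be made concrete with a small sketch. The
helper names ~pred\_number~ and ~evaluated\_in~ are assumptions introduced here
for illustration; they do not occur in the optimizer.

```prolog
% pred_number(+I, -PredNo): predicate number 2^(I-1) of the I-th predicate.
pred_number(I, PredNo) :-
    PredNo is 1 << (I - 1).

% evaluated_in(+NodeNo, +NoOfPreds, -Is): indices of the predicates whose
% bit is set in NodeNo.
evaluated_in(NodeNo, NoOfPreds, Is) :-
    findall(I,
            ( between(1, NoOfPreds, I),
              pred_number(I, PredNo),
              NodeNo /\ PredNo =\= 0 ),
            Is).
```

For node number 5 and three predicates, evaluated_in(5, 3, Is) returns
Is = [1, 3], matching the example given above.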
---- edge(Source, Target, Term, Result, Node, PredNo)
----
An edge, representing a predicate.
~Source~ and ~Target~ are the numbers of source and target nodes in the
predicate order graph, e.g. 0 and 1.
~Term~ is either a selection or a join, for example,
select(arg(0), pr(p, a)) or join(res(4), res(1), pr(q, a, b))
~Result~ is the number of the node into which the result of this predicate
application should be written. Normally it is the same as Target,
but for an edge leading to a node combining several independent results,
it is the number of the ``real'' node to obtain this result. An example of this can
be found in [Figure pog2] where the join edge leading from node 3 to node 7 does
not use the result of node 3 (there is none) but rather the two independent
results from nodes 1 and 2 (this pair is conceptually the result available in
node 3).
~Node~ is the source node for this edge, in the form node(...) as
described above.
~PredNo~ is the predicate number for the predicate represented by this
edge. Predicate numbers are of the form "2^i" as explained
for nodes.
3 Construction of the Predicate Order Graph
3.1 pog
---- pog(Rels, Preds, Nodes, Edges) :-
----
For a given list of relations ~Rels~ and predicates ~Preds~, ~Nodes~ and
~Edges~ are the predicate order graph where edges are annotated with selection
and join operations applied to the correct arguments.
Example call:
---- pog([staedte, laender], [pr(p, staedte), pr(q, laender), pr(r, staedte,
laender)], N, E).
----
*/
pog(Rels, Preds, Nodes, Edges) :-
length(Rels, N), reverse(Rels, Rels2), deleteArguments,
partition(Rels2, N, Partition0),
length(Preds, M), reverse(Preds, Preds2),
pog2(Partition0, M, Preds2, Nodes, Edges),
deleteNodes, storeNodes(Nodes),
deleteEdges, storeEdges(Edges),
% RHG 2014 Create plan and cost edges during shortest path search.
% deletePlanEdges,
deleteVariables,
% createPlanEdges,
HighNode is 2**M -1,
retract(highNode(_)), assert(highNode(HighNode)),
deleteSizes.
% deleteCostEdges.
% end RHG 2014
/*
3.2 partition
---- partition(Rels, N, Partition0) :-
----
Given a list of ~N~ relations ~Rels~, return an initial partition such that
each relation r is packed into the form arp(arg(i), [r], []).
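The following sketch shows the shape of such an initial partition. It is a
simplified stand-in for the predicate of this section: it does not assert
~argument~ facts, and the names ~initial\_partition~ and ~demo\_partition~ are
hypothetical.

```prolog
:- use_module(library(lists)).

% initial_partition(+Rels, +N, -Partition): like the partition predicate
% of this section, but without asserting argument/2 facts (a simplification).
initial_partition([], _, []).
initial_partition([Rel | Rels], N, [arp(arg(N), [Rel], []) | Arps]) :-
    N1 is N - 1,
    initial_partition(Rels, N1, Arps).

% demo_partition(+Rels, -Partition): mimics the call made in pog/4,
% which reverses the relation list first.
demo_partition(Rels, Partition) :-
    length(Rels, N),
    reverse(Rels, Reversed),
    initial_partition(Reversed, N, Partition).
```

Because the relation list is reversed before numbering, the first relation of
the query receives arg(1): demo_partition([staedte, laender], P) gives
P = [arp(arg(2), [laender], []), arp(arg(1), [staedte], [])].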
*/
partition([], _, []).
partition([Rel | Rels], N, [Arp | Arps]) :-
N1 is N-1,
Arp = arp(arg(N), [Rel], []),
assert(argument(N, Rel)),
partition(Rels, N1, Arps).
/*
3.3 pog2
---- pog2(Partition0, NoOfPreds, Preds, Nodes, Edges) :-
----
For the given start partition ~Partition0~, a list of predicates ~Preds~
containing ~NoOfPreds~ predicates, return the ~Nodes~ and ~Edges~ of the
predicate order graph.
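The recursive doubling performed here determines the size of the graph. As a
rough sketch, assuming that each predicate contributes exactly one new edge per
existing node and one copy per existing edge, the counts can be computed as
follows. The predicate ~pog\_size~ is a hypothetical helper, not part of the
optimizer.

```prolog
% pog_size(+NoOfPreds, -Nodes, -Edges): number of nodes and edges of a
% POG, assuming one new edge per old node and one copy per old edge.
pog_size(0, 1, 0).
pog_size(M, Nodes, Edges) :-
    M > 0,
    M1 is M - 1,
    pog_size(M1, Nodes1, Edges1),
    Nodes is 2 * Nodes1,
    Edges is 2 * Edges1 + Nodes1.
```

For three predicates this gives 8 nodes and 12 edges, so the graph grows
exponentially with the number of predicates.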
*/
pog2(Part0, _, [], [node(0, [], Part0)], []).
pog2(Part0, NoOfPreds, [Pred | Preds], Nodes, Edges) :-
N1 is NoOfPreds-1,
PredNo is 2**N1,
pog2(Part0, N1, Preds, NodesOld, EdgesOld),
newNodes(Pred, PredNo, NodesOld, NodesNew),
newEdges(Pred, PredNo, NodesOld, EdgesNew),
copyEdges(Pred, PredNo, EdgesOld, EdgesCopy),
append(NodesOld, NodesNew, Nodes),
append(EdgesOld, EdgesNew, Edges2),
append(Edges2, EdgesCopy, Edges).
/*
3.4 newNodes
---- newNodes(Pred, PredNo, NodesOld, NodesNew) :-
----
Given a predicate ~Pred~ with number ~PredNo~ and a list of nodes ~NodesOld~
resulting from evaluating all predicates with lower numbers, construct
a list of nodes which result from applying to each of the existing nodes
the predicate ~Pred~.
*/
newNodes(_, _, [], []).
newNodes(Pred, PNo, [Node | Nodes], [NodeNew | NodesNew]) :-
newNode(Pred, PNo, Node, NodeNew),
newNodes(Pred, PNo, Nodes, NodesNew).
newNode(Pred, PNo, node(No, Preds, Part), node(No2, [Pred | Preds], Part2)) :-
No2 is No + PNo,
copyPart(Pred, PNo, Part, Part2).
/*
3.5 copyPart
---- copyPart(Pred, PNo, Part, Part2) :-
----
Copy the partition ~Part~ of a node so that the new partition ~Part2~
results after applying the predicate ~Pred~ with number ~PNo~.
This means that for a selection predicate we have to find the arp
containing its relation and modify it accordingly, the other arps
in the partition are copied unchanged.
For a join predicate we have to find the two arps containing its
two relations and to merge them into a single arp; the remaining
arps are copied unchanged.
Or a join predicate may find its two relations in the same arp which means
another join on the same two relations has already been performed.
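The merge step for a join predicate can be sketched in isolation. The fragment
below handles only the case of two distinct arps and uses the hypothetical
helper names ~arp\_no~ and ~merge\_arps~; it mirrors the numbering rule that
the result number is the sum of the two argument numbers and the predicate
number.

```prolog
:- use_module(library(lists)).

% arp_no(+Arp, -No): base arguments count as 0, results carry their number.
arp_no(arp(arg(_), _, _), 0).
arp_no(arp(res(N), _, _), N).

% merge_arps(+P, +PNo, +ArpX, +ArpY, -Arp): merge two arps by join P.
merge_arps(P, PNo, ArpX, ArpY, arp(res(ResNo), Rels, [P | Preds])) :-
    ArpX = arp(_, RelsX, PredsX),
    ArpY = arp(_, RelsY, PredsY),
    arp_no(ArpX, NoX),
    arp_no(ArpY, NoY),
    ResNo is NoX + NoY + PNo,
    append(RelsX, RelsY, Rels),
    append(PredsX, PredsY, Preds).
```

Joining arp(res(1), [a], [p]) and arp(res(2), [b], [q]) by a predicate r with
number 4 yields arp(res(7), [a, b], [r, p, q]).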
*/
copyPart(_, _, [], []).
copyPart(pr(P, Rel), PNo, Arps, [Arp2 | Arps2]) :-
select(X, Arps, Arps2),
X = arp(Arg, Rels, Preds),
member(Rel, Rels), !,
nodeNo(Arg, No),
ResNo is No + PNo,
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
select(X, Arps, Arps2),
X = arp(Arg, Rels, Preds),
member(R1, Rels),
member(R2, Rels), !,
nodeNo(Arg, No),
ResNo is No + PNo,
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
select(X, Arps, Rest),
X = arp(ArgX, RelsX, PredsX),
member(R1, RelsX),
select(Y, Rest, Arps2),
Y = arp(ArgY, RelsY, PredsY),
member(R2, RelsY), !,
nodeNo(ArgX, NoX),
nodeNo(ArgY, NoY),
ResNo is NoX + NoY + PNo,
append(RelsX, RelsY, Rels),
append(PredsX, PredsY, Preds),
Arp2 = arp(res(ResNo), Rels, [P | Preds]).
nodeNo(arg(_), 0).
nodeNo(res(N), N).
/*
3.6 newEdges
---- newEdges(Pred, PredNo, NodesOld, EdgesNew) :-
----
For each of the nodes in ~NodesOld~ return a new edge in ~EdgesNew~
built by applying the predicate ~Pred~ with number ~PredNo~.
*/
newEdges(_, _, [], []).
newEdges(Pred, PNo, [Node | Nodes], [Edge | Edges]) :-
newEdge(Pred, PNo, Node, Edge),
newEdges(Pred, PNo, Nodes, Edges).
newEdge(pr(P, Rel), PNo, Node, Edge) :-
findRel(Rel, Node, Source, Arg),
Target is Source + PNo,
nodeNo(Arg, ArgNo),
Result is ArgNo + PNo,
Edge = edge(Source, Target, select(Arg, pr(P, Rel)), Result, Node, PNo).
newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
findRels(R1, R2, Node, Source, Arg),
Target is Source + PNo,
nodeNo(Arg, ArgNo),
Result is ArgNo + PNo,
Edge = edge(Source, Target, select(Arg, pr(P, R1, R2)), Result, Node, PNo).
newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
findRels(R1, R2, Node, Source, Arg1, Arg2),
Target is Source + PNo,
nodeNo(Arg1, Arg1No),
nodeNo(Arg2, Arg2No),
Result is Arg1No + Arg2No + PNo,
Edge = edge(Source, Target, join(Arg1, Arg2, pr(P, R1, R2)), Result,
Node, PNo).
/*
3.7 findRel
---- findRel(Rel, Node, Source, Arg):-
----
Find the relation ~Rel~ within a node description ~Node~ and return the
node number ~Source~ and the description ~Arg~ of the argument (e.g. res(3)) found
within the arp containing ~Rel~.
---- findRels(Rel1, Rel2, Node, Source, Arg1, Arg2):-
----
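Similar for two relations located in different arps. The five-argument variant
findRels(Rel1, Rel2, Node, Source, Arg) handles the case in which both
relations occur in the same arp.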
*/
findRel(Rel, node(No, _, Arps), No, ArgX) :-
select(X, Arps, _),
X = arp(ArgX, RelsX, _),
member(Rel, RelsX).
findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX) :-
select(X, Arps, _),
X = arp(ArgX, RelsX, _),
member(Rel1, RelsX),
member(Rel2, RelsX).
findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX, ArgY) :-
select(X, Arps, Rest),
X = arp(ArgX, RelsX, _),
member(Rel1, RelsX), !,
select(Y, Rest, _),
Y = arp(ArgY, RelsY, _),
member(Rel2, RelsY).
/*
3.8 copyEdges
---- copyEdges(Pred, PredNo, EdgesOld, EdgesCopy):-
----
Given a set of edges ~EdgesOld~ and a predicate ~Pred~ with number ~PredNo~,
return a copy of each edge in ~EdgesOld~ in ~EdgesCopy~ such that the
copied version reflects a previous application of predicate ~Pred~.
This is implemented by retrieving from each old edge its start node,
constructing for this start node and predicate ~Pred~ a target node to
which then the predicate associated with the old edge is applied.
*/
copyEdges(_, _, [], []).
copyEdges(Pred, PNo, [Edge | Edges], [Edge2 | Edges2]) :-
Edge = edge(_, _, Term, _, Node, PNo2),
pred(Term, Pred2),
newNode(Pred, PNo, Node, NodeNew),
newEdge(Pred2, PNo2, NodeNew, Edge2),
copyEdges(Pred, PNo, Edges, Edges2).
pred(select(_, P), P).
pred(join(_, _, P), P).
/*
3.9 writeEdgeList
---- writeEdgeList(List):-
----
Write the list of edges ~List~.
*/
% Base case: without it the predicate fails after writing the last edge.
writeEdgeList([]).
writeEdgeList([edge(Source, Target, Term, _, _, _) | Edges]) :-
write(Source), write('-'), write(Target), write(':'), write(Term), nl,
writeEdgeList(Edges).
/*
4 Managing the Graph in Memory
4.1 Storing and Deleting Nodes and Edges
---- storeNodes(NodeList).
storeEdges(EdgeList).
deleteNodes.
deleteEdges.
----
Just as the names say. Store a list of nodes or edges, respectively, as facts;
and delete them from memory again.
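The deletion predicates in this section use a failure-driven loop. A minimal
sketch of the idiom, using a hypothetical dynamic predicate ~demo\_fact~ rather
than the node and edge facts of the optimizer, looks as follows.

```prolog
:- dynamic demo_fact/1.

% store_facts(+Facts): assert each fact of the list.
store_facts([]).
store_facts([Fact | Facts]) :-
    assertz(Fact),
    store_facts(Facts).

% Failure-driven loop: retract one fact, then force backtracking.
delete_demo_fact :- retract(demo_fact(_)), fail.

% Succeeds exactly when the loop above has run out of facts.
delete_demo_facts :- \+ delete_demo_fact.
```

The standard built-in retractall(demo_fact(_)) would have the same effect in a
single call; the loop form is shown because it is the style used in this file.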
*/
storeNodes([Node | Nodes]) :- assert(Node), storeNodes(Nodes).
storeNodes([]).
storeEdges([Edge | Edges]) :- assert(Edge), storeEdges(Edges).
storeEdges([]).
deleteNode :- retract(node(_, _, _)), fail.
deleteNodes :- not(deleteNode).
deleteEdge :- retract(edge(_, _, _, _, _, _)), fail.
deleteEdges :- not(deleteEdge).
deleteArgument :- retract(argument(_, _)), fail.
deleteArguments :- not(deleteArgument).
/*
4.2 Writing Nodes and Edges
---- writeNodes.
writeEdges.
----
Write the currently stored nodes and edges, respectively.
*/
writeNode :-
node(No, Preds, Partition),
write('Node: '), write(No), nl,
write('Preds: '), write(Preds), nl,
write('Partition: '), write(Partition), nl, nl,
fail.
writeNodes :- not(writeNode).
writeEdge :-
edge(Source, Target, Term, Result, _, _),
write('Source: '), write(Source), nl,
write('Target: '), write(Target), nl,
write('Term: '), write(Term), nl,
write('Result: '), write(Result), nl, nl,
fail.
writeEdges :- not(writeEdge).
/*
5 Rule-Based Translation of Selections and Joins
[:Section Translation]
5.1 Precise Notation for Input
Since now we have to look into the structure of predicates, and need to be
able to generate Secondo executable expressions in their precise format, we
need to define the input notation precisely.
5.1.1 The Source Language
[:Section 4.1.1]
We assume the queries can be entered basically as select-from-where
structures, as follows. Let schemas be given as:
---- plz(PLZ:string, Ort:string)
Staedte(SName:string, Bev:int, PLZ:int, Vorwahl:string, Kennzeichen:string)
----
Then we should be able to enter queries:
---- select SName, Bev
from Staedte
where Bev > 500000
----
In the next example we need to avoid the name conflict for PLZ:
---- select *
from Staedte as s, plz as p
where s.SName = p.Ort and p.PLZ > 40000
----
In the PROLOG version, we will use the following notations:
---- rel(Name, Var, Case)
----
For example
---- rel(staedte, *, u)
----
is a term denoting the ~Staedte~ relation; ~u~ says that it is actually to be
written in upper case whereas
---- rel(plz, *, l)
----
denotes the ~plz~ relation to be written in lower case. The second argument
~Var~ contains an explicit variable if it has been assigned, otherwise the
symbol [*]. If an explicit variable has been used in the query, we need to
perform renaming in the plan. For example, in the second query above, the
relations would be denoted as
---- rel(staedte, s, u)
rel(plz, p, l)
----
Within predicates, attributes are annotated as follows:
---- attr(Name, Arg, Case)
attr(ort, 2, u)
----
This says that ~ort~ is an attribute of the second argument within a join
condition, to be written in upper case. For a selection condition, the second
argument is ignored; it can be set to 0 or 1.
Hence for the two queries above, the translation would be
---- fromwhere(
[rel(staedte, *, u)],
[pr(attr(bev, 0, u) > 500000, rel(staedte, *, u))]
)
fromwhere(
[rel(staedte, s, u), rel(plz, p, l)],
[pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
rel(staedte, s, u), rel(plz, p, l)),
pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l))]
)
----
Note that the upper or lower case distinction refers only to the first letter
of a relation or attribute name. Other letters are written on the PROLOG side
in the same way as in Secondo.
Note further that if explicit variables are used, the attribute name will
include them, e.g. s:sName.
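How such PROLOG-side names map to Secondo attribute names can be sketched as
follows. The fragment assumes names that are to be written in upper case and
the renaming convention of appending the variable after an underscore; the
predicate names ~upper\_first~ and ~secondo\_attr\_name~ are hypothetical.

```prolog
% upper_first(+Atom, -Upper): upper-case the first letter only.
upper_first(Atom, Upper) :-
    sub_atom(Atom, 0, 1, _, First),
    sub_atom(Atom, 1, _, 0, Rest),
    upcase_atom(First, UFirst),
    atom_concat(UFirst, Rest, Upper).

% secondo_attr_name(+Name, -Atom): Name is either Var:Attr or Attr.
secondo_attr_name(Var:Attr, Atom) :-
    !,
    upper_first(Attr, UAttr),
    atomic_list_concat([UAttr, '_', Var], Atom).
secondo_attr_name(Attr, Atom) :-
    upper_first(Attr, Atom).
```

Under these assumptions s:sName becomes SName_s, p:pLZ becomes PLZ_p, and the
plain attribute bev becomes Bev.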
The projection occurring in the select-from-where statement is for the moment
not passed to the optimizer; it is treated outside.
So example 2 is rewritten as:
*/
example3 :- pog([rel(staedte, s, u), rel(plz, p, l)],
[pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
rel(staedte, s, u), rel(plz, p, l) ),
pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l)),
pr((attr(p:pLZ, 1, u) mod 5) = 0, rel(plz, p, l))], _, _).
/*
The two queries mentioned above are:
*/
example4 :- pog(
[rel(staedte, *, u)],
[pr(attr(bev, 1, u) > 500000, rel(staedte, *, u))],
_, _).
example5 :- pog(
[rel(staedte, s, u), rel(plz, p, l)],
[pr(attr(s:sName, 1, u) = attr(p:ort, 2, u), rel(staedte, s, u), rel(plz, p,
l)),
pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))],
_, _).
/*
5.1.2 The Target Language
In the target language, we use the following operators:
---- feed: rel(Tuple) -> stream(Tuple)
consume: stream(Tuple) -> rel(Tuple)
filter: stream(Tuple) x (Tuple -> bool) -> stream(Tuple)
product: stream(Tuple1) x stream(Tuple2) -> stream(Tuple3)
where Tuple3 = Tuple1 o Tuple2
hashjoin: stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
x nbuckets -> stream(Tuple3)
where Tuple3 = Tuple1 o Tuple2
attrname1 occurs in Tuple1
attrname2 occurs in Tuple2
nbuckets is the number of hash buckets
to be used
sortmergejoin: stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
-> stream(Tuple3)
where Tuple3 = Tuple1 o Tuple2
attrname1 occurs in Tuple1
attrname2 occurs in Tuple2
loopjoin: stream(Tuple1) x (Tuple1 -> stream(Tuple2))
-> stream(Tuple3)
where Tuple3 = Tuple1 o Tuple2
exactmatch: btree(Tuple, AttrType) x rel(Tuple) x AttrType
-> stream(Tuple)
extend: stream(Tuple1) x (Newname x (Tuple -> Attrtype))+
-> stream(Tuple2)
where Tuple2 is Tuple1 to which pairs
(Newname, Attrtype) have been appended
remove: stream(Tuple1) x Attrname+ -> stream(Tuple2)
where Tuple2 is Tuple1 from which the mentioned
attributes have been removed.
project: stream(Tuple1) x Attrname+ -> stream(Tuple2)
where Tuple2 is Tuple1 projected on the
mentioned attributes.
rename stream(Tuple1) x NewName -> stream(Tuple2)
where Tuple2 is Tuple1 modified by appending
"_newname" to each attribute name
count stream(Tuple) -> int
count the number of tuples in a stream
sortby stream(Tuple) x (Attrname, asc/desc)+ -> stream(Tuple)
sort stream lexicographically by the given
attribute names
groupby stream(Tuple) x GroupAttrs x NewFields -> stream(Tuple2)
group stream by the grouping attributes; for each group
compute new fields each of which is specified in the
form Attrname : Expr. The argument stream must already
be sorted by the grouping attributes.
dloop darray(X) x string x (X->Y) -> darray(Y)
Performs a function on each element of a darray instance. The
string argument specifies the name of the result. If the
name is undefined or an empty string, a name is generated
automatically.
dloop2 darray(X) x darray(Y) x string x (fun : X x Y -> Z) -> darray(Z)
Performs a function on the elements of two darray instances.
The string argument specifies the name of the resulting
darray. If the string is undefined or empty, a name is
generated automatically.
dmap d[f]array x string x fun -> d[f]array
Performs a function on a distributed file array. If the
string argument is empty or undefined, a name for the result
is chosen automatically. If not, the string specifies the
name. The result is of type dfarray if the function produces
a tuple stream or a relation; otherwise the result is a
darray.
dmap2 d[f]array x d[f]array x string x fun -> d[f]array
Joins the slots of two distributed arrays.
partition d[f]array(rel(tuple)) x string x (tuple->int) x int-> dfmatrix
Redistributes the contents of a dfarray value. The new slot
contents are kept on the worker where the values were stored
before redistributing them. The last argument (int)
determines the number of slots of the redistribution. If
this value is less than or equal to zero, the number of slots
is taken from the array argument.
partitionF d[f]array(rel(X)) x string x ([fs]rel(X)->stream(Y)) x (Y ->
int) x int -> dfmatrix(rel(Y))
Repartitions a distributed [file] array. Before repartition,
a function is applied to the slots.
collect2 dfmatrix x string x int -> dfarray
Collects the slots of a matrix into a dfarray. The string
is the name of the resulting array, the int value specifies
a port for file transfer. The port value can be any port
usable on all workers. A corresponding file transfer server
is started automatically.
areduce dfmatrix(rel(t)) x string x (fsrel(t)->Y) x int -> d[f]array(Y)
Performs a function on the distributed slots of an array.
The task distribution is dynamic, meaning that a fast
worker will handle more slots than a slower one. The result
type depends on the result of the function. For a relation
or a tuple stream, a dfarray will be created. For other non-
stream results, a darray is the resulting type.
dsummarize darray(DATA) -> stream(DATA) , d[f]array(rel(X)) -> stream(X)
Produces a stream of the darray elements.
getValue {darray(T),dfarray(T)} -> array(T)
Converts a distributed array into a normal one.
tie ((array t) (map t t t)) -> t
Calculates the "value" of an array evaluating the elements
of the array with a given function from left to right.
----
In PROLOG, all expressions involving such operators are written in prefix
notation.
Parameter functions are written as
---- fun([param(Var1, Type1), ..., param(VarN, TypeN)], Expr)
----
5.1.3 Converting Plans to Atoms and Writing them.
Predicate ~plan\_to\_atom~ converts a plan to a string atom, which represents
the plan as a SECONDO query in text syntax. For attributes we have to
distinguish whether a leading ``.'' needs to be written (if the attribute occurs
within a parameter function) or whether just the attribute name is needed as in
the arguments for hashjoin, for example. Predicate ~wp~ (``write plan'') uses
predicate ~plan\_to\_atom~ to convert its argument to an atom and then writes
that atom to standard output.
*/
upper(Lower, Upper) :-
atom_codes(Lower, [First | Rest]),
to_upper(First, First2),
UpperList = [First2 | Rest],
atom_codes(Upper, UpperList).
wp(Plan) :-
plan_to_atom_string(Plan, PlanAtom),
write(PlanAtom).
/*
Predicate ~newVariable~ outputs a new unique variable name.
The variable name is unique in the sense that ~newVariable~ never
outputs the same name twice (in a PROLOG session).
It should be emphasized that the output
is not a PROLOG variable but a variable name to be used for defining
abstractions in the Secondo system.
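The counter technique behind such a generator can be sketched compactly. The
names ~fresh\_name~ and ~name\_counter~ are hypothetical and chosen so that the
sketch does not interfere with the definitions of this file.

```prolog
:- dynamic name_counter/1.

% fresh_name(-Name): var1, var2, ... in the order of the calls.
fresh_name(Name) :-
    (   retract(name_counter(N))
    ->  true
    ;   N = 0
    ),
    N1 is N + 1,
    assertz(name_counter(N1)),
    atom_concat(var, N1, Name).
```

In a fresh session, two successive calls return var1 and var2. The counter is
kept as a single dynamic fact, so uniqueness holds only as long as that fact is
not deleted.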
*/
:-
dynamic(varDefined/1).
newVariable(Var) :-
varDefined(N),
!,
N1 is N + 1,
retract(varDefined(N)),
assert(varDefined(N1)),
atom_concat('var', N1, Var).
newVariable(Var) :-
assert(varDefined(1)),
Var = 'var1'.
deleteVariable :- retract(varDefined(_)), fail.
deleteVariables :- not(deleteVariable).
/*
Arguments:
*/
%fapra 2015/16
/*
To consider distributed queries with predicates containing non-relation
objects, it's necessary to replicate the objects to the
involved workers.
For now we assume that every found object is contained in the distributed
part of the query (function of dmap or dmap2).
A possible later extension is to examine the distributed relations and
to share the objects only with workers containing parts of those relations.
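The shape of the generated commands can be illustrated with a small sketch. It
builds the same command text as ~createSharedClause~ in this file, but with
~format~ instead of repeated concatenation; the predicate names
~share\_command~ and ~share\_commands~ and the object name in the usage note
are hypothetical.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% share_command(+Object, -Command): Secondo command text replicating
% Object, in the same shape as built by createSharedClause.
share_command(Object, Command) :-
    format(atom(Command), 'share("~w",TRUE)', [Object]).

% share_commands(+Objects, +Query, -Commands): one share command per
% object, followed by the query itself.
share_commands(Objects, Query, Commands) :-
    maplist(share_command, Objects, Shares),
    append(Shares, [Query], Commands).
```

For an object named Roads, share_command('Roads', C) produces the atom
share("Roads",TRUE); with an empty object list only the query remains.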
*/
:-
dynamic(replicatedObject/1).
%distributed query without objects
replicateObjects(QueryPart, QueryPart) :-
findall(X,replicatedObject(X), ObjectList),
length(ObjectList,0),!.
%distributed query using objects in predicate
replicateObjects(QueryPart, Result) :-
findall(X,replicatedObject(X), ObjectList),
length(ObjectList,Length),
Length >0,
maplist(createSharedClause,ObjectList,CommandList),
append(CommandList,[QueryPart], Result).
createSharedClause(Obj, SharedCommand) :-
atom_concat('share("',Obj,StrObj),
atom_concat(StrObj,'",TRUE)',SharedCommand).
plan_to_atom_string(X, Result) :-
isDistributedQuery,
retractall(replicatedObject(_)),
plan_to_atom(X,QueryPart),
replicateObjects(QueryPart, Result),
!.
plan_to_atom_string(X, Result) :-
not(isDistributedQuery),
plan_to_atom(X,Result),
!.
plan_to_atom(obj(Object,_,u), Result) :-
isDistributedQuery,
upper(Object, UpperObject),
atom_concat(UpperObject, ' ', Result),
assertOnce(replicatedObject(UpperObject)),
!.
plan_to_atom(obj(Object,_,l), Result) :-
isDistributedQuery,
atom_concat(Object, ' ', Result),
assertOnce(replicatedObject(Object)),
!.
plan_to_atom(obj(Object,_,u), Result) :-
upper(Object, UpperObject),
atom_concat(UpperObject, ' ', Result),
!.
plan_to_atom(obj(Object,_,l), Result) :-
atom_concat(Object, ' ', Result),
!.
plan_to_atom(dot, Result) :-
atom_concat('.', ' ', Result),
!.
%end fapra 2015/16
plan_to_atom(rel(Name, _, l), Result) :-
atom_concat(Name, ' ', Result),
!.
plan_to_atom(rel(Name, _, u), Result) :-
upper(Name, Name2),
atom_concat(Name2, ' ', Result),
!.
plan_to_atom(res(N), Result) :-
atom_concat('res(', N, Res1),
atom_concat(Res1, ') ', Result),
!.
plan_to_atom(Term, Result) :-
is_list(Term), Term = [First | _], atomic(First), !,
atom_codes(TermRes, Term),
normalize_space(atom(Out),TermRes),
concat_atom(['"', Out, '"'], '', Result).
/*
Lists:
*/
plan_to_atom([X], AtomX) :-
plan_to_atom(X, AtomX),
!.
plan_to_atom([X | Xs], Result) :-
plan_to_atom(X, XAtom),
plan_to_atom(Xs, XsAtom),
concat_atom([XAtom, ', ', XsAtom], '', Result),
!.
/*
Operators with special syntax only. For the general rules covering standard
syntax, see below.
*/
plan_to_atom(sample(Rel, S, T), Result) :-
plan_to_atom(Rel, ResRel),
concat_atom([ResRel, 'sample[', S, ', ', T, '] '], '', Result),
!.
plan_to_atom(hashjoin(X, Y, A, B, C), Result) :-
plan_to_atom(X, XAtom),
plan_to_atom(Y, YAtom),
plan_to_atom(A, AAtom),
plan_to_atom(B, BAtom),
concat_atom([XAtom, YAtom, 'hashjoin[',
AAtom, ', ', BAtom, ', ', C, '] '], '', Result),
!.
plan_to_atom(sortmergejoin(X, Y, A, B), Result) :-
plan_to_atom(X, XAtom),
plan_to_atom(Y, YAtom),
plan_to_atom(A, AAtom),
plan_to_atom(B, BAtom),
concat_atom([XAtom, YAtom, 'sortmergejoin[',
AAtom, ', ', BAtom, '] '], '', Result),
!.
plan_to_atom(mergejoin(X, Y, A, B), Result) :-
plan_to_atom(X, XAtom),
plan_to_atom(Y, YAtom),
plan_to_atom(A, AAtom),
plan_to_atom(B, BAtom),
concat_atom([XAtom, YAtom, 'mergejoin[',
AAtom, ', ', BAtom, '] '], '', Result),
!.
plan_to_atom(groupby(Stream, GroupAttrs, Fields), Result) :-
plan_to_atom(Stream, SAtom),
plan_to_atom(GroupAttrs, GAtom),
plan_to_atom(Fields, FAtom),
concat_atom([SAtom, 'groupby[', GAtom, '; ', FAtom, ']'], '', Result),
!.
plan_to_atom(field(NewAttr, Expr), Result) :-
plan_to_atom(attrname(NewAttr), NAtom),
plan_to_atom(Expr, EAtom),
concat_atom([NAtom, ': ', EAtom], '', Result).
plan_to_atom(exactmatchfun(IndexName, Rel, attr(Name, R, Case)), Result) :-
plan_to_atom(Rel, RelAtom),
plan_to_atom(a(Name, R, Case), AttrAtom),
newVariable(T),
concat_atom(['fun(', T, ' : TUPLE) ', IndexName,
' ', RelAtom, 'exactmatch[attr(', T, ', ', AttrAtom, ')] '], Result),
!.
plan_to_atom(newattr(Attr, Expr), Result) :-
plan_to_atom(Attr, AttrAtom),
plan_to_atom(Expr, ExprAtom),
concat_atom([AttrAtom, ': ', ExprAtom], '', Result),
!.
plan_to_atom(rename(X, Y), Result) :-
plan_to_atom(X, XAtom),
concat_atom([XAtom, '{', Y, '} '], '', Result),
!.
plan_to_atom(fun(Params, Expr), Result) :-
params_to_atom(Params, ParamAtom),
plan_to_atom(Expr, ExprAtom),
concat_atom(['fun ', ParamAtom, ExprAtom], '', Result),
!.
plan_to_atom(attribute(X, Y), Result) :-
plan_to_atom(X, XAtom),
plan_to_atom(Y, YAtom),
concat_atom(['attr(', XAtom, ', ', YAtom, ')'], '', Result),
!.
plan_to_atom(increment(X), Result) :-
plan_to_atom(X, XAtom),
concat_atom([XAtom, '++'], '', Result),
!.
%fapra 2015/16
plan_to_atom(dloop2(PreArg1, PreArg2, PostArg1, PostArg2), Result) :-
plan_to_atom(PreArg1, PreArg1Atom),
plan_to_atom(PreArg2, PreArg2Atom),
plan_to_atom(PostArg1, PostArg1Atom),
plan_to_atom(PostArg2, PostArg2Atom),
concat_atom(
[PreArg1Atom,
PreArg2Atom,
'dloop2[',
PostArg1Atom, ', ',
PostArg2Atom, ']'], '', Result),
!.
%end fapra 2015/16
/*
Sort orders and attribute names.
*/
plan_to_atom(asc(Attr), Result) :-
plan_to_atom(Attr, AttrAtom),
atom_concat(AttrAtom, ' asc', Result).
plan_to_atom(desc(Attr), Result) :-
plan_to_atom(Attr, AttrAtom),
atom_concat(AttrAtom, ' desc', Result).
plan_to_atom(attr(Name, Arg, Case), Result) :-
plan_to_atom(a(Name, Arg, Case), ResA),
atom_concat('.', ResA, Result).
plan_to_atom(attrname(attr(Name, Arg, Case)), Result) :-
plan_to_atom(a(Name, Arg, Case), Result).
plan_to_atom(a(A:B, _, _), Result) :-
upper(B, B2),
concat_atom([B2, '_', A], Result),
!.
plan_to_atom(a(X, _, _), X2) :-
upper(X, X2),
!.
%fapra 2015/16
plan_to_atom(our_attrname(attr(Name, Arg, Case)), Result) :-
plan_to_atom(our_a(Name, Arg, Case), Result).
plan_to_atom(our_a(_:B, _, _), Result) :-
upper(B, B2),
concat_atom(['..', B2], Result),
!.
plan_to_atom(our_a(X, _, _), Result) :-
upper(X, X2),
concat_atom(['..', X2], Result),
!.
plan_to_atom(simple_attrname(attr(Name, Arg, Case)), Result) :-
plan_to_atom(simple_a(Name, Arg, Case), Result), !.
plan_to_atom(simple_a(_:B, _, _), B2) :-
upper(B, B2),
!.
plan_to_atom(simple_a(X, _, _), X2) :-
upper(X, X2),
!.
plan_to_atom(extendstream(A, B, C), Plan) :-
plan_to_atom(A, PlanA),
plan_to_atom(B, PlanB),
plan_to_atom(C, PlanC),
concat_atom([PlanA, ' ', 'extendstream(',
PlanB, ': ', PlanC, ')'], Plan).
%end fapra 2015/16
/*
Translation of operators driven by predicate ~secondoOp~ in
file ~opSyntax~. There are rules for
* postfix, 1 or 2 arguments
* postfix followed by one argument in square brackets, in total 2
or 3 arguments
* prefix, 2 arguments
Other syntax, if not default (see below) needs to be coded explicitly.
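As an illustration, assume a hypothetical operator ~op~ and arguments a, b, c
that translate to themselves. Depending on which (assumed) ~secondoOp~ fact is
present, the rules below would then produce the following atoms. Note the
trailing blank in each result.
----    Term          Fact in opSyntax (assumed)           Result
        op(a)         secondoOp(op, postfix, 1)            'a op '
        op(a, b)      secondoOp(op, postfix, 2)            'a b op '
        op(a, b)      secondoOp(op, postfixbrackets, 2)    'a op[b] '
        op(a, b, c)   secondoOp(op, postfixbrackets, 3)    'a b op[c] '
        op(a, b)      secondoOp(op, prefix, 2)             'op(a,b) '
----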
*/
plan_to_atom(Term, Result) :-
functor(Term, Op, 1),
secondoOp(Op, postfix, 1),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
concat_atom([Res1, ' ', Op, ' '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 2),
secondoOp(Op, postfix, 2),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
concat_atom([Res1, ' ', Res2, ' ', Op, ' '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 2),
secondoOp(Op, postfixbrackets, 2),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
concat_atom([Res1, ' ', Op, '[', Res2, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 3),
secondoOp(Op, postfixbrackets, 3),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 2),
secondoOp(Op, prefix, 2),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
concat_atom([Op, '(', Res1, ',', Res2, ') '], '', Result),
!.
%fapra 2015/16
/*
Additional plan\_to\_atom rules to map Distributed2-operators.
*/
plan_to_atom(Term, Result) :-
functor(Term, Op, 1),
secondoOp(Op, prefix, 1),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
concat_atom([Op, '(', Res1, ') '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 4),
secondoOp(Op, prefix, 4),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
concat_atom([Op, '(', Res1, ',', Res2, ', ', Res3,
', ', Res4, ') '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 4),
secondoOp(Op, postfixbrackets, 4),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
Res4, ']'], '' , Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 3),
secondoOp(Op, postfixbrackets2, 3),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 4),
secondoOp(Op, postfixbrackets3, 4),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3,', ',
Res4, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 5),
secondoOp(Op, postfixbrackets3, 5),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
arg(5, Term, Arg5),
plan_to_atom(Arg5, Res5),
concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
Res4,', ',Res5, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 5),
secondoOp(Op, postfixbrackets4, 5),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
arg(5, Term, Arg5),
plan_to_atom(Arg5, Res5),
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ',
Res4,', ',Res5, '] '], '', Result),
!.
plan_to_atom(Term, Result) :-
functor(Term, Op, 6),
secondoOp(Op, postfixbrackets5, 6),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
arg(2, Term, Arg2),
plan_to_atom(Arg2, Res2),
arg(3, Term, Arg3),
plan_to_atom(Arg3, Res3),
arg(4, Term, Arg4),
plan_to_atom(Arg4, Res4),
arg(5, Term, Arg5),
plan_to_atom(Arg5, Res5),
arg(6, Term, Arg6),
plan_to_atom(Arg6, Res6),
concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ', Res4,', ',
Res5,', ',Res6, '] '], '', Result),
!.
%end fapra 2015/16
/*
Generic rules. Operators that are not
recognized are assumed to be:
* 1 argument: prefix
* 2 arguments: infix
* 3 arguments: prefix
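These defaults can be sketched in isolation as follows. This is a stand-alone
illustration for ground terms, not part of the optimizer: the predicate name
sketch is hypothetical, and unlike the rules below it treats every arity
other than 2 as prefix.
----    sketch(T, A) :- atomic(T), !, term_to_atom(T, A).
        sketch(T, A) :-                 % 2 arguments: infix, parenthesized
            T =.. [Op, X, Y], !,
            sketch(X, XA), sketch(Y, YA),
            atomic_list_concat(['(', XA, ' ', Op, ' ', YA, ')'], A).
        sketch(T, A) :-                 % any other arity: prefix
            T =.. [Op | Args],
            maplist(sketch, Args, As),
            atomic_list_concat(As, ', ', ArgsA),
            atomic_list_concat([Op, '(', ArgsA, ')'], A).
----
With these clauses, sketch(foo(a, bar(b, c)), A) binds A to
'(a foo (b bar c))' and sketch(foo(a, b, c), A) binds A to 'foo(a, b, c)'.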
*/
plan_to_atom(Term, Result) :-
functor(Term, Op, 1),
arg(1, Term, Arg1),
plan_to_atom(Arg1, Res1),
concat_atom([Op, '(', Res1, ')'], '', Result).
plan_to_atom(Term, Result) :-
functor(Term, Op, 2),
arg(1, Term, Arg1),
arg(2, Term, Arg2),
plan_to_atom(Arg1, Res1),
plan_to_atom(Arg2, Res2),
concat_atom(['(', Res1, ' ', Op, ' ', Res2, ')'], '', Result).
plan_to_atom(Term, Result) :-
functor(Term, Op, 3),
arg(1, Term, Arg1),
arg(2, Term, Arg2),
arg(3, Term, Arg3),
plan_to_atom(Arg1, Res1),
plan_to_atom(Arg2, Res2),
plan_to_atom(Arg3, Res3),
concat_atom([Op, '(', Res1, ', ', Res2, ', ', Res3, ')'], '', Result).
plan_to_atom(X, Result) :-
atomic(X),
term_to_atom(X, Result),
!.
plan_to_atom(X, _) :-
write('Error while converting term: '),
write(X),
nl.
params_to_atom([], ' ').
params_to_atom([param(Var, Type) | Params], Result) :-
type_to_atom(Type, TypeAtom),
params_to_atom(Params, ParamsAtom),
concat_atom(['(', Var, ': ', TypeAtom, ') ', ParamsAtom], '', Result),
!.
type_to_atom(tuple, 'TUPLE').
type_to_atom(tuple2, 'TUPLE2').
type_to_atom(group, 'GROUP').
/*
5.2 Optimization Rules
We introduce a predicate [=>] which can be read as ``translates into''.
5.2.1 Translation of the Arguments of an Edge of the POG
If the argument is of the form res(N), then it is a stream already and can be
used unchanged. If it is of the form arg(N), then it is a base relation; a
~feed~ must be applied and possibly a ~rename~.
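For example, assume the non-distributed case and the illustrative facts
argument(1, rel(plz, *, l)) and argument(2, rel(plz, p2, l)). Together with
the fact ordered(plz, ort), the rules below then give
----    arg(1) => [feed(rel(plz, *, l)), [order(ort)]]
        arg(2) => [rename(feed(rel(plz, p2, l)), p2), [order(p2:ort)]]
----
For a relation without an ~ordered~ fact, none takes the place of the
attribute in the order property.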
*/
ordered(plz, ort).
ordered(orte, ort).
ordered(staedte, sName).
ordered(thousand, no).
ordered(ten, no).
order(Name, Attr) :-
ordered(Name, Attr), !.
order(_, none).
% The following rule is needed for listing all plan edges or cost edges,
% not for optimization as such.
res(N) => [res(N), none].
% arg(N) => feed(rel(Name, *, Case)) :-
% argument(N, rel(Name, *, Case)), !.
% arg(N) => rename(feed(rel(Name, Var, Case)), Var) :-
% argument(N, rel(Name, Var, Case)).
[res(N), P] => [res(N), P].
% Translate into distributed argument
arg(N) => [Plan, Properties] :-
isDistributedQuery,
!,
distributedarg(N) => [Plan, Properties].
/*
Translation into distributed arguments. The properties we use are:
~distribution~(DistributionType, DistributionAttribute, DistributionParameter):
DistributionType is share, spatial, modulo, function or random;
DistributionAttribute is the attribute of the relation used to determine
on which partition(s) to put a given tuple (in theory this could also be a list);
DistributionParameter is the parameter used for the distribution (such as a grid
or a function object / operator).
~distributedobjecttype~(Type) (Type is darray, dfarray or dfmatrix).
~disjointpartitioning~ signals that, if we treat a partition as the multi set
of the tuples it contains, the union of all partitions is the original relation
(put differently, in as far as duplicates exist, they have been present in the
original relation).
Since some Secondo plans eliminate duplicates anyway, they can do without their
arguments having this property (e.g. spatial join).
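A property list for a relation distributed by modulo on an attribute might
thus look as follows. The object type, the attribute name and the parameter
are illustrative assumptions, not taken from a real database.
----    [distribution(modulo, plz, DistParam),
         distributedobjecttype(darray),
         disjointpartitioning]
----
For a spatially distributed relation that has not been filtered on the
attribute original, disjointpartitioning is absent:
----    [distribution(spatial, geodata, grid),
         distributedobjecttype(dfarray)]
----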
*/
% Translate into object found in SEC2DISTRIBUTED.
distributedarg(N) => [ObjName, X] :-
X =[distribution(DistType, DCDistAttr, DistParam),
distributedobjecttype(DistObjType),disjointpartitioning],
argument(N, Rel),
Rel = rel(Name, _, _),
distributedRels(rel(Name, _, _), ObjName, DistObjType,
DistType, DistAttr, DistParam),
not(DistType = spatial),
downcase_atom(DistAttr, DCDistAttr).
% Spatial partitioning without filtering on the attribute original
% does not in general yield disjoint partitions.
distributedarg(N) => [ObjName,
[distribution(DistType, DCDistAttr, DistParam),
distributedobjecttype(DistObjType)]] :-
argument(N, Rel),
Rel = rel(Name, _, _),
distributedRels(rel(Name, _, _), ObjName, DistObjType,
DistType, DistAttr, DistParam),
DistType = spatial,
downcase_atom(DistAttr, DCDistAttr).
% Filter spatially distributed argument on attribute original.
distributedarg(N) => [Plan,
[distribution(spatial, DCDistAttr, DistParam),
distributedobjecttype(DistObjType), disjointpartitioning]] :-
argument(N, Rel),
Rel = rel(Name, _, _),
distributedRels(rel(Name, _, _), ObjName, DistObjType,
spatial, DistAttr, DistParam),
downcase_atom(DistAttr, DCDistAttr),
Plan = dmap(ObjName, " ", filter(feed(rel('.', *, u)), attr(original, l, u))).
/*
Redistribute the argument relation so that it is spatially distributed by the
provided attribute. The distribution type must be spatial and the
attribute must be provided as a ground term. A grid to be used for the
distribution may be provided. If it is not, we fall back to the grid
object called grid, which must then exist in the database.
Yields a dfarray or a dfmatrix.
*/
distributedarg(N) => [Plan, [distribution(DistType,DistAttr,Grid),
distributedobjecttype(DistObjType)]] :-
% only use this in one direction. Might be generalized in the future.
ground(DistAttr),
ground(DistType),
% if we do not have a grid specified, use the grid-object
(ground(Grid) -> true; Grid = grid),
DistType = spatial,
argument(N, Rel),
Rel = rel(Name, _, _),
distributedRels(rel(Name, _, _), ObjName, _, OriginalDistType, _, _),
% cannot redistribute replicated relations
not(OriginalDistType = share),
spelled(Name:DistAttr, AttrTerm),
InnerPlan = partitionF(ObjName, " ", extendstream(feed(rel('.', *, u)),
attrname(attr(cell, *, u)), cellnumber(bbox(AttrTerm), Grid)),
attr('.Cell', *, u), 0), %there should be another option to add the 2nd dot
% collect into dfarray or simply be content with the dfmatrix
(DistObjType = dfarray,
Plan = collect2(InnerPlan, " ", 1238);
DistObjType = dfmatrix,
Plan = InnerPlan).
arg(N) => [feed(rel(Name, *, Case)), [order(X)]] :-
argument(N, rel(Name, *, Case)), !,
order(Name, X).
arg(N) => [rename(feed(rel(Name, Var, Case)), Var), [order(Var:X)]] :-
argument(N, rel(Name, Var, Case)), !,
order(Name, X).
/*
5.2.2 Translation of Selections
*/
%fapra 2015/16
% Translate selection into distributed selection.
select(Arg, Y) => X :-
isDistributedQuery,
!, /* Operand is distributed. Do not translate into local selection. */
distributedselect(Arg, Y) => X.
%end fapra 2015/16
% select(Arg, pr(Pred, _)) => filter(ArgS, Pred) :-
% Arg => ArgS.
% select(Arg, pr(Pred, _, _)) => filter(ArgS, Pred) :-
% Arg => ArgS.
select(Arg, pr(Pred, _)) => [filter(ArgS, Pred), P] :-
Arg => [ArgS, P].
select(Arg, pr(Pred, _, _)) => [filter(ArgS, Pred), P] :-
Arg => [ArgS, P].
/*
Translation of selections using indices.
*/
select(arg(N), Y) => [X, P] :-
indexselect(arg(N), Y) => [X, P], !.
select(arg(N), Y) => [X, [none]] :-
indexselect(arg(N), Y) => X.
indexselect(arg(N), pr(attr(AttrName, Arg, Case) = Y, Rel)) => X :-
indexselect(arg(N), pr(Y = attr(AttrName, Arg, Case), Rel)) => X.
indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
[exactmatch(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
:-
argument(N, rel(Name, *, Case)),
!,
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
[rename(exactmatch(IndexName, rel(Name, Var, Case), Y), Var),
[order(AttrName)]]
:-
argument(N, rel(Name, Var, Case)),
!,
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
indexselect(arg(N), pr(attr(AttrName, Arg, Case) <= Y, Rel)) => X :-
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, Case), Rel)) => X.
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
[leftrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
:-
argument(N, rel(Name, *, Case)),
!,
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
[rename(leftrange(IndexName, rel(Name, Var, Case), Y), Var),
[order(AttrName)]]
:-
argument(N, rel(Name, Var, Case)),
!,
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
indexselect(arg(N), pr(attr(AttrName, Arg, Case) >= Y, Rel)) => X :-
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, Case), Rel)) => X.
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
[rightrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
:-
argument(N, rel(Name, *, Case)),
!,
hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).
indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
[rename(rightrange(IndexName, rel(Name, Var, Case), Y), Var),
[order(AttrName)]]
:-
argument(N, rel(Name, Var, Case)),
!,
hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).
%fapra 2015/16
/*
Translation of selections that concern distributed relations.
*/
% Commutativity of intersects.
distributedselect(ObjName,
pr(Val intersects attr(Attr, Arg, Case), Rel)) => X :-
distributedselect(ObjName, pr(attr(Attr, Arg, Case) intersects Val, Rel))
=> X.
% Use spatial index for an intersection predicate.
distributedselect(arg(N), Pred)
=> [dmap2(IndexObj, RelObj, " ",
filter(filter(Intersection, InnerPred), attr(original, l, u)), 1238),
[distributedobjecttype(dfarray), disjointpartitioning]] :-
argument(N, Rel),
Pred = pr(Attr intersects Val, rel(_, Var, _)),
Pred = pr(InnerPred, _),
% We need a materialized argument relation to use the index
distributedRels(Rel, RelObj, _, _, _),
RelObj = rel(RelObjName, _, _),
% Lookup an rtree index for the relation + attribute
downcase_atom(RelObjName, DCRelObjName),
attrnameDCAtom(Attr, DCAttr),
distributedIndex(DCRelObjName, DCAttr, rtree, DCIndexObjName),
% Check the database object for the correct spelling
spelledObj(DCIndexObjName, IndexObjName,_, Case),
IndexObj = rel(IndexObjName, *, Case),
IndParam = rel('.', *, u),
RelParam = rel('..', *, u),
renameStream(windowintersects(IndParam, RelParam, Val),
Var, Intersection).
% Use btree index for a starts predicate.
distributedselect(arg(N), pr(Attr starts Val, rel(_, Var, _)))
=> [dmap2(IndexObj, RelObj, " ",
Range, 1238), [distributedobjecttype(dfarray), disjointpartitioning]] :-
argument(N, Rel),
distributedRels(Rel, RelObj, _, _, _),
RelObj = rel(RelObjName, _, _),
downcase_atom(RelObjName, DCRelObjName),
attrnameDCAtom(Attr, DCAttr),
% Lookup a btree index for the relation + attribute
distributedIndex(DCRelObjName, DCAttr, btree, DCIndexObjName),
spelledObj(DCIndexObjName, IndexObjName,_, Case),
IndexObj = rel(IndexObjName, *, Case),
IndParam = rel('.', *, u),
RelParam = rel('..', *, u),
renameStream(range(IndParam, RelParam, Val, increment(Val)),
Var, Range).
% Generic case.
distributedselect(Arg, pr(Cond, rel(_,Var,_))) =>
[dmap(ArgS," ", filter(Param,Cond)), P] :-
Arg => [ArgS, P],
% we accept darrays and dfarrays
(member(distributedobjecttype(dfarray), P) ;
member(distributedobjecttype(darray), P)),
% partitions of the argument relations need to be disjoint
member(disjointpartitioning, P),
% rename if needed
feedRenameRelation(rel('.',*, u), Var, Param).
%end fapra 2015/16
/*
Here ~ArgS~ is meant to indicate ``argument stream''.
5.2.3 Translation of Joins
A join can always be translated to filtering the Cartesian product.
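For example, assume the non-distributed case and the illustrative facts
argument(1, rel(staedte, *, u)) and argument(2, rel(plz, *, l)). With the fact
ordered(staedte, sName), the first non-distributed rule below then translates
join(arg(1), arg(2), pr(Pred, R1, R2)) into
----    [filter(product(feed(rel(staedte, *, u)),
                        feed(rel(plz, *, l))), Pred),
         [order(sName)]]
----
The property list is taken over from the first argument stream.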
*/
%fapra 2015/16
% We have two variants of joins in place. See whether the first one can
% handle the join. If so, cut and use its result.
join(Arg1, Arg2, Pred) => SecondoPlan:-
isDistributedQuery,
distributedjoin(Arg1, Arg2, Pred) => _, !,
distributedjoin(Arg1, Arg2, Pred) => SecondoPlan.
join(Arg1, Arg2, Pred) => SecondoPlan:-
isDistributedQuery, !,
Arg1 = arg(N1),
Arg2 = arg(N2),
not(N1=N2),
Arg1 => [ObjName1, _],
Arg2 => [ObjName2, _],
distributedRels(_, ObjName1, _, _, _),
distributedRels(_, ObjName2, _, _, _),
distributedjoin(ObjName1, ObjName2, Pred) => SecondoPlan.
%end fapra 2015/16
join(Arg1, Arg2, pr(Pred, _, _)) => [filter(product(Arg1S, Arg2S), Pred), P1] :-
Arg1 => [Arg1S, P1],
Arg2 => [Arg2S, _].
/*
Index joins:
*/
join(Arg1, arg(N), pr(X=Y, _, _)) => [loopjoin(Arg1S, MatchExpr), P1] :-
isOfSecond(Attr2, X, Y),
isNotOfSecond(Expr1, X, Y),
argument(N, RelDescription),
hasIndex(RelDescription, Attr2, IndexName),
Arg1 => [Arg1S, P1],
exactmatch(IndexName, arg(N), Expr1) => MatchExpr.
join(arg(N), Arg2, pr(X=Y, _, _)) => [loopjoin(Arg2S, MatchExpr), P2] :-
isOfFirst(Attr1, X, Y),
isNotOfFirst(Expr2, X, Y),
argument(N, RelDescription),
hasIndex(RelDescription, Attr1, IndexName),
Arg2 => [Arg2S, P2],
exactmatch(IndexName, arg(N), Expr2) => MatchExpr.
exactmatch(IndexName, arg(N), Expr) =>
exactmatch(IndexName, rel(Name, *, Case), Expr) :-
argument(N, rel(Name, *, Case)),
!.
exactmatch(IndexName, arg(N), Expr) =>
rename(exactmatch(IndexName, rel(Name, Var, Case), Expr), Var) :-
argument(N, rel(Name, Var, Case)),
!.
/*
For a join with a predicate of the form X = Y we can distinguish four cases
depending on whether X and Y are attributes or more complex expressions. For
example, a query condition might be ``PLZA = PLZB'' in which case we have just
attribute names on both sides of the predicate operator, or it could be ``PLZA =
PLZB + 1''. In the latter case we have an expression on the right hand side.
This can still be translated to a hashjoin, for example, by first extending the
second argument by a new attribute containing the value of the expression. For
example, the query
---- select *
from plz as p1, plz as p2
where p1.PLZ = p2.PLZ + 1
----
can be translated to
---- plz feed {p1} plz feed {p2} extend[newPLZ: PLZ_p2 + 1]
hashjoin[PLZ_p1, newPLZ, 997]
remove[newPLZ]
consume
----
This technique is built into the optimizer as follows. We first define the four
cases (at the moment for equijoin only; this may later be extended) which also
translate the arguments into streams. Then the rules translating to join
methods can be formulated independently from this general technique. They
translate terms of the form join00(Arg1Stream, Arg2Stream, Pred).
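As a sketch of the second case (attribute on the left, expression on the
right), take the query above with the following terms; the attribute
spellings are assumed for illustration.
----    X = attr(p1:plz, 1, u)
        Y = attr(p2:plz, 2, u) + 1
----
The rule extends the second stream by a new attribute, which plays the role
of newPLZ above, and evaluates
----    join00([Arg1S, P1],
               [extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]), none],
               pr(X = attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P]
----
Its result removes the helper attribute again:
----    [remove(JoinPlan, [attrname(attr(r_expr, 2, l))]), P]
----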
*/
join(Arg1, Arg2, pr(X=Y, R1, R2)) => [JoinPlan, P] :-
X = attr(_, _, _),
Y = attr(_, _, _), !,
Arg1 => [Arg1S, P1],
Arg2 => [Arg2S, P2],
join00([Arg1S, P1], [Arg2S, P2], pr(X=Y, R1, R2)) => [JoinPlan, P].
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
[remove(JoinPlan, [attrname(attr(r_expr, 2, l))]), P] :-
X = attr(_, _, _),
not(Y = attr(_, _, _)), !,
Arg1 => [Arg1S, P1],
Arg2 => [Arg2S, _],
Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
join00([Arg1S, P1], [Arg2Extend, none], pr(X=attr(r_expr, 2, l), R1, R2))
=> [JoinPlan, P].
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
[remove(JoinPlan, [attrname(attr(l_expr, 1, l))]), P] :-
not(X = attr(_, _, _)),
Y = attr(_, _, _), !,
Arg1 => [Arg1S, _],
Arg2 => [Arg2S, P2],
Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
join00([Arg1Extend, none], [Arg2S, P2], pr(attr(l_expr, 1, l)=Y, R1, R2))
=> [JoinPlan, P].
join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
[remove(JoinPlan, [attrname(attr(l_expr, 1, l)),
attrname(attr(r_expr, 2, l))]), P] :-
not(X = attr(_, _, _)),
not(Y = attr(_, _, _)), !,
Arg1 => [Arg1S, _],
Arg2 => [Arg2S, _],
Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
join00([Arg1Extend, none], [Arg2Extend, none],
pr(attr(l_expr, 1, l)=attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P].
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [sortmergejoin(Arg1S, Arg2S,
attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _).
% use order property
join00([Arg1S, P1], [Arg2S, P2], pr(X = Y, _, _)) => [mergejoin(Arg1S, Arg2S,
attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _),
select(order(Name1), P1, _),
select(order(Name2), P2, _).
% hashjoin has asymmetric cost, therefore consider both orders
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg1S, Arg2S,
attrname(Attr1), attrname(Attr2), 999997), [none]] :-
isOfFirst(Attr1, X, Y),
isOfSecond(Attr2, X, Y).
join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg2S, Arg1S,
attrname(Attr2), attrname(Attr1), 999997), [none]] :-
isOfFirst(Attr1, X, Y),
isOfSecond(Attr2, X, Y).
%fapra 2015/16
% Translate a distributed spatial join with an intersection predicate.
distributedjoin(Arg1, Arg2, Pred)
=> [SecondoPlan, [DistAttr1, distributedobjecttype(dfarray),
disjointpartitioning]]:-
Pred = pr(Attr1 intersects Attr2, rel(_, Rel1Var, _), rel(_, Rel2Var, _)),
isOfFirst(Attr1, Rel1, Rel2),
isOfSecond(Attr2, Rel1, Rel2),
attrnameDCAtom(Attr1, Attr1Name),
attrnameDCAtom(Attr2, Attr2Name),
% allow using replicated + any distribution or both distributed by
% join predicate
((DistAttr1 = distribution(_, _, _),
DistAttr2 = distribution(share, _, _));
(DistAttr1 = distribution(spatial, Attr2Name, GridObj),
DistAttr2 = distribution(spatial, Attr1Name, GridObj))),
Arg1 => [ObjName1, [DistAttr1| Props1]],
Arg2 => [ObjName2, [DistAttr2| Props2]],
% rename the parameter relations if needed
feedRenameRelation(param1, Rel1Var, Param1Plan),
feedRenameRelation(param2, Rel2Var, Param2Plan),
% rename the cell attribute if needed
renamedRelAttr(attr(cell, 1, u), Rel1Var, CellAttr1),
renamedRelAttr(attr(cell, 2, u), Rel2Var, CellAttr2),
Scheme =
filter(
filter(
filter(
itSpatialJoin(
Param1Plan,
Param2Plan,
attrname(Attr1),
attrname(Attr2)
),
CellAttr1 = CellAttr2
),
gridintersects(
GridObj,
bbox(Attr1),
bbox(Attr2),
CellAttr1
)
),
Attr1 intersects Attr2
),
% We have the actual query now. Distribute it to the workers.
distributedquery([ObjName1, [DistAttr1| Props1]],
[ObjName2, [DistAttr2| Props2]], Scheme)
=> SecondoPlan.
/*
----
distributedquery(Arg1, Arg2, QueryScheme) =>
----
Distribute the query given by QueryScheme to the workers. The scheme has
the placeholders param1 and param2 for its arguments. The actual arguments
are given in Arg1 and Arg2, each as a pair of a plan and a property list.
Several cases might arise depending on Arg1's and
Arg2's distribution type (replicated vs partitioned) and their distributed object
type (d(f)array vs dfmatrix).
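The cases handled by the rules below can be summarized as follows, where
Scheme stands for the query scheme after the placeholders have been replaced:
----    Arg1          Arg2          object types       generated plan
        replicated    partitioned   Arg2 d(f)array     dmap(Arg2S, " ", Scheme)
        partitioned   replicated    Arg1 d(f)array     dmap(Arg1S, " ", Scheme)
        partitioned   partitioned   both d(f)array     dmap2(Arg1S, Arg2S, " ", Scheme, 1238)
        partitioned   partitioned   both dfmatrix      areduce2(Arg1S, Arg2S, "", Scheme, 1238)
        replicated    replicated    any                none (the rule fails)
----
The placeholder replacement itself can be pictured with the following
stand-alone sketch for ground terms. The predicate subst is hypothetical and
only stands in for substituteSubterm, whose definition is not shown in this
section.
----    subst(Old, New, Old, New) :- !.
        subst(Old, New, T, R) :-
            compound(T), !,
            T =.. [F | Args],
            maplist(subst(Old, New), Args, NewArgs),
            R =.. [F | NewArgs].
        subst(_, _, T, T).
----
With these clauses, subst(param1, rel('.', *, u), filter(feed(param1), p), R)
binds R to filter(feed(rel('.', *, u)), p).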
*/
% Arg1 replicated, Arg2 partitioned, Arg2 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
not(isPartitioned([Arg1S, P1])),
isPartitioned([Arg2S, P2]),
not(isDfmatrix([Arg2S, P2])),
substituteSubterm(param2, rel('.', *, u), QueryScheme, QueryScheme1),
substituteSubterm(param1, Arg1S, QueryScheme1, QueryScheme2),
Query = dmap(Arg2S, " ", QueryScheme2), !.
% Arg2 replicated, Arg1 partitioned, Arg1 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
isPartitioned([Arg1S, P1]),
not(isPartitioned([Arg2S, P2])),
not(isDfmatrix([Arg1S, P1])),
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
substituteSubterm(param2, Arg2S, QueryScheme1, QueryScheme2),
Query = dmap(Arg1S, " ", QueryScheme2), !.
% Arg1 partitioned, Arg2 partitioned, both are d(f)arrays
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
isPartitioned([Arg1S, P1]),
isPartitioned([Arg2S, P2]),
not(isDfmatrix([Arg2S, P2])),
not(isDfmatrix([Arg1S, P1])),
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
Query = dmap2(Arg1S, Arg2S, " ", QueryScheme2, 1238), !.
% Arg1 partitioned, Arg2 partitioned, both dfmatrices
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
isPartitioned([Arg1S, P1]),
isPartitioned([Arg2S, P2]),
isDfmatrix([Arg2S, P2]),
isDfmatrix([Arg1S, P1]),
substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
Query = areduce2(Arg1S, Arg2S, "", QueryScheme2, 1238), !.
% Arg1 replicated, Arg2 replicated
distributedquery([Arg1S, P1], [Arg2S, P2], _) => _ :-
not(isPartitioned([Arg1S, P1])),
not(isPartitioned([Arg2S, P2])),
write('A potential plan edge could not be generated because '),
write('queries with two replicated arguments '),
write('cannot be formulated using DistributedAlgebra as of now.\n'),
fail.
%Equijoin
distributedjoin(ObjName1, ObjName2, pr(attr(X1,X2,X3)=attr(Y1,Y2,Y3),
Rel1, Rel2))
=> [SecondoPlan, [none]] :-
X=attr(X1,X2,X3),
Y=attr(Y1,Y2,Y3),
Rel1 = rel(_, _, _),
Rel2 = rel(_, _, _),
isOfFirst(_, X, Y),
isOfSecond(_, X, Y),
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
SecondoPlan, false).
%Standard Join
distributedjoin(ObjName1, ObjName2, pr(Pred,Rel1, Rel2))
=> [SecondoPlan, [none]] :-
Rel1 = rel(_, _, _),
Rel2 = rel(_, _, _),
buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
SecondoPlan, false).
/*
It is assumed that if "function" is specified in
the system relation "SEC2DISTRIBUTED", then a deterministic
function of the specified attribute was used.
The functions used for partitioning the two relations are assumed
to yield the same result for the same attribute value, e.g.
because both relations were partitioned using the same hashvalue function.
*/
/*
Equijoin Secondo plan for the case that both relations are partitioned by the
join attribute using modulo.
Modulo is more efficient than the other options:
we do not need to repartition, and there is no
need to calculate the worker on which a tuple is located, since
the worker number is already the modulo value. Thus it is
slightly more efficient than any other function (e.g. hash).
Should it become possible in the future to deploy different Secondo plans
to different workers (i.e. to tell each worker which part of the shared
relation it should use), having two replicated relations
would be the most efficient solution.
*/
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
SecondoPlan, _):-
plan_to_atom(simple_attrname(X), X2),
plan_to_atom(simple_attrname(Y), Y2),
distributedRels(_, ObjName1, _, 'modulo', X2),
distributedRels(_, ObjName2, _, 'modulo', Y2),
Rel1 = rel(_, Rel1Var, _),
Rel2 = rel(_, Rel2Var, _),
% rename the parameter relations of the dmapped plan if needed
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
!,
SecondoPlan = dmap2(ObjName1, ObjName2, " ",
hashjoin(Feed1, Feed2,attrname(X),
attrname(Y), 999997), 1238).
% Equijoin Secondo plan for the case that both relations are partitioned
% by the join attribute using a function
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
SecondoPlan, _):-
plan_to_atom(simple_attrname(X), X2),
plan_to_atom(simple_attrname(Y), Y2),
distributedRels(_, ObjName1, _, 'function', X2),
distributedRels(_, ObjName2, _, 'function', Y2),
Rel1 = rel(_, Rel1Var, _),
Rel2 = rel(_, Rel2Var, _),
% rename the parameter relations of the dmapped plan if needed
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
!,
SecondoPlan = dmap2(ObjName1, ObjName2, " ",
hashjoin(Feed1, Feed2,attrname(X),
attrname(Y), 999997), 1238).
%Equijoin Secondo Plan for one replicated (relation) and
%one partitioned (darray/dfarray)
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
SecondoPlan, _):-
distributedRels(_ ,ObjName1,_ ,'share',_ ),
isPartitioned(ObjName2),
Rel1 = rel(_, Rel1Var, _),
Rel2 = rel(_, Rel2Var, _),
% rename the parameter relations of the dmapped plan if needed
feedRenameRelation(ObjName1, Rel1Var, Feed1),
feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
!,
SecondoPlan = dmap(ObjName2, " ",
hashjoin(Feed1,
Feed2,
attrname(X), attrname(Y), 999997)).
%Commutativity for Equijoin & Standard Join
buildSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
SecondoPlan, false):-
buildSecondoPlan(ObjName2, ObjName1, pr(Pred, Rel1, Rel2),
SecondoPlan, true).
%Equijoin Secondo Plan for repartitioning 2 "wrongly"
%partitioned relations (darray/dfarray)
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
SecondoPlan, _):-
isPartitioned(ObjName1),
isPartitioned(ObjName2),
Rel1 = rel(_, Rel1Var, _),
Rel2 = rel(_, Rel2Var, _),
% rename the parameter relations of the dmapped plan if needed
feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
!,
SecondoPlan = dmap2(
collect2(
partitionF(ObjName1, "LeftPartOfJoin", feed(rel('.',*,u)),
hashvalue(our_attrname(X), 999997), 0),
"L", 1238),
collect2(
partitionF(ObjName2, "RightPartOfJoin", feed(rel('.',*,u)),
hashvalue(our_attrname(Y), 999997), 0),
"R", 1238),
" ",
hashjoin(Feed1,
Feed2,
attrname(X), attrname(Y), 999997),
1238).
% Equijoin of two replicated relations: not supported, report and fail
buildSecondoPlan(ObjName1, ObjName2, pr(attr(_,_,_)=attr(_,_,_), _, _),
_, true):-
distributedRels(_ ,ObjName1,_ ,'share',_ ),
distributedRels(_, ObjName2, _,'share', _),
!,
write('Both relations are replicated, the query cannot be executed!'),
false.
% Plan yields a dfmatrix
isDfmatrix([_, P]) :-
member(distributedobjecttype(dfmatrix), P).
% Plan yields a partitioned distribution.
isPartitioned([_, P]):-
is_list(P), !,(
member(distribution('function', _, _), P);
member(distribution('modulo', _, _), P);
member(distribution('random', _, _), P);
member(distribution('spatial', _, _), P)).
% Secondo object represents a partitioned distribution.
isPartitioned(ObjName):-
distributedRels(_, ObjName,_ ,'function', _);
distributedRels(_, ObjName,_ ,'modulo', _);
distributedRels(_, ObjName,_ ,'random', _);
distributedRels(_, ObjName,_ ,'spatial', _).
%Standard Join Secondo Plan (one replicated, one partitioned)
buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
SecondoPlan, _):-
(DistArgrel = ObjName2, ReplArgrel = ObjName1;
DistArgrel = ObjName1, ReplArgrel = ObjName2),
distributedRels(_, ReplArgrel, _ , 'share', _),
isPartitioned(DistArgrel),
Rel1 = rel(_, Rel1Var, _),
Rel2 = rel(_, Rel2Var, _),
% rename the parameter relations of the dmapped plan if needed
feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
feedRenameRelation(ReplArgrel, Rel1Var, Feed1),
!,
SecondoPlan = dmap(DistArgrel, " ",
filter(product(Feed2,Feed1), Pred)).
%Standard Join Secondo Plan, both are partitioned
buildStdSecondoPlan(ObjName1, ObjName2, pr(_, _, _),
_, true):-
isPartitioned(ObjName1),
isPartitioned(ObjName2),
!,
write('The joined relations are both partitioned and thus'),
write(' not distributed correctly for standard join.'),
false.
%Standard Join Secondo Plan, if repartitioning is needed
buildStdSecondoPlan(_, _, pr(_, _, _), _, true):-
!,
write('The joined relations are not distributed correctly '),
write('for standard join.'),
false.
%end fapra 2015/16
/*
---- isOfFirst(Attr, X, Y)
isOfSecond(Attr, X, Y)
----
~Attr~, which is equal to either ~X~ or ~Y~, is an attribute of the first (second) relation.
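For example, take the following illustrative terms, where the right hand
side is an expression rather than an attribute.
----    X = attr(plz, 1, u)
        Y = attr(ort, 2, u) + 1
----
The predicates defined below then behave as follows:
----    isOfFirst(A, X, Y)        binds A to attr(plz, 1, u)
        isOfSecond(A, X, Y)       fails, since Y is not an attribute
        isNotOfFirst(E, X, Y)     binds E to attr(ort, 2, u) + 1
----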
*/
isOfFirst(X, X, _) :- X = attr(_, 1, _).
isOfFirst(Y, _, Y) :- Y = attr(_, 1, _).
isOfSecond(X, X, _) :- X = attr(_, 2, _).
isOfSecond(Y, _, Y) :- Y = attr(_, 2, _).
isNotOfFirst(Y, X, Y) :- X = attr(_, 1, _).
isNotOfFirst(X, X, Y) :- Y = attr(_, 1, _).
isNotOfSecond(Y, X, Y) :- X = attr(_, 2, _).
isNotOfSecond(X, X, Y) :- Y = attr(_, 2, _).
/*
6 Creating Query Plan Edges
*/
% RHG 2014
planEdge(Source, Target, Plan, Result) :-
edge(Source, Target, Term, Result, _, _),
Term => PlanExpr,
getProperties(PlanExpr, Plan, _).
% Version with properties
% Selection Edges
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
edge(Source, Target, select(res(N), Pred), Result, _, _),
select([N, P], PropertiesIn, PRest),
select([res(N), P], Pred) => PlanExpr,
getProperties(PlanExpr, Plan, P2).
% Join Edges
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
edge(Source, Target, join(arg(N), res(M), Pred), Result, _, _),
select([M, P], PropertiesIn, PRest),
join(arg(N), [res(M), P], Pred) => PlanExpr,
getProperties(PlanExpr, Plan, P2).
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
edge(Source, Target, join(res(M), arg(N), Pred), Result, _, _),
select([M, P], PropertiesIn, PRest),
join([res(M), P], arg(N), Pred) => PlanExpr,
getProperties(PlanExpr, Plan, P2).
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P3] | PRest], Result) :-
edge(Source, Target, join(res(N), res(M), Pred), Result, _, _),
select([N, P], PropertiesIn, PIn2),
select([M, P2], PIn2, PRest),
join([res(N), P], [res(M), P2], Pred) => PlanExpr,
getProperties(PlanExpr, Plan, P3).
% Remaining edges without intermediate results
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
Result) :-
edge(Source, Target, Term, Result, _, _),
Term = select(arg(_), _),
Term => PlanExpr,
getProperties(PlanExpr, Plan, P).
planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
Result) :-
edge(Source, Target, Term, Result, _, _),
Term = join(arg(_), arg(_), _),
Term => PlanExpr,
getProperties(PlanExpr, Plan, P).
getProperties([Plan, P], Plan, P) :- !.
getProperties(Plan, Plan, none).
% end RHG 2014
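/*
As the two clauses of ~getProperties~ suggest, a translation result is either a
bare plan or a two-element list pairing a plan with its properties. A small
illustration (the property term order(ort) is a made-up placeholder, not a
property defined in this file):
---- ?- getProperties([feed(rel(plz, *, l)), [order(ort)]], Plan, P).
     Plan = feed(rel(plz, *, l)), P = [order(ort)]
     ?- getProperties(feed(rel(plz, *, l)), Plan, P).
     Plan = feed(rel(plz, *, l)), P = none
----
*/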
createPlanEdge :-
edge(Source, Target, Term, Result, _, _),
Term => Plan,
assert(planEdge(Source, Target, Plan, Result)),
fail.
createPlanEdges :- not(createPlanEdge).
deletePlanEdge :-
retract(planEdge(_, _, _, _)), fail.
deletePlanEdges :- not(deletePlanEdge).
writePlanEdge :-
planEdge(Source, Target, Plan, Result),
write('Source: '), write(Source), nl,
write('Target: '), write(Target), nl,
write('Plan: '), wp(Plan), nl,
% write(Plan), nl,
write('Result: '), write(Result), nl, nl,
pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges
fail.
writePlanEdgesProp :-
planEdge(Source, Target, _, Plan, Prop, Result),
write('Source: '), write(Source), nl,
write('Target: '), write(Target), nl,
write('Plan: '), wp(Plan), nl,
write(Prop), nl,
% write(Plan), nl,
write('Result: '), write(Result), nl, nl,
pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)), % count edges
fail.
writePlanEdges :-
assert(pe(0)),
not(writePlanEdge),
not(writePlanEdgesProp),
pe(N),
write('The total number of plan edges is '), write(N), write('.'), nl.
wpe :- writePlanEdges.
/*
7 Assigning Sizes and Selectivities to the Nodes and Edges of the POG
---- assignSizes.
deleteSizes.
----
Assign sizes (numbers of tuples) to all nodes in the pog, based on the
cardinalities of the argument relations and the selectivities of the
predicates. Store sizes as facts of the form resultSize(Result, Size). Store
selectivities as facts of the form edgeSelectivity(Source, Target, Sel).
Delete sizes from memory.
7.1 Assigning Sizes and Selectivities
It is important that edges are processed in the order in which they have been
created. This ensures that, for each edge, the sizes of its argument nodes are
already available.
*/
assignSizes :- not(assignSizes1).
assignSizes1 :-
edge(Source, Target, Term, Result, _, _),
assignSize(Source, Target, Term, Result),
fail.
%assignSize(Source, Target, select(Arg, Pred), Result) :-
% Pred = pr(attr(original, *, u), _),
% !, % predicate used for eliminating one of many spatially overlapping tuples
% resSize(Arg, Size),
% setNodeSize(Result, Size),
% % assume overlap is rather small
% assert(edgeSelectivity(Source, Target, 1)).
assignSize(Source, Target, select(Arg, Pred), Result) :-
resSize(Arg, Card),
selectivity(Pred, Sel),
Size is Card * Sel,
setNodeSize(Result, Size),
assert(edgeSelectivity(Source, Target, Sel)).
assignSize(Source, Target, join(Arg1, Arg2, Pred), Result) :-
resSize(Arg1, Card1),
resSize(Arg2, Card2),
selectivity(Pred, Sel),
Size is Card1 * Card2 * Sel,
setNodeSize(Result, Size),
assert(edgeSelectivity(Source, Target, Sel)).
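/*
A worked instance of the two size formulas, as a minimal sketch. The helper
predicates ~exSelectSize~ and ~exJoinSize~ and all numbers are made up for
illustration; they are not part of the optimizer:
---- exSelectSize(Card, Sel, Size) :- Size is Card * Sel.
     exJoinSize(Card1, Card2, Sel, Size) :- Size is Card1 * Card2 * Sel.
     % ?- exSelectSize(1000, 0.5, S).       yields S = 500.0
     % ?- exJoinSize(100, 1000, 0.25, S).   yields S = 25000.0
----
Because sizes are plain products, an estimate is only as good as the
selectivities stored for the predicates involved.
*/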
/*
---- setNodeSize(Node, Size) :-
----
Set the size of node ~Node~ to ~Size~ if no size has been assigned before.
*/
setNodeSize(Node, _) :- resultSize(Node, _), !.
setNodeSize(Node, Size) :- assert(resultSize(Node, Size)).
/*
---- resSize(Arg, Size) :-
----
Argument ~Arg~ has size ~Size~.
*/
resSize(arg(N), Size) :- argument(N, rel(Rel, _, _)), card(Rel, Size), !.
resSize(arg(N), _) :- write('Error in optimizer: cannot find cardinality for '),
argument(N, Rel), wp(Rel), nl, fail.
resSize(res(N), Size) :- resultSize(N, Size), !.
/*
---- writeSizes :-
----
Write sizes and selectivities.
*/
writeSize :-
resultSize(Node, Size),
write('Node: '), write(Node), nl,
write('Size: '), write(Size), nl, nl,
fail.
writeSize :-
edgeSelectivity(Source, Target, Sel),
write('Source: '), write(Source), nl,
write('Target: '), write(Target), nl,
write('Selectivity: '), write(Sel), nl, nl,
fail.
writeSizes :- not(writeSize).
/*
---- deleteSizes :-
----
Delete node sizes and selectivities of edges.
*/
deleteSize :- retract(resultSize(_, _)), fail.
deleteSize :- retract(edgeSelectivity(_, _, _)), fail.
deleteSizes :- not(deleteSize).
/*
8 Computing Edge Costs for Plan Edges
8.1 The Costs of Terms
---- cost(Term, Sel, Size, Cost) :-
----
The cost of an executable ~Term~ representing a predicate with selectivity ~Sel~
is ~Cost~ and the size of the result is ~Size~.
This is evaluated recursively descending into the term. When the operator
realizing the predicate (e.g. ~filter~) is encountered, the selectivity ~Sel~ is
used to determine the size of the result. It is assumed that only a single
operator of this kind occurs within the term.
8.1.1 Arguments
*/
cost(Obj, Sel, Size, Cost) :-
distributedRels(Rel, Obj, _, DistType, _, _),
not(DistType = share),
cost(Rel, Sel, Size, Cost).
cost(rel(Rel, _, _), _, Size, 0) :-
card(Rel, Size).
cost(res(N), _, Size, 0) :-
resultSize(N, Size).
/*
8.1.2 Operators
*/
cost(feed(X), Sel, S, C) :-
cost(X, Sel, S, C1),
feedTC(A),
C is C1 + A * S.
/*
Here ~feedTC~ means ``feed tuple cost'', i.e., the cost per tuple, a constant to
be determined in experiments. These constants are kept in file ``Operators.pl''.
*/
cost(consume(X), Sel, S, C) :-
cost(X, Sel, S, C1),
consumeTC(A),
C is C1 + A * S.
cost(filter(X, Pred), _, S, C) :-
% This is a special case for spatially distributed relations:
% we cannot determine the selectivity for the predicate because
% it does not exist as a local relation on the master.
% We assume very little overlap in the spatial distribution.
Pred=attr(original, l, u), !,
cost(X, 1, SizeX, CostX),
filterTC(A),
S is SizeX * 0.9,
C is CostX + A * SizeX.
cost(filter(X, _), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
filterTC(A),
S is SizeX * Sel,
C is CostX + A * SizeX.
/*
For the moment we assume a cost of 1 for evaluating a predicate; this should be
changed shortly.
*/
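/*
To see how these clauses compose, consider the term filter(feed(R), P) over a
relation of 1000 tuples with selectivity 0.25. The sketch below recomputes the
same arithmetic outside the optimizer. The predicate name ~exFilterFeedCost~ is
hypothetical, and the per-tuple constants 0.5 and 2.0 are made-up stand-ins for
~feedTC~ and ~filterTC~, not the values kept in ``Operators.pl'':
---- exFilterFeedCost(Card, Sel, FeedTC, FilterTC, Size, Cost) :-
       FeedCost is 0 + FeedTC * Card,        % a stored relation itself costs 0
       Size is Card * Sel,                   % filter reduces the size by Sel
       Cost is FeedCost + FilterTC * Card.   % filter inspects every input tuple
     % ?- exFilterFeedCost(1000, 0.25, 0.5, 2.0, S, C).
     %    yields S = 250.0, C = 2500.0
----
Note that the filter is charged for its input size, not for its result size.
*/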
cost(product(X, Y), _, S, C) :-
cost(X, 1, SizeX, CostX),
cost(Y, 1, SizeY, CostY),
productTC(A, B),
S is SizeX * SizeY,
C is CostX + CostY + SizeY * A + S * B.
cost(leftrange(_, Rel, _), Sel, Size, Cost) :-
cost(Rel, 1, RelSize, _),
leftrangeTC(C),
Size is Sel * RelSize,
Cost is Sel * RelSize * C.
cost(rightrange(_, Rel, _), Sel, Size, Cost) :-
cost(Rel, 1, RelSize, _),
leftrangeTC(C),
Size is Sel * RelSize,
Cost is Sel * RelSize * C.
/*
Simplistic cost estimation for loop joins.
If attribute values are assumed independent, then the selectivity
of a subquery appearing in an index join equals the overall
join selectivity. Therefore it is possible to estimate
the result size and cost of a subquery
(i.e. ~exactmatch~ and ~exactmatchfun~). As a subquery in an
index join is executed as often as a tuple from the left
input stream arrives, it is also possible to estimate the
overall index join cost.
*/
cost(exactmatchfun(_, Rel, _), Sel, Size, Cost) :-
cost(Rel, 1, RelSize, _),
exactmatchTC(A, B, C, D),
Size is Sel * RelSize,
Cost is A + B * (log10(RelSize) - C) + % query cost
Sel * RelSize * D. % size of result
cost(exactmatch(_, Rel, _), Sel, Size, Cost) :-
cost(Rel, 1, RelSize, _),
exactmatchTC(A, B, C, D),
Size is Sel * RelSize,
Cost is A + B * (log10(RelSize) - C) + % query cost
Sel * RelSize * D. % size of result
cost(loopjoin(X, Y), Sel, S, Cost) :-
cost(X, 1, SizeX, CostX),
cost(Y, Sel, SizeY, CostY),
S is SizeX * SizeY,
loopjoinTC(A),
Cost is CostX + % producing the first argument
SizeX * A + % base cost for loopjoin
SizeX * CostY. % sum of query costs
cost(fun(_, X), Sel, Size, Cost) :-
cost(X, Sel, Size, Cost).
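/*
The following sketch recomputes the estimate for an index loop join, combining
the formulas of ~exactmatchfun~ and ~loopjoin~ above. The predicate name
~exIndexLoopJoin~, the term tc(A, B, C, D) bundling the four constants of
~exactmatchTC~, and all numbers are assumptions made for illustration:
---- exIndexLoopJoin(SizeX, CostX, RelSize, Sel, tc(A, B, C, D), LoopTC,
                     Size, Cost) :-
       SizeY is Sel * RelSize,                 % tuples returned per lookup
       CostY is A + B * (log10(RelSize) - C)   % query cost of one lookup
                + SizeY * D,                   % plus the cost of its result
       Size is SizeX * SizeY,
       Cost is CostX + SizeX * LoopTC + SizeX * CostY.
     % ?- exIndexLoopJoin(100, 50, 1000, 0.125, tc(1, 2, 3, 4), 0.5, S, C).
     %    yields S = 12500.0
----
The lookup cost grows only logarithmically with the size of the indexed
relation, but it is paid once per tuple of the left input.
*/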
cost(hashjoin(X, Y, _, _, 999997), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
cost(Y, 1, SizeY, CostY),
hashjoinTC(A, B, D),
S is SizeX * SizeY * Sel,
C is CostX + CostY + % producing the arguments
A * SizeY + % A - time [microsecond] per build
B * SizeX + % B - time per probe
D * S. % D - time per result tuple
% table fits in memory assumed
cost(sortmergejoin(X, Y, _, _), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
cost(Y, 1, SizeY, CostY),
sortmergejoinTC(A, B, D),
S is SizeX * SizeY * Sel,
C is CostX + CostY + % producing the arguments
A * (SizeX + SizeY) + % sorting the arguments
B * (SizeX + SizeY) + % merge step
D * S. % cost of results
cost(mergejoin(X, Y, _, _), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
cost(Y, 1, SizeY, CostY),
sortmergejoinTC(_, B, D),
S is SizeX * SizeY * Sel,
C is CostX + CostY + % producing the arguments
B * (SizeX + SizeY) + % merge step
D * S. % cost of results
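/*
For comparison, the hash join and sort-merge join formulas can be evaluated side
by side. This is a sketch only: the predicate names are hypothetical, the
constants in tc(A, B, D) are invented, and the costs of producing the two
arguments are left out because they are the same for both operators:
---- exHashJoinCost(SizeX, SizeY, Sel, tc(A, B, D), S, C) :-
       S is SizeX * SizeY * Sel,
       C is A * SizeY + B * SizeX + D * S.       % build, probe, results
     exSortMergeJoinCost(SizeX, SizeY, Sel, tc(A, B, D), S, C) :-
       S is SizeX * SizeY * Sel,
       C is A * (SizeX + SizeY)                  % sorting both arguments
            + B * (SizeX + SizeY) + D * S.       % merge step, results
     % ?- exHashJoinCost(100, 1000, 0.25, tc(2, 1, 0.5), S, C).
     %    yields S = 25000.0, C = 14600.0
     % ?- exSortMergeJoinCost(100, 1000, 0.25, tc(3, 1, 0.5), S, C).
     %    yields S = 25000.0, C = 16900.0
----
Both operators produce the same result size; only the input-dependent terms
differ, so the choice between them depends on the measured constants.
*/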
cost(extend(X, _), Sel, S, C) :-
cost(X, Sel, S, C1),
extendTC(A),
C is C1 + A * S.
cost(remove(X, _), Sel, S, C) :-
cost(X, Sel, S, C1),
removeTC(A),
C is C1 + A * S.
cost(project(X, _), Sel, S, C) :-
cost(X, Sel, S, C1),
projectTC(A),
C is C1 + A * S.
cost(rename(X, _), Sel, S, C) :-
cost(X, Sel, S, C1),
renameTC(A),
C is C1 + A * S.
%fapra 2015/16
% Taken from standard optimizer.
cost(itSpatialJoin(X, Y, _, _), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
cost(Y, 1, SizeY, CostY),
itSpatialJoinTC(A, B),
S is SizeX * SizeY * Sel,
C is CostX + CostY +
A * (SizeX + SizeY) +
B * S.
cost(windowintersects(_, Rel, _), Sel, Size, Cost) :-
cost(Rel, 1, RelSize, _),
windowintersectsTC(A),
Size is Sel * RelSize,
Cost is Size * A.
cost(hashvalue(_,_), _, 1, 0).
cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
distributedRels(LocalMasterRel, Obj, _, _, _),
substituteSubterm(rel('.', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan),
cost(LocalInnerPlan, Sel, S, InnerC),
!,
C is InnerC * S.
cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
substituteSubterm(rel('.', *, u), Obj, InnerPlan, LocalInnerPlan),
cost(LocalInnerPlan, Sel, S, InnerC),
!,
C is InnerC * S.
% if we cannot determine cost of first dmap-argument
cost(dmap(_, _, X), Sel, S, C) :-
cost(X, 1, SizeX, CostX),
dmapTC(A),
S is SizeX * Sel,
C is CostX + A * SizeX.
cost(dmap2(_, RelObj, _, InnerPlan, _), Sel, S, C) :-
distributedRels(LocalMasterRel, RelObj, _, _, _),
substituteSubterm(rel('..', *, u), LocalMasterRel,
InnerPlan, LocalInnerPlan),
dmap2TC(A),
cost(LocalMasterRel, 1, Card, _),
cost(LocalInnerPlan, Sel, _, InnerCost),
!,
S is Sel * Card,
C is InnerCost + A * S.
% we have two d/farray-objects as arguments
cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
distributedRels(LocalMasterRel1, RelObj1, _, _, _),
distributedRels(LocalMasterRel2, RelObj2, _, _, _),
substituteSubterm(rel('.', *, u), LocalMasterRel1,
InnerPlan, LocalInnerPlan1),
substituteSubterm(rel('..', *, u), LocalMasterRel2,
LocalInnerPlan1, LocalInnerPlan),
dmap2TC(A),
cost(LocalMasterRel1, 1, Card1, _),
cost(LocalMasterRel2, 1, Card2, _),
cost(LocalInnerPlan, Sel, _, InnerCost),
!,
S1 is Sel * Card1,
S2 is Sel * Card2,
C is InnerCost + A * S1 + A * S2.
% we have two d/farray-values as arguments
cost(dmap2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
cost(Arg1, _, _, C1),
cost(Arg2, _, _, C2),
substituteSubterm(rel('.', *, u), Arg1,
InnerPlan, LocalInnerPlan1),
substituteSubterm(rel('..', *, u), Arg2,
LocalInnerPlan1, LocalInnerPlan),
cost(LocalInnerPlan, Sel, _, InnerCost),
dmap2TC(A),
!,
ArgS1 is Sel * C1,
ArgS2 is Sel * C2,
C is InnerCost + A * ArgS1 + A * ArgS2.
cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
substituteSubterm(rel('.', *, u), "#!SUBST1!#", RelObj1, RelObj_Mod1),
substituteSubterm(rel('.', *, u), "#!SUBST2!#", RelObj2, RelObj_Mod2),
substituteSubterm(rel('.', *, u), RelObj_Mod1, InnerPlan, TempPlan1),
substituteSubterm(rel('..', *, u), RelObj_Mod2, TempPlan1, TempPlan2),
substituteSubterm( "#!SUBST1!#", rel('.',*,u),TempPlan2, TempPlan3),
substituteSubterm( "#!SUBST2!#", rel('.',*,u),TempPlan3, FinallyGoodPlan),
dmap2TC(A),
cost(RelObj1, 1, Card1, _),
cost(RelObj2, 1, Card2, _),
cost(FinallyGoodPlan, Sel, _, InnerCost),
!,
S1 is Sel * Card1,
S2 is Sel * Card2,
C is InnerCost + A * S1 + A * S2.
% we have two d/fmatrix-values as arguments
cost(areduce2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
cost(Arg1, _, _, C1),
cost(Arg2, _, _, C2),
substituteSubterm(rel('.', *, u), Arg1,
InnerPlan, LocalInnerPlan1),
substituteSubterm(rel('..', *, u), Arg2,
LocalInnerPlan1, LocalInnerPlan),
cost(LocalInnerPlan, Sel, _, InnerCost),
areduce2TC(A),
!,
ArgS1 is Sel * C1,
ArgS2 is Sel * C2,
C is InnerCost + A * ArgS1 + A * ArgS2.
cost(collect2(InnerPlan, _ , _), Sel, S, C) :-
cost(InnerPlan, Sel, S, InnerCost),
collect2TC(A),
C is InnerCost + A * S.
cost(partitionF(RelObj, _, InnerPlan, _, _), Sel, S, C) :-
distributedRels(LocalMasterRel, RelObj, _, _, _),
substituteSubterm(rel('.', *, u), LocalMasterRel,
InnerPlan, LocalInnerPlan),
partitionFTC(A),
cost(LocalMasterRel, 1, S, _),
cost(LocalInnerPlan, Sel, _, InnerCost),
!,
C is (InnerCost + A) * S.
% generic case
cost(partitionF(RelObj, _, _, _), _, S, C) :-
cost(RelObj, 1, RS, RC),
partitionFTC(A),
S is RS,
C is RC + S * A.
cost(extendstream(Stream, _, cellnumber(bbox(_), _)), _, S, C) :-
cost(Stream, 1, S, StreamC),
extendstreamTC(ETC),
bboxTC(BTC),
cellnumberTC(CTC),
TC is ETC + BTC + CTC,
C is S * TC + StreamC.
cost(range(_, Rel, _, _), Sel, S, C) :-
cost(Rel, 1, Card, _),
S is Sel * Card,
leftrangeTC(A),
C is A * S.
cost(dloop2(_, RelObj, _, InnerPlan), Sel, S, C) :-
distributedRels(LocalMasterRel, RelObj, _, _, _),
substituteSubterm(rel('..', *, u), LocalMasterRel,
InnerPlan, LocalInnerPlan),
dloopTC(A),
cost(LocalMasterRel, 1, Card, _),
cost(LocalInnerPlan, Sel, _, InnerCost),
!,
S is Sel * Card,
C is InnerCost + A * S.
/* dummy for dsummarize */
cost(dsummarize(_), _, _, 0).
cost(dsummarize(X), Sel, S, C) :-
cost(X, Sel, S, C1),
dsummarizeTC(A),
C is C1 + A * S.
%end fapra 2015/16
/*
8.2 Creating Cost Edges
These are plan edges extended by a cost measure.
*/
% RHG 2014
costEdge(Source, Target, Term, Result, Size, Cost) :-
planEdge(Source, Target, Term, Result),
edgeSelectivity(Source, Target, Sel),
cost(Term, Sel, Size, Cost).
% Version with properties
costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
Size, Cost) :-
planEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result),
edgeSelectivity(Source, Target, Sel),
cost(Plan, Sel, Size, Cost).
% end RHG 2014
createCostEdge :-
planEdge(Source, Target, Term, Result),
edgeSelectivity(Source, Target, Sel),
cost(Term, Sel, Size, Cost),
assert(costEdge(Source, Target, Term, Result, Size, Cost)),
fail.
createCostEdges :- not(createCostEdge).
deleteCostEdge :-
retract(costEdge(_, _, _, _, _, _)), fail.
deleteCostEdges :- not(deleteCostEdge).
writeCostEdge :-
costEdge(Source, Target, Plan, Result, Size, Cost),
write('Source: '), write(Source), nl,
write('Target: '), write(Target), nl,
write('Plan: '), wp(Plan), nl,
write('Result: '), write(Result), nl,
write('Size: '), write(Size), nl,
write('Cost: '), write(Cost), nl,
nl,
ce(N), retract(ce(_)), N1 is N + 1, assert(ce(N1)), % count edges
fail.
writeCostEdges :-
assert(ce(0)),
not(writeCostEdge),
ce(N),
write('The total number of cost edges is '), write(N), write('.'), nl.
wce :- writeCostEdges.
writeCostEdgeUsed :-
costEdgeUsed(Source, Version, Target, PropertiesIn, Plan, PropertiesOut,
Result, Size, Cost),
write('Source: ('), write(Source), write(', '), write(Version),
write(')'), nl,
write('Target: '), write(Target), nl,
write('PropertiesIn: '), write(PropertiesIn), nl,
write('Plan: '), wp(Plan), nl,
write('PropertiesOut: '), write(PropertiesOut), nl,
write('Result: '), write(Result), nl,
write('Size: '), write(Size), nl,
write('Cost: '), write(Cost), nl,
nl,
ceu(N), retract(ceu(_)), N1 is N + 1, assert(ceu(N1)), % count edges
fail.
writeCostEdgesUsed :-
assert(ceu(0)),
not(writeCostEdgeUsed),
ceu(N),
write('The total number of cost edges used is '), write(N), write('.'), nl.
wceu :- writeCostEdgesUsed.
deleteCostEdgeUsed :-
retract(costEdgeUsed(_, _, _, _, _, _, _, _, _)), fail.
deleteCostEdgesUsed :- not(deleteCostEdgeUsed).
/*
---- assignCosts
----
This just puts together creation of sizes and cost edges.
*/
assignCosts :-
assignSizes.
% RHG 2014
% createCostEdges.
/*
9 Finding Shortest Paths = Cheapest Plans
The cheapest plan corresponds to the shortest path through the predicate order
graph.
9.1 Shortest Path Algorithm by Dijkstra
We implement the shortest path algorithm by Dijkstra. There are two
relevant sets of nodes:
* center: the nodes for which shortest paths have already been
computed
* boundary: the nodes that have been seen, but that have not yet been
expanded. These need to be kept in a priority queue.
A node, as used during shortest path computation, is represented as a term
---- node(n(Name, Version), Distance, [Path, Properties])
----
where ~Name~ is the node number, ~Version~ a version number of this node, ~Distance~ the distance along the shortest path to this node, ~Path~ is the list of edges forming the shortest path, and ~Properties~ the physical properties (such as order) for the result obtained at this node version.
The graph is represented by the set of ~costEdges~.
The center is represented as a set of facts of the form
---- center(n(Name, Version), node(n(Name, Version), Distance, [Path, Properties]))
----
Since predicates are generally indexed by their first argument, finding a node
in the center via the node number should be very efficient. We assume it is
possible in constant time.
The boundary is represented by an abstract data type as described in the
interface below. Essentially it is a priority queue implementation.
---- successor(Node, Succ) :-
----
~Succ~ is a successor of node ~Node~ via some edge. This includes computation
of the distance and path of the successor.
*/
% RHG 2014
% successor(node(Source,Distance, Path), node(Target, Distance2, Path2)) :-
% costEdge(Source, Target, Term, Result, Size, Cost),
% assert(costEdgeUsed(Source, Target, Term, Result, Size, Cost)),
% Distance2 is Distance + Cost,
% append(Path, [costEdge(Source, Target, Term, Result, Size, Cost)], Path2).
% Version with properties
successor(node(n(Source, Version), Distance, [Path, PropertiesIn]),
simplenode(Target, Distance2, [Path2, PropertiesOut])) :-
costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
Size, Cost),
assert(costEdgeUsed(Source, Version, Target, PropertiesIn, Plan,
PropertiesOut, Result, Size, Cost)),
Distance2 is Distance + Cost,
append(Path, [costEdge(Source, Target, Plan, Result, Size, Cost)], Path2).
% end RHG 2014
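/*
For illustration, the start node and one possible successor might look as
follows (a sketch; the plan, the properties, the size and the cost are invented
values):
---- node(n(0, 1), 0, [[], []])
     simplenode(1, 2500.0,
       [[costEdge(0, 1, filter(feed(rel(plz, *, l)), p), 1, 250.0, 2500.0)],
        [[1, none]]])
----
The successor carries no version number yet. A version is assigned only when
the node is inserted into the boundary, see ~insertIfNotDominated~ below.
*/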
/*
---- dijkstra(Source, Dest, Path, Length) :-
----
The shortest path from ~Source~ to ~Dest~ is ~Path~ of length ~Length~.
*/
dijkstra(Source, Dest, Path, Length) :-
emptyCenter,
b_empty(Boundary),
deleteCostEdgesUsed, % RHG
b_insert(Boundary, node(n(Source, 1), 0, [[], []]), Boundary1),
dijkstra1(Boundary1, n(Dest, 1), 0, notfound),
center(n(Dest, _), node(n(Dest, _), Length, [Path, _])).
emptyCenter :- not(emptyCenter1).
emptyCenter1 :- retract(center(_, _)), fail.
/*
---- dijkstra1(Boundary, Dest, NoOfCalls, Found) :-
----
Compute the shortest paths to all nodes and store them in a predicate
~center~. Initially to be called with no fact ~center~ asserted, and ~Boundary~
just containing the start node.
For testing we check at which iteration the destination ~Dest~ is reached.
*/
dijkstra1(Boundary, _, _, found) :- !,
tree_height(Boundary, H),
write('Height of search tree for boundary is '), write(H), nl.
dijkstra1(Boundary, _, _, _) :- b_isEmpty(Boundary).
dijkstra1(Boundary, Dest, N, _) :-
% nl, nl,
% write('dijkstra1 called.'), nl,
% write('Boundary = '), write(Boundary), nl, write('====='), nl,
b_removemin(Boundary, Node, Bound2),
Node = node(Name, _, _),
% write('Node = '), write(Name), nl,
assert(center(Name, Node)),
% write('Center = '), writeCenter, nl, write('====='), nl,
checkDest(Name, Dest, N, Found),
putsuccessors(Bound2, Node, Bound3),
% write('putsuccessors succeeded.'), nl,
N1 is N+1,
dijkstra1(Bound3, Dest, N1, Found).
checkDest(n(Name, _), n(Name, _), N, found) :- write('Destination node '),
write(Name), write(' reached at iteration '), write(N), nl.
checkDest(_, _, _, notfound).
/*
Some auxiliary functions for testing:
*/
writeList([]).
writeList([X | Rest]) :- nl, nl, write('-----'), nl, write(X), writeList(Rest).
writeCenter :- not(writeCenter1).
writeCenter1 :-
center(_, node(Name, Distance, Path)),
write('Node: '), write(Name), nl,
write('Cost: '), write(Distance), nl,
write('Path: '), nl, write(Path), nl, fail.
writePath([]).
writePath([costEdge(Source, Target, Term, Result, Size, Cost) | Path]) :-
write(costEdge(Source, Target, Result, Size, Cost)), nl,
write(' '), wp(Term), nl,
writePath(Path).
/*
---- putsuccessors(Boundary, Node, BoundaryNew) :-
----
Insert into ~Boundary~ all successors of node ~Node~ that are not yet present in
the center, updating their distance if they are already present in the boundary,
to obtain ~BoundaryNew~.
*/
putsuccessors(Boundary, Node, BoundaryNew) :-
findall(Succ, successor(Node, Succ), Successors),
% write('successors of '), write(Node), nl,
% writeList(Successors), nl, nl,
putsucc1(Boundary, Successors, BoundaryNew).
% write('the new boundary is: '), write(BoundaryNew),
% nl, write('====='), nl.
/*
---- putsucc1(Boundary, Successors, BoundaryNew) :-
----
Put all successors from the list ~Successors~ that are not yet in the center
into the ~Boundary~ to get ~BoundaryNew~. The cases to be distinguished are:
* The list of successors is empty.
* The first successor simplenode(N, \_, \_) is already in the center, hence the shortest path to it is already known and it does not need to be inserted into the boundary.
* The first successor X = simplenode(N, \_, \_) exists in the boundary. That is, there exists a non-empty set V(N) of versions of N in the boundary. We say that X dominates Y iff the distance of X is less than or equal to that of Y and the properties of X include those of Y.
* If X is not dominated by any Y in V(N), then insert X into the boundary.
* If X dominates any Y in V(N), then remove Y from the boundary.
* The first successor does not exist in the boundary. It is inserted.
*/
putsucc1(Boundary, [], Boundary).
putsucc1(Boundary, [simplenode(N, _, _) | Successors], BNew) :-
center(n(N, 1), _), !,
putsucc1(Boundary, Successors, BNew).
putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
findall(Node, b_memberByName(Boundary, n(N, _), Node), Nodes),
insertIfNotDominated(Boundary, simplenode(N, D, P), Nodes, 1, Boundary2),
removeThoseDominated(Boundary2, simplenode(N, D, P), Nodes, Boundary3),
putsucc1(Boundary3, Successors, BNew).
% putsucc1(Boundary, [simplenode(N, D, [_, Properties]) | Successors],
% BNew) :-
% b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, [_, Properties)),
% DistOld =< D, !,
% putsucc1(Boundary, Successors, BNew).
% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
% b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, _)),
% D < DistOld, !,
% b_deleteByName(Boundary, n(N, 1), Bound2),
% b_insert(Bound2, node(n(N, 1), D, P), Bound3),
% putsucc1(Bound3, Successors, BNew).
% the following not needed
% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
% nl,
% write('putsucc1 called with final case'), nl,
% write(simplenode(N, D, P)), nl,
% b_insert(Boundary, node(n(N, 1), D, P), Bound2),
% putsucc1(Bound2, Successors, BNew).
insertIfNotDominated(Boundary, simplenode(N, D, P), [], Version, BoundaryOut) :-
b_insert(Boundary, node(n(N, Version), D, P), BoundaryOut).
% nl, write('***** inserted '), write(node(n(N, Version), D, P)), nl.
insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]),
[node(n(N, V), DistOld, [_, PropOld]) | Nodes], Version, BoundaryOut) :-
( D < DistOld ; otherProperties(Prop, PropOld) ), % not dominated
( V > Version
-> Version2 is V + 1
; Version2 is Version + 1
),
insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
Version2, BoundaryOut).
insertIfNotDominated(Boundary, simplenode(N, D, [_, Prop]),
[node(n(N, _), DistOld, [_, PropOld]) | _], _, Boundary) :-
% nl, write('***** NOT inserted '), write(simplenode(N, D, [Path, Prop])), nl,
D >= DistOld,
included(Prop, PropOld). % is dominated and can be ignored.
removeThoseDominated(Boundary, simplenode(_, _, [_, _]), [], Boundary).
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
[node(n(N, _), DistOld, [_, PropOld]) | Nodes], Boundary2) :-
( DistOld =< D ; otherProperties(PropOld, Prop) ), !, % not dominated
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
Boundary2).
removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
[node(n(N, V), _, [_, _]) | Nodes], Boundary3) :-
b_deleteByName(Boundary, n(N, V), Boundary2),
% nl, write('***** deleted '), write(n(N, V)), nl,
removeThoseDominated(Boundary2, simplenode(N, D, [Path, Prop]), Nodes,
Boundary3).
:-dynamic noProperties/0.
included(_, _) :- noProperties, !.
included([[Node, List1] | Props1], Props2) :-
select([Node, List2], Props2, Props2Rest),
included2(List1, List2),
included(Props1, Props2Rest).
included([], _).
included2([], _).
included2([P1 | Props1], Props2) :-
select(P1, Props2, Props2Rest),
included2(Props1, Props2Rest).
included2([none], _).
otherProperties(Props1, Props2) :-
not(included(Props1, Props2)).
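/*
A small illustration of the property test behind dominance (a sketch; the
property atoms a and b are placeholders, and no fact ~noProperties~ is assumed
to be asserted):
---- ?- included([[1, [a]]], [[1, [a, b]]]).
     true
     ?- included([[1, [a, b]]], [[1, [a]]]).
     false
     ?- otherProperties([[1, [a, b]]], [[1, [a]]]).
     true
----
Hence a new node version with properties a is dropped if an existing version
offers both a and b at a distance that is not larger, whereas a new version
with a and b is kept next to an existing version that offers only a.
*/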
/*
9.2 Interface ~Boundary~
The boundary is represented in a data structure with the following
operations:
---- b_empty(-Boundary) :-
----
Creates an empty boundary and returns it.
---- b_isEmpty(+Boundary) :-
----
Checks whether the boundary is empty.
---- b_removemin(+Boundary, -Node, -BoundaryOut) :-
----
Returns the node ~Node~ with minimal distance from the set ~Boundary~, together
with ~BoundaryOut~, the boundary from which this node has been removed.
---- b_insert(+Boundary, +Node, -BoundaryOut) :-
----
Inserts a node that must not yet be present (i.e., no other node of that
name).
---- b_memberByName(+Boundary, +Name, -Node) :-
----
If a node ~Node~ with name ~Name~ is present, it is returned.
---- b_deleteByName(+Boundary, +Name, -BoundaryOut) :-
----
Returns the boundary, where the node with name ~Name~ is deleted.
*/
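/*
To make the interface concrete, here is a minimal list-based sketch. It is not
the implementation used by the optimizer, which reports a tree height in
~dijkstra1~ and so presumably keeps the boundary in a search tree. The prefix
lb marks all names as hypothetical. The boundary is a plain list of terms
node(Name, Distance, Info) kept sorted by distance:
---- lb_empty([]).
     lb_isEmpty([]).
     lb_insert([], Node, [Node]).
     lb_insert([node(N1, D1, I1) | Rest], node(N, D, I),
               [node(N, D, I), node(N1, D1, I1) | Rest]) :- D =< D1, !.
     lb_insert([First | Rest], Node, [First | Rest2]) :-
       lb_insert(Rest, Node, Rest2).
     lb_removemin([Node | Rest], Node, Rest).     % head has minimal distance
     lb_memberByName(Boundary, Name, node(Name, D, I)) :-
       member(node(Name, D, I), Boundary).
     lb_deleteByName(Boundary, Name, BoundaryOut) :-
       select(node(Name, _, _), Boundary, BoundaryOut), !.
----
Insertion and lookup by name are linear in this sketch; a balanced tree brings
these operations down to logarithmic time, which matters for large predicate
order graphs.
*/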
/*
9.3 Constructing the Plan from the Shortest Path
---- plan(Path, Plan)
----
The plan corresponding to ~Path~ is ~Plan~.
*/
%fapra 15/16
plan(Path, Plan) :-
isDistributedQuery,
!,
deleteNodePlans,
mergePlanEdges(Path, MergedPath),
traversePath(MergedPath),
highNode(N),
nodePlan(N, Plan).
%end fapra 15/16
plan(Path, Plan) :-
deleteNodePlans,
traversePath(Path),
highNode(N),
nodePlan(N, Plan).
deleteNodePlans :- not(deleteNodePlan).
deleteNodePlan :- retract(nodePlan(_, _)), fail.
traversePath([]).
traversePath([costEdge(_, _, Term, Result, _, _) | Path]) :-
embedSubPlans(Term, Term2),
assert(nodePlan(Result, Term2)),
traversePath(Path).
embedSubPlans(res(N), Term) :-
nodePlan(N, Term), !.
embedSubPlans(Term, Term2) :-
compound(Term), !,
Term =.. [Functor | Args],
embedded(Args, Args2),
Term2 =.. [Functor | Args2].
embedSubPlans(Term, Term).
embedded([], []).
embedded([Arg | Args], [Arg2 | Args2]) :-
embedSubPlans(Arg, Arg2),
embedded(Args, Args2).
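/*
For example (a sketch with an invented node number, plan and predicate p), once
the plan for node 1 has been recorded, a later term referring to res(1) is
expanded in place:
---- ?- assert(nodePlan(1, feed(rel(plz, *, l)))),
        embedSubPlans(filter(res(1), p), T).
     T = filter(feed(rel(plz, *, l)), p)
----
Atoms and terms without intermediate results are copied unchanged.
*/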
%fapra 15/16
/*
---- mergePlanEdges(PlanEdgeList, MergedEdgesList)
----
Merge two successive distributed operations, where the second one is applied to
the distributed result of the first, into a single distributed operation.
Example:
dmap(... filter(.,bla1)) dmap filter(., bla2)
...becomes: dmap(... filter(filter(., bla1), bla2))
*/
mergePlanEdges([], []).
mergePlanEdges([X], [X]).
/*
Merge rule for two successive dmaps with filters as their parameters. This
should be the most common case.
*/
mergePlanEdges([Edge1, Edge2|Edges], MergedEdges) :-
Edge1 = costEdge(Source, _, Plan1, Res1, _, C1),
Edge2 = costEdge(_, Target, Plan2, Res2, S2, C2),
Plan1 = dmap(Arg, _, filter(FilterArg, Pred1)),
successiveFilterOnParam(FilterArg, ArgTerm),
Plan2 = dmap(res(Res1), ResName, filter(ArgTerm, Pred2)),
MergedPlan = dmap(Arg, ResName,
filter(filter(FilterArg, Pred1), Pred2)),
% the plan is already chosen at this point, so costs will have no influence
MergedCosts is C1 + C2,
MergedHead = costEdge(Source, Target, MergedPlan, Res2, S2, MergedCosts),
mergePlanEdges([MergedHead|Edges], MergedEdges).
% The first two edges cannot be merged according to the above rules.
mergePlanEdges([X|Tail], [X|MergedTail]) :-
mergePlanEdges(Tail, MergedTail).
% Term is a feed of the dot relation (optionally renamed) or a nested filter on it.
successiveFilterOnParam(Term, ArgTerm) :-
functor(Term, filter, 2),
arg(1, Term, FirstArg),
successiveFilterOnParam(FirstArg, ArgTerm).
successiveFilterOnParam(Term, Term) :-
Term = feed(rel('.', _, _)).
successiveFilterOnParam(Term, Term) :-
Term = rename(feed(rel('.', _, _)), _).
%end fapra 15/16
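/*
A concrete instance of the merge rule, as a sketch. The object name plzD, the
empty result names, the predicates p1 and p2, and all sizes and costs are
invented for illustration:
---- ?- E1 = costEdge(0, 1, dmap(plzD, "", filter(feed(rel('.', *, u)), p1)),
                      1, 100, 10),
        E2 = costEdge(1, 3, dmap(res(1), "", filter(feed(rel('.', *, u)), p2)),
                      3, 50, 5),
        mergePlanEdges([E1, E2], [M]).
     M = costEdge(0, 3,
           dmap(plzD, "", filter(filter(feed(rel('.', *, u)), p1), p2)),
           3, 50, 15)
----
The merged edge keeps the source of the first edge, the target, result and size
of the second edge, and the sum of both costs.
*/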
% highestNode(Path, N) :-
% reverse(Path, Path2),
% Path2 = [costEdge(_, N, _, _, _, _) | _].
/*
9.4 Computing the Best Plan for a Given Predicate Order Graph
*/
bestPlan :-
assignCosts,
highNode(N),
dijkstra(0, N, Path, Cost),
plan(Path, Plan),
write('The best plan is:'), nl, nl,
wp(Plan),
nl, nl,
write('The cost is: '), write(Cost), nl.
bestPlan(Plan, Cost) :-
assignCosts,
highNode(N),
dijkstra(0, N, Path, Cost),
plan(Path, Plan).
/*
10 A Larger Example
It is now time to test efficiency with a larger example. We consider the query:
---- select *
 from Staedte, plz as p1, plz as p2, plz as p3
where SName = p1.Ort
and p1.PLZ = p2.PLZ + 1
and p2.PLZ = p3.PLZ * 5
and Bev > 300000
and Bev < 500000
and p2.PLZ > 50000
and p2.PLZ < 60000
and Kennzeichen starts "W"
and p3.Ort contains "burg"
and p3.Ort starts "M"
----
This translates to:
*/
example6 :- pog(
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
[
pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
rel(staedte, *, u), rel(plz, p1, l)),
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1),
rel(plz, p1, l), rel(plz, p2, l)),
pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5),
rel(plz, p2, l), rel(plz, p3, l)),
pr(attr(bev, 1, u) > 300000, rel(staedte, *, u)),
pr(attr(bev, 1, u) < 500000, rel(staedte, *, u)),
pr(attr(p2:pLZ, 1, u) > 50000, rel(plz, p2, l)),
pr(attr(p2:pLZ, 1, u) < 60000, rel(plz, p2, l)),
pr(attr(kennzeichen, 1, u) starts "W", rel(staedte, *, u)),
pr(attr(p3:ort, 1, u) contains "burg", rel(plz, p3, l)),
pr(attr(p3:ort, 1, u) starts "M", rel(plz, p3, l))
],
_, _).
/*
Initially this did not work (it does now). Let's keep the numbers a bit
smaller and avoid too many big joins first.
*/
example7 :- pog(
[rel(staedte, *, u), rel(plz, p1, l)],
[
pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
rel(staedte, *, u), rel(plz, p1, l)),
pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)),
pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)),
pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)),
pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)),
pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)),
pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)),
pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l))
],
_, _).
example8 :- pog(
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l)],
[
pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
rel(plz, p1, l)),
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
rel(plz, p2, l)),
pr(attr(bev, 0, u) > 300000, rel(staedte, *, u)),
pr(attr(bev, 0, u) < 500000, rel(staedte, *, u)),
pr(attr(p1:pLZ, 0, u) > 50000, rel(plz, p1, l)),
pr(attr(p1:pLZ, 0, u) < 60000, rel(plz, p1, l)),
pr(attr(kennzeichen, 0, u) starts "F", rel(staedte, *, u)),
pr(attr(p1:ort, 0, u) contains "burg", rel(plz, p1, l)),
pr(attr(p1:ort, 0, u) starts "M", rel(plz, p1, l))
],
_, _).
/*
Let's study a small example again with two independent conditions.
*/
example9 :- pog([rel(staedte, s, u), rel(plz, p, l)],
[pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
rel(staedte, s, u), rel(plz, p, l) ),
pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l)),
pr(attr(s:bev, 0, u) > 300000, rel(staedte, s, u))], _, _).
example10 :- pog(
[rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
[
pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
rel(plz, p1, l)),
pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
rel(plz, p2, l)),
pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), rel(plz, p2, l),
rel(plz, p3, l))
],
_, _).
/*
11 A User Level Language
We have started to construct the optimizer by building the predicate order
graph, using a notation for relations and predicates as useful for that
purpose. Later, in [Section Translation], we have adapted the notation to be
able to translate and construct query plans as needed in Secondo. In this
section we will introduce a more user friendly notation for queries, pretty
similar to SQL, but suitable for being written directly in PROLOG.
11.1 The Language
The basic select-from-where statement will be written as
---- select <attr-list>
from <rel-list>
where <pred-list>
----
The first example query from [Section 4.1.1] can then be written as:
---- select [sname, bev]
from [staedte]
where [bev > 500000]
----
Instead of lists consisting of a single element we will also support writing
just the element, hence the query can also be written:
---- select [sname, bev]
from staedte
where bev > 500000
----
The second query can be written as:
---- select *
from [staedte as s, plz as p]
where [sname = p:ort, p:plz > 40000]
----
Note that all relation names and attribute names are written just in lower
case; the system will look up the spelling in a table.
Furthermore, it will be possible to add a groupby- and an orderby-clause:
* groupby
---- select <aggr-list>
from <rel-list>
where <pred-list>
groupby <group-attr-list>
----
Example:
----
select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
from plz
where plz > 40000
groupby ort
----
* orderby
---- select <attr-list>
from <rel-list>
where <pred-list>
orderby <order-attr-list>
----
Example:
---- select [ort, plz]
from plz
orderby [ort asc, plz desc]
----
This example also shows that the where-clause may be omitted. It is also
possible to combine grouping and ordering:
----
select [ort, min(plz) as minplz, max(plz) as maxplz, count(*) as cntplz]
from plz
where plz > 40000
groupby ort
orderby cntplz desc
----
Currently only a basic part of this language has been implemented.
11.2 Structure
We introduce ~select~, ~from~, ~where~, and ~as~ as PROLOG operators:
*/
:- op(990, fx, sql).
:- op(985, xfx, >>).
:- op(950, fx, select).
:- op(960, xfx, from).
:- op(950, xfx, where).
:- op(930, xfx, as).
:- op(970, xfx, groupby).
:- op(980, xfx, orderby).
:- op(930, xf, asc).
:- op(930, xf, desc).
/*
This ensures that the select-from-where statement is viewed as a term with the
structure:
---- from(select(AttrList), where(RelList, PredList))
----
That this works can be tested with:
---- P = (select s:sname from staedte as s where s:bev > 500000),
P = (X from Y), X = (select AttrList), Y = (RelList where PredList),
RelList = (Rel as Var).
----
The result is:
---- P = select s:sname from staedte as s where s:bev>500000
X = select s:sname
Y = staedte as s where s:bev>500000
AttrList = s:sname
RelList = staedte as s
PredList = s:bev>500000
Rel = staedte
Var = s
----
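The same test can be applied to the ~groupby~ and ~orderby~ operators. The
following is a sketch that relies only on the operator declarations above; the
query and the variable names are chosen for illustration:
---- Q = (select [ort, count(*) as cntplz] from plz groupby ort
          orderby cntplz desc),
     Q = (Q1 orderby OrderAttrs), Q1 = (Q2 groupby GroupAttrs).
----
Since ~orderby~ has the highest priority of the operators involved, it becomes
the principal functor. One would expect ~OrderAttrs~ to be bound to
~cntplz desc~, ~GroupAttrs~ to ~ort~, and ~Q2~ to the remaining select-from
term. The clauses of ~lookup~ and ~translate~ for ~orderby~ and ~groupby~ rely
on this nesting.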
11.3 Schema Lookup
The second task is to look up attribute names in order to build the input
notation for the construction of the predicate order graph.
11.3.1 Tables
In the file ~database~ we maintain the following tables.
Relation schemas are written as:
---- relation(staedte, [sname, bev, plz, vorwahl, kennzeichen]).
relation(plz, [plz, ort]).
----
The spelling of relation or attribute names is given in a table
---- spelling(staedte:plz, pLZ).
spelling(staedte:sname, sName).
spelling(plz, lc(plz)).
spelling(plz:plz, pLZ).
----
The default assumption is that the first letter of a name is upper case and all
others are lower case. If this is true, then no entry in the table ~spelling~
is needed. If a name starts with a lower case letter, then this is expressed by
the functor ~lc~.
11.3.2 Looking up Relation and Attribute Names
*/
callLookup(Query, Query2) :-
newQuery,
lookup(Query, Query2), !.
%fapra 2015/16
/*
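Added ~clearIsDistributedQuery~ and ~clearIsLocalQuery~, which remove the
facts recording whether the current query is distributed or local.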
*/
newQuery :- not(clearVariables), not(clearQueryRelations),
not(clearQueryAttributes), not(clearIsDistributedQuery),
not(clearIsLocalQuery).
clearVariables :- retract(variable(_, _)), fail.
clearQueryRelations :- retract(queryRel(_, _)), fail.
clearQueryAttributes :- retract(queryAttr(_)), fail.
clearIsDistributedQuery :- retract(isDistributedQuery), fail.
clearIsLocalQuery :- retract(isLocalQuery), fail.
%end fapra 2015/16
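/*
The predicates above use a failure-driven loop: ~retract~ removes one fact,
~fail~ forces backtracking to the next one, and the enclosing ~not~ turns the
eventual failure into success. A minimal sketch of the idiom follows;
~demoFact~, ~clearDemoFacts~ and ~resetDemo~ are hypothetical names used only
for this illustration and are not part of the optimizer:
---- :- dynamic demoFact/1.
     clearDemoFacts :- retract(demoFact(_)), fail.
     resetDemo :- not(clearDemoFacts).
----
After
---- assert(demoFact(1)), assert(demoFact(2)), resetDemo.
----
no ~demoFact~ facts remain, and ~resetDemo~ also succeeds when there was
nothing to retract.
*/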
/*
---- lookup(Query, Query2) :-
----
~Query2~ is a modified version of ~Query~ where all relation names and
attribute names have the form as required in [Section Translation].
*/
lookup(select Attrs from Rels where Preds,
select Attrs2 from Rels2List where Preds2List) :-
lookupRels(Rels, Rels2),
checkDistributedQuery,
lookupAttrs(Attrs, Attrs2),
lookupPreds(Preds, Preds2),
makeList(Rels2, Rels2List),
makeList(Preds2, Preds2List).
lookup(select Attrs from Rels,
select Attrs2 from Rels2) :-
lookupRels(Rels, Rels2),
checkDistributedQuery,
lookupAttrs(Attrs, Attrs2).
lookup(Query orderby Attrs, Query2 orderby Attrs3) :-
lookup(Query, Query2),
makeList(Attrs, Attrs2),
lookupAttrs(Attrs2, Attrs3).
lookup(Query groupby Attrs, Query2 groupby Attrs3) :-
lookup(Query, Query2),
makeList(Attrs, Attrs2),
lookupAttrs(Attrs2, Attrs3).
makeList(L, L) :- is_list(L).
makeList(L, [L]) :- not(is_list(L)).
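/*
For illustration, assuming this file has been loaded, ~makeList~ wraps a single
element into a list and leaves a list unchanged:
---- makeList(staedte, L1), makeList([staedte, plz], L2).
----
This should bind ~L1~ to a one-element list containing ~staedte~ and ~L2~ to
the unchanged two-element list. This is what allows the user to write a single
relation or predicate without list brackets.
*/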
/*
11.3.3 Modification of the From-Clause
---- lookupRels(Rels, Rels2)
----
Modify the list of relation names. If there are relations without variables,
store them in a table ~queryRel~. Any two such relations must have distinct
sets of attribute names. Also, any two variables must be distinct.
*/
lookupRels([], []).
lookupRels([R | Rs], [R2 | R2s]) :-
lookupRel(R, R2),
lookupRels(Rs, R2s).
lookupRels(Rel, Rel2) :-
not(is_list(Rel)),
lookupRel(Rel, Rel2).
/*
---- lookupRel(Rel, Rel2) :-
----
Translate and store a single relation definition.
*/
:- dynamic
variable/2,
queryRel/2,
queryAttr/1.
lookupRel(Rel as Var, rel(Rel2, Var, Case)) :-
removeDistributedSuffix(Rel,DRel),
relation(DRel, _), !,
spelled(DRel, Rel2, Case),
not(defined(Var)),
assert(variable(Var, rel(Rel2, Var, Case))).
lookupRel(Rel, rel(Rel2, *, Case)) :-
removeDistributedSuffix(Rel,DRel),
relation(DRel, _), !,
spelled(DRel, Rel2, Case),
not(duplicateAttrs(Rel)),
assert(queryRel(DRel, rel(Rel2, *, Case))).
lookupRel(Term, Term) :-
write('Error in query: relation '), write(Term), write(' not known'),
nl, fail.
defined(Var) :-
variable(Var, _),
write('Error in query: doubly defined variable '), write(Var), write('.'), nl.
%fapra 2015/16
/*
Check that the relations of the query are either all local or all
distributed. Currently the optimizer cannot handle queries that mix
local and distributed relations; such queries are rejected.
*/
% handle non-distributed (local) queries
checkDistributedQuery :-
not(isDistributedQuery),
isLocalQuery,
!.
checkDistributedQuery :-
isDistributedQuery,
not(isLocalQuery),
!.
checkDistributedQuery :-
write('Error in query: relations must be either all local or all distributed.'),
nl,
fail.
%end fapra 2015/16
/*
---- duplicateAttrs(Rel) :-
----
There is a relation stored in ~queryRel~ that has attribute names also
occurring in ~Rel~.
*/
duplicateAttrs(Rel) :-
queryRel(Rel2, _),
relation(Rel2, Attrs2),
member(Attr, Attrs2),
relation(Rel, Attrs),
member(Attr, Attrs),
write('Error in query: duplicate attribute names in relations '),
write(Rel2), write(' and '), write(Rel), write('.'), nl.
/*
11.3.4 Modification of the Select-Clause
*/
lookupAttrs([], []).
lookupAttrs([A | As], [A2 | A2s]) :-
lookupAttr(A, A2),
lookupAttrs(As, A2s).
lookupAttrs(Attr, Attr2) :-
not(is_list(Attr)),
lookupAttr(Attr, Attr2).
lookupAttr(Var:Attr, attr(Var:Attr2, 0, Case)) :- !,
variable(Var, Rel2),
Rel2 = rel(Rel, _, _),
spelled(Rel:Attr, attr(Attr2, _, Case)).
lookupAttr(Attr asc, Attr2 asc) :- !,
lookupAttr(Attr, Attr2).
lookupAttr(Attr desc, Attr2 desc) :- !,
lookupAttr(Attr, Attr2).
lookupAttr(Attr, Attr2) :-
isAttribute(Attr, Rel), !,
spelled(Rel:Attr, Attr2).
lookupAttr(*, *) :- !.
lookupAttr(count(*), count(*)) :- !.
lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
lookupAttr(Expr, Expr2),
not(queryAttr(attr(Name, 0, u))),
!,
assert(queryAttr(attr(Name, 0, u))).
lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
lookupAttr(Expr, Expr2),
queryAttr(attr(Name, 0, u)),
!,
write('***** Error: attribute name '), write(Name),
write(' doubly defined in query.'),
nl.
lookupAttr(Term, Term2) :-
compound(Term),
functor(Term, Op, 1),
arg(1, Term, Arg1),
lookupAttr(Arg1, Res1),
functor(Term2, Op, 1),
arg(1, Term2, Res1).
lookupAttr(Name, attr(Name, 0, u)) :-
queryAttr(attr(Name, 0, u)),
!.
lookupAttr(Name, Name) :-
write('Error in attribute list: could not recognize '), write(Name), nl, fail.
isAttribute(Name, Rel) :-
queryRel(Rel, _),
relation(Rel, List),
member(Name, List).
/*
11.3.5 Modification of the Where-Clause
*/
lookupPreds([], []).
lookupPreds([P | Ps], [P2 | P2s]) :- !,
lookupPred(P, P2),
lookupPreds(Ps, P2s).
lookupPreds(Pred, Pred2) :-
not(is_list(Pred)),
lookupPred(Pred, Pred2).
lookupPred(Pred, pr(Pred2, Rel)) :-
lookupPred1(Pred, Pred2, 0, [], 1, [Rel]), !.
lookupPred(Pred, pr(Pred2, Rel1, Rel2)) :-
lookupPred1(Pred, Pred2, 0, [], 2, [Rel1, Rel2]), !.
lookupPred(Pred, _) :-
lookupPred1(Pred, _, 0, [], 0, []),
write('Error in query: constant predicate is not allowed.'), nl, fail, !.
lookupPred(Pred, _) :-
lookupPred1(Pred, _, 0, [], N, _),
N > 2,
write('Error in query: predicate involving more than two relations '),
write('is not allowed.'), nl, fail.
/*
---- lookupPred1(+Pred, -Pred2, +N, +RelsBefore, -M, -RelsAfter) :-
----
~Pred2~ is the transformed version of ~Pred~. Before the call, ~N~ attributes
referring to the relations in list ~RelsBefore~ have been found; after the
transformation, a total of ~M~ attributes referring to the relations in list
~RelsAfter~ have been found.
*/
lookupPred1(Var:Attr, attr(Var:Attr2, N1, Case), N, RelsBefore, N1, RelsAfter)
:-
variable(Var, Rel2), !, Rel2 = rel(Rel, _, _),
spelled(Rel:Attr, attr(Attr2, _, Case)),
N1 is N + 1,
append(RelsBefore, [Rel2], RelsAfter).
lookupPred1(Attr, attr(Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :-
isAttribute(Attr, Rel), !,
spelled(Rel:Attr, attr(Attr2, _, Case)),
queryRel(Rel, Rel2),
N1 is N + 1,
append(RelsBefore, [Rel2], RelsAfter).
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
compound(Term),
functor(Term, F, 1), !,
arg(1, Term, Arg1),
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M, RelsAfter),
functor(Term2, F, 1),
arg(1, Term2, Arg1Out).
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
compound(Term),
functor(Term, F, 2), !,
arg(1, Term, Arg1),
arg(2, Term, Arg2),
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M, RelsAfter),
functor(Term2, F, 2),
arg(1, Term2, Arg1Out),
arg(2, Term2, Arg2Out).
lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
compound(Term),
functor(Term, F, 3), !,
arg(1, Term, Arg1),
arg(2, Term, Arg2),
arg(3, Term, Arg3),
lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M2, RelsAfter2),
lookupPred1(Arg3, Arg3Out, M2, RelsAfter2, M, RelsAfter),
functor(Term2, F, 3),
arg(1, Term2, Arg1Out),
arg(2, Term2, Arg2Out),
arg(3, Term2, Arg3Out).
% may need to be extended to operators with more than three arguments.
%fapra 2015/16
/*
Look up generic, non-relation objects.
If ~Term~ is a Secondo object, it is replaced by a term
~obj(Obj, Type, Case)~, where ~Obj~ is the identifier written with a lower
case first character and ~Type~ is the kind of object. ~Case~ indicates whether
the first letter of the object name is written as a capital letter or not
(~u~ or ~l~).
*/
lookupPred1(Term, ObjTerm, N, Rels, N, Rels) :-
atom(Term),
not(is_list(Term)),
spelledObj(Term,Obj,Type,Case),
ObjTerm = obj(Obj,Type,Case),
!.
lookupPred1(Term, Term, N, Rels, N, Rels) :-
atom(Term),
not(is_list(Term)),
write('Symbol '), write(Term),
write(' not recognized, supposed to be a Secondo object.'), nl, !.
lookupPred1(Term, Term, N, Rels, N, Rels).
%end fapra 2015/16
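/*
A sketch of how ~lookupPred1~ and ~lookupPred~ transform a selection predicate.
It assumes that this file and the sample ~spelling~ facts of Section 11.3.1 are
loaded, and it asserts by hand the ~variable~ fact that ~lookupRel~ would
normally create for ~plz as p~:
---- assert(variable(p, rel(plz, p, l))),
     lookupPred1(p:plz > 40000, Pred2, 0, [], M, Rels),
     lookupPred(p:plz > 40000, Pr).
----
Under these assumptions one would expect
---- Pred2 = (attr(p:pLZ, 1, u) > 40000)
     M = 1
     Rels = [rel(plz, p, l)]
     Pr = pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))
----
The constant 40000 is passed through unchanged by the last clause, so only one
attribute, and hence one relation, is counted.
*/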
/*
11.3.6 Check the Spelling of Relation and Attribute Names
*/
spelled(Rel:Attr, attr(Attr2, 0, l)) :-
downcase_atom(Rel, DCRel),
downcase_atom(Attr, DCAttr),
spelling(DCRel:DCAttr, Attr3),
Attr3 = lc(Attr2),
!.
spelled(Rel:Attr, attr(Attr2, 0, u)) :-
downcase_atom(Rel, DCRel),
downcase_atom(Attr, DCAttr),
spelling(DCRel:DCAttr, Attr2),
!.
spelled(_:_, attr(_, 0, _)) :- !, fail. % no attr entry in spelling table
spelled(Rel, Rel2, l) :-
downcase_atom(Rel, DCRel),
spelling(DCRel, Rel3),
Rel3 = lc(Rel2),
!.
spelled(Rel, Rel2, u) :-
downcase_atom(Rel, DCRel),
spelling(DCRel, Rel2), !.
% if we do not get a spelling hint,
% assume it was spelled correctly
spelled(Rel, Rel, u) :-
atom_chars(Rel, [FirstChar|_]),
char_type(FirstChar, upper),
write('spelling of '),
write(Rel),
write(' could not be determined. Assume it is spelled uppercase'), !.
spelled(Rel, Rel, l) :-
atom_chars(Rel, [FirstChar|_]),
char_type(FirstChar, lower),
write('spelling of '),
write(Rel),
write(' could not be determined. Assume it is spelled lowercase'), !.
spelled(_, _, _) :- !, fail. % no rel entry in spelling table.
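/*
A sketch of the lookups performed by ~spelled~, assuming that exactly the
sample ~spelling~ facts of Section 11.3.1 are loaded:
---- spelled(plz, Rel, Case),
     spelled(staedte:sname, A1),
     spelled(plz:plz, A2).
----
One would expect
---- Rel = plz, Case = l
     A1 = attr(sName, 0, u)
     A2 = attr(pLZ, 0, u)
----
The relation entry for ~plz~ is wrapped in ~lc~ and therefore yields the case
marker ~l~; the two attribute entries are given without ~lc~ and yield ~u~.
*/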
%fapra 2015/16
/*
11.3.7 Check the spelling of non-relation objects
*/
spelledObj(Term, Obj, Type, l) :-
downcase_atom(Term, DcObj),
objectCatalog(DcObj, LcObj, Type),
LcObj = lc(Obj),
!.
spelledObj(Term, Obj, Type, u) :-
downcase_atom(Term, DcObj),
objectCatalog(DcObj, Obj, Type),
!.
spelledObj(_, _, _, _) :- !, fail. % no entry, avoid backtracking.
%end fapra 2015/16
/*
11.3.8 Examples
We can now formulate several of the previous queries at the user level.
*/
example11 :- showTranslate(select [sname, bev] from staedte where bev > 500000).
showTranslate(Query) :-
callLookup(Query, Query2),
write(Query), nl,
write(Query2), nl.
example12 :- showTranslate(
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
).
example13 :- showTranslate(
select *
from [staedte, plz as p1, plz as p2, plz as p3]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
p2:plz = p3:plz * 5,
bev > 300000,
bev < 500000,
p2:plz > 50000,
p2:plz < 60000,
kennzeichen starts "W",
p3:ort contains "burg",
p3:ort starts "M"]
).
/*
11.4 Translating a Query to a Plan
---- translate(Query, Stream, SelectClause, Cost) :-
----
~Query~ is translated into a ~Stream~ to which the translation of the
~SelectClause~ still needs to be applied. A ~Cost~ is returned which currently
covers only the cost of evaluating the essential part, the conjunctive query.
*/
translate(Query orderby Attrs, sortby(Stream, AttrNames), Select, 0) :-
!,
translate(Query, Stream, Select, _),
attrnamesSort(Attrs, AttrNames).
translate(Query groupby Attrs,
groupby(sortby(Stream, AttrNamesSort), AttrNamesGroup, Fields),
select Select2, Cost) :-
translate(Query, Stream, SelectClause, Cost),
makeList(Attrs, Attrs2),
attrnames(Attrs2, AttrNamesGroup),
attrnamesSort(Attrs2, AttrNamesSort),
SelectClause = (select Select),
makeList(Select, SelAttrs),
translateFields(SelAttrs, Attrs2, Fields, Select2),
!.
translate(Select from Rels where Preds, Stream, Select, Cost) :-
pog(Rels, Preds, _, _),
bestPlan(Stream, Cost),
!.
%fapra 2015/16
translate(Select from Rel, feed(Rel), Select, 0) :-
not(isDistributedQuery),
not(is_list(Rel)),
!.
translate(Select from Rel, ObjName,Select, 0) :-
isDistributedQuery,
distributedRels(Rel, ObjName, _, _, _),
not(is_list(Rel)),
!.
translate(Select from Rel, dist(Rel,ObjName),Select, 0) :-
isDistributedQuery,
distributedRels(Rel, ObjName, _, _, _),
not(is_list(Rel)),
!.
translate(Select from [Rel], feed(Rel), Select, 0).
translate(Select from [Rel | Rels], product(feed(Rel), Stream), Select, 0) :-
not(isDistributedQuery),
translate(Select from Rels, Stream, Select, _).
%end fapra 2015/16
/*
---- translateFields(Select, GroupAttrs, Fields, Select2) :-
----
Translate the ~Select~ clause of a query containing ~groupby~. Grouping
was done by the attributes ~GroupAttrs~. Return a list ~Fields~ of terms
of the form ~field(Name, Expr)~; such a list can be used as an argument to the
groupby operator. Also, return a modified select clause ~Select2~,
which will translate to a corresponding projection operation.
*/
translateFields([], _, [], []).
translateFields([count(*) as NewAttr | Select], GroupAttrs,
[field(NewAttr , count(feed(group))) | Fields], [NewAttr | Select2]) :-
translateFields(Select, GroupAttrs, Fields, Select2),
!.
translateFields([sum(Attr) as NewAttr | Select], GroupAttrs,
[field(NewAttr, sum(feed(group), attrname(Attr))) | Fields],
[NewAttr| Select2]) :-
translateFields(Select, GroupAttrs, Fields, Select2),
!.
translateFields([Attr | Select], GroupAttrs, Fields, [Attr | Select2]) :-
member(Attr, GroupAttrs),
!,
translateFields(Select, GroupAttrs, Fields, Select2).
/*
Generic rule for aggregate functions, similar to sum.
*/
translateFields([Term as NewAttr | Select], GroupAttrs,
[field(NewAttr, Term2) | Fields],
[NewAttr| Select2]) :-
compound(Term),
functor(Term, AggrOp, 1),
arg(1, Term, Attr),
member(AggrOp, [min, max, avg]),
functor(Term2, AggrOp, 2),
arg(1, Term2, feed(group)),
arg(2, Term2, attrname(Attr)),
translateFields(Select, GroupAttrs, Fields, Select2),
!.
translateFields([Term | Select], GroupAttrs,
Fields,
Select2) :-
compound(Term),
functor(Term, AggrOp, 1),
arg(1, Term, Attr),
member(AggrOp, [count, sum, min, max, avg]),
functor(Term2, AggrOp, 2),
arg(1, Term2, feed(group)),
arg(2, Term2, attrname(Attr)),
translateFields(Select, GroupAttrs, Fields, Select2),
write('*****'), nl,
write('***** Error in groupby: missing name for new attribute'), nl,
write('*****'), nl,
!.
translateFields([Attr | Select], GroupAttrs, Fields, Select2) :-
not(member(Attr, GroupAttrs)),
!,
translateFields(Select, GroupAttrs, Fields, Select2),
write('*****'), nl,
write('***** Error in groupby: '),
write(Attr),
write(' is neither a grouping attribute'), nl,
write(' nor an aggregate expression.'), nl,
write('*****'), nl.
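/*
A sketch of ~translateFields~ on a hand-written select list. The ~attr~ terms
below are illustrative stand-ins for the output of ~lookupAttrs~:
---- translateFields(
       [attr(ort, 0, u),
        min(attr(pLZ, 0, u)) as attr(minplz, 0, u),
        count(*) as attr(cntplz, 0, u)],
       [attr(ort, 0, u)], Fields, Select2).
----
One would expect
---- Fields = [field(attr(minplz, 0, u),
                     min(feed(group), attrname(attr(pLZ, 0, u)))),
               field(attr(cntplz, 0, u), count(feed(group)))]
     Select2 = [attr(ort, 0, u), attr(minplz, 0, u), attr(cntplz, 0, u)]
----
The grouping attribute is copied to ~Select2~ only, while each named aggregate
contributes a ~field~ term and its new attribute name.
*/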
%fapra 15/16
% Extract parts from a query
destructureQuery(Select from Rel where Pred, Select, Rel, Pred).
% Pred is a selection predicate stating that an attribute equals a given value
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
Pred = pr(Value = Attr, Rel),
Attr = attr(_, _, _).
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
Pred = pr(Attr = Value, Rel),
Attr = attr(_, _, _).
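/*
For illustration, with a hand-written predicate term of the form produced by
~lookupPred~ (the constant is made up):
---- attrValueEqualityPredicate(
       pr(attr(pLZ, 1, u) = 44227, rel(plz, p, l)), Value, Attr, Rel).
----
One would expect
---- Value = 44227
     Attr = attr(pLZ, 1, u)
     Rel = rel(plz, p, l)
----
The first clause covers the mirrored form, in which the constant is written on
the left of the equality. Join predicates, which carry two relation arguments,
do not match either clause.
*/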
/*
---- substituteSubterm(Substituted, Substitute, OriginalTerm, TermWithSubstitution)
----
Replacing every occurrence of ~Substituted~ in ~OriginalTerm~ by ~Substitute~
yields ~TermWithSubstitution~.
We have a cut in every clause to remove unnecessary choice points
during the search for plan edges, which is driven by meta predicates.
*/
% The whole term is to be substituted:
substituteSubterm(Substituted, Substitute, Substituted, Substitute):- !.
% The whole term doesn't match and it's not compound:
substituteSubterm(Substituted, _, OriginalTerm, OriginalTerm) :-
functor(OriginalTerm, _, 0),
OriginalTerm \= Substituted, !.
% The whole term doesn't match and it's compound - dive into its subterms:
substituteSubterm(Substituted, Substitute, OriginalTerm,
TermWithSubstitution) :-
functor(OriginalTerm, Functor, Arity),
functor(TermWithSubstitution, Functor, Arity),
substituteSubtermInNthSubterm(Arity, Substituted,
Substitute, OriginalTerm, TermWithSubstitution), !.
% Terminal case. All subterms have been processed.
substituteSubtermInNthSubterm(0, _, _, _, _):- !.
% Generic case. Process nth subterm.
substituteSubtermInNthSubterm(N, Substituted, Substitute,
OriginalTerm, TermWithSubstitution) :-
not(N = 0),
arg(N, OriginalTerm, OriginalNthTerm),
substituteSubterm(Substituted, Substitute,
OriginalNthTerm, NthTermWithSubstitution),
arg(N, TermWithSubstitution, NthTermWithSubstitution),
Next is N - 1,
substituteSubtermInNthSubterm(Next, Substituted,
Substitute, OriginalTerm, TermWithSubstitution), !.
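/*
A small sketch of the intended behaviour, assuming this file has been loaded;
the terms are made up for illustration:
---- substituteSubterm(a, b, f(a, g(a, c)), T).
----
This should yield
---- T = f(b, g(b, c))
----
Note that the first clause unifies rather than compares, so an unbound variable
inside ~OriginalTerm~ would be bound to ~Substituted~. The predicate is
therefore best used on ground terms.
*/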
/*
---- queryToPlan(Query, Plan, Cost) :-
----
Translate the ~Query~ into a ~Plan~. The ~Cost~ for evaluating the conjunctive
query is also returned. The ~Query~ must be such that relation and attribute
names have been looked up already.
fapra 15/16:
Each non-distributed clause has a duplicate that treats the distributed case.
These clauses are guarded by an ~isDistributedQuery~ goal.
end fapra 15/16
*/
queryToPlan(Query, consume(dsummarize(Stream)), Cost) :-
selectClause(Query, *),
isDistributedQuery,
!,
translate(Query, Stream, select *, Cost).
queryToPlan(Query, consume(Stream), Cost) :-
selectClause(Query, *),
!,
translate(Query, Stream, select *, Cost).
queryToPlan(Query, count(dsummarize(Stream)), Cost) :-
selectClause(Query, count(*)),
isDistributedQuery,
!,
translate(Query, Stream, select count(*), Cost).
queryToPlan(Query, count(Stream), Cost) :-
selectClause(Query, count(*)),
!,
translate(Query, Stream, select count(*), Cost).
%TF: changed to execute projection in dmap operator
queryToPlan(Query, consume(dsummarize(dmap(Stream," ",
project(Plan,AttrNames)))), Cost) :-
isDistributedQuery,
!,
translate(Query, dist(rel(_,Var,_),Stream), select Attrs, Cost), !,
feedRenameRelation(rel(dot,Var,_),Plan),
makeList(Attrs, Attrs2),
attrnames(Attrs2, AttrNames).
queryToPlan(Query, consume(project(Stream, AttrNames)), Cost) :-
translate(Query, Stream, select Attrs, Cost), !,
makeList(Attrs, Attrs2),
attrnames(Attrs2, AttrNames).
%end fapra 15/16
/*
---- queryToStream(Query, Plan, Cost) :-
----
Same as ~queryToPlan~, but returns a stream plan, if possible. To be used for
``mixed queries'' that add Secondo operators to the plan built by the optimizer.
*/
queryToStream(Query, Stream, Cost) :-
selectClause(Query, *),
translate(Query, Stream, select *, Cost), !.
queryToStream(Query, count(Stream), Cost) :-
selectClause(Query, count(*)),
translate(Query, Stream, select count(*), Cost), !.
queryToStream(Query, project(Stream, AttrNames), Cost) :-
translate(Query, Stream, select Attrs, Cost), !,
makeList(Attrs, Attrs2),
attrnames(Attrs2, AttrNames).
/*
---- selectClause(Query, C) :-
----
The select-clause of the ~Query~ is ~C~.
*/
% allows select [count(*)] to succeed. Activate later on in development.
%selectClause(select [X] from Y, Z) :-
% selectClause(select X from Y, Z).
selectClause(select * from _, *) :- !.
selectClause(select count(*) from _, count(*)) :- !.
selectClause(select Attrs from _, Attrs) :- !.
selectClause(Query groupby _, C) :- !,
selectClause(Query, C).
selectClause(Query orderby _, C) :- !,
selectClause(Query, C).
/*
---- attrnames(Attrs, AttrNames) :-
----
Transform each attribute X into attrname(X).
*/
attrnames([], []).
attrnames([Attr | Attrs], [attrname(Attr) | AttrNames]) :-
attrnames(Attrs, AttrNames).
/*
---- attrnamesSort(Attrs, AttrNames) :-
----
Transform attribute names of orderby clause.
*/
attrnamesSort([], []).
attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :-
attrnameSort(Attr, Attr2),
attrnamesSort(Attrs, Attrs2).
attrnameSort(Attr asc, attrname(Attr) asc) :- !.
attrnameSort(Attr desc, attrname(Attr) desc) :- !.
attrnameSort(Attr, attrname(Attr) asc).
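/*
For illustration, with hand-written ~attr~ terms; an attribute without an
explicit direction is sorted in ascending order:
---- attrnamesSort([attr(ort, 0, u), attr(pLZ, 0, u) desc], Names).
----
This should yield
---- Names = [attrname(attr(ort, 0, u)) asc, attrname(attr(pLZ, 0, u)) desc]
----
*/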
/*
11.4.1 Integration with Optimizer
---- optimize(Query).
----
Optimize ~Query~ and print the best ~Plan~.
*/
optimize(Query) :-
callLookup(Query, Query2),
queryToPlan(Query2, Plan, Cost),
writeln(Plan),
plan_to_atom_string(Plan, SecondoQuery),
write('The plan is: '), nl, nl,
write(SecondoQuery), nl, nl,
write('Estimated Cost: '), write(Cost), nl, nl.
optimize(Query, QueryOut, CostOut) :-
callLookup(Query, Query2),
queryToPlan(Query2, Plan, CostOut),
plan_to_atom_string(Plan, QueryOut).
/*
---- sqlToPlan(QueryText, Plan)
----
Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
*/
sqlToPlan(QueryText, Plan) :-
term_to_atom(sql Query, QueryText),
optimize(Query, Plan, _).
/*
---- sqlToPlan(QueryText, Plan)
----
Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
In this version, ~QueryText~ does not start with ~sql~.
*/
sqlToPlan(QueryText, Plan) :-
term_to_atom(Query, QueryText),
optimize(Query, Plan, _).
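/*
A usage sketch for the two clauses of ~sqlToPlan~. It assumes that the tables
for the sample relations are loaded so that the optimizer can build a plan. The
resulting plan text is not shown because it depends on the stored
selectivities:
---- sqlToPlan('sql select * from staedte where bev > 500000', Plan1),
     sqlToPlan('select * from staedte where bev > 500000', Plan2).
----
The first text is parsed to a term with principal functor ~sql~ and is handled
by the first clause. The second text does not unify with ~sql Query~, so the
second clause applies.
*/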
/*
11.4.2 Examples
We can now formulate the previous example queries in the user level language.
Example3:
*/
example14 :- optimize(
select * from [staedte as s, plz as p]
where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0]
).
example14(Query, Cost) :- optimize(
select * from [staedte as s, plz as p]
where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0],
Query, Cost
).
/*
Example4:
*/
example15 :- optimize(
select * from staedte where bev > 500000
).
example15(Query, Cost) :- optimize(
select * from staedte where bev > 500000,
Query, Cost
).
/*
Example5:
*/
example16 :- optimize(
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
).
example16(Query, Cost) :- optimize(
select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000],
Query, Cost
).
/*
Example6. This may need a larger local stack size. Start Prolog as
---- pl -L4M
----
which initializes the local stack to 4 MB.
*/
example17 :- optimize(
select *
from [staedte, plz as p1, plz as p2, plz as p3]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
p2:plz = p3:plz * 5,
bev > 300000,
bev < 500000,
p2:plz > 50000,
p2:plz < 60000,
kennzeichen starts "W",
p3:ort contains "burg",
p3:ort starts "M"]
).
example17(Query, Cost) :- optimize(
select *
from [staedte, plz as p1, plz as p2, plz as p3]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
p2:plz = p3:plz * 5,
bev > 300000,
bev < 500000,
p2:plz > 50000,
p2:plz < 60000,
kennzeichen starts "W",
p3:ort contains "burg",
p3:ort starts "M"],
Query, Cost
).
/*
Example 18:
*/
example18 :- optimize(
select *
from [staedte, plz as p1]
where [
sname = p1:ort,
bev > 300000,
bev < 500000,
p1:plz > 50000,
p1:plz < 60000,
kennzeichen starts "W",
p1:ort contains "burg",
p1:ort starts "M"]
).
example18(Query, Cost) :- optimize(
select *
from [staedte, plz as p1]
where [
sname = p1:ort,
bev > 300000,
bev < 500000,
p1:plz > 50000,
p1:plz < 60000,
kennzeichen starts "W",
p1:ort contains "burg",
p1:ort starts "M"],
Query, Cost
).
/*
Example 19:
*/
example19 :- optimize(
select *
from [staedte, plz as p1, plz as p2]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
bev > 300000,
bev < 500000,
p1:plz > 50000,
p1:plz < 60000,
kennzeichen starts "W",
p1:ort contains "burg",
p1:ort starts "M"]
).
example19(Query, Cost) :- optimize(
select *
from [staedte, plz as p1, plz as p2]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
bev > 300000,
bev < 500000,
p1:plz > 50000,
p1:plz < 60000,
kennzeichen starts "W",
p1:ort contains "burg",
p1:ort starts "M"],
Query, Cost
).
/*
Example 20:
*/
example20 :- optimize(
select *
from [staedte as s, plz as p]
where [
p:ort = s:sname,
p:plz > 40000,
s:bev > 300000]
).
example20(Query, Cost) :- optimize(
select *
from [staedte as s, plz as p]
where [
p:ort = s:sname,
p:plz > 40000,
s:bev > 300000],
Query, Cost
).
/*
Example 21:
*/
example21 :- optimize(
select *
from [staedte, plz as p1, plz as p2, plz as p3]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
p2:plz = p3:plz * 5]
).
example21(Query, Cost) :- optimize(
select *
from [staedte, plz as p1, plz as p2, plz as p3]
where [
sname = p1:ort,
p1:plz = p2:plz + 1,
p2:plz = p3:plz * 5],
Query, Cost
).
/*
12 Optimizing and Calling Secondo
---- sql Term
sql(Term, SecondoQueryRest)
let(X, Term)
let(X, Term, SecondoQueryRest)
----
~Term~ must be one of the available select-from-where statements.
It is optimized and Secondo is called to execute it. ~SecondoQueryRest~
is a character string (atom) containing a sequence of Secondo
operators that can be appended to a given
plan found by the optimizer; in this case the optimizer returns a
plan producing a stream.
The two versions of ~let~ allow one to assign the result of a query
to a new object ~X~, using the optimizer.
*/
sql Term :-
mOptimize(Term, Query, Cost),
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
write('Estimated Cost: '), write(Cost), nl, nl,
query(Query).
sql(Term, SecondoQueryRest) :-
mStreamOptimize(Term, SecondoQuery, Cost),
concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
write('Estimated Cost: '), write(Cost), nl, nl,
query(Query).
let(X, Term) :-
mOptimize(Term, Query, Cost),
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
write('Estimated Cost: '), write(Cost), nl, nl,
concat_atom(['let ', X, ' = ', Query], '', Command),
secondo(Command).
let(X, Term, SecondoQueryRest) :-
mStreamOptimize(Term, SecondoQuery, Cost),
concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
write('Estimated Cost: '), write(Cost), nl, nl,
concat_atom(['let ', X, ' = ', Query], '', Command),
secondo(Command).
/*
---- streamOptimize(Term, Query, Cost) :-
----
Optimize the ~Term~ producing an incomplete Secondo query plan ~Query~
returning a stream.
*/
streamOptimize(Term, Query, Cost) :-
callLookup(Term, Term2),
queryToStream(Term2, Plan, Cost),
plan_to_atom_string(Plan, Query).
/*
---- mOptimize(Term, Query, Cost) :-
mStreamOptimize(union [Term], Query, Cost) :-
----
Means ``multi-optimize''. Optimize a ~Term~ possibly consisting of several
subexpressions to be independently optimized, as in union and intersection
queries. ~mStreamOptimize~ is a variant returning a stream.
*/
:-op(800, fx, union).
:-op(800, fx, intersection).
mOptimize(union Terms, Query, Cost) :-
mStreamOptimize(union Terms, Plan, Cost),
concat_atom([Plan, 'consume'], '', Query).
mOptimize(intersection Terms, Query, Cost) :-
mStreamOptimize(intersection Terms, Plan, Cost),
concat_atom([Plan, 'consume'], '', Query).
mOptimize(Term, Query, Cost) :-
optimize(Term, Query, Cost).
mStreamOptimize(union [Term], Query, Cost) :-
streamOptimize(Term, QueryPart, Cost),
concat_atom([QueryPart, 'sort rdup '], '', Query).
mStreamOptimize(union [Term | Terms], Query, Cost) :-
streamOptimize(Term, Plan1, Cost1),
mStreamOptimize(union Terms, Plan2, Cost2),
concat_atom([Plan1, 'sort rdup ', Plan2, 'mergeunion '], '', Query),
Cost is Cost1 + Cost2.
mStreamOptimize(intersection [Term], Query, Cost) :-
streamOptimize(Term, QueryPart, Cost),
concat_atom([QueryPart, 'sort rdup '], '', Query).
mStreamOptimize(intersection [Term | Terms], Query, Cost) :-
streamOptimize(Term, Plan1, Cost1),
mStreamOptimize(intersection Terms, Plan2, Cost2),
concat_atom([Plan1, 'sort rdup ', Plan2, 'mergesec '], '', Query),
Cost is Cost1 + Cost2.
mStreamOptimize(Term, Query, Cost) :-
streamOptimize(Term, Query, Cost).
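/*
A usage sketch for union queries. The two subqueries are made up, and it is
assumed that the sample database is open so that Secondo can execute the plan:
---- sql union [select * from plz where plz > 90000,
                select * from plz where plz < 2000].
----
Following the clauses above, the generated query text has the form
---- <plan1>sort rdup <plan2>sort rdup mergeunion consume
----
where the two placeholders stand for the stream plans of the subqueries. The
estimated cost is the sum of the costs of the subqueries. An intersection query
is built in the same way, with ~mergesec~ in place of ~mergeunion~.
*/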
/*
Some auxiliary predicates.
*/
bestPlanCount :-
bestPlan(P, _),
plan_to_atom_string(P, S),
atom_concat(S, ' count', Q),
nl, write(Q), nl,
query(Q).
bestPlanConsume :-
bestPlan(P, _),
plan_to_atom_string(P, S),
atom_concat(S, ' consume', Q),
nl, write(Q), nl,
query(Q).
%fapra 15/16
/*
Rename an attribute to match the renaming of its relation.
*/
% No renaming needed.
renamedRelAttr(RelAttr, Var, RelAttr) :-
Var = *, !.
renamedRelAttr(attr(Name, N, C), Var, attr(Var:Name, N, C)).
% Extract the lower-case name from an attr term.
attrnameDCAtom(Attr, DCAttrName) :-
Attr = attr(_:Name, _, _),
!,
atom_string(AName, Name),
downcase_atom(AName, DCAttrName).
attrnameDCAtom(Attr, DCAttrName) :-
Attr = attr(Name, _, _),
atom_string(AName, Name),
downcase_atom(AName, DCAttrName).
/*
Rename a tuple stream.
*/
% No renaming needed.
renameStream(Stream, Var, Plan) :-
Var = *,
!,
Plan = Stream.
renameStream(Stream, Var, rename(Stream, Var)).
/*
Transform a relation to a tuple stream and rename it.
*/
% No renaming needed.
feedRenameRelation(Rel, Var, Plan) :-
Var = *,
!,
Plan = feed(Rel).
feedRenameRelation(Rel, Var, Plan) :-
Plan = rename(feed(Rel), Var).
feedRenameRelation(rel(Rel, Var,_), Plan) :-
feedRenameRelation(Rel, Var, Plan),!.
%end fapra 15/16
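/*
For illustration, assuming this file has been loaded, the renaming helpers
behave as follows on hand-written terms:
---- feedRenameRelation(rel(plz, p, l), Plan1),
     feedRenameRelation(rel(staedte, *, u), Plan2),
     renamedRelAttr(attr(pLZ, 1, u), p, Attr).
----
This should yield
---- Plan1 = rename(feed(plz), p)
     Plan2 = feed(staedte)
     Attr = attr(p:pLZ, 1, u)
----
Note that ~feed~ receives only the relation name, not the whole ~rel~ term.
*/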