secondo/OptimizerBasic/Distributed/optimizerNewProperties.pl

/*
//paragraph [10] title:    [{\Large \bf ]    [}]
//characters [1] formula:     [$]     [$]
//[ae] [\"{a}]
//[oe] [\"{o}]
//[ue] [\"{u}]
//[ss] [{\ss}]
//[Ae] [\"{A}]
//[Oe] [\"{O}]
//[Ue] [\"{U}]
//[**] [$**$]
//[toc] [\tableofcontents]
//[=>] [\verb+=>+]
//[:Section Translation] [\label{sec:translation}]
//[Section Translation] [Section~\ref{sec:translation}]
//[:Section 4.1.1] [\label{sec:4.1.1}]
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
//[newpage] [\newpage]

[10] A Query Optimizer for Secondo

Ralf Hartmut G[ue]ting, November - December 2002

[toc]

[newpage]

1 Introduction

1.1 Overview

This document not only describes, but ~is~ an optimizer for Secondo database
systems.  It contains the current source code for the optimizer, written in
PROLOG. It can be compiled by a PROLOG system (SWI-Prolog 5.0 or higher)
directly.

The current version of the optimizer is capable of handling conjunctive queries,
formulated in a relational environment. That is, it takes a set of
relations together with a set of selection or join predicates over these
relations and produces a query plan that can be executed by (the current
relational system implemented in) Secondo.

The selection of the query plan is based on cost estimates which in turn are
based on given selectivities of predicates. Selectivities of predicates are
maintained in a table (a set of PROLOG facts). If the selectivity of a predicate
is not available from that table, then an interaction with the Secondo system
should take place to determine the selectivity. There are various strategies
conceivable for doing this which will be described elsewhere. However, the
current version of the optimizer just emits a message that the selectivity is
missing and quits.

The optimizer also implements a simple SQL-like language for entering queries.
The notation is pretty much like SQL except that the lists occurring (lists of
attributes, relations, predicates) are written in PROLOG notation. Also note
that the where-clause is a list of predicates rather than an arbitrary boolean
expression and hence allows one to formulate conjunctive queries only.


1.2 Optimization Algorithm

The optimizer employs an optimization algorithm that is, as far as we know,
novel; it is based on ~shortest path search in a predicate order graph~. This
technique is remarkably simple to implement, yet efficient.

A predicate order graph (POG) is the graph whose nodes represent sets of
evaluated predicates and whose edges represent predicates, containing all
possible orders of predicates. Such a graph for three predicates ~p~, ~q~, and
~r~ is shown in [Figure pog1].

        Figure 1: A predicate order graph for three predicates ~p~, ~q~
and ~r~  [pog1.eps]

Here the bottom node has no predicate evaluated and the top node has all
predicates evaluated. The example illustrates, more precisely, possible
sequences of selections on an argument relation of size 1000. If selectivities
of predicates are given (for ~p~ it is 1/2, for ~q~ 1/10, and for ~r~ 1/5),
then we can annotate the POG with sizes of intermediate results as shown,
assuming that all predicates are independent (not ~correlated~). This means that
the selectivity of a predicate is the same regardless of the order of
evaluation, which of course does not need to be true.
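
The sizes in such an annotation follow from multiplying the argument size by the selectivities of the evaluated predicates. Below is a minimal sketch under the independence assumption just stated. ~sel/2~ and ~nodeSize/2~ are hypothetical names and are not part of the optimizer. Selectivities are written as fractions so that the arithmetic stays exact:

```prolog
% Hypothetical selectivities from the example, as Numerator/Denominator.
sel(p, 1/2).
sel(q, 1/10).
sel(r, 1/5).

% nodeSize(+Preds, -Size): estimated size after evaluating Preds on an
% argument relation of 1000 tuples, assuming independent predicates.
nodeSize(Preds, Size) :-
    foldl(applySel, Preds, 1000, Size).

% applySel(+Pred, +Size0, -Size): apply one selectivity; the integer
% division is exact here because every intermediate size divides evenly.
applySel(Pred, Size0, Size) :-
    sel(Pred, Num/Den),
    Size is Size0 * Num // Den.
```

Both orders p, q, r and r, q, p lead to 10 tuples, which is exactly what independence means here.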

If we can further compute for each edge of the POG possible evaluation
methods, adding a new ``executable'' edge for each method, and mark the
edge with estimated costs for this method, then finding a shortest path through
the POG corresponds to finding the cheapest query plan. [Figure pog2] shows an
example of a POG annotated with evaluation methods.
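
Searching for the cheapest plan is thus an ordinary shortest path problem over the cost-annotated edges. Below is a minimal sketch of Dijkstra's algorithm in PROLOG over a few made-up edges. ~costEdge/3~ and ~cheapestPath/4~ are hypothetical, and the frontier is a sorted list rather than a real priority queue:

```prolog
:- use_module(library(lists)).

% costEdge(Source, Target, Cost): made-up cost edges between POG nodes.
costEdge(0, 1, 10).
costEdge(0, 2, 4).
costEdge(2, 1, 3).
costEdge(1, 3, 2).
costEdge(2, 3, 9).

% cheapestPath(+Start, +Goal, -Cost, -Path): Dijkstra's algorithm.
cheapestPath(Start, Goal, Cost, Path) :-
    dijkstra([0-[Start]], [], Goal, Cost, RevPath),
    reverse(RevPath, Path).

% The frontier holds Cost-ReversedPath pairs; Visited holds settled nodes.
% Fails if the frontier runs empty before Goal is reached.
dijkstra(Frontier, Visited, Goal, Cost, RevPath) :-
    keysort(Frontier, [C-[N|Ns] | Rest]),
    (   N == Goal
    ->  Cost = C, RevPath = [N|Ns]
    ;   memberchk(N, Visited)
    ->  dijkstra(Rest, Visited, Goal, Cost, RevPath)
    ;   findall(C1-[M, N|Ns],
                ( costEdge(N, M, W), \+ memberchk(M, Visited), C1 is C + W ),
                Next),
        append(Rest, Next, Frontier1),
        dijkstra(Frontier1, [N|Visited], Goal, Cost, RevPath)
    ).
```

Here the indirect route 0, 2, 1, 3 with cost 9 beats both two-edge paths, which cost 12 and 13.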

        Figure 2: A POG annotated with evaluation methods [pog2.eps]

In this example, there is only a single method associated with each edge. In
general, however, there will be several methods. The example represents the
query:

----    select *
   from Staedte, Laender, Regiert
   where Land = LName and PName = 'CDU' and LName = PLand
----

for relation schemas

----    Staedte(SName, Bev, Land)
   Laender(LName, LBev)
   Regiert(PName, PLand)
----

Hence the optimization algorithm described and implemented in the following
sections proceeds in the following steps:

  1 For given relations and predicates, construct the predicate order graph and
store it as a set of facts in memory (Sections 2 through 4).

  2 For each edge, construct corresponding executable edges (called ~plan edges~
below). This is controlled by optimization rules describing how selections or
joins can be translated (Sections 5 and 6).

  3 Based on sizes of arguments and selectivities (stored in the file
~database.pl~) compute the sizes of all intermediate results. Also annotate
edges of the POG with selectivities (Section 7).

  4 For each plan edge, compute its cost and store it in memory (as a set of
facts). This is based on sizes of arguments and the selectivity associated with
the edge and on a cost function (predicate) written for each operator that may
occur in a query plan (Section 8).

  5 Dijkstra's shortest path algorithm is employed to find a shortest path
through the graph of plan edges annotated with costs (called ~cost edges~). This
path is transformed into a Secondo query plan and returned (Section 9).

  6 Finally, a simple subset of SQL in a PROLOG notation is implemented. So it
is possible to enter queries in this language. The optimizer determines from it
the lists of relations and predicates in the form needed for constructing the
POG, and then invokes step 1 (Section 11).


2 Data Structures

In the construction of the predicate order graph, the following data structures
are used.

----    pr(P, A)
   pr(P, B, C)
----

A selection or join predicate, e.g. pr(p, a) and pr(q, b, c): the former is a
selection predicate p on relation a, the latter a join predicate q on relations
b and c.
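
The optimizer tells the two kinds apart purely by arity, as the clause heads of copyPart and newEdge show. Below is a small illustrative sketch with the hypothetical helpers ~predKind/2~ and ~predRels/2~:

```prolog
% predKind(+Pr, -Kind): a two-argument pr term is a selection,
% a three-argument one is a join.
predKind(pr(_, _), selection).
predKind(pr(_, _, _), join).

% predRels(+Pr, -Rels): the relations a predicate refers to.
predRels(pr(_, R), [R]).
predRels(pr(_, R1, R2), [R1, R2]).
```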

----    arp(Arg, Rels, Preds)
----

An argument, relations, predicate triple. It describes a set of relations
~Rels~ on which the predicates ~Preds~ have been evaluated. To access the
result of this evaluation one needs to refer to ~Arg~.

Arg is either arg(N) or res(N), N an integer. Examples: arg(5), res(1)

Rels is a list of relation names, e.g. [a, b, c]

Preds is a list of predicate names, e.g. [p, q, r]


----    node(No, Preds, Partition)
----

A node.

~No~ is the number of the node into which the evaluated predicates
are encoded: each bit corresponds to a predicate number. For example, node
number 5 = 101 (binary) says that the first predicate (no 1) and the third
predicate (no 4) have been evaluated in this node. For predicate i,
its predicate number is "2^{i-1}"[1].
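
Decoding a node number is a matter of testing its bits. Below is a sketch with the hypothetical names ~nodePreds/2~ and ~predNo/2~. Predicates are numbered from 1:

```prolog
% nodePreds(+No, -Indices): indices i of the predicates encoded in node
% number No, where predicate i contributes 2^(i-1).
nodePreds(No, Indices) :-
    nodePreds(No, 1, Indices).

nodePreds(0, _, []) :- !.
nodePreds(No, I, Indices) :-
    (   No /\ 1 =:= 1          % lowest bit set: predicate I is evaluated
    ->  Indices = [I|Rest]
    ;   Indices = Rest
    ),
    No1 is No >> 1,
    I1 is I + 1,
    nodePreds(No1, I1, Rest).

% predNo(+I, -PNo): the predicate number of predicate i.
predNo(I, PNo) :-
    PNo is 1 << (I - 1).
```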

~Preds~ is the list of names of evaluated predicates, e.g. [p, q].

~Partition~ is a list of arp elements, see above.


----    edge(Source, Target, Term, Result, Node, PredNo)
----

An edge, representing a predicate.

~Source~ and ~Target~ are the numbers of source and target nodes in the
predicate order graph, e.g. 0 and 1.

~Term~ is either a selection or a join, for example,
select(arg(0), pr(p, a)) or join(res(4), res(1), pr(q, a, b)).

~Result~ is the number of the node into which the result of this predicate
application should be written. Normally it is the same as Target,
but for an edge leading to a node combining several independent results,
it is the number of the ``real'' node to obtain this result. An example of this can
be found in [Figure pog2] where the join edge leading from node 3 to node 7 does
not use the result of node 3 (there is none) but rather the two independent
results from nodes 1 and 2 (this pair is conceptually the result available in
node 3).

~Node~ is the source node for this edge, in the form node(...) as
described above.

~PredNo~ is the predicate number for the predicate represented by this
edge. Predicate numbers are of the form "2^{i-1}"[1] as explained
for nodes.
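
The ~Result~ number is the sum of the node numbers of the term's arguments (0 for an arg, N for res(N)) plus ~PredNo~, as computed by newEdge in Section 3.6. Below is a stand-alone sketch with the hypothetical names ~argNo/2~ and ~edgeResult/3~. Applied to the join edge from node 3 to node 7 mentioned above, which reads res(1) and res(2) with predicate number 4, it yields 7:

```prolog
% argNo(+Arg, -No): base arguments contribute 0, results their node
% number (the same mapping as nodeNo/2 in the optimizer).
argNo(arg(_), 0).
argNo(res(N), N).

% edgeResult(+Term, +PredNo, -Result): node number receiving the result.
edgeResult(select(Arg, _), PredNo, Result) :-
    argNo(Arg, N),
    Result is N + PredNo.
edgeResult(join(Arg1, Arg2, _), PredNo, Result) :-
    argNo(Arg1, N1),
    argNo(Arg2, N2),
    Result is N1 + N2 + PredNo.
```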

3 Construction of the Predicate Order Graph

3.1 pog

----    pog(Rels, Preds, Nodes, Edges) :-
----

For a given list of relations ~Rels~ and predicates ~Preds~, ~Nodes~ and
~Edges~ are the predicate order graph where edges are annotated with selection
and join operations applied to the correct arguments.

Example call:

----    pog([staedte, laender], [pr(p, staedte), pr(q, laender), pr(r, staedte,
    laender)], N, E).
----

*/

pog(Rels, Preds, Nodes, Edges) :-
  length(Rels, N), reverse(Rels, Rels2), deleteArguments,
  partition(Rels2, N, Partition0),
  length(Preds, M), reverse(Preds, Preds2),
  pog2(Partition0, M, Preds2, Nodes, Edges),
  deleteNodes, storeNodes(Nodes),
  deleteEdges, storeEdges(Edges),
  % RHG 2014 Create plan and cost edges during shortest path search.
  % deletePlanEdges,
  deleteVariables,
  % createPlanEdges,
  HighNode is 2**M -1,
  retract(highNode(_)), assert(highNode(HighNode)),
  deleteSizes.
  % deleteCostEdges.
  % end RHG 2014
/*

3.2 partition

----    partition(Rels, N, Partition0) :-
----

Given a list of ~N~ relations ~Rels~, return an initial partition such that
each relation r is packed into the form arp(arg(i), [r], []). For each relation,
a fact argument(i, r) is asserted as well.
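
Because ~pog~ reverses the relation list before calling ~partition~, the i-th relation ends up as arg(i). Below is a side-effect-free sketch of this numbering. The hypothetical ~initPartition/2~ omits the argument/2 facts:

```prolog
% initPartition(+Rels, -Partition): one arp per relation; the i-th
% relation of Rels gets arg(i), as pog/4 arranges by reversing Rels.
initPartition(Rels, Partition) :-
    length(Rels, N),
    reverse(Rels, Rels2),
    initPartition(Rels2, N, Partition).

initPartition([], _, []).
initPartition([Rel|Rels], N, [arp(arg(N), [Rel], []) | Arps]) :-
    N1 is N - 1,
    initPartition(Rels, N1, Arps).
```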

*/

partition([], _, []).

partition([Rel | Rels], N, [Arp | Arps]) :-
  N1 is N-1,
  Arp = arp(arg(N), [Rel], []),
  assert(argument(N, Rel)),
  partition(Rels, N1, Arps).


/*

3.3 pog2

----    pog2(Partition0, NoOfPreds, Preds, Nodes, Edges) :-
----

For the given start partition ~Partition0~ and a list of predicates ~Preds~
containing ~NoOfPreds~ predicates, return the ~Nodes~ and ~Edges~ of the
predicate order graph.
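
Each recursion step keeps the old nodes and edges, adds one new node and one new edge per old node, and copies every old edge. Counting this gives "2^M"[1] nodes and "M \cdot 2^{M-1}"[1] edges for M predicates. Below is a small sketch with the hypothetical ~pogSize/3~:

```prolog
% pogSize(+M, -Nodes, -Edges): size of the POG built by pog2 for M
% predicates, following the recursion of pog2 step by step.
pogSize(0, 1, 0).
pogSize(M, Nodes, Edges) :-
    M > 0,
    M1 is M - 1,
    pogSize(M1, Nodes1, Edges1),
    Nodes is 2 * Nodes1,              % old nodes plus one new node each
    Edges is 2 * Edges1 + Nodes1.     % old, copied, and one new per node
```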

*/

pog2(Part0, _, [], [node(0, [], Part0)], []).

pog2(Part0, NoOfPreds, [Pred | Preds], Nodes, Edges) :-
  N1 is NoOfPreds-1,
  PredNo is 2**N1,
  pog2(Part0, N1, Preds, NodesOld, EdgesOld),
  newNodes(Pred, PredNo, NodesOld, NodesNew),
  newEdges(Pred, PredNo, NodesOld, EdgesNew),
  copyEdges(Pred, PredNo, EdgesOld, EdgesCopy),
  append(NodesOld, NodesNew, Nodes),
  append(EdgesOld, EdgesNew, Edges2),
  append(Edges2, EdgesCopy, Edges).

/*
3.4 newNodes

----    newNodes(Pred, PredNo, NodesOld, NodesNew) :-
----

Given a predicate ~Pred~ with number ~PredNo~ and a list of nodes ~NodesOld~
resulting from evaluating all predicates with lower numbers, construct
a list of nodes which result from applying to each of the existing nodes
the predicate ~Pred~.
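
Leaving partitions aside, newNodes maps each node number No to No + PredNo and prepends ~Pred~ to its predicate list. Below is a sketch on hypothetical Number-Preds pairs:

```prolog
% extendNodes(+Pred, +PredNo, +Nodes, -NewNodes): Nodes is a list of
% Number-Preds pairs; each yields a node in which Pred is also evaluated.
extendNodes(_, _, [], []).
extendNodes(Pred, PNo, [No-Preds | Nodes], [No2-[Pred|Preds] | NewNodes]) :-
    No2 is No + PNo,
    extendNodes(Pred, PNo, Nodes, NewNodes).
```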

*/

newNodes(_, _, [], []).

newNodes(Pred, PNo, [Node | Nodes], [NodeNew | NodesNew]) :-
  newNode(Pred, PNo, Node, NodeNew),
  newNodes(Pred, PNo, Nodes, NodesNew).

newNode(Pred, PNo, node(No, Preds, Part), node(No2, [Pred | Preds], Part2)) :-
  No2 is No + PNo,
  copyPart(Pred, PNo, Part, Part2).

/*
3.5 copyPart

----    copyPart(Pred, PNo, Part, Part2) :-
----

Copy the partition ~Part~ of a node so that the new partition ~Part2~
results from applying the predicate ~Pred~ with number ~PNo~.

This means that for a selection predicate we have to find the arp
containing its relation and modify it accordingly, the other arps
in the partition are copied unchanged.

For a join predicate we have to find the two arps containing its
two relations and to merge them into a single arp; the remaining
arps are copied unchanged.

Or a join predicate may find its two relations in the same arp, which means
that earlier joins have already combined the two relations. In this case the
join predicate is applied like a selection to that arp.
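
The join case can be tried in isolation. The hypothetical ~mergeArps/4~ below restates the third copyPart clause, with ~argNo/2~ in place of nodeNo. For example, a join q with number 2 merges the arp for relation a, on which p (number 1) has been evaluated, with the base arp of relation b into arp(res(3), [a, b], [q, p]):

```prolog
:- use_module(library(lists)).

argNo(arg(_), 0).
argNo(res(N), N).

% mergeArps(+Pr, +PNo, +Arps, -Arps2): join predicate Pr whose relations
% lie in two different arps; both arps are merged into one res(...) arp.
mergeArps(pr(P, R1, R2), PNo, Arps,
          [arp(res(ResNo), Rels, [P|Preds]) | Rest2]) :-
    select(arp(ArgX, RelsX, PredsX), Arps, Rest),
    memberchk(R1, RelsX),
    select(arp(ArgY, RelsY, PredsY), Rest, Rest2),
    memberchk(R2, RelsY),
    !,
    argNo(ArgX, NoX),
    argNo(ArgY, NoY),
    ResNo is NoX + NoY + PNo,
    append(RelsX, RelsY, Rels),
    append(PredsX, PredsY, Preds).
```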

*/

copyPart(_, _, [], []).

copyPart(pr(P, Rel), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Arps2),
  X = arp(Arg, Rels, Preds),
  member(Rel, Rels), !,
  nodeNo(Arg, No),
  ResNo is No + PNo,
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Arps2),
  X = arp(Arg, Rels, Preds),
  member(R1, Rels),
  member(R2, Rels), !,
  nodeNo(Arg, No),
  ResNo is No + PNo,
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Rest),
  X = arp(ArgX, RelsX, PredsX),
  member(R1, RelsX),
  select(Y, Rest, Arps2),
  Y = arp(ArgY, RelsY, PredsY),
  member(R2, RelsY), !,
  nodeNo(ArgX, NoX),
  nodeNo(ArgY, NoY),
  ResNo is NoX + NoY + PNo,
  append(RelsX, RelsY, Rels),
  append(PredsX, PredsY, Preds),
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

nodeNo(arg(_), 0).
nodeNo(res(N), N).

/*
3.6 newEdges

----    newEdges(Pred, PredNo, NodesOld, EdgesNew) :-
----

For each of the nodes in ~NodesOld~ return a new edge in ~EdgesNew~
built by applying the predicate ~Pred~ with number ~PredNo~.

*/

newEdges(_, _, [], []).

newEdges(Pred, PNo, [Node | Nodes], [Edge | Edges]) :-
  newEdge(Pred, PNo, Node, Edge),
  newEdges(Pred, PNo, Nodes, Edges).

newEdge(pr(P, Rel), PNo, Node, Edge) :-
  findRel(Rel, Node, Source, Arg),
  Target is Source + PNo,
  nodeNo(Arg, ArgNo),
  Result is ArgNo + PNo,
  Edge = edge(Source, Target, select(Arg, pr(P, Rel)), Result, Node, PNo).

newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
  findRels(R1, R2, Node, Source, Arg),
  Target is Source + PNo,
  nodeNo(Arg, ArgNo),
  Result is ArgNo + PNo,
  Edge = edge(Source, Target, select(Arg, pr(P, R1, R2)), Result, Node, PNo).

newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
  findRels(R1, R2, Node, Source, Arg1, Arg2),
  Target is Source + PNo,
  nodeNo(Arg1, Arg1No),
  nodeNo(Arg2, Arg2No),
  Result is Arg1No + Arg2No + PNo,
  Edge = edge(Source, Target, join(Arg1, Arg2, pr(P, R1, R2)), Result,
    Node, PNo).


/*
3.7 findRel

----    findRel(Rel, Node, Source, Arg):-
----

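Find the relation ~Rel~ within a node description ~Node~ and return the
node number ~Source~ and the description ~Arg~ of the argument (e.g. res(3)) found
within the arp containing Rel.

----    findRels(Rel1, Rel2, Node, Source, Arg):-
        findRels(Rel1, Rel2, Node, Source, Arg1, Arg2):-
----

Similarly for two relations: the first version succeeds if both relations are
contained in the same arp, the second one if they are contained in two
different arps.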
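
Both lookups reduce to a membership test over the arps of a node. Below is a sketch with the hypothetical names ~findArg/3~ and ~sameArp/4~, which take the list of arps directly instead of a node term:

```prolog
% findArg(+Rel, +Arps, -Arg): the argument of the arp containing Rel.
findArg(Rel, Arps, Arg) :-
    member(arp(Arg, Rels, _), Arps),
    memberchk(Rel, Rels).

% sameArp(+R1, +R2, +Arps, -Arg): both relations lie in one arp, so a
% join predicate on them is applied as a selection to that argument.
sameArp(R1, R2, Arps, Arg) :-
    member(arp(Arg, Rels, _), Arps),
    memberchk(R1, Rels),
    memberchk(R2, Rels).
```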

*/

findRel(Rel, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel, RelsX).


findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX),
  member(Rel2, RelsX).

findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX, ArgY) :-
  select(X, Arps, Rest),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX), !,
  select(Y, Rest, _),
  Y = arp(ArgY, RelsY, _),
  member(Rel2, RelsY).


/*
3.8 copyEdges

----    copyEdges(Pred, PredNo, EdgesOld, EdgesCopy):-
----

Given a set of edges ~EdgesOld~ and a predicate ~Pred~ with number ~PredNo~,
return a copy of each edge in ~EdgesOld~ in ~EdgesCopy~ such that the
copied version reflects a previous application of predicate ~Pred~.

This is implemented by retrieving from each old edge its start node,
constructing for this start node and predicate ~Pred~ a target node to
which then the predicate associated with the old edge is applied.
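
In terms of node numbers, copying shifts both ends of an old edge by ~PredNo~. When the third predicate (number 4) is added, for instance, the edges among nodes 0 to 3 reappear among nodes 4 to 7. Below is a sketch with the hypothetical ~shiftEdge/3~ on Source-Target pairs:

```prolog
% shiftEdge(+PredNo, +Edge, -Copy): Source-Target pair of the copied edge,
% whose source node now also has predicate PredNo evaluated.
shiftEdge(PNo, Source-Target, Source2-Target2) :-
    Source2 is Source + PNo,
    Target2 is Target + PNo.
```

Applied with maplist(shiftEdge(4), ...), it maps the old edge list to the copied one in a single call.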

*/

copyEdges(_, _, [], []).

copyEdges(Pred, PNo, [Edge | Edges], [Edge2 | Edges2]) :-
  Edge = edge(_, _, Term, _, Node, PNo2),
  pred(Term, Pred2),
  newNode(Pred, PNo, Node, NodeNew),
  newEdge(Pred2, PNo2, NodeNew, Edge2),
  copyEdges(Pred, PNo, Edges, Edges2).

pred(select(_, P), P).
pred(join(_, _, P), P).

/*
3.9 writeEdgeList

----    writeEdgeList(List):-
----

Write the list of edges ~List~.

*/

writeEdgeList([]).

writeEdgeList([edge(Source, Target, Term, _, _, _) | Edges]) :-
  write(Source), write('-'), write(Target), write(':'), write(Term), nl,
  writeEdgeList(Edges).

/*
4 Managing the Graph in Memory

4.1 Storing and Deleting Nodes and Edges

----    storeNodes(NodeList).
    storeEdges(EdgeList).
    deleteNodes.
    deleteEdges.
----

Just as the names say: store a list of nodes or edges, respectively, as facts,
and delete them from memory again.
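
The failure-driven loops below can also be written with the standard built-ins forall/2 and retractall/1. Below is a sketch with hypothetical names, shown for comparison only:

```prolog
:- dynamic node/3.

% storeAll(+Facts): assert every fact of the list, in order.
storeAll(Facts) :-
    forall(member(F, Facts), assertz(F)).

% Same effect as the failure-driven loop of deleteNode/deleteNodes.
deleteAllNodes :-
    retractall(node(_, _, _)).
```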

*/

storeNodes([Node | Nodes]) :- assert(Node), storeNodes(Nodes).
storeNodes([]).

storeEdges([Edge | Edges]) :- assert(Edge), storeEdges(Edges).
storeEdges([]).

deleteNode :- retract(node(_, _, _)), fail.
deleteNodes :- not(deleteNode).

deleteEdge :- retract(edge(_, _, _, _, _, _)), fail.
deleteEdges :- not(deleteEdge).

deleteArgument :- retract(argument(_, _)), fail.
deleteArguments :- not(deleteArgument).


/*
4.2 Writing Nodes and Edges

----    writeNodes.
   writeEdges.
----

Write the currently stored nodes and edges, respectively.

*/
writeNode :-
  node(No, Preds, Partition),
  write('Node: '), write(No), nl,
  write('Preds: '), write(Preds), nl,
  write('Partition: '), write(Partition), nl, nl,
  fail.
writeNodes :- not(writeNode).

writeEdge :-
  edge(Source, Target, Term, Result, _, _),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Term: '), write(Term), nl,
  write('Result: '), write(Result), nl, nl,
  fail.

writeEdges :- not(writeEdge).

/*
5 Rule-Based Translation of Selections and Joins
[:Section Translation]

5.1 Precise Notation for Input

Since we now have to look into the structure of predicates and need to be
able to generate Secondo executable expressions in their precise format, we
need to define the input notation precisely.

5.1.1 The Source Language
[:Section 4.1.1]

We assume the queries can be entered basically as select-from-where
structures, as follows. Let schemas be given as:

----   plz(PLZ:string, Ort:string)
       Staedte(SName:string, Bev:int, PLZ:int, Vorwahl:string, Kennzeichen:string)
----

Then we should be able to enter queries:

----    select SName, Bev
       from Staedte
   where Bev > 500000
----

In the next example we need to avoid the name conflict for PLZ

----   select *
       from Staedte as s, plz as p
       where s.SName = p.Ort and p.PLZ > 40000
----

In the PROLOG version, we will use the following notations:

----   rel(Name, Var, Case)
----

For example

----    rel(staedte, *, u)
----

is a term denoting the ~Staedte~ relation; ~u~ says that it is actually to be
written in upper case whereas

----    rel(plz, *, l)
----

denotes the ~plz~ relation to be written in lower case. The second argument
~Var~ contains an explicit variable if it has been assigned, otherwise the
symbol [*]. If an explicit variable has been used in the query, we need to
perform renaming in the plan. For example, in the second query above, the
relations would be denoted as

----    rel(staedte, s, u)
   rel(plz, p, l)
----

Within predicates, attributes are annotated as follows:

----    attr(Name, Arg, Case)

   attr(ort, 2, u)
----

This says that ~ort~ is an attribute of the second argument within a join
condition, to be written in upper case. For a selection condition, the second
argument is ignored; it can be set to 0 or 1.

Hence for the two queries above, the translation would be

----    fromwhere(
   [rel(staedte, *, u)],
   [pr(attr(bev, 0, u) > 500000, rel(staedte, *, u))]
   )

   fromwhere(
   [rel(staedte, s, u), rel(plz, p, l)],
   [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
        rel(staedte, s, u), rel(plz, p, l)),
   pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l))]
   )
----

Note that the upper or lower case distinction refers only to the first letter
of a relation or attribute name. Other letters are written on the PROLOG side
in the same way as in Secondo.

Note further that if explicit variables are used, the attribute name will
include them, e.g. s:sName.
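
When such a name is written into a plan, the variable becomes a suffix and the first letter is capitalized, so s:sName turns into SName\_s. Compare the clause of ~plan\_to\_atom~ for a(A:B, ...) and the rename operator. Below is a sketch of this mapping. ~upperFirst/2~ and ~secondoAttr/2~ are hypothetical: the former stands in for ~upper/2~ and is built on the SWI-Prolog predicates sub\_atom/5 and upcase\_atom/2:

```prolog
% upperFirst(+Atom, -Upper): upper-case the first letter only.
upperFirst(Atom, Upper) :-
    sub_atom(Atom, 0, 1, _, First),
    sub_atom(Atom, 1, _, 0, Rest),
    upcase_atom(First, UFirst),
    atom_concat(UFirst, Rest, Upper).

% secondoAttr(+Name, -Attr): Secondo spelling of an attribute; an explicit
% variable prefix becomes a suffix, matching the renamed attribute.
secondoAttr(Var:Name, Attr) :-
    !,
    upperFirst(Name, UName),
    atomic_list_concat([UName, '_', Var], Attr).
secondoAttr(Name, Attr) :-
    upperFirst(Name, Attr).
```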

The projection occurring in the select-from-where statement is for the moment
not passed to the optimizer; it is treated outside.

So example 2, extended by a further selection predicate, is rewritten as:

*/

example3 :- pog([rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
    rel(staedte, s, u), rel(plz, p, l) ),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l)),
   pr((attr(p:pLZ, 1, u) mod 5) = 0, rel(plz, p, l))], _, _).

/*

The two queries mentioned above are:

*/

example4 :- pog(
  [rel(staedte, *, u)],
  [pr(attr(bev, 1, u) > 500000, rel(staedte, *, u))],
  _, _).

example5 :- pog(
  [rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u), rel(staedte, s, u), rel(plz, p,
l)),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))],
  _, _).

/*

5.1.2 The Target Language

In the target language, we use the following operators:

----    feed:        rel(Tuple) -> stream(Tuple)
       consume:    stream(Tuple) -> rel(Tuple)

       filter:        stream(Tuple) x (Tuple -> bool) -> stream(Tuple)
       product:    stream(Tuple1) x stream(Tuple2) -> stream(Tuple3)

                where Tuple3 = Tuple1 o Tuple2

       hashjoin:    stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
                x nbuckets -> stream(Tuple3)

                where     Tuple3 = Tuple1 o Tuple2
                    attrname1 occurs in Tuple1
                    attrname2 occurs in Tuple2
                    nbuckets is the number of hash buckets
                        to be used

       sortmergejoin:    stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
                -> stream(Tuple3)

                where     Tuple3 = Tuple1 o Tuple2
                    attrname1 occurs in Tuple1
                    attrname2 occurs in Tuple2

       loopjoin:    stream(Tuple1) x (Tuple1 -> stream(Tuple2))
                -> stream(Tuple3)

                where     Tuple3 = Tuple1 o Tuple2

       exactmatch:    btree(Tuple, AttrType) x rel(Tuple) x AttrType
                -> stream(Tuple)

    extend:        stream(Tuple1) x (Newname x (Tuple -> Attrtype))+
                -> stream(Tuple2)

                where     Tuple2 is Tuple1 to which pairs
                    (Newname, Attrtype) have been appended

    remove:        stream(Tuple1) x Attrname+ -> stream(Tuple2)

                where    Tuple2 is Tuple1 from which the mentioned
                    attributes have been removed.

       project:    stream(Tuple1) x Attrname+ -> stream(Tuple2)

                where    Tuple2 is Tuple1 projected on the
                    mentioned attributes.

       rename        stream(Tuple1) x NewName -> stream(Tuple2)

                where     Tuple2 is Tuple1 modified by appending
                    "_newname" to each attribute name

       count        stream(Tuple) -> int

                count the number of tuples in a stream

       sortby        stream(Tuple) x (Attrname, asc/desc)+    -> stream(Tuple)

                sort stream lexicographically by the given
                attribute names

   groupby    stream(Tuple) x GroupAttrs x NewFields -> stream(Tuple2)

                group stream by the grouping attributes; for each group
                compute new fields each of which is specified in the
                form Attrname : Expr. The argument stream must already
                be sorted by the grouping attributes.

     dloop     darray(X) x string x  (X->Y) -> darray(Y)

                Performs a function on each element of a darray instance. The
                string argument specifies the name of the result. If the
                name is undefined or an empty string, a name is generated
                automatically.

   dloop2     darray(X) x darray(Y) x string x (fun : X x Y -> Z) -> darray(Z)

               Performs a function on the elements of two darray instances.
               The string argument specifies the name of the resulting
               darray. If the string is undefined or empty, a name is
               generated automatically.

    dmap    d[f]array x string x fun -> d[f]array

              Performs a function on a distributed file array. If the
              string argument is empty or undefined, a name for the result
              is chosen automatically. If not, the string specifies the
              name. The result is of type dfarray if the function produces
              a tuple stream or a relation; otherwise the result is a
              darray.

  dmap2    d[f]array x d[f]array x string x fun -> d[f]array

             Joins the slots of two distributed arrays.

 partition  d[f]array(rel(tuple)) x string x (tuple->int) x int-> dfmatrix

             Redistributes the contents of a dfarray value. The new slot
             contents are kept on the worker where the values were stored
             before redistributing them. The last argument (int)
             determines the number of slots of the redistribution. If
             this value is less than or equal to zero, the number of slots
             is taken from the array argument.

partitionF  d[f]array(rel(X)) x string x ([fs]rel(X)->stream(Y)) x (Y ->
            int) x int -> dfmatrix(rel(Y))

              Repartitions a distributed [file] array. Before repartition,
              a function is applied to the slots.

 collect2  dfmatrix x string x int -> dfarray

            Collects the slots of a matrix into a dfarray. The string
            is the name of the resulting array, the int value specifies
            a port for file transfer. The port value can be any port
            usable on all workers. A corresponding file transfer server
            is started automatically.

  areduce  dfmatrix(rel(t)) x string x (fsrel(t)->Y) x int -> d[f]array(Y)

            Performs a function on the distributed slots of an array.
            The task distribution is dynamic, meaning that a fast
            worker will handle more slots than a slower one. The result
            type depends on the result of the function. For a relation
            or a tuple stream, a dfarray will be created. For other non-
            stream results, a darray is the resulting type.

dsummarize    darray(DATA) -> stream(DATA) , d[f]array(rel(X)) -> stream(X)

               Produces a stream of the darray elements.

  getValue   {darray(T),dfarray(T)} -> array(T)

               Converts a distributed array into a normal one.

     tie     ((array t) (map t t t)) -> t

                Calculates the "value" of an array evaluating the elements
                of the array with a given function from left to right.


----

In PROLOG, all expressions involving such operators are written in prefix
notation.

Parameter functions are written as

----    fun([param(Var1, Type1), ..., param(VarN, TypeN)], Expr)
----


5.1.3 Converting Plans to Atoms and Writing them.

Predicate ~plan\_to\_atom~ converts a plan to a string atom, which represents
the plan as a SECONDO query in text syntax. For attributes we have to
distinguish whether a leading ``.'' needs to be written (if the attribute occurs
within a parameter function) or whether just the attribute name is needed as in
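the arguments for hashjoin, for example. Predicate ~wp~ (``write plan'') uses
predicate ~plan\_to\_atom\_string~ to convert its argument and then writes the
result to standard output; ~plan\_to\_atom\_string~ calls ~plan\_to\_atom~ and,
for a distributed query, also adds commands replicating the objects it uses.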
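
To illustrate the conversion from prefix terms to Secondo's postfix text syntax, here is a much reduced, hypothetical printer ~toAtom/2~. It covers only feed, filter, consume, and one comparison. Unlike plan\_to\_atom, it emits no trailing blanks and handles neither variables nor other operators:

```prolog
% upperFirst(+Atom, -Upper): upper-case the first letter only.
upperFirst(Atom, Upper) :-
    sub_atom(Atom, 0, 1, _, First),
    sub_atom(Atom, 1, _, 0, Rest),
    upcase_atom(First, UFirst),
    atom_concat(UFirst, Rest, Upper).

% toAtom(+Plan, -Atom): prefix plan term to Secondo postfix text.
toAtom(rel(Name, _, u), Atom) :- !, upperFirst(Name, Atom).
toAtom(rel(Name, _, l), Name) :- !.
toAtom(attr(Name, _, _), Atom) :- !,      % attribute inside a function
    upperFirst(Name, UName),
    atom_concat('.', UName, Atom).
toAtom(feed(X), Atom) :- !,
    toAtom(X, XAtom),
    atomic_list_concat([XAtom, ' feed'], Atom).
toAtom(consume(X), Atom) :- !,
    toAtom(X, XAtom),
    atomic_list_concat([XAtom, ' consume'], Atom).
toAtom(filter(X, Pred), Atom) :- !,
    toAtom(X, XAtom),
    toAtom(Pred, PAtom),
    atomic_list_concat([XAtom, ' filter[', PAtom, ']'], Atom).
toAtom(L > R, Atom) :- !,
    toAtom(L, LAtom),
    toAtom(R, RAtom),
    atomic_list_concat([LAtom, ' > ', RAtom], Atom).
toAtom(X, X) :-
    atomic(X).
```

For the first example query, consume(filter(feed(rel(staedte, *, u)), attr(bev, 1, u) > 500000)) becomes Staedte feed filter[.Bev > 500000] consume.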

*/

upper(Lower, Upper) :-
  atom_codes(Lower, [First | Rest]),
  to_upper(First, First2),
  UpperList = [First2 | Rest],
  atom_codes(Upper, UpperList).

wp(Plan) :-
  plan_to_atom_string(Plan, PlanAtom),
  write(PlanAtom).

/*

Function ~newVariable~ outputs a new unique variable name.
The variable name is unique in the sense that ~newVariable~ does not
output the same name twice until its counter is reset by ~deleteVariables~
(which ~pog~ calls whenever a new graph is built).
It should be emphasized that the output
is not a PROLOG variable but a variable name to be used for defining
abstractions in the Secondo system.
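
The same counter can be kept in an SWI-Prolog global flag instead of a dynamic fact. Below is a sketch with the hypothetical names ~newVar/1~ and ~resetVars/0~, using the built-in flag/3:

```prolog
% newVar(-Var): var1, var2, ... ; flag/3 reads the old value (initially 0)
% and stores the evaluated new one.
newVar(Var) :-
    flag(sketchVarCounter, N, N + 1),
    N1 is N + 1,
    atom_concat(var, N1, Var).

% resetVars: counterpart of deleteVariables.
resetVars :-
    flag(sketchVarCounter, _, 0).
```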

*/

:-
  dynamic(varDefined/1).

newVariable(Var) :-
  varDefined(N),
  !,
  N1 is N + 1,
  retract(varDefined(N)),
  assert(varDefined(N1)),
  atom_concat('var', N1, Var).

newVariable(Var) :-
  assert(varDefined(1)),
  Var = 'var1'.

deleteVariable :- retract(varDefined(_)), fail.

deleteVariables :- not(deleteVariable).

/*
Arguments:

*/

%fapra 2015/16

/*
To support distributed queries with predicates containing non-relation
 objects, these objects have to be replicated to the
 involved workers.

For now we assume that every object found is used in the distributed
 part of the query (the function of dmap or dmap2).

A possible later extension is to examine the distributed relations and
 to share the objects only with workers containing parts of those relations.
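
The shape of the resulting command sequence can be sketched without the dynamic facts. The hypothetical ~withShares/3~ takes the object list directly. Like replicateObjects, it returns the query part unchanged when there are no objects, and otherwise a list of share commands followed by the query part:

```prolog
% shareCommand(+Obj, -Cmd): command replicating Obj to the workers,
% in the same text form as createSharedClause.
shareCommand(Obj, Cmd) :-
    atomic_list_concat(['share("', Obj, '",TRUE)'], Cmd).

% withShares(+QueryPart, +Objects, -Result)
withShares(QueryPart, [], QueryPart) :- !.
withShares(QueryPart, Objects, Result) :-
    maplist(shareCommand, Objects, Cmds),
    append(Cmds, [QueryPart], Result).
```

Note that the two cases return differently shaped results, an atom or a list, exactly as in replicateObjects.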

*/

:-
  dynamic(replicatedObject/1).

%distributed query without objects
replicateObjects(QueryPart, QueryPart) :-
  findall(X,replicatedObject(X), ObjectList),
  length(ObjectList,0),!.

%distributed query using objects in predicate
replicateObjects(QueryPart, Result) :-
  findall(X,replicatedObject(X), ObjectList),
  length(ObjectList,Length),
  Length >0,
  maplist(createSharedClause,ObjectList,CommandList),
  append(CommandList,[QueryPart], Result).

createSharedClause(Obj, SharedCommand) :-
  atom_concat('share("',Obj,StrObj),
  atom_concat(StrObj,'",TRUE)',SharedCommand).

plan_to_atom_string(X, Result) :-
  isDistributedQuery,
  retractall(replicatedObject(_)),
  plan_to_atom(X,QueryPart),
  replicateObjects(QueryPart, Result),
  !.

plan_to_atom_string(X, Result) :-
  not(isDistributedQuery),
  plan_to_atom(X,Result),
  !.

plan_to_atom(obj(Object,_,u), Result) :-
  isDistributedQuery,
  upper(Object, UpperObject),
  atom_concat(UpperObject, ' ', Result),
  assertOnce(replicatedObject(UpperObject)),
  !.

plan_to_atom(obj(Object,_,l), Result) :-
  isDistributedQuery,
  atom_concat(Object, ' ', Result),
  assertOnce(replicatedObject(Object)),
  !.


plan_to_atom(obj(Object,_,u), Result) :-
  upper(Object, UpperObject),
  atom_concat(UpperObject, ' ', Result),
  !.

plan_to_atom(obj(Object,_,l), Result) :-
  atom_concat(Object, ' ', Result),
  !.

plan_to_atom(dot, Result) :-
  atom_concat('.', ' ', Result),
  !.

%end fapra 2015/16

plan_to_atom(rel(Name, _, l), Result) :-
  atom_concat(Name, ' ', Result),
  !.

plan_to_atom(rel(Name, _, u), Result) :-
  upper(Name, Name2),
  atom_concat(Name2, ' ', Result),
  !.

plan_to_atom(res(N), Result) :-
  atom_concat('res(', N, Res1),
  atom_concat(Res1, ') ', Result),
  !.


plan_to_atom(Term, Result) :-
    is_list(Term), Term = [First | _], atomic(First), !,
    atom_codes(TermRes, Term),
    normalize_space(atom(Out),TermRes),
    concat_atom(['"', Out, '"'], '', Result).

/*
Lists:

*/


plan_to_atom([X], AtomX) :-
  plan_to_atom(X, AtomX),
  !.

plan_to_atom([X | Xs], Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Xs, XsAtom),
  concat_atom([XAtom, ', ', XsAtom], '', Result),
  !.


/*
Operators: only special syntax. General rules for standard syntax
see below.

*/


plan_to_atom(sample(Rel, S, T), Result) :-
  plan_to_atom(Rel, ResRel),
  concat_atom([ResRel, 'sample[', S, ', ', T, '] '], '', Result),
  !.

plan_to_atom(hashjoin(X, Y, A, B, C), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  plan_to_atom(A, AAtom),
  plan_to_atom(B, BAtom),
  concat_atom([XAtom, YAtom, 'hashjoin[',
    AAtom, ', ', BAtom, ', ', C, '] '], '', Result),
  !.

plan_to_atom(sortmergejoin(X, Y, A, B), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  plan_to_atom(A, AAtom),
  plan_to_atom(B, BAtom),
  concat_atom([XAtom, YAtom, 'sortmergejoin[',
    AAtom, ', ', BAtom, '] '], '', Result),
  !.

plan_to_atom(mergejoin(X, Y, A, B), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  plan_to_atom(A, AAtom),
  plan_to_atom(B, BAtom),
  concat_atom([XAtom, YAtom, 'mergejoin[',
    AAtom, ', ', BAtom, '] '], '', Result),
  !.

plan_to_atom(groupby(Stream, GroupAttrs, Fields), Result) :-
  plan_to_atom(Stream, SAtom),
  plan_to_atom(GroupAttrs, GAtom),
  plan_to_atom(Fields, FAtom),
  concat_atom([SAtom, 'groupby[', GAtom, '; ', FAtom, ']'], '', Result),
  !.

plan_to_atom(field(NewAttr, Expr), Result) :-
  plan_to_atom(attrname(NewAttr), NAtom),
  plan_to_atom(Expr, EAtom),
  concat_atom([NAtom, ': ', EAtom], '', Result).

plan_to_atom(exactmatchfun(IndexName, Rel, attr(Name, R, Case)), Result) :-
  plan_to_atom(Rel, RelAtom),
  plan_to_atom(a(Name, R, Case), AttrAtom),
  newVariable(T),
  concat_atom(['fun(', T, ' : TUPLE) ', IndexName,
    ' ', RelAtom, 'exactmatch[attr(', T, ', ', AttrAtom, ')] '], Result),
  !.


plan_to_atom(newattr(Attr, Expr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  plan_to_atom(Expr, ExprAtom),
  concat_atom([AttrAtom, ': ', ExprAtom], '', Result),
  !.


plan_to_atom(rename(X, Y), Result) :-
  plan_to_atom(X, XAtom),
  concat_atom([XAtom, '{', Y, '} '], '', Result),
  !.


plan_to_atom(fun(Params, Expr), Result) :-
  params_to_atom(Params, ParamAtom),
  plan_to_atom(Expr, ExprAtom),
  concat_atom(['fun ', ParamAtom, ExprAtom], '', Result),
  !.


plan_to_atom(attribute(X, Y), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  concat_atom(['attr(', XAtom, ', ', YAtom, ')'], '', Result),
  !.

plan_to_atom(increment(X), Result) :-
  plan_to_atom(X, XAtom),
  concat_atom([XAtom, '++'], '', Result),
  !.

%fapra 2015/16

plan_to_atom(dloop2(PreArg1, PreArg2, PostArg1, PostArg2), Result) :-
  plan_to_atom(PreArg1, PreArg1Atom),
  plan_to_atom(PreArg2, PreArg2Atom),
  plan_to_atom(PostArg1, PostArg1Atom),
  plan_to_atom(PostArg2, PostArg2Atom),
  concat_atom([PreArg1Atom, PreArg2Atom, 'dloop2[',
    PostArg1Atom, ', ', PostArg2Atom, ']'], '', Result),
  !.

%end fapra 2015/16

/*
Sort orders and attribute names.

*/
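
/*
The attribute-name rules below depend on upper/2, which is defined
elsewhere. The following sketch assumes that upper/2 capitalizes only the
first character, in line with the convention that Secondo attribute names
start with an upper-case letter. The predicates demo\_upper/2,
demo\_attr\_name/2 and demo\_attr\_access/2 are hypothetical stand-ins for
upper/2 and for the rules on ~a~ and ~attr~.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% Assumed behaviour of upper/2: capitalize the first character.
demo_upper(Lower, Upper) :-
  sub_atom(Lower, 0, 1, _, First),
  sub_atom(Lower, 1, _, 0, Rest),
  upcase_atom(First, UpperFirst),
  atom_concat(UpperFirst, Rest, Upper).

% Like a(Var:Name, _, _): attribute of a renamed relation.
demo_attr_name(Var:Name, Atom) :-
  !,
  demo_upper(Name, UpperName),
  atomic_list_concat([UpperName, '_', Var], Atom).
% Like a(Name, _, _): attribute of a relation without a variable.
demo_attr_name(Name, UpperName) :-
  demo_upper(Name, UpperName).

% Like attr(Name, _, _): attribute access with a leading dot.
demo_attr_access(Name, Atom) :-
  demo_attr_name(Name, AttrName),
  atom_concat('.', AttrName, Atom).

% ?- demo_attr_name(p1:pLZ, A).    yields A = 'PLZ_p1'
% ?- demo_attr_access(ort, A).     yields A = '.Ort'
----

*/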

plan_to_atom(asc(Attr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  atom_concat(AttrAtom, ' asc', Result),
  !.

plan_to_atom(desc(Attr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  atom_concat(AttrAtom, ' desc', Result),
  !.

plan_to_atom(attr(Name, Arg, Case), Result) :-
  plan_to_atom(a(Name, Arg, Case), ResA),
  atom_concat('.', ResA, Result),
  !.

plan_to_atom(attrname(attr(Name, Arg, Case)), Result) :-
  plan_to_atom(a(Name, Arg, Case), Result),
  !.

plan_to_atom(a(A:B, _, _), Result) :-
  upper(B, B2),
  concat_atom([B2, '_', A], Result),
  !.

plan_to_atom(a(X, _, _), X2) :-
  upper(X, X2),
  !.

%fapra 2015/16

plan_to_atom(our_attrname(attr(Name, Arg, Case)), Result) :-
  plan_to_atom(our_a(Name, Arg, Case), Result).

plan_to_atom(our_a(_:B, _, _), Result) :-
  upper(B, B2),
  concat_atom(['..', B2], Result),
  !.

plan_to_atom(our_a(X, _, _), Result) :-
  upper(X, X2),
  concat_atom(['..', X2], Result),
  !.

plan_to_atom(simple_attrname(attr(Name, Arg, Case)), Result) :-
  plan_to_atom(simple_a(Name, Arg, Case), Result), !.

plan_to_atom(simple_a(_:B, _, _), B2) :-
  upper(B, B2),
  !.

plan_to_atom(simple_a(X, _, _), X2) :-
  upper(X, X2),
  !.

plan_to_atom(extendstream(A, B, C), Plan) :-
  plan_to_atom(A, PlanA),
  plan_to_atom(B, PlanB),
  plan_to_atom(C, PlanC),
  concat_atom([PlanA, ' ', 'extendstream(',
    PlanB, ': ', PlanC, ')'], Plan),
  !.

%end fapra 2015/16

/*
Translation of operators driven by predicate ~secondoOp~ in
file ~opSyntax~. There are rules for

  * postfix, 1 or 2 arguments

  * postfix followed by one argument in square brackets, in total 2
or 3 arguments

  * prefix, 2 arguments

Any other syntax that does not follow the defaults (see below) must be
coded explicitly.

*/
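
/*
The following sketch illustrates this table-driven scheme in miniature. It
is not part of the optimizer: demo\_op/3 is a hypothetical stand-in for
~secondoOp~, demo\_to\_atom/2 for plan\_to\_atom, and the operator entries
are examples only. As in this file, operators with special syntax are tried
first, and a generic infix form serves as the default for other binary
terms. Unlike the actual rules, the sketch emits no trailing blanks.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
demo_op(feed,    postfix,         1).
demo_op(consume, postfix,         1).
demo_op(filter,  postfixbrackets, 2).

% Postfix, 1 argument: Arg Op
demo_to_atom(Term, Atom) :-
  functor(Term, Op, 1),
  demo_op(Op, postfix, 1), !,
  arg(1, Term, Arg1),
  demo_to_atom(Arg1, Res1),
  format(atom(Atom), '~w ~w', [Res1, Op]).
% Postfix with brackets, 2 arguments: Arg1 Op[Arg2]
demo_to_atom(Term, Atom) :-
  functor(Term, Op, 2),
  demo_op(Op, postfixbrackets, 2), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  demo_to_atom(Arg1, Res1),
  demo_to_atom(Arg2, Res2),
  format(atom(Atom), '~w ~w[~w]', [Res1, Op, Res2]).
% Default for any other binary term: infix in parentheses.
demo_to_atom(Term, Atom) :-
  functor(Term, Op, 2), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  demo_to_atom(Arg1, Res1),
  demo_to_atom(Arg2, Res2),
  format(atom(Atom), '(~w ~w ~w)', [Res1, Op, Res2]).
% Atomic leaves are written unchanged.
demo_to_atom(X, X) :-
  atomic(X).

% ?- demo_to_atom(consume(filter(feed(plz), '.PLZ' > 5000)), A).
% yields A = 'plz feed filter[(.PLZ > 5000)] consume'
----

*/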

plan_to_atom(Term, Result) :-
  functor(Term, Op, 1),
  secondoOp(Op, postfix, 1),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  concat_atom([Res1, ' ', Op, ' '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, postfix, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  concat_atom([Res1, ' ', Res2, ' ', Op, ' '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, postfixbrackets, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  concat_atom([Res1, ' ', Op, '[', Res2, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  secondoOp(Op, postfixbrackets, 3),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, prefix, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  concat_atom([Op, '(', Res1, ',', Res2, ') '], '', Result),
    !.

%fapra 2015/16

/*
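Additional plan\_to\_atom rules for the operators of the Distributed2Algebra.
They handle the following syntax forms:

  * prefix, 1 or 4 arguments, all given in parentheses after the operator

  * postfixbrackets, 4 arguments: two before the operator, two in square
brackets

  * postfixbrackets2, 3 arguments: one before the operator, two in square
brackets

  * postfixbrackets3, 4 or 5 arguments: one (for 4) or two (for 5) before the
operator, three in square brackets

  * postfixbrackets4, 5 arguments: one before the operator, four in square
brackets

  * postfixbrackets5, 6 arguments: one before the operator, five in square
brackets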

*/
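
/*
The bracketed forms handled by the following rules differ only in how many
translated arguments precede the operator and how many appear inside the
brackets. The following hypothetical sketch covers them with a single rule
driven by a table demo\_bracket\_op/3 (operator, number of leading
arguments, arity). The entries for dmap and dmap2 mirror how these operators
are used in the plans of this file, but they are assumptions, not the
actual ~secondoOp~ facts. For brevity, the arguments are converted with
term\_to\_atom/2 instead of recursively.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% Operator, number of arguments before the operator, total arity.
demo_bracket_op(dmap,  1, 3).   % A1 dmap[A2, A3]
demo_bracket_op(dmap2, 2, 5).   % A1 A2 dmap2[A3, A4, A5]

demo_bracket_to_atom(Term, Atom) :-
  Term =.. [Op | Args],
  length(Args, Arity),
  demo_bracket_op(Op, NBefore, Arity), !,
  % split the arguments into leading ones and bracketed ones
  length(Before, NBefore),
  append(Before, Inside, Args),
  maplist(term_to_atom, Before, BeforeAtoms),
  maplist(term_to_atom, Inside, InsideAtoms),
  atomic_list_concat(BeforeAtoms, ' ', Prefix),
  atomic_list_concat(InsideAtoms, ', ', Bracket),
  format(atom(Atom), '~w ~w[~w]', [Prefix, Op, Bracket]).

% ?- demo_bracket_to_atom(dmap2(roads, buildings, name, f, 1238), A).
% yields A = 'roads buildings dmap2[name, f, 1238]'
----

*/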

plan_to_atom(Term, Result) :-
  functor(Term, Op, 1),
  secondoOp(Op, prefix, 1),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  concat_atom([Op, '(', Res1, ') '], '', Result),
    !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 4),
  secondoOp(Op, prefix, 4),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  concat_atom([Op, '(', Res1, ',', Res2, ', ', Res3,
  ', ', Res4, ') '], '', Result),
    !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 4),
  secondoOp(Op, postfixbrackets, 4),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
    Res4, ']'], ''  , Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  secondoOp(Op, postfixbrackets2, 3),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 4),
  secondoOp(Op, postfixbrackets3, 4),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3,', ',
               Res4, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 5),
  secondoOp(Op, postfixbrackets3, 5),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  arg(5, Term, Arg5),
  plan_to_atom(Arg5, Res5),
  concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, ', ',
   Res4,', ',Res5, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 5),
  secondoOp(Op, postfixbrackets4, 5),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  arg(5, Term, Arg5),
  plan_to_atom(Arg5, Res5),
  concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ',
   Res4,', ',Res5, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 6),
  secondoOp(Op, postfixbrackets5, 6),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  arg(4, Term, Arg4),
  plan_to_atom(Arg4, Res4),
  arg(5, Term, Arg5),
  plan_to_atom(Arg5, Res5),
  arg(6, Term, Arg6),
  plan_to_atom(Arg6, Res6),
  concat_atom([Res1, ' ', Op, '[', Res2, ', ', Res3, ', ', Res4,', ',
   Res5,', ',Res6, '] '], '', Result),
  !.

%end fapra 2015/16

/*
Generic rules. Operators that are not
recognized are assumed to be:

  * 1 argument: prefix

  * 2 arguments: infix

  * 3 arguments: prefix

*/

plan_to_atom(Term, Result) :-
  functor(Term, Op, 1),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  concat_atom([Op, '(', Res1, ')'], '', Result).

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  concat_atom(['(', Res1, ' ', Op, ' ', Res2, ')'], '', Result).

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  plan_to_atom(Arg3, Res3),
  concat_atom([Op, '(', Res1, ', ', Res2, ', ', Res3, ')'], '', Result).

plan_to_atom(X, Result) :-
  atomic(X),
  term_to_atom(X, Result),
  !.

plan_to_atom(X, _) :-
  write('Error while converting term: '),
  write(X),
  nl.


params_to_atom([], ' ').

params_to_atom([param(Var, Type) | Params], Result) :-
  type_to_atom(Type, TypeAtom),
  params_to_atom(Params, ParamsAtom),
  concat_atom(['(', Var, ': ', TypeAtom, ') ', ParamsAtom], '', Result),
  !.

type_to_atom(tuple, 'TUPLE').
type_to_atom(tuple2, 'TUPLE2').
type_to_atom(group, 'GROUP').


/*

5.2 Optimization Rules

We introduce a predicate [=>] which can be read as ``translates into''.

5.2.1 Translation of the Arguments of an Edge of the POG

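If the argument is of the form res(N), then it is a stream already and can be
used unchanged. If it is of the form arg(N), then it is a base relation; a
~feed~ must be applied and possibly a ~rename~.

In this version of the optimizer, the translation rules generally yield a
two-element list consisting of the plan and a list of its properties, such as
its sort order (~order~) or the distribution properties described below.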

*/
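
/*
The following sketch shows the translation of arguments in isolation, with
each result given as a plan together with its property list. To be loadable
on its own, it uses hypothetical facts demo\_argument/2 and demo\_ordered/2
in place of argument/2 and ordered/2, and a hypothetical predicate
demo\_translate/2 instead of [=>]. A relation without an ordering fact gets
the property order(none).

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
demo_argument(1, rel(plz, *, l)).
demo_argument(2, rel(plz, p2, l)).
demo_ordered(plz, ort).

demo_order(Name, Attr) :- demo_ordered(Name, Attr), !.
demo_order(_, none).

% An intermediate result is already a stream.
demo_translate(res(N), [res(N), none]).
% Base relation without a variable: feed it.
demo_translate(arg(N), [feed(rel(Name, *, Case)), [order(X)]]) :-
  demo_argument(N, rel(Name, *, Case)), !,
  demo_order(Name, X).
% Base relation with a variable: feed and rename it.
demo_translate(arg(N),
    [rename(feed(rel(Name, Var, Case)), Var), [order(Var:X)]]) :-
  demo_argument(N, rel(Name, Var, Case)), !,
  demo_order(Name, X).

% ?- demo_translate(arg(2), T).
% yields T = [rename(feed(rel(plz, p2, l)), p2), [order(p2:ort)]]
----

*/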

ordered(plz, ort).

ordered(orte, ort).

ordered(staedte, sName).

ordered(thousand, no).

ordered(ten, no).

order(Name, Attr) :-
  ordered(Name, Attr), !.

order(_, none).

% The following rule is needed for listing all plan edges or cost edges,
% not for optimization as such.

res(N) => [res(N), none].

% arg(N) => feed(rel(Name, *, Case)) :-
%   argument(N, rel(Name, *, Case)), !.

% arg(N) => rename(feed(rel(Name, Var, Case)), Var) :-
%   argument(N, rel(Name, Var, Case)).

[res(N), P] => [res(N), P].


% Translate into distributed argument
arg(N) => [Plan, Properties] :-
  isDistributedQuery,
  !,
  distributedarg(N) => [Plan, Properties].

/*
  Translation into distributed arguments. The following properties are used:

  ~distribution~(DistributionType, DistributionAttribute, DistributionParameter):
  DistributionType is share, spatial, modulo, function or random;
  DistributionAttribute is the attribute of the relation that determines on
  which partition(s) a given tuple is stored (in principle, this could also be
  a list); DistributionParameter is the parameter used for the distribution
  (such as a grid or a function object / operator).

  ~distributedobjecttype~(Type), where Type is darray, dfarray or dfmatrix.

  ~disjointpartitioning~ signals that, if each partition is treated as the
  multiset of the tuples it contains, the union of all partitions is the
  original relation (put differently, any duplicates that exist were already
  present in the original relation).

  Since some Secondo plans eliminate duplicates anyway, they do not require
  their arguments to have this property (e.g., spatial join).

*/
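
/*
Plans carry these properties in a list. The following hypothetical sketch
shows how such a list can be queried: demo\_partitioned/1 corresponds to the
test made by isPartitioned/1 further below, and demo\_local\_filter\_ok/1 to
the condition checked by the generic distributed selection. The names and
the sample property list are illustrative only.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% A spatially partitioned dfarray that was filtered on attribute original.
demo_props([distribution(spatial, geodata, grid),
            distributedobjecttype(dfarray),
            disjointpartitioning]).

% Partitioned means any distribution type other than share.
demo_partitioned(Props) :-
  memberchk(distribution(Type, _, _), Props),
  memberchk(Type, [function, modulo, random, spatial]).

% A selection can be applied per partition to a d(f)array whose
% partitions are disjoint.
demo_local_filter_ok(Props) :-
  (   memberchk(distributedobjecttype(dfarray), Props)
  ;   memberchk(distributedobjecttype(darray), Props)
  ), !,
  memberchk(disjointpartitioning, Props).

% ?- demo_props(P), demo_partitioned(P), demo_local_filter_ok(P).
% succeeds
----

*/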

% Translate into object found in SEC2DISTRIBUTED.
distributedarg(N) => [ObjName, X] :-
  X =[distribution(DistType, DCDistAttr, DistParam),
  distributedobjecttype(DistObjType),disjointpartitioning],
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
    DistType, DistAttr, DistParam),
  not(DistType = spatial),
  downcase_atom(DistAttr, DCDistAttr).

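% Spatial partitioning without filtering on the attribute original does not,
% in general, yield disjoint partitions: a tuple whose geometry overlaps
% several grid cells is stored in each of the corresponding partitions.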
distributedarg(N) => [ObjName,
  [distribution(DistType, DCDistAttr, DistParam),
  distributedobjecttype(DistObjType)]] :-
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
    DistType, DistAttr, DistParam),
  DistType = spatial,
  downcase_atom(DistAttr, DCDistAttr).

% Filter a spatially distributed argument on the attribute original, which
% marks one copy of each tuple; the result has disjoint partitions.
distributedarg(N) => [Plan,
  [distribution(spatial, DCDistAttr, DistParam),
  distributedobjecttype(DistObjType), disjointpartitioning]] :-
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, DistObjType,
    spatial, DistAttr, DistParam),
  downcase_atom(DistAttr, DCDistAttr),
  Plan = dmap(ObjName, " ", filter(feed(rel('.', *, u)), attr(original, l, u))).

/*
  Redistribute the argument relation so that it is spatially distributed by
  the provided attribute. The distribution type must be spatial, and the
  attribute must be provided as a ground term. A grid to be used for the
  distribution may be provided; if it is not, the grid object named grid is
  used, which must then exist in the database.
  Yields a dfarray or a dfmatrix.

*/
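
/*
The choice of grid and result type in the following rule can be sketched
in isolation. demo\_redistribute/4 is hypothetical, and the term
partition(Obj, Grid) is a placeholder for the actual partitionF subplan. As
in the rule, the two alternatives form a plain disjunction, so an unbound
object type yields both plans on backtracking.

----
% Hypothetical sketch; partition/2 is a placeholder term, not an operator.
demo_redistribute(Obj, Type, Grid, Plan) :-
  ( ground(Grid) -> true ; Grid = grid ),   % default: the object grid
  Inner = partition(Obj, Grid),
  (   Type = dfarray,  Plan = collect2(Inner, " ", 1238)
  ;   Type = dfmatrix, Plan = Inner
  ).

% ?- findall(T, demo_redistribute(roads, T, _, _), Ts).
% yields Ts = [dfarray, dfmatrix]
----

*/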

distributedarg(N) => [Plan, [distribution(DistType,DistAttr,Grid),
  distributedobjecttype(DistObjType)]] :-
  % only use this in one direction. Might be generalized in the future.
  ground(DistAttr),
  ground(DistType),
  % if we do not have a grid specified, use the grid-object
  (ground(Grid) -> true; Grid = grid),
  DistType = spatial,
  argument(N, Rel),
  Rel = rel(Name, _, _),
  distributedRels(rel(Name, _, _), ObjName, _, OriginalDistType, _, _),
  % cannot redistribute replicated relations
  not(OriginalDistType = share),
  spelled(Name:DistAttr, AttrTerm),
  InnerPlan = partitionF(ObjName, " ", extendstream(feed(rel('.', *, u)),
    attrname(attr(cell, *, u)), cellnumber(bbox(AttrTerm), Grid)),
    attr('.Cell', *, u), 0), %there should be another option to add the 2nd dot
  % collect into dfarray or simply be content with the dfmatrix
  (DistObjType = dfarray,
    Plan = collect2(InnerPlan, " ", 1238);
    DistObjType = dfmatrix,
    Plan = InnerPlan).

arg(N) => [feed(rel(Name, *, Case)), [order(X)]] :-
  argument(N, rel(Name, *, Case)), !,
  order(Name, X).

arg(N) => [rename(feed(rel(Name, Var, Case)), Var), [order(Var:X)]] :-
  argument(N, rel(Name, Var, Case)), !,
  order(Name, X).

/*
5.2.2 Translation of Selections

*/

%fapra 2015/16

% Translate selection into distributed selection.
select(Arg, Y) => X :-
  isDistributedQuery,
  !, /* Operand is distributed. Do not translate into local selection. */
  distributedselect(Arg, Y) => X.

%end fapra 2015/16

% select(Arg, pr(Pred, _)) => filter(ArgS, Pred) :-
%   Arg => ArgS.

% select(Arg, pr(Pred, _, _)) => filter(ArgS, Pred) :-
%    Arg => ArgS.

select(Arg, pr(Pred, _)) => [filter(ArgS, Pred), P] :-
  Arg  => [ArgS, P].


select(Arg, pr(Pred, _, _)) => [filter(ArgS, Pred), P] :-
  Arg  => [ArgS, P].


/*

Translation of selections using indices.

*/
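
/*
The rules below first mirror a comparison so that the attribute stands on
the right-hand side, and then choose exactmatch, leftrange or rightrange.
The following hypothetical sketch restates only this case analysis, without
any index lookup; demo\_mirror/2, demo\_index\_op/4 and
demo\_index\_access/4 are illustrative names, and the operator declaration
for the less-or-equal comparison is an assumption about how the optimizer
declares it. In addition, the sketch rejects comparisons whose other side is
also an attribute, since such a key cannot be looked up in an index.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
:- op(700, xfx, <=).   % assumed declaration

% Mirror "Attr op Key" into "Key op' Attr".
demo_mirror(attr(A, R, C) = Y,  Y = attr(A, R, C)).
demo_mirror(attr(A, R, C) <= Y, Y >= attr(A, R, C)).
demo_mirror(attr(A, R, C) >= Y, Y <= attr(A, R, C)).

% Index operation for a normalized comparison "Key op Attr".
demo_index_op(Key = attr(A, _, _),  exactmatch, A, Key).
demo_index_op(Key >= attr(A, _, _), leftrange,  A, Key).
demo_index_op(Key <= attr(A, _, _), rightrange, A, Key).

demo_index_access(Pred, Op, Attr, Key) :-
  ( demo_mirror(Pred, Normal) -> true ; Normal = Pred ),
  demo_index_op(Normal, Op, Attr, Key),
  \+ Key = attr(_, _, _).   % the key must not be an attribute

% ?- demo_index_access(attr(plz, 1, l) <= 5000, Op, A, K).
% yields Op = leftrange, A = plz, K = 5000
----

*/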

select(arg(N), Y) => [X, P] :-
  indexselect(arg(N), Y) => [X, P], !.

select(arg(N), Y) => [X, [none]] :-
  indexselect(arg(N), Y) => X.

indexselect(arg(N), pr(attr(AttrName, Arg, Case) = Y, Rel)) => X :-
  indexselect(arg(N), pr(Y = attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
  [exactmatch(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).

indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
  [rename(exactmatch(IndexName, rel(Name, Var, Case), Y), Var),
   [order(AttrName)]]
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).

indexselect(arg(N), pr(attr(AttrName, Arg, Case) <= Y, Rel)) => X :-
  indexselect(arg(N), pr(Y >= attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
  [leftrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).

indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
  [rename(leftrange(IndexName, rel(Name, Var, Case), Y), Var),
   [order(AttrName)]]
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).

indexselect(arg(N), pr(attr(AttrName, Arg, Case) >= Y, Rel)) => X :-
  indexselect(arg(N), pr(Y <= attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
  [rightrange(IndexName, rel(Name, *, Case), Y), [order(AttrName)]]
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName).

indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
  [rename(rightrange(IndexName, rel(Name, Var, Case), Y), Var),
   [order(AttrName)]]
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), IndexName).

%fapra 2015/16

/*
Translation of selections that concern distributed relations.

*/

% Commutativity of intersects.
distributedselect(ObjName,
  pr(Val intersects attr(Attr, Arg, Case), Rel)) => X :-
  distributedselect(ObjName, pr(attr(Attr, Arg, Case) intersects Val, Rel))
    => X.

% Use spatial index for an intersection predicate.
distributedselect(arg(N), Pred)
  => [dmap2(IndexObj, RelObj, " ",
    filter(filter(Intersection, InnerPred), attr(original, l, u)), 1238),
    [distributedobjecttype(dfarray), disjointpartitioning]] :-
    argument(N, Rel),
    Pred = pr(Attr intersects Val, rel(_, Var, _)),
    Pred = pr(InnerPred, _),
    % We need a materialized argument relation to use the index
    distributedRels(Rel, RelObj, _, _, _),
    RelObj = rel(RelObjName, _, _),
    % Lookup an rtree index for the relation + attribute
    downcase_atom(RelObjName, DCRelObjName),
    attrnameDCAtom(Attr, DCAttr),
    distributedIndex(DCRelObjName, DCAttr, rtree, DCIndexObjName),
    % Check the database object for the correct spelling
    spelledObj(DCIndexObjName, IndexObjName,_, Case),
    IndexObj = rel(IndexObjName, *, Case),
    IndParam = rel('.', *, u),
    RelParam = rel('..', *, u),
    renameStream(windowintersects(IndParam, RelParam, Val),
      Var, Intersection).

% Use btree index for a starts predicate.
distributedselect(arg(N), pr(Attr starts Val, rel(_, Var, _)))
  => [dmap2(IndexObj, RelObj, " ",
    Range, 1238), [distributedobjecttype(dfarray), disjointpartitioning]] :-
    argument(N, Rel),
    distributedRels(Rel, RelObj, _, _, _),
    RelObj = rel(RelObjName, _, _),
    downcase_atom(RelObjName, DCRelObjName),
    attrnameDCAtom(Attr, DCAttr),
    % Lookup a btree index for the relation + attribute
    distributedIndex(DCRelObjName, DCAttr, btree, DCIndexObjName),
    spelledObj(DCIndexObjName, IndexObjName,_, Case),
    IndexObj = rel(IndexObjName, *, Case),
    IndParam = rel('.', *, u),
    RelParam = rel('..', *, u),
    renameStream(range(IndParam, RelParam, Val, increment(Val)),
      Var, Range).


% Generic case.
distributedselect(Arg, pr(Cond, rel(_,Var,_))) =>
  [dmap(ArgS," ", filter(Param,Cond)), P] :-
  Arg  => [ArgS, P],
  % we accept darrays and dfarrays
  (member(distributedobjecttype(dfarray), P) ;
    member(distributedobjecttype(darray), P)),
  % the partitions of the argument relation need to be disjoint
  member(disjointpartitioning, P),
  % rename if needed
  feedRenameRelation(rel('.',*, u), Var, Param).

%end fapra 2015/16


/*
Here ~ArgS~ is meant to indicate ``argument stream''.

5.2.3 Translation of Joins

A join can always be translated to filtering the Cartesian product.

*/

%fapra 2015/16

% There are two variants of distributed joins. If the first one can handle
% the join, cut and use its result.
join(Arg1, Arg2, Pred) => SecondoPlan:-
  isDistributedQuery,
  distributedjoin(Arg1, Arg2, Pred) => _, !,
  distributedjoin(Arg1, Arg2, Pred) => SecondoPlan.

join(Arg1, Arg2, Pred) => SecondoPlan:-
  isDistributedQuery, !,
  Arg1 = arg(N1),
  Arg2 = arg(N2),
  not(N1=N2),
  Arg1 => [ObjName1, _],
  Arg2 => [ObjName2, _],
  distributedRels(_, ObjName1, _, _, _),
  distributedRels(_, ObjName2, _, _, _),
  distributedjoin(ObjName1, ObjName2, Pred) => SecondoPlan.

%end fapra 2015/16

join(Arg1, Arg2, pr(Pred, _, _)) => [filter(product(Arg1S, Arg2S), Pred), P1] :-
  Arg1 => [Arg1S, P1],
  Arg2 => [Arg2S, _].


/*

Index joins:

*/


join(Arg1, arg(N), pr(X=Y, _, _)) => [loopjoin(Arg1S, MatchExpr), P1] :-
  isOfSecond(Attr2, X, Y),
  isNotOfSecond(Expr1, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr2, IndexName),
  Arg1 => [Arg1S, P1],
  exactmatch(IndexName, arg(N), Expr1) => MatchExpr.

join(arg(N), Arg2, pr(X=Y, _, _)) => [loopjoin(Arg2S, MatchExpr), P2] :-
  isOfFirst(Attr1, X, Y),
  isNotOfFirst(Expr2, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr1, IndexName),
  Arg2 => [Arg2S, P2],
  exactmatch(IndexName, arg(N), Expr2) => MatchExpr.


exactmatch(IndexName, arg(N), Expr) =>
  exactmatch(IndexName, rel(Name, *, Case), Expr) :-
  argument(N, rel(Name, *, Case)),
  !.

exactmatch(IndexName, arg(N), Expr) =>
  rename(exactmatch(IndexName, rel(Name, Var, Case), Expr), Var) :-
  argument(N, rel(Name, Var, Case)),
  !.


/*

For a join with a predicate of the form X = Y we can distinguish four cases
depending on whether X and Y are attributes or more complex expressions. For
example, a query condition might be ``PLZA = PLZB'' in which case we have just
attribute names on both sides of the predicate operator, or it could be ``PLZA =
PLZB + 1''. In the latter case we have an expression on the right hand side.
This can still be translated to a hashjoin, for example, by first extending the
second argument by a new attribute containing the value of the expression. For
example, the query

----    select *
    from plz as p1, plz as p2
    where p1.PLZ = p2.PLZ + 1
----

can be translated to

----    plz feed {p1} plz feed {p2} extend[NewPLZ: .PLZ_p2 + 1]
    hashjoin[PLZ_p1, NewPLZ, 997]
    remove[NewPLZ]
    consume
----

This technique is built into the optimizer as follows. We first define the four
cases (at the moment for equijoin only; this may later be extended) which also
translate the arguments into streams. Then the rules translating to join
methods can be formulated independently from this general technique. They
translate terms of the form join00(Arg1Stream, Arg2Stream, Pred).

*/
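
/*
The four cases can be condensed into the following hypothetical sketch.
demo\_side/5 decides for each side of the predicate whether it can be used
directly or must first be computed into a new attribute by ~extend~;
demo\_extend/3 adds the extend only when needed, and demo\_equijoin/4
assembles the plan and removes the helper attributes again. For simplicity,
the sketch always uses ~hashjoin~, whereas the actual rules leave the choice
of join method to join00.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% An attribute is used directly; any other expression is computed into
% a new attribute NewName of argument ArgNo.
demo_side(attr(N, A, C), _, _, attr(N, A, C), []) :- !.
demo_side(Expr, NewName, ArgNo, attr(NewName, ArgNo, l),
          [newattr(attrname(attr(NewName, ArgNo, l)), Expr)]).

% Add an extend only if there is something to compute.
demo_extend(Stream, [], Stream) :- !.
demo_extend(Stream, NewAttrs, extend(Stream, NewAttrs)).

demo_equijoin(Arg1S, Arg2S, X = Y, Plan) :-
  demo_side(X, l_expr, 1, AttrX, Ext1),
  demo_side(Y, r_expr, 2, AttrY, Ext2),
  demo_extend(Arg1S, Ext1, Stream1),
  demo_extend(Arg2S, Ext2, Stream2),
  Join = hashjoin(Stream1, Stream2, attrname(AttrX), attrname(AttrY), 999997),
  % helper attributes must not appear in the result
  append(Ext1, Ext2, NewAttrs),
  findall(Name, member(newattr(Name, _), NewAttrs), Removed),
  (   Removed == []
  ->  Plan = Join
  ;   Plan = remove(Join, Removed)
  ).

% ?- demo_equijoin(s1, s2, attr(pLZ, 1, u) = attr(pLZ, 2, u) + 1, P).
% yields a remove of attribute r_expr around a hashjoin whose second
% argument is extend(s2, ...), like the example plan above
----

*/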

join(Arg1, Arg2, pr(X=Y, R1, R2)) => [JoinPlan, P] :-
  X = attr(_, _, _),
  Y = attr(_, _, _), !,
  Arg1 => [Arg1S, P1],
  Arg2 => [Arg2S, P2],
  join00([Arg1S, P1], [Arg2S, P2], pr(X=Y, R1, R2)) => [JoinPlan, P].

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
    [remove(JoinPlan, [attrname(attr(r_expr, 2, l))]), P] :-
  X = attr(_, _, _),
  not(Y = attr(_, _, _)), !,
  Arg1 => [Arg1S, P1],
  Arg2 => [Arg2S, _],
  Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
  join00([Arg1S, P1], [Arg2Extend, none], pr(X=attr(r_expr, 2, l), R1, R2))
   => [JoinPlan, P].

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
    [remove(JoinPlan, [attrname(attr(l_expr, 1, l))]), P] :-
  not(X = attr(_, _, _)),
  Y = attr(_, _, _), !,
  Arg1 => [Arg1S, _],
  Arg2 => [Arg2S, P2],
  Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
  join00([Arg1Extend, none], [Arg2S, P2], pr(attr(l_expr, 1, l)=Y, R1, R2))
   => [JoinPlan, P].

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
    [remove(JoinPlan, [attrname(attr(l_expr, 1, l)),
        attrname(attr(r_expr, 2, l))]), P] :-
  not(X = attr(_, _, _)),
  not(Y = attr(_, _, _)), !,
  Arg1 => [Arg1S, _],
  Arg2 => [Arg2S, _],
  Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
  Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
  join00([Arg1Extend, none], [Arg2Extend, none],
    pr(attr(l_expr, 1, l)=attr(r_expr, 2, l), R1, R2)) => [JoinPlan, P].


join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [sortmergejoin(Arg1S, Arg2S,
    attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
  isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
  isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _).

% use order property

join00([Arg1S, P1], [Arg2S, P2], pr(X = Y, _, _)) => [mergejoin(Arg1S, Arg2S,
    attrname(Attr1), attrname(Attr2)), [order(Name1), order(Name2)] ] :-
  isOfFirst(Attr1, X, Y), Attr1 = attr(Name1, _, _),
  isOfSecond(Attr2, X, Y), Attr2 = attr(Name2, _, _),
  select(order(Name1), P1, _),
  select(order(Name2), P2, _).

% hashjoin has asymmetric cost, therefore consider both orders

join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg1S, Arg2S,
    attrname(Attr1), attrname(Attr2), 999997), [none]]   :-
  isOfFirst(Attr1, X, Y),
  isOfSecond(Attr2, X, Y).

join00([Arg1S, _], [Arg2S, _], pr(X = Y, _, _)) => [hashjoin(Arg2S, Arg1S,
    attrname(Attr2), attrname(Attr1), 999997), [none]]   :-
  isOfFirst(Attr1, X, Y),
  isOfSecond(Attr2, X, Y).

%fapra 2015/16

% Translate a distributed spatial join with an intersection predicate.
distributedjoin(Arg1, Arg2, Pred)
=> [SecondoPlan, [DistAttr1, distributedobjecttype(dfarray),
disjointpartitioning]]:-
  Pred = pr(Attr1 intersects Attr2, rel(_, Rel1Var, _), rel(_, Rel2Var, _)),
  isOfFirst(Attr1, Rel1, Rel2),
  isOfSecond(Attr2, Rel1, Rel2),
  attrnameDCAtom(Attr1, Attr1Name),
  attrnameDCAtom(Attr2, Attr2Name),
  % allow the second argument to be replicated (with any distribution of the
  % first one), or both arguments to be spatially distributed by their join
  % attributes on the same grid
  ((DistAttr1 = distribution(_, _, _),
    DistAttr2 = distribution(share, _, _));
  (DistAttr1 = distribution(spatial, Attr1Name, GridObj),
    DistAttr2 = distribution(spatial, Attr2Name, GridObj))),
  Arg1 => [ObjName1, [DistAttr1| Props1]],
  Arg2 => [ObjName2, [DistAttr2| Props2]],
  % rename the parameter relations if needed
  feedRenameRelation(param1, Rel1Var, Param1Plan),
  feedRenameRelation(param2, Rel2Var, Param2Plan),
  % rename the cell attribute if needed
  renamedRelAttr(attr(cell, 1, u), Rel1Var, CellAttr1),
  renamedRelAttr(attr(cell, 2, u), Rel2Var, CellAttr2),
  Scheme =
    filter(
        filter(
            filter(
                itSpatialJoin(
                    Param1Plan,
                    Param2Plan,
                    attrname(Attr1),
                    attrname(Attr2)
                    ),
                CellAttr1 = CellAttr2
                ),
            gridintersects(
                GridObj,
                bbox(Attr1),
                bbox(Attr2),
                CellAttr1
                )
            ),
        Attr1 intersects Attr2
        ),
  % We have the actual query now. Distribute it to the workers.
  distributedquery([ObjName1, [DistAttr1| Props1]],
    [ObjName2, [DistAttr2| Props2]], Scheme)
    => SecondoPlan.

/*
  ----
  distributedquery(Arg1, Arg2, QueryScheme) =>
  ----

  Distribute the query given by QueryScheme to the workers. The scheme
  contains the placeholders param1 and param2 for its arguments. The actual
  arguments are given in Arg1 and Arg2, each as a pair of a plan and a
  property list. Several cases arise, depending on the distribution type of
  Arg1 and Arg2 (replicated vs. partitioned) and on their distributed object
  type (d(f)array vs. dfmatrix).

*/
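
/*
The case analysis can be sketched with a hypothetical substitution predicate
demo\_subst/4 (standing in for substituteSubterm/4) and hypothetical
wrappers part/1 and repl/1 that mark an argument as partitioned or
replicated. A placeholder for a partitioned argument becomes a parameter of
the distributed function ('.' or '..'); a placeholder for a replicated
argument is replaced by the object itself, which is available on every
worker. The dfmatrix case and the error case are omitted.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% demo_subst(+Old, +New, +Term, -Result) replaces every subterm
% identical to Old by New.
demo_subst(Old, New, Term, New) :-
  Term == Old, !.
demo_subst(_, _, Term, Term) :-
  \+ compound(Term), !.
demo_subst(Old, New, Term, Result) :-
  Term =.. [F | Args],
  maplist(demo_subst(Old, New), Args, NewArgs),
  Result =.. [F | NewArgs].

% Both arguments partitioned: one dmap2 over pairs of partitions.
demo_query(part(A1), part(A2), Scheme, dmap2(A1, A2, " ", S2, 1238)) :-
  demo_subst(param1, rel('.', *, u), Scheme, S1),
  demo_subst(param2, rel('..', *, u), S1, S2).
% First argument replicated: dmap over the partitions of the second.
demo_query(repl(A1), part(A2), Scheme, dmap(A2, " ", S2)) :-
  demo_subst(param2, rel('.', *, u), Scheme, S1),
  demo_subst(param1, A1, S1, S2).
% Second argument replicated: dmap over the partitions of the first.
demo_query(part(A1), repl(A2), Scheme, dmap(A1, " ", S2)) :-
  demo_subst(param1, rel('.', *, u), Scheme, S1),
  demo_subst(param2, A2, S1, S2).

% ?- demo_query(repl(regions), part(roads),
%               filter(product(feed(param1), feed(param2)), p), Q).
% yields Q = dmap(roads, " ",
%                 filter(product(feed(regions), feed(rel('.', *, u))), p))
----

*/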

% Arg1 replicated, Arg2 partitioned, Arg2 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  not(isPartitioned([Arg1S, P1])),
  isPartitioned([Arg2S, P2]),
  not(isDfmatrix([Arg2S, P2])),
  substituteSubterm(param2, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param1, Arg1S, QueryScheme1, QueryScheme2),
  Query = dmap(Arg2S, " ", QueryScheme2), !.

% Arg2 replicated, Arg1 partitioned, Arg1 is a d(f)array
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  not(isPartitioned([Arg2S, P2])),
  not(isDfmatrix([Arg1S, P1])),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, Arg2S, QueryScheme1, QueryScheme2),
  Query = dmap(Arg1S, " ", QueryScheme2), !.

% Arg1 partitioned, Arg2 partitioned, both are d(f)arrays
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  isPartitioned([Arg2S, P2]),
  not(isDfmatrix([Arg2S, P2])),
  not(isDfmatrix([Arg1S, P1])),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
  Query = dmap2(Arg1S, Arg2S, " ", QueryScheme2, 1238), !.

% Arg1 partitioned, Arg2 partitioned, both dfmatrices
distributedquery([Arg1S, P1], [Arg2S, P2], QueryScheme) => Query :-
  isPartitioned([Arg1S, P1]),
  isPartitioned([Arg2S, P2]),
  isDfmatrix([Arg2S, P2]),
  isDfmatrix([Arg1S, P1]),
  substituteSubterm(param1, rel('.', *, u), QueryScheme, QueryScheme1),
  substituteSubterm(param2, rel('..', *, u), QueryScheme1, QueryScheme2),
  Query = areduce2(Arg1S, Arg2S, "", QueryScheme2, 1238), !.

% Arg1 replicated, Arg2 replicated
distributedquery([Arg1S, P1], [Arg2S, P2], _) => _ :-
  not(isPartitioned([Arg1S, P1])),
  not(isPartitioned([Arg2S, P2])),
  write('A potential plan edge could not be generated because '),
  write('queries with two replicated arguments '),
  write('cannot be formulated using DistributedAlgebra as of now.\n'),
  fail.

%Equijoin
distributedjoin(ObjName1, ObjName2, pr(attr(X1,X2,X3)=attr(Y1,Y2,Y3),
                Rel1, Rel2))
=> [SecondoPlan, [none]] :-
 X=attr(X1,X2,X3),
 Y=attr(Y1,Y2,Y3),
 Rel1 = rel(_, _, _),
 Rel2 = rel(_, _, _),
 isOfFirst(_, X, Y),
 isOfSecond(_, X, Y),
 buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
		  SecondoPlan, false).

%Standard Join
distributedjoin(ObjName1, ObjName2, pr(Pred,Rel1, Rel2))
=> [SecondoPlan, [none]] :-
 Rel1 = rel(_, _, _),
 Rel2 = rel(_, _, _),
 buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
		  SecondoPlan, false).

/*
It is assumed that if "function" is specified in the system relation
"SEC2DISTRIBUTED", then the relation was partitioned by a deterministic
function of the specified attribute. The functions used to partition the
two relations of a join are assumed to yield the same value for the same
attribute value, e.g., because both used the same hashvalue function.

*/

/*

Equijoin plan for the case that both relations are partitioned by the join
attribute using modulo.
This is the most efficient of the available options: no repartitioning is
needed, and there is no need to compute the worker on which a tuple is
located, because the worker number is the modulo value itself. It is
therefore slightly more efficient than any other function (e.g., a hash
function). Should it become possible in the future to deploy different
Secondo plans to different workers (i.e., to tell each worker which part of a
shared relation it should use), having two replicated relations would be the
most efficient solution.

*/
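
/*
The argument can be checked with a toy model. In the following hypothetical
sketch, demo\_partition/3 assigns each key K to worker K mod N, and
demo\_local\_join/3 compares only keys stored on the same worker, which
corresponds to joining partition i of one argument with partition i of the
other, as dmap2 does. Since equal keys are always assigned to the same
worker, the local join finds the same matches as the global join
demo\_global\_join/3.

----
% Hypothetical sketch; the demo_* predicates are illustrative only.
% Modulo partitioning: key K goes to worker K mod N.
demo_partition(N, Keys, Parts) :-
  findall(W-K, (member(K, Keys), W is K mod N), Parts).

% Local equijoin: compare only keys stored on the same worker.
demo_local_join(Parts1, Parts2, Matches) :-
  findall(K, (member(W-K, Parts1), member(W-K, Parts2)), Matches).

% Global equijoin on the unpartitioned key lists, for comparison.
demo_global_join(Keys1, Keys2, Matches) :-
  findall(K, (member(K, Keys1), member(K, Keys2)), Matches).

% ?- demo_partition(3, [1, 4, 7, 9], P1), demo_partition(3, [4, 9, 10], P2),
%    demo_local_join(P1, P2, L).
% yields L = [4, 9], the same as demo_global_join/3
----

*/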

buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
		 SecondoPlan, _):-
 plan_to_atom(simple_attrname(X), X2),
 plan_to_atom(simple_attrname(Y), Y2),
 distributedRels(_, ObjName1, _, 'modulo', X2),
 distributedRels(_, ObjName2, _, 'modulo', Y2),
 Rel1 = rel(_, Rel1Var, _),
 Rel2 = rel(_, Rel2Var, _),
 % rename the parameter relations of the dmapped plan if needed
 feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
 feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
 !,
 SecondoPlan = dmap2(ObjName1, ObjName2, " ",
               hashjoin(Feed1, Feed2,attrname(X),
               attrname(Y), 999997), 1238).

% Equijoin plan for the case that both relations are partitioned by the join
% attribute using a function
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
		 SecondoPlan, _):-
 plan_to_atom(simple_attrname(X), X2),
 plan_to_atom(simple_attrname(Y), Y2),
 distributedRels(_, ObjName1, _, 'function', X2),
 distributedRels(_, ObjName2, _, 'function', Y2),
 Rel1 = rel(_, Rel1Var, _),
 Rel2 = rel(_, Rel2Var, _),
 % rename the parameter relations of the dmapped plan if needed
 feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
 feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
 !,
 SecondoPlan = dmap2(ObjName1, ObjName2, " ",
               hashjoin(Feed1, Feed2,attrname(X),
               attrname(Y), 999997), 1238).

%Equijoin Secondo Plan for one replicated (relation) and
%one partitioned (darray/dfarray)
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
		 SecondoPlan, _):-
 distributedRels(_ ,ObjName1,_ ,'share',_ ),
 isPartitioned(ObjName2),
 Rel1 = rel(_, Rel1Var, _),
 Rel2 = rel(_, Rel2Var, _),
 % rename the parameter relations of the dmapped plan if needed
 feedRenameRelation(ObjName1, Rel1Var, Feed1),
 feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
  !,
 SecondoPlan = dmap(ObjName2, " ",
 hashjoin(Feed1,
	  Feed2,
          attrname(X), attrname(Y), 999997)).

%Commutativity for Equijoin & Standard Join
buildSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
	         SecondoPlan, false):-
 buildSecondoPlan(ObjName2, ObjName1, pr(Pred, Rel1, Rel2),
		  SecondoPlan, true).


%Equijoin Secondo Plan for repartitioning 2 "wrongly"
%partitioned relations (darray/dfarray)
buildSecondoPlan(ObjName1, ObjName2, pr(X=Y, Rel1, Rel2),
	         SecondoPlan, _):-
  isPartitioned(ObjName1),
  isPartitioned(ObjName2),
  Rel1 = rel(_, Rel1Var, _),
  Rel2 = rel(_, Rel2Var, _),
  % rename the parameter relations of the dmapped plan if needed
  feedRenameRelation(rel('.', *, u), Rel1Var, Feed1),
  feedRenameRelation(rel('..', *, u), Rel2Var, Feed2),
  !,
  SecondoPlan = dmap2(
	collect2(
	   partitionF(ObjName1, "LeftPartOfJoin", feed(rel('.',*,u)),
	   hashvalue(our_attrname(X), 999997), 0),
	   "L", 1238),
	collect2(
	   partitionF(ObjName2, "RightPartOfJoin", feed(rel('.',*,u)),
	   hashvalue(our_attrname(Y), 999997), 0),
	   "R", 1238),
	" ",
	hashjoin(Feed1,
	      Feed2,
	      attrname(X), attrname(Y), 999997),
	      1238).

%Equijoin of two replicated relations: no distributed plan is possible
buildSecondoPlan(ObjName1, ObjName2, pr(attr(_,_,_)=attr(_,_,_), _, _),
	         _, true):-
  distributedRels(_ ,ObjName1,_ ,'share',_ ),
  distributedRels(_, ObjName2, _,'share', _),
  !,
  write('Both relations are replicated, the query cannot be executed!'),
  false.

% Plan yields a dfmatrix
isDfmatrix([_, P]) :-
  member(distributedobjecttype(dfmatrix), P).

% Plan yields a partitioned distribution.
isPartitioned([_, P]):-
 is_list(P), !,(
 member(distribution('function', _, _), P);
 member(distribution('modulo', _, _), P);
 member(distribution('random', _, _), P);
 member(distribution('spatial', _, _), P)).

% Secondo object represents a partitioned distribution.
isPartitioned(ObjName):-
 distributedRels(_, ObjName,_ ,'function', _);
 distributedRels(_, ObjName,_ ,'modulo', _);
 distributedRels(_, ObjName,_ ,'random', _);
 distributedRels(_, ObjName,_ ,'spatial', _).

%Standard Join Secondo Plan (one replicated, one partitioned)
buildStdSecondoPlan(ObjName1, ObjName2, pr(Pred, Rel1, Rel2),
	         SecondoPlan, _):-
  (DistArgrel = ObjName2, ReplArgrel = ObjName1;
    DistArgrel = ObjName1, ReplArgrel = ObjName2),
  distributedRels(_, ReplArgrel, _ , 'share', _),
  isPartitioned(DistArgrel),
  Rel1 = rel(_, Rel1Var, _),
  Rel2 = rel(_, Rel2Var, _),
  % rename the parameter relations of the dmapped plan if needed
  feedRenameRelation(rel('.', *, u), Rel2Var, Feed2),
  feedRenameRelation(ReplArgrel, Rel1Var, Feed1),
  !,
  SecondoPlan = dmap(DistArgrel, " ",
  filter(product(Feed2,Feed1), Pred)).

%Standard Join Secondo Plan, both are partitioned
buildStdSecondoPlan(ObjName1, ObjName2, pr(_, _, _),
	         _, true):-
  isPartitioned(ObjName1),
  isPartitioned(ObjName2),
  !,
  write('The joined relations are both partitioned and thus'),
  write(' not distributed correctly for standard join.'),
  false.

%Standard Join Secondo Plan, if repartitioning is needed
buildStdSecondoPlan(_, _, pr(_, _, _), _, true):-
  !,
  write('The joined relations are not distributed correctly '),
  write('for standard join.'),
  false.

%end fapra 2015/16

/*

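----    isOfFirst(Attr, X, Y)
       isOfSecond(Attr, X, Y)
       isNotOfFirst(Attr, X, Y)
       isNotOfSecond(Attr, X, Y)
----

~Attr~, which is equal to either ~X~ or ~Y~, is an attribute of the first
(second) relation. For ~isNotOfFirst~ (~isNotOfSecond~), ~Attr~ is the other
argument, i.e., ~Y~ if ~X~ is an attribute of the first (second) relation, and
~X~ if ~Y~ is.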

*/


isOfFirst(X, X, _) :- X = attr(_, 1, _).
isOfFirst(Y, _, Y) :- Y = attr(_, 1, _).
isOfSecond(X, X, _) :- X = attr(_, 2, _).
isOfSecond(Y, _, Y) :- Y = attr(_, 2, _).

isNotOfFirst(Y, X, Y) :- X = attr(_, 1, _).
isNotOfFirst(X, X, Y) :- Y = attr(_, 1, _).
isNotOfSecond(Y, X, Y) :- X = attr(_, 2, _).
isNotOfSecond(X, X, Y) :- Y = attr(_, 2, _).


/*
6 Creating Query Plan Edges

*/

% RHG 2014

planEdge(Source, Target, Plan, Result) :-
  edge(Source, Target, Term, Result, _, _),
  Term => PlanExpr,
  getProperties(PlanExpr, Plan, _).

% Version with properties

% Selection Edges

planEdge(Source, Target,  PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
  edge(Source, Target, select(res(N), Pred), Result, _, _),
  select([N, P], PropertiesIn, PRest),
  select([res(N), P], Pred) => PlanExpr,
  getProperties(PlanExpr, Plan, P2).

% Join Edges

planEdge(Source, Target,  PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
  edge(Source, Target, join(arg(N), res(M), Pred), Result, _, _),
  select([M, P], PropertiesIn, PRest),
  join(arg(N), [res(M), P], Pred) => PlanExpr,
  getProperties(PlanExpr, Plan, P2).

planEdge(Source, Target,  PropertiesIn, Plan, [[Result, P2] | PRest], Result) :-
  edge(Source, Target, join(res(M), arg(N), Pred), Result, _, _),
  select([M, P], PropertiesIn, PRest),
  join([res(M), P], arg(N), Pred) => PlanExpr,
  getProperties(PlanExpr, Plan, P2).


planEdge(Source, Target,  PropertiesIn, Plan, [[Result, P3] | PRest], Result) :-
  edge(Source, Target, join(res(N), res(M), Pred), Result, _, _),
  select([N, P], PropertiesIn, PIn2),
  select([M, P2], PIn2, PRest),
  join([res(N), P], [res(M), P2], Pred) => PlanExpr,
  getProperties(PlanExpr, Plan, P3).

% Remaining edges without intermediate results

planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
   Result) :-
  edge(Source, Target, Term, Result, _, _),
  Term = select(arg(_), _),
  Term => PlanExpr,
  getProperties(PlanExpr, Plan, P).

planEdge(Source, Target, PropertiesIn, Plan, [[Result, P] | PropertiesIn],
   Result) :-
  edge(Source, Target, Term, Result, _, _),
  Term = join(arg(_), arg(_), _),
  Term => PlanExpr,
  getProperties(PlanExpr, Plan, P).


getProperties([Plan, P], Plan, P) :- !.

getProperties(Plan, Plan, none).

% end RHG 2014


createPlanEdge :-
  edge(Source, Target, Term, Result, _, _),
  Term => Plan,
  assert(planEdge(Source, Target, Plan, Result)),
  fail.

createPlanEdges :- not(createPlanEdge).

deletePlanEdge :-
  retract(planEdge(_, _, _, _)), fail.

deletePlanEdges :- not(deletePlanEdge).

writePlanEdge :-
  planEdge(Source, Target, Plan, Result),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Plan: '), wp(Plan), nl,
  % write(Plan), nl,
  write('Result: '), write(Result), nl, nl,
  pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)),  % count edges
  fail.


writePlanEdgesProp :-
  planEdge(Source, Target, _, Plan, Prop, Result),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Plan: '), wp(Plan), nl,
  write(Prop), nl,
  % write(Plan), nl,
  write('Result: '), write(Result), nl, nl,
  pe(N), retract(pe(_)), N1 is N + 1, assert(pe(N1)),  % count edges
  fail.

writePlanEdges :-
  retractall(pe(_)),   % reset the edge counter left over from earlier calls
  assert(pe(0)),
  not(writePlanEdge),
  not(writePlanEdgesProp),
  pe(N),
  write('The total number of plan edges is '), write(N), write('.'), nl.

wpe :- writePlanEdges.


/*
7 Assigning Sizes and Selectivities to the Nodes and Edges of the POG

----    assignSizes.
    deleteSizes.
----

Assign sizes (numbers of tuples) to all nodes in the pog, based on the
cardinalities of the argument relations and the selectivities of the
predicates. Store sizes as facts of the form resultSize(Result, Size). Store
selectivities as facts of the form edgeSelectivity(Source, Target, Sel).

~deleteSizes~ deletes node sizes and edge selectivities from memory.

7.1 Assigning Sizes and Selectivities

It is important that edges are processed in the order in which they have been
created. This ensures that, for each edge, the sizes of its argument nodes are
already available.
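
The following sketch illustrates the size computation on a tiny POG fragment.
All predicate names (~exEdge~, ~exCard~, ~exSel~, ~exSize~, ~exAssignAll~)
and all numbers are hypothetical. They only mirror the logic of ~assignSize~
and ~resSize~ below and are not part of the optimizer.

----    % Hypothetical edges, listed in creation order.
        exEdge(0, 1, select(arg(1), p1)).
        exEdge(1, 3, join(res(1), arg(2), p2)).
        exCard(1, 1000).        % hypothetical cardinality of argument 1
        exCard(2, 200).         % hypothetical cardinality of argument 2
        exSel(p1, 0.1).         % hypothetical selectivities
        exSel(p2, 0.01).

        :- dynamic exSize/2.

        % Argument sizes come from cardinalities, result sizes from
        % nodes that have already been assigned a size.
        exResSize(arg(N), S) :- exCard(N, S).
        exResSize(res(N), S) :- exSize(N, S).

        exAssign(select(A, P), Target) :-
          exResSize(A, Card), exSel(P, Sel),
          Size is Card * Sel, assertz(exSize(Target, Size)).
        exAssign(join(A1, A2, P), Target) :-
          exResSize(A1, C1), exResSize(A2, C2), exSel(P, Sel),
          Size is C1 * C2 * Sel, assertz(exSize(Target, Size)).

        % Process the edges in creation order.
        exAssignAll :- forall(exEdge(_, T, Term), exAssign(Term, T)).
----

After ~exAssignAll~, node 1 has size 100 (1000 times 0.1) and node 3 has size
200 (100 times 200 times 0.01). If the two edges were listed the other way
round, the size lookup for ~res(1)~ would fail for the join edge, because
node 1 would not have a size yet.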

*/

assignSizes :- not(assignSizes1).

assignSizes1 :-
  edge(Source, Target, Term, Result, _, _),
  assignSize(Source, Target, Term, Result),
  fail.

%assignSize(Source, Target, select(Arg, Pred), Result) :-
%  Pred = pr(attr(original, *, u), _),
%  !, % predicate used for eliminating one of many spatially overlapping tuples
%  resSize(Arg, Size),
%  setNodeSize(Result, Size),
%  % assume overlap is rather small
%  assert(edgeSelectivity(Source, Target, 1)).

assignSize(Source, Target, select(Arg, Pred), Result) :-
  resSize(Arg, Card),
  selectivity(Pred, Sel),
  Size is Card * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).

assignSize(Source, Target, join(Arg1, Arg2, Pred), Result) :-
  resSize(Arg1, Card1),
  resSize(Arg2, Card2),
  selectivity(Pred, Sel),
  Size is Card1 * Card2 * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).

/*
----    setNodeSize(Node, Size) :-
----

Set the size of node ~Node~ to ~Size~ if no size has been assigned before.

*/

setNodeSize(Node, _) :- resultSize(Node, _), !.
setNodeSize(Node, Size) :- assert(resultSize(Node, Size)).

/*
----    resSize(Arg, Size) :-
----

Argument ~Arg~ has size ~Size~.

*/

resSize(arg(N), Size) :- argument(N, rel(Rel, _, _)), card(Rel, Size), !.
resSize(arg(N), _) :- write('Error in optimizer: cannot find cardinality for '),
  argument(N, Rel), wp(Rel), nl, fail.
resSize(res(N), Size) :- resultSize(N, Size), !.

/*
----    writeSizes :-
----

Write sizes and selectivities.

*/

writeSize :-
  resultSize(Node, Size),
  write('Node: '), write(Node), nl,
  write('Size: '), write(Size), nl, nl,
  fail.
writeSize :-
  edgeSelectivity(Source, Target, Sel),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Selectivity: '), write(Sel), nl, nl,
  fail.
writeSizes :- not(writeSize).

/*
----    deleteSizes :-
----

Delete node sizes and selectivities of edges.

*/

deleteSize :- retract(resultSize(_, _)), fail.
deleteSize :- retract(edgeSelectivity(_, _, _)), fail.
deleteSizes :- not(deleteSize).

/*
8 Computing Edge Costs for Plan Edges

8.1 The Costs of Terms

----    cost(Term, Sel, Size, Cost) :-
----

The cost of an executable ~Term~ representing a predicate with selectivity ~Sel~
is ~Cost~ and the size of the result is ~Size~.

The cost is computed by descending recursively into the term. When the operator
realizing the predicate (e.g., ~filter~) is encountered, the selectivity ~Sel~
is used to determine the size of the result. It is assumed that only a single
operator of this kind occurs within the term.
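
As an illustration, the following self-contained sketch evaluates a small term
the same way the ~feed~ and ~filter~ clauses below do. The predicate names
~exCost~, ~exRelCard~, ~exFeedTC~ and ~exFilterTC~ are hypothetical, and so
are the constants. The real tuple costs are kept in ``Operators.pl''.

----    % Hypothetical tuple costs and cardinality.
        exFeedTC(0.5).
        exFilterTC(1.0).
        exRelCard(r, 1000).

        % A stored relation has no cost by itself.
        exCost(rel(R), _, Size, 0) :- exRelCard(R, Size).
        % feed passes Sel down and pays a cost per tuple.
        exCost(feed(X), Sel, S, C) :-
          exCost(X, Sel, S, C1), exFeedTC(A), C is C1 + A * S.
        % filter realizes the predicate, so Sel is applied only here;
        % its argument is evaluated with selectivity 1.
        exCost(filter(X, _), Sel, S, C) :-
          exCost(X, 1, SizeX, CostX), exFilterTC(A),
          S is SizeX * Sel, C is CostX + A * SizeX.
----

For ~exCost(filter(feed(rel(r)), p), 0.1, S, C)~, the argument
~feed(rel(r))~ yields 1000 tuples at cost 500. The filter then produces
~S~ = 100 tuples at a total cost of ~C~ = 1500.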

8.1.1 Arguments

*/

cost(Obj, Sel, Size, Cost) :-
  distributedRels(Rel, Obj, _, DistType, _, _),
  not(DistType = share),
  cost(Rel, Sel, Size, Cost).


cost(rel(Rel, _, _), _, Size, 0) :-
  card(Rel, Size).

cost(res(N), _, Size, 0) :-
  resultSize(N, Size).

/*
8.1.2 Operators

*/

cost(feed(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  feedTC(A),
  C is C1 + A * S.

/*
Here ~feedTC~ means ``feed tuple cost'', i.e., the cost per tuple, a constant to
be determined in experiments. These constants are kept in file ``Operators.pl''.

*/

cost(consume(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  consumeTC(A),
  C is C1 + A * S.

cost(filter(X, Pred), _, S, C) :-
  % This is a special case for spatially distributed relations:
  % we cannot determine the selectivity of the predicate because
  % it does not exist as a local relation on the master.
  % We assume very little overlap in the spatial distribution.
  Pred=attr(original, l, u), !,
  cost(X, 1, SizeX, CostX),
  filterTC(A),
  S is SizeX * 0.9,
  C is CostX + A * SizeX.

cost(filter(X, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  filterTC(A),
  S is SizeX * Sel,
  C is CostX + A * SizeX.


/*
For the moment we assume a cost of 1 for evaluating a predicate; this should be
changed shortly.

*/

cost(product(X, Y), _, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  productTC(A, B),
  S is SizeX * SizeY,
  C is CostX + CostY + SizeY * A + S * B.

cost(leftrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

cost(rightrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

/*

Simplistic cost estimation for loop joins.

If attribute values are assumed independent, then the selectivity
of a subquery appearing in an index join equals the overall
join selectivity. Therefore it is possible to estimate
the result size and cost of a subquery
(i.e., ~exactmatch~ and ~exactmatchfun~). As a subquery in an
index join is executed as often as a tuple from the left
input stream arrives, it is also possible to estimate the
overall index join cost.
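
To make this concrete, the following sketch applies the same formulas as the
~exactmatch~ and ~loopjoin~ clauses below. The predicate names and constants
are hypothetical and were chosen only to keep the arithmetic easy to follow.
The measured values are kept in ``Operators.pl''.

----    % Hypothetical constants for exactmatchTC(A, B, C, D) and loopjoinTC(L).
        exExactmatchTC(10.0, 5.0, 1.0, 0.2).
        exLoopjoinTC(1.0).

        % One index lookup on a relation with RelSize tuples.
        exSubqueryCost(RelSize, Sel, Size, Cost) :-
          exExactmatchTC(A, B, C, D),
          Size is Sel * RelSize,
          Cost is A + B * (log10(RelSize) - C) + Sel * RelSize * D.

        % The subquery is executed once per tuple of the outer stream.
        exLoopjoinCost(SizeX, CostX, RelSize, Sel, S, Cost) :-
          exSubqueryCost(RelSize, Sel, SizeY, CostY),
          exLoopjoinTC(L),
          S is SizeX * SizeY,
          Cost is CostX + SizeX * L + SizeX * CostY.
----

Take an outer stream of 100 tuples produced at cost 50, an indexed relation of
10000 tuples and a selectivity of 0.001. Each lookup then returns 10 tuples at
cost 10 + 5 times (4 - 1) + 2 = 27. The join therefore yields 1000 tuples at
cost 50 + 100 + 2700 = 2850.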

*/

cost(exactmatchfun(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(A, B, C, D),
  Size is Sel * RelSize,
  Cost is A + B * (log10(RelSize) - C) +   % query cost
    Sel * RelSize * D.            % cost of producing the result tuples

cost(exactmatch(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(A, B, C, D),
  Size is Sel * RelSize,
  Cost is A + B * (log10(RelSize) - C) +   % query cost
    Sel * RelSize * D.            % cost of producing the result tuples

cost(loopjoin(X, Y), Sel, S, Cost) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, Sel, SizeY, CostY),
  S is SizeX * SizeY,
  loopjoinTC(A),
  Cost is CostX +      % producing the first argument
    SizeX * A +       % base cost for loopjoin
    SizeX * CostY.      % sum of query costs

cost(fun(_, X), Sel, Size, Cost) :-
  cost(X, Sel, Size, Cost).


cost(hashjoin(X, Y, _, _, 999997), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  hashjoinTC(A, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +      % producing the arguments
    A * SizeY +         % A - time [microsecond] per build
    B * SizeX +         % B - time per probe
    D * S.              % D - time per result tuple
                        % (assumes that the hash table fits in memory)

cost(sortmergejoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  sortmergejoinTC(A, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +       % producing the arguments
    A * (SizeX + SizeY) +   % sorting the arguments
    B * (SizeX + SizeY) +   % merge step
    D * S.                % cost of results

cost(mergejoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  sortmergejoinTC(_, B, D),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +       % producing the arguments
    B * (SizeX + SizeY) +   % merge step
    D * S.                % cost of results

cost(extend(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  extendTC(A),
  C is C1 + A * S.

cost(remove(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  removeTC(A),
  C is C1 + A * S.

cost(project(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  projectTC(A),
  C is C1 + A * S.

cost(rename(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  renameTC(A),
  C is C1 + A * S.

%fapra 2015/16

% Taken from standard optimizer.
cost(itSpatialJoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  itSpatialJoinTC(A, B),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +
  A * (SizeX + SizeY) +
  B * S.

cost(windowintersects(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  windowintersectsTC(A),
  Size is Sel * RelSize,
  Cost is Size * A.

cost(hashvalue(_,_), _, 1, 0).

cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
  distributedRels(LocalMasterRel, Obj, _, _, _),
  substituteSubterm(rel('.', *, u), LocalMasterRel, InnerPlan, LocalInnerPlan),
  cost(LocalInnerPlan, Sel, S, InnerC),
  !,
  C is InnerC * S.

cost(dmap(Obj, _, InnerPlan), Sel, S, C) :-
  substituteSubterm(rel('.', *, u), Obj, InnerPlan, LocalInnerPlan),
  cost(LocalInnerPlan, Sel, S, InnerC),
  !,
  C is InnerC * S.

% if we cannot determine cost of first dmap-argument
cost(dmap(_, _, X), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  dmapTC(A),
  S is SizeX * Sel,
  C is CostX + A * SizeX.

cost(dmap2(_, RelObj, _, InnerPlan, _), Sel, S, C) :-
  distributedRels(LocalMasterRel, RelObj, _, _, _),
  substituteSubterm(rel('..', *, u), LocalMasterRel,
    InnerPlan, LocalInnerPlan),
  dmap2TC(A),
  cost(LocalMasterRel, 1, Card, _),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  !,
  S is Sel * Card,
  C is InnerCost + A * S.

% we have two d/farray-objects as arguments
cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
  distributedRels(LocalMasterRel1, RelObj1, _, _, _),
  distributedRels(LocalMasterRel2, RelObj2, _, _, _),
  substituteSubterm(rel('.', *, u), LocalMasterRel1,
    InnerPlan, LocalInnerPlan1),
  substituteSubterm(rel('..', *, u), LocalMasterRel2,
    LocalInnerPlan1, LocalInnerPlan),
  dmap2TC(A),
  cost(LocalMasterRel1, 1, Card1, _),
  cost(LocalMasterRel2, 1, Card2, _),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  !,
  S1 is Sel * Card1,
  S2 is Sel * Card2,
  C is InnerCost + A * S1 + A * S2.

% we have two d/farray-values as arguments
cost(dmap2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
  cost(Arg1, _, _, C1),
  cost(Arg2, _, _, C2),
  substituteSubterm(rel('.', *, u), Arg1,
    InnerPlan, LocalInnerPlan1),
  substituteSubterm(rel('..', *, u), Arg2,
    LocalInnerPlan1, LocalInnerPlan),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  dmap2TC(A),
  !,
  ArgS1 is Sel * C1,
  ArgS2 is Sel * C2,
  C is InnerCost + A * ArgS1 + A * ArgS2.

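cost(dmap2(RelObj1, RelObj2, _, InnerPlan, _), Sel, _, C) :-
  % temporarily replace occurrences of rel('.', *, u) inside the arguments
  % by placeholders, so that the substitutions below do not affect them
  substituteSubterm(rel('.', *, u), "#!SUBST1!#", RelObj1, RelObj_Mod1),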
  substituteSubterm(rel('.', *, u), "#!SUBST2!#", RelObj2, RelObj_Mod2),
  substituteSubterm(rel('.', *, u), RelObj_Mod1, InnerPlan, TempPlan1),
  substituteSubterm(rel('..', *, u), RelObj_Mod2, TempPlan1, TempPlan2),
  substituteSubterm( "#!SUBST1!#", rel('.',*,u),TempPlan2, TempPlan3),
  substituteSubterm( "#!SUBST2!#", rel('.',*,u),TempPlan3, FinallyGoodPlan),
  dmap2TC(A),
  cost(RelObj1, 1, Card1, _),
  cost(RelObj2, 1, Card2, _),
  cost(FinallyGoodPlan, Sel, _, InnerCost),
  !,
  S1 is Sel * Card1,
  S2 is Sel * Card2,
  C is InnerCost + A * S1 + A * S2.

% we have two d/fmatrix-values as arguments
cost(areduce2(Arg1, Arg2, _, InnerPlan, _), Sel, _, C) :-
  cost(Arg1, _, _, C1),
  cost(Arg2, _, _, C2),
  substituteSubterm(rel('.', *, u), Arg1,
    InnerPlan, LocalInnerPlan1),
  substituteSubterm(rel('..', *, u), Arg2,
    LocalInnerPlan1, LocalInnerPlan),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  areduce2TC(A),
  !,
  ArgS1 is Sel * C1,
  ArgS2 is Sel * C2,
  C is InnerCost + A * ArgS1 + A * ArgS2.

cost(collect2(InnerPlan, _ , _), Sel, S, C) :-
  cost(InnerPlan, Sel, S, InnerCost),
  collect2TC(A),
  C is InnerCost + A * S.

cost(partitionF(RelObj, _, InnerPlan, _, _), Sel, S, C) :-
  distributedRels(LocalMasterRel, RelObj, _, _, _),
  substituteSubterm(rel('.', *, u), LocalMasterRel,
    InnerPlan, LocalInnerPlan),
  partitionFTC(A),
  cost(LocalMasterRel, 1, S, _),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  !,
  C is (InnerCost + A) * S.

% generic case (partitionF takes five arguments, as in the plans built above)
cost(partitionF(RelObj, _, _, _, _), _, S, C) :-
  cost(RelObj, 1, RS, RC),
  partitionFTC(A),
  S is RS,
  C is RC + S * A.

cost(extendstream(Stream, _, cellnumber(bbox(_), _)), _, S, C) :-
  cost(Stream, 1, S, StreamC),
  extendstreamTC(ETC),
  bboxTC(BTC),
  cellnumberTC(CTC),
  TC is  ETC + BTC + CTC,
  C is S * TC + StreamC.

cost(range(_, Rel, _, _), Sel, S, C) :-
  cost(Rel, 1, Card, _),
  S is Sel * Card,
  leftrangeTC(A),
  C is A * S.

cost(dloop2(_, RelObj, _, InnerPlan), Sel, S, C) :-
  distributedRels(LocalMasterRel, RelObj, _, _, _),
  substituteSubterm(rel('..', *, u), LocalMasterRel,
    InnerPlan, LocalInnerPlan),
  dloopTC(A),
  cost(LocalMasterRel, 1, Card, _),
  cost(LocalInnerPlan, Sel, _, InnerCost),
  !,
  S is Sel * Card,
  C is InnerCost + A * S.


/* dummy for dsummarize */
cost(dsummarize(_), _, _, 0).

cost(dsummarize(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  dsummarizeTC(A),
  C is C1 + A * S.

%end fapra 2015/16

/*
8.2 Creating Cost Edges

These are plan edges extended by a cost measure.

*/

% RHG 2014

costEdge(Source, Target, Term, Result, Size, Cost) :-
  planEdge(Source, Target, Term, Result),
  edgeSelectivity(Source, Target, Sel),
  cost(Term, Sel, Size, Cost).

% Version with properties

costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
   Size, Cost) :-
  planEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result),
  edgeSelectivity(Source, Target, Sel),
  cost(Plan, Sel, Size, Cost).


% end RHG 2014

createCostEdge :-
  planEdge(Source, Target, Term, Result),
  edgeSelectivity(Source, Target, Sel),
  cost(Term, Sel, Size, Cost),
  assert(costEdge(Source, Target, Term, Result, Size, Cost)),
  fail.

createCostEdges :- not(createCostEdge).

deleteCostEdge :-
  retract(costEdge(_, _, _, _, _, _)), fail.

deleteCostEdges :- not(deleteCostEdge).

writeCostEdge :-
  costEdge(Source, Target, Plan, Result, Size, Cost),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Plan: '), wp(Plan), nl,
  write('Result: '), write(Result), nl,
  write('Size: '), write(Size), nl,
  write('Cost: '), write(Cost), nl,
  nl,
  ce(N), retract(ce(_)), N1 is N + 1, assert(ce(N1)),  % count edges
  fail.

writeCostEdges :-
  retractall(ce(_)),   % reset the edge counter left over from earlier calls
  assert(ce(0)),
  not(writeCostEdge),
  ce(N),
  write('The total number of cost edges is '), write(N), write('.'), nl.

wce :- writeCostEdges.


writeCostEdgeUsed :-
  costEdgeUsed(Source, Version, Target, PropertiesIn, Plan, PropertiesOut,
   Result, Size, Cost),
  write('Source: ('), write(Source), write(', '), write(Version),
   write(')'), nl,
  write('Target: '), write(Target), nl,
  write('PropertiesIn: '), write(PropertiesIn), nl,
  write('Plan: '), wp(Plan), nl,
  write('PropertiesOut: '), write(PropertiesOut), nl,
  write('Result: '), write(Result), nl,
  write('Size: '), write(Size), nl,
  write('Cost: '), write(Cost), nl,
  nl,
  ceu(N), retract(ceu(_)), N1 is N + 1, assert(ceu(N1)),  % count edges
  fail.

writeCostEdgesUsed :-
  retractall(ceu(_)),  % reset the edge counter left over from earlier calls
  assert(ceu(0)),
  not(writeCostEdgeUsed),
  ceu(N),
  write('The total number of cost edges used is '), write(N), write('.'), nl.

wceu :- writeCostEdgesUsed.

deleteCostEdgeUsed :-
  retract(costEdgeUsed(_, _, _, _, _, _, _, _, _)), fail.

deleteCostEdgesUsed :- not(deleteCostEdgeUsed).


/*
----    assignCosts
----

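This puts together the creation of sizes and cost edges. Since the version
with properties (RHG 2014) computes cost edges on demand via ~costEdge~, only
the sizes are assigned here.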

*/

assignCosts :-
  assignSizes.
  % RHG 2014
  % createCostEdges.


/*
9 Finding Shortest Paths = Cheapest Plans

The cheapest plan corresponds to the shortest path through the predicate order
graph.

9.1 Shortest Path Algorithm by Dijkstra

We implement the shortest path algorithm by Dijkstra. There are two
relevant sets of nodes:

  * center: the nodes for which shortest paths have already been
computed

  * boundary: the nodes that have been seen, but that have not yet been
expanded. These need to be kept in a priority queue.

A node, as used during shortest path computation, is represented as a term

----    node(n(Name, Version), Distance, [Path, Properties])
----

where ~Name~ is the node number, ~Version~ a version number of this node, ~Distance~ the distance along the shortest path to this node, ~Path~ is the list of edges forming the shortest path, and ~Properties~ the physical properties (such as order) for the result obtained at this node version.

The graph is represented by the set of ~costEdges~.

The center is represented as a set of facts of the form

----    center(n(Name, Version), node(n(Name, Version), Distance, [Path, Properties]))
----

Since predicates are generally indexed by their first argument, finding a node
in the center via the node number should be very efficient. We assume it is
possible in constant time.

The boundary is represented by an abstract data type as described in the
interface below. Essentially it is a priority queue implementation.
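
A compact sketch of the algorithm on a small, hypothetical graph follows. It
is a simplification, not the optimizer's implementation:

  * the boundary is a list of ~Distance-Node~ pairs kept sorted with ~keysort~;

  * instead of updating the distance of a node already in the boundary, a new
entry is added, and stale entries are skipped when they are removed;

  * node versions and properties are ignored.

All names are hypothetical.

----    % Hypothetical weighted graph standing in for the cost edges.
        exGraphEdge(0, 1, 4).
        exGraphEdge(0, 2, 1).
        exGraphEdge(2, 1, 2).
        exGraphEdge(1, 3, 1).

        :- dynamic exCenter/2.

        % Compute shortest distances from Source to all reachable nodes.
        exDijkstra(Source) :-
          retractall(exCenter(_, _)),
          exLoop([0-Source]).

        exLoop([]).
        exLoop([D-N | Rest]) :-
          (   exCenter(N, _)              % stale entry: N already in center
          ->  exLoop(Rest)
          ;   assertz(exCenter(N, D)),    % move N from boundary to center
              findall(D2-M,
                      ( exGraphEdge(N, M, W), \+ exCenter(M, _),
                        D2 is D + W ),
                      Succs),
              append(Rest, Succs, B1),
              keysort(B1, B2),            % minimal distance first
              exLoop(B2)
          ).
----

After ~exDijkstra(0)~, the center holds the distances 0, 1, 3 and 4 for the
nodes 0, 2, 1 and 3. Node 1 is reached more cheaply via node 2 than via its
direct edge of cost 4.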


----    successor(Node, Succ) :-
----

~Succ~ is a successor of node ~Node~ via some edge. This includes computation
of the distance and path of the successor.

*/

% RHG 2014

% successor(node(Source,Distance, Path), node(Target, Distance2, Path2)) :-
%   costEdge(Source, Target, Term, Result, Size, Cost),
%   assert(costEdgeUsed(Source, Target, Term, Result, Size, Cost)),
%   Distance2 is Distance + Cost,
%   append(Path, [costEdge(Source, Target, Term, Result, Size, Cost)], Path2).

% Version with properties

successor(node(n(Source, Version), Distance, [Path, PropertiesIn]),
   simplenode(Target, Distance2, [Path2, PropertiesOut])) :-
  costEdge(Source, Target, PropertiesIn, Plan, PropertiesOut, Result,
   Size, Cost),
  assert(costEdgeUsed(Source, Version, Target, PropertiesIn, Plan,
   PropertiesOut, Result, Size, Cost)),
  Distance2 is Distance + Cost,
  append(Path, [costEdge(Source, Target, Plan, Result, Size, Cost)], Path2).

% end RHG 2014


/*

----    dijkstra(Source, Dest, Path, Length) :-
----

The shortest path from ~Source~ to ~Dest~ is ~Path~ of length ~Length~.

*/

dijkstra(Source, Dest, Path, Length) :-
  emptyCenter,
  b_empty(Boundary),
  deleteCostEdgesUsed,   % RHG
  b_insert(Boundary, node(n(Source, 1), 0, [[], []]), Boundary1),
  dijkstra1(Boundary1, n(Dest, 1), 0, notfound),
  center(n(Dest, _), node(n(Dest, _), Length, [Path, _])).

emptyCenter :- not(emptyCenter1).

emptyCenter1 :- retract(center(_, _)), fail.


/*
----    dijkstra1(Boundary, Dest, NoOfCalls, Found) :-
----

Compute shortest paths from the start node and store them as facts of the
predicate ~center~ until the destination ~Dest~ has been reached or the
boundary is empty. Initially to be called with no fact ~center~ asserted,
~Boundary~ just containing the start node, ~NoOfCalls~ = 0 and ~Found~ =
~notfound~.

For testing, we report at which iteration the destination ~Dest~ is reached.

*/

dijkstra1(Boundary, _, _, found) :- !,
  tree_height(Boundary, H),
  write('Height of search tree for boundary is '), write(H), nl.

dijkstra1(Boundary, _, _, _) :- b_isEmpty(Boundary).

dijkstra1(Boundary, Dest, N, _) :-
%   nl, nl,
%   write('dijkstra1 called.'), nl,
%        write('Boundary = '), write(Boundary), nl, write('====='), nl,
  b_removemin(Boundary, Node, Bound2),
  Node = node(Name, _, _),
%   write('Node = '), write(Name), nl,
  assert(center(Name, Node)),
%        write('Center = '), writeCenter, nl, write('====='), nl,
  checkDest(Name, Dest, N, Found),
  putsuccessors(Bound2, Node, Bound3),
%   write('putsuccessors succeeded.'), nl,
  N1 is N+1,
  dijkstra1(Bound3, Dest, N1, Found).

checkDest(n(Name, _), n(Name, _), N, found) :- write('Destination node '),
   write(Name), write(' reached at iteration '), write(N), nl.

checkDest(_, _, _, notfound).


/*
Some auxiliary predicates for testing:

*/

writeList([]).
writeList([X | Rest]) :- nl, nl, write('-----'), nl, write(X), writeList(Rest).

writeCenter :- not(writeCenter1).
writeCenter1 :-
  center(_, node(Name, Distance, Path)),
  write('Node: '), write(Name), nl,
  write('Cost: '), write(Distance), nl,
  write('Path: '), nl, write(Path), nl, fail.

writePath([]).
writePath([costEdge(Source, Target, Term, Result, Size, Cost) | Path]) :-
  write(costEdge(Source, Target, Result, Size, Cost)), nl,
  write('    '), wp(Term), nl,
  writePath(Path).

/*
----    putsuccessors(Boundary, Node, BoundaryNew) :-
----

Insert into ~Boundary~ all successors of node ~Node~ that are not yet present
in the center, to obtain ~BoundaryNew~. If versions of a successor already
exist in the boundary, ~putsucc1~ decides by dominance whether it is
inserted, as described below.

*/
putsuccessors(Boundary, Node, BoundaryNew) :-
  findall(Succ, successor(Node, Succ), Successors),

%    write('successors of '), write(Node), nl,
%    writeList(Successors), nl, nl,

  putsucc1(Boundary, Successors, BoundaryNew).

%    write('the new boundary is: '), write(BoundaryNew),
%    nl, write('====='), nl.

/*
----    putsucc1(Boundary, Successors, BoundaryNew) :-
----

Put all successors not yet in the center from the list ~Successors~ into the
~Boundary~ to get ~BoundaryNew~. The cases to be distinguished are:

  * The list of successors is empty.

  * The first successor simplenode(N, \_, \_) is already in the center, hence the shortest path to it is already known and it does not need to be inserted into the boundary.

  * The first successor X = simplenode(N, \_, \_) exists in the boundary. That is, there is a non-empty set V(N) of versions of N in the boundary. We say that X dominates Y if and only if the distance of X is less than or equal to that of Y and the properties of X include those of Y.

    * If X is not dominated by any Y in V(N), then insert X into the boundary.

    * If X dominates any Y in V(N), then remove Y from the boundary.

  * The first successor does not exist in the boundary. It is inserted.
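
The dominance test can be sketched as follows. To keep the sketch short, a
version is written as the hypothetical term ~v(Distance, Properties)~, where
~Properties~ is a flat list. The optimizer itself works on nested property
lists (see ~included~ below). All names, including the property
~sorted(a)~, are hypothetical.

----    % X dominates Y: X is not more expensive and has all properties of Y.
        exDominates(v(D1, P1), v(D2, P2)) :-
          D1 =< D2,
          subset(P2, P1).

        % If an existing version dominates X, the versions stay unchanged.
        % Otherwise X is added and the versions dominated by X are dropped.
        exInsertVersion(X, Versions, NewVersions) :-
          (   member(Y, Versions), exDominates(Y, X)
          ->  NewVersions = Versions
          ;   exclude(exDominates(X), Versions, Kept),
              NewVersions = [X | Kept]
          ).
----

For example, inserting ~v(7, [sorted(a)])~ into
~[v(10, [sorted(a)]), v(5, [])]~ yields ~[v(7, [sorted(a)]), v(5, [])]~. The
new version is not dominated, because the cheaper ~v(5, [])~ lacks
~sorted(a)~. It does dominate ~v(10, [sorted(a)])~, which is therefore removed.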

*/

putsucc1(Boundary, [], Boundary).

putsucc1(Boundary, [simplenode(N, _, _) | Successors], BNew) :-
  center(n(N, 1), _), !,
  putsucc1(Boundary, Successors, BNew).

putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
  findall(Node, b_memberByName(Boundary, n(N, _), Node), Nodes),
  insertIfNotDominated(Boundary, simplenode(N, D, P), Nodes, 1, Boundary2),
  removeThoseDominated(Boundary2, simplenode(N, D, P), Nodes, Boundary3),
  putsucc1(Boundary3, Successors, BNew).

% putsucc1(Boundary, [simplenode(N, D, [_, Properties]) | Successors],
%   BNew) :-
%   b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, [_, Properties)),
%   DistOld =< D, !,
%   putsucc1(Boundary, Successors, BNew).

% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
%   b_memberByName(Boundary, n(N, 1), node(n(N, 1), DistOld, _)),
%   D < DistOld, !,
%   b_deleteByName(Boundary, n(N, 1), Bound2),
%   b_insert(Bound2, node(n(N, 1), D, P), Bound3),
%   putsucc1(Bound3, Successors, BNew).

% the following not needed

% putsucc1(Boundary, [simplenode(N, D, P) | Successors], BNew) :-
%    nl,
%    write('putsucc1 called with final case'), nl,
%    write(simplenode(N, D, P)), nl,
%   b_insert(Boundary, node(n(N, 1), D, P), Bound2),
%   putsucc1(Bound2, Successors, BNew).


insertIfNotDominated(Boundary, simplenode(N, D, P), [], Version, BoundaryOut) :-
  b_insert(Boundary, node(n(N, Version), D, P), BoundaryOut).
%   nl, write('***** inserted '), write(node(n(N, Version), D, P)), nl.

insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]),
  [node(n(N, V), DistOld, [_, PropOld]) | Nodes], Version, BoundaryOut) :-
  ( D < DistOld ; otherProperties(Prop, PropOld) ),   % not dominated
  ( V > Version
    -> Version2 is V + 1
    ; Version2 is Version + 1
  ),
  insertIfNotDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
    Version2, BoundaryOut).


insertIfNotDominated(Boundary, simplenode(N, D, [_, Prop]),
  [node(n(N, _), DistOld, [_, PropOld]) | _], _, Boundary) :-
% nl, write('***** NOT inserted '), write(simplenode(N, D, [Path, Prop])), nl,
  D >= DistOld,
  included(Prop, PropOld).   % is dominated and can be ignored.


removeThoseDominated(Boundary, simplenode(_, _, [_, _]), [], Boundary).

removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
  [node(n(N, _), DistOld, [_, PropOld]) | Nodes], Boundary2) :-
  ( DistOld =< D ; otherProperties(PropOld, Prop) ), !,   % not dominated
  removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]), Nodes,
    Boundary2).

removeThoseDominated(Boundary, simplenode(N, D, [Path, Prop]),
  [node(n(N, V), _, [_, _]) | Nodes], Boundary3) :-
  b_deleteByName(Boundary, n(N, V), Boundary2),
%   nl, write('***** deleted '), write(n(N, V)), nl,
  removeThoseDominated(Boundary2, simplenode(N, D, [Path, Prop]), Nodes,
    Boundary3).


:-dynamic noProperties/0.

included(_, _) :- noProperties, !.

included([[Node, List1] | Props1], Props2) :-
  select([Node, List2], Props2, Props2Rest),
  included2(List1, List2),
  included(Props1, Props2Rest).

included([], _).


included2([], _).

included2([P1 | Props1], Props2) :-
  select(P1, Props2, Props2Rest),
  included2(Props1, Props2Rest).

included2([none], _).


otherProperties(Props1, Props2) :-
  not(included(Props1, Props2)).


/*

9.2 Interface ~Boundary~

The boundary is represented in a data structure with the following
operations:

----    b_empty(-Boundary) :-
----

Creates an empty boundary and returns it.

----    b_isEmpty(+Boundary) :-
----

Checks whether the boundary is empty.


----    b_removemin(+Boundary, -Node, -BoundaryOut) :-
----

Returns the node ~Node~ with minimal distance in ~Boundary~ and also returns
~BoundaryOut~, from which this node has been removed.

----    b_insert(+Boundary, +Node, -BoundaryOut) :-
----

Inserts the node ~Node~, which must not yet be present (i.e., there must be no
other node with the same name), and returns the result as ~BoundaryOut~.

----    b_memberByName(+Boundary, +Name, -Node) :-
----

If a node ~Node~ with name ~Name~ is present, it is returned.

----     b_deleteByName(+Boundary, +Name, -BoundaryOut) :-
----

Returns the boundary ~BoundaryOut~, from which the node with name ~Name~ has
been deleted.
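
A minimal list-based implementation of this interface might look as follows.
It is only a sketch:

  * the names ~lbEmpty~, ~lbInsert~ and so on are hypothetical stand-ins for
the predicates of the interface above;

  * nodes are assumed to have the form ~node(Name, Distance, Info)~;

  * insertion takes linear time, whereas a balanced tree-based priority queue
inserts in logarithmic time.

----    lbEmpty([]).
        lbIsEmpty([]).

        % The list is kept sorted by distance, so the minimum is the head.
        lbRemoveMin([Node | Rest], Node, Rest).

        lbInsert([], Node, [Node]).
        lbInsert([N1 | Rest], Node, [Node, N1 | Rest]) :-
          Node = node(_, D, _), N1 = node(_, D1, _), D =< D1, !.
        lbInsert([N1 | Rest], Node, [N1 | Rest2]) :-
          lbInsert(Rest, Node, Rest2).

        % Enumerates all matching nodes on backtracking, so a partially
        % instantiated name such as n(3, V) finds all versions of node 3.
        lbMemberByName(Boundary, Name, Node) :-
          Node = node(Name, _, _),
          member(Node, Boundary).

        % Name is expected to be fully instantiated, e.g. n(3, 2).
        lbDeleteByName([], _, []).
        lbDeleteByName([node(N, D, I) | Rest], Name, Out) :-
          (   N == Name
          ->  Out = Rest2
          ;   Out = [node(N, D, I) | Rest2]
          ),
          lbDeleteByName(Rest, Name, Rest2).
----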

*/

/*
9.3 Constructing the Plan from the Shortest Path

----    plan(Path, Plan)
----

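The plan corresponding to ~Path~ is ~Plan~. Each edge of ~Path~ carries a plan
term in which an argument ~res(N)~ stands for the plan already built for node
~N~. ~traversePath~ replaces these references by the stored plans and records
the result as the ~nodePlan~ of the edge's result node; the plan of the highest
node is the plan for the whole query.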

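The same resolution of ~res(N)~ references can be sketched without ~assert~
and ~retract~, by threading the plans built so far through the traversal in an
AVL tree from SWI-Prolog's ~library(assoc)~. This is a hypothetical alternative
for illustration only: the names ~pathPlan~, ~traverseEdges~, ~embed~ and
~embedArgs~ are invented here, and the sketch takes the plan of the last
edge's result node as the final plan, whereas ~plan~ uses ~highNode~.

----
    :- use_module(library(assoc)).

    % Hypothetical sketch: plans of intermediate nodes are kept in an AVL
    % tree instead of nodePlan/2 facts.
    pathPlan(Path, Plan) :-
      empty_assoc(Plans0),
      traverseEdges(Path, Plans0, Plans, none, Last),
      get_assoc(Last, Plans, Plan).

    traverseEdges([], Plans, Plans, Last, Last).
    traverseEdges([costEdge(_, _, Term, Result, _, _) | Path],
        Plans0, Plans, _, Last) :-
      embed(Term, Plans0, Term2),
      put_assoc(Result, Plans0, Term2, Plans1),
      traverseEdges(Path, Plans1, Plans, Result, Last).

    % Replace res(N) by the plan stored for node N, recursively.
    embed(res(N), Plans, Term) :-
      get_assoc(N, Plans, Term), !.
    embed(Term, Plans, Term2) :-
      compound(Term), !,
      Term =.. [Functor | Args],
      embedArgs(Args, Plans, Args2),
      Term2 =.. [Functor | Args2].
    embed(Term, _, Term).

    embedArgs([], _, []).
    embedArgs([A | As], Plans, [A2 | As2]) :-
      embed(A, Plans, A2),
      embedArgs(As, Plans, As2).
----

For a path whose first edge builds ~filter(feed(rel(staedte, *, u)), p1)~ for
node 1 and whose second edge applies ~filter(res(1), p2)~, the resulting plan is
~filter(filter(feed(rel(staedte, *, u)), p1), p2)~.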
*/

%fapra 15/16

plan(Path, Plan) :-
  isDistributedQuery,
  !,
  deleteNodePlans,
  mergePlanEdges(Path, MergedPath),
  traversePath(MergedPath),
  highNode(N),
  nodePlan(N, Plan).

%end fapra 15/16

plan(Path, Plan) :-
  deleteNodePlans,
  traversePath(Path),
  highNode(N),
  nodePlan(N, Plan).


deleteNodePlans :- not(deleteNodePlan).

deleteNodePlan :- retract(nodePlan(_, _)), fail.

traversePath([]).

traversePath([costEdge(_, _, Term, Result, _, _) | Path]) :-
  embedSubPlans(Term, Term2),
  assert(nodePlan(Result, Term2)),
  traversePath(Path).

embedSubPlans(res(N), Term) :-
  nodePlan(N, Term), !.

embedSubPlans(Term, Term2) :-
  compound(Term), !,
  Term =.. [Functor | Args],
  embedded(Args, Args2),
  Term2 =.. [Functor | Args2].

embedSubPlans(Term, Term).


embedded([], []).

embedded([Arg | Args], [Arg2 | Args2]) :-
  embedSubPlans(Arg, Arg2),
  embedded(Args, Args2).

%fapra 15/16

/*

  ----    mergePlanEdges(PlanEdgeList, MergedEdgesList)
  ----

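  Merge two successive plan edges that both apply ~dmap~ with a filtration,
  where the second ~dmap~ operates on the result of the first, into a single
  ~dmap~ edge. Example (N is the result node of the first edge):
  dmap(X, _, filter(feed(rel('.', _, _)), P1)) followed by
  dmap(res(N), R, filter(feed(rel('.', _, _)), P2))
  ...becomes:  dmap(X, R, filter(filter(feed(rel('.', _, _)), P1), P2))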

*/

mergePlanEdges([], []).
mergePlanEdges([X], [X]).

/*
  Merge rule for two successive ~dmap~ operations whose parameter functions
  are filtrations; this should be the most common case.

*/

mergePlanEdges([Edge1, Edge2|Edges], MergedEdges) :-
  Edge1 = costEdge(Source, _, Plan1, Res1, _, C1),
  Edge2 = costEdge(_, Target, Plan2, Res2, S2, C2),
  Plan1 = dmap(Arg, _, filter(FilterArg, Pred1)),
  successiveFilterOnParam(FilterArg, ArgTerm),
  Plan2 = dmap(res(Res1), ResName, filter(ArgTerm, Pred2)),
  MergedPlan = dmap(Arg, ResName,
    filter(filter(FilterArg, Pred1), Pred2)),
  % the plan is already chosen at this point, so costs will have no influence
  MergedCosts is C1 + C2,
  MergedHead = costEdge(Source, Target, MergedPlan, Res2, S2, MergedCosts),
  mergePlanEdges([MergedHead|Edges], MergedEdges).

% The first two edges cannot be merged according to the above rules.
mergePlanEdges([X|Tail], [X|MergedTail]) :-
  mergePlanEdges(Tail, MergedTail).

% Term is the implicit argument feed(rel('.', _, _)), possibly renamed, or a
% nested filtration of it; ArgTerm is that innermost argument.
successiveFilterOnParam(Term, ArgTerm) :-
  functor(Term, filter, 2),
  arg(1, Term, FirstArg),
  successiveFilterOnParam(FirstArg, ArgTerm).

successiveFilterOnParam(Term, Term) :-
  Term = feed(rel('.', _, _)).

successiveFilterOnParam(Term, Term) :-
  Term = rename(feed(rel('.', _, _)), _).

%end fapra 15/16

% highestNode(Path, N) :-
%  reverse(Path, Path2),
%  Path2 = [costEdge(_, N, _, _, _, _) | _].


/*
9.4 Computing the Best Plan for a Given Predicate Order Graph

*/

bestPlan :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan),
  write('The best plan is:'), nl, nl,
  wp(Plan),
  nl, nl,
  write('The cost is: '), write(Cost), nl.

bestPlan(Plan, Cost) :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan).

/*
10 A Larger Example

It is now time to test efficiency with a larger example. We consider the query:

----    select *
    from Staedte, plz as p1, plz as p2, plz as p3
    where SName = p1.Ort
      and p1.PLZ = p2.PLZ + 1
      and p2.PLZ = p3.PLZ * 5
      and Bev > 300000
      and Bev < 500000
      and p2.PLZ > 50000
      and p2.PLZ < 60000
      and Kennzeichen starts "W"
      and p3.Ort contains "burg"
      and p3.Ort starts "M"
----

This translates to:

*/

example6 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
       rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1),
            rel(plz, p1, l), rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5),
            rel(plz, p2, l), rel(plz, p3, l)),
    pr(attr(bev, 1, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 1, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p2:pLZ, 1, u) > 50000,  rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) < 60000,  rel(plz, p2, l)),
    pr(attr(kennzeichen, 1, u) starts "W",  rel(staedte, *, u)),
    pr(attr(p3:ort, 1, u) contains "burg",  rel(plz, p3, l)),
    pr(attr(p3:ort, 1, u) starts "M",  rel(plz, p3, l))
  ],
  _, _).

/*
This did not work initially (it does now). Let us keep the numbers a bit
smaller and avoid too many big joins first.

*/
example7 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u),
       rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(bev, 0, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 0, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p1:pLZ, 0, u) > 50000,  rel(plz, p1, l)),
    pr(attr(p1:pLZ, 0, u) < 60000,  rel(plz, p1, l)),
    pr(attr(kennzeichen, 0, u) starts "F",  rel(staedte, *, u)),
    pr(attr(p1:ort, 0, u) contains "burg",  rel(plz, p1, l)),
    pr(attr(p1:ort, 0, u) starts "M",  rel(plz, p1, l))
  ],
  _, _).

example8 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
       rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
        rel(plz, p2, l)),
    pr(attr(bev, 0, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 0, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p1:pLZ, 0, u) > 50000,  rel(plz, p1, l)),
    pr(attr(p1:pLZ, 0, u) < 60000,  rel(plz, p1, l)),
    pr(attr(kennzeichen, 0, u) starts "F",  rel(staedte, *, u)),
    pr(attr(p1:ort, 0, u) contains "burg",  rel(plz, p1, l)),
    pr(attr(p1:ort, 0, u) starts "M",  rel(plz, p1, l))
  ],
  _, _).

/*
Let's study a small example again with two independent conditions.

*/

example9 :- pog([rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
    rel(staedte, s, u), rel(plz, p, l) ),
   pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l)),
   pr(attr(s:bev, 0, u) > 300000, rel(staedte, s, u))], _, _).

example10 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), rel(staedte, *, u),
       rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), rel(plz, p1, l),
       rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), rel(plz, p2, l),
       rel(plz, p3, l))
  ],
  _, _).

/*
11 A User Level Language

We have started to construct the optimizer by building the predicate order
graph, using a notation for relations and predicates as useful for that
purpose. Later, in [Section Translation], we have adapted the notation to be
able to translate and construct query plans as needed in Secondo. In this
section we introduce a more user-friendly notation for queries, quite
similar to SQL, but suitable for being written directly in PROLOG.

11.1 The Language

The basic select-from-where statement will be written as

----    select <attr-list>
    from <rel-list>
    where <pred-list>
----

The first example query from [Section 4.1.1] can then be written as:

----    select [sname, bev]
    from [staedte]
    where [bev > 500000]
----

Instead of lists consisting of a single element we will also support writing
just the element, hence the query can also be written:

----    select [sname, bev]
    from staedte
    where bev > 500000
----

The second query can be written as:

----    select *
    from [staedte as s, plz as p]
    where [s:sname = p:ort, p:plz > 40000]
----

Note that all relation names and attribute names are written in lower case
only; the system looks up their spelling in a table.

Furthermore, a groupby- and an orderby-clause can be added:

  * groupby

----    select <aggr-list>
    from <rel-list>
    where <pred-list>
    groupby <group-attr-list>
----

Example:

----
    select [ort, min(plz) as minplz, max(plz) as maxplz,  count(*) as cntplz]
    from plz
    where plz > 40000
    groupby ort
----

  * orderby

----    select <attr-list>
    from <rel-list>
    where <pred-list>
    orderby <order-attr-list>
----

Example:

----    select [ort, plz]
    from plz
    orderby [ort asc, plz desc]
----

This example also shows that the where-clause may be omitted. It is also
possible to combine grouping and ordering:

----
    select [ort, min(plz) as minplz, max(plz) as maxplz,  count(*) as cntplz]
    from plz
    where plz > 40000
    groupby ort
    orderby cntplz desc
----

Currently only a basic part of this language has been implemented.


11.2 Structure

We introduce ~select~, ~from~, ~where~, ~as~, ~groupby~, ~orderby~, ~asc~,
~desc~, ~sql~, and ~>>~ as PROLOG operators:

*/

:- op(990, fx, sql).
:- op(985, xfx, >>).
:- op(950, fx, select).
:- op(960, xfx, from).
:- op(950, xfx, where).
:- op(930, xfx, as).
:- op(970, xfx, groupby).
:- op(980, xfx, orderby).
:- op(930, xf, asc).
:- op(930, xf, desc).

/*
This ensures that the select-from-where statement is viewed as a term with the
structure:

----    from(select(AttrList), where(RelList, PredList))
----

That this works can be tested with:

----    P = (select s:sname from staedte as s where s:bev > 500000),
    P = (X from Y), X = (select AttrList), Y = (RelList where PredList),
    RelList = (Rel as Var).
----

The result is:

----    P = select s:sname from staedte as s where s:bev>500000
    X = select s:sname
    Y = staedte as s where s:bev>500000
    AttrList = s:sname
    RelList = staedte as s
    PredList = s:bev>500000
    Rel = staedte
    Var = s
----
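
The same check can also be run non-interactively. The sketch below repeats
only the four operator declarations it needs and decomposes the parsed query
by unification; ~showStructure~ is a hypothetical name, and ~write_canonical/1~
prints the term in plain functional notation, ignoring operator declarations.

----
    :- op(960, xfx, from).
    :- op(950, fx, select).
    :- op(950, xfx, where).
    :- op(930, xfx, as).

    % Parse a query and expose the parts of its top-level structure.
    showStructure(Select, Rels, Preds) :-
      Query = (select s:sname from staedte as s where s:bev > 500000),
      Query = from(select(Select), where(Rels, Preds)),
      write_canonical(Query), nl.
----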

11.3 Schema Lookup

The second task is to look up relation and attribute names in order to build
the input notation for the construction of the predicate order graph.

11.3.1 Tables

In the file ~database~ we maintain the following tables.

Relation schemas are written as:

----    relation(staedte, [sname, bev, plz, vorwahl, kennzeichen]).
    relation(plz, [plz, ort]).
----

The spelling of relation or attribute names is given in a table ~spelling~:

----    spelling(staedte:plz, pLZ).
    spelling(staedte:sname, sName).
    spelling(plz, lc(plz)).
    spelling(plz:plz, pLZ).
----

The default assumption is that the first letter of a name is upper case and all
others are lower case. If this is true, then no entry in the table ~spelling~
is needed. If a name starts with a lower case letter, then this is expressed by
the functor ~lc~.
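
Under this convention the external Secondo spelling follows from the stored
name and its case flag: with ~u~ the first letter is written in upper case
(~sName~ becomes ~SName~ and ~pLZ~ becomes ~PLZ~, matching the attribute names
in the larger example of Section 10), with ~l~ the name is used unchanged. A
hypothetical helper stating this rule, which is not part of the optimizer:

----
    % Hypothetical helper: external Secondo spelling of an internal name,
    % given its case flag (u = upper-case first letter, l = unchanged).
    secondoSpelling(Name, l, Name).
    secondoSpelling(Name, u, Spelled) :-
      sub_atom(Name, 0, 1, _, First),
      sub_atom(Name, 1, _, 0, Rest),
      upcase_atom(First, Upper),
      atom_concat(Upper, Rest, Spelled).
----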

11.3.2 Looking up Relation and Attribute Names

*/

callLookup(Query, Query2) :-
  newQuery,
  lookup(Query, Query2), !.

%fapra 2015/16

/*
Added ~clearIsDistributedQuery~ and ~clearIsLocalQuery~.

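Each of the ~clear~ predicates below is a failure-driven loop: ~retract/1~
removes one matching fact, ~fail~ backtracks into ~retract/1~ to remove the
next one, and wrapping the loop in ~not/1~ lets ~newQuery~ succeed once no fact
is left (~deleteNodePlans~ in Section 9.3 follows the same pattern). The
built-in ~retractall/1~ achieves the same effect. A minimal comparison, using
the hypothetical dynamic predicate ~demoFact/1~:

----
    :- dynamic demoFact/1.

    % Failure-driven loop in the style of newQuery; it always fails.
    clearDemoFacts :- retract(demoFact(_)), fail.

    resetWithLoop :- \+ clearDemoFacts.

    % Same effect with the built-in retractall/1.
    resetWithRetractall :- retractall(demoFact(_)).
----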
*/

newQuery :- not(clearVariables), not(clearQueryRelations),
  not(clearQueryAttributes), not(clearIsDistributedQuery),
  not(clearIsLocalQuery).

clearVariables :- retract(variable(_, _)), fail.

clearQueryRelations :- retract(queryRel(_, _)), fail.

clearQueryAttributes :- retract(queryAttr(_)), fail.

clearIsDistributedQuery :- retract(isDistributedQuery), fail.

clearIsLocalQuery :- retract(isLocalQuery), fail.

%end fapra 2015/16

/*

----    lookup(Query, Query2) :-
----

~Query2~ is a modified version of ~Query~ where all relation names and
attribute names have the form as required in [Section Translation].

*/

lookup(select Attrs from Rels where Preds,
    select Attrs2 from Rels2List where Preds2List) :-
  lookupRels(Rels, Rels2),
  checkDistributedQuery,
  lookupAttrs(Attrs, Attrs2),
  lookupPreds(Preds, Preds2),
  makeList(Rels2, Rels2List),
  makeList(Preds2, Preds2List).

lookup(select Attrs from Rels,
    select Attrs2 from Rels2) :-
  lookupRels(Rels, Rels2),
  checkDistributedQuery,
  lookupAttrs(Attrs, Attrs2).

lookup(Query orderby Attrs, Query2 orderby Attrs3) :-
  lookup(Query, Query2),
  makeList(Attrs, Attrs2),
  lookupAttrs(Attrs2, Attrs3).

lookup(Query groupby Attrs, Query2 groupby Attrs3) :-
  lookup(Query, Query2),
  makeList(Attrs, Attrs2),
  lookupAttrs(Attrs2, Attrs3).


makeList(L, L) :- is_list(L).

makeList(L, [L]) :- not(is_list(L)).

/*

11.3.3 Modification of the From-Clause

----    lookupRels(Rels, Rels2)
----

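Modify the list of relation names. A relation with a variable (~Rel as Var~)
is stored in the table ~variable~, a relation without a variable in the table
~queryRel~. Any two relations without variables must have disjoint sets of
attribute names, because their attributes are referenced without
qualification. Also, any two variables must be distinct.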

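For example, with the schemas of Section 11.3.1 the relations ~staedte~ and
~plz~ share the attribute ~plz~, so at most one of them may appear without a
variable, as in ~example13~, where ~plz~ appears only as ~p1~, ~p2~ and ~p3~.
The underlying test can be sketched on plain attribute lists; ~sharedAttr~ is a
hypothetical name, whereas ~duplicateAttrs~ reads the schemas from the
~relation~ table.

----
    % Hypothetical sketch: Attr occurs in both attribute lists.
    sharedAttr(Attrs1, Attrs2, Attr) :-
      member(Attr, Attrs1),
      memberchk(Attr, Attrs2).
----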
*/

lookupRels([], []).

lookupRels([R | Rs], [R2 | R2s]) :-
  lookupRel(R, R2),
  lookupRels(Rs, R2s).

lookupRels(Rel, Rel2) :-
  not(is_list(Rel)),
  lookupRel(Rel, Rel2).

/*
----    lookupRel(Rel, Rel2) :-
----

Translate and store a single relation definition.

*/

:- dynamic
  variable/2,
  queryRel/2,
  queryAttr/1.

lookupRel(Rel as Var, rel(Rel2, Var, Case)) :-
  removeDistributedSuffix(Rel,DRel),
  relation(DRel, _), !,
  spelled(DRel, Rel2, Case),
  not(defined(Var)),
  assert(variable(Var, rel(Rel2, Var, Case))).

lookupRel(Rel, rel(Rel2, *, Case)) :-
  removeDistributedSuffix(Rel,DRel),
  relation(DRel, _), !,
  spelled(DRel, Rel2, Case),
  not(duplicateAttrs(Rel)),
  assert(queryRel(DRel, rel(Rel2, *, Case))).

lookupRel(Term, Term) :-
  write('Error in query: relation '), write(Term), write(' not known'),
  nl, fail.

defined(Var) :-
  variable(Var, _),
  write('Error in query: doubly defined variable '), write(Var), write('.'), nl.


%fapra 2015/16

/*
Checks that the relations of a query are either all local or all
distributed. Currently the optimizer cannot handle queries that mix
local and distributed relations; such queries are rejected.

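The acceptance rule can be stated compactly: collect one flag per relation and
require that all flags are equal. A hypothetical sketch of this rule; the
optimizer itself records the flags in the dynamic facts ~isLocalQuery~ and
~isDistributedQuery~ rather than in a list.

----
    % Hypothetical sketch: Kinds holds one flag (local or distributed)
    % per relation of the from-clause. sort/2 removes duplicates, so the
    % query is homogeneous iff exactly one distinct flag remains.
    homogeneousQuery(Kinds, Kind) :-
      sort(Kinds, [Kind]).
----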
*/

%handle not distributed queries
checkDistributedQuery :-
   not(isDistributedQuery),
   isLocalQuery,
   !.

checkDistributedQuery :-
   isDistributedQuery,
   not(isLocalQuery),
   !.

checkDistributedQuery :-
  write('Error in query: not all relations distributed.'), nl,
  fail.

%end fapra 2015/16

/*
----    duplicateAttrs(Rel) :-
----

There is a relation stored in ~queryRel~ that has attribute names also
occurring in ~Rel~.

*/

duplicateAttrs(Rel) :-
  queryRel(Rel2, _),
  relation(Rel2, Attrs2),
  member(Attr, Attrs2),
  relation(Rel, Attrs),
  member(Attr, Attrs),
  write('Error in query: duplicate attribute names in relations '),
  write(Rel2), write(' and '), write(Rel), write('.'), nl.

/*
11.3.4 Modification of the Select-Clause

*/

lookupAttrs([], []).

lookupAttrs([A | As], [A2 | A2s]) :-
  lookupAttr(A, A2),
  lookupAttrs(As, A2s).

lookupAttrs(Attr, Attr2) :-
  not(is_list(Attr)),
  lookupAttr(Attr, Attr2).

lookupAttr(Var:Attr, attr(Var:Attr2, 0, Case)) :- !,
  variable(Var, Rel2),
  Rel2 = rel(Rel, _, _),
  spelled(Rel:Attr, attr(Attr2, _, Case)).

lookupAttr(Attr asc, Attr2 asc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr desc, Attr2 desc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr, Attr2) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, Attr2).

lookupAttr(*, *) :- !.

lookupAttr(count(*), count(*)) :- !.

lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
  lookupAttr(Expr, Expr2),
  not(queryAttr(attr(Name, 0, u))),
  !,
  assert(queryAttr(attr(Name, 0, u))).

lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
  lookupAttr(Expr, Expr2),
  queryAttr(attr(Name, 0, u)),
  !,
  write('***** Error: attribute name '), write(Name),
  write(' doubly defined in query.'),
  nl,
  fail.

lookupAttr(Term, Term2) :-
  compound(Term),
  functor(Term, Op, 1),
  arg(1, Term, Arg1),
  lookupAttr(Arg1, Res1),
  functor(Term2, Op, 1),
  arg(1, Term2, Res1).

lookupAttr(Name, attr(Name, 0, u)) :-
  queryAttr(attr(Name, 0, u)),
  !.

lookupAttr(Name, Name) :-
  write('Error in attribute list: could not recognize '), write(Name), nl, fail.

isAttribute(Name, Rel) :-
  queryRel(Rel, _),
  relation(Rel, List),
  member(Name, List).


/*
11.3.5 Modification of the Where-Clause

*/

lookupPreds([], []).

lookupPreds([P | Ps], [P2 | P2s]) :- !,
  lookupPred(P, P2),
  lookupPreds(Ps, P2s).

lookupPreds(Pred, Pred2) :-
  not(is_list(Pred)),
  lookupPred(Pred, Pred2).


lookupPred(Pred, pr(Pred2, Rel)) :-
  lookupPred1(Pred, Pred2, 0, [], 1, [Rel]), !.

lookupPred(Pred, pr(Pred2, Rel1, Rel2)) :-
  lookupPred1(Pred, Pred2, 0, [], 2, [Rel1, Rel2]), !.

lookupPred(Pred, _) :-
  lookupPred1(Pred, _, 0, [], 0, []),
  write('Error in query: constant predicate is not allowed.'), nl, fail, !.

lookupPred(Pred, _) :-
  lookupPred1(Pred, _, 0, [], N, _),
  N > 2,
  write('Error in query: predicate involving more than two relations '),
  write('is not allowed.'), nl, fail.

/*
----    lookupPred1(+Pred, Pred2, +N, +RelsBefore, -M, -RelsAfter) :-
----

~Pred2~ is the transformed version of ~Pred~; before this is called, ~N~
attributes in list ~RelsBefore~ have been found; after the transformation in
total ~M~ attributes referring to the relations in list ~RelsAfter~ have been
found.
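
~lookupPred~ uses the final count ~M~ to classify a predicate: with one
attribute reference it becomes a selection predicate ~pr(Pred2, Rel)~, with two
a join predicate ~pr(Pred2, Rel1, Rel2)~, and constant predicates or predicates
with more than two attribute references are rejected. The threading of the
counter and the relation list can be sketched in isolation. The hypothetical
~countRefs~ below recognizes only ~Var:Attr~ references, does no schema or
spelling lookup, and handles terms of any arity via ~=..~ and ~foldl/4~ instead
of one clause per arity.

----
    :- use_module(library(apply)).

    % Hypothetical sketch: count Var:Attr references in a predicate and
    % collect their variables, threading N/M and Before/After.
    countRefs(Var:_, N, Before, M, After) :- !,
      M is N + 1,
      append(Before, [Var], After).
    countRefs(Term, N, Before, M, After) :-
      compound(Term), !,
      Term =.. [_ | Args],
      foldl(countRefsArg, Args, N-Before, M-After).
    countRefs(_, N, Rels, N, Rels).

    countRefsArg(Arg, N-Before, M-After) :-
      countRefs(Arg, N, Before, M, After).
----

For ~s:sname = p:ort~ the sketch yields the count 2 and the variables ~s~ and
~p~, which corresponds to a join predicate.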

*/

lookupPred1(Var:Attr, attr(Var:Attr2, N1, Case), N, RelsBefore, N1, RelsAfter)
  :-
  variable(Var, Rel2), !,   Rel2 = rel(Rel, _, _),
  spelled(Rel:Attr, attr(Attr2, _, Case)),
  N1 is N + 1,
  append(RelsBefore, [Rel2], RelsAfter).

lookupPred1(Attr, attr(Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, attr(Attr2, _, Case)),
  queryRel(Rel, Rel2),
  N1 is N + 1,
  append(RelsBefore, [Rel2], RelsAfter).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 1), !,
  arg(1, Term, Arg1),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M, RelsAfter),
  functor(Term2, F, 1),
  arg(1, Term2, Arg1Out).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 2), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
  lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M, RelsAfter),
  functor(Term2, F, 2),
  arg(1, Term2, Arg1Out),
  arg(2, Term2, Arg2Out).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 3), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
  lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M2, RelsAfter2),
  lookupPred1(Arg3, Arg3Out, M2, RelsAfter2, M, RelsAfter),
  functor(Term2, F, 3),
  arg(1, Term2, Arg1Out),
  arg(2, Term2, Arg2Out),
  arg(3, Term2, Arg3Out).

% may need to be extended to operators with more than three arguments.

%fapra 2015/16

/*
Look up generic, non-relation objects.

If ~Term~ is a Secondo object, it is marked with the functor
~obj(Obj, Type, Case)~, where ~Obj~ is the identifier starting with a
lower-case letter, ~Type~ is the kind of object, and ~Case~ (~u~ or ~l~)
indicates whether the first letter of the object name is written in upper case.

*/

lookupPred1(Term, ObjTerm, N, Rels, N, Rels) :-
  atom(Term),
  not(is_list(Term)),
  spelledObj(Term,Obj,Type,Case),
  ObjTerm = obj(Obj,Type,Case),
  !.

lookupPred1(Term, Term, N, Rels, N, Rels) :-
  atom(Term),
  not(is_list(Term)),
  write('Symbol '), write(Term),
  write(' not recognized, supposed to be a Secondo object.'), nl, !.

lookupPred1(Term, Term, N, Rels, N, Rels).

%end fapra 2015/16

/*
11.3.6 Check the Spelling of Relation and Attribute Names

*/

spelled(Rel:Attr, attr(Attr2, 0, l)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr3),
  Attr3 = lc(Attr2),
  !.

spelled(Rel:Attr, attr(Attr2, 0, u)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr2),
  !.

spelled(_:_, attr(_, 0, _)) :- !, fail. % no attr entry in spelling table

spelled(Rel, Rel2, l) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel3),
  Rel3 = lc(Rel2),
  !.

spelled(Rel, Rel2, u) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel2), !.

% if we do not get a spelling hint,
% assume it was spelled correctly

spelled(Rel, Rel, u) :-
  atom_chars(Rel, [FirstChar|_]),
  char_type(FirstChar, upper),
  write('spelling of '),
  write(Rel),
  write(' could not be determined. Assume it is spelled uppercase'), nl, !.

spelled(Rel, Rel, l) :-
  atom_chars(Rel, [FirstChar|_]),
  char_type(FirstChar, lower),
  write('spelling of '),
  write(Rel),
  write(' could not be determined. Assume it is spelled lowercase'), nl, !.

spelled(_, _, _) :- !, fail.  % no rel entry in spelling table.

%fapra 2015/16

/*
11.3.7 Check the Spelling of Non-Relation Objects

*/

spelledObj(Term, Obj, Type, l) :-
  downcase_atom(Term, DcObj),
  objectCatalog(DcObj, LcObj, Type),
  LcObj = lc(Obj),
  !.

spelledObj(Term, Obj, Type, u) :-
  downcase_atom(Term, DcObj),
  objectCatalog(DcObj, Obj, Type),
  !.

spelledObj(_, _, _, _) :- !, fail.  % no entry, avoid backtracking.

%end fapra 2015/16

/*
11.3.8 Examples

We can now formulate several of the previous queries at the user level.

*/

example11 :- showTranslate(select [sname, bev] from staedte where bev > 500000).

showTranslate(Query) :-
  callLookup(Query, Query2),
  write(Query), nl,
  write(Query2), nl.

example12 :- showTranslate(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
  ).

example13 :- showTranslate(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"]
  ).

/*
11.4 Translating a Query to a Plan

----    translate(Query, Stream, SelectClause, Cost) :-
----

~Query~ is translated into a ~Stream~ to which the translation of the
~SelectClause~ still needs to be applied. The returned ~Cost~ currently covers
only the essential part, the evaluation of the conjunctive query.

*/

translate(Query orderby Attrs, sortby(Stream, AttrNames), Select, Cost) :-
  !,
  translate(Query, Stream, Select, Cost),
  attrnamesSort(Attrs, AttrNames).

translate(Query groupby Attrs,
    groupby(sortby(Stream, AttrNamesSort), AttrNamesGroup, Fields),
    select Select2, Cost) :-
  translate(Query, Stream, SelectClause, Cost),
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNamesGroup),
  attrnamesSort(Attrs2, AttrNamesSort),
  SelectClause = (select Select),
  makeList(Select, SelAttrs),
  translateFields(SelAttrs, Attrs2, Fields, Select2),
  !.

translate(Select from Rels where Preds, Stream, Select, Cost) :-
  pog(Rels, Preds, _, _),
  bestPlan(Stream, Cost),
  !.

%fapra 2015/16

translate(Select from Rel, feed(Rel), Select, 0) :-
  not(isDistributedQuery),
  not(is_list(Rel)),
  !.

translate(Select from Rel, ObjName,Select, 0) :-
  isDistributedQuery,
  distributedRels(Rel, ObjName, _, _, _),
  not(is_list(Rel)),
  !.

translate(Select from Rel, dist(Rel,ObjName),Select, 0) :-
  isDistributedQuery,
  distributedRels(Rel, ObjName, _, _, _),
  not(is_list(Rel)),
  !.

translate(Select from [Rel], feed(Rel), Select, 0).

translate(Select from [Rel | Rels], product(feed(Rel), Stream), Select, 0) :-
  not(isDistributedQuery),
  translate(Select from Rels, Stream, Select, _).

%end fapra 2015/16

/*
----    translateFields(Select, GroupAttrs, Fields, Select2) :-
----

Translate the ~Select~ clause of a query containing ~groupby~. Grouping
was done by the attributes ~GroupAttrs~. Return a list ~Fields~ of terms
of the form ~field(Name, Expr)~; such a list can be used as an argument to the
groupby operator. Also, return a modified select clause ~Select2~,
which will translate to a corresponding projection operation.
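
For example (ignoring the ~attr~ terms introduced by the lookup), with
~groupby ort~ and the select list ~ort, min(plz) as minplz, count(*) as cntplz~,
~Fields~ consists of ~field(minplz, min(feed(group), attrname(plz)))~ and
~field(cntplz, count(feed(group)))~, and ~Select2~ lists ~ort~, ~minplz~ and
~cntplz~. The rule that turns a single named aggregate into a field can be
sketched on its own; ~aggrField~ is a hypothetical name and covers only the
cases ~count(*)~, ~sum~, ~min~, ~max~ and ~avg~ handled below.

----
    :- op(930, xfx, as).

    % Hypothetical sketch: one named aggregate of the select clause
    % becomes a field(Name, Expr) term for the groupby operator.
    aggrField(count(*) as New, field(New, count(feed(group)))) :- !.
    aggrField(Term as New, field(New, Term2)) :-
      Term =.. [Op, Attr],
      memberchk(Op, [sum, min, max, avg]),
      Term2 =.. [Op, feed(group), attrname(Attr)].
----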

*/

translateFields([], _, [], []).

translateFields([count(*) as NewAttr | Select], GroupAttrs,
    [field(NewAttr , count(feed(group))) | Fields], [NewAttr | Select2]) :-
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([sum(Attr) as NewAttr | Select], GroupAttrs,
    [field(NewAttr, sum(feed(group), attrname(Attr))) | Fields],
    [NewAttr| Select2]) :-
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([Attr | Select], GroupAttrs, Fields, [Attr | Select2]) :-
  member(Attr, GroupAttrs),
  !,
  translateFields(Select, GroupAttrs, Fields, Select2).


/*
Generic rule for aggregate functions, similar to sum.

*/

translateFields([Term as NewAttr | Select], GroupAttrs,
    [field(NewAttr, Term2) | Fields],
    [NewAttr| Select2]) :-
  compound(Term),
  functor(Term, AggrOp, 1),
  arg(1, Term, Attr),
  member(AggrOp, [min, max, avg]),
  functor(Term2, AggrOp, 2),
  arg(1, Term2, feed(group)),
  arg(2, Term2, attrname(Attr)),
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([Term | Select], GroupAttrs,
    Fields,
    Select2) :-
  compound(Term),
  functor(Term, AggrOp, 1),
  arg(1, Term, Attr),
  member(AggrOp, [count, sum, min, max, avg]),
  functor(Term2, AggrOp, 2),
  arg(1, Term2, feed(group)),
  arg(2, Term2, attrname(Attr)),
  translateFields(Select, GroupAttrs, Fields, Select2),
  write('*****'), nl,
  write('***** Error in groupby: missing name for new attribute'), nl,
  write('*****'), nl,
  !.


translateFields([Attr | Select], GroupAttrs, Fields, Select2) :-
  not(member(Attr, GroupAttrs)),
  !,
  translateFields(Select, GroupAttrs, Fields, Select2),
  write('*****'), nl,
  write('***** Error in groupby: '),
  write(Attr),
  write(' is neither a grouping attribute'), nl,
  write('      nor an aggregate expression.'), nl,
  write('*****'), nl.

%fapra 15/16

% Extract parts from a query
destructureQuery(Select from Rel where Pred, Select, Rel, Pred).

% Pred states that the value of an attribute is equal to a given value
attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
  Pred = pr(Value = Attr, Rel),
  Attr = attr(_, _, _).

attrValueEqualityPredicate(Pred, Value, Attr, Rel) :-
  Pred = pr(Attr = Value, Rel),
  Attr = attr(_, _, _).

/*

----   substituteSubterm(Substituted, Substitute, OriginalTerm, TermWithSubstitution)
----

Substituting ~Substitute~ for ~Substituted~ in ~OriginalTerm~ (i.e., replacing
every occurrence of ~Substituted~ by ~Substitute~) yields
~TermWithSubstitution~. Every clause contains a cut to remove unnecessary
choice points during the search for plan edges, which is driven by meta
predicates.

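For example, substituting ~feed(x)~ for ~res(1)~ in ~filter(res(1), p)~ yields
~filter(feed(x), p)~. The same traversal can be written more compactly with
~=..~ and ~maplist/3~. The following ~substituteAll~ is a hypothetical sketch
for comparison and is not used by the optimizer.

----
    :- use_module(library(apply)).

    % Hypothetical sketch: replace every subterm that unifies with Old
    % by New.
    substituteAll(Old, New, Old, New) :- !.
    substituteAll(Old, New, Term, Result) :-
      compound(Term), !,
      Term =.. [Functor | Args],
      maplist(substituteAll(Old, New), Args, Args2),
      Result =.. [Functor | Args2].
    substituteAll(_, _, Term, Term).
----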
*/

% The whole term is to be substituted:
substituteSubterm(Substituted, Substitute, Substituted, Substitute):- !.

% The whole term doesn't match and it's not compound:
substituteSubterm(Substituted, _, OriginalTerm, OriginalTerm) :-
  functor(OriginalTerm, _, 0),
  OriginalTerm \= Substituted, !.

% The whole term doesn't match and it's compound; dive into its subterms:
substituteSubterm(Substituted, Substitute, OriginalTerm,
  TermWithSubstitution) :-
  functor(OriginalTerm, Functor, Arity),
  functor(TermWithSubstitution, Functor, Arity),
  substituteSubtermInNthSubterm(Arity, Substituted,
    Substitute, OriginalTerm, TermWithSubstitution), !.

% Terminal case. All subterms have been processed.
substituteSubtermInNthSubterm(0, _, _, _, _):- !.

% Generic case. Process nth subterm.
substituteSubtermInNthSubterm(N, Substituted, Substitute,
  OriginalTerm, TermWithSubstitution) :-
  not(N = 0),
  arg(N, OriginalTerm, OriginalNthTerm),
  substituteSubterm(Substituted, Substitute,
    OriginalNthTerm, NthTermWithSubstitution),
  arg(N, TermWithSubstitution, NthTermWithSubstitution),
  Next is N - 1,
  substituteSubtermInNthSubterm(Next, Substituted,
    Substitute, OriginalTerm, TermWithSubstitution), !.


/*

----    queryToPlan(Query, Plan, Cost) :-
----

Translate the ~Query~ into a ~Plan~. The ~Cost~ for evaluating the conjunctive
query is also returned. The ~Query~ must be such that relation and attribute
names have been looked up already.

fapra 15/16:
For each non-distributed clause there is a duplicate that treats the
distributed case. These clauses are guarded by an ~isDistributedQuery~ goal.
end fapra 15/16

*/

queryToPlan(Query, consume(dsummarize(Stream)), Cost) :-
  selectClause(Query, *),
  isDistributedQuery,
  !,
  translate(Query, Stream, select *, Cost).

queryToPlan(Query, consume(Stream), Cost) :-
  selectClause(Query, *),
  !,
  translate(Query, Stream, select *, Cost).

queryToPlan(Query, count(dsummarize(Stream)), Cost) :-
  selectClause(Query, count(*)),
  isDistributedQuery,
  !,
  translate(Query, Stream, select count(*), Cost).

queryToPlan(Query, count(Stream), Cost) :-
  selectClause(Query, count(*)),
  !,
  translate(Query, Stream, select count(*), Cost).

%TF: changed to execute projection in dmap operator
queryToPlan(Query, consume(dsummarize(dmap(Stream," ",
  project(Plan,AttrNames)))), Cost) :-
  isDistributedQuery,
  !,
  translate(Query, dist(rel(_,Var,_),Stream), select Attrs, Cost), !,
  feedRenameRelation(rel(dot,Var,_),Plan),
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNames).

queryToPlan(Query, consume(project(Stream, AttrNames)), Cost) :-
  translate(Query, Stream, select Attrs, Cost), !,
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNames).

%end fapra 15/16

/*

----    queryToStream(Query, Plan, Cost) :-
----

Same as ~queryToPlan~, but returns a stream plan, if possible. To be used for
``mixed queries'' that add Secondo operators to the plan built by the optimizer.

*/

queryToStream(Query,  Stream, Cost) :-
  selectClause(Query, *),
  translate(Query, Stream, select *, Cost), !.

queryToStream(Query, count(Stream), Cost) :-
  selectClause(Query, count(*)),
  translate(Query, Stream, select count(*), Cost), !.

queryToStream(Query,  project(Stream, AttrNames), Cost) :-
  translate(Query, Stream, select Attrs, Cost), !,
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNames).


/*
----    selectClause(Query, C) :-
----

The select-clause of the ~Query~ is ~C~.

*/
% allows select [count(*)] to succeed. Activate later on in development.
%selectClause(select [X] from Y, Z) :-
%  selectClause(select X from Y, Z).

selectClause(select * from _, *) :- !.

selectClause(select count(*) from _, count(*)) :- !.

selectClause(select Attrs from _, Attrs) :- !.

selectClause(Query groupby _, C) :- !,
  selectClause(Query, C).

selectClause(Query orderby _, C) :- !,
  selectClause(Query, C).


/*

----    attrnames(Attrs, AttrNames) :-
----

Transform each attribute X into attrname(X).

*/

attrnames([], []).

attrnames([Attr | Attrs], [attrname(Attr) | AttrNames]) :-
  attrnames(Attrs, AttrNames).

/*

----    attrnamesSort(Attrs, AttrNames) :-
----

Transform attribute names of orderby clause.

*/

attrnamesSort([], []).

attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :-
  attrnameSort(Attr, Attr2),
  attrnamesSort(Attrs, Attrs2).

attrnameSort(Attr asc, attrname(Attr) asc) :- !.

attrnameSort(Attr desc, attrname(Attr) desc) :- !.

attrnameSort(Attr, attrname(Attr) asc).


/*


11.5 Integration with Optimizer

----    optimize(Query).
----

Optimize ~Query~ and print the best plan together with its estimated cost.

*/

optimize(Query) :-
  callLookup(Query, Query2),
  queryToPlan(Query2, Plan, Cost),
  writeln(Plan),
  plan_to_atom_string(Plan, SecondoQuery),
  write('The plan is: '), nl, nl,
  write(SecondoQuery), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl.


optimize(Query, QueryOut, CostOut) :-
  callLookup(Query, Query2),
  queryToPlan(Query2, Plan, CostOut),
  plan_to_atom_string(Plan, QueryOut).

/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom
that starts with the keyword ~sql~.

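Parsing the text depends on the operator declarations of Section 11.2 being in
effect when ~term_to_atom/2~ reads the atom. A minimal standalone check, which
repeats only the declarations it needs and uses the hypothetical name
~parseSql~:

----
    :- op(990, fx, sql).
    :- op(950, fx, select).
    :- op(960, xfx, from).
    :- op(950, xfx, where).

    % Hypothetical sketch: parse a query text that starts with sql.
    parseSql(QueryText, Query) :-
      term_to_atom(sql Query, QueryText).
----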
*/
sqlToPlan(QueryText, Plan) :-
  term_to_atom(sql Query, QueryText),
  optimize(Query, Plan, _).


/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom;
in this version, ~QueryText~ does not start with ~sql~.

*/
sqlToPlan(QueryText, Plan) :-
  term_to_atom(Query, QueryText),
  optimize(Query, Plan, _).


/*
11.6 Examples

We can now formulate the previous example queries in the user level language.


Example3:

*/

example14 :- optimize(
  select * from [staedte as s, plz as p]
  where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0]
  ).

example14(Query, Cost) :- optimize(
  select * from [staedte as s, plz as p]
  where [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0],
  Query, Cost
  ).


/*
Example4:

*/
example15 :- optimize(
  select * from staedte where bev > 500000
  ).

example15(Query, Cost) :- optimize(
  select * from staedte where bev > 500000,
  Query, Cost
  ).

/*
Example 5:

*/
example16 :-  optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
  ).

example16(Query, Cost) :-  optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000],
  Query, Cost
  ).


/*
Example 6: This may need a larger local stack size. Start Prolog as

----    pl -L4M
----

which initializes the local stack to 4 MB.

*/
example17 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"]
  ).

example17(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"],
  Query, Cost
  ).


/*
Example 18:

*/
example18 :- optimize(
  select *
  from [staedte, plz as p1]
  where [
    sname = p1:ort,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"]
  ).

example18(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1]
  where [
    sname = p1:ort,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"],
  Query, Cost
  ).

/*
Example 19:

*/
example19 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"]
  ).

example19(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"],
  Query, Cost
  ).


/*
Example 20:

*/
example20 :- optimize(
  select *
  from [staedte as s, plz as p]
  where [
    p:ort = s:sname,
    p:plz > 40000,
    s:bev > 300000]
  ).

example20(Query, Cost) :- optimize(
  select *
  from [staedte as s, plz as p]
  where [
    p:ort = s:sname,
    p:plz > 40000,
    s:bev > 300000],
  Query, Cost
  ).

/*
Example 21:

*/
example21 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5]
  ).

example21(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5],
  Query, Cost
  ).

/*

12 Optimizing and Calling Secondo

----    sql Term
    sql(Term, SecondoQueryRest)
    let(X, Term)
    let(X, Term, SecondoQueryRest)
----

~Term~ must be one of the available select-from-where statements, or a
~union~ or ~intersection~ of such statements as handled by ~mOptimize~. It is
optimized, and Secondo is called to execute the resulting plan.
~SecondoQueryRest~ is an atom containing a sequence of Secondo operators. It is
appended to the plan found by the optimizer. In this case the optimizer returns
a plan producing a stream, which these operators then process.

The two versions of ~let~ optimize ~Term~ in the same way and assign the result
of the query to a new object ~X~.
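
The following sketch isolates the string assembly done by ~sql/2~ and the two
~let~ predicates. It calls neither the optimizer nor Secondo. The helper names
~appendRest/3~ and ~letCommand/3~ are hypothetical. The plan atom in the usage
example is assumed to stand in for the output of ~mStreamOptimize~.

----    % appendRest(+StreamPlan, +Rest, -Query): hypothetical helper that
    % appends further Secondo operators to a stream plan (as in sql/2).
    appendRest(StreamPlan, Rest, Query) :-
      atomic_list_concat([StreamPlan, ' ', Rest], '', Query).
    % letCommand(+X, +Query, -Command): hypothetical helper that builds
    % the command storing the result of Query in a new object X.
    letCommand(X, Query, Command) :-
      atomic_list_concat(['let ', X, ' = ', Query], '', Command).
----

Assume the optimizer returned the stream plan ~'Staedte feed'~. Then
~appendRest('Staedte feed', 'head[10] consume', Q)~ yields
~Q = 'Staedte feed head[10] consume'~. The call ~letCommand(tenCities, Q, C)~
then yields ~C = 'let tenCities = Staedte feed head[10] consume'~.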

*/

sql Term :-
  mOptimize(Term, Query, Cost),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query).

sql(Term, SecondoQueryRest) :-
  mStreamOptimize(Term, SecondoQuery, Cost),
  concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query).

let(X, Term) :-
  mOptimize(Term, Query, Cost),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  concat_atom(['let ', X, ' = ', Query], '', Command),
  secondo(Command).

let(X, Term, SecondoQueryRest) :-
  mStreamOptimize(Term, SecondoQuery, Cost),
  concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  concat_atom(['let ', X, ' = ', Query], '', Command),
  secondo(Command).


/*
----    streamOptimize(Term, Query, Cost) :-
----

Optimize ~Term~, producing an incomplete Secondo query plan ~Query~ that
returns a stream. Operators such as ~consume~ still have to be appended to
obtain a complete query.

*/
streamOptimize(Term, Query, Cost) :-
  callLookup(Term, Term2),
  queryToStream(Term2, Plan, Cost),
  plan_to_atom_string(Plan,  Query).

/*
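----    mOptimize(Term, Query, Cost) :-
    mStreamOptimize(Term, Query, Cost) :-
----

The name stands for ``multi-optimize''. Optimize a ~Term~ that may consist of
several subexpressions, each optimized independently, as in ~union~ and
~intersection~ queries. Each subquery result is sorted and freed of duplicates
(~sort rdup~). The results are then combined with ~mergeunion~ or ~mergesec~,
respectively. The estimated cost is the sum of the subquery costs. Any other
~Term~ is optimized as a single query. ~mStreamOptimize~ is a variant returning
a stream, i.e., the plan without the final ~consume~.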
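
The following is a minimal sketch of the recursion in ~mStreamOptimize~. It
assumes that each subquery has already been optimized into a plan atom with a
cost. In the optimizer, this is the job of ~streamOptimize~. The predicate
~combinePlans/4~ and the ~plan(Atom, Cost)~ terms are hypothetical. Plan atoms
are assumed to end with a blank, since no separator is inserted before
~sort rdup~.

----    % combinePlans(+Merge, +SubPlans, -Query, -Cost): hypothetical.
    % SubPlans is a non-empty list of plan(Atom, Cost) terms and Merge
    % is the merge operator, i.e. mergeunion or mergesec.
    combinePlans(_, [plan(Plan, Cost)], Query, Cost) :-
      !,
      atomic_list_concat([Plan, 'sort rdup '], '', Query).
    combinePlans(Merge, [plan(Plan1, Cost1) | Subs], Query, Cost) :-
      combinePlans(Merge, Subs, Plan2, Cost2),
      % postfix notation: both sorted, duplicate-free inputs, then Merge
      atomic_list_concat([Plan1, 'sort rdup ', Plan2, Merge, ' '], '', Query),
      Cost is Cost1 + Cost2.   % only the subquery costs are summed
----

Consider three subqueries with plans ~'a feed '~, ~'b feed '~ and ~'c feed '~
and merge operator ~mergeunion~. The sketch yields
~'a feed sort rdup b feed sort rdup c feed sort rdup mergeunion mergeunion '~,
to which ~mOptimize~ would append ~consume~.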

*/

:-op(800, fx, union).
:-op(800, fx, intersection).

mOptimize(union Terms, Query, Cost) :-
  mStreamOptimize(union Terms, Plan, Cost),
  concat_atom([Plan, 'consume'], '', Query).

mOptimize(intersection Terms, Query, Cost) :-
  mStreamOptimize(intersection Terms, Plan, Cost),
  concat_atom([Plan, 'consume'], '', Query).

mOptimize(Term, Query, Cost) :-
  optimize(Term, Query, Cost).


mStreamOptimize(union [Term], Query, Cost) :-
  streamOptimize(Term, QueryPart, Cost),
  concat_atom([QueryPart, 'sort rdup '], '', Query).

mStreamOptimize(union [Term | Terms], Query, Cost) :-
  streamOptimize(Term, Plan1, Cost1),
  mStreamOptimize(union Terms, Plan2, Cost2),
  concat_atom([Plan1, 'sort rdup ', Plan2, 'mergeunion '], '', Query),
  Cost is Cost1 + Cost2.

mStreamOptimize(intersection [Term], Query, Cost) :-
  streamOptimize(Term, QueryPart, Cost),
  concat_atom([QueryPart, 'sort rdup '], '', Query).

mStreamOptimize(intersection [Term | Terms], Query, Cost) :-
  streamOptimize(Term, Plan1, Cost1),
  mStreamOptimize(intersection Terms, Plan2, Cost2),
  concat_atom([Plan1, 'sort rdup ', Plan2, 'mergesec '], '', Query),
  Cost is Cost1 + Cost2.

mStreamOptimize(Term, Query, Cost) :-
  streamOptimize(Term, Query, Cost).


/*
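Auxiliary predicates: ~bestPlanCount~ and ~bestPlanConsume~ take the plan
stored as ~bestPlan/2~ and append ~count~ or ~consume~, respectively. They then
print the resulting Secondo query and execute it.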

*/

bestPlanCount :-
  bestPlan(P, _),
  plan_to_atom_string(P, S),
  atom_concat(S, ' count', Q),
  nl, write(Q), nl,
  query(Q).

bestPlanConsume :-
  bestPlan(P, _),
  plan_to_atom_string(P, S),
  atom_concat(S, ' consume', Q),
  nl, write(Q), nl,
  query(Q).


%fapra 15/16

/*
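  Rename an attribute to match the renaming of its relation. If the relation
  variable ~Var~ is ~\*~ (no renaming), the attribute is returned unchanged.
  Otherwise its name ~Name~ is qualified as ~Var:Name~.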

*/

% No renaming needed.
renamedRelAttr(RelAttr, Var, RelAttr) :-
  Var = *, !.

renamedRelAttr(attr(Name, N, C), Var, attr(Var:Name, N, C)).


% Extract the lowercase attribute name from an attr term, dropping a Var:
% prefix if present.
attrnameDCAtom(Attr, DCAttrName) :-
  Attr = attr(_:Name, _, _),
  !,
  atom_string(AName, Name),
  downcase_atom(AName, DCAttrName).

attrnameDCAtom(Attr, DCAttrName) :-
  Attr = attr(Name, _, _),
  atom_string(AName, Name),
  downcase_atom(AName, DCAttrName).


/*
  Rename a tuple stream with the variable ~Var~. If ~Var~ is ~\*~, the stream
  is returned unchanged.

*/

% No renaming needed.
renameStream(Stream, Var, Plan) :-
  Var = *,
  !,
  Plan = Stream.

renameStream(Stream, Var, rename(Stream, Var)).

/*
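  Transform a relation into a tuple stream and rename it with ~Var~. If ~Var~
  is ~\*~, the stream is not renamed. The two-argument version takes a ~rel/3~
  term and passes its first two arguments on as relation and variable.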

*/

% No renaming needed.
feedRenameRelation(Rel, Var, Plan) :-
  Var = *,
  !,
  Plan = feed(Rel).

feedRenameRelation(Rel, Var, Plan) :-
  Plan = rename(feed(Rel), Var).

feedRenameRelation(rel(Rel, Var, _), Plan) :-
  feedRenameRelation(Rel, Var, Plan),
  !.
%end fapra 15/16