/*
----
This file is part of SECONDO.

Copyright (C) 2004, University in Hagen, Department of Computer Science,
Database Systems for New Applications.

SECONDO is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

SECONDO is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with SECONDO; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
----

//paragraph [10] title: [{\Large \bf ]  [}]
//characters [1] formula:       [$]     [$]
//[ae] [\"{a}]
//[oe] [\"{o}]
//[ue] [\"{u}]
//[ss] [{\ss}]
//[Ae] [\"{A}]
//[Oe] [\"{O}]
//[Ue] [\"{U}]
//[**] [$**$]
//[toc] [\tableofcontents]
//[=>] [\verb+=>+]
//[:Section Translation] [\label{sec:translation}]
//[Section Translation] [Section~\ref{sec:translation}]
//[:Section 4.1.1] [\label{sec:4.1.1}]
//[Section 4.1.1] [Section~\ref{sec:4.1.1}]
//[Figure pog1] [Figure~\ref{fig:pog1.eps}]
//[Figure pog2] [Figure~\ref{fig:pog2.eps}]
//[newpage] [\newpage]

[10] A Query Optimizer for Secondo

Ralf Hartmut G[ue]ting, November - December 2002

[toc]

[newpage]

1 Introduction

1.1 Overview

This document not only describes, but ~is~ an optimizer for Secondo database
systems.  It contains the current source code for the optimizer, written in
PROLOG. It can be compiled by a PROLOG system (SWI-Prolog 5.0 or higher)
directly.

The current version of the optimizer is capable of handling conjunctive queries,
formulated in a relational environment. That is, it takes a set of
relations together with a set of selection or join predicates over these
relations and produces a query plan that can be executed by (the current
relational system implemented in) Secondo.

The selection of the query plan is based on cost estimates which in turn are
based on given selectivities of predicates. Selectivities of predicates are
maintained in a table (a set of PROLOG facts). If the selectivity of a predicate
is not available from that table, then an interaction with the Secondo system
should take place to determine the selectivity. There are various strategies
conceivable for doing this which will be described elsewhere. However, the
current version of the optimizer just emits a message that the selectivity is
missing and quits.

The optimizer also implements a simple SQL-like language for entering queries.
The notation is pretty much like SQL except that the lists occurring (lists of
attributes, relations, predicates) are written in PROLOG notation. Also note
that the where-clause is a list of predicates rather than an arbitrary boolean
expression and hence allows one to formulate conjunctive queries only.


1.2 Optimization Algorithm

The optimizer employs an optimization algorithm which is, as far as we know,
novel. It is based on ~shortest path search in a predicate order graph~. This
technique is remarkably simple to implement, yet efficient.

A predicate order graph (POG) is the graph whose nodes represent sets of
evaluated predicates and whose edges represent predicates, containing all
possible orders of predicates. Such a graph for three predicates ~p~, ~q~, and
~r~ is shown in [Figure pog1].

                Figure 1: A predicate order graph for three predicates ~p~, ~q~
and ~r~  [pog1.eps]

Here the bottom node has no predicate evaluated and the top node has all
predicates evaluated. The example illustrates, more precisely, possible
sequences of selections on an argument relation of size 1000. If selectivities
of predicates are given (for ~p~ it is 1/2, for ~q~ 1/10, and for ~r~ 1/5),
then we can annotate the POG with sizes of intermediate results as shown,
assuming that all predicates are independent (not ~correlated~). This means that
the selectivity of a predicate is the same regardless of the order of
evaluation, which of course does not need to be true.
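
As a small worked illustration of this size annotation, the following sketch
multiplies the base size by the selectivities of the evaluated predicates. The
predicates ~sel~ and ~node\_size~ are hypothetical names used only in this
example; they are not part of the optimizer.

----    % Selectivities from the example, stored as Numerator/Denominator.
        sel(p, 1/2).
        sel(q, 1/10).
        sel(r, 1/5).

        % node_size(+Preds, +BaseSize, -Size): size after evaluating Preds,
        % assuming independent predicates.
        node_size([], Size, Size).
        node_size([P | Ps], Size0, Size) :-
          sel(P, N/D),
          Size1 is Size0 * N // D,
          node_size(Ps, Size1, Size).

        % ?- node_size([p, q, r], 1000, S).     yields S = 10
        % ?- node_size([q, r], 1000, S).        yields S = 20
----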

If we can further compute for each edge of the POG possible evaluation
methods, adding a new ``executable'' edge for each method, and mark the
edge with estimated costs for this method, then finding a shortest path through
the POG corresponds to finding the cheapest query plan. [Figure pog2] shows an
example of a POG annotated with evaluation methods.

                Figure 2: A POG annotated with evaluation methods [pog2.eps]

In this example, there is only a single method associated with each edge. In
general, however, there will be several methods. The example represents the
query:

----    select *
        from Staedte, Laender, Regiert
        where Land = LName and PName = 'CDU' and LName = PLand
----

for relation schemas

----    Staedte(SName, Bev, Land)
        Laender(LName, LBev)
        Regiert(PName, PLand)
----

Hence the optimization algorithm described and implemented in the following
sections proceeds in the following steps:

  1 For given relations and predicates, construct the predicate order graph and
store it as a set of facts in memory (Sections 2 through 4).

  2 For each edge, construct corresponding executable edges (called ~plan edges~
below). This is controlled by optimization rules describing how selections or
joins can be translated (Sections 5 and 6).

  3 Based on sizes of arguments and selectivities (stored in the file
~database.pl~) compute the sizes of all intermediate results. Also annotate
edges of the POG with selectivities (Section 7).

  4 For each plan edge, compute its cost and store it in memory (as a set of
facts). This is based on sizes of arguments and the selectivity associated with
the edge and on a cost function (predicate) written for each operator that may
occur in a query plan (Section 8).

  5 Dijkstra's algorithm for finding shortest paths is employed to find a
shortest path through the graph of plan edges annotated with costs (called ~cost
edges~). This path is transformed into a Secondo query plan and returned
(Section 9).

  6 Finally, a simple subset of SQL in a PROLOG notation is implemented. So it
is possible to enter queries in this language. The optimizer determines from it
the lists of relations and predicates in the form needed for constructing the
POG, and then invokes step 1 (Section 11).
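
The idea behind steps 4 and 5 can be sketched on the selection example given
above (relation size 1000, selectivities 1/2, 1/10 and 1/5). The sketch below
is not the implementation used in this file: it enumerates all paths instead of
running Dijkstra's algorithm, and it assumes a hypothetical cost model in which
evaluating a selection costs as much as the size of its input. The names
~cost\_edge~, ~path~ and ~cheapest\_path~ are introduced for this illustration
only.

----    % cost_edge(Source, Target, Pred, Cost); predicate numbers are
        % p = 1, q = 2, r = 4, and Cost is the size of the source node.
        cost_edge(0, 1, p, 1000).  cost_edge(0, 2, q, 1000).
        cost_edge(0, 4, r, 1000).  cost_edge(1, 3, q, 500).
        cost_edge(1, 5, r, 500).   cost_edge(2, 3, p, 100).
        cost_edge(2, 6, r, 100).   cost_edge(4, 5, p, 200).
        cost_edge(4, 6, q, 200).   cost_edge(3, 7, r, 50).
        cost_edge(5, 7, q, 100).   cost_edge(6, 7, p, 20).

        % path(+From, +To, -Preds, -Cost): any path through the acyclic graph.
        path(Node, Node, [], 0).
        path(From, To, [P | Ps], Cost) :-
          cost_edge(From, Mid, P, C1),
          path(Mid, To, Ps, C2),
          Cost is C1 + C2.

        % cheapest_path(+From, +To, -Preds, -Cost): minimum over all paths.
        cheapest_path(From, To, Preds, Cost) :-
          findall(C-Ps, path(From, To, Ps, C), All),
          keysort(All, [Cost-Preds | _]).

        % ?- cheapest_path(0, 7, Preds, Cost).
        % Preds = [q, r, p], Cost = 1120
----

Under this cost model the most selective predicate is evaluated first, as one
would expect. Exhaustive enumeration is only feasible for very small graphs,
which is why a shortest path algorithm is used in the optimizer.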


2 Data Structures

In the construction of the predicate order graph, the following data structures
are used.

----    pr(P, A)
        pr(P, B, C)
----

A selection or join predicate. For example, pr(p, a) denotes a selection
predicate p on relation a, and pr(q, b, c) denotes a join predicate q on
relations b and c.

----    arp(Arg, Rels, Preds)
----

An argument, relations, predicate triple. It describes a set of relations
~Rels~ on which the predicates ~Preds~ have been evaluated. To access the
result of this evaluation one needs to refer to ~Arg~.

~Arg~ is either arg(N) or res(N), where N is an integer. Examples: arg(5),
res(1).

~Rels~ is a list of relation names, e.g. [a, b, c].

~Preds~ is a list of predicate names, e.g. [p, q, r].


----    node(No, Preds, Partition)
----

A node.

~No~ is the number of the node, into which the evaluated predicates
are encoded: each bit corresponds to a predicate number. For example, node
number 5 = 101 (binary) says that the first predicate (no 1) and the third
predicate (no 4) have been evaluated in this node. For predicate i,
its predicate number is "2^{i-1}"[1].

~Preds~ is the list of names of evaluated predicates, e.g. [p, q].

~Partition~ is a list of arp elements, see above.
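
The encoding of node numbers can be illustrated by a small sketch. The
predicates ~node\_number~ and ~evaluated~ are hypothetical helpers, not used by
the optimizer itself.

----    % node_number(+Indices, -No): sum of 2^(I-1) over predicate indices I.
        node_number([], 0).
        node_number([I | Is], No) :-
          node_number(Is, No0),
          No is No0 + (1 << (I - 1)).

        % evaluated(+No, +I): predicate I has been evaluated in node No.
        evaluated(No, I) :-
          No /\ (1 << (I - 1)) =\= 0.

        % ?- node_number([1, 3], No).     yields No = 5
        % ?- evaluated(5, 3).             succeeds
        % ?- evaluated(5, 2).             fails
----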


----    edge(Source, Target, Term, Result, Node, PredNo)
----

An edge, representing a predicate.

~Source~ and ~Target~ are the numbers of source and target nodes in the
predicate order graph, e.g. 0 and 1.

~Term~ is either a selection or a join, for example,
select(arg(0), pr(p, a)) or join(res(4), res(1), pr(q, a, b)).

~Result~ is the number of the node into which the result of this predicate
application should be written. Normally it is the same as Target,
but for an edge leading to a node combining several independent results,
it is the number of the ``real'' node to obtain this result. An example of this can
be found in [Figure pog2] where the join edge leading from node 3 to node 7 does
not use the result of node 3 (there is none) but rather the two independent
results from nodes 1 and 2 (this pair is conceptually the result available in
node 3).

~Node~ is the source node for this edge, in the form node(...) as
described above.

~PredNo~ is the predicate number for the predicate represented by this
edge. Predicate numbers are of the form "2^{i-1}"[1] as explained
for nodes.

3 Construction of the Predicate Order Graph

3.1 pog

----    pog(Rels, Preds, Nodes, Edges) :-
----

For a given list of relations ~Rels~ and predicates ~Preds~, ~Nodes~ and
~Edges~ are the predicate order graph where edges are annotated with selection
and join operations applied to the correct arguments.

Example call:

----    pog([staedte, laender], [pr(p, staedte), pr(q, laender), pr(r, staedte,
        laender)], N, E).
----

*/

usingVersion(entropy).

pog(Rels, Preds, Nodes, Edges) :-

  length(Rels, N), reverse(Rels, Rels2), deleteArguments,
  partition(Rels2, N, Partition0),
  length(Preds, M), reverse(Preds, Preds2),
  pog2(Partition0, M, Preds2, Nodes, Edges),
  deleteNodes, storeNodes(Nodes),
  deleteEdges, storeEdges(Edges),
  deletePlanEdges, deleteVariables, deleteCounters, createPlanEdges,
  HighNode is 2**M -1,
  retract(highNode(_)), assert(highNode(HighNode)),
  deleteSizes,
  deleteCostEdges.

/*

3.2 partition

----    partition(Rels, N, Partition0) :-
----

Given a list of ~N~ relations ~Rels~, return an initial partition such that
each relation r is packed into the form arp(arg(i), [r], []).
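
The numbering of arguments can be seen in the following side-effect free
sketch; ~partition\_pure~ is a hypothetical name, and unlike ~partition~ it does
not assert ~argument~ facts.

----    partition_pure([], _, []).
        partition_pure([Rel | Rels], N, [arp(arg(N), [Rel], []) | Arps]) :-
          N1 is N - 1,
          partition_pure(Rels, N1, Arps).

        % ?- partition_pure([b, a], 2, P).
        % P = [arp(arg(2), [b], []), arp(arg(1), [a], [])]
----

Since ~pog~ reverses the list of relations before calling ~partition~, the
first relation of the original list becomes arg(1).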

*/

partition([], _, []).

partition([Rel | Rels], N, [Arp | Arps]) :-
  N1 is N-1,
  Arp = arp(arg(N), [Rel], []),
  assert(argument(N, Rel)),
  partition(Rels, N1, Arps).


/*

3.3 pog2

----    pog2(Partition0, NoOfPreds, Preds, Nodes, Edges) :-
----

For the given start partition ~Partition0~, a list of predicates ~Preds~
containing ~NoOfPreds~ predicates, return the ~Nodes~ and ~Edges~ of the
predicate order graph.
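
The recursion doubles the graph with each predicate: the nodes built for the
other predicates are kept, and a copy of each of them is added in which the new
predicate has been evaluated as well. Restricted to node numbers, this can be
sketched as follows (~node\_numbers~ and ~add\_to\_all~ are hypothetical names
used only here).

----    % node_numbers(+M, -Nos): node numbers of a POG for M predicates.
        node_numbers(0, [0]).
        node_numbers(M, Nos) :-
          M > 0,
          M1 is M - 1,
          PredNo is 1 << M1,
          node_numbers(M1, Old),
          add_to_all(Old, PredNo, New),
          append(Old, New, Nos).

        add_to_all([], _, []).
        add_to_all([No | Nos], PredNo, [No2 | Nos2]) :-
          No2 is No + PredNo,
          add_to_all(Nos, PredNo, Nos2).

        % ?- node_numbers(3, Nos).
        % Nos = [0, 1, 2, 3, 4, 5, 6, 7]
----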

*/

pog2(Part0, _, [], [node(0, [], Part0)], []).

pog2(Part0, NoOfPreds, [Pred | Preds], Nodes, Edges) :-
  N1 is NoOfPreds-1,
  PredNo is 2**N1,
  pog2(Part0, N1, Preds, NodesOld, EdgesOld),
  newNodes(Pred, PredNo, NodesOld, NodesNew),
  newEdges(Pred, PredNo, NodesOld, EdgesNew),
  copyEdges(Pred, PredNo, EdgesOld, EdgesCopy),
  append(NodesOld, NodesNew, Nodes),
  append(EdgesOld, EdgesNew, Edges2),
  append(Edges2, EdgesCopy, Edges).

/*
3.4 newNodes

----    newNodes(Pred, PredNo, NodesOld, NodesNew) :-
----

Given a predicate ~Pred~ with number ~PredNo~ and a list of nodes ~NodesOld~
resulting from evaluating all predicates with lower numbers, construct
a list of nodes which result from applying to each of the existing nodes
the predicate ~Pred~.

*/

newNodes(_, _, [], []).

newNodes(Pred, PNo, [Node | Nodes], [NodeNew | NodesNew]) :-
  newNode(Pred, PNo, Node, NodeNew),
  newNodes(Pred, PNo, Nodes, NodesNew).

newNode(Pred, PNo, node(No, Preds, Part), node(No2, [Pred | Preds], Part2)) :-
  No2 is No + PNo,
  copyPart(Pred, PNo, Part, Part2).

/*
3.5 copyPart

----    copyPart(Pred, PNo, Part, Part2) :-
----

Copy the partition ~Part~ of a node so that the new partition ~Part2~
results after applying the predicate ~Pred~ with number ~PNo~.

This means that for a selection predicate we have to find the arp
containing its relation and modify it accordingly, the other arps
in the partition are copied unchanged.

For a join predicate we have to find the two arps containing its
two relations and to merge them into a single arp; the remaining
arps are copied unchanged.

Alternatively, a join predicate may find its two relations in the same arp,
which means that another join on the same two relations has already been
performed. In this case the arp is modified as for a selection.
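
The join case, in which two arps are merged, can be sketched in isolation as
follows. ~merge\_arps~ and ~node\_no~ are hypothetical names; the sketch mirrors
the computation done by ~copyPart~ but leaves out the search for the two arps.

----    node_no(arg(_), 0).
        node_no(res(N), N).

        % merge_arps(+ArpX, +ArpY, +P, +PNo, -Arp)
        merge_arps(arp(ArgX, RelsX, PredsX), arp(ArgY, RelsY, PredsY),
                   P, PNo, arp(res(ResNo), Rels, [P | Preds])) :-
          node_no(ArgX, NoX),
          node_no(ArgY, NoY),
          ResNo is NoX + NoY + PNo,
          append(RelsX, RelsY, Rels),
          append(PredsX, PredsY, Preds).

        % ?- merge_arps(arp(arg(1), [a], []), arp(res(1), [b], [p]),
        %               q, 4, Arp).
        % Arp = arp(res(5), [a, b], [q, p])
----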

*/

copyPart(_, _, [], []).

copyPart(pr(P, Rel), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Arps2),
  X = arp(Arg, Rels, Preds),
  member(Rel, Rels), !,
  nodeNo(Arg, No),
  ResNo is No + PNo,
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Arps2),
  X = arp(Arg, Rels, Preds),
  member(R1, Rels),
  member(R2, Rels), !,
  nodeNo(Arg, No),
  ResNo is No + PNo,
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

copyPart(pr(P, R1, R2), PNo, Arps, [Arp2 | Arps2]) :-
  select(X, Arps, Rest),
  X = arp(ArgX, RelsX, PredsX),
  member(R1, RelsX),
  select(Y, Rest, Arps2),
  Y = arp(ArgY, RelsY, PredsY),
  member(R2, RelsY), !,
  nodeNo(ArgX, NoX),
  nodeNo(ArgY, NoY),
  ResNo is NoX + NoY + PNo,
  append(RelsX, RelsY, Rels),
  append(PredsX, PredsY, Preds),
  Arp2 = arp(res(ResNo), Rels, [P | Preds]).

nodeNo(arg(_), 0).
nodeNo(res(N), N).

/*
3.6 newEdges

----    newEdges(Pred, PredNo, NodesOld, EdgesNew) :-
----

For each of the nodes in ~NodesOld~, return a new edge in ~EdgesNew~
built by applying the predicate ~Pred~ with number ~PredNo~.

*/

newEdges(_, _, [], []).

newEdges(Pred, PNo, [Node | Nodes], [Edge | Edges]) :-
  newEdge(Pred, PNo, Node, Edge),
  newEdges(Pred, PNo, Nodes, Edges).

newEdge(pr(P, Rel), PNo, Node, Edge) :-
  findRel(Rel, Node, Source, Arg),
  Target is Source + PNo,
  nodeNo(Arg, ArgNo),
  Result is ArgNo + PNo,
  Edge = edge(Source, Target, select(Arg, pr(P, Rel)), Result, Node, PNo).

newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
  findRels(R1, R2, Node, Source, Arg),
  Target is Source + PNo,
  nodeNo(Arg, ArgNo),
  Result is ArgNo + PNo,
  Edge = edge(Source, Target, select(Arg, pr(P, R1, R2)), Result, Node, PNo).

newEdge(pr(P, R1, R2), PNo, Node, Edge) :-
  findRels(R1, R2, Node, Source, Arg1, Arg2),
  Target is Source + PNo,
  nodeNo(Arg1, Arg1No),
  nodeNo(Arg2, Arg2No),
  Result is Arg1No + Arg2No + PNo,
  Edge = edge(Source, Target, join(Arg1, Arg2, pr(P, R1, R2)), Result,
    Node, PNo).


/*
3.7 findRel

----    findRel(Rel, Node, Source, Arg):-
----

Find the relation ~Rel~ within a node description ~Node~ and return the
node number ~Source~ and the description ~Arg~ of the argument (e.g. res(3))
found within the arp containing ~Rel~.

----    findRels(Rel1, Rel2, Node, Source, Arg1, Arg2):-
----

Similar for two relations.

*/

findRel(Rel, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel, RelsX).


findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX) :-
  select(X, Arps, _),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX),
  member(Rel2, RelsX).

findRels(Rel1, Rel2, node(No, _, Arps), No, ArgX, ArgY) :-
  select(X, Arps, Rest),
  X = arp(ArgX, RelsX, _),
  member(Rel1, RelsX), !,
  select(Y, Rest, _),
  Y = arp(ArgY, RelsY, _),
  member(Rel2, RelsY).


/*
3.8 copyEdges

----    copyEdges(Pred, PredNo, EdgesOld, EdgesCopy):-
----

Given a set of edges ~EdgesOld~ and a predicate ~Pred~ with number ~PredNo~,
return a copy of each edge in ~EdgesOld~ in ~EdgesCopy~ such that the
copied version reflects a previous application of predicate ~Pred~.

This is implemented by retrieving from each old edge its start node,
constructing for this start node and predicate ~Pred~ a target node to
which then the predicate associated with the old edge is applied.

*/

copyEdges(_, _, [], []).

copyEdges(Pred, PNo, [Edge | Edges], [Edge2 | Edges2]) :-
  Edge = edge(_, _, Term, _, Node, PNo2),
  pred(Term, Pred2),
  newNode(Pred, PNo, Node, NodeNew),
  newEdge(Pred2, PNo2, NodeNew, Edge2),
  copyEdges(Pred, PNo, Edges, Edges2).

pred(select(_, P), P).
pred(join(_, _, P), P).

/*
3.9 writeEdgeList

----    writeEdgeList(List):-
----

Write the list of edges ~List~.

*/

% Base case: without it the predicate fails after the last edge is written.
writeEdgeList([]).

writeEdgeList([edge(Source, Target, Term, _, _, _) | Edges]) :-
  write(Source), write('-'), write(Target), write(':'), write(Term), nl,
  writeEdgeList(Edges).

/*
4 Managing the Graph in Memory

4.1 Storing and Deleting Nodes and Edges

----    storeNodes(NodeList).
        storeEdges(EdgeList).
        deleteNodes.
        deleteEdges.
----

Just as the names say. Store a list of nodes or edges, respectively, as facts;
and delete them from memory again.
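
The deletion predicates below use a failure-driven loop: ~retract~ followed by
~fail~ forces backtracking over all matching facts, and the negated call then
succeeds once no fact is left. A minimal sketch of the idiom, using a
hypothetical dynamic predicate ~demo\_fact~:

----    :- dynamic demo_fact/1.

        delete_demo_fact :- retract(demo_fact(_)), fail.
        delete_demo_facts :- \+ delete_demo_fact.

        % ?- assert(demo_fact(1)), assert(demo_fact(2)),
        %    delete_demo_facts, \+ demo_fact(_).
        % succeeds
----

In SWI-Prolog the built-in ~retractall~ has the same effect.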

*/

storeNodes([Node | Nodes]) :- assert(Node), storeNodes(Nodes).
storeNodes([]).

storeEdges([Edge | Edges]) :- assert(Edge), storeEdges(Edges).
storeEdges([]).

deleteNode :- retract(node(_, _, _)), fail.
deleteNodes :- not(deleteNode).

deleteEdge :- retract(edge(_, _, _, _, _, _)), fail.
deleteEdges :- not(deleteEdge).

deleteArgument :- retract(argument(_, _)), fail.
deleteArguments :- not(deleteArgument).


/*
4.2 Writing Nodes and Edges

----    writeNodes.
        writeEdges.
----

Write the currently stored nodes and edges, respectively.

*/
writeNode :-
  node(No, Preds, Partition),
  write('Node: '), write(No), nl,
  write('Preds: '), write(Preds), nl,
  write('Partition: '), write(Partition), nl, nl,
  fail.
writeNodes :- not(writeNode).

writeEdge :-
  edge(Source, Target, Term, Result, _, _),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Term: '), write(Term), nl,
  write('Result: '), write(Result), nl, nl,
  fail.

writeEdges :- not(writeEdge).

/*
5 Rule-Based Translation of Selections and Joins
[:Section Translation]

5.1 Precise Notation for Input

Since now we have to look into the structure of predicates, and need to be
able to generate Secondo executable expressions in their precise format, we
need to define the input notation precisely.

5.1.1 The Source Language
[:Section 4.1.1]

We assume the queries can be entered basically as select-from-where
structures, as follows. Let schemas be given as:

----    plz(PLZ:string, Ort:string)
        Staedte(SName:string, Bev:int, PLZ:int, Vorwahl:string, Kennzeichen:string)
----

Then we should be able to enter queries:

----    select SName, Bev
        from Staedte
        where Bev > 500000
----

In the next example we need to avoid the name conflict for PLZ:

----    select *
        from Staedte as s, plz as p
        where s.SName = p.Ort and p.PLZ > 40000
----

In the PROLOG version, we will use the following notations:

----    rel(Name, Var, Case)
----

For example

----    rel(staedte, *, u)
----

is a term denoting the ~Staedte~ relation; ~u~ says that it is actually to be
written in upper case whereas

----    rel(plz, *, l)
----

denotes the ~plz~ relation to be written in lower case. The second argument
~Var~ contains an explicit variable if it has been assigned, otherwise the
symbol [*]. If an explicit variable has been used in the query, we need to
perform renaming in the plan. For example, in the second query above, the
relations would be denoted as

----    rel(staedte, s, u)
        rel(plz, p, l)
----

Within predicates, attributes are annotated as follows:

----    attr(Name, Arg, Case)

        attr(ort, 2, u)
----

This says that ~ort~ is an attribute of the second argument within a join
condition, to be written in upper case. For a selection condition, the second
argument is ignored; it can be set to 0 or 1.

Hence for the two queries above, the translation would be

----    fromwhere(
          [rel(staedte, *, u)],
          [pr(attr(bev, 0, u) > 500000, rel(staedte, *, u))]
        )

        fromwhere(
          [rel(staedte, s, u), rel(plz, p, l)],
          [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u),
                rel(staedte, s, u), rel(plz, p, l)),
           pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l))]
        )
----

Note that the upper or lower case distinction refers only to the first letter
of a relation or attribute name. Other letters are written on the PROLOG side
in the same way as in Secondo.

Note further that if explicit variables are used, the attribute name will
include them, e.g. s:sName.
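
How such attribute terms are later rendered as Secondo attribute names can be
sketched as follows. This is a simplified, self-contained variant of the
corresponding ~plan\_to\_atom~ rules; the names ~secondo\_attr\_name~ and
~cased~ are hypothetical, and the SWI-Prolog built-ins ~sub\_atom~,
~upcase\_atom~ and ~atomic\_list\_concat~ are assumed to be available.

----    % secondo_attr_name(+Attr, -Atom)
        secondo_attr_name(attr(Var:Name, _, Case), Atom) :-
          !,
          cased(Case, Name, Name2),
          atomic_list_concat([Name2, '_', Var], Atom).
        secondo_attr_name(attr(Name, _, Case), Atom) :-
          cased(Case, Name, Atom).

        % cased(+Case, +Name, -Name2): u capitalizes the first letter only.
        cased(l, Name, Name).
        cased(u, Name, Upper) :-
          sub_atom(Name, 0, 1, _, First),
          sub_atom(Name, 1, _, 0, Rest),
          upcase_atom(First, First2),
          atom_concat(First2, Rest, Upper).

        % ?- secondo_attr_name(attr(s:sName, 1, u), A).   A = 'SName_s'
        % ?- secondo_attr_name(attr(bev, 0, u), A).       A = 'Bev'
----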

The projection occurring in the select-from-where statement is for the moment
not passed to the optimizer; it is treated outside.

So example 2 is rewritten as:

*/

example3 :- pog([rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
        rel(staedte, s, u), rel(plz, p, l) ),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l)),
   pr((attr(p:pLZ, 1, u) mod 5) = 0, rel(plz, p, l))], _, _).

/*

The two queries mentioned above are:

*/

example4 :- pog(
  [rel(staedte, *, u)],
  [pr(attr(bev, 1, u) > 500000, rel(staedte, *, u))],
  _, _).

example5 :- pog(
  [rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(s:sName, 1, u) = attr(p:ort, 2, u), rel(staedte, s, u), rel(plz, p,
l)),
   pr(attr(p:pLZ, 1, u) > 40000, rel(plz, p, l))],
  _, _).

/*

5.1.2 The Target Language

In the target language, we use the following operators:

----    feed:           rel(Tuple) -> stream(Tuple)
        consume:        stream(Tuple) -> rel(Tuple)

        filter:         stream(Tuple) x (Tuple -> bool) -> stream(Tuple)
        product:        stream(Tuple1) x stream(Tuple2) -> stream(Tuple3)

                                where Tuple3 = Tuple1 o Tuple2

        hashjoin:       stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
                                x nbuckets -> stream(Tuple3)

                                where   Tuple3 = Tuple1 o Tuple2
                                        attrname1 occurs in Tuple1
                                        attrname2 occurs in Tuple2
                                        nbuckets is the number of hash buckets
                                                to be used

        sortmergejoin:  stream(Tuple1) x stream(Tuple2) x attrname1 x attrname2
                                -> stream(Tuple3)

                                where   Tuple3 = Tuple1 o Tuple2
                                        attrname1 occurs in Tuple1
                                        attrname2 occurs in Tuple2

        loopjoin:       stream(Tuple1) x (Tuple1 -> stream(Tuple2))
                                -> stream(Tuple3)

                                where   Tuple3 = Tuple1 o Tuple2

        exactmatch:     btree(Tuple, AttrType) x rel(Tuple) x AttrType
                                -> stream(Tuple)

        extend:         stream(Tuple1) x (Newname x (Tuple -> Attrtype))+
                                -> stream(Tuple2)

                                where   Tuple2 is Tuple1 to which pairs
                                        (Newname, Attrtype) have been appended

        remove:         stream(Tuple1) x Attrname+ -> stream(Tuple2)

                                where   Tuple2 is Tuple1 from which the mentioned
                                        attributes have been removed.

        project:        stream(Tuple1) x Attrname+ -> stream(Tuple2)

                                where   Tuple2 is Tuple1 projected on the
                                        mentioned attributes.

        rename:         stream(Tuple1) x NewName -> stream(Tuple2)

                                where   Tuple2 is Tuple1 modified by appending
                                        "_newname" to each attribute name

        count:          stream(Tuple) -> int

                                count the number of tuples in a stream

        sortby:         stream(Tuple) x (Attrname, asc/desc)+   -> stream(Tuple)

                                sort stream lexicographically by the given
                                attribute names

        groupby:        stream(Tuple) x GroupAttrs x NewFields -> stream(Tuple2)

                                group stream by the grouping attributes; for each group
                                compute new fields each of which is specified in the
                                form Attrname : Expr. The argument stream must already
                                be sorted by the grouping attributes.
----

In PROLOG, all expressions involving such operators are written in prefix
notation.

Parameter functions are written as

----    fun([param(Var1, Type1), ..., param(VarN, TypeN)], Expr)
----


5.1.3 Converting Plans to Atoms and Writing Them

Predicate ~plan\_to\_atom~ converts a plan to a string atom, which represents
the plan as a SECONDO query in text syntax. For attributes we have to
distinguish whether a leading ``.'' needs to be written (if the attribute occurs
within a parameter function) or whether just the attribute name is needed as in
the arguments for hashjoin, for example. Predicate ~wp~ (``write plan'') uses
predicate ~plan\_to\_atom~ to convert its argument to an atom and then writes that
atom to standard output.
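
The postfix text syntax produced for a plan can be illustrated with a much
reduced translator. This sketch covers only three operators and comparisons of
the form Attr > Value; ~to\_text~ is a hypothetical name, relation and
attribute names are given directly as quoted atoms, and the real
~plan\_to\_atom~ below emits slightly different spacing.

----    to_text(rel(Name), Name).
        to_text(feed(X), T) :-
          to_text(X, TX),
          atomic_list_concat([TX, ' feed'], T).
        to_text(consume(X), T) :-
          to_text(X, TX),
          atomic_list_concat([TX, ' consume'], T).
        to_text(filter(X, Attr > Value), T) :-
          to_text(X, TX),
          atomic_list_concat([TX, ' filter[.', Attr, ' > ', Value, ']'], T).

        % ?- to_text(consume(filter(feed(rel('Staedte')), 'Bev' > 500000)), T).
        % T = 'Staedte feed filter[.Bev > 500000] consume'
----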

*/

upper(Lower, Upper) :-
  atom_chars(Lower, [First | Rest]),
  char_type(First2, to_upper(First)),
  append([First2], Rest, UpperList),
  atom_chars(Upper, UpperList).
  /*atom_codes(Lower, [First | Rest]),
  to_upper(First, First2),
  UpperList = [First2 | Rest],
  atom_codes(Upper, UpperList).*/

wp(Plan) :-
  plan_to_atom(Plan, PlanAtom),
  write(PlanAtom).

/*

Predicate ~newVariable~ returns a new unique variable name.
The variable name is unique in the sense that ~newVariable~ never
outputs the same name twice (in a PROLOG session).
It should be emphasized that the output
is not a PROLOG variable but a variable name to be used for defining
abstractions in the Secondo system.

*/

:-
  dynamic(varDefined/1).

newVariable(Var) :-
  varDefined(N),
  !,
  N1 is N + 1,
  retract(varDefined(N)),
  assert(varDefined(N1)),
  atom_concat('var', N1, Var).

newVariable(Var) :-
  assert(varDefined(1)),
  Var = 'var1'.

deleteVariable :- retract(varDefined(_)), fail.

deleteVariables :- not(deleteVariable).


/*
Arguments:

*/

plan_to_atom(counter(N,Term), Result) :-
  plan_to_atom( Term, TermRes ),
  my_concat_atom( [ TermRes, ' {', N,'} '], Result ),
  !.

plan_to_atom(rel(Name, _, l), Result) :-
  atom_concat(Name, ' ', Result),
  !.

plan_to_atom(rel(Name, _, u), Result) :-
  upper(Name, Name2),
  atom_concat(Name2, ' ', Result),
  !.

plan_to_atom(res(N), Result) :-
  atom_concat('res(', N, Res1),
  atom_concat(Res1, ') ', Result),
  !.


plan_to_atom(Term, Result) :-
    is_list(Term), Term = [First | _], atomic(First), !,
    atom_codes(TermRes, Term),
    my_concat_atom(['"', TermRes, '"'], '', Result).

/*
Lists:

*/


plan_to_atom([X], AtomX) :-
  plan_to_atom(X, AtomX),
  !.

plan_to_atom([X | Xs], Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Xs, XsAtom),
  my_concat_atom([XAtom, ', ', XsAtom], '', Result),
  !.


/*
Operators: only those with special syntax. For the general rules for standard
syntax, see below.

*/


plan_to_atom(sample(Rel, S, T), Result) :-
  plan_to_atom(Rel, ResRel),
  my_concat_atom([ResRel, 'sample[', S, ', ', T, '] '], '', Result),
  !.

plan_to_atom(hashjoin(X, Y, A, B, C), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  plan_to_atom(A, AAtom),
  plan_to_atom(B, BAtom),
  my_concat_atom([XAtom, YAtom, 'hashjoin[',
    AAtom, ', ', BAtom, ', ', C, '] '], '', Result),
  !.

plan_to_atom(sortmergejoin(X, Y, A, B), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  plan_to_atom(A, AAtom),
  plan_to_atom(B, BAtom),
  my_concat_atom([XAtom, YAtom, 'sortmergejoin[',
    AAtom, ', ', BAtom, '] '], '', Result),
  !.

plan_to_atom(groupby(Stream, GroupAttrs, Fields), Result) :-
  plan_to_atom(Stream, SAtom),
  plan_to_atom(GroupAttrs, GAtom),
  plan_to_atom(Fields, FAtom),
  my_concat_atom([SAtom, 'groupby[', GAtom, '; ', FAtom, ']'], '', Result),
  !.

plan_to_atom(extend(Stream, Fields), Result) :-
  plan_to_atom(Stream, SAtom),
  plan_to_atom(Fields, FAtom),
  my_concat_atom([SAtom, 'extend[', FAtom, ']'], '', Result),
  !.

plan_to_atom(field(NewAttr, Expr), Result) :-
  plan_to_atom(attrname(NewAttr), NAtom),
  plan_to_atom(Expr, EAtom),
  my_concat_atom([NAtom, ': ', EAtom], '', Result).


plan_to_atom(exactmatchfun(IndexName, Rel, attr(Name, R, Case)), Result) :-
  plan_to_atom(Rel, RelAtom),
  plan_to_atom(a(Name, R, Case), AttrAtom),
  newVariable(T),
  my_concat_atom(['fun(', T, ' : TUPLE) ', IndexName,
    ' ', RelAtom, 'exactmatch[attr(', T, ', ', AttrAtom, ')] '], Result),
  !.


plan_to_atom(newattr(Attr, Expr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  plan_to_atom(Expr, ExprAtom),
  my_concat_atom([AttrAtom, ': ', ExprAtom], '', Result),
  !.


plan_to_atom(rename(X, Y), Result) :-
  plan_to_atom(X, XAtom),
  my_concat_atom([XAtom, '{', Y, '} '], '', Result),
  !.


plan_to_atom(fun(Params, Expr), Result) :-
  params_to_atom(Params, ParamAtom),
  plan_to_atom(Expr, ExprAtom),
  my_concat_atom(['fun ', ParamAtom, ExprAtom], '', Result),
  !.


plan_to_atom(attribute(X, Y), Result) :-
  plan_to_atom(X, XAtom),
  plan_to_atom(Y, YAtom),
  my_concat_atom(['attr(', XAtom, ', ', YAtom, ')'], '', Result),
  !.


plan_to_atom(date(X), Result) :-
  plan_to_atom(X, XAtom),
  my_concat_atom(['[const instant value ', XAtom, ']'], '', Result),
  !.

plan_to_atom(interval(X, Y), Result) :-
  my_concat_atom(['[const duration value (', X, ' ', Y, ')]'], '', Result),
  !.


/*
Sort orders and attribute names.

*/

plan_to_atom(asc(Attr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  atom_concat(AttrAtom, ' asc', Result).

plan_to_atom(desc(Attr), Result) :-
  plan_to_atom(Attr, AttrAtom),
  atom_concat(AttrAtom, ' desc', Result).

plan_to_atom(attr(Name, Arg, Case), Result) :-
  plan_to_atom(a(Name, Arg, Case), ResA),
  atom_concat('.', ResA, Result).

plan_to_atom(attrname(attr(Name, Arg, Case)), Result) :-
  plan_to_atom(a(Name, Arg, Case), Result).

plan_to_atom(a(A:B, _, l), Result) :-
  my_concat_atom([B, '_', A], '', Result),
  !.

plan_to_atom(a(A:B, _, u), Result) :-
  upper(B, B2),
  my_concat_atom([B2, '_', A], Result),
  !.

plan_to_atom(a(X, _, l), X) :-
  !.

plan_to_atom(a(X, _, u), X2) :-
  upper(X, X2),
  !.


/*
Translation of operators driven by predicate ~secondoOp~ in
file ~opSyntax~. There are rules for

  * postfix, 1 or 2 arguments

  * postfix followed by one argument in square brackets, in total 2
or 3 arguments

  * prefix, 2 arguments

Other syntax, if not default (see below), needs to be coded explicitly.

*/

plan_to_atom(Term, Result) :-
  functor(Term, Op, 1),
  secondoOp(Op, postfix, 1),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  my_concat_atom([Res1, ' ', Op, ' '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, postfix, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  my_concat_atom([Res1, ' ', Res2, ' ', Op, ' '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, postfixbrackets, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  my_concat_atom([Res1, ' ', Op, '[', Res2, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  secondoOp(Op, postfixbrackets, 3),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg3, Res3),
  my_concat_atom([Res1, ' ', Res2, ' ', Op, '[', Res3, '] '], '', Result),
  !.

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  secondoOp(Op, prefix, 2),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg2, Res2),
  my_concat_atom([Op, '(', Res1, ',', Res2, ') '], '', Result),
  !.


/*
Generic rules. Operators that are not
recognized are assumed to be:

  * 1 argument: prefix

  * 2 arguments: infix

  * 3 arguments: prefix

*/

plan_to_atom(Term, Result) :-
  functor(Term, Op, 1),
  arg(1, Term, Arg1),
  plan_to_atom(Arg1, Res1),
  my_concat_atom([Op, '(', Res1, ')'], '', Result).

plan_to_atom(Term, Result) :-
  functor(Term, Op, 2),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  my_concat_atom(['(', Res1, ' ', Op, ' ', Res2, ')'], '', Result).

plan_to_atom(Term, Result) :-
  functor(Term, Op, 3),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  plan_to_atom(Arg1, Res1),
  plan_to_atom(Arg2, Res2),
  plan_to_atom(Arg3, Res3),
  my_concat_atom([Op, '(', Res1, ', ', Res2, ', ', Res3, ')'], '', Result).

plan_to_atom(X, Result) :-
  atomic(X),
  term_to_atom(X, Result),
  !.

plan_to_atom(X, _) :-
  write('Error while converting term: '),
  write(X),
  nl.
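/*
The generic rules can be illustrated in isolation. The following is a minimal,
self-contained sketch, not part of the optimizer: the predicate ~demoToAtom~ is
a hypothetical name, it covers only the three default cases and atomic terms,
and it assumes a recent SWI-Prolog that offers atomic\_list\_concat, which is
used here in place of the optimizer's own concatenation predicate.

----    demoToAtom(X, A) :- atomic(X), !, term_to_atom(X, A).
        demoToAtom(T, A) :-                     % 1 argument: prefix
          T =.. [Op, X], !,
          demoToAtom(X, XA),
          atomic_list_concat([Op, '(', XA, ')'], A).
        demoToAtom(T, A) :-                     % 2 arguments: infix
          T =.. [Op, X, Y], !,
          demoToAtom(X, XA), demoToAtom(Y, YA),
          atomic_list_concat(['(', XA, ' ', Op, ' ', YA, ')'], A).
        demoToAtom(T, A) :-                     % 3 arguments: prefix
          T =.. [Op, X, Y, Z],
          demoToAtom(X, XA), demoToAtom(Y, YA), demoToAtom(Z, ZA),
          atomic_list_concat([Op, '(', XA, ', ', YA, ', ', ZA, ')'], A).

        ?- demoToAtom(foo(a + 1), A).
        % A = 'foo((a + 1))'
----

*/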


params_to_atom([], ' ').

params_to_atom([param(Var, Type) | Params], Result) :-
  type_to_atom(Type, TypeAtom),
  params_to_atom(Params, ParamsAtom),
  my_concat_atom(['(', Var, ': ', TypeAtom, ') ', ParamsAtom], '', Result),
  !.

type_to_atom(tuple, 'TUPLE').
type_to_atom(tuple2, 'TUPLE2').
type_to_atom(group, 'GROUP').


/*

5.2 Optimization Rules

We introduce a predicate [=>] which can be read as ``translates into''.

5.2.1 Translation of the Arguments of an Edge of the POG

If the argument is of the form res(N), then it is a stream already and can be
used unchanged. If it is of the form arg(N), then it is a base relation; a
~feed~ must be applied and possibly a ~rename~.

*/

res(N) => res(N).

arg(N) => feed(rel(Name, *, Case)) :-
  argument(N, rel(Name, *, Case)), !.

arg(N) => rename(feed(rel(Name, Var, Case)), Var) :-
  argument(N, rel(Name, Var, Case)).
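/*
As a concrete illustration, consider a stand-alone sketch of these three rules.
It is a copy made for experimentation only: ~demoArgument~ and ~demoTranslate~
are hypothetical names standing in for the ~argument~ facts created by the
optimizer and for the [=>] predicate.

----    demoArgument(1, rel(staedte, *, u)).
        demoArgument(2, rel(plz, p1, l)).

        demoTranslate(res(N), res(N)).
        demoTranslate(arg(N), feed(rel(Name, *, Case))) :-
          demoArgument(N, rel(Name, *, Case)), !.
        demoTranslate(arg(N), rename(feed(rel(Name, Var, Case)), Var)) :-
          demoArgument(N, rel(Name, Var, Case)).

        ?- demoTranslate(arg(1), P).
        % P = feed(rel(staedte, *, u))
        ?- demoTranslate(arg(2), P).
        % P = rename(feed(rel(plz, p1, l)), p1)
----

An unrenamed relation is thus simply fed into a stream, whereas a relation
used under a variable name is fed and then renamed with that variable.

*/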

/*
5.2.2 Translation of Selections

*/

select(Arg, pr(Pred, _)) => filter(ArgS, Pred) :-
  Arg => ArgS.

select(Arg, pr(Pred, _, _)) => filter(ArgS, Pred) :-
  Arg => ArgS.


/*

Translation of selections using indices.

*/
select(arg(N), Y) => X :-
  indexselect(arg(N), Y) => X.

indexselect(arg(N), pr(attr(AttrName, Arg, Case) = Y, Rel)) => X :-
  indexselect(arg(N), pr(Y = attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
  exactmatch(IndexName, rel(Name, *, Case), Y)
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), 
            IndexName, btree).

indexselect(arg(N), pr(Y = attr(AttrName, Arg, AttrCase), _)) =>
  rename(exactmatch(IndexName, rel(Name, Var, Case), Y), Var)
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), 
           IndexName, btree).

indexselect(arg(N), pr(attr(AttrName, Arg, Case) <= Y, Rel)) => X :-
  indexselect(arg(N), pr(Y >= attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
  leftrange(IndexName, rel(Name, *, Case), Y)
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName, btree).

indexselect(arg(N), pr(Y >= attr(AttrName, Arg, AttrCase), _)) =>
  rename(leftrange(IndexName, rel(Name, Var, Case), Y), Var)
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), 
            IndexName, btree).

indexselect(arg(N), pr(attr(AttrName, Arg, Case) >= Y, Rel)) => X :-
  indexselect(arg(N), pr(Y <= attr(AttrName, Arg, Case), Rel)) => X.

indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
  rightrange(IndexName, rel(Name, *, Case), Y)
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName, btree).

indexselect(arg(N), pr(Y <= attr(AttrName, Arg, AttrCase), _)) =>
  rename(rightrange(IndexName, rel(Name, Var, Case), Y), Var)
  :-
  argument(N, rel(Name, Var, Case)),
  !,
  hasIndex(rel(Name, Var, Case), attr(AttrName, Arg, AttrCase), 
           IndexName, btree).

%fapra1590
indexselect(arg(N), pr(Y touches attr(AttrName, Arg, Case), Rel)) => X :-
  indexselect(arg(N), pr(attr(AttrName, Arg, Case) touches Y, Rel)) => X.

indexselect(arg(N), pr(attr(AttrName, Arg, AttrCase) touches Y, _)) =>
  filter(windowintersects(IndexName, rel(Name, *, Case), bbox(Y)), 
         attr(AttrName, Arg, AttrCase) intersects Y)
  :-
  argument(N, rel(Name, *, Case)),
  !,
  hasIndex(rel(Name, *, Case), attr(AttrName, Arg, AttrCase), IndexName, btree).

/*
Here ~ArgS~ is meant to indicate ``argument stream''.

5.2.3 Translation of Joins

A join can always be translated to filtering the Cartesian product.

*/

join(Arg1, Arg2, pr(Pred, _, _)) => filter(product(Arg1S, Arg2S), Pred) :-
  Arg1 => Arg1S,
  Arg2 => Arg2S.

join(Arg1, Arg2, pr(X<Y, _, _)) => loopjoin(Arg1S, fun([param(t, tuple)],
  filter(Arg2S, attribute(t, attrname(Attr1)) < Attr2))) :-
  X = attr(_, _, _),
  Y = attr(_, _, _), !,
  Arg1 => Arg1S,
  Arg2 => Arg2S,
  isOfFirst(Attr1, X, Y),
  isOfSecond(Attr2, X, Y).

/*

Index joins:

*/


join(Arg1, arg(N), pr(X=Y, _, _)) => loopjoin(Arg1S, MatchExpr) :-
  isOfSecond(Attr2, X, Y),
  isNotOfSecond(Expr1, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr2, IndexName, btree),
  Arg1 => Arg1S,
  exactmatch(IndexName, arg(N), Expr1) => MatchExpr.

join(arg(N), Arg2, pr(X=Y, _, _)) => loopjoin(Arg2S, MatchExpr) :-
  isOfFirst(Attr1, X, Y),
  isNotOfFirst(Expr2, X, Y),
  argument(N, RelDescription),
  hasIndex(RelDescription, Attr1, IndexName, btree),
  Arg2 => Arg2S,
  exactmatch(IndexName, arg(N), Expr2) => MatchExpr.


exactmatch(IndexName, arg(N), Expr) =>
  exactmatch(IndexName, rel(Name, *, Case), Expr) :-
  argument(N, rel(Name, *, Case)),
  !.

exactmatch(IndexName, arg(N), Expr) =>
  rename(exactmatch(IndexName, rel(Name, Var, Case), Expr), Var) :-
  argument(N, rel(Name, Var, Case)),
  !.


/*

For a join with a predicate of the form X = Y we can distinguish four cases
depending on whether X and Y are attributes or more complex expressions. For
example, a query condition might be ``PLZA = PLZB'' in which case we have just
attribute names on both sides of the predicate operator, or it could be ``PLZA =
PLZB + 1''. In the latter case we have an expression on the right hand side.
This can still be translated to a hashjoin, for example, by first extending the
second argument by a new attribute containing the value of the expression. For
example, the query

----    select *
        from plz as p1, plz as p2
        where p1.PLZ = p2.PLZ + 1
----

can be translated to

----    plz feed {p1} plz feed {p2} extend[newPLZ: PLZ_p2 + 1]
        hashjoin[PLZ_p1, newPLZ, 997]
        remove[newPLZ]
        consume
----

This technique is built into the optimizer as follows. We first define the four
cases (at the moment for equijoin only; this may later be extended) which also
translate the arguments into streams. Then the rules translating to join
methods can be formulated independently from this general technique. They
translate terms of the form join00(Arg1Stream, Arg2Stream, Pred).

*/

join(Arg1, Arg2, pr(X=Y, R1, R2)) => JoinPlan :-
  X = attr(_, _, _),
  Y = attr(_, _, _), !,
  Arg1 => Arg1S,
  Arg2 => Arg2S,
  join00(Arg1S, Arg2S, pr(X=Y, R1, R2)) => JoinPlan.

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
        remove(JoinPlan, [attrname(attr(r_expr, 2, l))]) :-
  X = attr(_, _, _),
  not(Y = attr(_, _, _)), !,
  Arg1 => Arg1S,
  Arg2 => Arg2S,
  Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
  join00(Arg1S, Arg2Extend, pr(X=attr(r_expr, 2, l), R1, R2)) => JoinPlan.

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
        remove(JoinPlan, [attrname(attr(l_expr, 1, l))]) :-
  not(X = attr(_, _, _)),
  Y = attr(_, _, _), !,
  Arg1 => Arg1S,
  Arg2 => Arg2S,
  Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
  join00(Arg1Extend, Arg2S, pr(attr(l_expr, 1, l)=Y, R1, R2)) => JoinPlan.

join(Arg1, Arg2, pr(X=Y, R1, R2)) =>
        remove(JoinPlan, [attrname(attr(l_expr, 1, l)),
                attrname(attr(r_expr, 2, l))]) :-
  not(X = attr(_, _, _)),
  not(Y = attr(_, _, _)), !,
  Arg1 => Arg1S,
  Arg2 => Arg2S,
  Arg1Extend = extend(Arg1S, [newattr(attrname(attr(l_expr, 1, l)), X)]),
  Arg2Extend = extend(Arg2S, [newattr(attrname(attr(r_expr, 2, l)), Y)]),
  join00(Arg1Extend, Arg2Extend,
        pr(attr(l_expr, 1, l)=attr(r_expr, 2, l), R1, R2)) => JoinPlan.


join00(Arg1S, Arg2S, pr(X = Y, _, _)) => sortmergejoin(Arg1S, Arg2S,
        attrname(Attr1), attrname(Attr2))   :-
  isOfFirst(Attr1, X, Y),
  isOfSecond(Attr2, X, Y).


join00(Arg1S, Arg2S, pr(X = Y, _, _)) => hashjoin(Arg1S, Arg2S,
        attrname(Attr1), attrname(Attr2), 997)   :-
  isOfFirst(Attr1, X, Y),
  isOfSecond(Attr2, X, Y).

/*

----    isOfFirst(Attr, X, Y)
        isOfSecond(Attr, X, Y)
----

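~Attr~, which is equal to either ~X~ or ~Y~, is an attribute of the first
(second) relation. The predicates ~isNotOfFirst~ and ~isNotOfSecond~ defined
alongside return the other side of the predicate instead, i.e., the operand
that is paired with an attribute of the first (second) relation.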

*/


isOfFirst(X, X, _) :- X = attr(_, 1, _).
isOfFirst(Y, _, Y) :- Y = attr(_, 1, _).
isOfSecond(X, X, _) :- X = attr(_, 2, _).
isOfSecond(Y, _, Y) :- Y = attr(_, 2, _).

isNotOfFirst(Y, X, Y) :- X = attr(_, 1, _).
isNotOfFirst(X, X, Y) :- Y = attr(_, 1, _).
isNotOfSecond(Y, X, Y) :- X = attr(_, 2, _).
isNotOfSecond(X, X, Y) :- Y = attr(_, 2, _).
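/*
For illustration, the following queries show how these predicates pick apart
the two sides of a predicate such as ``p1.PLZ = p2.PLZ + 1''. The attribute
terms are made up for the example; the answers shown are the first solutions.

----    ?- isOfFirst(A, attr(p1:pLZ, 1, u), attr(p2:pLZ, 2, u) + 1).
        % A = attr(p1:pLZ, 1, u)

        ?- isNotOfFirst(E, attr(p1:pLZ, 1, u), attr(p2:pLZ, 2, u) + 1).
        % E = attr(p2:pLZ, 2, u) + 1

        ?- isOfSecond(A, attr(p1:pLZ, 1, u), attr(p2:pLZ, 2, u) + 1).
        % fails: neither side is itself an attribute of the second relation
----

*/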


/*
6 Creating Query Plan Edges

*/

createPlanEdge :-
  edge(Source, Target, Term, Result, _, _),
  Term => Plan,
  assert(planEdge(Source, Target, Plan, Result)),
  fail.

createPlanEdges :- not(createPlanEdge).
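/*
The pair ~createPlanEdge~ and ~createPlanEdges~ follows a failure-driven loop
idiom that recurs throughout this file: the first predicate processes one fact
and then fails, which makes PROLOG backtrack into the next fact; the second
predicate succeeds once the first has finally failed. A minimal sketch with
the hypothetical predicates ~demoItem~, ~demoSeen~, ~copyItem~ and ~copyItems~:

----    :- dynamic demoSeen/1.

        demoItem(a).
        demoItem(b).

        copyItem :- demoItem(X), assert(demoSeen(X)), fail.
        copyItems :- not(copyItem).

        ?- copyItems, findall(X, demoSeen(X), L).
        % L = [a, b]
----

*/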

deletePlanEdge :-
  retract(planEdge(_, _, _, _)), fail.

deletePlanEdges :- not(deletePlanEdge).

writePlanEdge :-
  planEdge(Source, Target, Plan, Result),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Plan: '), wp(Plan), nl,
  % write(Plan), nl,
  write('Result: '), write(Result), nl, nl,
  fail.

writePlanEdges :- not(writePlanEdge).


/*
7 Assigning Sizes and Selectivities to the Nodes and Edges of the POG

----    assignSizes.
        deleteSizes.
----

~assignSizes~ assigns sizes (numbers of tuples) to all nodes in the POG, based
on the cardinalities of the argument relations and the selectivities of the
predicates. Sizes are stored as facts of the form resultSize(Result, Size),
selectivities as facts of the form edgeSelectivity(Source, Target, Sel).

~deleteSizes~ deletes these facts from memory.

7.1 Assigning Sizes and Selectivities

It is important that edges are processed in the order in which they have been
created. This ensures that for each edge the sizes of its argument nodes are
available.

*/

assignSizes :- not(assignSizes1).

assignSizes1 :-
  edge(Source, Target, Term, Result, _, _),
  assignSize(Source, Target, Term, Result),
  fail.

assignSize(Source, Target, select(Arg, Pred), Result) :-
  resSize(Arg, Card),
  selectivity(Pred, Sel),
  Size is Card * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).

assignSize(Source, Target, join(Arg1, Arg2, Pred), Result) :-
  resSize(Arg1, Card1),
  resSize(Arg2, Card2),
  selectivity(Pred, Sel),
  Size is Card1 * Card2 * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).
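/*
A small worked example, with made-up numbers: a selection with selectivity
0.25 on an argument of 40000 tuples, followed by a join of its result with a
relation of 100 tuples under a join selectivity of 0.125.

----    ?- SelSize is 40000 * 0.25,
           JoinSize is SelSize * 100 * 0.125.
        % SelSize = 10000.0, JoinSize = 125000.0
----

*/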

/*
----    setNodeSize(Node, Size) :-
----

Set the size of node ~Node~ to ~Size~ if no size has been assigned before.

*/

setNodeSize(Node, _) :- resultSize(Node, _), !.
setNodeSize(Node, Size) :- assert(resultSize(Node, Size)).

/*
----    resSize(Arg, Size) :-
----

Argument ~Arg~ has size ~Size~.

*/

resSize(arg(N), Size) :- argument(N, rel(Rel, _, _)), card(Rel, Size), !.
resSize(arg(N), _) :- write('Error in optimizer: cannot find cardinality for '),
  argument(N, Rel), wp(Rel), nl, fail.
resSize(res(N), Size) :- resultSize(N, Size), !.

/*
----    writeSizes :-
----

Write sizes and selectivities.

*/

writeSize :-
  resultSize(Node, Size),
  write('Node: '), write(Node), nl,
  write('Size: '), write(Size), nl, nl,
  fail.
writeSize :-
  edgeSelectivity(Source, Target, Sel),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Selectivity: '), write(Sel), nl, nl,
  fail.
writeSizes :- not(writeSize).

writeFirstSize :-
  firstResultSize(Node, Size),
  write('Node: '), write(Node), nl,
  write('Size: '), write(Size), nl, nl,
  fail.
writeFirstSize :-
  firstEdgeSelectivity(Source, Target, Sel),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Selectivity: '), write(Sel), nl, nl,
  fail.
writeFirstSizes :- not(writeFirstSize).

compareSize :-
  firstResultSize(Node, Size1),
  resultSize(Node, Size2),
  write('Node: '), write(Node),
  write(', Size: '), write(Size1),
  write(' ==> '), write(Size2), nl, nl,
  fail.
compareSize :-
  firstEdgeSelectivity(Source, Target, Sel1),
  edgeSelectivity(Source, Target, Sel2),
  write('Source: '), write(Source),
  write(', Target: '), write(Target),
  write(', Selectivity: '), write(Sel1),
  write(' ==> '), write(Sel2), nl, nl,
  fail.
compareSizes :- not(compareSize).

/*
----    deleteSizes :-
----

Delete node sizes and selectivities of edges.

*/

deleteSize :- retract(resultSize(_, _)), fail.
deleteSize :- retract(edgeSelectivity(_, _, _)), fail.
deleteSizes :- not(deleteSize).


/*
8 Computing Edge Costs for Plan Edges

8.1 The Costs of Terms

----    cost(Term, Sel, Size, Cost) :-
----

The cost of an executable ~Term~ representing a predicate with selectivity ~Sel~
is ~Cost~ and the size of the result is ~Size~.

This is evaluated recursively descending into the term. When the operator
realizing the predicate (e.g. ~filter~) is encountered, the selectivity ~Sel~ is
used to determine the size of the result. It is assumed that only a single
operator of this kind occurs within the term.

8.1.1 Arguments

*/

cost(rel(Rel, _, _), _, Size, 0) :-
  card(Rel, Size).

cost(res(N), _, Size, 0) :-
  resSize(res(N), Size).

/*
8.1.2 Operators

*/

cost(feed(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  feedTC(A),
  C is C1 + A * S.

/*
Here ~feedTC~ means ``feed tuple cost'', i.e., the cost per tuple, a constant to
be determined in experiments. These constants are kept in file ``Operators.pl''.

*/

cost(consume(X), Sel, S, C) :-
  cost(X, Sel, S, C1),
  consumeTC(A),
  C is C1 + A * S.

cost(filter(X, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  filterTC(A),
  S is SizeX * Sel,
  C is CostX + A * SizeX.

/*
For the moment we assume a cost of 1 for evaluating a predicate; this should be
changed shortly.

*/
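/*
To see how these rules combine, here is a self-contained sketch that evaluates
the cost of a ~filter~ over a ~feed~. Everything in it is an assumption made
for the example: the predicate names, the cardinality of 40000 and the
per-tuple constants 0.5 and 2 are invented and are not the values from
``Operators.pl''.

----    demoCard(plz, 40000).
        demoFeedTC(0.5).
        demoFilterTC(2).

        demoCost(rel(Rel, _, _), _, Size, 0) :-
          demoCard(Rel, Size).
        demoCost(feed(X), Sel, S, C) :-
          demoCost(X, Sel, S, C1),
          demoFeedTC(A),
          C is C1 + A * S.
        demoCost(filter(X, _), Sel, S, C) :-
          demoCost(X, 1, SizeX, CostX),
          demoFilterTC(A),
          S is SizeX * Sel,
          C is CostX + A * SizeX.

        ?- demoCost(filter(feed(rel(plz, *, l)), somePred), 0.25, S, C).
        % S = 10000.0, C = 100000.0
----

The selectivity only affects the result size: the filter still has to look at
all 40000 input tuples, so the total is 20000 for the feed plus 80000 for the
filter.

*/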

cost(product(X, Y), _, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  productTC(A, B),
  S is SizeX * SizeY,
  C is CostX + CostY + SizeY * B + S * A.


cost(leftrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

cost(rightrange(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  leftrangeTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

/*

Simplistic cost estimation for loop joins.

If attribute values are assumed independent, then the selectivity
of a subquery appearing in an index join equals the overall
join selectivity. Therefore it is possible to estimate
the result size and cost of a subquery
(i.e. ~exactmatch~ and ~exactmatchfun~). As a subquery in an
index join is executed as often as a tuple from the left
input stream arrives, it is also possible to estimate the
overall index join cost.

*/
cost(exactmatchfun(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

cost(exactmatch(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  exactmatchTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.

cost(loopjoin(X, Y), Sel, S, Cost) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, Sel, SizeY, CostY),
  S is SizeX * SizeY,
  loopjoinTC(C),
  Cost is C * SizeX + CostX + SizeX * CostY.

cost(fun(_, X), Sel, Size, Cost) :-
  cost(X, Sel, Size, Cost).


/*

Previously the cost function for ~hashjoin~ contained a term

----    A * SizeX + A * SizeY
----

which should account for the cost of distributing tuples
into the buckets. However, in experiments the cost of
hashing was always ten or more times smaller than the cost
of computing products of buckets. Therefore that term
was considered unnecessary.

*/
cost(hashjoin(X, Y, _, _, NBuckets), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  hashjoinTC(A, B),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +                          % producing the arguments
    A * NBuckets * (SizeX/NBuckets + 1) *       % computing the product for each
      (SizeY/NBuckets +1) +                     % pair of buckets
    B * S.                                      % producing the result tuples
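/*
A quick calculation, with sizes chosen so that the divisions are exact, shows
why the bucket term is so much smaller than the work of a full product. The
variable names are made up for the example, and the integer results assume
SWI-Prolog's default arithmetic, where an exact integer division yields an
integer.

----    ?- SizeX = 997, SizeY = 1994, NBuckets = 997,
           PairWork is NBuckets * (SizeX/NBuckets + 1)
                                * (SizeY/NBuckets + 1),
           ProductWork is SizeX * SizeY.
        % PairWork = 5982, ProductWork = 1988018
----

*/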


cost(sortmergejoin(X, Y, _, _), Sel, S, C) :-
  cost(X, 1, SizeX, CostX),
  cost(Y, 1, SizeY, CostY),
  sortmergejoinTC(A, B),
  S is SizeX * SizeY * Sel,
  C is CostX + CostY +                      % producing the arguments
    A * SizeX * log(SizeX + 1) +
    A * SizeY * log(SizeY + 1) +            % sorting the arguments
    B * S.                                  % parallel scan of sorted relations


cost(extend(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  extendTC(A),
  C is C1 + A * S.

cost(remove(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  removeTC(A),
  C is C1 + A * S.

cost(project(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  projectTC(A),
  C is C1 + A * S.

cost(rename(X, _), Sel, S, C) :-
  cost(X, Sel, S, C1),
  renameTC(A),
  C is C1 + A * S.

%fapra1590
cost(windowintersects(_, Rel, _), Sel, Size, Cost) :-
  cost(Rel, 1, RelSize, _),
  windowintersectsTC(C),
  Size is Sel * RelSize,
  Cost is Sel * RelSize * C.
/*
8.2 Creating Cost Edges

These are plan edges extended by a cost measure.

*/

createCostEdge :-
  planEdge(Source, Target, Term, Result),
  edgeSelectivity(Source, Target, Sel),
  cost(Term, Sel, Size, Cost),
  assert(costEdge(Source, Target, Term, Result, Size, Cost)),
  fail.

createCostEdges :- not(createCostEdge).

deleteCostEdge :-
  retract(costEdge(_, _, _, _, _, _)), fail.

deleteCostEdges :- not(deleteCostEdge).

writeCostEdge :-
  costEdge(Source, Target, Plan, Result, Size, Cost),
  write('Source: '), write(Source), nl,
  write('Target: '), write(Target), nl,
  write('Plan: '), wp(Plan), nl,
  write('Result: '), write(Result), nl,
  write('Size: '), write(Size), nl,
  write('Cost: '), write(Cost), nl,
  nl,
  fail.

:-assert(helpLine(writeCostEdges,0,[],
                  'List estimated costs for edges in the current POG.')).

writeCostEdges :- not(writeCostEdge).

/*
----    assignCosts
----

This just puts together creation of sizes and cost edges.

*/

assignCosts :-
  assignSizes,
  createCostEdges.


/*
9 Finding Shortest Paths = Cheapest Plans

The cheapest plan corresponds to the shortest path through the predicate order
graph.

9.1 Shortest Path Algorithm by Dijkstra

We implement the shortest path algorithm by Dijkstra. There are two
relevant sets of nodes:

  * center: the nodes for which shortest paths have already been
computed

  * boundary: the nodes that have been seen, but that have not yet been
expanded. These need to be kept in a priority queue.

A node, as used during shortest path computation, is represented as a term

----    node(Name, Distance, Path)
----

where ~Name~ is the node number, ~Distance~ the distance along the shortest
path to this node, and ~Path~ is the list of edges forming the shortest path.

The graph is represented by the set of ~costEdges~.

The center is represented as a set of facts of the form

----    center(NodeNumber, node(Name, Distance, Path))
----

Since predicates are generally indexed by their first argument, finding a node
in the center via the node number should be very efficient. We assume it is
possible in constant time.

The boundary is represented by an abstract data type as described in the
interface below. Essentially it is a priority queue implementation.


----    successor(Node, Succ) :-
----

~Succ~ is a successor of node ~Node~ via some edge. This includes computation
of the distance and path of the successor.

*/

successor(node(Source, Distance, Path), node(Target, Distance2, Path2)) :-
  costEdge(Source, Target, Term, Result, Size, Cost),
  Distance2 is Distance + Cost,
  append(Path, [costEdge(Source, Target, Term, Result, Size, Cost)], Path2).
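/*
For example, with a single made-up cost edge from node 0 to node 1, the start
node expands as follows. ~demoCostEdge~ and ~demoSuccessor~ are hypothetical
stand-ins so that the sketch can be run on its own.

----    demoCostEdge(0, 1, somePlan, 1, 100, 20).

        demoSuccessor(node(S, D, P), node(T, D2, P2)) :-
          demoCostEdge(S, T, Term, Result, Size, Cost),
          D2 is D + Cost,
          append(P, [costEdge(S, T, Term, Result, Size, Cost)], P2).

        ?- demoSuccessor(node(0, 0, []), N).
        % N = node(1, 20, [costEdge(0, 1, somePlan, 1, 100, 20)])
----

*/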

/*

----    dijkstra(Source, Dest, Path, Length) :-
----

The shortest path from ~Source~ to ~Dest~ is ~Path~ of length ~Length~.

*/

dijkstra(Source, Dest, Path, Length) :-
  emptyCenter,
  b_empty(Boundary),
  b_insert(Boundary, node(Source, 0, []), Boundary1),
  dijkstra1(Boundary1, Dest, 0, notfound),
  center(Dest, node(Dest, Length, Path)).

emptyCenter :- not(emptyCenter1).

emptyCenter1 :- retract(center(_, _)), fail.


/*
----    dijkstra1(Boundary, Dest, NoOfCalls, Found) :-
----

Compute shortest paths and store them in a predicate ~center~. Initially to be
called with no fact ~center~ asserted, with ~Boundary~ just containing the
start node, and with ~Found~ set to ~notfound~. The computation stops as soon
as the destination ~Dest~ has been moved into the center, or when the boundary
is empty.

For testing we report at which iteration the destination ~Dest~ is reached.

*/

dijkstra1(Boundary, _, _, found) :- !,
        tree_height(Boundary, H),
        write('Height of search tree for boundary is '), write(H), nl.

dijkstra1(Boundary, _, _, _) :- b_isEmpty(Boundary).
dijkstra1(Boundary, Dest, N, _) :-
        % write('Boundary = '), writeList(Boundary), nl, write('====='), nl,
  b_removemin(Boundary, Node, Bound2),
  Node = node(Name, _, _),
  assert(center(Name, Node)),
  checkDest(Name, Dest, N, Found),
  putsuccessors(Bound2, Node, Bound3),
  N1 is N+1,
  dijkstra1(Bound3, Dest, N1, Found).

checkDest(Name, Name, N, found) :- write('Destination node '), write(Name),
  write(' reached at iteration '), write(N), nl.

checkDest(_, _, _, notfound).


/*
Some auxiliary functions for testing:

*/

writeList([]).
writeList([X | Rest]) :- nl, nl, write('-----'), nl, write(X), writeList(Rest).

writeCenter :- not(writeCenter1).
writeCenter1 :-
  center(_, node(Name, Distance, Path)),
  write('Node: '), write(Name), nl,
  write('Cost: '), write(Distance), nl,
  write('Path: '), nl, writePath(Path), nl, fail.

writePath([]).
writePath([costEdge(Source, Target, Term, Result, Size, Cost) | Path]) :-
  write(costEdge(Source, Target, Result, Size, Cost)), nl,
  write('    '), wp(Term), nl,
  writePath(Path).

/*
----    putsuccessors(Boundary, Node, BoundaryNew) :-
----

Insert into ~Boundary~ all successors of node ~Node~ that are not yet present
in the center, updating the distance of those already present in the boundary
if the new path is shorter, to obtain ~BoundaryNew~.

*/
putsuccessors(Boundary, Node, BoundaryNew) :-
  findall(Succ, successor(Node, Succ), Successors),

        % write('successors of '), write(Node), nl,
        % writeList(Successors), nl, nl,

  putsucc1(Boundary, Successors, BoundaryNew).

/*
----    putsucc1(Boundary, Successors, BoundaryNew) :-
----

Put all successors not yet in the center from the list ~Successors~ into the
~Boundary~ to get ~BoundaryNew~. The five cases to be distinguished are:

  * The list of successors is empty.

  * The first successor is already in the center, hence the shortest path to it
is already known and it does not need to be inserted into the boundary.

  * The first successor is already present in the boundary, at a smaller or
equal distance than the one via the current edge. It can also be ignored.

  * The first successor exists in the boundary, but at a higher distance. In
this case it replaces the previous node entry in the boundary.

  * The first successor does not exist in the boundary. It is inserted.

*/

putsucc1(Boundary, [], Boundary).

putsucc1(Boundary, [node(N, _, _) | Successors], BNew) :-
  center(N, _), !,
  putsucc1(Boundary, Successors, BNew).

putsucc1(Boundary, [node(N, D, _) | Successors], BNew) :-
  b_memberByName(Boundary, N, node(N, DistOld, _)),
  DistOld =< D, !,
  putsucc1(Boundary, Successors, BNew).

putsucc1(Boundary, [node(N, D, P) | Successors], BNew) :-
  b_memberByName(Boundary, N, node(N, DistOld, _)),
  D < DistOld, !,
  b_deleteByName(Boundary, N, Bound2),
  b_insert(Bound2, node(N, D, P), Bound3),
  putsucc1(Bound3, Successors, BNew).

putsucc1(Boundary, [node(N, D, P) | Successors], BNew) :-
  b_insert(Boundary, node(N, D, P), Bound2),
  putsucc1(Bound2, Successors, BNew).


/*

9.2 Interface ~Boundary~

The boundary is represented in a data structure with the following
operations:

----    b_empty(-Boundary) :-
----

Creates an empty boundary and returns it.

----    b_isEmpty(+Boundary) :-
----

Checks whether the boundary is empty.


----    b_removemin(+Boundary, -Node, -BoundaryOut) :-
----

Returns the node ~Node~ with minimal distance from the set ~Boundary~ and also
returns ~BoundaryOut~, the boundary with this node removed.

----    b_insert(+Boundary, +Node, -BoundaryOut) :-
----

Inserts a node that must not yet be present (i.e., no other node of that
name).

----    b_memberByName(+Boundary, +Name, -Node) :-
----

If a node ~Node~ with name ~Name~ is present, it is returned.

----    b_deleteByName(+Boundary, +Name, -BoundaryOut) :-
----

Returns the boundary, where the node with name ~Name~ is deleted.

*/
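/*
To make the interface concrete, here is a deliberately naive sketch that keeps
the boundary in an unsorted list. It is not the implementation used by the
optimizer, and the predicate names prefixed with ~lb~ are hypothetical.
Removing the minimum takes linear time here, which is why a real priority
queue is preferable.

----    lbEmpty([]).
        lbIsEmpty([]).
        lbInsert(Boundary, Node, [Node | Boundary]).
        lbMemberByName(Boundary, Name, node(Name, D, P)) :-
          memberchk(node(Name, D, P), Boundary).
        lbDeleteByName(Boundary, Name, BoundaryOut) :-
          selectchk(node(Name, _, _), Boundary, BoundaryOut).
        lbRemovemin([First | Rest], Min, BoundaryOut) :-
          lbMin(Rest, First, Min),
          selectchk(Min, [First | Rest], BoundaryOut).

        % lbMin(Nodes, Best, Min): Min is the node of smallest distance
        % among Best and Nodes.
        lbMin([], Min, Min).
        lbMin([node(N, D, P) | Nodes], node(_, D0, _), Min) :-
          D < D0, !,
          lbMin(Nodes, node(N, D, P), Min).
        lbMin([_ | Nodes], Best, Min) :-
          lbMin(Nodes, Best, Min).

        ?- lbEmpty(B0),
           lbInsert(B0, node(a, 5, []), B1),
           lbInsert(B1, node(b, 2, []), B2),
           lbRemovemin(B2, Node, B3).
        % Node = node(b, 2, []), B3 = [node(a, 5, [])]
----

*/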

/*
9.3 Constructing the Plan from the Shortest Path

----    plan(Path, Plan)
----

The plan corresponding to ~Path~ is ~Plan~.

*/
plan(Path, Plan) :-
  deleteNodePlans,
  traversePath(Path),
  highestNode(Path, N),
  nodePlan(N, Plan).


deleteNodePlans :- not(deleteNodePlan).

deleteNodePlan :- retract(nodePlan(_, _)), fail.

% nCounter tracks the number of counters in queries to get the
% size of intermediate results.
:-
  dynamic(nCounter/1).

nextCounter(C) :-
  nCounter(N),
  !,
  N1 is N + 1,
  retract(nCounter(N)),
  assert(nCounter(N1)),
  C = N1.

nextCounter(C) :-
  assert(nCounter(1)),
  C = 1.

deleteCounter :- retract(nCounter(_)), fail.

deleteCounters :- not(deleteCounter).
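/*
For example, starting from a clean state, successive calls number the counters
consecutively:

----    ?- deleteCounters, nextCounter(A), nextCounter(B).
        % A = 1, B = 2
----

*/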

traversePath([]).

traversePath([costEdge(Source, Target, Term, Result, _, _) | Path]) :-
  embedSubPlans(Term, Term2),
  nextCounter(Nc),
  assert(nodePlan(Result, counter(Nc,Term2))),
  assert(smallResultCounter(Nc, Source, Target, Result)),
  traversePath(Path).

deleteSmallResultCounter :-
  retractall(smallResultCounter(_,_,_,_)).

createSmallResultSize([]).
createSmallResultSize([ [Nc,Value] | T ]) :-
  smallResultCounter(Nc, _, _, Result ),
  assert(smallResultSize(Result, Value)),
  createSmallResultSize( T ).

createSmallResultSizes2 :-
  deleteSmallResultSize,
  secondo('list counters', C ), !,
  createSmallResultSize( C ).

createSmallResultSizes :-
  deleteSmallResultSize, !,
  not(createSmallResultSizes2).

deleteSmallResultSize :-
  retractall(smallResultSize(_,_)).

createSmallSelectivity :-
  deleteSmallSelectivity, !,
  not(createSmallSelectivity2).

deleteSmallSelectivity :-
  retractall(small_cond_sel(_,_,_,_)).

compute_sel( 0, 0, Sel ) :-
  Sel is 1.

compute_sel( Num, Den, Sel ) :-
  Sel is Num / Den.

assignSmallSelectivity(Source, Target, Result, select(Arg, _), Value) :-
  newResSize(Arg, Card),
  compute_sel( Value, Card, Sel ),!,
  assert(small_cond_sel(Source, Target, Result, Sel)).

assignSmallSelectivity(Source, Target, Result, join(Arg1, Arg2, _), Value) :-
  newResSize(Arg1, Card1),
  newResSize(Arg2, Card2),
  Card is Card1 * Card2,
  compute_sel( Value, Card, Sel ),!,
  assert(small_cond_sel(Source, Target, Result, Sel)).

createSmallSelectivity2 :-
  smallResultCounter(_, Source, Target, Result),
  smallResultSize(Result, Value),
  edge(Source,Target,Term,Result,_,_),
  assignSmallSelectivity(Source, Target, Result, Term, Value),
  fail.

small(rel(Rel, Var, Case), rel(Rel2, Var, Case)) :-
  atom_concat(Rel, '_small', Rel2).

newResSize(arg(N), Size) :- argument(N, R ), small( R, rel(SRel, _, _)), 
                            card(SRel, Size), !.
newResSize(res(N), Size) :- smallResultSize(N, Size), !.

/*
prepare\_query\_small prepares the query to be executed in the small database.
It assumes that the small database has the same indexes as the full database,
but with the suffix '\_small'.

*/

prepare_query_small( count(Term), count(Result) ) :-
  query_small(Term, Result).

prepare_query_small( Term, count(Result) ) :-
  query_small(Term, Result).

query_small(rel(Name, V, C), Result) :-
  atom_concat( Name, '_small', NameSmall ),
  Result = rel(NameSmall, V, C),
  !.

query_small(exactmatch(IndexName, R, V), Result) :-
  atom_concat(IndexName,'_small', IndexNameSmall),
  query_small( R, R2 ),
  Result = exactmatch(IndexNameSmall, R2, V),
  !.

% To be modified: it should handle functors with any number of arguments.
% Currently it handles only 1 to 5 arguments. It should handle lists, too.

query_small( Term, Result ) :-
  functor(Term, Fun, 1 ),
  arg(1, Term, Arg1),
  query_small(Arg1, Res1),
  Result =.. [Fun | [Res1]],
  !.

query_small( Term, Result ) :-
  functor(Term, Fun, 2 ),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  query_small(Arg1, Res1),
  query_small(Arg2, Res2),
  Result =.. [Fun | [Res1, Res2]],
  !.

query_small( Term, Result ) :-
  functor(Term, Fun, 3 ),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  query_small(Arg1, Res1),
  query_small(Arg2, Res2),
  query_small(Arg3, Res3),
  Result =.. [Fun | [Res1, Res2, Res3]],
  !.

query_small( Term, Result ) :-
  functor(Term, Fun, 4 ),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  arg(4, Term, Arg4),
  query_small(Arg1, Res1),
  query_small(Arg2, Res2),
  query_small(Arg3, Res3),
  query_small(Arg4, Res4),
  Result =.. [Fun | [Res1, Res2, Res3, Res4]],
  !.

query_small( Term, Result ) :-
  functor(Term, Fun, 5 ),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  arg(3, Term, Arg3),
  arg(4, Term, Arg4),
  arg(5, Term, Arg5),
  query_small(Arg1, Res1),
  query_small(Arg2, Res2),
  query_small(Arg3, Res3),
  query_small(Arg4, Res4),
  query_small(Arg5, Res5),
  Result =.. [Fun | [Res1, Res2, Res3, Res4, Res5]],
  !.

query_small( Term, Result ) :-
  Result = Term,
  !.
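/*
The restriction to at most five arguments noted above could be lifted by
decomposing terms with ``univ''. The following is only a sketch of that idea
under the hypothetical name ~demoQuerySmall~; it assumes that plans are ground
terms and has not been tried against the optimizer itself.

----    demoQuerySmall(rel(Name, V, C), rel(NameSmall, V, C)) :- !,
          atom_concat(Name, '_small', NameSmall).
        demoQuerySmall(exactmatch(Index, R, V),
                       exactmatch(IndexSmall, R2, V)) :- !,
          atom_concat(Index, '_small', IndexSmall),
          demoQuerySmall(R, R2).
        demoQuerySmall(Term, Result) :-
          compound(Term), !,
          Term =.. [Fun | Args],
          maplist(demoQuerySmall, Args, Args2),
          Result =.. [Fun | Args2].
        demoQuerySmall(Term, Term).

        ?- demoQuerySmall(filter(feed(rel(plz, *, l)),
                                 attr(pLZ, 1, u) > 50000), R).
        % R = filter(feed(rel(plz_small, *, l)), attr(pLZ, 1, u) > 50000)
----

*/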

embedSubPlans(res(N), Term) :-
  nodePlan(N, Term), !.

embedSubPlans(Term, Term2) :-
  compound(Term), !,
  Term =.. [Functor | Args],
  embedded(Args, Args2),
  Term2 =.. [Functor | Args2].

embedSubPlans(Term, Term).


embedded([], []).

embedded([Arg | Args], [Arg2 | Args2]) :-
  embedSubPlans(Arg, Arg2),
  embedded(Args, Args2).
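/*
As an illustration of the embedding step, suppose a plan for node 1 has been
recorded by hand (the terms are made up for the example). A reference to that
node inside a later term is then replaced by the stored plan:

----    ?- deleteNodePlans,
           assert(nodePlan(1, feed(rel(plz, *, l)))),
           embedSubPlans(filter(res(1), somePred), Term).
        % Term = filter(feed(rel(plz, *, l)), somePred)
----

*/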


highestNode(Path, N) :-
  reverse(Path, Path2),
  Path2 = [costEdge(_, N, _, _, _, _) | _].


/*
9.4 Computing the Best Plan for a Given Predicate Order Graph

*/

:-assert(helpLine(bestPlan,0,[],'Show the best plan for the current POG.')).

bestPlan :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan),
  write('The best plan is:'), nl, nl,
  wp(Plan),
  nl, nl,
  write('The cost is: '), write(Cost), nl.

bestPlan(Plan, Cost) :-
  assignCosts,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan).

/*
10 A Larger Example

It is now time to test efficiency with a larger example. We consider the query:

----    select *
        from Staedte, plz as p1, plz as p2, plz as p3
        where SName = p1.Ort
          and p1.PLZ = p2.PLZ + 1
          and p2.PLZ = p3.PLZ * 5
          and Bev > 300000
          and Bev < 500000
          and p2.PLZ > 50000
          and p2.PLZ < 60000
          and Kennzeichen starts "W"
          and p3.Ort contains "burg"
          and p3.Ort starts "M"
----

This translates to:

*/

example6 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), 
         rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), 
         rel(plz, p1, l), rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), 
         rel(plz, p2, l), rel(plz, p3, l)),
    pr(attr(bev, 1, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 1, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p2:pLZ, 1, u) > 50000,  rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) < 60000,  rel(plz, p2, l)),
    pr(attr(kennzeichen, 1, u) starts "W",  rel(staedte, *, u)),
    pr(attr(p3:ort, 1, u) contains "burg",  rel(plz, p3, l)),
    pr(attr(p3:ort, 1, u) starts "M",  rel(plz, p3, l))
  ],
  _, _).

/*
Initially this example did not work; it does now. Still, let's keep the numbers
a bit smaller and avoid too many big joins first.

*/
example7 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), 
       rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(bev, 0, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 0, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p1:pLZ, 0, u) > 50000,  rel(plz, p1, l)),
    pr(attr(p1:pLZ, 0, u) < 60000,  rel(plz, p1, l)),
    pr(attr(kennzeichen, 0, u) starts "F",  rel(staedte, *, u)),
    pr(attr(p1:ort, 0, u) contains "burg",  rel(plz, p1, l)),
    pr(attr(p1:ort, 0, u) starts "M",  rel(plz, p1, l))
  ],
  _, _).

example8 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), 
       rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1),
        rel(plz, p1, l), rel(plz, p2, l)),
    pr(attr(bev, 0, u) > 300000,  rel(staedte, *, u)),
    pr(attr(bev, 0, u) < 500000,  rel(staedte, *, u)),
    pr(attr(p1:pLZ, 0, u) > 50000,  rel(plz, p1, l)),
    pr(attr(p1:pLZ, 0, u) < 60000,  rel(plz, p1, l)),
    pr(attr(kennzeichen, 0, u) starts "F",  rel(staedte, *, u)),
    pr(attr(p1:ort, 0, u) contains "burg",  rel(plz, p1, l)),
    pr(attr(p1:ort, 0, u) starts "M",  rel(plz, p1, l))
  ],
  _, _).

/*
Let's study a small example again with two independent conditions.

*/

example9 :- pog([rel(staedte, s, u), rel(plz, p, l)],
  [pr(attr(p:ort, 2, u) = attr(s:sName, 1, u),
        rel(staedte, s, u), rel(plz, p, l) ),
   pr(attr(p:pLZ, 0, u) > 40000, rel(plz, p, l)),
   pr(attr(s:bev, 0, u) > 300000, rel(staedte, s, u))], _, _).

example10 :- pog(
  [rel(staedte, *, u), rel(plz, p1, l), rel(plz, p2, l), rel(plz, p3, l)],
  [
    pr(attr(sName, 1, u) = attr(p1:ort, 2, u), 
       rel(staedte, *, u), rel(plz, p1, l)),
    pr(attr(p1:pLZ, 1, u) = (attr(p2:pLZ, 2, u) + 1), 
       rel(plz, p1, l), rel(plz, p2, l)),
    pr(attr(p2:pLZ, 1, u) = (attr(p3:pLZ, 2, u) * 5), 
      rel(plz, p2, l), rel(plz, p3, l))
  ],
  _, _).

/*
11 A User Level Language

We started constructing the optimizer by building the predicate order graph,
using a notation for relations and predicates suited to that purpose. Later,
in [Section Translation], we adapted the notation so that query plans can be
translated and constructed as needed in Secondo. In this section we introduce
a more user-friendly notation for queries, quite similar to SQL, but suitable
for being written directly in PROLOG.

11.1 The Language

The basic select-from-where statement will be written as

----    select <attr-list>
        from <rel-list>
        where <pred-list>
----

The first example query from [Section 4.1.1] can then be written as:

----    select [sname, bev]
        from [staedte]
        where [bev > 500000]
----

Instead of lists consisting of a single element we will also support writing
just the element, hence the query can also be written:

----    select [sname, bev]
        from staedte
        where bev > 500000
----

The second query can be written as:

----    select *
        from [staedte as s, plz as p]
        where [s:sname = p:ort, p:plz > 40000]
----

Note that all relation names and attribute names are written entirely in lower
case; the system will look up the actual spelling in a table.

Furthermore, it will be possible to add a groupby- and an orderby-clause:

  * groupby

----    select <aggr-list>
        from <rel-list>
        where <pred-list>
        groupby <group-attr-list>
----

Example:

----    select [ort, min(plz) as minplz, max(plz) as maxplz,  count(*) as cntplz]
        from plz
        where plz > 40000
        groupby ort
----

  * orderby

----    select <attr-list>
        from <rel-list>
        where <pred-list>
        orderby <order-attr-list>
----

Example:

----    select [ort, plz]
        from plz
        orderby [ort asc, plz desc]
----

This example also shows that the where-clause may be omitted. It is also
possible to combine grouping and ordering:

----    select [ort, min(plz) as minplz, max(plz) as maxplz,  count(*) as cntplz]
        from plz
        where plz > 40000
        groupby ort
        orderby cntplz desc
----

Currently only a basic part of this language has been implemented.


11.2 Structure

We introduce ~select~, ~from~, ~where~, ~as~, and the other keywords of the
language as PROLOG operators:

*/

:- op(990, fx, sql).
:- op(990, fx, sql2).
:- op(985, xfx, >>).
:- op(950, fx, select).
:- op(960, xfx, from).
:- op(950, xfx, where).
:- op(930, xfx, as).
:- op(970, xfx, groupby).
:- op(980, xfx, orderby).
:- op(986, xfx, first).
:- op(930, xf, asc).
:- op(930, xf, desc).

/*
This ensures that the select-from-where statement is viewed as a term with the
structure:

----    from(select(AttrList), where(RelList, PredList))
----

That this works can be tested with:

----    P = (select s:sname from staedte as s where s:bev > 500000),
        P = (X from Y), X = (select AttrList), Y = (RelList where PredList),
        RelList = (Rel as Var).
----

The result is:

----    P = select s:sname from staedte as s where s:bev>500000
        X = select s:sname
        Y = staedte as s where s:bev>500000
        AttrList = s:sname
        RelList = staedte as s
        PredList = s:bev>500000
        Rel = staedte
        Var = s
----
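
The same kind of test extends to the clauses introduced in Section 11.1. As a
small sketch (the predicate name ~nesting~ is hypothetical and only serves this
illustration), the priorities declared above should make ~orderby~ the
outermost functor, with ~groupby~ nested inside it:

----    :- op(950, fx, select).
        :- op(960, xfx, from).
        :- op(950, xfx, where).
        :- op(930, xfx, as).
        :- op(970, xfx, groupby).
        :- op(980, xfx, orderby).
        :- op(930, xf, desc).

        % nesting(-Core, -Group, -Order): take the example query apart.
        nesting(Core, Group, Order) :-
          Q = (select [ort, count(*) as cntplz] from plz where plz > 40000
               groupby ort orderby cntplz desc),
          Q = orderby(groupby(Core, Group), Order).
----

With these declarations, the goal ~nesting(Core, Group, Order)~ binds ~Group~
to ~ort~ and ~Order~ to ~cntplz desc~, while ~Core~ is the plain
select-from-where term.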

11.3 Schema Lookup

The second task is to look up attribute names in order to build the input
notation for the construction of the predicate order graph.

11.3.1 Tables

In the file ~database~ we maintain the following tables.

Relation schemas are written as:

----    relation(staedte, [sname, bev, plz, vorwahl, kennzeichen]).
        relation(plz, [plz, ort]).
----

The spelling of relation or attribute names is given in a table:

----    spelling(staedte:plz, pLZ).
        spelling(staedte:sname, sName).
        spelling(plz, lc(plz)).
        spelling(plz:plz, pLZ).
----

The default assumption is that the first letter of a name is upper case and all
others are lower case. If this is true, then no entry in the table ~spelling~
is needed. If a name starts with a lower case letter, then this is expressed by
the functor ~lc~.
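
A minimal sketch of how such entries can be interpreted. The predicate ~caseOf~
is a hypothetical, simplified stand-in for the lookup defined later in this
file, and the entry for ~staedte~ is an assumed sample that is not taken from
the file ~database~:

----    % Sample entries; the one for staedte is an assumption.
        spelling(staedte, staedte).
        spelling(staedte:sname, sName).
        spelling(plz, lc(plz)).
        spelling(plz:plz, pLZ).

        % caseOf(+Name, -Stored, -Case): Case is l for lc(...) entries,
        % u otherwise. Intended to be called with Case unbound.
        caseOf(Name, Stored, l) :- spelling(Name, lc(Stored)), !.
        caseOf(Name, Stored, u) :- spelling(Name, Stored).
----

Here ~caseOf(plz, S, C)~ gives ~S = plz~ and ~C = l~, whereas
~caseOf(staedte:sname, S, C)~ gives ~S = sName~ and ~C = u~, meaning that the
first letter is to be written in upper case.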

11.3.2 Looking up Relation and Attribute Names

*/

callLookup(Query, Query2) :-
  newQuery,
  lookup(Query, Query2), !.

newQuery :- not(clearVariables), not(clearQueryRelations),
  not(clearQueryAttributes).

clearVariables :- retract(variable(_, _)), fail.

clearQueryRelations :- retract(queryRel(_, _)), fail.

clearQueryAttributes :- retract(queryAttr(_)), fail.
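
/*
The three clearing predicates above are failure-driven loops: ~retract~ removes
one fact, ~fail~ forces backtracking into ~retract~ for the next one, and the
predicate as a whole fails once no fact is left. This is why ~newQuery~ wraps
each call in ~not~. A self-contained sketch of the same idiom, using the
hypothetical names ~note~, ~clearNotes~ and ~resetNotes~:

----    :- dynamic note/1.

        % Removes all note facts one by one; always ends in failure.
        clearNotes :- retract(note(_)), fail.

        % Succeeds exactly because clearNotes fails after cleaning up.
        resetNotes :- not(clearNotes).
----

After asserting a few ~note~ facts, calling ~resetNotes~ succeeds and leaves
none of them behind. The built-in ~retractall~, which is used elsewhere in this
file, has the same effect in a single call.

*/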

/*

----    lookup(Query, Query2) :-
----

~Query2~ is a modified version of ~Query~ where all relation names and
attribute names have the form as required in [Section Translation].

*/

lookup(select Attrs from Rels where Preds,
        select Attrs2 from Rels2List where Preds2List) :-
  lookupRels(Rels, Rels2),
  lookupAttrs(Attrs, Attrs2),
  lookupPreds(Preds, Preds2),
  makeList(Rels2, Rels2List),
  makeList(Preds2, Preds2List).

lookup(select Attrs from Rels,
        select Attrs2 from Rels2) :-
  lookupRels(Rels, Rels2),
  lookupAttrs(Attrs, Attrs2).

lookup(Query orderby Attrs, Query2 orderby Attrs3) :-
  lookup(Query, Query2),
  makeList(Attrs, Attrs2),
  lookupAttrs(Attrs2, Attrs3).

lookup(Query groupby Attrs, Query2 groupby Attrs3) :-
  lookup(Query, Query2),
  makeList(Attrs, Attrs2),
  lookupAttrs(Attrs2, Attrs3).

lookup(Query first N, Query2 first N) :-
  lookup(Query, Query2).


makeList(L, L) :- is_list(L).

makeList(L, [L]) :- not(is_list(L)).
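
/*
The two clauses of ~makeList~ are what allows a single element to be written
in place of a one-element list. A short illustration follows; it repeats the
clauses so that the fragment can be loaded on its own, and ~normalized~ is a
hypothetical helper name:

----    makeList(L, L) :- is_list(L).
        makeList(L, [L]) :- not(is_list(L)).

        % normalized(+Rels, +Preds, -RelList, -PredList)
        normalized(Rels, Preds, RelList, PredList) :-
          makeList(Rels, RelList),
          makeList(Preds, PredList).
----

The goal ~normalized(staedte, bev > 500000, R, P)~ yields one-element lists
for both arguments, whereas arguments that are already lists are passed
through unchanged.

*/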

/*

11.3.3 Modification of the From-Clause

----    lookupRels(Rels, Rels2)
----

Modify the list of relation names. If there are relations without variables,
store them in a table ~queryRel~. Any two such relations must have distinct
sets of attribute names. Also, any two variables must be distinct.

*/

lookupRels([], []).

lookupRels([R | Rs], [R2 | R2s]) :-
  lookupRel(R, R2),
  lookupRels(Rs, R2s).

lookupRels(Rel, Rel2) :-
  not(is_list(Rel)),
  lookupRel(Rel, Rel2).

/*
----    lookupRel(Rel, Rel2) :-
----

Translate and store a single relation definition.

*/

:- dynamic
  variable/2,
  queryRel/2,
  queryAttr/1.

lookupRel(Rel as Var, rel(Rel2, Var, Case)) :-
  relation(Rel, _), !,
  spelled(Rel, Rel2, Case),
  not(defined(Var)),
  assert(variable(Var, rel(Rel2, Var, Case))).

lookupRel(Rel, rel(Rel2, *, Case)) :-
  relation(Rel, _), !,
  spelled(Rel, Rel2, Case),
  not(duplicateAttrs(Rel)),
  assert(queryRel(Rel, rel(Rel2, *, Case))).

lookupRel(Term, Term) :-
  write('Error in query: relation '), write(Term), write(' not known'),
  nl, fail.

defined(Var) :-
  variable(Var, _),
  write('Error in query: doubly defined variable '), write(Var), write('.'), nl.

/*
----    duplicateAttrs(Rel) :-
----

There is a relation stored in ~queryRel~ that has attribute names also
occurring in ~Rel~.

*/

duplicateAttrs(Rel) :-
  queryRel(Rel2, _),
  relation(Rel2, Attrs2),
  member(Attr, Attrs2),
  relation(Rel, Attrs),
  member(Attr, Attrs),
  write('Error in query: duplicate attribute names in relations '),
  write(Rel2), write(' and '), write(Rel), write('.'), nl.

/*
11.3.4 Modification of the Select-Clause

*/

lookupAttrs([], []).

lookupAttrs([A | As], [A2 | A2s]) :-
  lookupAttr(A, A2),
  lookupAttrs(As, A2s).

lookupAttrs(Attr, Attr2) :-
  not(is_list(Attr)),
  lookupAttr(Attr, Attr2).

lookupAttr(Var:Attr, attr(Var:Attr2, 0, Case)) :- !,
  variable(Var, Rel2),
  Rel2 = rel(Rel, _, _),
  spelled(Rel:Attr, attr(Attr2, _, Case)).

lookupAttr(Attr asc, Attr2 asc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr desc, Attr2 desc) :- !,
  lookupAttr(Attr, Attr2).

lookupAttr(Attr, Attr2) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, Attr2).

lookupAttr(*, *) :- !.

lookupAttr(count(*), count(*)) :- !.

lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
  lookupAttr(Expr, Expr2),
  not(queryAttr(attr(Name, 0, u))),
  !,
  assert(queryAttr(attr(Name, 0, u))).

lookupAttr(Expr as Name, Expr2 as attr(Name, 0, u)) :-
  lookupAttr(Expr, Expr2),
  queryAttr(attr(Name, 0, u)),
  !,
  write('***** Error: attribute name '), write(Name),
  write(' doubly defined in query.'),
  nl.

lookupAttr(Term, Term2) :-
  compound(Term),
  functor(Term, Op, 1),
  arg(1, Term, Arg1),
  lookupAttr(Arg1, Res1),
  functor(Term2, Op, 1),
  arg(1, Term2, Res1).

lookupAttr(Term, Term2) :-
  compound(Term),
  functor(Term, Op, 2),
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  lookupAttr(Arg1, Res1),
  lookupAttr(Arg2, Res2),
  functor(Term2, Op, 2),
  arg(1, Term2, Res1),
  arg(2, Term2, Res2).

% may need to be extended to more than two arguments in a term.


lookupAttr(Name, attr(Name, 0, u)) :-
  queryAttr(attr(Name, 0, u)),
  !.

lookupAttr(Term, Term) :-
  atom(Term),
  write('Symbol '),
  write(Term),
  write(' in attribute list not recognized. Supposed to be a Secondo object '),
  nl.

lookupAttr(Term, Term).

isAttribute(Name, Rel) :-
  queryRel(Rel, _),
  relation(Rel, List),
  member(Name, List).


/*
11.3.5 Modification of the Where-Clause

*/

lookupPreds([], []).

lookupPreds([P | Ps], [P2 | P2s]) :- !,
  lookupPred(P, P2),
  lookupPreds(Ps, P2s).

lookupPreds(Pred, Pred2) :-
  not(is_list(Pred)),
  lookupPred(Pred, Pred2).


lookupPred(Pred, pr(Pred2, Rel)) :-
  lookupPred1(Pred, Pred2, 0, [], 1, [Rel]), !.

lookupPred(Pred, pr(Pred2, Rel1, Rel2)) :-
  lookupPred1(Pred, Pred2, 0, [], 2, [Rel1, Rel2]), !.

lookupPred(Pred, _) :-
  lookupPred1(Pred, _, 0, [], 0, []),
  write('Error in query: constant predicate is not allowed.'), nl, !, fail.

lookupPred(Pred, _) :-
  lookupPred1(Pred, _, 0, [], N, _),
  N > 2,
  write('Error in query: predicate involving more than two relations '),
  write('is not allowed.'), nl, fail.

/*
----    lookupPred1(+Pred, Pred2, +N, +RelsBefore, -M, -RelsAfter) :-
----

~Pred2~ is the transformed version of ~Pred~; before this is called, ~N~
attributes in list ~RelsBefore~ have been found; after the transformation in
total ~M~ attributes referring to the relations in list ~RelsAfter~ have been
found.
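
To make the threading of the counter and of the relation list concrete, here is
a simplified, self-contained sketch. ~walk~ and ~spelledAttr~ are hypothetical
names, the facts stand in for what ~lookupRel~ asserts and for what the file
~database~ provides (the entry for ~plz:ort~ is an assumption), and the case
component is fixed to ~u~ instead of being looked up:

----    variable(s, rel(staedte, s, u)).
        variable(p, rel(plz, p, l)).
        spelling(staedte:sname, sName).
        spelling(plz:ort, ort).

        % Simplified spelling lookup: ignores lc(...) entries.
        spelledAttr(Rel:Attr, Attr2) :- spelling(Rel:Attr, Attr2).

        % walk(+Term, -Term2, +N, +RelsBefore, -M, -RelsAfter)
        walk(Var:Attr, attr(Var:Attr2, N1, u), N, Rels0, N1, Rels) :-
          variable(Var, Rel2), !,
          Rel2 = rel(Rel, _, _),
          spelledAttr(Rel:Attr, Attr2),
          N1 is N + 1,
          append(Rels0, [Rel2], Rels).
        walk(Term, Term2, N, Rels0, M, Rels) :-
          compound(Term),
          Term =.. [F, A1, A2], !,
          walk(A1, B1, N, Rels0, M1, Rels1),
          walk(A2, B2, M1, Rels1, M, Rels),
          Term2 =.. [F, B1, B2].
        walk(Term, Term, N, Rels, N, Rels).
----

Called with the term ~s:sname = p:ort~, counter 0 and an empty list, ~walk~
numbers the two attributes 1 and 2 and returns the relations of ~s~ and ~p~ in
that order. Constants such as 40000 fall through to the last clause and leave
both the counter and the list untouched.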

*/

lookupPred1(Var:Attr, attr(Var:Attr2, N1, Case), N, RelsBefore, N1, RelsAfter)
  :-
  variable(Var, Rel2), !,   Rel2 = rel(Rel, _, _),
  spelled(Rel:Attr, attr(Attr2, _, Case)),
  N1 is N + 1,
  append(RelsBefore, [Rel2], RelsAfter).

lookupPred1(Attr, attr(Attr2, N1, Case), N, RelsBefore, N1, RelsAfter) :-
  isAttribute(Attr, Rel), !,
  spelled(Rel:Attr, attr(Attr2, _, Case)),
  queryRel(Rel, Rel2),
  N1 is N + 1,
  append(RelsBefore, [Rel2], RelsAfter).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 1), !,
  arg(1, Term, Arg1),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M, RelsAfter),
  functor(Term2, F, 1),
  arg(1, Term2, Arg1Out).

lookupPred1(Term, Term2, N, RelsBefore, M, RelsAfter) :-
  compound(Term),
  functor(Term, F, 2), !,
  arg(1, Term, Arg1),
  arg(2, Term, Arg2),
  lookupPred1(Arg1, Arg1Out, N, RelsBefore, M1, RelsAfter1),
  lookupPred1(Arg2, Arg2Out, M1, RelsAfter1, M, RelsAfter),
  functor(Term2, F, 2),
  arg(1, Term2, Arg1Out),
  arg(2, Term2, Arg2Out).

% may need to be extended to operators with more than two arguments.

lookupPred1(Term, Term, N, Rels, N, Rels) :-
  atom(Term),
  not(is_list(Term)),
  write('Symbol '), write(Term),
  write(' not recognized, supposed to be a Secondo object.'), nl, !.

lookupPred1(Term, Term, N, Rels, N, Rels).


/*
11.3.6 Check the Spelling of Relation and Attribute Names

*/

spelled(Rel:Attr, attr(Attr2, 0, l)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr3),
  Attr3 = lc(Attr2),
  !.

spelled(Rel:Attr, attr(Attr2, 0, u)) :-
  downcase_atom(Rel, DCRel),
  downcase_atom(Attr, DCAttr),
  spelling(DCRel:DCAttr, Attr2),
  !.

spelled(_:_, attr(_, 0, _)) :- !, fail. % no attr entry in spelling table

spelled(Rel, Rel2, l) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel3),
  Rel3 = lc(Rel2),
  !.

spelled(Rel, Rel2, u) :-
  downcase_atom(Rel, DCRel),
  spelling(DCRel, Rel2), !.

spelled(_, _, _) :- !, fail.  % no rel entry in spelling table.


/*
11.3.7 Examples

We can now formulate several of the previous queries at the user level.

*/

example11 :- showTranslate(select [sname, bev] from staedte where bev > 500000).

showTranslate(Query) :-
  callLookup(Query, Query2),
  write(Query), nl,
  write(Query2), nl.

example12 :- showTranslate(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
  ).

example13 :- showTranslate(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"]
  ).

/*
11.4 Translating a Query to a Plan

----    translate(Query, Stream, SelectClause, Cost) :-
----

~Query~ is translated into a ~Stream~ to which the translation of the
~SelectClause~ still needs to be applied. The ~Cost~ returned currently covers
only the essential part, the conjunctive query.

*/

translate(Query groupby Attrs,
          groupby(sortby(Stream, AttrNamesSort), AttrNamesGroup, Fields),
          select Select2, Cost) :-
  translate(Query, Stream, SelectClause, Cost),
  makeList(Attrs, Attrs2),
  attrnames(Attrs2, AttrNamesGroup),
  attrnamesSort(Attrs2, AttrNamesSort),
  SelectClause = (select Select),
  makeList(Select, SelAttrs),
  translateFields(SelAttrs, Attrs2, Fields, Select2),
  !.

translate(Select from Rels where Preds, Stream, Select, Cost) :-
  pog(Rels, Preds, _, _),
  bestPlan(Stream, Cost),
  !.

translate(Select from Rel, feed(Rel), Select, 0) :-
  not(is_list(Rel)),
  !.

translate(Select from [Rel], feed(Rel), Select, 0).

translate(Select from [Rel | Rels], product(feed(Rel), Stream), Select, 0) :-
  translate(Select from Rels, Stream, Select, _).
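
/*
The last three clauses handle queries without a where-clause: a single relation
is simply fed into a stream, and a list of relations becomes a right-nested
Cartesian product at cost 0. A minimal sketch of that recursion follows. The
name ~feeds~ is hypothetical, and plain atoms are used where the optimizer
would see the ~rel~ terms produced by the lookup:

----    :- op(950, fx, select).
        :- op(960, xfx, from).

        % feeds(+Query, -Stream): mirrors the three clauses above.
        feeds(_ from Rel, feed(Rel)) :- not(is_list(Rel)), !.
        feeds(_ from [Rel], feed(Rel)) :- !.
        feeds(S from [Rel | Rels], product(feed(Rel), Stream)) :-
          feeds(S from Rels, Stream).
----

For a from-clause listing the three relations a, b and c, the resulting stream
is ~product(feed(a), product(feed(b), feed(c)))~.

*/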


/*
----    translateFields(Select, GroupAttrs, Fields, Select2) :-
----

Translate the ~Select~ clause of a query containing ~groupby~. Grouping
was done by the attributes ~GroupAttrs~. Return a list ~Fields~ of terms
of the form ~field(Name, Expr)~; such a list can be used as an argument to the
groupby operator. Also, return a modified select clause ~Select2~,
which will translate to a corresponding projection operation.
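
As an illustration, the following sketch keeps only the clauses for ~count~
and for grouping attributes. ~fields~ is a hypothetical name, and the cuts are
placed before the recursive call, which differs slightly from the full
definition below:

----    :- op(930, xfx, as).

        % fields(+Select, +GroupAttrs, -Fields, -Select2)
        fields([], _, [], []).
        fields([count(*) as New | Sel], Group,
               [field(New, count(feed(group))) | Fields], [New | Sel2]) :-
          !,
          fields(Sel, Group, Fields, Sel2).
        fields([Attr | Sel], Group, Fields, [Attr | Sel2]) :-
          member(Attr, Group), !,
          fields(Sel, Group, Fields, Sel2).
----

For a select clause consisting of the grouping attribute ~attr(ort, 0, u)~ and
~count(*) as attr(cntplz, 0, u)~, the list ~Fields~ contains the single entry
~field(attr(cntplz, 0, u), count(feed(group)))~, and ~Select2~ lists the two
attributes ~ort~ and ~cntplz~ for the final projection.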

*/

translateFields([], _, [], []).

translateFields([count(*) as NewAttr | Select], GroupAttrs,
        [field(NewAttr , count(feed(group))) | Fields], [NewAttr | Select2]) :-
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([sum(attr(Name, Var, Case)) as NewAttr | Select], GroupAttrs,
        [field(NewAttr, sum(feed(group), 
         attrname(attr(Name, Var, Case)))) | Fields],
        [NewAttr| Select2]) :-
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([sum(Expr) as NewAttr | Select], GroupAttrs,
        [field(NewAttr,
          sum(
            extend(feed(group), field(attr(xxxExprField, 0, l), Expr)),
            attrname(attr(xxxExprField, 0, l))
            ))
        | Fields],
        [NewAttr| Select2]) :-
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([Attr | Select], GroupAttrs, Fields, [Attr | Select2]) :-
  member(Attr, GroupAttrs),
  !,
  translateFields(Select, GroupAttrs, Fields, Select2).


/*
Generic rule for aggregate functions, similar to sum.

*/

translateFields([Term as NewAttr | Select], GroupAttrs,
        [field(NewAttr, Term2) | Fields],
        [NewAttr| Select2]) :-
  compound(Term),
  functor(Term, AggrOp, 1),
  arg(1, Term, attr(Name, Var, Case)),
  member(AggrOp, [min, max, avg]),
  functor(Term2, AggrOp, 2),
  arg(1, Term2, feed(group)),
  arg(2, Term2, attrname(attr(Name, Var, Case))),
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.

translateFields([Term as NewAttr | Select], GroupAttrs,
        [field(NewAttr, Term2) | Fields],
        [NewAttr| Select2]) :-
  compound(Term),
  functor(Term, AggrOp, 1),
  arg(1, Term, Expr),
  member(AggrOp, [min, max, avg]),
  functor(Term2, AggrOp, 2),
  arg(1, Term2, extend(feed(group), field(attr(xxxExprField, 0, l), Expr))),
  arg(2, Term2, attrname(attr(xxxExprField, 0, l))),
  translateFields(Select, GroupAttrs, Fields, Select2),
  !.


translateFields([Term | Select], GroupAttrs,
        Fields,
        Select2) :-
  compound(Term),
  functor(Term, AggrOp, 1),
  arg(1, Term, Attr),
  member(AggrOp, [count, sum, min, max, avg]),
  functor(Term2, AggrOp, 2),
  arg(1, Term2, feed(group)),
  arg(2, Term2, attrname(Attr)),
  translateFields(Select, GroupAttrs, Fields, Select2),
  write('*****'), nl,
  write('***** Error in groupby: missing name for new attribute'), nl,
  write('*****'), nl,
  !.


translateFields([Attr | Select], GroupAttrs, Fields, Select2) :-
  not(member(Attr, GroupAttrs)),
  !,
  translateFields(Select, GroupAttrs, Fields, Select2),
  write('*****'), nl,
  write('***** Error in groupby: '),
  write(Attr),
  write(' is neither a grouping attribute'), nl,
  write('      nor an aggregate expression.'), nl,
  write('*****'), nl.


/*

----    queryToPlan(Query, Plan, Cost) :-
----

Translate the ~Query~ into a ~Plan~. The ~Cost~ for evaluating the conjunctive
query is also returned. The ~Query~ must be such that relation and attribute
names have been looked up already.

*/

queryToPlan(Query, Stream, Cost) :-
  countQuery(Query),
  queryToStream(Query, Stream, Cost),
  !.

queryToPlan(Query, consume(Stream), Cost) :-
  queryToStream(Query, Stream, Cost).


/*
----    countQuery(Query) :-
----

Check whether ~Query~ is a counting query.

*/


countQuery(select count(*) from _) :- !.

countQuery(Query groupby _) :-
  countQuery(Query).

countQuery(Query orderby _) :-
  countQuery(Query).

/*

----    queryToStream(Query, Plan, Cost) :-
----

Same as ~queryToPlan~, but returns a stream plan, if possible.

*/

queryToStream(Query first N, head(Stream, N), Cost) :-
  queryToStream(Query, Stream, Cost),
  !.

queryToStream(Query orderby SortAttrs, Stream2, Cost) :-
  translate2(Query, Stream, Select, Cost),
  finish(Stream, Select, SortAttrs, Stream2),
  !.

queryToStream(Query, Stream2, Cost) :-
  translate2(Query, Stream, Select, Cost),
  finish(Stream, Select, [], Stream2).

/*
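Translation with the entropy approach: the query is first translated as usual.
If the entropy approach is switched on, the plan is also run on a small
database, costs are reassigned, and the best plan is determined again.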

*/

translate2(Query, Stream2, Select, Cost2) :-
  deleteSmallResults,
  retractall(highNode(_)),assert(highNode(0)),
  translate(Query, Stream1, Select, Cost1), !,
  try_entropy(Stream1, Stream2, Cost1, Cost2), !,
  warn_plan_changed(Stream1, Stream2).

try_entropy(Stream1, Stream2, Cost1, Cost2) :-
  useEntropy, highNode(HN), HN > 1, HN < 256, !,
  nl, write('*** Trying to use the Entropy-approach ***********' ), nl, !,
  plan_to_atom(Stream1, FirstQuery),
  prepare_query_small(Stream1, PlanSmall),
  plan_to_atom(PlanSmall, SmallQuery),
  write('The plan in small database is: '), nl, write(SmallQuery), nl, nl,
  write('Executing the query in the small database...'),
  deleteEntropyNodes, !,
  query(SmallQuery), !, nl,
  assignEntropyCost, !,
  write('First Plan:'), nl, write( FirstQuery ), nl, nl,
  write('Estimated Cost: '), write(Cost1), nl, nl,
  entropyBestPlan(Stream2, Cost2).

try_entropy(Stream1, Stream1, Cost1, Cost1).

entropyBestPlan(Plan,Cost) :-
  deleteCounters,
  highNode(N),
  dijkstra(0, N, Path, Cost),
  plan(Path, Plan).

warn_plan_changed(Plan1, Plan2) :-
  not(Plan1 = Plan2),
  nl,
  write( '*******************************************************' ), nl,
  write( '* * *  INITIAL PLAN CHANGED BY ENTROPY APPROACH!  * * *' ), nl,
  write( '*******************************************************' ), nl.

warn_plan_changed(_,_).


/*
----    finish(Stream, Select, Sort, Stream2) :-
----

Given a ~Stream~, a ~Select~ clause, and a set of attributes for sorting,
apply the final transformations (extend, sort, project) to obtain ~Stream2~.
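
The split of a select clause into an extension list and a projection list can
be tried out in isolation. The following sketch uses the hypothetical name
~split~ for what ~extendProject~ does below:

----    :- op(930, xfx, as).

        % split(+Attrs, -Extend, -Project)
        split([], [], []).
        split([Expr as Name | Attrs], [field(Name, Expr) | Extend],
              [Name | Project]) :-
          !,
          split(Attrs, Extend, Project).
        split([Attr | Attrs], Extend, [Attr | Project]) :-
          split(Attrs, Extend, Project).
----

Given the attribute ~attr(ort, 0, u)~ followed by the expression
~attr(plz, 0, u) + 1 as attr(plz1, 0, u)~, the list ~Extend~ holds one field
for ~plz1~, and ~Project~ holds both ~ort~ and ~plz1~. The stream is then
wrapped first in ~extend~ and afterwards in ~project~.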

*/

finish(Stream, Select, Sort, Stream2) :-
  selectClause(Select, Extend, Project),
  finish2(Stream, Extend, Sort, Project, Stream2).


selectClause(select *, [], *).

selectClause(select count(*), [], count(*)).

selectClause(select Attrs, Extend, Project) :-
  makeList(Attrs, Attrs2),
  extendProject(Attrs2, Extend, Project).


finish2(Stream, Extend, Sort, Project, Stream4) :-
  fExtend(Stream, Extend, Stream2),
  fSort(Stream2, Sort, Stream3),
  fProject(Stream3, Project, Stream4).


fExtend(Stream, [], Stream) :- !.

fExtend(Stream, Extend, extend(Stream, Extend)).


fSort(Stream, [], Stream) :- !.

fSort(Stream, SortAttrs, sortby(Stream, AttrNames)) :-
  attrnamesSort(SortAttrs, AttrNames).


fProject(Stream, *, Stream) :- !.

fProject(Stream, count(*), count(Stream)) :- !.

fProject(Stream, Project, project(Stream, AttrNames)) :-
  attrnames(Project, AttrNames).


extendProject([], [], []).

extendProject([Expr as Name | Attrs], [field(Name, Expr) | Extend],
        [Name | Project]) :-
  !,
  extendProject(Attrs, Extend, Project).

extendProject([attr(Name, Var, Case) | Attrs], Extend,
        [attr(Name, Var, Case) | Project]) :-
  extendProject(Attrs, Extend, Project).


/*

----    attrnames(Attrs, AttrNames) :-
----

Transform each attribute X into attrname(X).

*/

attrnames([], []).

attrnames([Attr | Attrs], [attrname(Attr) | AttrNames]) :-
  attrnames(Attrs, AttrNames).

/*

----    attrnamesSort(Attrs, AttrNames) :-
----

Transform attribute names of orderby clause.
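
An attribute without an explicit direction is sorted in ascending order. A
self-contained sketch of this rule, with the hypothetical names ~sortNames~
and ~sortName~ and with plain atoms in place of the ~attr~ terms used by the
optimizer:

----    :- op(930, xf, asc).
        :- op(930, xf, desc).

        sortNames([], []).
        sortNames([A | As], [N | Ns]) :-
          sortName(A, N),
          sortNames(As, Ns).

        sortName(A asc, attrname(A) asc) :- !.
        sortName(A desc, attrname(A) desc) :- !.
        sortName(A, attrname(A) asc).
----

The list ~ort asc~, ~plz desc~, ~bev~ is thus mapped to ~attrname(ort) asc~,
~attrname(plz) desc~, ~attrname(bev) asc~.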

*/

attrnamesSort([], []).

attrnamesSort([Attr | Attrs], [Attr2 | Attrs2]) :-
  attrnameSort(Attr, Attr2),
  attrnamesSort(Attrs, Attrs2).

attrnameSort(Attr asc, attrname(Attr) asc) :- !.

attrnameSort(Attr desc, attrname(Attr) desc) :- !.

attrnameSort(Attr, attrname(Attr) asc).


/*


11.4.1 Integration with Optimizer

----    optimize(Query).
----

Optimize ~Query~ and print the best ~Plan~.

*/

optimize(Query) :-
  callLookup(Query, Query2),
  queryToPlan(Query2, Plan, Cost),
  plan_to_atom(Plan, SecondoQuery),
  write('The plan is: '), nl, nl,
  write(SecondoQuery), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl.


optimize(Query, QueryOut, CostOut) :-
  callLookup(Query, Query2),
  queryToPlan(Query2, Plan, CostOut),
  plan_to_atom(Plan, QueryOut).

/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.

*/
sqlToPlan(QueryText, Plan) :-
  term_to_atom(sql Query, QueryText),
  optimize(Query, Plan, _).


/*
----    sqlToPlan(QueryText, Plan)
----

Transform an SQL ~QueryText~ into a ~Plan~. The query is given as a text atom.
In this version, ~QueryText~ does not start with ~sql~.

*/
sqlToPlan(QueryText, Plan) :-
  term_to_atom(Query, QueryText),
  optimize(Query, Plan, _).


/*
11.4.2 Examples

We can now formulate the previous example queries in the user level language.


Example3:

*/

example14 :- optimize(
  select * from [staedte as s, plz as p] where
    [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0]
  ).

example14(Query, Cost) :- optimize(
  select * from [staedte as s, plz as p] where 
           [p:ort = s:sname, p:plz > 40000, (p:plz mod 5) = 0],
  Query, Cost
  ).


/*
Example4:

*/
example15 :- optimize(
  select * from staedte where bev > 500000
  ).

example15(Query, Cost) :- optimize(
  select * from staedte where bev > 500000,
  Query, Cost
  ).

/*
Example5:

*/
example16 :-  optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000]
  ).

example16(Query, Cost) :-  optimize(
  select * from [staedte as s, plz as p] where [s:sname = p:ort, p:plz > 40000],
  Query, Cost
  ).


/*
Example6. This may need a larger local stack size. Start Prolog as

----    pl -L4M
----

which initializes the local stack to 4 MB.

*/
example17 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"]
  ).

example17(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5,
    bev > 300000,
    bev < 500000,
    p2:plz > 50000,
    p2:plz < 60000,
    kennzeichen starts "W",
    p3:ort contains "burg",
    p3:ort starts "M"],
  Query, Cost
  ).


/*
Example 18:

*/
example18 :- optimize(
  select *
  from [staedte, plz as p1]
  where [
    sname = p1:ort,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"]
  ).

example18(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1]
  where [
    sname = p1:ort,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"],
  Query, Cost
  ).

/*
Example 19:

*/
example19 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"]
  ).

example19(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    bev > 300000,
    bev < 500000,
    p1:plz > 50000,
    p1:plz < 60000,
    kennzeichen starts "W",
    p1:ort contains "burg",
    p1:ort starts "M"],
  Query, Cost
  ).


/*
Example 20:

*/
example20 :- optimize(
  select *
  from [staedte as s, plz as p]
  where [
    p:ort = s:sname,
    p:plz > 40000,
    s:bev > 300000]
  ).

example20(Query, Cost) :- optimize(
  select *
  from [staedte as s, plz as p]
  where [
    p:ort = s:sname,
    p:plz > 40000,
    s:bev > 300000],
  Query, Cost
  ).

/*
Example 21:

*/
example21 :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5]
  ).

example21(Query, Cost) :- optimize(
  select *
  from [staedte, plz as p1, plz as p2, plz as p3]
  where [
    sname = p1:ort,
    p1:plz = p2:plz + 1,
    p2:plz = p3:plz * 5],
  Query, Cost
  ).

/*

12 Optimizing and Calling Secondo

----    sql Term
        sql(Term, SecondoQueryRest)
        let(X, Term)
        let(X, Term, SecondoQueryRest)
----

~Term~ must be one of the available select-from-where statements.
It is optimized and Secondo is called to execute it. ~SecondoQueryRest~
is a character string (atom) containing a sequence of Secondo
operators that can be appended to a given
plan found by the optimizer; in this case the optimizer returns a
plan producing a stream.

The two versions of ~let~ allow one to assign the result of a query
to a new object ~X~, using the optimizer.

*/
sql Term :-
  isDatabaseOpen,
  mOptimize(Term, Query, Cost),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query).

sql(Term, SecondoQueryRest) :-
  isDatabaseOpen,
  mStreamOptimize(Term, SecondoQuery, Cost),
  my_concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query).

sql2 Term :-
  isDatabaseOpen,
  use_entropy,
  mOptimize(Term, Query, Cost),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query),
  dont_use_entropy.

sql2(Term, SecondoQueryRest) :-
  isDatabaseOpen,
  use_entropy,
  mStreamOptimize(Term, SecondoQuery, Cost),
  my_concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  query(Query),
  dont_use_entropy.

let(X, Term) :-
  isDatabaseOpen,
  mOptimize(Term, Query, Cost),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  my_concat_atom(['let ', X, ' = ', Query], '', Command),
  secondo(Command).

let(X, Term, SecondoQueryRest) :-
  isDatabaseOpen,
  mStreamOptimize(Term, SecondoQuery, Cost),
  my_concat_atom([SecondoQuery, ' ', SecondoQueryRest], '', Query),
  nl, write('The best plan is: '), nl, nl, write(Query), nl, nl,
  write('Estimated Cost: '), write(Cost), nl, nl,
  my_concat_atom(['let ', X, ' = ', Query], '', Command),
  secondo(Command).


/*
----    streamOptimize(Term, Query, Cost) :-
----

Optimize the ~Term~ producing an incomplete Secondo query plan ~Query~
returning a stream.

*/
streamOptimize(Term, Query, Cost) :-
  callLookup(Term, Term2),
  queryToStream(Term2, Plan, Cost),
  plan_to_atom(Plan,  Query).

/*
----    mOptimize(Term, Query, Cost) :-
        mStreamOptimize(Term, Query, Cost) :-
----

Means ``multi-optimize''. Optimize a ~Term~ possibly consisting of several
subexpressions to be independently optimized, as in union and intersection
queries. ~mStreamOptimize~ is a variant returning a stream.
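
The way the partial plans are glued together can be sketched without calling
the optimizer. In the fragment below, ~subPlan~ is a hypothetical stand-in for
~streamOptimize~ that returns made-up plans and costs, ~unionPlan~ is a
hypothetical name, and the built-in ~atomic\_list\_concat~ of SWI-Prolog is
used in place of the concatenation predicate defined elsewhere in this system:

----    % subPlan(+Term, -Plan, -Cost): canned plans, made up for this sketch.
        subPlan(q1, 'Staedte feed ', 10).
        subPlan(q2, 'Orte feed ', 20).

        % unionPlan(+Terms, -Query, -Cost)
        unionPlan([Term], Query, Cost) :-
          !,
          subPlan(Term, Part, Cost),
          atomic_list_concat([Part, 'sort rdup '], Query).
        unionPlan([Term | Terms], Query, Cost) :-
          subPlan(Term, Plan1, Cost1),
          unionPlan(Terms, Plan2, Cost2),
          atomic_list_concat([Plan1, 'sort rdup ', Plan2, 'mergeunion '],
                             Query),
          Cost is Cost1 + Cost2.
----

For the two terms q1 and q2, each partial plan is sorted and made
duplicate-free, the two streams are merged by ~mergeunion~, and the costs add
up to 30.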

*/

:-op(800, fx, union).
:-op(800, fx, intersection).

mOptimize(union Terms, Query, Cost) :-
  mStreamOptimize(union Terms, Plan, Cost),
  my_concat_atom([Plan, 'consume'], '', Query).

mOptimize(intersection Terms, Query, Cost) :-
  mStreamOptimize(intersection Terms, Plan, Cost),
  my_concat_atom([Plan, 'consume'], '', Query).

mOptimize(Term, Query, Cost) :-
  optimize(Term, Query, Cost).

mStreamOptimize(union [Term], Query, Cost) :-
  streamOptimize(Term, QueryPart, Cost),
  my_concat_atom([QueryPart, 'sort rdup '], '', Query).

mStreamOptimize(union [Term | Terms], Query, Cost) :-
  streamOptimize(Term, Plan1, Cost1),
  mStreamOptimize(union Terms, Plan2, Cost2),
  my_concat_atom([Plan1, 'sort rdup ', Plan2, 'mergeunion '], '', Query),
  Cost is Cost1 + Cost2.

mStreamOptimize(intersection [Term], Query, Cost) :-
  streamOptimize(Term, QueryPart, Cost),
  my_concat_atom([QueryPart, 'sort rdup '], '', Query).

mStreamOptimize(intersection [Term | Terms], Query, Cost) :-
  streamOptimize(Term, Plan1, Cost1),
  mStreamOptimize(intersection Terms, Plan2, Cost2),
  my_concat_atom([Plan1, 'sort rdup ', Plan2, 'mergesec '], '', Query),
  Cost is Cost1 + Cost2.

mStreamOptimize(Term, Query, Cost) :-
  streamOptimize(Term, Query, Cost).

/*
Some auxiliary predicates.

*/

listCounters :-
  secondo('list counters').

bestPlanCount :-
  bestPlan(P, _),
  plan_to_atom(P, S),
  atom_concat(S, ' count', Q),
  nl, write(Q), nl,
  query(Q).

bestPlanConsume :-
  bestPlan(P, _),
  plan_to_atom(P, S),
  atom_concat(S, ' consume', Q),
  nl, write(Q), nl,
  query(Q).

desplay(int, N) :-
  !,
  write(N),nl,
  fail.

entropySel( 0, Target, Sel ) :-
  entropy_node( Target, Sel ).

entropySel( Source, Target, Sel ) :-
  entropy_node( Source, P1 ),
  entropy_node( Target, P2 ),
  P1 > 0,
  Sel is P2 / P1.
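
/*
A small worked example of ~entropySel~, assuming two hypothetical node facts
(the values are invented for illustration). The selectivity of an edge is the
ratio of the value stored for the target node to the value stored for the
source node; for source node 0 the value of the target node is used directly.

----    entropy_node(1, 0.5).
        entropy_node(3, 0.125).

        % ?- entropySel(1, 3, Sel).
        % Sel = 0.25          (0.125 / 0.5)

        % ?- entropySel(0, 3, Sel).
        % Sel = 0.125         (first clause)
----

*/
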
/*
Currently this assumes an implicit order. It should be altered to work in the
same way as the conditional probabilities.

*/

createMarginalProbabilities( MP ) :-
  createMarginalProbability( 0, MP ).

createMarginalProbability( N, [Sel|T] ) :-
  edgeSelectivity(N, M, Sel),
  createMarginalProbability( M, T ).

createMarginalProbability( _, [] ).

createJointProbabilities( JP ) :-
  createJointProbability( 0, 1, [_|JP] ).

createJointProbability( N0, AccSel, [[N1,CP1]|T] ) :-
  small_cond_sel( N0, N1, _, Sel ),
  CP1 is Sel * AccSel,
  createJointProbability( N1, CP1, T ).

createJointProbability( _, _, [] ).
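
/*
To illustrate the accumulation in ~createJointProbability~, assume two
hypothetical facts chaining the nodes 0, 1 and 3. The values and the
third arguments ~pred1~ and ~pred2~ are invented for illustration.

----    small_cond_sel(0, 1, pred1, 0.5).
        small_cond_sel(1, 3, pred2, 0.25).

        % ?- createJointProbability(0, 1, L).
        % first solution: L = [[1, 0.5], [3, 0.125]]

        % ?- createJointProbabilities(JP).
        % first solution: JP = [[3, 0.125]]
----

Each entry pairs a node with the product of the selectivities found on the
path from node 0 to that node. ~createJointProbabilities~ drops the first
entry of the list.

*/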


assignEntropyCost :-
  createSmallResultSizes, !,
  createSmallSelectivity, !,
  createMarginalProbabilities( MP ),!,
  createJointProbabilities( JP ),!,
  saveFirstSizes,
  deleteSizes,
  deleteCostEdges,
  maximize_entropy(MP, JP, Result), !,
  createEntropyNode(Result),
  assignEntropySizes,
  write( MP ), write(', ') , write( JP ), nl, nl,
  createCostEdges.

assignEntropySizes :- not(assignEntropySizes1).

assignEntropySizes1 :-
  edge(Source, Target, Term, Result, _, _),
  assignEntropySize(Source, Target, Term, Result),
  fail.

assignEntropySize(Source, Target, select(Arg, _), Result) :-
  resSize(Arg, Card),
  entropySel(Source, Target, Sel),
  Size is Card * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).

assignEntropySize(Source, Target, join(Arg1, Arg2, _), Result) :-
  resSize(Arg1, Card1),
  resSize(Arg2, Card2),
  entropySel(Source, Target, Sel),
  Size is Card1 * Card2 * Sel,
  setNodeSize(Result, Size),
  assert(edgeSelectivity(Source, Target, Sel)).

deleteEntropyNodes :-
  retractall(entropy_node(_,_)).

createEntropyNode( [] ).
createEntropyNode( [[N,E]|L] ) :-
  assert(entropy_node(N,E)),
  createEntropyNode( L ).

:- dynamic
   smallResultSize/2,
   smallResultCounter/4,
   entropy_node/2,
   small_cond_sel/4,
   useEntropy/0,
   firstResultSize/2,
   firstEdgeSelectivity/3.

useEntropy.

use_entropy :-
  assert(useEntropy).

dont_use_entropy :-
  retractall(useEntropy).

deleteSmallResults :-
  deleteSmallResultCounter,
  deleteSmallResultSize,
  deleteSmallSelectivity,
  deleteFirstSizes.

saveFirstSizes :-
  not(copyFirstResultSize), !,
  not(copyFirstEdgeSelectivity).

copyFirstResultSize :-
  resultSize(Result, Size),
  assert(firstResultSize(Result, Size)),
  fail.

copyFirstEdgeSelectivity :-
  edgeSelectivity(Source, Target, Sel),
  assert(firstEdgeSelectivity(Source, Target, Sel)),
  fail.

deleteFirstSizes :-
  retractall(firstResultSize(_,_)),
  retractall(firstEdgeSelectivity(_,_,_)).

quit :-
  halt.

argList( 1, [_] ).
argList( N, [_|L] ) :-
  N > 1,
  N1 is N-1,
  argList( N1, L ).

showValues( Pred, Arity ) :-
  not(showValues2( Pred, Arity )).

showValues2( Pred, Arity ) :-
  argList( Arity, L ),
  P=..[Pred|L], !, P, nl, write( P ), fail.