/*
----
This file is part of SECONDO.

Copyright (C) 2004, University in Hagen, Department of Computer Science,
Database Systems for New Applications.

SECONDO is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

SECONDO is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with SECONDO; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
----

//paragraph [1] Title: [{\Large \bf \begin{center}] [\end{center}}]
//paragraph [2] Center: [{\begin{center}] [\end{center}}]
//paragraph [10] Footnote: [{\footnote{] [}}]
//paragraph [44] table4columns: [\begin{quote}\begin{tabular}{llll}] [\end{tabular}\end{quote}]

//characters [20] verbatim: [\verb@] [@]
//characters [21] formula: [$] [$]
//characters [22] capital: [\textsc{] [}]
//characters [23] teletype: [\texttt{] [}]

//[--------] [\hline]
//[TOC] [\tableofcontents]
//[p] [\par]
//[@] [\@]
//[LISTING-SH] [\lstsetSH]

[1] A quick introduction to the PostgreSQL DBMS


[2] Database Systems for new Applications [p]
University of Hagen [p]
http://www.informatik.fernuni-hagen.de/secondo [p]


Author: M. Spiekermann, Last Changes: 2007-02-13

[TOC]

1 Introduction

PostgreSQL is a popular open source DBMS which evolved from POSTGRES, the
successor of INGRES. Sometimes it is interesting to compare it with Secondo.
Hence we give a short overview of how to install it on a Linux system, how to
create databases, and how to create objects and populate them with data.
However, this is just a rough introduction; for further details consult the
PostgreSQL documentation, which is available as HTML files below
/usr/share/doc/packages/postgresql/html.


2 Installation on Linux

Start the package manager (on SuSE Linux it is called YaST) and select all
packages whose names start with postgres.

3 Environment Setup

Before you can create a database you need to define and initialize a
so-called data storage area or database cluster. The location of this
directory should be defined in the environment variable "PGDATA"[20]. The
directory must be readable and writable only by the Linux user who acts as
the database administrator.

In order to set up the storage area, run the following commands as this user:

[LISTING-SH]

*/
export PGDATA=/data/postgres-databases   # location of the database cluster
mkdir "$PGDATA"                          # create the (empty) directory
chmod go-rwx "$PGDATA"                   # only the owner may access it
initdb -D "$PGDATA"                      # initialize the database cluster

/*

Afterwards the directory "$PGDATA" contains about 26MB of data. The variable
"PGDATA" should be defined in the shell's startup script (".bashrc");
otherwise you have to define it again in every new shell. Now we can start up
the database server process, which is called "postmaster":

*/
postmaster -D "$PGDATA"    # "-D" may be omitted if PGDATA is set
/*

It runs in the foreground and writes its log messages to the terminal.
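
Putting the definition of "PGDATA" into ".bashrc" can also be scripted. The
following is only a minimal sketch and is not part of the setup described
above. The function name "persist_pgdata" is hypothetical. The function
appends the export line only if the startup script does not contain it yet,
so repeated runs do not duplicate the definition.

*/
```shell
# Hypothetical helper: append "export PGDATA=<dir>" to a shell startup
# script unless exactly this line is already present.
# Usage: persist_pgdata <rc-file> <data-directory>
persist_pgdata() {
  local rc="$1"
  local line="export PGDATA=$2"
  # -q: no output, -x: match whole lines only, -F: fixed string, no regex
  if ! grep -qxF "$line" "$rc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc"
  fi
}

# Example with the directory used above:
#   persist_pgdata "$HOME/.bashrc" /data/postgres-databases
```
/*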

4 Creating Databases

The utility "createdb" can be used to create a database, e.g. the
command

*/
createdb tpch
/*

will create a database called "tpch", which adds 31MB to the storage area.
The text-based database client is called "psql". Client-internal commands
start with a "\" symbol; for example, "\?" lists all client-internal commands
and "\q" quits the session. The command

*/
psql -d tpch
/*

establishes a connection to the "tpch" database. The command prompt now shows
the name of the connected database. Some useful client commands are:

*/
tpch=# \dt           % display tables
tpch=# \di           % display indexes
tpch=# \i <file>     % execute the commands stored in a file
tpch=# \s <file>     % save the command history to a file
tpch=# \h select     % explain the syntax of the select statement
tpch=# \q            % disconnect and exit
/*

5 Creating Objects

Once you are connected to a database, the "create table" command can be used
to define a relation:

*/
create table customer (
  C_CUSTKEY    int4,
  C_NAME       varchar(25),
  C_ADDRESS    varchar(40),
  C_NATIONKEY  int4,
  C_PHONE      char(15),
  C_ACCTBAL    float4,
  C_MKTSEGMENT char(10),
  C_COMMENT    varchar(117)
);
/*

Afterwards you can populate it with tuples by importing a text file. Each line
is interpreted as a tuple, and a field separator can be specified which marks
the end of an attribute value. This is done with the special client command
"\copy", e.g.

*/
\copy customer FROM 's05pp/customer.tbl.pg' WITH DELIMITER AS '|';
/*

reads the tuple data from the file "s05pp/customer.tbl.pg". An index can be
created with

*/
create index customer_c_custkey on customer(c_custkey);
/*

Another kind of object is the sequence, a generator of successive integer
values. The following commands create a sequence "serial" which starts at 1
and fetch its first two values:

*/
create sequence serial start 1;
select nextval('serial');   -- returns 1
select nextval('serial');   -- returns 2
/*

Sometimes it is necessary to store query results as new relations. This can be
done with the "create table <ident> as" command. Moreover, new attribute values
can be computed from the existing tuple values by writing expressions over the
available functions and operators, e.g.

*/
create table customer_s100
  as select C_CUSTKEY, C_NAME,
            nextval('serial') % 100 as C_NUM   -- "%" is the modulo operator
  from customer;
/*
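The "\copy" command expects exactly one field per table column in every line.
Data files generated by the TPC-H tool dbgen, however, end every line with the
delimiter. This yields one empty field too many, and COPY rejects such lines
with an "extra data after last expected column" error. We assume, although the
text does not say so, that the suffix ".pg" of "s05pp/customer.tbl.pg" marks a
file from which this trailing delimiter has been removed. The following is a
minimal sketch of such a preparation step; both function names are
hypothetical.

*/
```shell
# Hypothetical helper: count the lines of a delimited text file whose number
# of '|'-separated fields differs from the number of table columns.
# Usage: count_bad_lines <file> <number-of-columns>
count_bad_lines() {
  awk -F'|' -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# Hypothetical helper: remove one trailing '|' from every line.
# Usage: strip_trailing_delim <infile> <outfile>
strip_trailing_delim() {
  sed 's/|$//' "$1" > "$2"
}

# Example for the customer table, which has 8 columns:
#   count_bad_lines customer.tbl 8        # dbgen output: every line is counted
#   strip_trailing_delim customer.tbl customer.tbl.pg
#   count_bad_lines customer.tbl.pg 8     # 0 if the file matches the table
```
/*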

6 Investigating Query Plans

If a query is prefixed with "explain" or "explain analyze", the query plan
chosen by the planner is printed. The second variant also executes the query
and displays the estimated costs and tuple cardinalities together with the
actual runtimes and cardinalities.

*/
explain <query>
explain analyze <query>
/*
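Each node line of a plan has the form "(cost=S..T rows=R width=W)". S is the
estimated startup cost (spent before the first result tuple is returned), T is
the estimated total cost, R is the estimated number of result tuples, and W is
their estimated average width in bytes. Costs are measured in units of a
sequential page fetch. When many plans have to be compared, these numbers can
be extracted with standard text tools. The following minimal sketch uses sed;
the function name "plan_costs" is hypothetical.

*/
```shell
# Hypothetical helper: read EXPLAIN output from stdin and print
# "startup-cost total-cost rows" for every plan node line.
# Lines without a cost estimate (e.g. "Filter: ...") are skipped.
plan_costs() {
  sed -n 's/.*(cost=\([0-9.]*\)\.\.\([0-9.]*\) rows=\([0-9]*\) .*$/\1 \2 \3/p'
}

# Example (database and tables as in the text):
#   psql -d tpch -c 'explain select count(*) from m1' | plan_costs
```
/*

For "explain analyze" output, the additional "(actual time=...)" part of a
line is ignored, so only the estimates are printed.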

7 Maintenance

The query planner needs accurate statistics about the data. It uses samples of
the data to estimate the frequency distribution of the values of a table
attribute. These estimates are updated by the command "analyze", which
collects statistics about the contents of the tables in the database and
stores the results in the system table "pg_statistic".

In normal PostgreSQL operation, tuples that are deleted or made obsolete by an
update are not physically removed from their table. They remain present until
the command "vacuum" is called, which reclaims the storage occupied by such
dead tuples. Hence the administrator should run

*/
vacuum analyze;
/*

after substantial updates. This reclaims storage and refreshes the planner
statistics in one step.

8 Tuning

With the "set" command, various runtime parameters can be changed for the
current session. This can be useful to force or to disable certain evaluation
methods for relational algebra expressions. For example, the statement below
discourages the planner from using index scans. Such plans are not forbidden
entirely but receive a very high cost penalty.

*/
set enable_indexscan = off;
/*
8.1 Adjusting Cost Factors

SQL statements can be translated into different execution plans which compute
the same result. The planner (or optimizer) module uses data statistics, cost
functions and some basic cost factors to rate such plans. The optimization
algorithm systematically produces subplans and prunes inefficient ones. The
result of this process is the plan with the lowest estimated cost. This is not
necessarily the fastest plan, since the estimates suffer from three sources of
error:

(1) Imprecise statistics
(2) Imprecise cost functions
(3) Imprecise cost factors

Some important cost factors are:

*/
show cpu_tuple_cost;      -- cost of processing one tuple
show cpu_operator_cost;   -- cost of evaluating one operator or function
/*
These factors are floating-point values which express the time needed for the
corresponding operation as a fraction of the time of a sequential page fetch.
They can be determined by running some measurement queries.

First you need to create relations $R_1, R_2$ with the same number of tuples
but different tuple sizes, and hence different numbers of pages $P_1, P_2$.
Moreover, the relations should be bigger than the main memory, so that the
pages really have to be read from disk. Since the scans of both relations
process the same number of tuples, their runtime difference is caused by the
page fetches alone, i.e. $|t_{q1} - t_{q2}| = T_{pc} |P_1 - P_2|$. Here
$t_{qi}$ is the runtime of a query which scans relation $R_i$, and $T_{pc}$ is
the time for fetching one page.

Afterwards one can measure the time $T_{tc}$ for processing a tuple by
constructing relations with the same number of pages but different numbers of
tuples $N_1, N_2$. Again the runtime difference of the scans yields the
processing overhead for a single tuple, $|t_{q1} - t_{q2}| = T_{tc} |N_1 - N_2|$.

Finally, queries applying a different number of operators are used to compute
the time $T_{oc}$ needed for a single operator. Dividing $T_{tc}$ and $T_{oc}$
by $T_{pc}$ yields values for "cpu_tuple_cost" and "cpu_operator_cost".
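
The computation can be sketched with a few lines of shell code. Each time per
unit is the runtime difference divided by the difference of the varied
quantity: pages for $T_{pc}$, tuples for $T_{tc}$ and operator evaluations for
$T_{oc}$. A cost factor is such a time divided by $T_{pc}$. The function names
and all timings in the example are hypothetical and are not measurements.

*/
```shell
# Hypothetical helper: time per unit from two runs that differ only in one
# quantity, i.e. |t1 - t2| / |u1 - u2|.
# Usage: per_unit <t1> <t2> <u1> <u2>
per_unit() {
  awk -v t1="$1" -v t2="$2" -v u1="$3" -v u2="$4" 'BEGIN {
    dt = t1 - t2; if (dt < 0) dt = -dt
    du = u1 - u2; if (du < 0) du = -du
    printf "%.6g\n", dt / du
  }'
}

# Hypothetical helper: express a time relative to the page fetch time T_pc,
# which is how cpu_tuple_cost and cpu_operator_cost are defined.
# Usage: cost_factor <time> <T_pc>
cost_factor() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.6g\n", t / p }'
}

# Example with made-up timings in seconds (not real measurements):
#   T_pc=$(per_unit 60 35 250000 125000)    # 250000 vs. 125000 pages
#   T_tc=$(per_unit 38 35 4000000 2000000)  # same pages, 4 vs. 2 million tuples
#   T_oc=$(per_unit 36 35 6000000 2000000)  # 3 vs. 1 operators on 2 million tuples
#   cost_factor "$T_tc" "$T_pc"             # cpu_tuple_cost candidate: 0.0075
#   cost_factor "$T_oc" "$T_pc"             # cpu_operator_cost candidate: 0.00125
```
/*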

9 Understanding the Postgres Planner

Below are three similar queries which result in different plans.

*/

Q1: explain select count(*) from m1, m2 where m1.a = m2.a and m1.a = 1;
Aggregate (cost=22128.85..22128.85 rows=1 width=0)
  -> Nested Loop (cost=8543.55..22119.35 rows=949638 width=0)
       -> Seq Scan on m2 (cost=0.00..8542.72 rows=978 width=4)
            Filter: (1 = a)
       -> Materialize (cost=8543.55..8546.12 rows=971 width=4)
            -> Seq Scan on m1 (cost=0.00..8543.29 rows=971 width=4)
                 Filter: (a = 1)

Q2: explain select count(*) from m1, m2 where m1.a = m2.a and m2.a < 10;
Aggregate (cost=99334.22..99334.23 rows=1 width=0)
  -> Merge Join (cost=53549.54..99163.39 rows=17083708 width=0)
       Merge Cond: ("outer".a = "inner".a)
       -> Sort (cost=8547.57..8547.74 rows=17246 width=4)
            Sort Key: m2.a
            -> Seq Scan on m2 (cost=0.00..8542.72 rows=17246 width=4)
                 Filter: (a < 10)
       -> Sort (cost=45001.97..45011.97 rows=1000110 width=4)
            Sort Key: m1.a
            -> Seq Scan on m1 (cost=0.00..8533.29 rows=1000110 width=4)

Q3: explain select count(*) from m1, m2 where m1.a = m2.a and m1.a < 10;
Aggregate (cost=80644.17..80644.17 rows=1 width=0)
  -> Hash Join (cost=8543.39..80543.07 rows=10109754 width=0)
       Hash Cond: ("outer".a = "inner".a)
       -> Seq Scan on m2 (cost=0.00..8532.72 rows=999894 width=4)
       -> Hash (cost=8543.29..8543.29 rows=10208 width=4)
            -> Seq Scan on m1 (cost=0.00..8543.29 rows=10208 width=4)
                 Filter: (a < 10)

/*
Note that in Q1 the planner rewrites the query and adds the additional
predicate "m2.a = 1", which appears as "Filter: (1 = a)" on "m2". This is
possible since an equi-join only produces matches for equal values, so the
restriction "m1.a = 1" carries over to "m2.a". In addition, hash joins and
merge joins apparently cannot be used here. They are never chosen for this
query, not even with the configuration option "enable_nestloop = off", which
merely raises the total costs to about 100,000,000. A plausible explanation is
that the join predicate becomes redundant once both inputs are restricted to
"a = 1". The estimated 949638 tuples of the nested loop are exactly
$978 \cdot 971$, i.e. the planner treats the join as a cross product of its two
filtered inputs, and a cross product offers no join condition which a hash
join or merge join could use.

Surprisingly, this technique is not applied to the queries Q2 and Q3, even
though it could reduce costs; apparently only equality predicates are
propagated, not range predicates such as "a < 10". Moreover, one can observe
that the estimates for "m1.a < 10" and "m2.a < 10" differ considerably (10208
vs. 17246 tuples), despite the fact that relation "m2" is a copy of "m1". After
each command which updates the statistics samples, e.g. "analyze m1", the
estimate changes. Note: the level of detail of the statistics about data
distributions can be configured per column ("alter table ... alter column ...
set statistics") or globally by the parameter "default_statistics_target".

Adding the redundant (totally correlated) predicate "m2.b = 2" to Q1 misguides
the planner. It chooses a very expensive plan based on the estimate that the
scan on "m2" returns only 1 tuple (actually 1000 tuples). This leads to a
nested loop join without materialization of the intermediate result, hence
"m1" is scanned 1000 times, once for each qualifying tuple of "m2". This is a
good demonstration of the need for robust query optimization as claimed in
[xxx].

*/
Q4: Q1 and m2.b = 2
Aggregate (cost=17099.17..17099.17 rows=1 width=0)
  -> Nested Loop (cost=0.00..17099.16 rows=971 width=0)
       -> Seq Scan on m2 (cost=0.00..8553.29 rows=1 width=4)
            Filter: ((b = 2) AND (1 = a))
       -> Seq Scan on m1 (cost=0.00..8543.29 rows=971 width=4)
            Filter: (a = 1)
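
/*
The estimate of 1 tuple for the scan on "m2" in Q4 can be retraced with simple
arithmetic. For a conjunction of predicates, the planner multiplies the
selectivities of the single predicates, i.e. it assumes that the attributes are
independent. The following minimal sketch mimics this, under these
assumptions:

  * the function name "est_rows" is hypothetical;

  * the table size of 1000110 tuples is taken from the plan of Q2;

  * the selectivity of "a = 1" is derived from the estimate of 971 tuples;

  * the selectivity 0.001 of "b = 2" is an assumption matching the 1000 tuples
    mentioned above.

*/
```shell
# Hypothetical helper: estimated number of result tuples of a scan whose
# filter is a conjunction of predicates, assuming independent attributes:
# N * sel_1 * sel_2 * ...
# Usage: est_rows <N> <sel_1> [<sel_2> ...]
est_rows() {
  local n="$1"
  shift
  printf '%s\n' "$@" | awk -v r="$n" '{ r *= $1 } END { printf "%.0f\n", r }'
}

est_rows 1000110 0.000971          # "a = 1": 971 tuples, as in Q1
est_rows 1000110 0.000971 0.001    # "b = 2 AND a = 1": 1 tuple, as in Q4

# Rough cost if the planner had known that 1000 tuples of m2 qualify:
# one scan of m2 plus 1000 scans of m1 (CPU costs of the join ignored)
awk 'BEGIN { printf "%.0f\n", 8553.29 + 1000 * 8543.29 }'    # 8551843
```
/*

If, as the term "totally correlated" suggests, "b = 2" holds exactly for the
tuples with "a = 1", then the true selectivity of the conjunction equals that
of "a = 1" alone. The independence assumption therefore underestimates the
result size by about three orders of magnitude.
*/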