:author: Sebastian J. Bronner <sebastian@bronner.name>
:date: 2018-01-11

.. If desired, an HTML version of this file can be generated with the following
   command:

     rst2html5 README.rst > README.html

.. contents::

Linking to Secondo
==================

This algebra is intended to be compiled and used together with Secondo. The
general instructions for linking an algebra to Secondo [#link]_ apply with just
one additional consideration: the library ``boost_filesystem`` must be linked
along with it.

Both the compilation and the linking stage require that the library and its
header files be available. In Ubuntu 16.04 this is achieved by installing the
following packages:

* ``libboost-filesystem1.58-dev``
* ``libboost-filesystem1.58.0``
* ``libboost1.58-dev``

Finally, Secondo has to be instructed how to link this algebra. Besides the
normal entries in ``Algebras/Management/AlgebraList.i.cfg`` and
``makefile.algebras``, an additional line is added to the latter. The complete
entry in ``makefile.algebras`` should be as follows::

  ALGEBRA_DIRS += Distributed4
  ALGEBRAS += Distributed4Algebra
  COMMON_LD_FLAGS += -lboost_filesystem

Configuring Secondo Instances
=============================

This algebra facilitates a somewhat complex configuration of multiple Secondo
instances, so it is not obvious how Secondo should best be configured to try it
out. The following configuration is suitable for that purpose. It was created
on the ``newton`` cluster [#cluster]_ of the department of `Database Systems
for New Applications <http://dna.fernuni-hagen.de/>`_. It should be easy enough
to adapt to any other environment, though.

Create the following files and directories on the named cluster servers:

``~/cfg/supervisor.ini`` [#home]_ on ``newton3`` [#n3]_:

  This is a copy of ``~/secondo/bin/SecondoConfig.ini`` with just a few lines
  changed as follows. Be sure to replace ``<username>`` with your own username.
  Also **create the directory** named in ``SecondoHome``::

    SecondoHome=/home/<username>/dbs/supervisor
    GlobalMemory=1024
    #RTFlags += SMI:NoTransactions

  It is important that ``SMI:NoTransactions`` is **not** enabled because each
  SecondoMonitor will spawn several Secondo processes accessing a single
  database simultaneously.

``~/cfg/master.ini`` on ``newton3``:

  Same as ``supervisor.ini`` above, except that these two lines are changed as
  follows::

    SecondoHome=/home/<username>/dbs/master
    GlobalMemory=8192

``~/cfg/inserter.ini`` on ``newton{1,2,4,5}``:

  Same as ``supervisor.ini`` above, except that these two lines are changed as
  follows::

    SecondoHome=/home/<username>/dbs/inserter
    GlobalMemory=1024

``~/cfg/worker.ini`` on ``newton[1-5]``:

  Same as ``supervisor.ini`` above, except that these two lines are changed as
  follows::

    SecondoHome=/home/<username>/dbs/worker
    GlobalMemory=2048

``~/cfg/monitors.cfg`` [#ports]_ on ``newton3``::

  newton1 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
  newton2 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
  newton3 ~/cfg/master.ini ~/secondo/bin ~/dbs/master 10070
  newton4 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
  newton5 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
  newton1 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
  newton2 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
  newton3 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
  newton4 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
  newton5 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
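
Since the ten lines of ``monitors.cfg`` follow a regular pattern, they could
also be generated when adapting the setup to another environment. The following
Python sketch is not part of Secondo. The function name ``monitor_lines`` and
its parameters are made up for illustration, and it assumes the layout used
above: one master, inserters on all other hosts, and one worker per host.

```python
def monitor_lines(hosts, master, peer_port, worker_port):
    """Return monitors.cfg lines: one peer and one worker instance per host."""
    lines = []
    # Peer instances: the master on one host, inserters on all others.
    for host in hosts:
        role = "master" if host == master else "inserter"
        lines.append(
            f"{host} ~/cfg/{role}.ini ~/secondo/bin ~/dbs/{role} {peer_port}"
        )
    # Worker instances: one on every host, listening on a second port.
    for host in hosts:
        lines.append(
            f"{host} ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker {worker_port}"
        )
    return lines


hosts = [f"newton{i}" for i in range(1, 6)]
print("\n".join(monitor_lines(hosts, "newton3", 10070, 10071)))
```

Redirecting the output to ``~/cfg/monitors.cfg`` should reproduce the file
shown above.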

Finally, start up the Secondo instances with either of the following commands::

  ~/secondo/bin/remoteMonitors ~/cfg/monitors.cfg start
  ~/secondo/bin/remoteMonitors2 ~/cfg/monitors.cfg start

The only difference is that ``remoteMonitors2`` produces somewhat more concise
output.

Preparing the Databases
=======================

Run the following commands on ``newton3`` to create and prepare a test database
named ``D4`` on all configured Secondo instances::

  cd ~/secondo/bin

Repeat the following sequence for every host/port combination specified in
``monitors.cfg`` above::

  ./SecondoTTYCS -h newton1 -p 10070

  create database d4;
  quit;

The remainder will be configured from the supervisor::

  ./SecondoTTYBDB -c ~/cfg/supervisor.ini

  create database d4;
  open database d4;
  let peers = [const rel(tuple([Host: string, Port: int, Config: string]))
  value (
    ("newton1.fernuni-hagen.de" 10070 "-")
    ("newton2.fernuni-hagen.de" 10070 "-")
    ("newton3.fernuni-hagen.de" 10070 "-")
    ("newton4.fernuni-hagen.de" 10070 "-")
    ("newton5.fernuni-hagen.de" 10070 "-")
  )];
  let workers = [const rel(tuple([Host: string, Port: int, Config: string]))
  value (
    ("newton1.fernuni-hagen.de" 10071 "-")
    ("newton2.fernuni-hagen.de" 10071 "-")
    ("newton3.fernuni-hagen.de" 10071 "-")
    ("newton4.fernuni-hagen.de" 10071 "-")
    ("newton5.fernuni-hagen.de" 10071 "-")
  )];
  let da = intstream(0,99) namedtransformstream[Id] extend[Value: .Id*1.1]
  ddistribute3["da", 10, FALSE, workers];
  query share("da", TRUE, peers);
  let dp = [const dpartition(Id: int)
  value (
    ((0 0) (10 1) (20 2) (30 3) (40 4) (50 5) (60 6) (70 7) (80 8) (90 9))
    da
  )];
  query share("dp", TRUE, peers);
  let ad = [const adist
  value (dp 1000 (
    ("newton1.fernuni-hagen.de" 10070 "-")
    ("newton2.fernuni-hagen.de" 10070 "-")
    ("newton3.fernuni-hagen.de" 10070 "-")
    ("newton4.fernuni-hagen.de" 10070 "-")
    ("newton5.fernuni-hagen.de" 10070 "-")
  ))];
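
The ``peers`` and ``workers`` relations above differ only in their name and
port, so on a cluster with other host names it may be easier to generate the
``let`` commands than to type them. The following Python sketch only builds the
command text to paste into Secondo. The helper ``relation_command`` is a
made-up name and not part of Secondo.

```python
def relation_command(name, hosts, port, config="-"):
    """Build a Secondo `let` command defining a Host/Port/Config relation."""
    rows = "\n".join(f'  ("{host}" {port} "{config}")' for host in hosts)
    return (
        f"let {name} = "
        "[const rel(tuple([Host: string, Port: int, Config: string]))\n"
        f"value (\n{rows}\n)];"
    )


hosts = [f"newton{i}.fernuni-hagen.de" for i in range(1, 6)]
print(relation_command("peers", hosts, 10070))
print(relation_command("workers", hosts, 10071))
```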

Trying Out Some Operators
=========================

With the exception of the three lock operators, all operators implemented in
this algebra are intended for use on the supervisor and require an object of
type *adist* as an argument. They are designed to **redistribute** the records
stored in the underlying *darray* partitioned by the *dpartition*. Operators to
create, read, update, or delete records based on their value are not part of
this algebra. Such operators are expected to take the *dpartition* as an
argument. They should be operable on any of the peers defined in the *adist*, as
they all get a synchronized copy of the *dpartition* and *darray*.

For this reason, all of these examples will be executed on the supervisor. It
might be interesting to look in on the workers or peers just to see how the
database objects have changed, though. That is completely up to you, and will
not be included in the steps outlined here [#debug]_::

  cd ~/secondo/bin
  ./SecondoTTYBDB -c ~/cfg/supervisor.ini

  open database d4;

If you look at the output of ``query da``, you will see that there are ten slots
arranged in standard order on the five workers. Standard order is defined in
*Distributed2Algebra* and means that the slots are assigned to workers round
robin. In our case that means the following:

* slot 0 -> worker 0 (``newton1:10071``)
* slot 1 -> worker 1 (``newton2:10071``)
* slot 2 -> worker 2 (``newton3:10071``)
* slot 3 -> worker 3 (``newton4:10071``)
* slot 4 -> worker 4 (``newton5:10071``)
* slot 5 -> worker 0
* slot 6 -> worker 1
* slot 7 -> worker 2
* slot 8 -> worker 3
* slot 9 -> worker 4
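
The round-robin assignment listed above can be written down in one line. This
Python sketch only models the slot-to-worker map as a list. The function name
``standard_order`` is chosen for illustration and does not exist in Secondo.

```python
def standard_order(num_slots, num_workers):
    """Assign slots to workers round robin: slot i goes to worker i mod n."""
    return [slot % num_workers for slot in range(num_slots)]


slot_map = standard_order(10, 5)
print(slot_map)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```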

First, we'll move slot 2 from worker 2 to worker 3::

  query moveslot(ad, 2, 3);

The effect of this can be seen on worker 2 (``da_2`` is missing), worker 3
(``da_2`` is now there), and the supervisor (``da``'s slot-to-worker map should
now be ``(0 1 3 3 4 0 1 2 3 4)``, indicating that both slots 2 and 3 are now on
worker 3).
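
As far as the slot-to-worker map is concerned, moving a slot just changes one
entry. The following Python sketch models only that bookkeeping. The real
operator also transfers the slot's data between the workers, which is not shown
here, and ``move_slot`` is a hypothetical helper name.

```python
def move_slot(slot_map, slot, worker):
    """Return a copy of slot_map with `slot` reassigned to `worker`."""
    if not 0 <= slot < len(slot_map):
        raise IndexError(f"no such slot: {slot}")
    moved = list(slot_map)
    moved[slot] = worker
    return moved


before = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
after = move_slot(before, 2, 3)  # corresponds to moveslot(ad, 2, 3)
print(after)  # [0, 1, 3, 3, 4, 0, 1, 2, 3, 4]
```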

Now let's split ``da_2`` into two slots::

  query splitslot(ad, 2);

The output from this query is ``10``. This is the number of the slot that
received half of the values formerly in slot 2. In addition to the slots it
previously held, worker 3 now also holds slot 10. ``da``'s map now contains an
eleventh element naming worker 3: ``(0 1 3 3 4 0 1 2 3 4 3)``. This time ``dp``
has changed as well. It now has two partitions for values between 20 and 30,
split between slot 2 and slot 10: ``(20.0 2) (25.0 10)``.
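
To see how the split affects both objects, here is a small Python model. It is
only a sketch: it assumes that slot 2 holds the keys 20 to 29 and that the
split point is the first key of the upper half, which matches the result above
but is not necessarily how the operator chooses it. ``split_slot`` is a made-up
name.

```python
def split_slot(partitions, slot_map, slot, keys_in_slot):
    """Split `slot` in half; return (partitions, slot_map, new_slot)."""
    keys = sorted(keys_in_slot)
    split_key = keys[len(keys) // 2]       # first key of the upper half
    new_slot = len(slot_map)               # the new slot is appended at the end
    new_partitions = sorted(partitions + [(split_key, new_slot)])
    new_map = slot_map + [slot_map[slot]]  # the new slot stays on the same worker
    return new_partitions, new_map, new_slot


# (start of value range, slot) pairs, as in the definition of dp
partitions = [(0, 0), (10, 1), (20, 2), (30, 3), (40, 4),
              (50, 5), (60, 6), (70, 7), (80, 8), (90, 9)]
slot_map = [0, 1, 3, 3, 4, 0, 1, 2, 3, 4]

partitions, slot_map, new_slot = split_slot(partitions, slot_map, 2, range(20, 30))
print(new_slot)         # 10
print(partitions[2:4])  # [(20, 2), (25, 10)]
print(slot_map)         # [0, 1, 3, 3, 4, 0, 1, 2, 3, 4, 3]
```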

The next logical step would be to merge two slots. If we merged slot 2 and
slot 3, we would get a slot with a hole in it: slot 2 contains values between
20 and 25 and slot 3 values between 30 and 40. The values between 25 and 30 are
in slot 10. Let's see what Secondo thinks about this attempt::

  query mergeslots(ad, 2, 3);

You should have received the message, ``Error: The partitions starting with
20.000000 and 30.000000 are not contiguous.`` The *Distributed4Algebra* doesn't
allow slots with holes in their mapped value range, as we would probably expect
intuitively.
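
The contiguity rule can be illustrated with the same kind of model. In this
Python sketch two slots count as contiguous if their partitions are neighbours
when sorted by the start of their value range. It assumes that every slot
appears in exactly one partition, and ``are_contiguous`` is a hypothetical
helper, not an operator of this algebra.

```python
def are_contiguous(partitions, slot_a, slot_b):
    """True if the partitions of the two slots are neighbours in key order."""
    ordered = [slot for _, slot in sorted(partitions)]
    return abs(ordered.index(slot_a) - ordered.index(slot_b)) == 1


# State of dp after splitslot(ad, 2): slot 10 sits between slots 2 and 3.
partitions = [(0, 0), (10, 1), (20, 2), (25, 10), (30, 3), (40, 4),
              (50, 5), (60, 6), (70, 7), (80, 8), (90, 9)]

print(are_contiguous(partitions, 2, 3))   # False
print(are_contiguous(partitions, 10, 3))  # True
```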

So, let's merge slots 10 and 3 instead, to get a slot having values between 25
and 40::

  query mergeslots(ad, 10, 3);

The value ``3`` is returned. This indicates that the slot now holding the
combined data is slot 3. Slot 10 was removed from the system. (This happens
whenever the last slot in a *darray* is no longer needed.) ``dp``'s partitions
confirm the new value range for slot 3: ``(25.0 3)`` and ``da``'s map shows that
slot 10 is no more: ``(0 1 3 3 4 0 1 2 3 4)``.

Besides manipulating slots, this algebra also enables the removal and addition
of both workers and peers. Peers are any Secondo instances that contain
synchronized copies of the *darray* and *dpartition* objects ``da`` and ``dp``.
They can operate on the workers independently. Usually these will be a master
and several inserters. *Distributed4Algebra* is not concerned with their
purpose, so they are simply called peers. Peers are recorded in the *adist*
object ``ad`` while workers are recorded in the *darray* object ``da`` (as known
from *Distributed2Algebra*).

Let's disconnect a worker first. Let's take worker 2::

  query removeworker(ad, 2);

Oops! Apparently, I was too quick about it: ``Error: The specified worker still
has slots mapped to it.`` Let's move the remaining slot on worker 2 to worker 0
and then remove worker 2. According to ``da``'s slot-to-worker map, the slot on
worker 2 is slot 7::

  query moveslot(ad, 7, 0);
  query removeworker(ad, 2);

This time it worked (return value ``TRUE``). Looking at ``da``'s worker list
now shows four workers. The former worker 2 (``newton3``) is missing. This also
means that the former worker 3 is now worker 2 and the former worker 4 is now
worker 3. The slot-to-worker map has taken these new worker numbers into
account: ``(0 1 2 2 3 0 1 0 2 3)``.
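
The renumbering is easy to get wrong when reading the map by hand, so here it
is as a Python model. This is only a sketch of the bookkeeping described above,
with ``remove_worker`` as a made-up name. It refuses to remove a worker that
still has slots and otherwise shifts all higher worker numbers down by one.

```python
def remove_worker(slot_map, worker):
    """Return the slot map after removing `worker` from the worker list."""
    if worker in slot_map:
        raise ValueError("The specified worker still has slots mapped to it.")
    # Workers listed after the removed one move up by one position.
    return [w - 1 if w > worker else w for w in slot_map]


slot_map = [0, 1, 3, 3, 4, 0, 1, 2, 3, 4]
try:
    remove_worker(slot_map, 2)
except ValueError as error:
    print(error)

slot_map[7] = 0                    # corresponds to moveslot(ad, 7, 0)
print(remove_worker(slot_map, 2))  # [0, 1, 2, 2, 3, 0, 1, 0, 2, 3]
```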

Now let's remove one of the peers. Let's pick peer 1 at random::

  query removepeer(ad, 1);

This operation shows a lot less communication with other Secondo instances. Only
the peer being removed needs to be talked to. Its copies of ``da`` and ``dp``
are removed as part of removing it from the local *adist* object ``ad``. No
other peers or any workers are involved. ``ad`` now shows a reduced list of
peers.

The operators ``moveslot``, ``removeworker``, and ``removepeer`` all took a
worker or a peer by number (list index). For convenience, they may also be
specified with hostname and port number as used in the worker or peer list. The
following commands would have had the same effect as the ones used above::

  query moveslot(ad, 2, "newton4.fernuni-hagen.de", 10071);
  query moveslot(ad, 7, "newton1.fernuni-hagen.de", 10071);
  query removeworker(ad, "newton3.fernuni-hagen.de", 10071);
  query removepeer(ad, "newton2.fernuni-hagen.de", 10070);
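
Conceptually, the hostname/port form is just a lookup of the list index. A
minimal Python sketch of that lookup, assuming the original worker list of five
entries and using the hypothetical helper ``worker_index``:

```python
def worker_index(workers, host, port):
    """Return the list index of the worker with the given host and port."""
    for index, (w_host, w_port, _config) in enumerate(workers):
        if (w_host, w_port) == (host, port):
            return index
    raise LookupError(f"{host}:{port} is not in the list")


workers = [(f"newton{i}.fernuni-hagen.de", 10071, "-") for i in range(1, 6)]
print(worker_index(workers, "newton4.fernuni-hagen.de", 10071))  # 3
```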

The operators ``addworker`` and ``addpeer`` are straightforward. To add back
the worker and peer that were just removed, you would use the following
commands::

  query addworker(ad, "newton3.fernuni-hagen.de", 10071, "-");
  query addpeer(ad, "newton2.fernuni-hagen.de", 10070, "-");

Those are all the operators needed to redistribute records in a *darray*.
However, three more operators were necessary to provide for synchronization.
Let's go through them one by one::

  query lock("da", TRUE);

This will lock the database object ``da`` for exclusive access. This is useful
when performing requests directly on the *darray* object underlying a
*dpartition* object, as those operators aren't aware of any concurrency and have
no concept of locking. Doing this will ensure that any of the above operators
and any future operators handling value-based access via the *dpartition* object
will not interfere during your request. This operator will wait forever to gain
the lock if it is already locked. The alternative operator will try to gain the
lock and fail with an error message if it isn't immediately possible::

  query trylock("da", TRUE);

There it is: ``Error: The mutex at
/dev/shm/secondo:home_<username>_dbs_supervisor_D4:da is already locked.`` The
second argument to these operators allows gaining sharable access. Let's have a
look at their behavior. But first, we have to unlock ``da``::

  query unlock("da");
  query lock("da", FALSE);
  query trylock("da", FALSE);

Now we hold two sharable access locks on ``da``. It is time to introduce a
tricky situation. Some of the above operators require an exclusive lock on
``da``. Let's take ``splitslot`` as an example. If it is run while a lock on
``da`` is held (as is currently the case), it will wait until that lock is
released::

  query splitslot(ad, 0);

Fortunately, it provides an informational message to the user in this case:
``The mutex at /dev/shm/secondo:home_<username>_dbs_supervisor_D4:da is already
locked. Waiting for exclusive ownership.``

But now what? The lock won't be released automatically because it was manually
acquired. And the command line isn't available to unlock it. It would
theoretically be possible to log in to another Secondo instance and run
``unlock`` from there, but there is a better solution: the companion utility
``mutexset`` in ``Algebras/Distributed4/util``. Two locks were acquired on
``da``, so both must be released before ``splitslot`` can proceed.
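
Why two releases are needed becomes clear from a simple holder count. The
following Python class is a simplified model of sharable and exclusive access,
not the mutex implementation used by this algebra. It never blocks, it just
reports whether a lock could be gained, and the class name is made up.

```python
class SharedExclusiveLock:
    """Counts lock holders only; a real lock would block instead of failing."""

    def __init__(self):
        self.shared = 0
        self.exclusive = False

    def try_lock(self, exclusive):
        # Exclusive access needs no other holder; shared access only needs
        # the absence of an exclusive holder.
        if self.exclusive or (exclusive and self.shared > 0):
            return False
        if exclusive:
            self.exclusive = True
        else:
            self.shared += 1
        return True

    def unlock(self):
        if self.exclusive:
            self.exclusive = False
        elif self.shared > 0:
            self.shared -= 1


lock = SharedExclusiveLock()
lock.try_lock(False)        # query lock("da", FALSE);
lock.try_lock(False)        # query trylock("da", FALSE);
print(lock.try_lock(True))  # False: splitslot would have to wait
lock.unlock()
print(lock.try_lock(True))  # False: one sharable lock is still held
lock.unlock()
print(lock.try_lock(True))  # True: splitslot could now proceed
```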

To use ``mutexset``, you must first compile it. It is not compiled with
Secondo. From a second shell on the same server as the Secondo instance
(``newton3``), run the following::

  make -C ~/secondo/Algebras/Distributed4/util

Then run the following as often as necessary until you see ``splitslot``
continue::

  ~/secondo/Algebras/Distributed4/util/mutexset /dev/shm/secondo\:home_<username>_dbs_supervisor_D4\:da unlock

This can be an invaluable tool if Secondo should ever crash while holding locks.
When such a left-over lock is detected later, it can be manually released
without resorting to killing the waiting Secondo instance.

Running ``mutexset`` without any arguments will provide usage information.

This has been an introduction by example to how the operators of the
*Distributed4Algebra* work. You should now be familiar enough with them to use
them for whatever purpose you wish.

----

.. [#link] `Programmer's Guide
   <http://dna.fernuni-hagen.de/Secondo.html/files/Documentation/Programming/ProgrammersGuide.pdf>`_.
   Section 1.7.3, p. 15 (PDF:19). Found at
   `<http://dna.fernuni-hagen.de/Secondo.html/content_docu_extend.html>`_
   (2018-01-10).

.. [#cluster] For more details about the ``newton`` cluster, see: `Distributed
   Query Processing in Secondo
   <http://dna.fernuni-hagen.de/Secondo.html/files/Documentation/General/DistributedQueryProcessinginSecondo.pdf>`_.
   Section 3.2, p. 6 (PDF:11). Found at
   `<http://dna.fernuni-hagen.de/Secondo.html/content_docu.html>`_
   (2018-01-10).

.. [#home] The symbol ``~`` stands for your personal home directory, usually
   ``/home/<username>``. It is understood by ``bash``, so the command ``mkdir
   ~/dbs`` is the same as ``mkdir /home/<username>/dbs``.

.. [#n3] Just to avoid having ``newton1`` used by everyone as the primary (and
   therefore most loaded) member of the cluster, these instructions use
   ``newton3`` as the central server.

.. [#ports] The port numbers chosen here are composed from my ``uid`` (command
   ``id``) and an additional digit. This helps to prevent conflicts arising from
   other users' processes listening on the same port.

.. [#debug] If you are interested in seeing the communication happening between
   Secondo instances, you can run either ``query traceCommands(TRUE);`` or
   ``query da2enableLog(TRUE);`` or both. To see the log from ``da2enableLog``,
   run ``query da2Log() consume;``.