:author: Sebastian J. Bronner
:date: 2018-01-11

.. If desired, an HTML version of this file can be generated with the following
   command:
   rst2html5 README.rst > README.html

.. contents::

Linking to Secondo
==================

This algebra is intended to be compiled and used together with Secondo. The
general instructions for linking an algebra to Secondo [#link]_ apply, with one
additional consideration: the library ``boost_filesystem`` must be linked along
with it.

Both the compilation and the linking stage require that the library and its
header files are available. In Ubuntu 16.04 this is achieved by installing the
following packages:

* ``libboost-filesystem1.58-dev``
* ``libboost-filesystem1.58.0``
* ``libboost1.58-dev``

Finally, Secondo has to be instructed how to link this algebra. Besides the
normal entries in ``Algebras/Management/AlgebraList.i.cfg`` and
``makefile.algebras``, an additional line is added to the latter. The complete
entry in ``makefile.algebras`` should be as follows::

    ALGEBRA_DIRS += Distributed4
    ALGEBRAS += Distributed4Algebra
    COMMON_LD_FLAGS += -lboost_filesystem

Configuring Secondo Instances
=============================

This algebra facilitates a somewhat complex configuration of multiple Secondo
instances, so it is not obvious how Secondo should best be configured to try it
out. The following configuration is suitable for that purpose. It was created
on the ``newton`` cluster [#cluster]_ of the department of Database Systems for
New Applications, but it should be easy to adapt to any other environment.

Create the following files and directories on the named cluster servers:

``~/cfg/supervisor.ini`` [#home]_ on ``newton3`` [#n3]_:
    This is a copy of ``~/secondo/bin/SecondoConfig.ini`` with just a few lines
    changed as follows.
    Be sure to insert your own username into the ``SecondoHome`` path, between
    ``/home/`` and ``/dbs``. Also **create the directory** named in
    ``SecondoHome``::

        SecondoHome=/home//dbs/supervisor
        GlobalMemory=1024
        #RTFlags += SMI:NoTransactions

    It is important that ``SMI:NoTransactions`` is **not** enabled because each
    SecondoMonitor will spawn several Secondo processes accessing a single
    database simultaneously.

``~/cfg/master.ini`` on ``newton3``:
    Same as ``supervisor.ini`` above, except that two lines are changed as
    follows::

        SecondoHome=/home//dbs/master
        GlobalMemory=8192

``~/cfg/inserter.ini`` on ``newton{1,2,4,5}``:
    Same as ``supervisor.ini`` above, except that two lines are changed as
    follows::

        SecondoHome=/home//dbs/inserter
        GlobalMemory=1024

``~/cfg/worker.ini`` on ``newton[1-5]``:
    Same as ``supervisor.ini`` above, except that two lines are changed as
    follows::

        SecondoHome=/home//dbs/worker
        GlobalMemory=2048

``~/cfg/monitors.cfg`` [#ports]_ on ``newton3``::

    newton1 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
    newton2 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
    newton3 ~/cfg/master.ini ~/secondo/bin ~/dbs/master 10070
    newton4 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
    newton5 ~/cfg/inserter.ini ~/secondo/bin ~/dbs/inserter 10070
    newton1 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
    newton2 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
    newton3 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
    newton4 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071
    newton5 ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker 10071

Finally, start up the Secondo instances with either of the following commands::

    ~/secondo/bin/remoteMonitors ~/cfg/monitors.cfg start
    ~/secondo/bin/remoteMonitors2 ~/cfg/monitors.cfg start

The only difference is that ``remoteMonitors2`` produces somewhat more concise
output.
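
The ten entries of ``monitors.cfg`` follow a regular pattern: one peer entry per
host (the master on the central server, inserters elsewhere) on one port, and
one worker entry per host on another. When adapting the setup to a different
cluster, it may be convenient to generate the file. The following Python sketch
is not part of Secondo; the function name ``monitors_cfg`` and its parameters
are hypothetical, and the defaults merely reproduce the host names, paths, and
ports used in this README.

```python
def monitors_cfg(nodes=range(1, 6), central=3,
                 peer_port=10070, worker_port=10071):
    """Return the monitors.cfg lines used in this README as a list of strings.

    Hypothetical helper: host names follow the pattern newton<N>, and the
    paths are the ones assumed throughout this document.
    """
    nodes = list(nodes)
    lines = []
    # One peer per host: the master on the central server, inserters elsewhere.
    for n in nodes:
        role = "master" if n == central else "inserter"
        lines.append(
            f"newton{n} ~/cfg/{role}.ini ~/secondo/bin ~/dbs/{role} {peer_port}"
        )
    # One worker per host, listening on a separate port.
    for n in nodes:
        lines.append(
            f"newton{n} ~/cfg/worker.ini ~/secondo/bin ~/dbs/worker {worker_port}"
        )
    return lines


if __name__ == "__main__":
    # Redirect the output to ~/cfg/monitors.cfg if it looks right.
    print("\n".join(monitors_cfg()))
```

Changing ``peer_port`` and ``worker_port`` is the likely first adjustment, since
the ports above were chosen to avoid conflicts with other users [#ports]_.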

Preparing the Databases
=======================

Run the following commands on ``newton3`` to create and prepare a test database
named ``D4`` on all configured Secondo instances::

    cd ~/secondo/bin

Repeat the following sequence for every host/port combination specified in
``monitors.cfg`` above::

    ./SecondoTTYCS -h newton1 -p 10070
    create database d4;
    quit;

The remainder will be configured from the supervisor::

    ./SecondoTTYBDB -c ~/cfg/supervisor.ini
    create database d4;
    open database d4;
    let peers = [const rel(tuple([Host: string, Port: int, Config: string]))
      value (
        ("newton1.fernuni-hagen.de" 10070 "-")
        ("newton2.fernuni-hagen.de" 10070 "-")
        ("newton3.fernuni-hagen.de" 10070 "-")
        ("newton4.fernuni-hagen.de" 10070 "-")
        ("newton5.fernuni-hagen.de" 10070 "-")
      )];
    let workers = [const rel(tuple([Host: string, Port: int, Config: string]))
      value (
        ("newton1.fernuni-hagen.de" 10071 "-")
        ("newton2.fernuni-hagen.de" 10071 "-")
        ("newton3.fernuni-hagen.de" 10071 "-")
        ("newton4.fernuni-hagen.de" 10071 "-")
        ("newton5.fernuni-hagen.de" 10071 "-")
      )];
    let da = intstream(0,99) namedtransformstream[Id] extend[Value: .Id*1.1]
      ddistribute3["da", 10, FALSE, workers];
    query share("da", TRUE, peers);
    let dp = [const dpartition(Id: int) value (
      ((0 0) (10 1) (20 2) (30 3) (40 4) (50 5) (60 6) (70 7) (80 8) (90 9))
      da
    )];
    query share("dp", TRUE, peers);
    let ad = [const adist value (dp 1000 (
      ("newton1.fernuni-hagen.de" 10070 "-")
      ("newton2.fernuni-hagen.de" 10070 "-")
      ("newton3.fernuni-hagen.de" 10070 "-")
      ("newton4.fernuni-hagen.de" 10070 "-")
      ("newton5.fernuni-hagen.de" 10070 "-")
    ))];

Trying Out Some Operators
=========================

With the exception of the three lock operators, all operators implemented in
this algebra are intended for use on the supervisor and require an object of
type *adist* as an argument. They are designed to **redistribute** the records
stored in the underlying *darray* partitioned by the *dpartition*.
Operators to create, read, update, or delete records based on their value are
not part of this algebra. Such operators are expected to take the *dpartition*
as an argument. They should be operable on any of the peers defined in the
*adist*, as they all get a synchronized copy of the *dpartition* and *darray*.

Because the operators of this algebra are intended for the supervisor, all of
the following examples are executed there. It might be interesting to look in
on the workers or peers just to see how the database objects have changed,
though. That is completely up to you, and is not included in the steps outlined
here [#debug]_::

    cd ~/secondo/bin
    ./SecondoTTYBDB -c ~/cfg/supervisor.ini
    open database d4;

If you look at the output of ``query da``, you will see that there are ten
slots arranged in standard order on the five workers. Standard order is defined
in *Distributed2Algebra* and means that the slots are assigned to workers
round-robin. In our case that means the following:

* slot 0 -> worker 0 (``newton1:10071``)
* slot 1 -> worker 1 (``newton2:10071``)
* slot 2 -> worker 2 (``newton3:10071``)
* slot 3 -> worker 3 (``newton4:10071``)
* slot 4 -> worker 4 (``newton5:10071``)
* slot 5 -> worker 0
* slot 6 -> worker 1
* slot 7 -> worker 2
* slot 8 -> worker 3
* slot 9 -> worker 4

First we'll move slot 2 on worker 2 to worker 3::

    query moveslot(ad, 2, 3);

The effect of this can be seen on worker 2 (``da_2`` is missing), worker 3
(``da_2`` is now there), and the supervisor (``da``'s slot-to-worker map should
now be ``(0 1 3 3 4 0 1 2 3 4)``, indicating that both slots 2 and 3 are now on
worker 3).

Now let's split ``da_2`` into two slots::

    query splitslot(ad, 2);

The output from this query is ``10``. This is the number of the slot that
received half of the values formerly in slot 2. In addition to the slots it
previously held, worker 3 now also holds slot 10. ``da``'s map now contains an
eleventh element naming worker 3: ``(0 1 3 3 4 0 1 2 3 4 3)``. This time ``dp``
has changed as well.
It now has two partitions for values between 20 and 30, split between slot 2
and slot 10: ``(20.0 2) (25.0 10)``.

The next logical attempt would be to merge two slots together. If we merged
slot 2 and slot 3, we would get a slot with a hole in it: slot 2 contains
values between 20 and 25 and slot 3 values between 30 and 40, while the values
between 25 and 30 are in slot 10. Let's see what Secondo thinks about this
attempt::

    query mergeslots(ad, 2, 3);

You should have received the message, ``Error: The partitions starting with
20.000000 and 30.000000 are not contiguous.`` As we would probably expect
intuitively, the *Distributed4Algebra* doesn't allow slots with holes in their
mapped value range. So, let's merge slots 10 and 3 instead, to get a slot
holding values between 25 and 40::

    query mergeslots(ad, 10, 3);

The value ``3`` is returned. This indicates that the slot now holding the
combined data is slot 3. Slot 10 was removed from the system. (This happens
whenever the last slot in a *darray* is no longer needed.) ``dp``'s partitions
confirm the new value range for slot 3: ``(25.0 3)`` and ``da``'s map shows
that slot 10 is no more: ``(0 1 3 3 4 0 1 2 3 4)``.

Besides manipulating slots, this algebra also enables the removal and addition
of both workers and peers. Peers are any Secondo instances that contain
synchronized copies of the *darray* and *dpartition* objects ``da`` and ``dp``.
They can operate on the workers independently. Usually these will be a master
and several inserters. *Distributed4Algebra* is not concerned with their
purpose, so they are simply called peers. Peers are recorded in the *adist*
object ``ad`` while workers are recorded in the *darray* object ``da`` (as
known from *Distributed2Algebra*).

Let's disconnect a worker first. Let's take worker 2::

    query removeworker(ad, 2);

Oops!
Apparently, I was too quick about it: ``Error: The specified worker still has
slots mapped to it.`` Let's move the remaining slot on worker 2 to worker 0 and
then remove worker 2. According to ``da``'s slot-to-worker map, the slot on
worker 2 is slot 7::

    query moveslot(ad, 7, 0);
    query removeworker(ad, 2);

This time it worked (return value ``TRUE``). Looking at ``da``'s worker list
now shows four workers. The former worker 2 (``newton3``) is missing. This also
means that the former worker 3 is now worker 2 and the former worker 4 is now
worker 3. The slot-to-worker map has taken these new worker numbers into
account: ``(0 1 2 2 3 0 1 0 2 3)``.

Now let's remove one of the peers. Let's pick peer 1 at random::

    query removepeer(ad, 1);

This operation shows a lot less communication with other Secondo instances.
Only the peer being removed needs to be talked to. Its copies of ``da`` and
``dp`` are removed as part of removing it from the local *adist* object ``ad``.
No other peers or any workers are involved. ``ad`` now shows a reduced list of
peers.

The operators ``moveslot``, ``removeworker``, and ``removepeer`` all took a
worker or a peer by number (list index). For convenience, they may also be
specified with hostname and port number as used in the worker or peer list. The
following commands would have had the same effect as the ones used above::

    query moveslot(ad, 2, "newton4.fernuni-hagen.de", 10071);
    query moveslot(ad, 7, "newton1.fernuni-hagen.de", 10071);
    query removeworker(ad, "newton3.fernuni-hagen.de", 10071);
    query removepeer(ad, "newton2.fernuni-hagen.de", 10070);

The operators ``addworker`` and ``addpeer`` are straightforward. To add back
the worker and peer that were just removed, you would use the following
commands::

    query addworker(ad, "newton3.fernuni-hagen.de", 10071, "-");
    query addpeer(ad, "newton2.fernuni-hagen.de", 10070, "-");

Those are all the operators needed to redistribute records in a *darray*.
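
The slot-to-worker maps quoted in this walkthrough can be reproduced with a
small model, which may help when planning a redistribution before running it.
The following Python sketch is not part of Secondo and does not talk to any
Secondo instance. The functions only borrow the operator names and work on
plain lists. How ``mergeslots`` treats a source slot that is not the last one
is not described here, so the model covers only the case shown above.

```python
def standard_map(slots, workers):
    """Round-robin ("standard order") assignment of slots to workers."""
    return [slot % workers for slot in range(slots)]


def moveslot(mapping, slot, worker):
    """Return a copy of the map with `slot` reassigned to `worker`."""
    result = list(mapping)
    result[slot] = worker
    return result


def splitslot(mapping, slot):
    """Append a new slot on the same worker as `slot`; return (map, new slot)."""
    return list(mapping) + [mapping[slot]], len(mapping)


def mergeslots(mapping, src, dst):
    """Model only the case from the walkthrough: `src` is the last slot."""
    if src != len(mapping) - 1:
        raise NotImplementedError("only merging the last slot is modelled")
    return list(mapping[:-1]), dst


def removeworker(mapping, worker):
    """Drop an empty worker and renumber the workers that followed it."""
    if worker in mapping:
        raise ValueError("The specified worker still has slots mapped to it.")
    return [w - 1 if w > worker else w for w in mapping]


if __name__ == "__main__":
    da = standard_map(10, 5)
    da = moveslot(da, 2, 3)
    print(da)  # [0, 1, 3, 3, 4, 0, 1, 2, 3, 4]
    da, new_slot = splitslot(da, 2)
    print(new_slot, da)  # 10 [0, 1, 3, 3, 4, 0, 1, 2, 3, 4, 3]
    da, kept = mergeslots(da, 10, 3)
    da = moveslot(da, 7, 0)
    da = removeworker(da, 2)
    print(da)  # [0, 1, 2, 2, 3, 0, 1, 0, 2, 3]
```

The printed lists correspond to the maps ``(0 1 3 3 4 0 1 2 3 4)``,
``(0 1 3 3 4 0 1 2 3 4 3)``, and ``(0 1 2 2 3 0 1 0 2 3)`` reported by Secondo
above.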

Three more operators were needed to provide for synchronization. Let's go
through them one by one::

    query lock("da", TRUE);

This will lock the database object ``da`` for exclusive access. This is useful
when performing requests directly on the *darray* object underlying a
*dpartition* object, as those operators aren't aware of any concurrency and
have no concept of locking. Doing this will ensure that none of the above
operators, nor any future operators handling value-based access via the
*dpartition* object, will interfere during your request. If the object is
already locked, this operator waits indefinitely for the lock. The alternative
operator will try to gain the lock and fail with an error message if that isn't
immediately possible::

    query trylock("da", TRUE);

There it is: ``Error: The mutex at /dev/shm/secondo:home__dbs_supervisor_D4:da
is already locked.``

The second argument to these operators allows gaining sharable access. Let's
have a look at their behavior. But first, we have to unlock ``da``::

    query unlock("da");
    query lock("da", FALSE);
    query trylock("da", FALSE);

Now we hold two sharable access locks on ``da``.

It is time to introduce a tricky situation. Some of the above operators require
an exclusive lock on ``da``. Let's take ``splitslot`` as an example. If it is
run while a lock on ``da`` is held (as is currently the case), it will wait
until that lock is released::

    query splitslot(ad, 0);

Fortunately, it provides an informational message to the user in this case:
``The mutex at /dev/shm/secondo:home__dbs_supervisor_D4:da is already locked.
Waiting for exclusive ownership.`` But now what? The lock won't be released
automatically because it was manually acquired. And the command line isn't
available to unlock it. It would theoretically be possible to log in to another
Secondo instance and run ``unlock`` from there, but there is a better solution:
the companion utility ``mutexset`` in ``Algebras/Distributed4/util``.
Two locks were acquired on ``da``, so at least two must be released before
``splitslot`` can proceed.

To use ``mutexset``, you must first compile it. It is not compiled with
Secondo. From a second shell on the same server as the Secondo instance
(``newton3``), run the following::

    make -C ~/secondo/Algebras/Distributed4/util

Then run the following as often as necessary until you see ``splitslot``
continue::

    ~/secondo/Algebras/Distributed4/util/mutexset /dev/shm/secondo\:home__dbs_supervisor_D4\:da unlock

This can be an invaluable tool if Secondo should ever crash while holding
locks. When such a left-over lock is detected later, it can be released
manually without resorting to killing the waiting Secondo instance. Running
``mutexset`` without any arguments will provide usage information.

This has been an introduction by example to how the operators of the
*Distributed4Algebra* work. You should now be familiar enough with them to use
them for whatever purpose you wish.

----

.. [#link] *Programmer's Guide*. Section 1.7.3, p. 15 (PDF:19). Accessed
   2018-01-10.

.. [#cluster] For more details about the ``newton`` cluster, see: *Distributed
   Query Processing in Secondo*. Section 3.2, p. 6 (PDF:11). Accessed
   2018-01-10.

.. [#home] The symbol ``~`` stands for your personal home directory, usually
   ``/home/`` followed by your username. It is understood by ``bash``, so the
   command ``mkdir ~/dbs`` creates the directory ``dbs`` inside that home
   directory.

.. [#n3] Just to avoid having ``newton1`` used by everyone as the primary (and
   therefore most loaded) member of the cluster, these instructions use
   ``newton3`` as the central server.

.. [#ports] The port numbers chosen here are composed from my ``uid`` (command
   ``id``) and an additional digit. This helps to prevent conflicts arising
   from other users' processes listening on the same port.

.. [#debug] If you are interested in seeing the communication happening between
   Secondo instances, you can run either ``query traceCommands(TRUE);`` or
   ``query da2enableLog(TRUE);`` or both. To see the log from ``da2enableLog``,
   run ``query da2Log() consume;``.