secondo/Documents/Secondo-News.txt
2026-01-23 17:03:45 +08:00


***** SECONDO NEWS *****
This file replaces the secondo-news mailing list. Please add interesting
information here that should be kept for future SECONDO users.
Just add new messages directly below this text with a new header.
2011-08-17 Preparations for release of Secondo 3.1.1
===============================================================================
This is just a summary of the most important changes. Please read the
ReleaseInfo to learn more details on changes:
* Harmonized naming schema for moving and unit types
* Renamed operators with typical attribute names to avoid problems in queries
* Plugins NearestNeighbor, TBTree, and STPattern have been merged into the
standard algebra collection.
* Operators have been added to several Algebras
* TypeMappings have been changed in order to foster the use of generic
TypeMapping tools and new Attribute member functions returning the typename
* Support for geographic coordinates has been integrated or prepared for
many spatial and spatio-temporal operators.
* Several bug fixes
2010-06-21 Preparations for release of Secondo 3.0
===============================================================================
Several changes have been made in the last 6 months.
* The Flob concept has been totally re-implemented to avoid the nasty errors
created by the old Flob-Cache.
* For spatial and spatiotemporal datatypes with set semantics, we now
differentiate between EMPTY and UNDEFINED values.
* Several changes have been done to make the SMI code compatible with
different versions of 3rd party software, namely BerkeleyDB, thus increasing
compatibility with different platforms.
* Many bugfixes regarding system stability: Memory holes have been fixed,
some operator implementations corrected.
* New support structure "TupleFile": This type can be used by algorithms
that need to materialize data. Data is stored in flat files rather than
in temporary relations. Also, only data not fitting into the main memory
buffer gets materialized on hard disk. It should be used as a replacement for
the "TupleBuffer".
* Changed implementations / new algebra: The ExtRelation-2Algebra provides
external algorithms for sorting and different join algorithms. Also,
sorting is now done by a parameterizable multi-stage mergesort with a
restricted amount of main memory. The new algorithms use TupleFiles instead
of the old TupleBuffer.
Most of the corresponding original algorithms from the ExtRelationAlgebra have
been replaced by the operators from this new algebra.
* New operators in the RTreeAlgebra allow for query-based inspection of the
tree structures.
* New modules: The BTree2Algebra provides parametrizable BTrees. The
RTreeViewer allows for visualized online-exploration of Rtree objects.
* Optimizer: Exception handling was extended so that now most errors can be
caught and reported to the user.
Scripts for executing the BerlinMOD/R benchmark from the optimizer have
been added to the Optimizer directory.
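The idea behind a multi-stage mergesort with restricted main memory can be sketched as follows. This is only an illustration and not the ExtRelation-2Algebra code: plain vectors stand in for TupleFiles, and the names makeRuns, mergeGroup, externalSortSketch, maxInMemory and fanIn are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// A "run" stands in for one sorted TupleFile; here it is simply a vector.
using Run = std::vector<int>;

// Phase 1: cut the input into sorted runs of at most maxInMemory elements,
// so that never more than maxInMemory elements are sorted at once.
std::vector<Run> makeRuns(const std::vector<int>& input, std::size_t maxInMemory) {
    assert(maxInMemory > 0);
    std::vector<Run> runs;
    for (std::size_t pos = 0; pos < input.size(); pos += maxInMemory) {
        std::size_t end = std::min(input.size(), pos + maxInMemory);
        Run run(input.begin() + static_cast<std::ptrdiff_t>(pos),
                input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }
    return runs;
}

// Merge the sorted runs with indices [first, last) into one sorted run.
Run mergeGroup(const std::vector<Run>& runs, std::size_t first, std::size_t last) {
    Run merged;
    for (std::size_t i = first; i < last; ++i) {
        Run next;
        std::merge(merged.begin(), merged.end(), runs[i].begin(), runs[i].end(),
                   std::back_inserter(next));
        merged.swap(next);
    }
    return merged;
}

// Phase 2: merge at most fanIn runs per step and repeat stage by stage
// until a single sorted run remains.
Run externalSortSketch(const std::vector<int>& input, std::size_t maxInMemory,
                       std::size_t fanIn) {
    assert(fanIn >= 2);
    std::vector<Run> runs = makeRuns(input, maxInMemory);
    while (runs.size() > 1) {
        std::vector<Run> nextStage;
        for (std::size_t first = 0; first < runs.size(); first += fanIn) {
            std::size_t last = std::min(runs.size(), first + fanIn);
            nextStage.push_back(mergeGroup(runs, first, last));
        }
        runs.swap(nextStage);
    }
    return runs.empty() ? Run{} : runs.front();
}
```

With maxInMemory = 2 and fanIn = 2, seven input values produce four initial runs that are merged in two further stages.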
2009-03-02 Changes in the SpatialAlgebra
===============================================================================
Since the defined flag is already included in StandardAttribute, I have removed
the additional one from class Point. This is important because class Point will
be used frequently and the change reduces its size.
Sorry, but once again you have to restore your databases!
Regards
Markus
2008-11-14 Changes of TemplateClass Rectangle<dim>
===============================================================================
Since the defined flag is already included in StandardAttribute, I have removed
the additional one from the implementation.
Once again you have to restore your databases!
Regards
Christian
2008-10-27 Changes of the Tuple's Block Layout
===============================================================================
In order to allow direct access to attributes without unpacking
all other attribute data the block layout of tuple records has been
changed. A special relation iterator which utilizes this feature will
follow soon.
Once again you have to restore your databases!
Regards
Markus
2008-10-20 Changes for class FLOB
===============================================================================
In order to save disk space, the FLOB class has been changed.
It contains now only a pointer to its meta data which will be
restored when loaded from disk.
Sorry, but again you have to restore your databases!
Regards
Markus
2008-08-29 Changes at the DateTime class
===============================================================================
In order to save disk space, the DateTime class has been changed. Type and
defined flag are now coded within a single character. For this reason,
you have to restore all your databases :-(
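As an illustration of how a type tag and a defined flag can share one character, here is a minimal sketch. It is not the actual DateTime code; the names TimeKind and PackedFlags and the bit layout are assumptions for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags; the names are illustrative only.
enum class TimeKind : std::uint8_t { Instant = 0, Duration = 1 };

// Stores the type tag and the defined flag in a single byte.
class PackedFlags {
public:
    PackedFlags(TimeKind kind, bool defined)
        : bits_(static_cast<std::uint8_t>(
              (static_cast<std::uint8_t>(kind) << 1) | (defined ? 1u : 0u))) {}

    bool isDefined() const { return (bits_ & 1u) != 0; }
    TimeKind kind() const { return static_cast<TimeKind>(bits_ >> 1); }

    // Changes only bit 0 and leaves the type tag untouched.
    void setDefined(bool defined) {
        bits_ = static_cast<std::uint8_t>((bits_ & ~1u) | (defined ? 1u : 0u));
    }

private:
    std::uint8_t bits_;  // bit 0: defined flag, bits 1..7: type tag
};

static_assert(sizeof(PackedFlags) == 1, "both values must fit into one byte");
```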
Regards
Thomas
2008-08-22 Optional Attribute Datatype Serialization
===============================================================================
Now it's possible to implement functions for attribute data types which manage
the storage to a memory block and the reinitialization from a memory block.
This makes it possible to save disk space, since the default block storage
mechanism is not space efficient. There are example implementations for int,
real and string. Currently, this is work in progress and can be deactivated by
undefining the compile flag USE_SERIALIZATION in makefile.options
Documentation: Attribute.h StandardTypes.h
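A pair of such functions could look roughly like the sketch below. It is a simplified stand-in and does not show the interface declared in Attribute.h; the type IntAttr and the functions serialize and rebuild are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical attribute type: an integer value with a defined flag.
struct IntAttr {
    bool defined;
    std::int32_t value;
};

// Writes a compact image: one flag byte followed by the 4 value bytes.
std::vector<unsigned char> serialize(const IntAttr& attr) {
    std::vector<unsigned char> block(1 + sizeof(attr.value));
    block[0] = attr.defined ? 1 : 0;
    std::memcpy(block.data() + 1, &attr.value, sizeof(attr.value));
    return block;
}

// Re-initializes an attribute from a block written by serialize().
IntAttr rebuild(const std::vector<unsigned char>& block) {
    assert(block.size() == 1 + sizeof(std::int32_t));
    IntAttr attr{};
    attr.defined = block[0] != 0;
    std::memcpy(&attr.value, block.data() + 1, sizeof(attr.value));
    return attr;
}
```

The serialized image takes 5 bytes, whereas copying the struct as a whole typically takes 8 bytes because of alignment padding.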
Note: The code changes require you to run make clean, rebuild SECONDO,
and restore your databases :-(
Regards
Markus
2008-08-07 Word changed
===============================================================================
To save memory usage, the struct Word was changed to be a variant.
Because of this change, you have to restore all your databases.
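Why a variant saves memory can be seen in a small sketch. The member names below are illustrative and are not taken from the SECONDO sources; WordSketch and WordAsStruct are hypothetical types.

```cpp
#include <cassert>

// Variant layout: all members share the same storage, so the size is
// that of the largest member.
union WordSketch {
    void* addr;
    int ival;
    double rval;
};

// Struct layout with the same members: every member has its own storage.
struct WordAsStruct {
    void* addr;
    int ival;
    double rval;
};

// Small helpers that set exactly one alternative of the variant.
inline WordSketch makeIntWord(int v) {
    WordSketch w;
    w.ival = v;
    return w;
}

inline WordSketch makeAddressWord(void* p) {
    WordSketch w;
    w.addr = p;
    return w;
}

static_assert(sizeof(WordSketch) < sizeof(WordAsStruct),
              "the variant is smaller than the struct");
```

Only the member that was written last may be read, which is the price for the smaller size.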
Thomas
2008-07-15 Added new members to template class R_Tree
===============================================================================
In order to support Angelika Braese's implementation of the
NearestNeighborAlgebra, class R_Tree was extended with additional private
attributes and public functions.
Therefore
YOU NEED TO RESTORE ALL YOUR DATABASES CONTAINING RTREES!
Christian
2007-05-30 lrsArray removed from the line type
===============================================================================
In order to make the line type simpler, the lrsArray and further members
have been removed from the Line class. As a result, some functions are no
longer available. To compensate for that, a new type sline with a
corresponding class SimpleLine has been introduced. A simple line represents a
simple polyline (i.e. with at most one component and without any branches).
There are some functions (and operators) for converting between the types
provided.
Because of the changes to the line type:
YOU NEED TO RESTORE ALL YOUR DATABASES CONTAINING LINES!
Have fun!
Thomas
2007-05-30 Added Generic Open and Save Functions for Attribute Types
===============================================================================
We added Template Functions OpenAttribute<T> and SaveAttribute<T> to
file "Attribute.h".
You can use these functions in TypeConstructors for attribute types, so that
you no longer need to implement them yourself.
We used this method to provide OPEN and SAVE methods to all MAPPING and some
further types. Therefore
YOU NEED TO RESTORE ALL YOUR DATABASES!
Enjoy!
Thomas & Christian
2007-09-10 Changes in representation of type movingregion and uregion
===============================================================================
In order to establish a proper use of the defined flag within mapping types,
I needed to change the representations of datatypes movingregion and uregion.
You need to restore any database containing objects of these types!
Christian
2007-06-06 First changes for Linux x86_64 systems
===============================================================================
We have managed to install and compile secondo on a Linux 64 bit system. The
following problems and limitations still arise:
- Some tests fail due to floating point precision errors
- Some tests, e.g. those for operator tuplesize, need to provide new platform
dependent results.
- Some system constants defined in limits.h INT_MAX and LONG_MAX may exceed
numbers representable in nested lists.
- The Jpl directory does not compile
However, all operations work without system crashes!
Needed SDK-Changes:
Berkeley-DB 4.2.52 must be replaced by version 4.3.29 since the older one
does not compile on a x86_64 platform.
Best Regards
Markus
2007-04-24 Changes in the RelationAlgebra
===============================================================================
The base class GenericRelation was revised in order to make it compatible with
class TupleBuffer, e.g. some member functions of class Relation were declared
as virtual functions in class GenericRelation.
Moreover, the TupleBuffer was made available as secondo type trel. You can
create a trel by consuming a stream(tuple(..)) using operator tconsume, e.g.
> plz feed tconsume;
--------------------
Currently this type is intended to be used only for temporary results, the save
and open function are not implemented. Since the TupleBuffer creates its
Berkeley-DB files in a separate directory without transaction control you can
save many megabytes of log files if you run queries with a quite big temporary
relation as result.
Moreover, the TupleBuffer now ignores to copy persistent LOBs (LOBs which are
stored on disk). This saves again much processing time since LOBs on disk have
their own lob-file and record-id which are stable during the query. Only LOBs
which are created in the query itself need to be written to the lob-file of
temporary result relations.
Best Regards
Markus
2007-04-19 Revised Organization of Tuples
===============================================================================
Due to inconsistent management of allocated memory and some contradictions
in the concept of fresh and solid tuples we changed the concept and
implementation of the tuple representation. Now tuples are always handled like
the former "fresh" tuples and writing them to disk has no effect for their
current memory organization.
The following major changes happened:
(1) The solid state of tuples is removed, thus tuples are stateless now.
(3) The FLOB class has been revised.
(4) The DBArray class has a new function "TrimToSize", which resizes the array
and the underlying FLOB to hold exactly the number of elements which are
stored in it.
(5) Some bug fixes in operator implementations concerning reference counting.
Moreover, the optional "Main Memory" relational algebra implementation files
are removed.
If make fails, please try "make clean; make". Afterwards you should rebuild
your databases.
Best Regards
Thomas and Markus
2007-04-03 New dependency
===============================================================================
I have implemented and added a new algebra module, the 'GSLAlgebra'. It uses
functions imported from the GNU Scientific Library (GSL) and therefore depends
on this library. I have included the GSL into the installsdk script and copied
the gsl sources/binaries into the gnu-folders of the
SECONDO_SDK_INSTALLATION_KIT.
If you get problems compiling Secondo, please install GSL 1.8 manually (e.g.
using yast) or by getting the recent SECONDO_SDK_INSTALLATION_KIT and
re-running the 'installsdk' script.
Christian
2007-01-02 R-Trees
===============================================================================
I have implemented bulkloading for RTrees. Also, using the nodes(_) operator,
you can inspect your RTrees now.
As changes were done in the RTree implementation, you may need to restore your
RTree-Indices, if you run into problems.
Christian
2007-05-01 Example Queries
===============================================================================
There are new features for the definition of example queries. Please study
the file Documents/Secondo-Ideas.txt for details.
Regards,
Markus
2006-11-28 Operator specs
===============================================================================
From now on, we have a new mechanism for specifying example queries. In the
future it should guarantee that all example queries of the online help will
run and produce correct results. Thus it is another way of defining little
tests which are suitable for a quick regression test over all algebra modules.
The concept is described in the file Documents/Secondo-Ideas.txt. Basically
you need to provide an ".examples" file in the algebra module's directory (for
an example refer to the StandardAlgebra).
At startup for each active algebra the ".examples" file will be read in and the
examples are processed by the Secondo-Parser. Errors will be displayed during
startup (currently there are a lot of them). Template files are generated below
bin/tmp.
Kind regards,
Markus
2006-10-05 Changes in building the OptServer
===============================================================================
The JPL library and the Secondo part of the OptServer are divided into different
files now. This enables the use of precompiled jpl libraries. For details see
Jpl/readme.txt. On windows platforms this change requires a definition of the
variable PL_DLL. Otherwise the OptServer will not compile. It's recommended
to define this variable within the ~/.secondo.<platform>rc file.
Best regards,
Thomas
2006-09-13 Correction in the QueryProcessor::Eval function
===============================================================================
I have corrected the query processor's Eval function. The error was that
we needed to call the Request for "simple" objects in stream operators
(operators that do return streams). Now we only call Request when it is
really necessary, i.e. for nodes that cannot be previously evaluated by
the query processor, e.g. streams and functions.
Best regards,
Victor
2006-09-11 Support for MAC OSX
===============================================================================
I have changed a lot of makefiles and cpp-files in order to make the
build process also possible on Mac OSX. It may happen that the system now
does not compile on Windows since I haven't tested it on this platform yet.
In case of trouble please send a mail to markus.spiekermann(at)fernuni-hagen.de
Regards
Markus
2006-08-22 New System Tables
===============================================================================
I have implemented two new system tables called
SEC_CACHEINFO
SEC_FILEINFO
They provide information about Berkeley-DB's internal cache usage. For detailed
information please refer to the file "CacheInfo.h".
2006-05-08 New Sample and Small Relations
===============================================================================
Due to changes for the entropy optimizer, the "_small" relations have a new
structure and need to be recreated (if you want to use the entropy optimizer).
Probably the easiest way to do this is to restore databases and also to
reinitialize the optimizer information ("rm stored*" in the optimizer
directory).
Sample relations also have changed a week ago and need to be recreated as well.
For standard databases this is done automatically.
For non-standard databases such as germany, samples and small relations should
now be created manually from the optimizer, calling predicates
createSamples('Kreis', 100, 50)
and
createSmall(kreis, 50)
(sorry for the different syntax), respectively.
Regards
Ralf
2006-05-03 Command Times and Counters
===============================================================================
Command Times:
--------------
The output of the query or command times has been changed.
Now also the times for
1) creating the list representation for the result object
2) for committing the transaction (which is reasonable!)
3) for copying the result list (necessary in order to empty the list memory)
are shown. If the output is too noisy, change the setting in SecondoConfig.ini
in order to suppress them.
Note: Some keywords in SecondoConfig.ini were changed. Please update your
configuration file by replacing it with SecondoConfig.example.
Counters:
---------
There are new counters which keep track of bytes read or written to disk.
They are implemented in SmiRecord and the PrefetchingIterator. There are
counters for
1) The number of function calls
2) The number of transferred bytes
3) The transferred data volume measured in pages
Note: The last value is not identical to reading pages from disk. This
information is only present in the Berkeley-DB cache. Currently, we have no
interface to access the Berkeley-DB cache statistics.
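The three kinds of counters can be pictured with a small sketch. It is not the SmiRecord or PrefetchingIterator code: the struct IoCounter is hypothetical, and rounding the volume up to whole pages per call is an assumption, since this note does not say how the page value is derived.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter set for one kind of access (read or write).
struct IoCounter {
    std::size_t calls = 0;  // 1) number of function calls
    std::size_t bytes = 0;  // 2) number of transferred bytes
    std::size_t pages = 0;  // 3) transferred data volume measured in pages

    // Records one call that transferred n bytes.
    // Assumption: the volume of each call is rounded up to whole pages.
    void record(std::size_t n, std::size_t pageSize) {
        assert(pageSize > 0);
        ++calls;
        bytes += n;
        pages += (n + pageSize - 1) / pageSize;
    }
};
```

Like the real counters, such a page value only describes the transferred volume and says nothing about physical page reads from disk.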
Finally, all command times and counter values are stored in the system tables
SEC_COMMANDS
SEC_COUNTERS
which are non persistent relation objects whose values are only kept during the
current session. However, you can store them by
let sessionCmds = SEC_COMMANDS feed consume;
Regards
Markus
2006-02-24 Automatic Tests
===============================================================================
The TestRunner has new features! Please refer to example.test for its
documentation. The following features are new:
1) The expected result of a query can be specified in a separate file, e.g.
#yields @resultFile
2) Values of real atoms of the result and the expected result can be compared
approximately either by a relative or by a fixed tolerance parameter
3) File names can be specified including environment variables of the shell,
e.g. #yields @$(HOME)/data/query1.result
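The two kinds of approximate comparison mentioned in 2) can be sketched as follows. These functions are not the TestRunner implementation; their names and the choice of scaling the relative tolerance by the larger absolute value are assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fixed tolerance: the absolute difference must not exceed `tolerance`.
bool equalWithinFixed(double expected, double actual, double tolerance) {
    assert(tolerance >= 0.0);
    return std::fabs(expected - actual) <= tolerance;
}

// Relative tolerance: the difference must not exceed `fraction` times the
// larger of the two absolute values (assumption made for this sketch).
bool equalWithinRelative(double expected, double actual, double fraction) {
    assert(fraction >= 0.0);
    double scale = std::max(std::fabs(expected), std::fabs(actual));
    return std::fabs(expected - actual) <= fraction * scale;
}
```

A fixed tolerance suits values of known magnitude, while a relative tolerance adapts to large and small results alike.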
Moreover, the notation $(VARIABLE) can be used for SECONDO's restore and save
commands, but then the file name needs to be a text atom which can easily be
specified by enclosing it in single quotes, e.g.
restore database germany from '$(VARIABLE)/secondo-data/germany';
Note: On windows the path separator must be a backslash, for example
restore database germany from 'C:\msys\1.0\home\myname\secondo-data\germany'
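How the $(VARIABLE) notation might be expanded is sketched below. This is an illustration only and not SECONDO's code; the functions expandVariables and fromEnvironment are hypothetical, and expanding unset variables to an empty string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>

// Replaces each $(NAME) by lookup(NAME). Text with an unterminated "$("
// is copied unchanged.
std::string expandVariables(
    const std::string& text,
    const std::function<std::string(const std::string&)>& lookup) {
    assert(static_cast<bool>(lookup));
    std::string result;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t start = text.find("$(", pos);
        std::size_t end = (start == std::string::npos)
                              ? std::string::npos
                              : text.find(')', start + 2);
        if (end == std::string::npos) {
            result.append(text, pos, std::string::npos);  // nothing left to expand
            break;
        }
        result.append(text, pos, start - pos);
        result += lookup(text.substr(start + 2, end - start - 2));
        pos = end + 1;
    }
    return result;
}

// Lookup based on the process environment; unset variables expand to "".
std::string fromEnvironment(const std::string& name) {
    const char* value = std::getenv(name.c_str());
    return value ? std::string(value) : std::string();
}
```

Passing the lookup as a parameter keeps the expansion testable without touching the real environment; fromEnvironment can be supplied for normal use.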
There should be comprehensive test cases for every algebra. The automated test
scripts will run all files below "Tests/Testspecs" which end with
".test". Moreover, you need not set up your own data in a test. The databases
defined in "bin/createdb.test" are restored before all other tests. Hence, if
you need a specific database of the secondo-data repository, please add it
there; in the test file you only need to open the database.
Finally, you can run all tests locally by calling
make runtests
and a single test can be invoked by
TestRunner -i <file>
*** Please try to create and maintain test files for ***
*** your algebra from the very beginning of its implementation! ***
Best regards
Markus
2006-02-23 Environment Changes
===============================================================================
The changes of 02-20 can have some confusing effects since files which were
previously created by make are now under CVS control. In order to make sure that
everyone has the same configuration please run the following commands:
cvs update -dP
make update-environment;
open a new shell and run
cvs update -dP
make
Sorry for the trouble,
Markus
2006-02-20 Changes of building, linking, and starting the applications
===============================================================================
The build procedure has been changed. Now only two applications called
"SecondoBDB" and "SecondoCS" are compiled. They know the options
-pl: Start as SecondoPL
-test: Start as TestRunner
-srv: Start as Server (only SecondoBDB)
Hence we need only to link the algebra libraries into one application
instead of many of them. This speeds up linking. For convenience and
backward compatibility there are some shell scripts called
bin: SecondoTTYBDB, SecondoTTYCS, TestRunner, TestRunnerCS, SecondoMonitor
Optimizer: SecondoPL, SecondoPLCS
Hope it works for all,
Markus
2006-01-09 A big change on the kernel of the system has been made
===============================================================================
- The algebra levels were removed. Since the descriptive level is
implemented in the optimizer, it is not needed anymore. The kernel of
the Secondo system now works only in the executable level. The concepts
of Models and Costs were also removed from the kernel of the system
for the same reasons.
- In the relational algebra, a LRU cache for FLOBs is implemented.
The main idea behind this cache is to better use the memory inside
operators. Before this modification, the FLOB size was taken into
consideration to calculate the size of a tuple in memory. Now, only
the attribute and the small FLOBs are considered, which increases
the number of tuples that fit in memory considerably. The cache is
also important because a pointer to the FLOB memory is returned
instead of a copy, which can reduce the CPU time. The memory
utilization is corrected for some operators.
- The concept of free tuples is changed. Now, instead of a boolean
value telling whether a tuple is free or non-free, we have an
integer number where zero means that the tuple is free for deletion.
Whenever it is loaded into memory and we do not want to delete
it, this number is increased. When it is unloaded, the number is
decreased. With this change we could avoid all calls to the
function CloneIfNecessary that is now removed. In fact, we do
not clone tuples anymore.
- In a lower granularity, we avoid, as much as possible, cloning
attributes too. A reference counter is added to the TupleElement
class. Every time a tuple needs an attribute from another tuple,
it just copies the attribute's pointer and increases the counter.
When it wants to delete the attribute, it decreases the counter;
the attribute is only deleted when the counter is zero.
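The counting scheme for tuples described above can be sketched as follows. This is not the Tuple class of the RelationAlgebra; the class CountedTuple and its member names are hypothetical, and the external liveObjects counter exists only to make the effect observable.

```cpp
#include <cassert>

// Hypothetical tuple with an integer counter instead of a boolean flag.
// A counter of zero means that the object is free for deletion.
class CountedTuple {
public:
    explicit CountedTuple(int& liveObjects) : liveObjects_(liveObjects) {
        ++liveObjects_;
    }

    // Someone still needs the object: increase the counter.
    void hold() { ++holders_; }

    // The object is no longer needed by one holder: decrease the counter.
    void release() {
        assert(holders_ > 0);
        --holders_;
    }

    // Deletes the object only if nobody holds it; returns true on deletion.
    bool deleteIfAllowed() {
        if (holders_ > 0) {
            return false;
        }
        delete this;
        return true;
    }

private:
    ~CountedTuple() { --liveObjects_; }  // private: deletion only via deleteIfAllowed()

    int holders_ = 0;
    int& liveObjects_;
};
```

Because holders share one object and only adjust the counter, no clone is needed when several operators refer to the same tuple.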
2005-11-22 Problems with Javagui using Java Version 1.5
===============================================================================
When you are using Javagui with Sun's Java Version 1.5, the snapshot function
of Javagui leads to a hang up of the Java Virtual Machine on linux systems.
The error message will be:
"Couldn't execl robot child process: Permission denied"
To solve this problem, go into the jdk/jre/lib/i386 subfolder of your java
installation and change the file "awt_robot" to be executable using the command
chmod ugo+x awt_robot
Depending on the installation of the java sdk it may be required to do that
with root rights. If you don't have root access, install your private java-sdk
or ask your administrator.
2005-10-26 Notes for Problems on newer Linux Systems:
===============================================================================
1) Installation problems:
-------------------------
Some new linux distributions (e.g. SuSe 9.2) are equipped with bash version 3.0.
This causes problems in the installsdk script. For example the configure script
of the gcc will break due to compatibility problems of the trap command when the
bash is running in posix mode. At the secondo website you can download a newer
version of installsdk which solves this problem.
2) Environment problems:
------------------------
Using SuSe 9.2 we observed that environment changes done in the file ~/.bashrc
are permanent for all shells. Hence we recommend only to define an alias, e.g.
alias initsecondo="source .secondorc $HOME/secondo"
Before compiling SECONDO you have to run this new alias command
(only a single time for that shell)
initsecondo
cd secondo
make
This has the advantage that the changes in the environment made by the
.secondorc file are only local to this shell (and subshells) but not
to other shells. This is more secure and prevents you from messing up your
system by changing important variables like PATH or LD_LIBRARY_PATH.
3) Secondo Server Startup Problems
----------------------------------
There seems to be a problem in retrieving the IP-address for localhost.
In the file SecondoConfig.ini the value localhost must be replaced by 127.0.0.1
Best Regards
Markus Spiekermann
2005-09-29
===============================================================================
Dear all,
currently there is a problem (only MS-WINDOWS) with the jpeg library used by
the picture algebra. A file called jpeg62.dll is missing in the
Secondo-SDK/bin directory. If you have this problem, please download
"jpeg6b-3-bin.zip" from the SECONDO website or disable the picture algebra.
Markus
2005-07-22 Makefile Switches
================================================================================
Dear all,
I have introduced some new variables which influence the make process:
For instance the settings
SECONDO_ACTIVATE_ALL_ALGEBRAS="true"
SECONDO_YACC=/usr/bin/bison
will compile and link all algebra modules. The system's parser generator must
be used; otherwise it is not possible to create the Secondo-Parser. On Windows
we do not yet have a newer version of bison, hence it is not possible to
activate all algebras there. It will also only work if there are no empty
algebra directories. This can be avoided by using option -P (prune empty
directories). This should always be used by cvs checkout and update commands.
With the new switch all subdirectories of Algebras (except Management) are
used. Moreover for every directory a library file "lib/lib<algdir>.a" may be
produced (except NauticalMap since it does not compile). All these files are
linked together with the applications. Moreover the file AlgebraList.i.cfg
must not contain entries which have no algebra directory, since this will
result in an "undefined reference" error when linking all together.
Moreover, the libraries are now grouped by the "-( lib1 ... -libN -)" linker
command which will automatically resolve circular dependencies among them.
However, this was only for your information; most of you will not need it. The
overnight make run will use it to ensure that all algebras will compile.
Bye
Markus
2005-07-22 CVS
================================================================================
Dear all,
if you need a stable version of SECONDO you can checkout
cvs co -d sec-stable -rLAST_STABLE secondo
The tag LAST_STABLE will be set by the automatic overnight
test if everything compiles and the Testrunner files return no errors.
Something about the update command:
---------------------------------------------------------
In general you should use always the option -d, e.g.
cvs update -d
otherwise you will not get new directories from the server. Sometimes
empty directories can cause trouble. In this case use
cvs update -P
which will remove them. Once you have requested a fixed version by
specifying a tag or date -r<tag> or -D<date> you will see no future updates.
All files in your working copy are marked with a sticky tag. In order to
change this behaviour call
cvs update -A
which will reset them.
Bye
Markus
2005-07-18 TupleIdentifier Algebra
================================================================================
Folks,
I added a new algebra called TupleIdentifier Algebra implemented by
Matthias Zielke. I also added a new way of creating B-Trees from
streams. Now, the operator createbtree also expects a stream of tuples
containing an extra attribute called tid (from the TupleIdentifier
algebra). Two operators are provided in the TupleIdentifier algebra
for adding such attributes, namely addtupleid and tupleid. One can
index a relation now in these ways, for example:
let ten_no = ten createbtree[no]
// The old way that is still valid
let ten_no = ten feed extend[tupleid(.)] createbtree[no]
let ten_no = ten feed addtupleid createbtree[no]
The motivation behind these changes is that now one can sort a
relation before inserting it into a B-Tree, which is much more
efficient. An example for that is:
let plz_PLZ = plz feed addtupleid sortby[PLZ asc] createbtree[PLZ]
The changes are available on our CVS server but if you download the
changes please also activate this algebra in the makefile.algebras and
AlgebraList.i files. The changes were already made in the
makefile.algebras.sample and AlgebraList.i.cfg cvs files.
If someone finds any difficulties or any problems, please let me know.
[]s
Victor
2005-07-15 Extensions in the optimizer's information look up
===============================================================================
Dear all,
there are some extensions in the database dependent information look up, which
are needed by the optimizer. Three files, namely '<relname>_sample_j',
'<relname>_sample_s' and '<relname>_small' are created. An index on relation
'<relname>_small' with name '<relname>_<attrname>_small' will be created, if
and only if there is an index available for the pair
(<relname>,<attrname>).
The gathered look-up information is stored in local memory and is available via
predicates 'storedX', where 'X' is one of 'Spell', 'Card', 'Sel', 'PET',
'Index', 'TupleSize' or 'Rel'.
Please carry out the following steps to keep consistency in your database
dependent information.
1. Delete all files in your database with name <relname>_sample_j,
<relname>_sample_s, <relname>_sample and <relname>_small, if available.
2. Delete or rename all files 'storedXs.pl' (seven files) from your local
optimizer directory.
3. Make a simple query for each relation in your database, e.g. (sql) select
count(*) from <relname>, using SecondoPL for example.
For every pair (<relname>,<attrname>) there is information available if an
index exists or not. If you add an index '<relname>_<attrname>' to your
database simply type 'updateIndex(<relname>,<attrname>)' to inform the
optimizer that there is an index for the pair (<relname>,<attrname>) available.
Additionally an index of the same type will be created for the relation
<relname>_small.
If you delete an index for the pair (<relname>,<attrname>) type
'updateIndex(<relname>,<attrname>)'. The index for the relation <relname>_small
will be deleted and the optimizer will be informed that there is no index
available anymore.
Note, that the updateIndex predicate works like a switch and doesn't check if
the pair (<relname>,<attrname>) is really deleted from the database. This is
the user's responsibility.
If you want to delete a relation from your database type
'updateRel(<relname>)'. All created files above will be deleted and all
information about this relation will be removed from the optimizer's knowledge
base.
Regards
Frank
2005-05-20 Sample Files in the Optimizer, New Relational Object in Secondo-Data
================================================================================
Dear all,
I've made some changes in the optimizer module, namely changes in the files
'statistics.pl' and 'database.pl'. For computing a better selectivity,
especially for selection predicates, there are now two different sample files
available. The file '<relationname>_sample_s' will be used for selection
predicates and the file '<relationname>_sample_j' for join predicates. For a
proper work with this new feature it is necessary, that you delete the
following files from your 'Optimizer' directory: 'storedSels.pl',
'storedSpells.pl', 'storedCards.pl', 'storedRels.pl', 'storedTupleSizes.pl',
'storedIndexes.pl'. Furthermore you can delete the old sample files
<relationname>_sample from your databases.
There is a new relational object, called 'telefon', available in the CVS module
'secondo-data'. You will find the file in the directory 'Objects/Telefon97'.
'telefon' contains 31.499.800 tuples with address and telephone entries from
Germany. These data are only for internal use, because the data isn't freeware.
If you want to restore this object in a database, make 'cvs update -d' in your
'secondo-data' directory and follow the instructions from the README file.
Regards
Frank
2005-05-13 Memory limit for operators - stack trace
================================================================================
Dear all,
I have documented some new (or old but undocumented) configuration options in
SecondoConfig.example. I will mention two important things here:
1) There is a new section QueryProcessor:
# --- QueryProcessor Section ---
[QueryProcessor]
# Max memory in kb available for an operator (e.g. hashjoin or sort)
#MaxMemPerOperator=4096
If the parameter above is set, the memory available for operators can be
defined. Before, it was hard coded, e.g. hashjoin 16MB and other operators
like product only 2MB. This seems a little bit unfair when one tries to
compare two algorithms.
Moreover, we can test with small inputs whether the persistent implementation
of the algorithms works. It turned out that the sortmergejoin has a problem,
since it crashes with a segmentation fault in some of my queries when the
memory is less than 4MB. Maybe some branches of code were called which have
never been called before. By default every operator will have 16MB now.
2) Stack Trace
I have improved the output of the stack trace. On a Linux system (if compiled
with debugging information [-ggdb]) we will now see complete function and file
names instead of mangled C++ symbols.
# Uncomment the next line if you don't want
# to see a stack trace when Secondo crashes.
# Note: The stack trace is not available on Windows!
RTFlags += DEBUG:DemangleStackTrace
Output Example:
********************************************
**
** Signal #SIGSEGV caught! Printing Stack ...
**
********************************************
?? --> [ ??:0 ]
Application::PrintStacktrace() --> [ Application.cpp:241 ]
Application::AbortOnSignalHandler(int) --> [ Application.cpp:317 ]
?? --> [ ??:0 ]
?? --> [ ??:0 ]
Tuple::~Tuple() --> [ RelationPersistent.cpp:558 ]
Tuple::DeleteIfAllowed() --> [ RelationAlgebra.h:412 ]
MergeJoinLocalInfo::ClearBucket(std::vector<Tuple*, std::allocator<Tuple*>
>&) --> [ ExtRelAlgPersistent.cpp:710 ]
MergeJoinLocalInfo::NextResultTuple() --> [ ExtRelAlgPersistent.cpp:940 ]
int MergeJoin<false>(Word*, Word&, int, Word&, void*) -->
[ ExtRelAlgPersistent.cpp:1068 ]
Operator::CallValueMapping(int, Word*, Word&, int, Word&, void*) -->
[ Algebra.h:181 ]
AlgebraManager::Execute(int, int, Word*, Word&, int, Word&, void*) -->
[ AlgebraManager.h:661 ]
QueryProcessor::Eval(OpNode*, Word&, int) --> [ QueryProcessor.cpp:2682 ]
QueryProcessor::Request(void*, Word&) --> [ QueryProcessor.cpp:2792 ]
Head(Word*, Word&, int, Word&, void*) --> [ ExtRelationAlgebra.cpp:1020 ]
Operator::CallValueMapping(int, Word*, Word&, int, Word&, void*) -->
[ Algebra.h:181 ]
AlgebraManager::Execute(int, int, Word*, Word&, int, Word&, void*) -->
[ AlgebraManager.h:661 ]
QueryProcessor::Eval(OpNode*, Word&, int) --> [ QueryProcessor.cpp:2682 ]
QueryProcessor::Request(void*, Word&) --> [ QueryProcessor.cpp:2792 ]
TCountStream(Word*, Word&, int, Word&, void*) -->
[ RelationAlgebra.cpp:1864 ]
Operator::CallValueMapping(int, Word*, Word&, int, Word&, void*) -->
[ Algebra.h:181 ]
AlgebraManager::Execute(int, int, Word*, Word&, int, Word&, void*) -->
[ AlgebraManager.h:661 ]
QueryProcessor::Eval(OpNode*, Word&, int) --> [ QueryProcessor.cpp:2682 ]
SecondoInterface::Command_Query(AlgebraLevel, unsigned long, unsigned long&,
std::string&) --> [ SecondoInterface.cpp:1344 ]
SecondoInterface::Secondo(std::string const&, unsigned long, int, bool, bool,
unsigned long&, int&, int&, std::string&, std::string const&) -->
[ SecondoInterface.cpp:1129 ]
SecondoTTY::CallSecondo() --> [ SecondoTTY.cpp:590 ]
SecondoTTY::CallSecondo2() --> [ SecondoTTY.cpp:623 ]
SecondoTTY::ProcessCommand() --> [ SecondoTTY.cpp:305 ]
SecondoTTY::ProcessCommands() --> [ SecondoTTY.cpp:443 ]
SecondoTTY::Execute() --> [ SecondoTTY.cpp:921 ]
main --> [ SecondoTTY.cpp:1068 ]
?? --> [ ??:0 ]
_start --> [ start.S:105 ]
*********** End Stack **********************
Regards
Markus
2005-03-03 Problems with the bison parser generator
================================================================================
Dear all,
I have finished merging in the Picture Algebra developed by
students of the practical course in database systems.
It is possible to import JPEG, TGA and PCX pictures.
Sample pictures and an import command file can be found
in the secondo-data CVS repository. A nice viewer is also
present. Besides picture, a data type histogram is implemented,
which represents color distributions in the RGB color scheme.
By default this algebra is not active. If you switch it
on, the parser generator bison has a problem since
the table size of 32767 will be exceeded. I found no
switches to resize this table, hence I decided to use
a newer version of bison, which works fine. On Linux
you can easily switch to a newer version if you edit
the makefile.linux and set
YACC=/usr/bin/bison
On Windows you need to download a newer version from
gnuwin32.sourceforge.net.
Bye
Markus
2005-02-09 Secondo-SDK Configuration
===============================================================================
Dear all,
I have revised the environment setup for SECONDO. The next time
you update and run make, the files
.secondorc
.secondo.linuxrc (or .secondo.win32rc)
.secondo.sdkrc
will be created in your home directory. The file .bashrc should
simply call
source $HOME/.secondorc [secondo-root-dir] [secondo-sdk-dir]
This should already be done in all installations. By default $HOME/secondo
is assumed to be the directory where the SECONDO sources are present.
If that is not the case, you can pass the directory as an optional argument.
The second parameter can override the default directory
SECONDO_SDK=/home/secondo-sdk.
This is interesting for global installations like the one we have on zeppelin.
Since the automatic detection of the Berkeley-DB version was very slow on
Windows (MSYS), I removed it. If you have already installed a newer Berkeley-DB,
please set up the directory in the file ".secondo.sdkrc". Moreover, you may
have to change the CVSROOT variable, which is also defined there. In most cases
this should be the only file to change.
I hope that the environment configuration now
(1) is easier to maintain
(2) is easier to understand
(3) has more verbose error messages in case of missing directories, etc.
Bye
Markus
P.S.: If you have more than one SECONDO source tree on your computer, you
can simply change the environment by calling "setvar" (without parameter
$PWD) in the root of a SECONDO source tree.
2005-02-07 The aggregate operator
===============================================================================
Hi all,
I added a new operator called 'aggregate' to the Extended Relational Algebra.
I am sending you the description of the changes I made in the CVS.
[]s
Victor
----------
In this version four modifications were made:
- The operators unionbbox and intersectionbbox were removed from the
Rectangle Algebra, because
- The operator aggregate was added in the Extended Relational Algebra, and
- The operators union and intersection were added in the Rectangle Algebra.
- The specification of the translate operator was changed.
With that, general aggregate operations can be performed on relations. For
example, the unionbbox can now be rewritten as
query Kreis_box feed
aggregate[box; fun(r1: rect, r2: rect) r1 union r2; [const rect value undef]]
where the first argument box is an attribute of type rect in the relation
Kreis_box; the second is the aggregate function that operates on two elements;
and the third argument is the empty value that is combined with the first
tuple of the relation.
As another example, the sum operation can now be rewritten as:
query ten feed aggregate[no; fun(i1: int, i2: int) i1+i2; 0]
----------
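As a rough illustration of the description above, aggregate can be read as a
left fold: the third argument is the starting value and the function combines
the running result with each attribute value in turn. The Python sketch below
only models this reading and is not the operator's implementation. The names
aggregate and union, the tuple layout (xmin, ymin, xmax, ymax), the use of None
for the undefined rect, and the content 1..10 for the relation ten are all
assumptions made for the example.

```python
from functools import reduce


def aggregate(values, fun, initial):
    """Fold `fun` over `values` from left to right, starting with `initial`."""
    return reduce(fun, values, initial)


# Sum, mirroring: query ten feed aggregate[no; fun(i1: int, i2: int) i1+i2; 0]
ten = list(range(1, 11))  # assumed content of attribute `no`
total = aggregate(ten, lambda i1, i2: i1 + i2, 0)


def union(r1, r2):
    """Bounding box of two boxes; None (assumed 'undef') acts as neutral value."""
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return (min(r1[0], r2[0]), min(r1[1], r2[1]),
            max(r1[2], r2[2]), max(r1[3], r2[3]))


# Hypothetical boxes given as (xmin, ymin, xmax, ymax)
boxes = [(0, 0, 2, 2), (1, 1, 5, 3), (-1, 0, 0, 4)]
bbox = aggregate(boxes, union, None)

print(total, bbox)
```

With an empty input the fold simply returns the starting value, which is why
the third argument has to be a sensible "empty" value for the function.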
2005-01-31 Automatic Tests
===============================================================================
Hello all,
I created some new scripts which automatically run all available
tests through the TestRunner. Moreover, I created a new directory
Tests/Testspecs and I moved all tests there. You can simply add new
testfiles there and if they end with ".test" they will be recognized by
the script. You can call the script by
make runtests
Please do this before you check in changes in important modules!!!
The tests create their own Berkeley DB database directory. The output of the
tests will be stored in Tests/Testspecs/<name>.log
Moreover, every night a cron job retrieves a CVS copy, compiles SECONDO and
runs these tests. All people who committed since the last successful run will
get an email if any error happens.
Currently, the oldrelalg.test fails with a segmentation fault.
Bye
Markus
2005-01-25 Berkeley-DB Release change
===============================================================================
Dear all,
I'm sorry, but unfortunately my instructions were misleading. The problem is
that the directory for the Berkeley-DB was hard coded into the setvar script
and I told you another name. Hence, even if you have installed a new
Berkeley-DB, it will not be used.
With the command 'catvar' all important directories in use are displayed. If
BERKELEY_DB_DIR is not the directory where you have installed it, the old
version will still be used. I have now fixed the problem and the only
restriction is that the library must be installed below directory $SECONDO_SDK.
If you have already installed Berkeley-DB 4.2.52, do
1) cvs update
2) make update-environment
3) close all shells and open a new one
4) make clean; make
5) restore databases
If you haven't done it already, the old instructions will work.
2005-01-24 Berkeley-DB Release change
===============================================================================
Dear all,
I changed some SMI code in order to make it possible to
use SECONDO also with Berkeley DB version 4.2.52.
This version has native mingw support, hence it is easier to
install with gcc on Windows.
Since version 4.2.52 is intended to be used for the CeBit-Version, please
follow the instructions below and install Berkeley-DB 4.2.52 as soon
as possible, because there are only 3 weeks left for testing this version.
Here are the upgrade instructions:
1) update Secondo (cvs update)
2) run make update-environment
3) close all open shells and start a new one
4) Download Berkeley-DB 4.2.52 (without encryption) from www.sleepycat.com
or, if you have access, from
zeppelin:/home/secondo/SECONDO_CD/{linux,windows}/non-gnu
5) Extract the distribution somewhere and run the following commands
a) cd db-4.2.52/build_unix
b) ../dist/configure --enable-mingw --enable-cxx --prefix=$SECONDO_SDK/db4252
Note: The switch --enable-mingw should only be used on Windows.
c) make
d) make install
6) Save your current databases if you think this is necessary
7) Now do a make clean in your secondo directory and call make again.
8) Delete the database directory and restore your databases.
If you have any problems with the procedure above, please contact me.
Best Regards
Markus
2004-12-22 Nested Lists
===============================================================================
Dear all,
since Zhiming had problems with nested lists containing German "Umlaute"
(such as ä, ö, ü), I revised the scanner specifications in order to make them
more secure. However, the problem with "Umlaute" seems to be only a problem
of the Java based nested list scanner and is still present. In addition, some
other improvements to the nested list parsing were made; they are explained
below:
Changes in the C++ and Java Scanner:
-Special characters like \t \v \a \b, etc. will be overwritten.
-Moreover, there should be no problem exchanging nested list files
between Linux and Windows.
Changes in C++:
-The error message was improved. The position of the character causing an
error and the last token name will be displayed.
-The scanner and parser can be switched into a debug mode displaying much
useful information. This can be done with a new command called "set". For
example,
(set "NLParser:Debug" = TRUE);
will turn on the debug mode for the parser. Note: The command is only
recognized in nested list syntax. Moreover, the command can be used to
change the RTFlags defined in SecondoConfig.ini at runtime. Currently only
boolean values are supported. However, some of the flags are not really
runtime parameters since they are only used at startup of the system; changing
them later has no effect.
The flag "NLScanner:Debug" controls the scanner's output. If both are switched
on, the output is very noisy and not easy to understand since scanning and
parsing are interleaved.
Bye
Markus
===============================================================================
There are some more messages but I think some of them are out of date now. Below
you will find those which I thought might be most interesting.
2004-11-10 Counter and Runtime Information
===============================================================================
Hello,
if the two RTFLAGS
SI:CommandTime
SI:PrintCounters
are set in SecondoConfig.ini the files
cmd-times.csv
cmd-counters.csv
will be created. This is useful for importing data into
spreadsheet programs, e.g. MS-Excel. Before you
import the cmd-counters.csv file it may be necessary to edit
it, since the number of counters depends on the executed
code and hence on the commands you type in. Look at the
examples below:
cmd-times.csv:
------------------------
#nr|command|realtime|cpu-time
1|list databases|0|0.01
2|create database testqueries|2|0.04
3|restore database testqueries from testqueries|3|0.1
cmd-counters.csv:
---------------------------
1
SmiFile::Close|SmiFile::Open
2|9|8
SmiFile::Close|SmiFile::Create|SmiFile::Open|SmiFile:Realloc-DBHandles|
SmiRecord::Write
3|27|18|17|16|422
4|9|0|0|0|0
The first column contains the number of the command and the following columns
the values of the counters. If the number of counters changes a new headline
will be printed. List databases produces no output of counters, hence there
is no headline.
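Because the headline can change in the middle of the file, a spreadsheet import
may misalign the columns. As one possible way of preparing the data, the Python
sketch below reads the format described above and attaches each counter value
to the counter name of the most recent headline. The function name
parse_counters and the rule "a line whose first field is a number is a data
row, anything else is a headline" are assumptions derived from the example, not
from the code that writes the file. The sketch also assumes that every headline
is stored on a single line.

```python
def parse_counters(lines):
    """Return {command number: {counter name: value}} for cmd-counters.csv text."""
    header = []  # counter names of the most recent headline
    rows = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split("|")
        if fields[0].isdigit():
            # Data row: command number followed by the counter values
            values = [int(v) for v in fields[1:]]
            rows[int(fields[0])] = dict(zip(header, values))
        else:
            # Headline: a new set of counter names starts here
            header = fields
    return rows


sample = [
    "1",
    "SmiFile::Close|SmiFile::Open",
    "2|9|8",
    "SmiFile::Close|SmiFile::Create|SmiFile::Open|"
    "SmiFile:Realloc-DBHandles|SmiRecord::Write",
    "3|27|18|17|16|422",
    "4|9|0|0|0|0",
]

for nr, counters in sorted(parse_counters(sample).items()):
    print(nr, counters)
```

Commands without counters, such as command 1 in the sample, end up with an
empty dictionary.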
Information about using counters in your code will be found in the file
include/Counter.h
Bye
Markus
2004-07-30 B-Tree indexing of complex types
===============================================================================
Hi all,
I have just finished a new approach for indexing complex data types using
B-Trees.
The idea is to use a string representation of the objects that preserves the
ordering. For that, a new abstract class is created, namely
IndexableStandardAttribute, which has three functions: one for writing the
value into a string (char *), one for reading the value from a string, and
finally one for returning the size of the string representation of the object
in bytes.
The data type class must implement this abstract class in order to be
indexable by B-Trees. For the Secondo type checking, I have created a kind
called INDEXABLE to which the data types must belong.
As an example, one can see the DateTime algebra.
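To illustrate what an order-preserving string representation means, here is a
small Python sketch for 32-bit signed integers. It is not the
IndexableStandardAttribute interface: the names write_to, read_from and
size_of, as well as the encoding (shift by 2^31, then zero-pad to a fixed
width), are assumptions chosen for the example. The point is only that
comparing the strings must give the same result as comparing the values.

```python
OFFSET = 2 ** 31  # shifts the signed 32-bit range to 0 .. 2**32 - 1
WIDTH = 10        # 2**32 - 1 has ten decimal digits


def write_to(value):
    """Encode a 32-bit signed integer as a fixed-width, sortable string."""
    return str(value + OFFSET).zfill(WIDTH)


def read_from(key):
    """Decode a string produced by write_to back into the integer."""
    return int(key) - OFFSET


def size_of():
    """Size of the string representation (one byte per ASCII digit)."""
    return WIDTH


values = [42, -7, 0, 2 ** 31 - 1, -2 ** 31, 1000]
keys = [write_to(v) for v in values]

# Sorting the strings yields the same order as sorting the numbers
print([read_from(k) for k in sorted(keys)])
```

A plain decimal string would not work here, since "10" sorts before "9" and
negative numbers sort by their sign character; the fixed width and the offset
avoid both problems.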
[]'s
Victor
2004-07-21 New Operators available
===============================================================================
Dear All,
I've implemented 3 new operators, "units", "theyear", and "themonth" in
Secondo. The "units" operator didn't work at first because of an error with
the definition of the UPoint class. Now Markus has fixed it and it works
fine.
The signatures of these operators are as follows:
-----units------- (transforms moving data into a stream of moving units)
mpoint -> stream(upoint)
mint -> stream(constint)
mreal -> stream(ureal)
-----theyear----- (gets a periods value from the indicated year)
int -> periods
-----themonth---- (gets the periods value from the indicated month)
int x int -> periods
Example queries are as follows:
query U15 feed extendstream[ MUnit: units(.zug) ] consume;
query theyear(2000)
query themonth(2000, 3)
Best Regards,
Zhiming
2004-07-12 Restoring large objects
===============================================================================
Dear all,
some time ago, I explained how to configure make
to create a version of Secondo which uses a persistent
implementation of the Nested-List module capable of
restoring large objects. The changes made there addressed
two things:
(1) The variable NL_PERSISTENT in the makefile.env
which switches between persistent and in memory
representation.
(2) This variable was also used to define a Berkeley-DB mode
without logging and transactions.
The latter has now been changed to a runtime flag. If you define
the flag
SMI:NoTransactions
in the SecondoConfig.ini file (the RTFlags key),
Secondo will start up the SMI without transactions and logging.
This mode can be useful for restoring databases or for measuring the
overhead that logging and transactions cause during query processing.
You can also use the version with the NL_PERSISTENT flag permanently
since for most commands the buffer of the Nested-List module is big enough
and hence no additional disk I/O is needed. I tested it for a while and I
think it runs stably now.
However, each time you change the NL_PERSISTENT key, run make clean
before building Secondo again. This is due to the fact that the Nested-List
module is used nearly everywhere in the system and I'm not sure if the
makefiles handle all dependencies correctly.
Bye Markus
2003-11-07 About makefiles
===============================================================================
Hello all,
I have changed some makefiles and created some new ones since some of them were
too complex. Now the structure is as follows (-> indicates an include relation)
makefile -> makefile.env
The top level makefile should only call other makefiles in an appropriate order.
The rules for creating libraries were moved to a file makefile.libs. Every sub
level makefile includes makefile.env which contains all basic definitions such
as names of tools, directories for searching includes, etc.
makefile.env
->makefile.jni
->makefile.algebras
->makefile.optimizer
->makefile.{linux,win32}
makefile.algebras is used for the definition of algebra names and directories
where their source code resides. This makes it more convenient to switch
algebras on and off, since you only have to comment out two lines and edit the
file /Algebras/Management/AlgebraList.i. You don't have to change the file
Algebras/makefile anymore. I also corrected the dependencies so that you don't
need to do a "make clean" after Algebra re-configuration. But be careful, there
may be some interdependencies between Algebras. Every algebra implementor who
knows about dependencies should document them in this file.
makefile.jni contains information about JNI-Algebras written in Java.
Currently, this doesn't work properly but will be corrected next week. The
compilation of JNI-Algebras is controlled by the macro USE_JNI in the
makefile.env
makefile.optimizer contains information about the Prolog and JPL installation.
If this file detects that Prolog is installed (this is determined by some
environment variables which have to be set manually after Prolog installation)
the optimizer will be compiled. Additionally, the JPL based Optimizer Server
will be compiled.
Finally, here are some installation instructions:
MS-Windows: The Berkeley-DB will now be installed at a global place in the file
system, therefore you will have to change some environment variables. If you
have already installed Secondo from the CD-ROM and you want to make a new
initial copy from the cvs server, do the following:
cvs update secondo
cd secondo/Win32
make install
cd MSYS
make install
notepad /etc/setvar.bash
Set up your Prolog and J2SDK installation directories. After you have done this
you will never have to change to the Win32 directory and run "make install"
anymore.
Linux: Replace your setvar.bash script with the version in the secondo
directory. It may have been renamed; try
which setvar
to determine its location. Otherwise look into your .profile or .bashrc shell
configuration file. Adjust the Prolog and J2SDK directories.
If you have problems, please contact me.
Bye
Markus
2004-04-30
===============================================================================
Hello all,
I have committed the following changes to CVS
1) new operators seqinit: int -> bool and seqnext: -> int implemented in the
StandardAlgebra.
These operators allow creating a sequence of numbers. After startup the
sequence counter is set to 0. With seqinit it can be reset to an arbitrary
integer value. The seqnext() operator can be used in conjunction with the
extend operators to add unique or uniformly distributed attribute values
(seqnext() mod N) to relations.
2) The random number generation in the randint and sample operators has been
revised as recommended in the man page documentation of the rand() function.
In C++ code always use the computation rand()/(RAND_MAX+1.0) to create
floating point values in the range [0,1) and multiply them by a constant N
(truncating to an integer) to create values in the range [0,N-1]. On Windows
the library dependent constant RAND_MAX is limited to 32767, which is a very
small value. With the modification above the sample operator will create
uniformly distributed numbers, but limited to a total number of samples of
3*RAND_MAX/4 to avoid long runtimes.
3) In case of abnormal program termination a stack trace will be printed on
the screen (Linux version only). This may help to determine the origin of
the trouble at a first glance.
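The scaling recommended in item 2) can be checked with a few lines of Python.
The sketch below does not call the C library; it enumerates every raw value
that rand() could return for an assumed RAND_MAX of 32767 and applies the
formula rand()/(RAND_MAX+1.0)*N, truncated to an integer. The names scale and
counts and the choice N = 10 are assumptions made for the example.

```python
RAND_MAX = 32767  # assumed value of the C library constant


def scale(r, n):
    """Map a raw value r in [0, RAND_MAX] to an integer in [0, n-1]."""
    return int(n * (r / (RAND_MAX + 1.0)))


N = 10
counts = [0] * N
for r in range(RAND_MAX + 1):  # every value rand() could return
    counts[scale(r, N)] += 1

# Each of the N result values is hit by almost the same number of raw values
print(counts)
```

Dividing by RAND_MAX+1.0 rather than RAND_MAX keeps the quotient strictly
below 1, so the result never reaches N.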
Bye
Markus