255 lines
6.9 KiB
Plaintext
255 lines
6.9 KiB
Plaintext
/*
|
|
----
|
|
This file is part of SECONDO.
|
|
|
|
Copyright (C) 2012, University in Hagen
|
|
Faculty of Mathematic and Computer Science,
|
|
Database Systems for New Applications.
|
|
|
|
SECONDO is free software; you can redistribute it and/or modify
|
|
it under the terms of the GNU General Public License as published by
|
|
the Free Software Foundation; either version 2 of the License, or
|
|
(at your option) any later version.
|
|
|
|
SECONDO is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
along with SECONDO; if not, write to the Free Software
|
|
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
|
----
|
|
|
|
\tableofcontents
|
|
|
|
\newpage
|
|
|
|
1 Overview
|
|
|
|
DBLP is a large collection of bibligraphy data. As the data is structured well
|
|
it is possible to transfer a subset to a property graph.
|
|
|
|
\begin{figure}[h]
|
|
\centerline{
|
|
\includegraphics[width=1\textwidth]{schema.eps}}
|
|
\end{figure}
|
|
|
|
|
|
2 Prepare data
|
|
|
|
2.1 Create database
|
|
|
|
Create a SECONDO database named "pgraph2".
|
|
|
|
*/
|
|
SECONDO> create database pgraph2;
|
|
SECONDO> open database pgraph2;
|
|
/*
|
|
|
|
As this database is quite huge, it is necessary to adjust the available memory
|
|
in ~SecondoConfig.ini~ to at least 2GB:
|
|
|
|
*/
|
|
|
|
[QueryProcessor]
|
|
GlobalMemory=2048
|
|
|
|
/*
|
|
|
|
In the script files the following statement exposes additional memory to the MainMemoryAlgebra:
|
|
|
|
*/
|
|
|
|
query meminit (1524);
|
|
|
|
/*
|
|
|
|
2.2 Import relations
|
|
|
|
To import the raw data to SECONDO follow the following steps:
|
|
|
|
1 Download the raw data from https://dblp.uni-trier.de/xml/dblp.xml.gz
|
|
|
|
2 Make sure, SECONDO is (temporary) compiled without transaction support as transaction
|
|
logging will use a huge amount of system resources. \\
|
|
(See bin/SecondoConfig.ini value "RTFlags += SMI:NoTransactions")
|
|
|
|
3 Use the application in Tools/Converter/Dblp2Secondo to generate the
|
|
following import files: \\ Document, Authordoc, Author, Keyword. \\
|
|
(Follow the instructions in the contained READ.ME file)
|
|
Before importing (!) the relations using the script ~restore\_objs~, rename
|
|
the relations in the let-statement to Document\_raw, Author\_raw, Authordoc\_raw,
|
|
Keyword\_raw. This allows to use these names in the node relations later.
|
|
|
|
|
|
2.3 Transform relations to property graph
|
|
|
|
As the complete dataset is very large, it is possible to convert a subset
|
|
to the property graph. Currently it will take all publications from 2017
|
|
and all documents where the author contains the word "gueting"
|
|
(About 320.000 records.)
|
|
To create the relations with the subset you can use the script:
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/createrels_subset
|
|
/*
|
|
|
|
With the use of Secondo-Pregel it is also possible to query large graphs. To use
|
|
the complete dblp-database, you can run the following script:
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/createrels
|
|
/*
|
|
|
|
The imported data will be split to nodes and edge relations to represent a graph.
|
|
These relations will be taken to define the graph later.
|
|
|
|
2.4 Create Ordered Relations
|
|
|
|
The following scripts creates the Ordered-Relations which are used for the property graph:
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/create_orel
|
|
/*
|
|
|
|
These are used for the Compute-Function. With these Relations it is also possible
|
|
to query a persistent and non-distributed graph.
|
|
|
|
2.5 Distribute Data to the Workers
|
|
|
|
For the use of Secondo-Pregel the relations have to be distributed to the defined Workers.
|
|
Therefore you need to define a Workers-Configuration-File an rename it in the script.
|
|
It is also possible to change the part-Function for the distribution.
|
|
The detailed steps to create different Workers is described in "Distributed Query
|
|
Processing in secondo". To distribute the data and define the Workers you can use the script:
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/distribute
|
|
/*
|
|
|
|
|
|
3 Creating a property graph
|
|
|
|
A property graph has to be defined before matching operators can be
|
|
used to query the graph. This is done be registering the node and edge
|
|
relations. (This could be seen as the schema of the graph)
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph/sample-dblp/create
|
|
/*
|
|
|
|
At first a PropertyGraph object is created. The argument "p2" is
|
|
used to prefix objects in the memory catalog to keep the data
|
|
of multiple graphs separated.
|
|
|
|
*/
|
|
|
|
let p2=createpgraph("p2");
|
|
|
|
/*
|
|
|
|
To define the schema of the graph, the script ~create~ uses the
|
|
following operators to register the property graph:
|
|
|
|
* ~createpgraph(name)~ to create a property graph object
|
|
|
|
* ~addnodesrel[relname,fromclause,toclause]~ to register node relations
|
|
|
|
* ~addedgesrel[relanme,propertyname, indexname]~ to register edge relations
|
|
|
|
* ~addnodeindex[relanme]~ to register node property indexes
|
|
|
|
This configuration will be saved in the database and will
|
|
be available between sessions.
|
|
|
|
To get information about the configuration of a property graph
|
|
objects, use the ~info~ operator.
|
|
|
|
*/
|
|
SECONDO> query p2 info;
|
|
/*
|
|
|
|
|
|
4 Loading the property graph
|
|
|
|
To be able to query the property graph, it needs to be loaded with the
|
|
Operator loadgraphorel.
|
|
This will create the statistics of the graph (at the first excecution) and
|
|
additional structures to support the match operators.
|
|
|
|
*/
|
|
SECONDO> query p2 loadgraphorel;
|
|
/*
|
|
|
|
After loading the graph, the structure of the graph has to be configured.
|
|
For the use of Secondo-Pregel you can configure the structures "pregelpersistent"
|
|
and "pregelmemory".
|
|
|
|
*/
|
|
SECONDO> query p2 cfg["structure","pregelpersistent";
|
|
/*
|
|
|
|
|
|
5 Sample Queries
|
|
|
|
The PropertyGraph Algebra defines one matching operator for the use of Secondo-Pregel, namemly
|
|
|
|
- ~match1b~: Uses a query tree to match subgraphs
|
|
starting from the root node
|
|
|
|
5.1 Query 'coauthor'
|
|
|
|
Queries the top 5 co-authors of publications of "Ralf Hartmut Gueting".
|
|
|
|
The results will be grouped and show the authors with the sum of joint publications.
|
|
|
|
|
|
5.1.1 match1b
|
|
|
|
Note the direction argument "$<$-" to match an edge in reverse direction.
|
|
|
|
*/
|
|
|
|
query p2
|
|
match1b ['
|
|
(
|
|
( auth Author ( (Name "Ralf Hartmut Gueting") ) )
|
|
AUTHOR_OF
|
|
( (doc Document)
|
|
AUTHOR_OF <-
|
|
( (Author) )
|
|
)
|
|
)',
|
|
'',
|
|
'( ((auth Name) Name) )'
|
|
];
|
|
|
|
/*
|
|
|
|
Also available as sciptfile:
|
|
|
|
*/
|
|
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/match1b-coauthors
|
|
/*
|
|
|
|
After running the script the results of the query have to be created by
|
|
the following command:
|
|
|
|
*/
|
|
SECONDO> query createSDArray("Results", Workers) dsummarize
|
|
filter[not(.Name = "Ralf Hartmut Gueting")] sortby[Name]
|
|
groupby[Name; Cnt:group count] sortby[Cnt:desc] head[5] consume;
|
|
|
|
/*
|
|
|
|
To delete the results and start a new query they can be deleted with the command:
|
|
|
|
*/
|
|
SECONDO> qquery createSDArray("Results", Workers)
|
|
dmap["", . feed . deletedirect count] getValue
|
|
|
|
|
|
|
|
|