/*
----
This file is part of SECONDO.

Copyright (C) 2012, University in Hagen
Faculty of Mathematic and Computer Science,
Database Systems for New Applications.

SECONDO is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

SECONDO is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with SECONDO; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
----

\tableofcontents

\newpage

1 Overview

DBLP is a large collection of bibliographic data. Because the data is well structured, a subset of it can be transferred to a property graph.

\begin{figure}[h]
\centerline{
\includegraphics[width=1\textwidth]{schema.eps}}
\end{figure}

2 Prepare data

2.1 Create database

Create a SECONDO database named "pgraph2".

*/
SECONDO> create database pgraph2;
SECONDO> open database pgraph2;

/*
As this database is quite large, the available memory in ~SecondoConfig.ini~ has to be raised to at least 2GB:

*/
[QueryProcessor]
GlobalMemory=2048

/*
In the script files, the following statement makes additional memory available to the MainMemoryAlgebra:

*/
query meminit (1524);

/*
2.2 Import relations

To import the raw data into SECONDO, follow these steps:

  1 Download the raw data from https://dblp.uni-trier.de/xml/dblp.xml.gz

  2 Make sure that SECONDO is (temporarily) compiled without transaction support, as transaction logging uses a huge amount of system resources.
    \\ (See bin/SecondoConfig.ini value "RTFlags += SMI:NoTransactions")

  3 Use the application in Tools/Converter/Dblp2Secondo to generate the following import files:
    \\ Document, Authordoc, Author, Keyword.
    \\ (Follow the instructions in the contained READ.ME file)

Before importing (!) the relations with the script ~restore\_objs~, rename the relations in the let statements to Document\_raw, Author\_raw, Authordoc\_raw, Keyword\_raw. This keeps the original names free for the node relations that are created later.

2.3 Transform relations to property graph

As the complete dataset is very large, it is possible to convert only a subset to the property graph. Currently the subset consists of all publications from 2017 and all documents whose author contains the word "gueting" (about 320,000 records).

To create the relations for the subset, use the script:

*/
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/createrels_subset

/*
With Secondo-Pregel it is also possible to query large graphs. To use the complete DBLP database, run the following script instead:

*/
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/createrels

/*
The imported data is split into node and edge relations that represent the graph. These relations are used later to define the graph.

2.4 Create Ordered Relations

The following script creates the ordered relations that are used for the property graph:

*/
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/create_orel

/*
They are used by the compute function. With these relations it is also possible to query a persistent, non-distributed graph.

2.5 Distribute Data to the Workers

To use Secondo-Pregel, the relations have to be distributed to the defined workers. For this you need to define a worker configuration file and adjust its name in the script. It is also possible to change the part function used for the distribution.
The detailed steps for setting up different workers are described in "Distributed Query Processing in SECONDO".

To distribute the data and define the workers, use the script:

*/
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/distribute

/*
3 Creating a property graph

A property graph has to be defined before matching operators can be used to query it. This is done by registering the node and edge relations. (This can be seen as the schema of the graph.)

*/
SECONDO> @../Algebras/PropertyGraph/sample-dblp/create

/*
First, a PropertyGraph object is created. The argument "p2" is used as a prefix for objects in the memory catalog, which keeps the data of multiple graphs separated.

*/
let p2=createpgraph("p2");

/*
To define the schema of the graph, the script ~create~ uses the following operators:

  * ~createpgraph(name)~ to create a property graph object

  * ~addnodesrel[relname,fromclause,toclause]~ to register node relations

  * ~addedgesrel[relname,propertyname, indexname]~ to register edge relations

  * ~addnodeindex[relname]~ to register node property indexes

This configuration is saved in the database and remains available between sessions.

To get information about the configuration of a property graph object, use the ~info~ operator.

*/
SECONDO> query p2 info;

/*
4 Loading the property graph

Before the property graph can be queried, it has to be loaded with the operator ~loadgraphorel~. This creates the statistics of the graph (on the first execution) and additional structures that support the match operators.

*/
SECONDO> query p2 loadgraphorel;

/*
After loading the graph, the structure of the graph has to be configured. For use with Secondo-Pregel you can configure the structures "pregelpersistent" and "pregelmemory".

*/
SECONDO> query p2 cfg["structure","pregelpersistent"];

/*
5 Sample Queries

The PropertyGraph Algebra defines one matching operator for use with Secondo-Pregel, namely

  - ~match1b~: Uses a query tree to match subgraphs starting from the root node

5.1 Query 'coauthor'

Queries the top 5 co-authors of publications of "Ralf Hartmut Gueting". The results are grouped and show each author together with the number of joint publications.

5.1.1 match1b

Note the direction argument "$<$-" to match an edge in reverse direction.

*/
query p2 match1b ['
  (
    ( auth Author ( (Name "Ralf Hartmut Gueting") ) )
    AUTHOR_OF
    (
      (doc Document)
      AUTHOR_OF <-
      ( (Author) )
    )
  )',
  '',
  '( ((auth Name) Name) )'
];

/*
Also available as a script file:

*/
SECONDO> @../Algebras/PropertyGraph2/sample-pregel/sample-dblp/match1b-coauthors

/*
After running the script, the results of the query have to be collected and aggregated with the following command:

*/
SECONDO> query createSDArray("Results", Workers) dsummarize
  filter[not(.Name = "Ralf Hartmut Gueting")]
  sortby[Name]
  groupby[Name; Cnt:group count]
  sortby[Cnt:desc]
  head[5]
  consume;

/*
Before starting a new query, the stored results can be deleted with the command:

*/
SECONDO> query createSDArray("Results", Workers) dmap["", . feed . deletedirect count] getValue
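
/*
The aggregation performed by the result query in 5.1.1 (filter out the queried author, group by name, count each group, sort by count in descending order, keep the first five) can be illustrated outside SECONDO. The following Python sketch is only an illustration of these steps and is not part of the PropertyGraph2 algebra. The function name ~top\_coauthors~ and the sample names are hypothetical, and the tie order shown here is an assumption, since the source does not state how SECONDO orders equal counts.

```python
from collections import Counter


def top_coauthors(names, exclude, limit=5):
    """Return up to `limit` (name, count) pairs, most frequent first.

    Mirrors the steps of the result query: filter, group by name,
    count per group, sort by count descending, head.
    """
    # filter[not(.Name = exclude)] followed by groupby[Name; Cnt: group count]
    counts = Counter(name for name in names if name != exclude)
    # sortby[Name]: order the groups by name first ...
    by_name = sorted(counts.items())
    # ... then sort by count descending. Python's sort is stable, so groups
    # with equal counts stay in name order (an assumption of this sketch).
    by_count = sorted(by_name, key=lambda item: item[1], reverse=True)
    # head[limit]
    return by_count[:limit]


if __name__ == "__main__":
    # Hypothetical stand-in for the Name attribute of the collected results.
    sample = [
        "Ralf Hartmut Gueting", "Author A", "Author B", "Author A",
        "Ralf Hartmut Gueting", "Author C", "Author B", "Author A",
    ]
    for name, cnt in top_coauthors(sample, "Ralf Hartmut Gueting"):
        print(name, cnt)
```

Calling ~top\_coauthors~ with a list of names and the name to exclude returns a list of (name, count) pairs, which corresponds to the attributes Name and Cnt of the SECONDO result relation.

*/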