Question

Orient version: Official Distribution OrientDB Graph Edition 1.0.1

I'm attempting to build a Blueprints-compatible graph with OrientDB SQL inserts (faster than g.addVertex, and no OutOfMemory errors).

When creating 1.5M records, the insert takes longer than expected (13 minutes on a quad-core i7) via the console in batch mode, and searching on the 'name' field is extremely slow (about 2 minutes).

For testing purposes, we have a very simple import SQL file that creates Blueprints-compatible vertices with two properties, name (the account_id) and type (the account_type), plus empty in/out arrays.

import.ornt:

connect local:/graph1/databases/dbdemo admin admin;
drop database;
create database local:/graph1/databases/dbdemo admin admin local graph;
insert into V (name,type,in,out) values ("1111111","player",[],[]);
...
disconnect;

Note: ideally we could set the record identifier to our pre-existing account_id, but I'm not sure how to do this.

The above SQL file ends up being ~100 MB, and I run it via the console (console.sh includes the optimization -Dmvrbtree.optimizeThreshold=-1):

:$ ./console.sh import.ornt 
OrientDB console v.1.0.1 (build @BUILD@) www.orientechnologies.com
Type 'help' to display all the commands supported.

Installing extensions for GREMLIN language v.2.0.0-SNAPSHOT
Connecting to database [local:/graph1/databases/dbdemo] with user 'admin'...OK

Database 'dbdemo' deleted successfully
Creating database [local:/graph1/databases/dbdemo] using the storage type [local]...
Database created successfully.

Current database is: local:/graph1/databases/dbdemo

Inserted record 'V#6:0{name:11111111,type:player,in:[0],out:[0]} v0' in 0.002000 sec(s).
...
Disconnecting from the database [dbdemo]...OK

At this point, we have the basic OrientDB graph, with the OGraphVertex class holding roughly 1.5M vertices.

orientdb> classes 

CLASSES:
----------------------------------------------+---------------------+-----------+
 NAME                                         | CLUSTERS            | RECORDS   |
----------------------------------------------+---------------------+-----------+
 ORIDs                                        | 5                   |         0 |
 OGraphEdge                                   | 7                   |         0 |
 OUser                                        | 4                   |         3 |
 ORole                                        | 3                   |         3 |
 OGraphVertex                                 | 6                   |   1524528 |
----------------------------------------------+---------------------+-----------+
 TOTAL                                                                  1524534 |
--------------------------------------------------------------------------------+

Using the console, selecting via Orient SQL or Gremlin/Pipes takes forever:

orientdb> gremlin g.V.has('name','1149400');                      

v[#6:617363]

Script executed in 129.809006 sec(s).
orientdb> select from OGraphVertex where name like '1149400';

---+---------+--------------------+--------------------+--------------------+--------------------
  #| RID     |name                |type                |in                  |out                 
---+---------+--------------------+--------------------+--------------------+--------------------
  0|#6:617363|1149400             |player              |[0]                 |[0]                 
---+---------+--------------------+--------------------+--------------------+--------------------

1 item(s) found. Query executed in 112.531 sec(s).

orientdb> 

Roughly 2 minutes using OrientDB SQL or Gremlin!

As a potential workaround, it would be ideal to set the record identifier to the original account id from the MySQL source database. Is it possible to have custom record ids?

For example, since we'll know the account_id (e.g., 1149400) when beginning traversal, the ideal would be:

orientdb> gremlin g.v('#6:1149400').map

{name=1149400, type=player}

Script executed in 0.054000 sec(s).

0.054000 is a lot faster than 112.531!!


Solution

The OrientDB RecordId cannot be changed, but you could create an index against the V.name property. Example:

CREATE INDEX OGraphVertex.name NOTUNIQUE

or even:

CREATE INDEX V_name ON OGraphVertex (name) notunique;
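If the import file is produced by a script, the index statement can be emitted together with the inserts. The following is only a minimal sketch in Python, under a few assumptions: `build_import_script` is a hypothetical helper (not part of OrientDB), the account values contain no double quotes, and the index statement is placed after the inserts, which is one possible ordering. I have not verified that the 1.0.1 console accepts the statement in a batch file exactly as written.

```python
def build_import_script(db_url, accounts, index_statement=None):
    """Return the text of a console script like import.ornt from the question.

    accounts is an iterable of (account_id, account_type) string pairs.
    index_statement is optional; when given, it is written after the inserts
    (an assumption of this sketch, not a documented requirement).
    """
    lines = [
        f"connect {db_url} admin admin;",
        "drop database;",
        f"create database {db_url} admin admin local graph;",
    ]
    for account_id, account_type in accounts:
        # Same shape as the insert in the question: name, type, empty in/out.
        lines.append(
            f'insert into V (name,type,in,out) values '
            f'("{account_id}","{account_type}",[],[]);'
        )
    if index_statement:
        # Normalise to exactly one trailing semicolon.
        lines.append(index_statement.rstrip(";") + ";")
    lines.append("disconnect;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; the real ids would come from the MySQL source.
    sample_accounts = [("1111111", "player"), ("1149400", "player")]
    print(
        build_import_script(
            "local:/graph1/databases/dbdemo",
            sample_accounts,
            index_statement="CREATE INDEX OGraphVertex.name NOTUNIQUE",
        ),
        end="",
    )
```

Redirecting the printed output to import.ornt gives a file that can be run with ./console.sh import.ornt as in the question.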

For more information, see the OrientDB Indexes documentation.

Licensed under: CC-BY-SA with attribution