There are various limitations and “gotchas” to be aware of when using Titan. Some of these limitations are necessary design choices; others are issues that will be rectified as Titan development continues. The final section provides solutions to common issues.





Design Limitations

Size Limitation

Titan can store up to a quintillion edges (2^60) and half as many vertices. That limitation is imposed by Titan’s id scheme.
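The arithmetic behind these limits can be illustrated in plain Java (the claim that the remaining bits of the 64-bit id are reserved for other purposes is an assumption about the id scheme, not something documented here):

```java
public class TitanIdLimits {
    public static void main(String[] args) {
        // 2^60 edge ids fit within Titan's 64-bit id scheme; the remaining
        // high bits are presumably reserved for internal id partitioning.
        long maxEdges = 1L << 60;        // 1,152,921,504,606,846,976 (~1.15 quintillion)
        long maxVertices = maxEdges / 2; // half as many vertices

        System.out.println("max edges:    " + maxEdges);
        System.out.println("max vertices: " + maxVertices);
    }
}
```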

DataType Definitions

When declaring the data type of a property key using dataType(Class), Titan will enforce that all properties for that key have the declared type, unless that type is Object.class. This is an equality type check, meaning that sub-classes are not allowed. For instance, one cannot declare the data type to be Number.class and then use Integer or Long values. For efficiency reasons, the type needs to match exactly. Hence, use Object.class as the data type if type flexibility is required. In all other cases, declare the actual data type to benefit from increased performance and type safety.
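In plain Java terms, this exact-match check behaves like Class equality rather than assignability. A minimal sketch of the distinction (this is an illustration of the Java semantics, not Titan's internal code):

```java
public class ExactTypeCheck {
    public static void main(String[] args) {
        Class<?> declared = Number.class;
        Class<?> actual = Integer.class;

        // Equality check: Integer.class is not *exactly* Number.class,
        // so a key declared as Number.class rejects Integer values.
        System.out.println(declared.equals(actual));            // false

        // Assignability check: Integer *is a* Number, but Titan does not
        // use this looser test.
        System.out.println(declared.isAssignableFrom(actual));  // true
    }
}
```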

Temporary Limitations

Key Index Must Be Created Prior to Key Being Used

To index vertices by key, the respective key index must be created before the key is first used in a vertex property. Read more about creating vertex indexes.

Unable to Drop Key Indices

Once an index has been created for a key, it can never be removed.

Types Can Not Be Changed Once Created

This limitation constrains how the graph schema can evolve: while the schema can be extended, previous declarations cannot be changed.

Batch Loading Speed

Titan provides a batch loading mode that can be enabled through the configuration. However, this batch mode only facilitates faster loading into the storage backend; it does not use storage-backend-specific batch loading techniques that prepare the data in memory for disk storage. As such, batch loading in Titan is currently slower than batch loading modes provided by single machine databases. The Bulk Loading documentation lists ways to speed up batch loading in Titan.
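As a sketch, batch loading is typically switched on with a single configuration option. The key names below should be verified against the Graph Configuration reference for your Titan version, and the backend/hostname values are placeholders:

```properties
# titan.properties — enable batch loading mode for bulk ingestion
storage.batch-loading=true
storage.backend=cassandra
storage.hostname=127.0.0.1
```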

Another limitation related to batch loading is the failure to load millions of edges into a single vertex at once or within a short period of time. Such supernode loading can fail for some storage backends. This limitation also applies to dense index entries. For more information, please refer to the ticket.

Beware

Multiple Titan instances on one machine

Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique configuration for storage.machine-id-appendix. Otherwise, these instances might overwrite each other leading to data corruption. See Graph Configuration for more information.
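As an illustration, two instances sharing the same backend would each get a distinct appendix value. The file names and hostname below are placeholders:

```properties
# titan-instance-a.properties
storage.backend=cassandra
storage.hostname=127.0.0.1
storage.machine-id-appendix=1

# titan-instance-b.properties — same backend, different appendix
storage.backend=cassandra
storage.hostname=127.0.0.1
storage.machine-id-appendix=2
```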

Custom Class Datatype

Titan supports arbitrary objects as attribute values on properties. To use a custom class as data type in Titan, either register a custom serializer or ensure that the class has a no-argument constructor and implements the equals method because Titan will verify that it can successfully de-/serialize objects of that class. Please read Datatype and Attribute Serializer Configuration for more information.
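A sketch of what such a class might look like; the class name GeoPoint and its fields are hypothetical, and the hashCode override is included as standard practice alongside equals rather than a documented Titan requirement:

```java
import java.util.Objects;

// Candidate attribute class: public no-argument constructor plus a
// value-based equals(), as Titan's serializer verification expects.
public class GeoPoint {
    private double lat;
    private double lon;

    public GeoPoint() {}  // required no-argument constructor

    public GeoPoint(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GeoPoint)) return false;
        GeoPoint p = (GeoPoint) other;
        return lat == p.lat && lon == p.lon;
    }

    @Override
    public int hashCode() {
        return Objects.hash(lat, lon);
    }
}
```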

Global Graph Operations

Since version 0.2.1, Titan supports global graph operations for all storage backends. However, beware that such operations will likely cause an OutOfMemoryException on large graphs, since iterating through all vertices and/or edges can load them into memory. Use Faunus to implement global graph operations on large, distributed Titan graphs that do not fit into memory.

Transactional Scope for Edges

Edges should not be accessed outside the scope in which they were originally created or retrieved.

Ghost Vertices

When the same vertex is concurrently removed in one transaction and modified in another, both transactions will successfully commit on eventually consistent storage backends and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key out-uniqueness, but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and to clear them out at regular intervals, for instance using Titan tools.

Snappy 1.4 does not work with Java 1.7

Cassandra 1.2.x makes use of Snappy 1.4. Titan will not be able to connect to Cassandra if the server is running Java 1.7 and Cassandra 1.2.x (with Snappy 1.4). Be sure to remove the Snappy 1.4 jar from the cassandra/lib directory and replace it with a Snappy 1.5 jar.

Useful Tips

Removing JAMM Warning Messages

When launching Titan with embedded Cassandra, the following warnings may be displayed:

958 [MutationStage:25] WARN  org.apache.cassandra.db.Memtable  - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0.  Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Cassandra uses a Java agent called MemoryMeter which allows it to measure the actual memory use of an object, including JVM overhead. To use JAMM (Java Agent for Memory Measurements), the path to the JAMM jar must be specified via the Java javaagent parameter when launching the JVM (e.g. -javaagent:path/to/jamm.jar). Rather than modifying titan.sh to add the javaagent parameter, you can set the JAVA_OPTIONS environment variable with the proper javaagent setting:

export JAVA_OPTIONS=-javaagent:$TITAN_HOME/lib/jamm-0.2.5.jar

Cassandra Connection Problem

By default, Titan uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, it has been reported that Astyanax was unable to establish a connection to the cluster. In those cases, changing the backend to storage.backend=cassandrathrift solved the problem.
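A minimal illustration of the change; the hostname is a placeholder:

```properties
# Switch from the default Astyanax adapter to the Thrift adapter
storage.backend=cassandrathrift
storage.hostname=10.0.0.5
```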

ElasticSearch OutOfMemoryException

When numerous clients are connecting to ElasticSearch, it is likely that an OutOfMemoryException occurs. This is not due to a memory issue, but to the OS not allowing more threads to be spawned by the user running ElasticSearch. To circumvent this issue, increase the number of processes allowed for that user. For example, increase the ulimit -u from the default 1024 to 10024.
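To make such a change persistent on Linux, the per-user process limit can be raised in /etc/security/limits.conf. This is a sketch; the user name "elasticsearch" is an assumption and should be replaced with the actual user running the ElasticSearch process:

```
# /etc/security/limits.conf — raise the per-user process (thread) limit
elasticsearch  soft  nproc  10024
elasticsearch  hard  nproc  10024
```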