There are various limitations and “gotchas” to be aware of when using Titan. Some of these limitations are deliberate design choices; others are issues that will be rectified as Titan development continues. The last section provides solutions to common issues.
Titan can store up to a quintillion edges (2^60) and half as many vertices. That limitation is imposed by Titan’s id scheme.
When declaring the data type of a property key using dataType(Class), Titan will enforce that all properties for that key have the declared type, unless that type is Object.class. This is an equality type check, meaning that sub-classes will not be allowed. For instance, one cannot declare the data type to be Number.class and then use Integer or Long values. For efficiency reasons, the type needs to match exactly. Hence, use Object.class as the data type if type flexibility is needed. In all other cases, declare the actual data type to benefit from increased performance and type safety.
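The difference between an equality type check and a subclass-aware check can be illustrated in plain Java. This is a standalone sketch of the concept, not Titan's internal code:

```java
public class ExactTypeCheck {
    public static void main(String[] args) {
        Integer value = 42;

        // An equality check, as Titan performs it: the declared class must
        // match the value's class exactly, so Number.class rejects Integer.
        boolean exactMatch = Number.class.equals(value.getClass());

        // A subclass-aware check would accept an Integer as a Number, but
        // this is not what Titan does for declared data types.
        boolean subclassMatch = Number.class.isAssignableFrom(value.getClass());

        System.out.println(exactMatch);    // false
        System.out.println(subclassMatch); // true
    }
}
```

Because the check is by equality, a key declared with Number.class would reject every concrete value, since no value's class is exactly Number.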
To index vertices by key, the respective key index must be created before the key is first used in a vertex property. Read more about creating vertex indexes.
Once an index has been created for a key, it can never be removed.
This pitfall constrains the graph schema. While the graph schema can be extended, previous declarations cannot be changed.
Titan provides a batch loading mode that can be enabled through the configuration. However, this batch mode only facilitates faster loading into the storage backend, it does not use storage backend specific batch loading techniques that prepare the data in memory for disk storage. As such, batch loading in Titan is currently slower than batch loading modes provided by single machine databases. The Bulk Loading documentation lists ways to speed up batch loading in Titan.
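Batch loading is enabled through the storage configuration; for example, in a Titan properties file (the exact key may vary by version):

```properties
# Enable Titan's batch loading mode, which relaxes some consistency
# checks to speed up loading into the storage backend
storage.batch-loading=true
```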
Another limitation related to batch loading is the failure to load millions of edges into a single vertex at once or within a short period of time. Such supernode loading can fail for some storage backends. This limitation also applies to dense index entries. For more information, please refer to the ticket.
Running multiple Titan instances on one machine backed by the same storage backend (distributed or local) requires that each of these instances has a unique value for the storage.machine-id-appendix configuration option. Otherwise, these instances might overwrite each other, leading to data corruption. See Graph Configuration for more information.
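For example, two instances on the same machine might use properties files that are identical except for this setting (a sketch; backend and hostname are placeholders to adapt to your deployment):

```properties
# titan-instance-1.properties
storage.backend=cassandra
storage.hostname=127.0.0.1
storage.machine-id-appendix=1

# titan-instance-2.properties would be identical except for:
# storage.machine-id-appendix=2
```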
Titan supports arbitrary objects as attribute values on properties. To use a custom class as a data type in Titan, either register a custom serializer or ensure that the class has a no-argument constructor and implements the equals method, because Titan will verify that it can successfully de-/serialize objects of that class. Please read Datatype and Attribute Serializer Configuration for more information.
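As a sketch, a class like the following meets the no-argument-constructor and equals requirements. The Point class and its fields are hypothetical, not part of Titan:

```java
// Hypothetical attribute class satisfying the requirements for Titan's
// default serialization: a no-argument constructor and a meaningful equals.
public class Point {
    private int x;
    private int y;

    // No-argument constructor, required so the class can be instantiated
    // during deserialization.
    public Point() { }

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Always override hashCode together with equals so the class behaves
    // correctly in hash-based collections.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```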
Since version 0.2.1, Titan supports global graph operations for all storage backends. However, beware that such operations will likely cause an OutOfMemoryException on large graphs, since iterating through all vertices and/or edges can load them into memory. Use Faunus to implement global graph operations on large, distributed Titan graphs that do not fit into memory.
Edges should not be accessed outside the scope in which they were originally created or retrieved.
When the same vertex is concurrently removed in one transaction and modified in another, both transactions will successfully commit on eventually consistent storage backends, and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key out-uniqueness, but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and to clear them out at regular intervals, for instance using Titan tools.
Cassandra 1.2.x makes use of Snappy 1.4. Titan will not be able to connect to Cassandra if the server is running Java 1.7 and Cassandra 1.2.x (with Snappy 1.4). Be sure to remove the Snappy 1.4 jar in the cassandra/lib directory and replace it with a Snappy 1.5 jar (available here).
When launching Titan with embedded Cassandra, the following warnings may be displayed:
958 [MutationStage:25] WARN org.apache.cassandra.db.Memtable - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead
Cassandra uses a Java agent called MemoryMeter which allows it to measure the actual memory use of an object, including JVM overhead. To use JAMM (Java Agent for Memory Measurements), the path to the JAMM jar must be specified in the javaagent parameter when launching the JVM (e.g. -javaagent:path/to/jamm.jar). Rather than modifying titan.sh and adding the javaagent parameter, I prefer to set the JAVA_OPTIONS environment variable with the proper javaagent setting:
export JAVA_OPTIONS=-javaagent:$TITAN_HOME/lib/jamm-0.2.5.jar
By default, Titan uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, it has been reported that Astyanax was unable to establish a connection to the cluster. In those cases, changing the backend to storage.backend=cassandrathrift solved the problem.
When numerous clients are connecting to ElasticSearch, an OutOfMemoryException may occur. This is not due to a memory issue, but to the OS not allowing more threads to be spawned by the user running ElasticSearch. To circumvent this issue, increase the number of processes allowed for that user; for example, increase ulimit -u from the default 1024 to 10024.
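To make the higher limit persistent across sessions, the process limit can be raised in /etc/security/limits.conf. This is a sketch; the user name elasticsearch is an assumption about how the service runs on your system:

```
# /etc/security/limits.conf
# Raise the max user processes (nproc) for the user running ElasticSearch
elasticsearch  soft  nproc  10024
elasticsearch  hard  nproc  10024
```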