Neo4j and Graph database basics

5 minute read

Neo4j is a graph database

Neo4j is a graph database. One thing that’s unique about graph databases is the data model. Unlike other databases that might use tables or documents to represent data, graph databases use a graph.

When we say graph, we mean the graph data structure, not a chart or other kind of data visualization.

Data visualization can be an important aspect of making sense of graph data, but it’s important to understand that the underlying data model in Neo4j is composed of nodes, the entities in our data, and relationships that connect nodes.
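Here is a minimal Python sketch of a graph as a set of nodes plus the relationships that connect them. It is a toy in-memory model for illustration only, not how Neo4j stores data and not its API. The node ids are hypothetical.

```python
# Toy graph: nodes keyed by a hypothetical id, relationships as (start, type, end) tuples.
nodes = {
    "tom": {"name": "Tom Hanks"},
    "fg": {"title": "Forrest Gump"},
}
relationships = [
    ("tom", "ACTED_IN", "fg"),
]

def neighbors(node_id):
    """Return ids of nodes reachable from node_id over one outgoing relationship."""
    return [end for start, _type, end in relationships if start == node_id]

print(neighbors("tom"))  # ['fg']
```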

Neo4j Browser is a tool for interacting with a Neo4j database.

Neo4j Browser is a query workbench and visualization interface for Neo4j. It provides an interactive environment for developers to work with the database.

We use the Cypher Query Language to interact with Neo4j. You can think of Cypher as similar to SQL, but designed for graphs instead of tables. Cypher is all about pattern matching.
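The sketch below imitates a Cypher pattern such as `MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title` using plain Python over a toy data set. It is only meant to show the idea of pattern matching. The `match` helper and the data layout are hypothetical, and real Cypher is far more expressive.

```python
# Toy data: node id -> (labels, properties); relationships as (start, type, end).
nodes = {
    1: ({"Person"}, {"name": "Tom Hanks"}),
    2: ({"Movie"}, {"title": "Forrest Gump"}),
    3: ({"Person"}, {"name": "Robert Zemeckis"}),
}
rels = [(1, "ACTED_IN", 2), (3, "DIRECTED", 2)]

def match(start_label, rel_type, end_label):
    """Yield (start, end) id pairs fitting the pattern (:start_label)-[:rel_type]->(:end_label)."""
    for start, typ, end in rels:
        if typ == rel_type and start_label in nodes[start][0] and end_label in nodes[end][0]:
            yield start, end

# Roughly: MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title
rows = [(nodes[p][1]["name"], nodes[m][1]["title"]) for p, m in match("Person", "ACTED_IN", "Movie")]
print(rows)  # [('Tom Hanks', 'Forrest Gump')]
```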

Neo4j Desktop

Neo4j Desktop is quick and easy to install on macOS. Shown below is the UI for Neo4j Desktop. In Neo4j Desktop, you can create one or more projects, each with its own database or set of databases.

In addition to databases, a project can contain graph apps. By default, you will see graph apps such as the Neo4j ETL Tool, Neo4j Browser, and Neo4j Bloom. You can also download and install other graph apps for your development environment.

A project can also use additional libraries called plugins. By default, the plugins shown below are available.

Plugins extend the Cypher Query Language, for example by adding procedures and functions you can call from your queries.

Creating a graph (or database)

You can create either a local or a remote graph (database). We will create a local graph, meaning that the database will run on our local system. You must provide a password for the graph. Applications use this password to connect to the database once it is started. Here, you can also select which version of the database to create.

After creating the database, you can either start it or manage it. If you click Manage, you can see that the database is currently stopped. You can also find the directory where the database is stored by clicking “Open folder”.

Once started, sk_graphdb becomes the active database for our project. NOTE: Only one database can be active per project, so the default movie_db is stopped.

A started database listens on specific ports for connecting clients. Most client applications, such as Python programs, connect through the Bolt port. Neo4j Browser, by contrast, is accessed through the HTTP port.
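The sketch below maps each kind of client to the port it talks to. It assumes Neo4j's default ports (7687 for Bolt, 7474 for HTTP) have not been changed. The ports for your own database may differ if they were changed in its configuration. The `connection_url` helper is hypothetical.

```python
# Default Neo4j ports, assumed unchanged in the database configuration.
DEFAULT_PORTS = {"bolt": 7687, "http": 7474, "https": 7473}

def connection_url(client, host="localhost"):
    """Hypothetical helper: return the URL a given kind of client would use."""
    if client == "driver":   # e.g. a Python program using the Neo4j driver
        return f"bolt://{host}:{DEFAULT_PORTS['bolt']}"
    if client == "browser":  # Neo4j Browser opened in a web browser
        return f"http://{host}:{DEFAULT_PORTS['http']}"
    raise ValueError(f"unknown client: {client}")

print(connection_url("driver"))  # bolt://localhost:7687
```

If the official Python driver is installed (`pip install neo4j`), pass the Bolt URL as the first argument to `GraphDatabase.driver(uri, auth=(user, password))`. Use the password you set when creating the graph.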

Neo4j Browser

You typically start developing with Neo4j in Neo4j Browser, where you can enter Cypher statements to query the database. Here is Neo4j Browser. It is automatically connected to the currently running database for this project. This database has no nodes or relationships yet.

Graph database concepts

1. Example graph

We will use the example graph below to introduce the basic concepts of the property graph:

2. Nodes

Nodes are often used to represent entities. The simplest possible graph is a single node. Consider the graph below, consisting of a single node.
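A single-node graph can be modeled in a few lines of Python. A node is just a container for properties (and, as the next section shows, labels), and the graph has no relationships yet. The `Node` class is a hypothetical toy. The Cypher in the comment is only a rough equivalent, and the `name` and `born` values follow the Person properties used later in this post.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy property-graph node: a set of labels plus name-value properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

# The simplest possible graph: one node, no relationships.
# Roughly equivalent Cypher: CREATE ({name: 'Tom Hanks', born: 1956})
graph = {
    "nodes": [Node(properties={"name": "Tom Hanks", "born": 1956})],
    "relationships": [],
}
print(len(graph["nodes"]), len(graph["relationships"]))  # 1 0
```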

3. Labels

Labels shape the domain by grouping nodes into sets: all nodes that have a certain label belong to the same set.

For example, all nodes representing users could be labeled with the label :User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.

Since labels can be added and removed at runtime, they can also be used to mark temporary states for nodes. A :Suspended label could be used to denote bank accounts that are suspended, and a :Seasonal label could denote vegetables that are currently in season.
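Because labels can change at runtime, a suspended state can be toggled by adding and removing a label. This Python sketch uses hypothetical account data. The comments show roughly equivalent Cypher: `SET n:Label` adds a label and `REMOVE n:Label` removes it.

```python
# Hypothetical accounts: each node has a mutable set of labels and some properties.
accounts = [
    {"labels": {"Account"}, "props": {"number": "A-1"}},
    {"labels": {"Account"}, "props": {"number": "A-2"}},
]

def with_label(nodes, label):
    """Return only the nodes that currently carry the given label."""
    return [n for n in nodes if label in n["labels"]]

# Roughly: MATCH (a:Account {number: 'A-2'}) SET a:Suspended
accounts[1]["labels"].add("Suspended")
print([n["props"]["number"] for n in with_label(accounts, "Suspended")])  # ['A-2']

# Roughly: MATCH (a:Account {number: 'A-2'}) REMOVE a:Suspended
accounts[1]["labels"].discard("Suspended")
print(with_label(accounts, "Suspended"))  # []
```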

A node can have zero to many labels.

In the example above, the nodes have the labels Person and Movie, which is one possible way of describing the data. But assume that we want to express different dimensions of the data. One way of doing that is to add more labels.

Below is an example showing the use of multiple labels:
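A Python sketch of nodes with multiple labels follows. The extra labels `Actor` and `Director` are an assumption chosen for illustration. The `having` helper mimics how a Cypher pattern like `(n:Person:Actor)` requires a node to carry every listed label.

```python
# Toy nodes, some with more than one label (Actor/Director are illustrative assumptions).
nodes = [
    {"labels": {"Person", "Actor"}, "name": "Tom Hanks"},
    {"labels": {"Person", "Director"}, "name": "Robert Zemeckis"},
    {"labels": {"Movie"}, "title": "Forrest Gump"},
]

def having(*labels):
    """Return nodes that carry all of the given labels."""
    wanted = set(labels)
    return [n for n in nodes if wanted <= n["labels"]]

print([n["name"] for n in having("Person")])           # ['Tom Hanks', 'Robert Zemeckis']
print([n["name"] for n in having("Person", "Actor")])  # ['Tom Hanks']
```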

4. Relationships

A relationship connects two nodes.

Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.

Our example graph will make a lot more sense once we add relationships to it:

5. Relationship Types

A relationship must have exactly one relationship type.

Our example uses ACTED_IN and DIRECTED as relationship types. The roles property on the ACTED_IN relationship has an array value with a single item in it.
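The rule that every relationship has exactly one type can be captured in a small Python sketch. The type is a single required string, and properties such as `roles` live on the relationship itself. The `Relationship` class is hypothetical, and the role value `Forrest` is assumed for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    """Toy relationship: one start node, one end node, exactly one type, plus properties."""
    start: str
    type: str  # a single string, so a relationship cannot have zero or several types
    end: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        if not isinstance(self.type, str) or not self.type:
            raise ValueError("a relationship must have exactly one non-empty type")

# Roughly: CREATE (:Person {name: 'Tom Hanks'})-[:ACTED_IN {roles: ['Forrest']}]->(:Movie {title: 'Forrest Gump'})
acted_in = Relationship("Tom Hanks", "ACTED_IN", "Forrest Gump", {"roles": ["Forrest"]})
directed = Relationship("Robert Zemeckis", "DIRECTED", "Forrest Gump")
print(acted_in.type, acted_in.properties["roles"])  # ACTED_IN ['Forrest']
```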

Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.

We observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.

Relationships always have a direction. However, you only need to pay attention to direction where it is useful, because a relationship can be followed from either end when querying. This means there is no need to add duplicate relationships in the opposite direction unless they are needed to properly describe your use case.
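In the Python sketch below, the `ACTED_IN` relationship is stored once, from Tom Hanks to Forrest Gump, yet it can still be followed from either end. The helper names are hypothetical. `undirected` corresponds roughly to a Cypher pattern without an arrow, such as `(a)-[:ACTED_IN]-(b)`.

```python
# Each relationship is stored once, with a direction: (start, type, end).
rels = [("Tom Hanks", "ACTED_IN", "Forrest Gump")]

def outgoing(node, rel_type):
    """Nodes reached by following rel_type relationships that start at node."""
    return [end for start, t, end in rels if start == node and t == rel_type]

def incoming(node, rel_type):
    """Nodes reached by following rel_type relationships that end at node, backwards."""
    return [start for start, t, end in rels if end == node and t == rel_type]

def undirected(node, rel_type):
    """Follow rel_type relationships regardless of their direction."""
    return outgoing(node, rel_type) + incoming(node, rel_type)

print(outgoing("Tom Hanks", "ACTED_IN"))       # ['Forrest Gump']
print(incoming("Forrest Gump", "ACTED_IN"))    # ['Tom Hanks']
print(undirected("Forrest Gump", "ACTED_IN"))  # ['Tom Hanks']
```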

6. Properties

Properties are name-value pairs that are used to add qualities to nodes and relationships.

In our example graphs, we have used the properties name and born on Person nodes, title and released on Movie nodes, and the property roles on the :ACTED_IN relationship.

The value part of a property can hold different data types, such as numbers, strings, and booleans, as well as lists of them (like the roles array).
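The Python check below decides which values a toy model accepts as properties. It allows numbers, strings, and booleans, plus lists whose items all share one of those types (like `roles`). This is a sketch under those assumptions, not Neo4j's full type system, which also includes temporal and spatial types.

```python
# Simplified toy rule: simple scalar values, or a list whose items all have one simple type.
SIMPLE_TYPES = (bool, int, float, str)

def is_valid_property_value(value):
    if isinstance(value, SIMPLE_TYPES):
        return True
    if isinstance(value, list):
        kinds = {type(item) for item in value}  # bool and int count as different kinds
        return len(kinds) <= 1 and all(isinstance(item, SIMPLE_TYPES) for item in value)
    return False  # maps, None, nested lists, etc. are rejected in this sketch

print(is_valid_property_value(["Forrest"]))  # True
print(is_valid_property_value({"a": 1}))     # False
```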

7. Traversals and paths

A traversal is how you query a graph in order to find answers to questions.

For example:

  • “What music do my friends like that I don’t yet own?”, or
  • “What web services are affected if this power supply goes down?”

Traversing a graph means visiting nodes by following relationships according to some rules.
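The first question above can be expressed as a traversal over hypothetical data. Start at `me`, follow `FRIEND` relationships, then follow `LIKES`, and finally drop anything reachable through `OWNS`. The relationship types and data in this Python sketch are invented for illustration.

```python
# Hypothetical data: (start, type, end) relationships.
rels = [
    ("me", "FRIEND", "ann"),
    ("me", "FRIEND", "bob"),
    ("ann", "LIKES", "album1"),
    ("bob", "LIKES", "album2"),
    ("bob", "LIKES", "album3"),
    ("me", "OWNS", "album2"),
]

def follow(node, rel_type):
    """One traversal step: the set of nodes reached over outgoing rel_type relationships."""
    return {end for start, t, end in rels if start == node and t == rel_type}

def friends_music_not_owned(person):
    # Rule: person -FRIEND-> friend -LIKES-> album, minus albums the person already OWNS.
    liked = set()
    for friend in follow(person, "FRIEND"):
        liked |= follow(friend, "LIKES")
    return liked - follow(person, "OWNS")

print(sorted(friends_music_not_owned("me")))  # ['album1', 'album3']
```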

If we want to find out which movies Tom Hanks acted in according to our tiny example database, the traversal would start from the Tom Hanks node, follow any :ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result (see the dashed lines):

The path above has length one, because it consists of a single relationship.
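The breadth-first traversal below follows only `ACTED_IN` relationships from the Tom Hanks node and records each path as a list of relationships. A path's length is then simply `len(path)`. The data layout and the `paths_from` helper are hypothetical.

```python
from collections import deque

# Toy data: (start, type, end) relationships.
rels = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),
]

def paths_from(start, rel_type):
    """Breadth-first traversal along outgoing rel_type relationships.
    Returns every path found, each as a list of relationships."""
    found = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        for rel in rels:
            s, t, e = rel
            # Never reuse a relationship within one path, so cycles cannot loop forever.
            if s == node and t == rel_type and rel not in path:
                new_path = path + [rel]
                found.append(new_path)
                queue.append((e, new_path))
    return found

for path in paths_from("Tom Hanks", "ACTED_IN"):
    print(len(path), path[-1][2])  # 1 Forrest Gump
```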

8. Schema

A schema in Neo4j refers to indexes and constraints.

Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.
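To illustrate schema optional, here is a toy Python store (hypothetical, not Neo4j's API). It accepts nodes with no schema at all and lets a uniqueness constraint be added later, after checking the existing data. The comment shows roughly how such a constraint is written in Neo4j 5 Cypher. Older versions may use a different syntax.

```python
class ToyGraph:
    """Schema-optional toy store: each node has one label and a dict of properties."""

    def __init__(self):
        self.nodes = []      # list of (label, properties) pairs
        self.unique = set()  # (label, property key) pairs that must be unique

    def add_node(self, label, **props):
        # Enforce only the constraints that have been created so far.
        for c_label, key in self.unique:
            if c_label == label and key in props:
                for n_label, n_props in self.nodes:
                    if n_label == label and n_props.get(key) == props[key]:
                        raise ValueError(f"{label}.{key} = {props[key]!r} already exists")
        self.nodes.append((label, props))

    def create_unique_constraint(self, label, key):
        # Roughly: CREATE CONSTRAINT FOR (p:Person) REQUIRE p.name IS UNIQUE
        values = [p[key] for n_label, p in self.nodes if n_label == label and key in p]
        if len(values) != len(set(values)):
            raise ValueError("existing data violates the constraint")
        self.unique.add((label, key))

g = ToyGraph()
g.add_node("Person", name="Tom Hanks")  # no schema needed up front
g.create_unique_constraint("Person", "name")
```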

9. Naming rules

Node labels, relationship types, and property names are case sensitive. For example, the property name is different from the property Name. It is recommended to follow the naming conventions described in the following table:
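This sketch assumes the conventions recommended in the Neo4j documentation: labels in UpperCamelCase such as `VehicleOwner`, relationship types in upper case with underscores such as `OWNS_VEHICLE`, and property names in lowerCamelCase such as `firstName`. Under that assumption, a small Python checker could look like this. It also shows that names differing only in case are distinct keys. The `follows_convention` helper is hypothetical.

```python
import re

# Assumed conventions (as recommended in the Neo4j documentation).
CONVENTIONS = {
    "label": re.compile(r"[A-Z][A-Za-z0-9]*"),                         # e.g. VehicleOwner
    "relationship_type": re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*"),  # e.g. OWNS_VEHICLE
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),                      # e.g. firstName
}

def follows_convention(kind, name):
    """True if the whole name matches the assumed convention for its kind."""
    return CONVENTIONS[kind].fullmatch(name) is not None

print(follows_convention("relationship_type", "OWNS_VEHICLE"))  # True

# Case sensitivity: 'name' and 'Name' are different property keys.
props = {"name": "Tom Hanks"}
print("Name" in props)  # False
```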

What is next?

In the next post, we will use the default Movie database in the Primer project to explore the Cypher Query Language.