Neo4j and Graph database basics
Neo4j is a graph database
Neo4j is a graph database. What makes graph databases unique is their data model: unlike databases that represent data as tables or documents, graph databases use a graph.
Now, when we say graph, we’re talking about the graph data structure, not necessarily data visualization.
Data visualization can be an important aspect of making sense of graph data, but it’s important to understand that the underlying data model in Neo4j is composed of nodes, the entities in our data, and relationships that connect nodes.
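The Python sketch below makes the idea of a graph data structure concrete. It models the pieces just described: nodes carry labels and properties, and each relationship has a type plus a start node and an end node. The class names, fields, and sample data are illustrative assumptions. They are neither Neo4j's storage format nor its driver API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph (illustrative only, not Neo4j's internal format)."""
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed connection that always points from `start` to `end`."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two nodes and one relationship connecting them.
tom = Node(1, {"Person"}, {"name": "Tom Hanks"})
forrest = Node(2, {"Movie"}, {"title": "Forrest Gump"})
acted_in = Relationship("ACTED_IN", tom, forrest, {"roles": ["Forrest"]})

print(f"({acted_in.start.properties['name']})-[:{acted_in.type}]->"
      f"({acted_in.end.properties['title']})")
# (Tom Hanks)-[:ACTED_IN]->(Forrest Gump)
```

Keeping start and end as separate fields is what gives every relationship a direction.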
Neo4j Browser is a tool for interacting with a Neo4j database.
Neo4j Browser is a query workbench and visualization interface for Neo4j. It provides an interactive environment for developers to work with the database.
We use the Cypher Query Language to interact with Neo4j. You can think of Cypher as similar to SQL, but designed for graphs instead of tables. Cypher is all about pattern matching.
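Here is a rough illustration of what pattern matching means. The Cypher pattern (p:Person)-[:ACTED_IN]->(m:Movie) describes a shape: a Person node with an outgoing ACTED_IN relationship to a Movie node. The Python sketch below searches for that shape by hand in a few hypothetical (start, type, end) triples. The data and the match helper are assumptions for illustration. In practice you would write the pattern in Cypher and let Neo4j find the matches.

```python
# Hypothetical data: node labels, plus relationships as (start, type, end) triples.
labels = {
    "Tom Hanks": {"Person"},
    "Robert Zemeckis": {"Person"},
    "Forrest Gump": {"Movie"},
}
rels = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),
]


def match(start_label, rel_type, end_label):
    """Return every (start, end) pair that fits (:start_label)-[:rel_type]->(:end_label).

    Cypher equivalent for the first call below:
        MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p, m
    """
    return [
        (s, e)
        for s, t, e in rels
        if t == rel_type and start_label in labels[s] and end_label in labels[e]
    ]


print(match("Person", "ACTED_IN", "Movie"))  # [('Tom Hanks', 'Forrest Gump')]
```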
Neo4j Desktop
Neo4j Desktop is quick and easy to install on macOS. Shown below is the UI for Neo4j Desktop. In Neo4j Desktop, you can create one or more projects, each with its own database or set of databases.
In addition to databases, a project can contain graph apps. By default, you will see graph apps such as the Neo4j ETL Tool, Neo4j Browser, and Neo4j Bloom. You can also download and install other graph apps for your development environment.
A project can also use additional libraries called plugins. By default, these plugins are available.
Plugins are extensions to the Cypher Query Language, typically adding procedures and functions that you can call from Cypher.
Creating a graph (or database)
You can create either a local or a remote graph (database). We will create a local graph, meaning that the database will run on our local system. You must provide a password for the graph, which applications use to connect to the database once it is started. Here, you can also select which version of the database to create.
After you create the database, you can either start it or manage it. If you click Manage, you can see that the database is currently stopped. You can also view the directory where this database is located by clicking “Open folder”.
You will notice that sk_graphdb is now the active database for our project. NOTE: There can only be one active database per project, so the default movie_db will be stopped.
A started database listens on specific ports for client connections. Most applications, such as Python programs, use the Bolt port, whereas Neo4j Browser is served over the HTTP port.
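The Python sketch below shows the two kinds of connection. It assumes the default ports, 7687 for Bolt and 7474 for HTTP. Neo4j Desktop shows the actual ports for your database and lets you change them. It also assumes the official neo4j Python driver, installed with pip install neo4j. The host name, user name, and password are placeholders.

```python
def connection_uris(host="localhost", bolt_port=7687, http_port=7474):
    """Build client URIs; 7687 and 7474 are Neo4j's default Bolt and HTTP ports."""
    return {
        "bolt": f"bolt://{host}:{bolt_port}",  # used by drivers, e.g. from Python
        "http": f"http://{host}:{http_port}",  # open in a web browser to reach Neo4j Browser
    }


def count_nodes(uri, user, password):
    """Connect over Bolt with the official driver and count all nodes."""
    from neo4j import GraphDatabase  # imported here so the sketch loads without the driver

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            record = session.run("MATCH (n) RETURN count(n) AS nodes").single()
            return record["nodes"]
    finally:
        driver.close()


if __name__ == "__main__":
    uris = connection_uris()
    print(uris)
    # Needs a started database; replace the placeholder password with your own.
    # print(count_nodes(uris["bolt"], "neo4j", "your-password"))
```

Only connection_uris runs without a database. count_nodes needs the driver and a started database, and on a freshly created, empty database it returns 0.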
Neo4j Browser
You typically use Neo4j Browser to get started with Neo4j development; it lets you enter Cypher statements to access the database. Here is Neo4j Browser. It is automatically connected to the currently running database for this project. This database does not yet contain any nodes or relationships.
Graph database concepts
1. Example graph
We will use the example graph below to introduce the basic concepts of the property graph:
2. Nodes
Nodes are often used to represent entities. The simplest possible graph is a single node. Consider the graph below, consisting of a single node.
3. Labels
Labels are used to shape the domain by grouping nodes into sets, where all nodes that have a certain label belong to the same set.
For example, all nodes representing users could be labeled with the label :User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.
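In Cypher, limiting a lookup to user nodes looks like MATCH (u:User {name: 'Alice'}) RETURN u. The plain-Python sketch below mimics that filter to show how a label behaves like membership in a set. The node data and the find_by_label helper are hypothetical.

```python
# Hypothetical nodes: each has a set of labels and a dict of properties.
nodes = [
    {"labels": {"User"}, "properties": {"name": "Alice"}},
    {"labels": {"User"}, "properties": {"name": "Bob"}},
    {"labels": {"Pet"}, "properties": {"name": "Alice"}},  # same name, different label
]


def find_by_label(nodes, label, **props):
    """Return nodes that have `label` and match every given property value.

    Roughly what MATCH (n:Label {key: value}) RETURN n does.
    """
    return [
        n
        for n in nodes
        if label in n["labels"]
        and all(n["properties"].get(key) == value for key, value in props.items())
    ]


print(find_by_label(nodes, "User", name="Alice"))
# [{'labels': {'User'}, 'properties': {'name': 'Alice'}}]
```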
Since labels can be added and removed during runtime, they can also be used to mark temporary states for nodes. A :Suspended label could be used to denote bank accounts that are suspended, and a :Seasonal label can denote vegetables that are currently in season.
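Adding or removing a state label like this takes a single Cypher clause, SET a:Suspended or REMOVE a:Suspended, applied to a matched node. The Python sketch below models a node's labels as a set to show the effect. The Account label, the number property, and the set_label helper are hypothetical.

```python
# A hypothetical bank-account node.
account = {"labels": {"Account"}, "properties": {"number": "12345"}}


def set_label(node, label, present=True):
    """Add or remove a label, like Cypher's SET n:Label / REMOVE n:Label."""
    if present:
        node["labels"].add(label)
    else:
        node["labels"].discard(label)


set_label(account, "Suspended")         # MATCH (a:Account {number: '12345'}) SET a:Suspended
print(sorted(account["labels"]))        # ['Account', 'Suspended']

set_label(account, "Suspended", False)  # MATCH (a:Account {number: '12345'}) REMOVE a:Suspended
print(sorted(account["labels"]))        # ['Account']
```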
A node can have zero to many labels.
In the example above, the nodes have the labels Person and Movie, which is one possible way of describing the data. But assume that we want to express different dimensions of the data. One way of doing that is to add more labels.
Below is an example showing the use of multiple labels:
4. Relationships
A relationship connects two nodes.
Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.
Our example graph will make a lot more sense once we add relationships to it:
5. Relationship Types
A relationship must have exactly one relationship type.
Our example uses ACTED_IN and DIRECTED as relationship types. The roles property on the ACTED_IN relationship has an array value with a single item in it.
Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.
We observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.
Relationships always have a direction. However, you only have to pay attention to the direction where it is useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed in order to properly describe your use case.
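The Python sketch below shows why one stored relationship is enough. The same relationship can be followed outgoing from Tom Hanks, incoming from Forrest Gump, or with its direction ignored. The neighbors helper is hypothetical, and the comments give the corresponding Cypher patterns.

```python
# Each relationship is stored once, with a direction: (start, type, end).
rels = [("Tom Hanks", "ACTED_IN", "Forrest Gump")]


def neighbors(node, rel_type, direction="both"):
    """Follow rel_type from node: 'out' (->), 'in' (<-), or 'both' (direction ignored)."""
    result = []
    for start, t, end in rels:
        if t != rel_type:
            continue
        if direction in ("out", "both") and start == node:
            result.append(end)
        if direction in ("in", "both") and end == node:
            result.append(start)
    return result


# (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m)
print(neighbors("Tom Hanks", "ACTED_IN", "out"))     # ['Forrest Gump']
# (:Movie {title: 'Forrest Gump'})<-[:ACTED_IN]-(p)
print(neighbors("Forrest Gump", "ACTED_IN", "in"))   # ['Tom Hanks']
print(neighbors("Forrest Gump", "ACTED_IN", "out"))  # []
# (:Movie {title: 'Forrest Gump'})-[:ACTED_IN]-(p)
print(neighbors("Forrest Gump", "ACTED_IN"))         # ['Tom Hanks']
```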
6. Properties
Properties are name-value pairs that are used to add qualities to nodes and relationships.
In our example graphs, we have used the properties name and born on Person nodes, title and released on Movie nodes, and the property roles on the :ACTED_IN relationship.
The value part of a property can hold different data types, such as numbers, strings, and booleans, as well as lists of these, like the roles array on the ACTED_IN relationship.
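The Python sketch below is a simplified check of which values can be stored as a property. It covers only the types named here plus homogeneous lists such as roles. It ignores Neo4j's temporal and spatial types, and the is_storable helper is hypothetical. It also reflects that Neo4j does not store null: setting a property to null removes it.

```python
SIMPLE_TYPES = (bool, int, float, str)


def is_storable(value):
    """Simplified check: booleans, numbers, strings, or a homogeneous list of these.

    None is rejected, because setting a property to null removes it instead.
    Temporal and spatial types are deliberately left out of this sketch.
    """
    if isinstance(value, SIMPLE_TYPES):
        return True
    if isinstance(value, list):
        types = {type(v) for v in value}  # exact types, so [True, 1] counts as mixed
        return len(types) <= 1 and all(isinstance(v, SIMPLE_TYPES) for v in value)
    return False


person = {"name": "Tom Hanks", "born": 1956}
acted_in = {"roles": ["Forrest"]}
print(all(is_storable(v) for v in [*person.values(), *acted_in.values()]))  # True
print(is_storable(["Forrest", 1994]))  # False: mixed list
```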
7. Traversals and paths
A traversal is how you query a graph in order to find answers to questions.
For example:
- “What music do my friends like that I don’t yet own?”, or
- “What web services are affected if this power supply goes down?”
Traversing a graph means visiting nodes by following relationships according to some rules.
If we want to find out which movies Tom Hanks acted in according to our tiny example database, the traversal would start from the Tom Hanks node, follow any :ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result (see the dashed lines):
The path above has length one, because it consists of a single relationship.
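A traversal can be sketched in Python as a depth-limited walk. It follows only outgoing relationships of the allowed types and collects every path it visits. A path's length is the number of relationships in it. The traverse helper and the max_hops limit are assumptions for illustration. The Cypher comment shows how the same question might be asked of Neo4j.

```python
# Hypothetical relationships as (start, type, end) triples.
rels = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),
]


def traverse(start, allowed_types, max_hops=3):
    """Collect every outgoing path from `start` that uses only `allowed_types`.

    A path is a list of (start, type, end) relationships; its length is the
    number of relationships it contains. A relationship is not reused within
    one path, as in Cypher pattern matching.
    """
    paths = []
    stack = [[]]  # begin with the empty path sitting on `start`
    while stack:
        path = stack.pop()
        current = path[-1][2] if path else start
        if path:
            paths.append(path)
        if len(path) == max_hops:
            continue
        for rel in rels:
            rel_start, rel_type, _ = rel
            if rel_start == current and rel_type in allowed_types and rel not in path:
                stack.append(path + [rel])
    return paths


# Cypher: MATCH p = (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
#         RETURN m.title, length(p)
for path in traverse("Tom Hanks", {"ACTED_IN"}):
    print(path[-1][2], len(path))  # Forrest Gump 1
```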
8. Schema
A schema in Neo4j refers to indexes and constraints.
Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.
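The Python sketch below illustrates "schema optional" by answering the same lookup with and without an index. The data works either way; the index only changes how the answer is found. A dict stands in for the index, and Neo4j's own index implementations are different. The Cypher strings use the CREATE INDEX and CREATE CONSTRAINT syntax of Neo4j 4.4 and later. The index and constraint names, and the assumption that person names are unique, are hypothetical.

```python
# Cypher for adding schema later (Neo4j 4.4+ syntax; names are hypothetical).
CREATE_INDEX = "CREATE INDEX movie_title IF NOT EXISTS FOR (m:Movie) ON (m.title)"
CREATE_CONSTRAINT = (
    "CREATE CONSTRAINT person_name IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.name IS UNIQUE"
)

# Movie nodes created without any schema.
movies = [
    {"title": "Forrest Gump", "released": 1994},
    {"title": "Apollo 13", "released": 1995},
    {"title": "Cast Away", "released": 2000},
]


def find_by_scan(nodes, title):
    """No index: compare the title of every node."""
    return [n for n in nodes if n["title"] == title]


def build_index(nodes):
    """Index sketch: map each title to its nodes so lookups skip the scan."""
    index = {}
    for n in nodes:
        index.setdefault(n["title"], []).append(n)
    return index


index = build_index(movies)
print(find_by_scan(movies, "Apollo 13") == index.get("Apollo 13", []))  # True
```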
9. Naming rules
Node labels, relationship types, and property keys are case-sensitive. For example, the property name is different from the property Name. It is recommended to follow the naming conventions described in the following table:
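The Python sketch below illustrates both rules. It first shows that name and Name are distinct keys. It then checks names against the conventions recommended in the Neo4j documentation, assumed here to be:

- labels in UpperCamelCase, such as VehicleOwner
- relationship types in UPPER_SNAKE_CASE, such as OWNS_VEHICLE
- property keys in lowerCamelCase, such as firstName

The regular expressions and the follows_convention helper are a hypothetical simplification.

```python
import re

# Property keys are case-sensitive: "name" and "Name" are different keys.
person = {"name": "Tom Hanks"}
print("Name" in person)  # False

# Assumed naming conventions (simplified); examples in the comments.
CONVENTIONS = {
    "label": re.compile(r"[A-Z][A-Za-z0-9]*"),                         # VehicleOwner
    "relationship_type": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),  # OWNS_VEHICLE
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),                       # firstName
}


def follows_convention(kind, name):
    """True if `name` matches the recommended style for `kind`."""
    return CONVENTIONS[kind].fullmatch(name) is not None


print(follows_convention("relationship_type", "ACTED_IN"))  # True
print(follows_convention("property", "first_name"))         # False
```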
What is next?
In the next post, we will use the default Movie Database inside the Primer project to explore the Cypher Query Language.