The Obligatory “I’m Trying Neo4j” Post (Part 1)

I remember reading someone (more-or-less-jokingly) suggesting that all blog software should come with a default first post entitled “I’m Trying Linux”.  That was a few years ago though, and while open-source operating systems are still a topic of interest, the big buzz amongst cutting-edge nerds in the last couple of years has of course been around NoSQL databases.

There are numerous relatively straightforward document/ key-value store options – which to simplify grossly, can be said to do the same sort of thing as relational databases, only with greater scalability, less impedance with object-oriented programming models and typically less strict guarantees on ACID properties.  But then there is also the one area which seems to really grab the attention of the old-school database crowd, because it offers not just to solve some of the more obvious issues with the relational model when applied to the web and OOP, but also to open up new possibilities in terms of data modelling.  That is to say, graph databases.

The clear leader in terms of mentions and mind-share in the graph space at the moment seems to be Neo4j.  Just in the last couple of days, I’ve had a couple of colleagues express an interest and noticed a post on the topic on one of the sites I read regularly, so now seemed like as good a time as any to finally install the free community edition and run it through its paces.

Just to be clear, I’m not going to provide an exhaustive survey of Neo4j or give instructions to get up and running – there’s a ton of stuff out there for that, a lot of which can be found simply by going to the Neo4j website.  Rather, I’m going to try and comment on what I think is interesting and/or useful about it as a tool. As per the title, I consider this very much the first part of many, so watch this space if you want to see more.

Suffice it to say, installation is extremely quick and easy and once running, you can connect by simply pointing your browser at localhost:7474, which gets you to the following:

Neo4j-Screenshot-1

See that tiny white bar near the top of the screen with the “$” sign to the left?  That’s actually a query window, and through it you can interface directly with Neo4j using their own declarative query language, Cypher.

The install comes with a set of training presentations (which you can also see the links to in the above screenshot), but having run through the basic examples I decided to do my own thing rather than dive into their “Movie Graph” code.  Just a matter of different learning styles I guess, but I find that pushing myself to think up something to program can often (not always) help to get a firmer grasp of a language’s core features faster.

I was watching the excellent “Margin Call”, which is set in a large Wall Street firm at the beginning of the 2008 financial crisis, but focused on a handful of key employees as information about the coming crisis makes its way up the organisational hierarchy.  Employee networks?  Political alliances?  That sounds basically like graph-node stuff, the kind of thing that you’d end up doing as some kind of bastardised adjacency list structure in any standard relational system – yep, that’ll do.

So first of all let’s create some employees, with a Cypher clause called (unsurprisingly) “CREATE”:

CREATE (Seth : Employee {name : ‘Seth Bregman’, title : ‘Junior Risk Analyst’}),
(Peter : Employee {name : ‘Peter Sullivan’, title : ‘Senior Risk Analyst’}),
(Eric : Employee {name : ‘Eric Dale’, title : ‘Head of Risk Management’}),
(Will : Employee {name : ‘Will Emerson’, title : ‘Head of Trading’}),
(Sam : Employee {name : ‘Sam Rogers’, title : ‘Head of Sales and Trading’}),
(Jared : Employee {name : ‘Jared Cohen’, title : ‘Head of Capital Markets’}),
(Sarah : Employee {name : ‘Sarah Robertson’, title : ‘Chief Risk Management Officer’}),
(John : Employee {name : ‘John Tuld’, title : ‘Chief Executive Officer’}),
(Seth)-[:REPORTS_TO]->(Eric),
(Peter)-[:REPORTS_TO]->(Eric),
(Eric)-[:REPORTS_TO]->(Will),
(Will)-[:REPORTS_TO]->(Sam),
(Sam)-[:REPORTS_TO]->(Jared),
(Jared)-[:REPORTS_TO]->(John),
(Sarah)-[:REPORTS_TO]->(John);

With some combination of intuition and RTFM, it should be pretty clear what’s happening there – we’re creating a node variable for each employee, with some key-value pairs to set up properties (name, title), and then adding a “reports to” relationship where appropriate.  Note at this point that relationships can also have properties and be assigned to variables using similar syntax as for nodes, although I haven’t taken advantage of it at this point.  Now to fire it off:

Neo4j-Screenshot-2

So okay – that’s definitely done something – but I thought Neo4j was supposed to give me a visualisation of my graph? Looks like I need to use the MATCH clause to find nodes (and note the use of RETURN – we have to do something with what the MATCH gives us, and in this case we want to get data back):

MATCH (ee:Employee)
RETURN ee;

Neo4j-Screenshot-3

Ooookay…. that’s a bit more like it. The nodes kind of bounce around until you drag them to where you want them on the screen and clicking on each one brings up a property list.  You can see that there’s also a set of “Person” nodes defined in the legend, which exist in the database because of some previous training code I’ve run.  If I’d run “MATCH (ee) RETURN ee;” then they would have shown up as a separate network, but because I specified that I want nodes of type “Employee”, I only get the relevant entries.

Basically that’s pretty cool, although I suspect bringing back a much larger network would start to make it less useful.  In fact, that seems like a good prompt to talk about filtering the graph, which can be done by using (at least) a couple of different techniques around the MATCH clause. Say we want to see employees who report to the CEO only, we can do it in a kind of SQL style with a WHERE clause:

MATCH (LineReports:Employee)-[:REPORTS_TO]->(boss:Employee)
WHERE boss.name = ‘John Tuld’
RETURN LineReports;

or in a way that feels a bit truer to the pattern-matching aspect of Cypher, like so:

MATCH (LineReports:Employee)-[:REPORTS_TO]->(boss:Employee{name:’John Tuld’})
RETURN LineReports;

This gives us the same kind of visualisation (but thinned out) and in all cases there seems to be the option to view as a data table and/or export as JSON:

Neo4j-Screenshot-4

Neo4j-Screenshot-5-edited

Trust me when I say I’ve only scratched the surface of what Cypher can do there, even in terms of what I’ve managed to experiment with in the last 48 hours.  However, this post seems to be getting quite long enough already, and I hope you can already get the sense that I do, that this is potentially quite a nice language to work in.  It’s got a kind of cleanness to it, and a kind of functional-meets-imperative-meets-query feel that reminds me somewhat of LINQ (especially in the context of LINQPad). I’m definitely keen to keep trying out different features and to see how well it deals with more complex queries and a slightly greater volume of data.

One thing to notice from a SQL Server point of view (my home turf) is that there does seem to be a similar concept of cached execution plans, with the similarities extending as far as the recommended use of parameterised queries to maximise plan reuse – and it looks like there’s also a command-line profiler tool available to wring the best performance out of your queries.  More details of some of that here.  So as with a lot else in Neo4j, I would say there’s enough familiar hooks to get you started, and enough new functionality and potential to keep things interesting.

Overall – I would definitely recommend taking a look at what Neo4j can offer.  Getting a grasp of the basics doesn’t require a massive investment of time and the tools are pretty decent, although doubtless they will mature with time, and there’s already a good range of client libraries/ drivers available.  As and when I get a chance, I’ll probably take a look at some of these and also at the crucial question of how well it handles fully automated data imports from other sources.

4 thoughts on “The Obligatory “I’m Trying Neo4j” Post (Part 1)

  1. Pingback: Graph Data Modelling (“Obligatory Neo4j” Part 2) | alex.d.garland

  2. Pingback: My start with Neo4J and what you should consider when starting with it | Mathias Hauser

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s