Commentary by Wes

Ramblings about programming, chess, languages, etc.

Using an ATT Beam (AirCard 340U Sierra Wireless) on a Raspberry Pi

I spent quite a while tinkering with my new Raspberry Pi to get the USB modem working, so I figured I would document it publicly to save someone else the time. Or myself, in case I manage to brick my Pi.

First, some specifics:

  • Raspberry Pi model B
  • Raspbian 2014-01-07 release, running 3.10.25+ kernel
  • ATT Beam AKA Netgear AirCard 340U Sierra Wireless

Cypher 2.0 OPTIONAL MATCH

I love the new OPTIONAL MATCH feature in Cypher, now available in 2.0-RC1. It adds the ability to filter optional patterns without affecting the main query. Here’s a quick illustration of how it works.

Here’s the old way of doing optional relationships.

START a=node(*)
MATCH (a)-[r?]-(b)
RETURN a, r, b; 
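
For comparison, here is a minimal sketch of what the 2.0 equivalent could look like, with OPTIONAL MATCH taking the place of the `?` marker. The second query attaches a WHERE to the optional part. The `name` property and its value are made up for illustration and are not from the original post.

```cypher
// 2.0 style: every node a is returned, with r and b null when nothing matches
MATCH (a)
OPTIONAL MATCH (a)-[r]-(b)
RETURN a, r, b;

// The WHERE belongs to the OPTIONAL MATCH, so it filters the optional pattern only:
// a is still returned when no b satisfies the predicate (hypothetical property `name`)
MATCH (a)
OPTIONAL MATCH (a)-[r]-(b)
WHERE b.name = 'Wes'
RETURN a, r, b;
```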

Relationship Strength Decay in Neo4j’s Cypher

Someone was talking about how it would be cool to have some way to decay the strength of relationships over time in Cypher. I thought that was a cool idea, and it occurred to me that we might already have the functionality in Cypher. Shamefully enough, I have forgotten most of the trivial math I learned related to this stuff, so I looked up how decay works on Wikipedia.
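
As a rough sketch of the idea, exponential decay says a quantity with initial value N0 shrinks to N0 * e^(-rate * t) after time t. The query below applies that to a relationship, assuming a Cypher version that provides the `exp()` and `timestamp()` functions. The `KNOWS` type, the `strength` and `created` properties, and the `rate` parameter are all hypothetical names chosen for illustration.

```cypher
// Assumed (hypothetical) relationship properties:
//   r.strength - the initial strength
//   r.created  - creation time in milliseconds since the epoch
// {rate} is an assumed parameter: the decay constant per day
MATCH (a)-[r:KNOWS]->(b)
WITH a, b, r, (timestamp() - r.created) / 86400000.0 AS ageDays
RETURN a, b, r.strength * exp(-1.0 * {rate} * ageDays) AS decayedStrength
ORDER BY decayedStrength DESC;
```

With a rate of ln(2) / 30 (roughly 0.0231), the computed strength halves every 30 days.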

AnormCypher 0.4.4 released!

AnormCypher is a Cypher-oriented Scala library for Neo4j Server (REST). The goal is to provide a great API for calling arbitrary Cypher and parsing out results, with an API inspired by Anorm from the Play! Framework.

0.4.4 changes:

  • Bump integration tests to Neo4j 2.0.0-M06
  • Minor tweak to Neo4jREST.setServer, so you don’t have to specify endpoint if you specify user/pass.

Roadmap:

  • 0.5.0: Scala 2.10 Future/Stream non-blocking support
  • 0.6.0: Cake pattern with pluggable interfaces for different protocols, REST batch/transactional API

Pragmatic Cypher Optimization (2.0 M06)

I’ve seen a few Stack Overflow and Google Group questions about slow queries, and I think there are some things that need to be said regarding Cypher optimization. These techniques are a few ways of improving your queries that aren’t necessarily intuitive. Before reading this, you should have an understanding of WITH (see my other post: The Mythical With).

First, let me throw out a nice disclaimer that these rules of thumb I’ve discovered are by no means definitive best practices, and you should measure your own results with cold and warm caches, running queries 3+ times to see realistic results with a warm cache.

Second, let me throw out another disclaimer: Cypher is improving rapidly, and these rules of thumb may only be valid for a few milestone releases. I’ll try to make future updates, but there’s always a danger of this post becoming out of date.

Ok, let’s get to it.

1. Use parameters whenever you can.

This is important to performance because Cypher caches execution plans and will be faster the second time you run the same query (even if it has different parameters). I’ll mention it first, even though it won’t be the thing to get you the best performance gains.

More info from the Neo4j Manual.
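
A minimal sketch of the difference, reusing the `{uid}` parameter style that appears later in this post (newer Neo4j versions write parameters as `$uid` instead). The `User` label and `id` property are only illustrative.

```cypher
// Literal value: the query text changes with every id, so each one is a new query to plan
MATCH (u:User)
WHERE u.id = 123123
RETURN u;

// Parameter: the query text stays the same and only the value of uid varies
MATCH (u:User)
WHERE u.id = {uid}
RETURN u;
```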

2. Avoid cartesian products when they aren’t required to get the data you need.

This is the single worst mistake I’ve seen, and it seems somewhat more common with the 2.0 style, although it is certainly possible to do with START as well.

This is a cartesian product:

MATCH (a), (b)
RETURN *

If you find yourself writing the code above, please make sure you are doing significant filtering on (a) and (b) in a WHERE clause. Ideally, it will be an index lookup–anything else will create a cartesian product explosion that will be hard to track down.
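
To put numbers on it: with 1,000 candidates for (a) and 1,000 for (b), the unfiltered query produces 1,000 × 1,000 = 1,000,000 rows. A sketch of the filtered form might look like the following, where the labels, properties, and parameters are hypothetical, and indexes on `:User(id)` and `:Item(sku)` are assumed to exist.

```cypher
// Hypothetical labels and properties; assumes indexes on :User(id) and :Item(sku)
// Each side is narrowed to very few nodes, so the product stays tiny
MATCH (a:User), (b:Item)
WHERE a.id = {uid}
  AND b.sku = {sku}
RETURN a, b;
```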

Here’s another cartesian product slightly harder to see:

MATCH (u:User)-[:purchased]->(i:Item)
WHERE ...
WITH u, i
MATCH (foo)-[:related]->(bar) // this pattern is not connected to u or i, and it will create a cartesian product between (u,i) and (foo,bar)
RETURN *
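
One possible way out, sketched under the assumption that the intent was to find items related to the purchased item, is to anchor the second pattern on an identifier carried through the WITH. The `{uid}` filter is a hypothetical stand-in, not the original elided WHERE.

```cypher
// Assumes (hypothetically) that the goal is items related to the purchased item i
MATCH (u:User)-[:purchased]->(i:Item)
WHERE u.id = {uid} // hypothetical filter, for illustration only
WITH u, i
MATCH (i)-[:related]->(bar) // anchored on i, so only i's neighbours are expanded
RETURN *
```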

3. Avoid patterns in the WHERE clause.

The only valid reason to use a pattern in a WHERE clause is to check for negative patterns with NOT. Anything else means you should put the pattern in MATCH to enjoy a 20-30% performance increase.

For example, this query:

MATCH (u:User)-[:viewed]->(i:Item), (u)-[:purchased]->(other:Item)
WHERE u.id=123123
  AND (i)-[:related_to]->(other)
RETURN i

could be optimized slightly with:

MATCH (u:User)-[:viewed]->(i:Item), (u)-[:purchased]->(other:Item)
WHERE u.id=123123
WITH i, other
MATCH (i)-[:related_to]->(other)
RETURN i

Also, remember that for each intermediate result you are scanning, the WHERE clause needs to be checked. Filter results as early as possible–the fewer results you have in each query part, the better.
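
For the one case where a pattern in WHERE is warranted, the negative check, a minimal sketch could look like this. It reuses the labels and relationship types from the example above, and the `{uid}` parameter is assumed.

```cypher
// Items the user viewed but has not purchased
MATCH (u:User)-[:viewed]->(i:Item)
WHERE u.id = {uid}
  AND NOT (u)-[:purchased]->(i)
RETURN i
```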

4. Start your MATCH patterns at the lowest cardinality identifier you can (ideally 1), and expand outward.

This means that if you’ve got :User nodes related to :Game nodes, and you know :Game has 20M records and :User has 1M records, start with the :User label, like so:

MATCH (u:User)-[:played]->(g) // Cypher will automatically use the specified label starting point
RETURN *

OR

MATCH (u:User)-[:played]->(g:Game)
USING SCAN u:User // if you specify both labels, USING SCAN directs Cypher to use the right one
RETURN *

This is often most applicable to aggregation when scanning large numbers of nodes. However, the same concept applies even when it’s less obvious: start at a single node (or a few nodes) and expand your patterns outward as you grow the query.

5. Separate your MATCH patterns, doing the minimal amount of expansion for each pattern. Add 1 new identifier per pattern.

This is the flimsiest of my rules. Measure, measure, measure to make sure it is true in your case. An example is taking this query:

MATCH (u:User)-[:played]->(g:Game)-[:contains]->(p:Position)-[:next]->(nextPos:Position)
WHERE u.id = {uid}
  AND nextPos.val = {val}
RETURN nextPos

And breaking it up into something like this:

MATCH (u:User)-[:played]->(g:Game)
WHERE u.id = {uid}
WITH g
MATCH (g)-[:contains]->(p:Position)-[:next]->(nextPos:Position)
WHERE nextPos.val = {val}
RETURN nextPos

Here, we broke up the complex MATCH/WHERE into two parts. First, we find the user and related games. Then, we find the nextPos (this is the only value we care about, so it makes sense to do it in one match, instead of passing along intermediate values that don’t constrain the pattern).

6. If you can’t get Cypher to be fast enough…

And you have already hit us up on the Google Group or Stack Overflow (to make sure you didn’t miss something)… just wait a few more months (they’re working on that). If you can’t wait a few more months, make yourself an unmanaged extension using the lower-level Java API.

Also, I charge $1 per microsecond of query improvement (3 run average on warm cache) on your sample data. Feel free to hire me at @wefreema. ;)

If you want to see more Cypher stuff, check out my Cypher page.

AnormCypher 0.4.3 released!

Thanks to mvallerie, AnormCypher 0.4.3 now uses the official Typesafe play-json repository, to go with the release of Play 2.2.

AnormCypher is a Cypher-oriented Scala library for Neo4j Server (REST). The goal is to provide a great API for calling arbitrary Cypher and parsing out results, with an API inspired by Anorm from the Play! Framework.

0.4.3 changes:

  • Switch to typesafe package for play-json 2.2
  • Switch to typesafe repository instead of mandubian (play-json)
  • Update unit tests to handle new M05 syntax for Cypher
  • Bump integration tests to Neo4j 2.0.0-M05

Roadmap:

  • 0.5.0: Scala 2.10 Future/Stream non-blocking support
  • 0.6.0: Cake pattern with pluggable interfaces for different protocols, REST batch/transactional API

AnormCypher 0.4.2 released!

Thanks to okumin, AnormCypher 0.4.2 now supports non-ASCII characters with the proper UTF-8 settings.

AnormCypher is a Cypher-oriented Scala library for Neo4j Server (REST). The goal is to provide a great API for calling arbitrary Cypher and parsing out results, with an API inspired by Anorm from the Play! Framework.

0.4.2 changes:

  • Allow non-ASCII characters by default.
  • Bump to Scala 2.10.2.
  • Bump integration tests to Neo4j 2.0.0-M03

Roadmap:

  • 0.5.0: Scala 2.10 Future/Stream non-blocking support
  • 0.6.0: Cake pattern with pluggable interfaces for different protocols, REST batch/transactional API

AnormCypher 0.4.1 released!

Thanks to Pieter, AnormCypher 0.4.1 supports versions earlier than Neo4j 1.9 (I didn’t realize this was an issue).

AnormCypher is a Cypher-oriented Scala library for Neo4j Server (REST). The goal is to provide a great API for calling arbitrary Cypher and parsing out results, with an API inspired by Anorm from the Play! Framework.

0.4.1 changes:

  • Allow configurable Cypher endpoint.

Roadmap:

  • 0.5.0: Scala 2.10 Future/Stream non-blocking support
  • 0.6.0: Cake pattern with pluggable interfaces for different protocols, REST batch/transactional API

Cypher: It doesn’t all start with the START (in Neo4j 2.0!)

So, apparently, the Neo Technology guys read one of my last blog posts titled “It all starts with the START” and wanted to make a liar out of me. Actually, I’m quite certain it had nothing at all to do with that–they just want to improve Cypher to make it the best graph query language out there. But yes, the START clause is now optional. “How do I tell Neo4j where to start my traversals?” you might ask. Well, in the long run, you won’t need to anymore. Neo4j will keep index and node/rel statistics, know which index to use, and know which start points make for the most efficient query based on its cost optimization. It’s not quite there yet, so for a while we’ll probably want to make generous use of “index hints”, but I love the direction this is going–feels just like good old SQL.
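
As a rough illustration of what an index hint can look like in the 2.0 syntax, here is a sketch with a hypothetical `User` label, `id` property, and `uid` parameter. The index has to exist before it can be hinted, and later Neo4j versions changed the index-creation syntax, so treat this as specific to this era.

```cypher
// Hypothetical label and property; create the schema index first
CREATE INDEX ON :User(id);

// Then tell Cypher explicitly which index to start from
MATCH (u:User)
USING INDEX u:User(id)
WHERE u.id = {uid}
RETURN u;
```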

I have a feeling this will be a long post, so bear with me. With a fresh unzip of the tar file… let’s get dirty. I’ll just be running in shell, since some of the index stuff can’t be saved in console yet. This is a beautiful thing:

neo4j-sh (0)$ match n return n;
+-----------+
| n         |
+-----------+
| Node[0]{} |
+-----------+
1 row
14 ms