Daggregator is a free-standing server for analyzing large sets of connected data over a REST API. It provides a set of aggregation functions (currently sum and count) for arbitrary key/value pairs.
To use Daggregator, simply tell it which objects you want to investigate and how they're connected; each object can include any number of named numeric values and any number of connections to other objects.
Once you've defined the objects (nodes) in your graph, aggregate data can be queried for any node, for example the sum of the key foo across all upstream nodes.
Daggregator uses two objects, node and flow. Nodes represent arbitrary sets of data. The data in nodes is aggregated on other nodes by defining flows between them. Each node stores a set of key/value pairs.
Flows are defined between a source and a target node. Data defined on 'upstream' (source) nodes is aggregated on 'downstream' (target) nodes.
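As a rough mental model, the node/flow relationship can be sketched in Python (the names and data here are illustrative, not part of Daggregator itself):

```python
# Nodes hold key/value data; flows connect a source node to a target node.
# Aggregation on a node walks the flows pointing into it.
nodes = {
    "a": {"foo": 1},
    "b": {"foo": 2},
    "c": {},          # downstream node that aggregates its sources
}
flows = [("a", "c"), ("b", "c")]  # (source, target) pairs

def sources_of(target):
    """Identifiers of nodes with a flow into `target`."""
    return [src for src, tgt in flows if tgt == target]

print(sources_of("c"))  # ['a', 'b']
```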
Daggregator provides a RESTful API with the following endpoints:
Returns the data defined for node <id>, the list of source node identifiers, and the list of target node identifiers in the following form:
// GET /node/foo
{
  "node": {
    "identifier": "foo",
    "data": {
      "numeric_foo": 4,
      "numeric_bar": 5,
      "text_foo": "four",
      "text_bar": "five"
    },
    "targets": [
      "identifier_1",
      "identifier_2"
    ],
    "sources": [
      "identifier_3",
      "identifier_4"
    ]
  }
}
Returns 404 if the node isn't defined.
Returns an aggregate value of <key> using one of the functions sum or count. count returns the number of source nodes for which <key> is defined. Results for multiple keys can be returned by concatenating them with + (e.g. /node/foo/sum/bar+baz).
JSON is returned in the following format:
// GET /node/foo/sum/bar+baz
{
  "node": {
    "identifier": "foo",
    "aggregates": {
      "numeric_bar": { "SUM": 3.42 },
      "numeric_baz": { "SUM": 3.24 },
      "text_foo": { "SUM": null }
    }
  }
}
sum returns null if the key isn't defined for any sources (text strings are considered null). count returns 0 if the key isn't defined.
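The documented sum/count behavior can be sketched as follows (an assumed reimplementation for illustration, not the server's code):

```python
# sum ignores text values and returns None (null) when no numeric value
# exists for the key; count counts source nodes where the key is defined
# at all, and returns 0 when it never is.
def agg_sum(source_data, key):
    values = [d[key] for d in source_data
              if key in d and isinstance(d[key], (int, float))]
    return sum(values) if values else None  # text-only or missing -> null

def agg_count(source_data, key):
    return sum(1 for d in source_data if key in d)

sources = [{"bar": 1.5}, {"bar": 2}, {"bar": "text"}, {}]
print(agg_sum(sources, "bar"))    # 3.5
print(agg_count(sources, "bar"))  # 3  (text values still count as defined)
print(agg_sum(sources, "baz"))    # None
print(agg_count(sources, "baz"))  # 0
```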
Returns a hash of the number of occurrences of each unique value of key in the node's upstream graph. For numeric keys, consider using the bin_count function.
// GET /node/foo/distribution/bar+baz
{
  "node": {
    "identifier": "foo",
    "aggregates": {
      "bar": {
        "lorem": 10,
        "ipsum": 2,
        "dolor": 35
      },
      "baz": {
        "ipsum": 20,
        "dolor": 2
      }
    }
  }
}
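The distribution aggregate amounts to counting unique values of a key across the upstream nodes. A minimal sketch with hypothetical data:

```python
from collections import Counter

# Count how often each distinct value of `key` appears among the sources;
# nodes that don't define the key are simply skipped.
def distribution(source_data, key):
    return dict(Counter(d[key] for d in source_data if key in d))

sources = [{"bar": "lorem"}, {"bar": "ipsum"}, {"bar": "lorem"}, {}]
print(distribution(sources, "bar"))  # {'lorem': 2, 'ipsum': 1}
```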
Divides the range of key into <bins> equal-sized regions and returns the number of upstream nodes whose value lies within each region. Expects numeric values.
// GET /node/foo/bin_count/bar+baz?bins=3
{
  "node": {
    "identifier": "foo",
    "aggregates": {
      "bar": {
        "[0,5)": 2,
        "[5,10)": 5,
        "[10,15]": 3
      },
      "baz": {
        "[20,21.5)": 34,
        "[21.5,23)": 3,
        "[23,24.5]": 23
      }
    }
  }
}
If no values are defined, or all the values are the same, null is returned (e.g. "bar": null).
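A sketch of the binning described above, assuming half-open bins with a closed final bin, as the interval labels in the example suggest (illustrative, not the server's code):

```python
# Split [min, max] into `bins` equal-sized regions and count the values
# falling in each. Returns None when no values exist or the range is
# degenerate (all values equal), matching the documented null result.
def bin_count(values, bins):
    if not values or min(values) == max(values):
        return None
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = {}
    for i in range(bins):
        start, end = lo + i * width, lo + (i + 1) * width
        last = i == bins - 1
        label = f"[{start:g},{end:g}{']' if last else ')'}"
        counts[label] = sum(1 for v in values
                            if start <= v and (v <= end if last else v < end))
    return counts

print(bin_count([0, 1, 5, 7, 12, 15], 3))
# {'[0,5)': 2, '[5,10)': 2, '[10,15]': 2}
```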
Creates/updates a node. Nodes are created with a user-defined <id> value, which should be unique across all nodes. The id parameter will be converted to a string, and can contain any character except : and /.
Expects JSON in the body defining a (possibly empty) set of data key/value pairs in the following form:
// POST /node/foo
{
  "node": {
    "data": {
      "numeric_foo": 1,
      "numeric_bar": 2,
      "text_foo": "one"
    }
  }
}
If the node already exists, the attributes specified will be updated, and all other attributes will be left unmodified.
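The update semantics amount to merging the posted data into the stored data; sketched in Python with hypothetical values:

```python
# Posting to an existing node updates only the keys supplied in the
# request body; keys absent from the payload are left unmodified.
stored = {"numeric_foo": 1, "text_foo": "one"}
payload = {"numeric_foo": 2, "numeric_bar": 3}

stored.update(payload)  # merge: specified keys updated, others kept
print(stored)  # {'numeric_foo': 2, 'text_foo': 'one', 'numeric_bar': 3}
```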
Sets a key's value. If the key is already defined, the value is updated; otherwise the key is added.
Returns 404 if the node doesn't exist.
Creates a flow from node <source id> to nodes <target1 id> and <target2 id>. Implicitly creates the nodes if they don't exist.
The opposite of flow_to.
Removes a node's key. The response will be 200 regardless of the existence of the key.
Returns 404 if the node doesn't exist.
Removes the flow from node <source id> to node <target id>. Returns 200 regardless of the existence of the flow.
Returns 404 if either node doesn't exist.
Removes a node. Implicitly removes all flows through the node.
Returns 404 if the node doesn't exist.
Nodes can be queried by GET-ting /node with a list of parameters. All queries accept the following parameters:
- limit=<n>: the maximum number of records to return
- offset=<n>: the offset to start at
- order=<key>:asc: order by key ascending
- order=<key>:desc: order by key descending
- order=[<key>:asc,<otherkey>:desc]: order by key in ascending order, and by otherkey in descending order if key has the same value
- sources=<id1>,<id2>: limit to nodes with flows originating from any member of the set (recursive via flows)
- sources=<id1>:<key1>,<id2>:<key2>: same as above, but restricts to flows with operations based on key1 or key2
- sources=<id1>:<key1>:<agr1>,<id2>:<key2>:<agr2>: same as above, but restricts to flows with operations from key1 -> agr1, etc.
- targets=<id1>,<id2>: limit to nodes with flows terminating in any member of the set (recursive via flows)
- targets=<id1>:<key1>,<id2>:<key2>: same as above, but restricts to flows with operations targeting key1 or key2
- targets=<id1>:<key1>:<agr1>,<id2>:<key2>:<agr2>: same as above, but restricts to flows with operations from key1 -> agr1, etc.
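Assumed semantics of the paging and ordering parameters (order first, then apply offset and limit), sketched with hypothetical nodes:

```python
# Apply order/offset/limit to a result set, mimicking the query
# parameters described above (illustrative, not the server's code).
nodes = [{"id": "a", "foo": 2}, {"id": "b", "foo": 1}, {"id": "c", "foo": 3}]

def query(nodes, order_key, descending=False, offset=0, limit=None):
    ordered = sorted(nodes, key=lambda n: n[order_key], reverse=descending)
    end = None if limit is None else offset + limit
    return ordered[offset:end]

# order=foo:desc&limit=2
print([n["id"] for n in query(nodes, "foo", descending=True, limit=2)])
# ['c', 'a']
```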
In addition, queries can specify constraints on the value of keys:
- <key>=<n>: nodes for which the value of <key> equals <n>
- <key>.ne=<n>: nodes for which the value of <key> is not equal to <n>
- <key>.gt=<n>: nodes for which the value of <key> is greater than <n>
- <key>.lt=<n>: nodes for which the value of <key> is less than <n>
These can be combined with the keywords target and source to restrict the results to nodes whose sources or targets meet certain criteria:
- targets:<key>=<n>: nodes with flows terminating in targets for which the value of <key> is <n>
- sources:<key>.ne=<n>: nodes with flows originating from sources for which the value of <key> is not equal to <n>
When specifying constraints on related nodes, the constraints are treated as AND operations; for example, targets:foo=1&sources:bar=5 would return nodes with flows terminating in targets with foo=1 and flows originating in sources with bar=5.
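The constraint operators combine with AND semantics; a sketch of how such filters could be evaluated (assumed behavior, not Daggregator's implementation):

```python
# Evaluate the =, .ne, .gt, .lt operators against a node's data.
OPS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
}

def matches(node, constraints):
    """constraints: list of (key, op, value); all must hold (AND)."""
    return all(key in node and OPS[op](node[key], value)
               for key, op, value in constraints)

nodes = [{"foo": 1, "bar": 5}, {"foo": 1, "bar": 2}, {"foo": 3, "bar": 5}]
# foo=1&bar.gt=3
result = [n for n in nodes if matches(n, [("foo", "eq", 1), ("bar", "gt", 3)])]
print(result)  # [{'foo': 1, 'bar': 5}]
```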
Clone the repo and run bundle install. You'll need to install Neo4j, and there's a rake task for it:
rake neo4j:install[community,1.8.M01]
rake neo4j:start
If you're using Heroku, you can install the addon with:
heroku addons:add neo4j --neo4j-version 1.8.M01