I have been doing a lot of work using Pachube recently. Pachube (pronounced “Patch-bay”) is an online database service designed to accept real-time data from sensors.
I have a server which reads various wireless sensors and stores the readings in a local MySQL database. A process on the server scans the database and uploads new data from a particular subset of sensors to Pachube.
I used the JSON interface instead of the EEML one because it is more straightforward and easier to learn. My code is written in Java so I started off by using JPachube, but it didn’t seem flexible enough for me – one of the things I wanted to do, for example, was upload historic data from the database. Once I took the plunge of constructing my own messages it was easy.
Using the data
Whilst they are fine for slowly changing data, the standard set of graphs don’t show fast-changing data very well when displaying periods longer than an hour. At one stage I had a memory leak caused by letting java.sql.Statement objects go out of scope without explicitly closing the java.sql.ResultSet objects produced from them. I changed the code to close both explicitly and in the correct order: the ResultSet first, then the Statement.
(SQL is involved because I have added Pachube onto an existing system: data was originally just stored in a local MySQL database but is now also uploaded from the database to Pachube. The Pachube API itself does not use SQL. Chances are that if you use Pachube then you will just upload to it and so not have anything to do with SQL.)
To check that the problem had gone I used Pachube to monitor memory use, and I found the display a little confusing. It was only when I looked at the figures manually that I concluded that there was no problem.
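The fix mentioned above, closing the ResultSet before the Statement, could be sketched as follows. This is a minimal illustration rather than the real code: it assumes Java 7 or later (for try-with-resources), and the `readings` table with its `value` and `uploaded` columns is a hypothetical stand-in for the real schema.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class ReadingDao {
    // Hypothetical table and column names, for illustration only.
    static final String QUERY = "SELECT value FROM readings WHERE uploaded = 0";

    static List<Double> pendingValues(Connection conn) throws SQLException {
        List<Double> values = new ArrayList<>();
        // try-with-resources closes resources in reverse order of declaration:
        // the ResultSet first, then the Statement, even if an exception is thrown.
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                values.add(rs.getDouble(1));
            }
        }
        return values;
    }
}
```

The connection is deliberately left open here, on the assumption that the caller owns it and reuses it between polls.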

Memory use over 12 hours on 24 hour plot. Looks like a serious leak

Last hour of same data on 1 hour plot. No memory leak now
Since my priority was checking for memory leaks I have not really investigated this further. However, I think Pachube is displaying only a few datapoints for each hour in the 24 hour display, perhaps selected to be as close as possible to minute boundaries. My program has a memory use cycle time of slightly more than a second (it has a one second delay between each burst of activity), so only 59-and-a-bit cycles fit into 60 seconds, and some sort of aliasing may be happening.
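To see how that kind of aliasing could arise, here is a small sketch. Both numbers in it are assumptions: the 1015 ms cycle is just one value that is “slightly more than a second”, and once-a-minute sampling is only a guess at what the 24 hour plot does. The position within the cycle stands in for memory in use.

```java
class AliasingSketch {
    static final long CYCLE_MS = 1015;     // assumed length of one memory-use cycle
    static final long SAMPLE_MS = 60_000;  // assumed spacing of the plotted points

    // Position within the cycle (0 .. CYCLE_MS - 1) at the k-th plotted sample.
    static long phaseAtSample(long k) {
        return (k * SAMPLE_MS) % CYCLE_MS;
    }

    public static void main(String[] args) {
        for (long k = 0; k < 12; k++) {
            System.out.println(k + " min: " + phaseAtSample(k) + " ms into cycle");
        }
    }
}
```

With these figures each sample lands 115 ms further into the cycle than the previous one, wrapping round after nine samples, so a sawtooth that really repeats every second would be drawn as a ramp lasting about nine minutes.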
The apps that I have tried are OK, but they don’t seem to handle API keys very well – I didn’t want to make the data public because it includes things like data from PIR sensors in different rooms. I can log into the Pachube account and run an app, in which case the session cookie for Pachube gets used, but as far as I can tell I cannot pass the API key in the HTML.
Uploading to Pachube
The actual data upload part of Pachube “just works”. Once you have a feed, you can specify datastream IDs and they will be created as needed. Many of the Pachube examples show the datastreams as numbers, but you can use text strings provided they don’t contain any whitespace.
Most of my original irritations with Pachube were solved by a helpful message from Usman at Pachube – they were to do with uploading multiple historic datapoints to a single datastream in one go, which I was unable to do until Usman directed me towards the correct part of the documentation.
Pachube seems to be missing tools to carry out proper statistical analysis of the data – probably not an oversight so much as a difference between the specific goals of my project compared to what Pachube is actually designed to do.
Maybe something like Pachube isn’t the right place to do this anyway – perhaps some other web-based service which could take datastreams from different sources and carry out transformations on them is needed.
Nevertheless, despite my original misgivings, I have been surprised at how useful the Pachube graphs have been over the last few days. Just being able to pull up the graphs from the web was really handy when it came to explaining the project to some other people, and the graphs have proved helpful in monitoring the correct functioning of the sensors – it is much easier to look at a line than a table of numbers.
But I am glad that I have the backing SQL database, which came for free since it was left over from a previous version of my data acquisition platform. It is available for the complex database queries needed for serious statistical analysis, for which I intend to use R, which I am just learning now.
The examples in Pachube’s documentation on using Curl are good, and I use Curl extensively for experimenting with Pachube. You can programmatically change most things; for example, the following JSON will change the feed characteristics.
{
"version":"1.0.0",
"title":"Jason",
"description":"Test feed",
"private":"true",
"location":{
"name":"Chez Jason",
"lat":"51.501",
"lon":"-0.142",
"ele":"30.0",
"exposure":"indoor",
"domain":"physical",
"disposition":"fixed" }
}
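From Java, a body like the one above could be sent with a PUT request to the feed. The sketch below only builds the request, and it rests on several assumptions: Java 11 or later (for `java.net.http`), an API root passed in by the caller rather than hard-coded, and `X-PachubeApiKey` as the header that carries the API key, which is how I understand Pachube’s Curl examples to do it. Check the header name against the documentation before relying on it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class FeedRequests {
    // apiRoot is the base URL of the v2 API, supplied by the caller.
    // The header name is an assumption taken from Pachube's Curl examples.
    static HttpRequest putFeed(String apiRoot, String feedId, String apiKey, String json) {
        return HttpRequest.newBuilder(URI.create(apiRoot + "/feeds/" + feedId))
                .header("X-PachubeApiKey", apiKey)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request would then be sent with something like `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, checking the status code of the response.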
Datastreams don’t need creating in advance – Pachube will create them when you write data to them. For example:
{
"version":"1.0.0",
"datastreams" : [
{ "unit": {
"symbol": "C",
"label": "celsius"
},
"id": "house_temp",
"at": "2011-08-07T10:36:38.742+01:00",
"current_value": "21.0" }
]
}
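The "at" value in the example above is an ISO 8601 timestamp with milliseconds and a UTC offset. A minimal sketch of producing that shape from Java, assuming Java 8 or later for `java.time` (the class and method names are made up for illustration):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

class PachubeTime {
    // Millisecond precision plus an offset such as +01:00, as in the example JSON.
    static final DateTimeFormatter AT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    static String at(OffsetDateTime when) {
        return when.format(AT);
    }
}
```

Calling `PachubeTime.at(OffsetDateTime.now())` gives a string for the current moment. Note that with this pattern a zero offset is written as `Z` rather than `+00:00`.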
In my feed I am providing a timestamp and units but you needn’t do this. I actually provide the units only once for the datastream; they are stored until they are changed, so later datapoints can leave them out. A minimal body, which simply sets the current value as of now, is:
{
"version":"1.0.0",
"datastreams" : [
{ "id": "house_temp",
"current_value": "21.0"
}
]
}

Electric power use
You can upload more than one datastream at a time:
{
"version":"1.0.0",
"datastreams" : [
{ "id": "house_power",
"at": "2011-08-07T10:36:38.742+01:00",
"current_value": "513.0" },
{ "id": "house_temp",
"at": "2011-08-07T10:36:38.742+01:00",
"current_value": "21.0" }
]
}
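Constructing a body like this by hand needs nothing more than string joining. The sketch below is one hypothetical way to do it; it assumes datastream IDs and values never contain quotes or backslashes, since it does no JSON escaping, and it leaves out the optional "at" and "unit" fields.

```java
import java.util.Map;
import java.util.StringJoiner;

class FeedBody {
    // Builds the feed body for one or more datastreams from id -> current value.
    static String currentValues(Map<String, String> valueById) {
        StringJoiner streams = new StringJoiner(",");
        for (Map.Entry<String, String> e : valueById.entrySet()) {
            streams.add("{\"id\":\"" + e.getKey()
                    + "\",\"current_value\":\"" + e.getValue() + "\"}");
        }
        return "{\"version\":\"1.0.0\",\"datastreams\":[" + streams + "]}";
    }
}
```

Passing a `LinkedHashMap` keeps the datastreams in insertion order, which makes the output easier to compare against a Curl experiment.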
Uploading multiple datapoints per datastream
You can upload many timestamped datapoints for a particular datastream in one go by sending a POST request to that datastream’s datapoints URL – for example https://v2/feeds/1235/datastreams/house_power/datapoints – rather than a PUT to the feed URL https://v2/feeds/1235, which is what you use for the single datapoint case shown in the previous piece of JSON. The body looks like this:
{
"datapoints" : [
{ "at": "2011-08-07T10:36:38.742+01:00",
"value": "513.0" },
{ "at": "2011-08-07T10:36:39.742+01:00",
"value": "511.0" }
]
}
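The two parts that differ from the single-value case, the URL and the body, could be put together as in the sketch below. The API root is passed in rather than assumed, the class and method names are hypothetical, and as before no JSON escaping is done, so timestamps and values must not contain quotes.

```java
import java.net.URI;
import java.util.Map;
import java.util.StringJoiner;

class DatapointsRequest {
    // URL to POST to: <apiRoot>/feeds/<feed>/datastreams/<stream>/datapoints
    static URI datapointsUri(String apiRoot, String feedId, String streamId) {
        return URI.create(apiRoot + "/feeds/" + feedId
                + "/datastreams/" + streamId + "/datapoints");
    }

    // Body built from timestamp -> value, in the map's iteration order.
    static String body(Map<String, String> valueByTimestamp) {
        StringJoiner points = new StringJoiner(",");
        for (Map.Entry<String, String> e : valueByTimestamp.entrySet()) {
            points.add("{\"at\":\"" + e.getKey()
                    + "\",\"value\":\"" + e.getValue() + "\"}");
        }
        return "{\"datapoints\":[" + points + "]}";
    }
}
```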
There is a limit of 500 points per request, but more can be uploaded just by using multiple requests. For this to work the datastream must already have been created, otherwise the response is {"errors":"ActiveRecord::RecordNotFound","title":"Not found"}.
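Staying under the 500 point limit when uploading a long run of historic data is a matter of slicing the list before building each request. A minimal sketch, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class Batcher {
    static final int MAX_POINTS_PER_REQUEST = 500;

    // Splits the points into consecutive batches of at most 500, preserving order.
    static <T> List<List<T>> batches(List<T> points) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < points.size(); i += MAX_POINTS_PER_REQUEST) {
            int end = Math.min(i + MAX_POINTS_PER_REQUEST, points.size());
            out.add(new ArrayList<>(points.subList(i, end)));
        }
        return out;
    }
}
```

Each batch then becomes the "datapoints" array of one POST request. An empty list produces no batches, so no request needs to be sent.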
I have changed my code to always upload data in this way, even when there is only one datapoint. I have a process which wakes up every second to see if there is anything new in the local database to be uploaded. This means that the process never knows whether it will have one point or several for a particular datastream.
If there is more than one datapoint then it seems better to bundle them all together into a single HTTPS request rather than issue several, and since the technique works just as well for one datapoint there is no need to have two separate pieces of Java.
{
"version":"1.0.0",
"datapoints" : [
{
"at": "2011-08-11T18:32:31.348+01:00",
"value": "122.0"
}
]
}
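Because each POST goes to a single datastream, whatever the polling process finds in the database has to be grouped by datastream before the requests are built. The sketch below shows one way that grouping might look; the `Reading` class and its fields are hypothetical and not taken from the real system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PendingReadings {
    // Hypothetical in-memory form of one row waiting to be uploaded.
    static final class Reading {
        final String streamId;
        final String at;
        final String value;

        Reading(String streamId, String at, String value) {
            this.streamId = streamId;
            this.at = at;
            this.value = value;
        }
    }

    // Groups readings by datastream, keeping the original order within each group.
    static Map<String, List<Reading>> byStream(List<Reading> pending) {
        Map<String, List<Reading>> grouped = new LinkedHashMap<>();
        for (Reading r : pending) {
            grouped.computeIfAbsent(r.streamId, k -> new ArrayList<>()).add(r);
        }
        return grouped;
    }
}
```

Each map entry then corresponds to one request, whether it holds one reading or many, which is why a single code path is enough.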
When my program starts it issues the command to create the datastream, which has no effect if it already exists but recreates it if it is missing.