
Livecount

Posted July 11th, 2011 in Projects by Greg Bayer
                                

Livecount is an implementation of real-time counters that leverages the performance of memcache and task queues on Google AppEngine.

Building a solid analytics platform often requires a combination of real-time and batch processing. Batch processing, with a tool like Hadoop, is great for digging into large amounts of past data and asking questions that cannot be anticipated. In contrast, when it is known ahead of time that certain aggregates will be required, the best solution is usually to count each event as it happens. Livecount makes it easier to address this second use case.

I encourage you to read about our experience with Livecount at Pulse.

Data Persistence

Livecount initially stores all counts in memcache.  To minimize the risk of data loss, each time a count is updated Livecount creates a worker task to write that count from the memcache to the datastore in the background. If the count is ever evicted, it is reloaded from the datastore on the next read or write.
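
The flow described above can be sketched in a few lines. This is a minimal illustration, not Livecount's actual code: a plain dict stands in for memcache, another dict for the datastore, and a deque for the task queue, and the names increment and run_worker_tasks are made up for this example. It also ignores concurrency, which the real services have to handle.

```python
from collections import deque

# In-memory stand-ins (assumptions): on App Engine these would be
# memcache, the datastore, and a task queue.
cache = {}
datastore = {}
task_queue = deque()


def increment(name, delta=1):
    """Update the cached count and queue a background write-back."""
    if name not in cache:
        # Cache miss (e.g. after eviction): reload the last persisted value.
        cache[name] = datastore.get(name, 0)
    cache[name] += delta
    task_queue.append(name)  # one worker task per update
    return cache[name]


def run_worker_tasks():
    """Drain the queue, copying each queued counter to the datastore."""
    while task_queue:
        name = task_queue.popleft()
        if name in cache:
            datastore[name] = cache[name]


increment("home")
increment("home")
run_worker_tasks()   # both updates are now persisted
cache.clear()        # simulate a memcache eviction
increment("home")    # reloads the persisted count, then adds 1
```

After the simulated eviction, the counter picks up from the persisted value rather than restarting at zero. Only updates that had not yet been written back when the eviction happened would be lost.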

Performance

Since counter updates are usually written back to the datastore within seconds, the risk of loss is minimal. Write performance is excellent, since only the memcache must be updated before completing a request. Most reads can also be served from the memcache. Load on the datastore is further reduced by storing a dirty flag along with each memcached count. If more increment events come in than can be written back in real time, only one write is needed to update the datastore with the latest count. After a successful write, the dirty flag is cleared and the other backlogged write tasks for that counter are skipped.
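
To make the dirty flag idea concrete, here is a small simulation of a backlog of write tasks collapsing into a single datastore write. As an assumption for illustration, dicts and a deque again stand in for memcache, the datastore, and the task queue. The entry layout and function names are hypothetical rather than taken from Livecount's source.

```python
from collections import deque

# In-memory stand-ins (assumptions) for memcache, datastore, and task queue.
cache = {}           # name -> {"count": int, "dirty": bool}
datastore = {}
task_queue = deque()
datastore_writes = 0


def increment(name, delta=1):
    """Bump the cached count, mark it dirty, and queue a write task."""
    entry = cache.setdefault(
        name, {"count": datastore.get(name, 0), "dirty": False})
    entry["count"] += delta
    entry["dirty"] = True
    task_queue.append(name)


def run_worker_task():
    """Run one queued task; write only if the count is still dirty."""
    global datastore_writes
    name = task_queue.popleft()
    entry = cache.get(name)
    if entry is None or not entry["dirty"]:
        return False  # an earlier task already persisted the latest count
    datastore[name] = entry["count"]
    entry["dirty"] = False
    datastore_writes += 1
    return True


# 100 increments arrive before any worker task gets to run.
for _ in range(100):
    increment("clicks")
while task_queue:
    run_worker_task()
```

In this run the first task writes the latest count (100) and clears the flag, so the remaining 99 tasks find nothing to do and return without touching the datastore.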

Using Livecount

This simple solution has allowed Pulse’s backend to easily scale to counting hundreds of events per second, with minimal cost and complexity.  Livecount’s API requires nothing more than a simple string counter name.

from livecount.counter import load_and_increment_counter

load_and_increment_counter(name=url)

For more advanced use cases, namespaces are supported to keep counters organized and easy to query. Recently, we also added support for time period fields to help with hourly/daily/weekly/monthly aggregates. Here’s a more advanced example.

from datetime import datetime

from livecount.counter import PeriodType, load_and_increment_counter

load_and_increment_counter(name=url, period=datetime.now(),
                           period_types=[PeriodType.DAY, PeriodType.WEEK],
                           namespace="starred", delta=1)
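
To give a feel for what period-based counting involves, here is a rough sketch of bucketing a timestamp into day and week periods so that each event increments one counter per period. The key format and the helper names period_start and counter_keys are hypothetical, and weeks are assumed to start on Monday. Livecount's own implementation may well differ.

```python
from datetime import datetime, timedelta


def period_start(moment, period_type):
    """Return the start of the day or week containing `moment`."""
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "day":
        return day
    if period_type == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return day - timedelta(days=day.weekday())
    raise ValueError("unsupported period type: %r" % (period_type,))


def counter_keys(namespace, name, moment, period_types):
    """Build one hypothetical counter key per requested period type."""
    keys = []
    for period_type in period_types:
        start = period_start(moment, period_type).date().isoformat()
        keys.append("%s:%s:%s:%s" % (namespace, period_type, start, name))
    return keys


keys = counter_keys("starred", "example.com/post",
                    datetime(2011, 7, 14, 15, 30), ["day", "week"])
# keys == ["starred:day:2011-07-14:example.com/post",
#          "starred:week:2011-07-11:example.com/post"]
```

Every event on the same day maps to the same day key, and every event in the same week maps to the same week key, so a daily or weekly aggregate is simply the value of one counter.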

Livecount is open-source and easily deployable on Google AppEngine. Check out the README on GitHub.

If you have something to count, give Livecount a try.  I’d love to hear your thoughts or suggestions for improvement!

  • http://twitter.com/makistv just me

    Is it still feasible to use it now that GAE has released its new pricing scheme?

  • http://GBayer.com Greg Bayer

    Yes. For us, it’s definitely still feasible to use it with the new pricing. Try it out and let me know if you find that it ends up costing more than another approach.