Building a solid analytics platform is often a combination of real-time and batch processing. Batch processing, with a tool like Hadoop, is great for digging into large amounts of past data and asking questions that cannot be anticipated. In contrast, when it is known ahead of time that certain aggregates will be required, the best solution is usually to count each event as it happens. Livecount makes it easier to address this second use-case.
Livecount initially stores all counts in memcache. To minimize the risk of data loss, each time a count is updated Livecount creates a worker task to write that count from the memcache to the datastore in the background. If the count is ever evicted, it is reloaded from the datastore on the next read or write.
Since counter updates are usually written back to the datastore within seconds, the risk of loss is minimal. Write performance is excellent, since only the memcache must be updated before completing a request. Most reads can also be served from the memcache. Load on the datastore is further reduced by storing a dirty flag along with each memcached count. If more increment events come in than can be written back in real time, only one write is needed to update the datastore with the latest count. After a successful write, the dirty bit is cleared and the other backlogged write tasks for that counter are skipped.
This simple solution has allowed Pulse’s backend to easily scale to counting hundreds of events per second, with minimal cost and complexity. Livecount’s API requires nothing more than a simple string counter name.
from livecount.counter import load_and_increment_counter load_and_increment_counter(name=url)
For more advanced use-cases, namespaces are supported for keep counters organized and easy to query. Recently, we also added support for time period fields to help support hourly/daily/weekly/monthly aggregates. Here’s a more advanced example.
from livecount.counter import PeriodType, load_and_increment_counter load_and_increment_counter(name=url, period=datetime.now(), \ period_types=[PeriodType.DAY, PeriodType.WEEK], namespace="starred", delta=1)
Livecount is open-source and easily deployable on Google AppEngine. Checkout the README on Github.
If you have something to count, give Livecount a try. I’d love to hear your thoughts or suggestions for improvement!