Scaling with the Kindle Fire

Posted December 1st, 2011 in Pulse by Greg Bayer

Earlier this week I wrote a guest post for the Google App Engine Blog on how Pulse scaled up its backend infrastructure to prepare for the recent Kindle Fire launch.

The Kindle Fire includes Pulse as one of the few preloaded apps and is projected to sell over five million units this quarter alone. This meant we had to prepare for nearly doubling our user base in a very short period. We also needed to be ready for spikes in load due to press events and the holiday season.

To learn more about our architecture on Google App Engine, how we dealt with the recent App Engine pricing changes, and how we prepared for an expected increase of 5M+ users, check out the original post. You can also find the original post on Pulse’s Engineering Blog and some additional analysis on GigaOM.


Livecount

Posted July 11th, 2011 in Projects by Greg Bayer

Livecount is an implementation of real-time counters that leverages the performance of memcache and task queues on Google AppEngine.

Building a solid analytics platform is often a combination of real-time and batch processing. Batch processing, with a tool like Hadoop, is great for digging into large amounts of past data and asking questions that cannot be anticipated.  In contrast, when it is known ahead of time that certain aggregates will be required, the best solution is usually to count each event as it happens. Livecount makes it easier to address this second use-case.

I encourage you to read about our experience with Livecount at Pulse.

Data Persistence

Livecount initially stores all counts in memcache.  To minimize the risk of data loss, each time a count is updated Livecount creates a worker task to write that count from the memcache to the datastore in the background. If the count is ever evicted, it is reloaded from the datastore on the next read or write.

Performance

Since counter updates are usually written back to the datastore within seconds, the risk of loss is minimal. Write performance is excellent, since only the memcache must be updated before completing a request. Most reads can also be served from the memcache. Load on the datastore is further reduced by storing a dirty flag along with each memcached count. If more increment events come in than can be written back in real time, only one write is needed to update the datastore with the latest count. After a successful write, the dirty bit is cleared and the other backlogged write tasks for that counter are skipped.
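To make the write-back flow concrete, here is a minimal in-memory sketch. It is not Livecount's actual code: the class name WriteBehindCounter and all of its members are hypothetical, and plain Python dicts and a deque stand in for memcache, the datastore, and the task queue. It only illustrates how a dirty flag lets one datastore write cover a backlog of increments.

```python
from collections import deque


class WriteBehindCounter:
    """Hypothetical in-memory model of the memcache + task queue + datastore flow."""

    def __init__(self):
        self.cache = {}            # name -> {"count": int, "dirty": bool}; stands in for memcache
        self.datastore = {}        # name -> int; stands in for the datastore
        self.tasks = deque()       # pending write-back tasks; stands in for the task queue
        self.datastore_writes = 0  # how many writes actually reached the datastore

    def _load(self, name):
        # On a cache miss (for example after eviction), reload from the datastore.
        if name not in self.cache:
            self.cache[name] = {"count": self.datastore.get(name, 0), "dirty": False}
        return self.cache[name]

    def increment(self, name, delta=1):
        # Only the cache is touched before the request completes.
        entry = self._load(name)
        entry["count"] += delta
        entry["dirty"] = True
        self.tasks.append(name)  # one background write-back task per update

    def read(self, name):
        return self._load(name)["count"]

    def run_tasks(self):
        # Background worker: write dirty counts back, skip tasks already covered.
        while self.tasks:
            name = self.tasks.popleft()
            entry = self.cache.get(name)
            if entry is None or not entry["dirty"]:
                continue  # an earlier task already wrote the latest count
            self.datastore[name] = entry["count"]
            self.datastore_writes += 1
            entry["dirty"] = False

    def evict(self, name):
        # Simulate memcache dropping the entry.
        self.cache.pop(name, None)


counter = WriteBehindCounter()
for _ in range(5):
    counter.increment("article-123")  # five increments queue five tasks
counter.run_tasks()                   # the first task writes 5; the other four are skipped
```

In this model an entry that is evicted while still dirty loses its unwritten increments, which is the small window of risk described above.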

Using Livecount

This simple solution has allowed Pulse’s backend to easily scale to counting hundreds of events per second, with minimal cost and complexity.  Livecount’s API requires nothing more than a simple string counter name.

from livecount.counter import load_and_increment_counter

load_and_increment_counter(name=url)

For more advanced use-cases, namespaces are supported to keep counters organized and easy to query.  Recently, we also added support for time period fields to help support hourly/daily/weekly/monthly aggregates. Here’s a more advanced example.

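from datetime import datetime

from livecount.counter import PeriodType, load_and_increment_counter

# Count one "starred" event for this URL in both the daily and weekly aggregates.
load_and_increment_counter(name=url, period=datetime.now(),
    period_types=[PeriodType.DAY, PeriodType.WEEK], namespace="starred", delta=1)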
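As a rough illustration of what per-period aggregates involve, the sketch below truncates a timestamp to the start of its hour, day, week, or month and builds one counter key per period. This is an assumption about how such bucketing could work, not Livecount's real implementation: the functions period_start and counter_keys, the string period names, and the key format are all hypothetical.

```python
from datetime import datetime, timedelta


def period_start(ts, period_type):
    """Truncate a timestamp to the start of its hour, day, week, or month."""
    if period_type == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "day":
        return day
    if period_type == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return day - timedelta(days=day.weekday())
    if period_type == "month":
        return day.replace(day=1)
    raise ValueError("unknown period type: %r" % (period_type,))


def counter_keys(name, ts, period_types, namespace="default"):
    """Build one hypothetical counter key per requested period."""
    return [
        "%s:%s:%s:%s" % (namespace, p, period_start(ts, p).isoformat(), name)
        for p in period_types
    ]


# One event yields one key per period; each key would be incremented separately.
keys = counter_keys("example-url", datetime(2011, 7, 13, 15, 42),
                    ["day", "week"], namespace="starred")
```

Every event that falls in the same day (or week) maps to the same key, so a single counter per bucket is enough to serve that aggregate.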

Livecount is open-source and easily deployable on Google AppEngine. Check out the README on GitHub.

If you have something to count, give Livecount a try.  I’d love to hear your thoughts or suggestions for improvement!


Pulse News is Hiring!

Posted December 10th, 2010 in Pulse by Greg Bayer

A few months ago I mentioned that I left the government/research world (Sandia Labs) and joined an exciting new startup.   I’d like to share a bit more about my experience so far and announce that we are hiring!

Those who have worked at a large company and then moved to a startup can probably relate to my experience.  First, without a doubt, the most motivating and fun part about working at Pulse is seeing the impact of my work. And I don’t mean just having someone say “Good Job” or receiving a strong performance review; I mean seeing thousands of people USE the results of your work and submit feedback about how it benefitted their lives.  At Pulse, this experience is magnified by the fact that we release new product features every two weeks, not every quarter or every year!

In addition to really seeing the impact of your work at Pulse, another benefit I never knew I was missing in the corporate world is the feeling of being a part of a tight-knit team where everyone is giving 110% towards reaching the same goal.  This is something that is rarely found outside the startup world and should not be underestimated in its ability to improve productivity and genuine enjoyment of work. Imagine you and your coworkers are always committed to the same goals and never distracted by office politics or personal agendas.  How often does that happen in your current job?

Finally, at Pulse both the impact of your work and great team dynamics are highly leveraged.  In the corporate world great project outcomes or solid team building commonly result in a plus mark on the rating form and a promotion in three to five years.  How motivating is this for the most capable employees?  The impact of their work, which in the computer science world can easily be many times that of other employees, is muffled by rigid performance review and compensation structures.  In a startup, this muffling effect is removed.  Everyone accepts slightly lower fixed compensation up front in exchange for the chance to turn hard work into a big reward down the line.  This reward is directly tied to the impact your work has and how well the team works together.  Now everyone is properly incentivized to give their all, and giving your all for something you believe in is inherently rewarding!

Check out the official Pulse Blog and our hiring post there.  Here are some of the positions we are currently looking to fill.

Continue Reading »


Map(Reduce) Analytics on Google AppEngine

Posted October 29th, 2010 in Big Data, Development by Greg Bayer

Google AppEngine is a great tool for building simple web applications which are automatically scalable. All of the basic building blocks are readily available and accessible from both Python and Java. This includes a database, a caching layer, and support for background tasks.

What about the big data analytics and informatics that made Google famous? Does AppEngine help us there as well? The answer is yes, although with some serious limitations.

Continue Reading »


Recently Joined Pulse!

Posted October 10th, 2010 in Pulse by Greg Bayer

After a year and a half of big data research for the government and quite a bit of fun with Hadoop, I’ve decided to join some good friends at an early-stage startup called Alphonso Labs.

Pulse is currently the #1 news reader on the iPad, iPhone, and Android app stores.  I’ll be leading the development of our backend data platform and working with a great team.

As we start to build out Pulse’s backend, I’ll be continuing to experiment with Google App Engine.  Stay tuned for more posts in that regard.


Continue Reading »


Java on Google App Engine

Posted July 17th, 2010 in Development by Greg Bayer

A few thoughts from my first test of Google App Engine.

My goal was to put up a prototype Java web app for pushing email alerts based on RSS content (more to come on the full idea). Unfortunately, it took much longer than I expected to get things going (longer than my web app prototype took to write), leaving me feeling a bit disappointed.  On the upside, the app has been running quite well for about a week now.

Continue Reading »
