Scaling with the Kindle Fire

Posted December 1st, 2011 in Pulse by Greg Bayer

Earlier this week I wrote a guest post for the Google App Engine Blog on how Pulse has scaled up our backend infrastructure to prepare for the recent Kindle Fire launch.

The Kindle Fire includes Pulse as one of the few preloaded apps and is projected to sell over five million units this quarter alone. This meant we had to prepare for nearly doubling our user base in a very short period. We also needed to be ready for spikes in load due to press events and the holiday season.

To learn more about our architecture on Google App Engine, how we dealt with the recent App Engine pricing changes, and how we prepared for an expected increase of 5M+ users, check out the original post. You can also find the original post on Pulse’s Engineering Blog and some additional analysis from GigaOM.

Livecount

Posted July 11th, 2011 in Projects by Greg Bayer

Livecount is an implementation of real-time counters that leverages the performance of memcache and task queues on Google AppEngine.

Building a solid analytics platform is often a combination of real-time and batch processing. Batch processing, with a tool like Hadoop, is great for digging into large amounts of past data and asking questions that cannot be anticipated.  In contrast, when it is known ahead of time that certain aggregates will be required, the best solution is usually to count each event as it happens. Livecount makes it easier to address this second use-case.

I encourage you to read about our experience with Livecount at Pulse.

Data Persistence

Livecount initially stores all counts in memcache.  To minimize the risk of data loss, each time a count is updated Livecount creates a worker task to write that count from the memcache to the datastore in the background. If the count is ever evicted, it is reloaded from the datastore on the next read or write.

Performance

Since counter updates are usually written back to the datastore within seconds, the risk of loss is minimal. Write performance is excellent, since only the memcache must be updated before completing a request. Most reads can also be served from the memcache. Load on the datastore is further reduced by storing a dirty flag along with each memcached count. If more increment events come in than can be written back in real time, only one write is needed to update the datastore with the latest count. After a successful write, the dirty bit is cleared and the other backlogged write tasks for that counter are skipped.

Using Livecount

This simple solution has allowed Pulse’s backend to easily scale to counting hundreds of events per second, with minimal cost and complexity.  Livecount’s API requires nothing more than a simple string counter name.

from livecount.counter import load_and_increment_counter

load_and_increment_counter(name=url)

For more advanced use-cases, namespaces are supported to keep counters organized and easy to query.  Recently, we also added support for time period fields to help support hourly/daily/weekly/monthly aggregates. Here’s a more advanced example.

from datetime import datetime

from livecount.counter import PeriodType, load_and_increment_counter

load_and_increment_counter(name=url, period=datetime.now(),
                           period_types=[PeriodType.DAY, PeriodType.WEEK],
                           namespace="starred", delta=1)
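
How Livecount stores period-based counts internally isn't covered here. As a rough way to think about it, one event can be counted into one bucket per requested period type, with each bucket identified by the start of its period. The sketch below is purely illustrative: `period_start`, `counter_keys`, the string period names, and the key format are all hypothetical and not part of Livecount's API.

```python
from datetime import datetime, timedelta


def period_start(moment, period_type):
    """Truncate a datetime to the start of its hour, day, week, or month."""
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "hour":
        return moment.replace(minute=0, second=0, microsecond=0)
    if period_type == "day":
        return day
    if period_type == "week":
        # Assumption: weeks start on Monday.
        return day - timedelta(days=day.weekday())
    if period_type == "month":
        return day.replace(day=1)
    raise ValueError("unknown period type: %r" % period_type)


def counter_keys(name, moment, period_types, namespace="default"):
    """Build one (hypothetical) counter key per requested period type."""
    return [
        "%s:%s:%s:%s" % (namespace, p, period_start(moment, p).isoformat(), name)
        for p in period_types
    ]


keys = counter_keys("http://example.com/story",
                    datetime(2011, 7, 13, 15, 42),
                    ["day", "week"], namespace="starred")
```

With a scheme like this, a single increment call touches two counters (the day bucket and the week bucket), and querying a daily or weekly aggregate becomes a lookup of the matching key rather than a scan over raw events.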

Livecount is open-source and easily deployable on Google AppEngine. Check out the README on GitHub.

If you have something to count, give Livecount a try.  I’d love to hear your thoughts or suggestions for improvement!

How To Use a Commuter Check Card to Purchase a Caltrain Monthly Pass

Posted June 27th, 2011 in Observations by Greg Bayer

Warning

Before I start, let me recommend that you don’t try this. The potential savings you gain from using pre-tax Commuter Check cards likely won’t be worth the pain of actually trying to buy something with them. Return the cards to your employer and ask them to enroll in another option for funding your commute costs pre-tax!

Update: The Autoload program via Clipper works great. Instead of buying a pass in person with a commuter check card, you tag on/off once at the beginning of the month to load a new pass.

Goal Prototyped Below

Use two Commuter Check cards issued by an employer (each containing $100) to purchase a zone 1-3 monthly Caltrain pass on a Clipper card (for $179).

What Not To Do (Because It Doesn’t Work)

This section is intended to lower your expectations to the appropriate level. Almost all advice I have received on this topic has either been out of date or blatantly incorrect, resulting in more wasted time than I ever thought possible. Please take my own advice with a grain of salt, as anything related to Caltrain or Commuter Cards is likely to change in unpredictable ways in the near future.

Walgreens

Since monthly Caltrain passes are usually purchased at Walgreens, this is a common first stop when trying to achieve the goal listed above. Unfortunately, although Commuter Check cards are labeled as credit card compatible, they don’t work in the standard machines and cannot be accepted by Walgreens. This fact does not stop most Caltrain/Clipper/Commuter Check employees and random people on the street from suggesting this option every time you ask.

Online / Over the Phone

For some reason, probably the same reason Walgreens can’t accept them, Clipper’s online system does not accept Commuter Check cards. Again, this fact does not stop Clipper/Commuter Check call center employees from suggesting this option every time you ask. The Clipper call center will even try to process your Commuter Check card and tell you there is something wrong with it. This will result in you spending another 20 minutes on hold with Commuter Check customer service just to be told that the card is fine and that you should purchase your monthly pass online (see beginning of this section).

In Person

Explaining all of this to a Clipper call center employee will sometimes result in them telling you that you can only use your Commuter Check cards at a Caltrain booth with a clerk (real person). In my case, they strongly suggested I go to the 4th/King Caltrain station before 7pm. While this advice is getting closer to the right answer, if you actually try going to 4th/King and asking around for a clerk or a booth, you’ll eventually discover that all of the Caltrain clerks in the entire system were laid off a few months ago. The remaining employees at the station fall back to the commonly issued advice above (see Walgreens & Online / Over the Phone).

What Does Work (Sort Of)

I’d like to credit the clerk at the Walgreens across from the 4th/King Caltrain station with the first halfway useful advice. Apparently, the Transit Store at the Powell Street BART station accepts Commuter Check cards in exchange for Caltrain monthly passes!

BART Transit Store

Amazingly, the magazine stand style booths at some of the major BART stations along Market do take Commuter Check cards. Unfortunately, some open late / close early and asking a BART employee for their hours results in very inaccurate information. Also, for some reason the Civic Center booth (where I was advised to go because it opens earlier than most others) does not take Commuter Check cards (anymore). Where can you use them? Based on my experience, I know for sure that you can use them at the Embarcadero BART station Transit Store booth and should be able to use them at the Powell Street station if you can get there when they are open.

Bad Cards

When I finally reached a person with the ability to accept Commuter Check cards, I found out that one of my two cards was “malfunctioning.” According to the clerk, this happens all the time. I happily paid the remainder of my monthly pass with my own credit card in order to end the whole painful ordeal! I returned the bad card to my company and recommended that they no longer issue Commuter Check cards (see Warning). In any case, I won’t be using them again.

Pulse Wins Apple Design Award and Raises $9 Million Series A

Posted June 16th, 2011 in Pulse by Greg Bayer

I’m very excited to share that Pulse has announced its Series A funding round! All of us are still fired up about last week’s Apple Design Award at WWDC and our recent 4 million user milestone, not to mention that today is our co-founder Ankit’s birthday. Thanks to the team for their tireless work and to everyone who has helped us get here!

Check out some of today’s press:

Pulse Blog – Announcing Our Series A Financing
TechCrunch – 4 Million Users Strong And Apple Design Award In Hand, Pulse Grabs $9 Million Series A
WSJ – Pulse Taps $9M To Win Battle For Mobile-News Consumers
Forbes – News Reader Pulse Raises $9 Million
Mashable – Pulse Passes 4 Million Users, Raises $9 Million for Visual News Reader

Working Hard With No Regrets

Posted June 2nd, 2011 in Observations by Greg Bayer

Working for a startup usually means putting in more hours than others. Recently, I spent two days on less than 3 hours of sleep in order to push out our new Pulse.me release. This doesn’t seem strange to me and didn’t make me unhappy. In fact, it was one of the most exciting and fun things I’ve done in a while. However, after mentioning it to some friends, I realized not everyone understands why it can be good to spend so much time “working” to build something you believe in.

Upon hearing about my sleep-deprived state, my friend sent me a link to the top 5 regrets people have on their deathbed along with the comment “you might need this.”  I appreciated the link and enjoyed the reminder to live life to the fullest, especially with regard to keeping in touch with friends and loved ones. I also realized that my friend didn’t understand that for me the long hours I put in are all about fulfilling my dreams of creating new technology and impacting the world in a positive way. According to the article, not chasing after dreams is people’s #1 regret.

Of course there is an opportunity cost to time spent on any endeavor and this inevitably contributes to spending less time with friends and loved ones (regret #4). I believe maintaining a healthy balance between the two is critical. Simply “working less” (regret #2) would not make me happier. Chasing after dreams is an essential part of my life. The feeling of fulfillment I get from doing so makes me a much happier / more content person, and this in turn positively affects my relationships.

However, sometimes I do get caught up in chasing my dreams and forget to make time for friends and family. Just like realizing dreams, successful relationships are built on quality time spent together. I always appreciate being reminded to dedicate more time to this essential part of life, as I was today. I’d love to hear your thoughts or personal experiences on achieving the right balance.

Disclaimer: This post was written in a sleep-deprived state.

New Eng Blog / Using Data Analysis to Discover Top Stories

Posted May 26th, 2011 in Big Data, Development, Pulse by Greg Bayer

In addition to the Pulse Blog, where we regularly share updates about our latest features and new content, Pulse now has an Engineering Blog!  The goal is to share some of the exciting engineering work that goes into bringing users the Pulse experience they’ve come to expect. To kick things off I added a post about Using Data Analysis to Discover Top Stories.  In the post I share a bit about how we use AWS to collect and analyse our data, along with how we serve up the feeds we build via AppEngine.  Check it out!

Moving Files from one Git Repository to Another, Preserving History

Posted May 17th, 2011 in Development by Greg Bayer

If you use multiple Git repositories, it’s only a matter of time until you want to move some files from one project to another.  Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.

After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen. In the hopes of helping someone else avoid the same trouble, here’s the solution that ended up working best. The solution is primarily based on ebneter’s excellent question on Stack Overflow.

Another solution is Linus Torvalds’s “The coolest merge, EVER!” Unfortunately, his approach seems to require more manual fiddling than I would like and results in a repository with two roots. I don’t completely understand the implications of this, so I opted for something more like a standard merge.

Goal:

  • Move directory 1 from Git repository A to Git repository B.

Constraints:

  • Git repository A contains other directories that we don’t want to move.
  • We’d like to preserve the Git commit history for the directory we are moving.

Get files ready for the move:

Make a copy of repository A so you can mess with it without worrying about mistakes too much.  It’s also a good idea to delete the link to the original repository to avoid accidentally making any remote changes (line 3).  Line 4 is the critical step here.  It goes through your history and files, removing anything that is not in directory 1.  The result is the contents of directory 1 spewed out into the base of repository A.  You probably want to import these files into repository B within a directory, so move them into one now (lines 5/6). Commit your changes and we’re ready to merge these files into the new repository.

git clone <git repository A url>
cd <git repository A directory>
git remote rm origin
git filter-branch --subdirectory-filter <directory 1> -- --all
mkdir <directory 1>
mv * <directory 1>
git add .
git commit

Merge files into new repository:

Make a copy of repository B if you don’t have one already.  On line 3, you’ll create a remote connection to repository A as a branch in repository B.  Then simply pull from this branch (containing only the directory you want to move) into repository B.  The pull copies both files and history.  Note: You can use a merge instead of a pull, but pull worked better for me. Finally, you probably want to clean up a bit by removing the remote connection to repository A. Commit and you’re all set.

git clone <git repository B url>
cd <git repository B directory>
git remote add repo-A-branch <git repository A directory>
git pull repo-A-branch master
git remote rm repo-A-branch

Update: Removed final commit thanks to Von’s comment.

Pulse News is Hiring!

Posted December 10th, 2010 in Pulse by Greg Bayer

A few months ago I mentioned that I left the government/research world (Sandia Labs) and joined an exciting new startup.   I’d like to share a bit more about my experience so far and announce that we are hiring!

Those who have worked at a large company and then moved to a startup can probably relate to my experience.  First, without a doubt, the most motivating and fun part about working at Pulse is seeing the impact of my work. And I don’t mean just having someone say “Good Job” or receiving a strong performance review, I mean seeing thousands of people USE the results of your work and submit feedback about how it benefited their lives.  At Pulse, this experience is magnified by the fact that we release new product features every two weeks, not every quarter or every year!

In addition to really seeing the impact of your work at Pulse, another benefit I never knew I was missing in the corporate world is the feeling of being a part of a tight-knit team where everyone is giving 110% towards reaching the same goal.  This is something that is rarely found outside the startup world and should not be underestimated in its ability to improve productivity and genuine enjoyment of work. Imagine you and your coworkers are always committed to the same goals and never distracted by office politics or personal agendas.  How often does that happen in your current job?

Finally, at Pulse both the impact of your work and great team dynamics are highly leveraged.  In the corporate world great project outcomes or solid team building commonly result in a plus mark on the rating form and a promotion in three to five years.  How motivating is this for the most capable employees?  The impact of their work, which in the computer science world can easily be many times that of other employees, is muffled by rigid performance review and compensation structures.  In a startup, this muffling effect is removed.  Everyone accepts slightly lower fixed compensation up front in exchange for the chance to turn hard work into a big reward down the line.  This reward is directly tied to the impact your work has and how well the team works together.  Now everyone is properly incentivized to give their all, and giving your all for something you believe in is inherently rewarding!

Check out the official Pulse Blog and our hiring post there.  Here are some of the positions we are currently looking to fill.

Continue Reading »

Map(Reduce) Analytics on Google AppEngine

Posted October 29th, 2010 in Big Data, Development by Greg Bayer

Google AppEngine is a great tool for building simple web applications which are automatically scalable. All of the basic building blocks are readily available and accessible from both Python and Java. This includes a database, a caching layer, and support for background tasks.

What about the big data analytics and informatics that made Google famous? Does AppEngine help us there as well? The answer is yes, although with some serious limitations.

Continue Reading »

Recently Joined Pulse!

Posted October 10th, 2010 in Pulse by Greg Bayer

After a year and a half of big data research for the government and quite a bit of fun with Hadoop, I’ve decided to join some good friends at an early-stage startup called Alphonso Labs.

Pulse is currently the #1 news reader on the iPad, iPhone, and Android app stores.  I’ll be leading the development of our backend data platform and working with a great team.

As we start to build out Pulse’s backend, I’ll be continuing to experiment with Google App Engine.  Stay tuned for more posts in that regard.

Pulse on the iPad

Continue Reading »
