Learn iOS Faster

Posted September 27th, 2013 in Development, Opportunities by Greg Bayer

iOS is one of the most sought after skills in the software industry. More importantly, it’s a lot of fun to work on a major app with the potential to impact millions of users, and it can be even more rewarding to launch your own.

I started learning iOS several years ago, as many developers do, by diving straight into Xcode. If you have experience in other languages, it’s easy to work from a few examples and look up anything else via Google. Right?

Unfortunately, what I got was a mess of an app that worked but was very difficult to maintain and iterate on. In retrospect, it’s a good idea to learn the fundamentals and best practices first.

Top iOS developers (like Ankit Gupta) suggest starting with the Stanford iOS class. The course content is well structured and easy to follow. It’s also available completely free on iTunes U. Just open iTunes, navigate to iTunes U, and search for the Stanford class listed as Coding Together – Developing Apps for iPhone and iPad (Winter 2013). You’ll find an excellent recording of Stanford’s CS193p, along with lecture slides, assignments, etc.


All of the lectures contain worthwhile content, but even watching the first few will help you do things the right way the first time.

Tip: If the lectures are too slow for you, you can speed them up. After you download a lecture in iTunes U (you can do this for all lectures ahead of time if you want to watch offline), control-click it and select Show in Finder. Then control-click the file in Finder and select Open with QuickTime Player. From here you can watch the lecture in fast forward!

Most people will probably find 2x a bit too fast, but you can increase the speed in 10% increments by option-clicking the fast-forward button. I find that 1.7x works well if I’m giving the lecture my full attention.

Updates from Google IO 2013

Posted May 21st, 2013 in Development by Greg Bayer

I’m happy to see that Google has been continuing to invest in App Engine and the broader Google Cloud platform. At this year’s Google IO there were many exciting new feature announcements from across Google, along with a strong set of new features for Google Cloud. Here are some of the highlights from my perspective. To see how things have evolved, check out my wish list from Google IO 2012.


App Engine

Here’s the full list of production and experimental features for App Engine.

Google Compute Engine


App Engine Datastore: How to Efficiently Export Your Data

Posted November 8th, 2012 in Big Data, Development by Greg Bayer

While Google App Engine has many strengths, as with all platforms, there are some challenges to be aware of. Over the last two years, one of our biggest challenges at Pulse has been how difficult it can be to export large amounts of data for migration, backup, and integration with other systems. While there are several options and tools, so far none have been feasible for large datasets (10GB+).

Since we have many TBs of data in Datastore, we’ve been actively looking for a solution for some time. I’m excited to share a very effective approach based on Google Cloud Storage and Datastore Backups, along with a method for converting the data to other formats!

Existing Options For Data Export

These options have been around for some time. They are often promoted as making it easy to access datastore data, but the reality can be very different when dealing with big data.

  1. Using the Remote API Bulk Loader. Although convenient, this official tool only works well for smaller datasets. Large datasets can easily take 24 hours to download and often fail without explanation. This tool has pretty much remained the same (without any further development) since App Engine’s early days. All official Google instructions point to this approach.
  2. Writing a map reduce job to push the data to another server. This approach can be painfully manual and often requires significant infrastructure elsewhere (e.g. on AWS).
  3. Using the Remote API directly, or writing a handler to access datastore entities one query at a time. Either way, you can run a parallelizable script or map reduce job to pull the data to where you need it. Unfortunately this has the same issues as #2.

A New Approach – Export Data via Google Cloud Storage

The recent introduction of Google Cloud Storage has finally made exporting large datasets out of Google App Engine’s datastore possible and fairly easy. The setup steps are annoying, but thankfully it’s mostly a one-time cost. Here’s how it works.

One-time setup

  • Create a new task queue in your App Engine app called ‘backups’ with the maximum 500/s rate limit (optional).
  • Sign up for a Google Cloud Storage account with billing enabled. Download and configure gsutil for your account.
  • Create a bucket for your data in Google Cloud Storage. You can use the online browser to do this. Note: There’s an unresolved bug that causes backups to buckets with underscores to fail.
  • Use gsutil to set the acl and default acl for that bucket to include your app’s service account email address with WRITE and FULL_CONTROL respectively (see the example below).
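For reference, here’s roughly what that last step looks like with recent versions of gsutil; the service account address and bucket name are placeholders, and older gsutil releases used setacl/setdefacl instead, so check gsutil help acl if these don’t match your version. In acl ch syntax, W grants WRITE and O grants FULL_CONTROL:

gsutil acl ch -u your-app-id@appspot.gserviceaccount.com:W gs://your_bucket_name
gsutil defacl ch -u your-app-id@appspot.gserviceaccount.com:O gs://your_bucket_name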

Steps to export data

  • Navigate to the datastore admin tab in the App Engine console for your app. Click the checkbox next to the Entity Kinds you want to export, and push the Backup button.
  • Select your ‘backups’ queue (optional) and Google Cloud Storage as the destination. Enter the bucket name as /gs/your_bucket_name/your_path.
  • A map reduce job with 256 shards will be run to copy your data. It should be quite fast (see below).

Steps to download data

  • On the machine where you want the data, run the following command. Optionally you can include the -m flag before cp to enable multi-threaded downloads.
gsutil cp -R gs://your_bucket_name/your_path /local_target


Reading Your Data

Unfortunately, even though you now have an efficient way to export data, this approach doesn’t include a built-in way to convert your data to common formats like CSV or JSON. If you stop here, you’re basically stuck using this data only to backup/restore App Engine. While that is useful, there are many other use-cases we have for exporting data at Pulse. So how do we read the data? It turns out there’s an undocumented, but relatively simple way of converting Google’s LevelDB-formatted backup files into simple Python dictionaries matching the structure of your original datastore entities. Here’s a Python snippet to get you started.

# Make sure the App Engine SDK is on your Python path, e.g.:
# import sys
# sys.path.append('/usr/local/google_appengine')
from google.appengine.api.files import records
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore

# Each backup output file is a LevelDB-format log of serialized entity protos.
raw = open('path_to_a_datastore_output_file', 'r')
reader = records.RecordsReader(raw)
for record in reader:
    entity_proto = entity_pb.EntityProto(contents=record)
    entity = datastore.Entity.FromPb(entity_proto)
    # entity is now available as a dictionary!

Note: If you use this approach to read all files in an output directory, you may get a ProtocolBufferDecodeError exception for the first record. It should be safe to ignore that error and continue reading the rest of the records.
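To read a whole backup in one go, here’s a rough sketch that builds on the snippet above: it walks every output file in a downloaded backup directory, skips any record that fails to decode (per the note above), and writes each entity out as one JSON object per line. The paths are placeholders, the directory is assumed to contain only backup output files (set aside any metadata files first), and json’s default=str is a blunt but effective way to coerce datetimes, Keys, and other non-JSON types to strings:

import glob
import json

from google.appengine.api.files import records
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore
from google.net.proto.ProtocolBuffer import ProtocolBufferDecodeError

# Placeholder paths - point these at your downloaded backup files.
out = open('/local_target/entities.json', 'w')
for path in glob.glob('/local_target/your_path/*'):
    reader = records.RecordsReader(open(path, 'r'))
    for record in reader:
        try:
            entity_proto = entity_pb.EntityProto(contents=record)
        except ProtocolBufferDecodeError:
            continue  # usually just the first record; safe to skip
        entity = datastore.Entity.FromPb(entity_proto)
        out.write(json.dumps(dict(entity), default=str) + '\n')
out.close()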

Performance Comparison

Remote API Bulk Loader

  • 10GB / 10 hours ~ 291KB/s
  • 100GB – never finishes!

Backup to Google Cloud Storage + Download with gsutil

  • 10GB / 10 mins + 10 mins ~ 8.5MB/s
  • 100GB / 35 mins + 100 mins ~ 12.6MB/s

App Engine Wish List – Updates From Google IO 2012

Posted June 28th, 2012 in Development by Greg Bayer

We’ve been using Google App Engine at Pulse since 2010, back when we had only one backend engineer. In that time, App Engine has served us very well. The most obvious advantage is that it saves us lots of Ops work and lets us stay focused on our application. Over the last two years, it has grown with us both in terms of scale (from 200K users to 15M+) and in terms of features.

As I’m writing this post (from Google I/O 2012), I’m happy to report that App Engine continues to grow with us. This year, Google’s App Engine team has announced that they are fixing our number one wish list item! They have also started addressing several other important concerns. For some context, here is Pulse’s App Engine wish list as of about a month ago.

  1. SSL support for custom domains
  2. Faster bulk import & export of datastore data
  3. Faster datastore snapshotting
  4. Tunable memcache eviction policies & capacity
  5. Improved support for searching / browsing / downloading high volume application logs
  6. Faster (diff-based) deployment for large applications
  7. Support for naked domains (without www. in front)
  8. Unlimited developer accounts per application

Barb Darrow from GigaOm published part of this list earlier this week (before I/O started). Check out the article Google App Engine: What developers want at Google I/O to see more common wish list items from other developers.

As of yesterday (with the release of SDK version 1.7.0), SSL for custom domains is now officially supported, either via SNI for $9/month or via a custom IP for $99/month. This means that you can now host a domain like www.pulse.me on App Engine and support https throughout your site. Previously it had only been possible to use http with your domain, and any secure transactions had to be routed to the less appealing xxxxx.appspot.com domain. This meant you had to break the user’s flow or use some complicated hacks to hide the domain switching. Now it is finally possible to present a seamless, secure experience without ever leaving your custom domain.

There were many other great features released with 1.7.0 (see the link above). As for the rest of our wish list, here’s how it stands now!

  1. SSL support for custom domains
    – Supported now!
  2. Faster bulk import & export of datastore data
    – Update 2: App Engine Datastore: How to Efficiently Export Your Data
  3. Faster datastore snapshotting
    – Update 3: The internal settings for map reduce-based snapshotting have been increased to use 256 shards. It’s actually pretty fast now! Still hoping for incremental backups in the future.
  4. Tunable memcache eviction policies & capacity
    – I hear that we will soon be able to segment applications and control capacity. Eviction policy controls are likely to take longer.
  5. Improved support for searching / browsing / downloading high volume application logs
    – It was announced that this is coming very soon!!
  6. Faster (diff-based) deployment for large applications
    – Update 4: This is supported and working for us now!
  7. Support for naked domains (without www. in front)
    – Pending. No ETA.
  8. Unlimited developer accounts per application
    – This is now supported for premier accounts!

Let me know in the comments if you have any questions about these or want to share some of your wish list items. I’m always happy to discuss App Engine issues with other developers.

Update: Just now, at the second Google I/O keynote, Urs Hölzle announced Google’s push into the IaaS space with Google Compute Engine. It should be interesting to see if this offers serious competition to Amazon’s EC2 for future Pulse systems and features. The 771,886 cores available to the demo Genome app were pretty impressive! I’ll post here and/or at eng.pulse.me when we get a chance to try it out!

How to Buy a Basic SSL Certificate

Posted April 6th, 2012 in Development by Greg Bayer

In order to support SSL for a simple Tornado server on EC2, a certificate is required. Buying one seems harder than it should be, so I thought I’d share the process that recently worked for me.

There are several tradeoffs to consider:

  • Certificate Authority (CA) Reputation (from ‘Self Sign’ to VeriSign)
  • Price (Free – $3000/year)
  • Domain Coverage (Single, Multi, Wildcard)

After considering these options and reading about other people’s experiences, I concluded that GoDaddy is the least expensive reasonably well-respected CA. At GoDaddy, the wildcard option is 15 times as expensive as the standard single-domain certificate (with discount), so it’s a better deal to buy single-domain certs even if you need a few.

Steps I took:

  1. Search Google for GoDaddy SSL deal.
  2. Login to GoDaddy and buy a single domain certificate for $12.99/year.
  3. Go to ‘My Account’, click SSL Certificates. Activate your purchased token. Wait a few minutes.
  4. Configure your cert. Choose “Third party server”. Provide a Certificate Signing Request (CSR) for your domain (see below).
  5. Download the cert. Use the cert along with your .key file from the CSR generation process to set up SSL on your server(s).
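For step 4, the CSR (and the matching .key file) can be generated with OpenSSL. A typical invocation looks like this, with your own domain swapped in:

openssl req -new -newkey rsa:2048 -nodes -keyout www.yourdomain.com.key -out www.yourdomain.com.csr

When prompted, the Common Name must exactly match the domain you’re securing. The -nodes flag leaves the private key unencrypted so your server can start without a passphrase prompt.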

Continue Reading »

New Eng Blog / Using Data Analysis to Discover Top Stories

Posted May 26th, 2011 in Big Data, Development, Pulse by Greg Bayer

In addition to the main Pulse Blog, where we regularly share updates about our latest features and new content, Pulse now has an Engineering Blog! The goal is to share some of the exciting engineering work that goes into bringing users the Pulse experience they’ve come to expect. To kick things off, I added a post about Using Data Analysis to Discover Top Stories. In the post I share a bit about how we use AWS to collect and analyze our data, along with how we serve up the feeds we build via AppEngine. Check it out!

Moving Files from one Git Repository to Another, Preserving History

Posted May 17th, 2011 in Development by Greg Bayer

If you use multiple git repositories, it’s only a matter of time until you want to refactor some files from one project to another. Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.

After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen. In the hopes of helping someone else avoid the same trouble, here’s the solution that ended up working best. It is primarily based on ebneter’s excellent answer on Stack Overflow.

Another solution is Linus Torvalds’ “The coolest merge, EVER!” Unfortunately, his approach seems to require more manual fiddling than I would like, and it results in a repository with two roots. I don’t completely understand the implications of this, so I opted for something closer to a standard merge.

Goal:

  • Move directory 1 from Git repository A to Git repository B.

Constraints:

  • Git repository A contains other directories that we don’t want to move.
  • We’d like to preserve the Git commit history for the directory we are moving.
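The full steps are below the fold, but the general shape of that approach, sketched with placeholder repository and directory names (note that newer versions of Git also require --allow-unrelated-histories on the final pull), is:

# Rewrite a fresh clone of repository A so it contains only directory1's history.
git clone <repo-A-url> repoA-filtered
cd repoA-filtered
git filter-branch --subdirectory-filter directory1 -- --all
# directory1's contents now sit at the repository root; if they should live in a
# subdirectory of repository B, git mv them into one and commit before merging.

# Merge the rewritten history into repository B.
cd ../repoB
git remote add repoA-filtered ../repoA-filtered
git pull repoA-filtered master
git remote rm repoA-filtered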

Continue Reading »

Map(Reduce) Analytics on Google AppEngine

Posted October 29th, 2010 in Big Data, Development by Greg Bayer

Google AppEngine is a great tool for building simple web applications which are automatically scalable. All of the basic building blocks are readily available and accessible from both Python and Java. This includes a database, a caching layer, and support for background tasks.

What about the big data analytics and informatics that made Google famous? Does AppEngine help us there as well? The answer is yes, although with some serious limitations.
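To make that concrete, here’s a minimal sketch using the Python appengine-mapreduce library, which is the usual way to run mappers over the datastore; the Story kind, module names, and counter are hypothetical:

# analytics.py - the mapper is called once per Story entity.
from mapreduce import operation as op

def process_story(entity):
    # Counters are aggregated across all shards by the framework.
    yield op.counters.Increment('stories_processed')

The job itself is registered in mapreduce.yaml, which also gives you a simple status UI at /mapreduce:

# mapreduce.yaml
mapreduce:
- name: Count stories
  mapper:
    input_reader: mapreduce.input_readers.DatastoreInputReader
    handler: analytics.process_story
    params:
    - name: entity_kind
      default: models.Story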

Continue Reading »

Java on Google App Engine

Posted July 17th, 2010 in Development by Greg Bayer

A few thoughts from my first test of Google App Engine.

My goal was to put up a prototype Java web app for pushing email alerts based on RSS content (more to come on the full idea). Unfortunately, it took much longer than I expected to get things going (longer than the web app prototype took to write), leaving me feeling a bit disappointed. On the up side, the app has been running quite well for about a week now.

Continue Reading »