<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Greg Bayer</title>
	<atom:link href="http://gbayer.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://gbayer.com</link>
	<description>Welcome.</description>
	<lastBuildDate>Thu, 01 Dec 2011 17:47:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Scaling with the Kindle Fire</title>
		<link>http://gbayer.com/pulse/scaling-with-the-kindle-fire/</link>
		<comments>http://gbayer.com/pulse/scaling-with-the-kindle-fire/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 17:33:30 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Pulse]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[Cloud Hosting]]></category>
		<category><![CDATA[Scalability]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=721</guid>
		<description><![CDATA[
Earlier this week I wrote a guest post for the Google App Engine Blog on how Pulse has scaled up our backend infrastructure to prepare for the recent Kindle Fire launch.
The Kindle Fire includes Pulse as one of the only preloaded apps and is projected to sell over five million units this quarter alone. This meant we had to prepare [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/12/Pulse-on-Kindle-Fire.png"><img class="alignnone size-full wp-image-722" style="border: 0px initial initial;" title="Pulse-on-Kindle-Fire" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/12/Pulse-on-Kindle-Fire.png" alt="" width="500" height="253" /></a></p>
<p>Earlier this week I wrote a guest post for the <a href="http://googleappengine.blogspot.com/" target="_blank">Google App Engine Blog</a> on how <a href="http://www.pulse.me" target="_blank">Pulse</a> has scaled up our backend infrastructure to prepare for the recent Kindle Fire launch.</p>
<p>The Kindle Fire includes Pulse as <a href="http://blog.pulse.me/post/12978740559/kindle-fire-welcome-to-the-pulse-family" target="_blank">one of the only preloaded apps</a> and is projected to sell over <a href="http://techcrunch.com/2011/11/09/amazon-ups-orders-from-kindle-fire-suppliers-to-5-million-units/">five million units</a> this quarter alone. This meant we had to prepare for nearly <a href="http://techcrunch.com/2011/11/16/pulse-scores-key-spot-on-kindles-home-shelf-co-founder-says-it-may-pass-10m-users-this-year/">doubling our user base</a> in a very short period. We also needed to be ready for spikes in load due to press events and the holiday season.</p>
<p>To learn more about our architecture on Google App Engine, how we dealt with the recent App Engine pricing changes, and how we prepared for an expected increase of 5M+ users, check out the <a href="http://googleappengine.blogspot.com/2011/11/scaling-with-kindle-fire.html" target="_blank">original post</a>. You can also find the original post on <a href="http://eng.pulse.me/scaling-with-the-kindle-fire/" target="_blank">Pulse&#8217;s Engineering Blog</a> and some additional analysis from <a title="Posts by Derrick Harris" rel="author" href="http://gigaom.com/author/dharrisstructure/" target="_blank">Derrick Harris</a> on <a href="http://gigaom.com/cloud/pulse-on-kindle-fire-powered-by-google/" target="_blank">GigaOM</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/pulse/scaling-with-the-kindle-fire/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Livecount</title>
		<link>http://gbayer.com/projects/livecount/</link>
		<comments>http://gbayer.com/projects/livecount/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 17:49:00 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[Counters]]></category>
		<category><![CDATA[Memcache]]></category>
		<category><![CDATA[Real-time]]></category>
		<category><![CDATA[Task Queues]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=591</guid>
		<description><![CDATA[Livecount is an implementation of real-time counters that leverages the performance of memcache and task queues on Google AppEngine.
Building a solid analytics platform is often a combination of real-time and batch processing. Batch processing, with a tool like Hadoop, is great for digging into large amounts of past data and asking questions that cannot be anticipated.  In contrast, when it [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Livecount on Github" href="https://github.com/gregbayer/gae-livecount" target="_blank">Livecount</a> is an implementation of real-time counters that leverages the performance of <a href="http://code.google.com/appengine/docs/python/memcache/overview.html" target="_blank">memcache</a> and <a href="http://code.google.com/appengine/docs/python/taskqueue/overview.html" target="_blank">task queues</a> on <a title="Google AppEngine" href="http://code.google.com/appengine/" target="_blank">Google AppEngine</a>.</p>
<p>Building a solid analytics platform is often a combination of real-time and batch processing. Batch processing, with a tool like <a title="Hadoop" href="http://hadoop.apache.org/" target="_blank">Hadoop</a>, is great for digging into large amounts of past data and asking questions that cannot be anticipated.  In contrast, when it is known ahead of time that certain aggregates will be required, the best solution is usually to count each event as it happens. Livecount makes it easier to address this second use-case.</p>
<p>I encourage you to read about our <a title="Introducing Livecount" href="http://eng.pulse.me/introducing-livecount/" target="_blank">experience with Livecount</a> at <a title="Pulse" href="http://pulse.me" target="_blank">Pulse</a>.</p>
<h3>Data Persistence</h3>
<p>Livecount initially stores all counts in memcache.  To minimize the risk of data loss, each time a count is updated Livecount creates a worker task to write that count from the memcache to the datastore in the background. If the count is ever evicted, it is reloaded from the datastore on the next read or write.</p>
<div>
<h3>Performance</h3>
<p>Since counter updates are usually written back to the datastore within seconds, the risk of loss is minimal. Write performance is excellent, since only the memcache must be updated before completing a request. Most reads can also be served from the memcache. Load on the datastore is further reduced by storing a dirty flag along with each memcached count. If more increment events come in than can be written back in real time, only one write is needed to update the datastore with the latest count. After a successful write, the dirty bit is cleared and the other backlogged write tasks for that counter are skipped.</p>
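<p>As a rough illustration of the dirty-flag idea, here is a minimal, self-contained sketch. It is not Livecount&#8217;s actual code: plain Python dictionaries and a list stand in for memcache, the datastore, and the task queue, and the function names increment and run_write_back_tasks are made up for this example.</p>
<pre class="brush:py"># Hypothetical in-memory stand-ins for memcache, the datastore, and a task queue.
cache = {}        # counter name: {"count": int, "dirty": bool}
datastore = {}    # counter name: last persisted count
task_queue = []   # names of counters with a pending write-back task
datastore_writes = 0


def increment(name, delta=1):
    """Update the cached count, mark it dirty, and enqueue a write-back task."""
    entry = cache.get(name)
    if entry is None:
        # Evicted or never seen: reload the last persisted count.
        entry = {"count": datastore.get(name, 0), "dirty": False}
        cache[name] = entry
    entry["count"] += delta
    entry["dirty"] = True
    task_queue.append(name)


def run_write_back_tasks():
    """Drain the queue; only a task that finds the dirty flag set writes."""
    global datastore_writes
    while task_queue:
        name = task_queue.pop(0)
        entry = cache.get(name)
        if entry is None or not entry["dirty"]:
            continue  # an earlier task already persisted the latest count
        datastore[name] = entry["count"]
        datastore_writes += 1
        entry["dirty"] = False


for _ in range(5):
    increment("example-url")
run_write_back_tasks()
print(datastore["example-url"], datastore_writes)  # prints: 5 1</pre>
<p>In this sketch, five increments enqueue five write-back tasks, but only the first task to run touches the datastore. The remaining four see a cleared dirty flag and are skipped, which is the load reduction described above.</p>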
<h3>Using Livecount</h3>
<p>This simple solution has allowed Pulse&#8217;s backend to easily scale to counting hundreds of events per second, with minimal cost and complexity.  <a title="Livecount API" href="https://github.com/gregbayer/gae-livecount/blob/master/livecount/counter.py" target="_blank">Livecount&#8217;s API</a> requires nothing more than a simple string counter name.</p>
<pre class="brush:py">from livecount.counter import load_and_increment_counter

load_and_increment_counter(name=url)</pre>
<p>For more advanced use-cases, namespaces are supported to keep counters organized and easy to query.  Recently, we also added support for time period fields to help support hourly/daily/weekly/monthly aggregates. Here&#8217;s a more advanced example.</p>
<pre class="brush:py">from datetime import datetime

from livecount.counter import PeriodType, load_and_increment_counter

# Count one "starred" event for this URL in both the daily and weekly aggregates.
load_and_increment_counter(name=url, period=datetime.now(),
                           period_types=[PeriodType.DAY, PeriodType.WEEK],
                           namespace="starred", delta=1)</pre>
<p>Livecount is open-source and easily deployable on Google AppEngine. Check out the README on <a title="Livecount on Github" href="https://github.com/gregbayer/gae-livecount" target="_blank">GitHub</a>.</p>
<p>If you have something to count, give Livecount a try.  I&#8217;d love to hear your thoughts or suggestions for improvement!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/projects/livecount/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How To Use a Commuter Check Card to Purchase a Caltrain Monthly Pass</title>
		<link>http://gbayer.com/observations/how-to-use-a-commuter-check-card-to-purchase-a-caltrain-monthly-pass/</link>
		<comments>http://gbayer.com/observations/how-to-use-a-commuter-check-card-to-purchase-a-caltrain-monthly-pass/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 21:20:32 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Observations]]></category>
		<category><![CDATA[Caltrain]]></category>
		<category><![CDATA[Clipper Card]]></category>
		<category><![CDATA[Commuter Check]]></category>
		<category><![CDATA[Commuting]]></category>
		<category><![CDATA[Public Transit]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=645</guid>
		<description><![CDATA[
Warning
Before I start, let me recommend that you don&#8217;t try this.  The potential savings you gain from using pre-tax Commuter Check cards likely won&#8217;t be worth the pain of actually trying to buy something with them.  Return the cards to your employer and ask them to enroll in another option for funding your [...]]]></description>
			<content:encoded><![CDATA[<p><span style="font-weight: normal;"><img class="size-full wp-image-652 alignnone" title="commuter_check_card" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/07/commuter_check_card.gif" alt="" width="180" height="112" /></span></p>
<h2>Warning</h2>
<p>Before I start, let me recommend that you <strong>don&#8217;t try this</strong>.  The potential savings you gain from using pre-tax <a href="http://www.commutercheck.com/card.aspx" target="_blank">Commuter Check</a> cards likely won&#8217;t be worth the pain of actually trying to buy something with them.  Return the cards to your employer and ask them to enroll in another option for funding your commute costs pre-tax!<br />
<br />
<strong>Update</strong>: The <a href="http://www.clippercard.com/ClipperWeb/getTranslinkRegisterForAutoloadInfo.do" target="_blank">Autoload</a> program via Clipper works great. Instead of buying a pass in person with a commuter check card, you tag on/off once at the beginning of the month to load a new pass.</p>
<h2>Goal Prototyped Below</h2>
<p>Use two Commuter Check cards issued by an employer (each containing $100) to purchase a zone 1-3 monthly Caltrain pass on a Clipper card (for $179).</p>
<h2>What Not To Do (Because It Doesn&#8217;t Work)</h2>
<p>This section is intended to lower your expectations to the appropriate level. Almost all advice I have received on this topic has been either out of date or blatantly incorrect, resulting in more wasted time than I ever thought possible. Please take my own advice with a grain of salt, as anything related to Caltrain or Commuter Cards is likely to change in unpredictable ways in the near future.</p>
<h3>Walgreens</h3>
<p>Since monthly Caltrain passes are usually purchased at Walgreens, this is a common first stop when trying to achieve the goal listed above. Unfortunately, although Commuter Check cards are labeled as credit card compatible, they don&#8217;t work in the standard machines and cannot be accepted by Walgreens. This fact does not stop most Caltrain/Clipper/Commuter Check employees and random people on the street from suggesting this option every time you ask.</p>
<h3>Online / Over the Phone</h3>
<p>For some reason, probably related to the same reason Walgreens can&#8217;t accept them, Clipper&#8217;s online system does not accept Commuter Check cards. Again, this fact does not stop Clipper/Commuter Check call center employees from suggesting this option every time you ask. The Clipper call center will even try to process your Commuter Check card and tell you there is something wrong with it. This will result in you spending another 20 minutes on hold with Commuter Check customer service just to be told that the card is fine and that you should purchase your monthly pass online (see beginning of this section).</p>
<h3>In Person</h3>
<p>Explaining all of this to a Clipper call center employee will sometimes result in them telling you that you can only use your Commuter Check cards at a Caltrain booth with a clerk (real person). In my case, they strongly suggested I go to the 4th/King Caltrain station before 7pm. While this advice is getting closer to the right answer, if you actually try going to 4th/King and asking around for a clerk or a booth, you&#8217;ll eventually discover that all of the Caltrain clerks in the entire system were laid off a few months ago. The remaining employees at the station fall back to the commonly issued advice above (see Walgreens &amp; Online / Over the Phone).</p>
<h2>What Does Work (Sort Of)</h2>
<p><img class="size-medium wp-image-651 alignleft" title="clipper_card" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/07/clipper_card-193x300.jpg" alt="" width="193" height="300" /></p>
<p>I&#8217;d like to credit the clerk at the Walgreens across from the 4th/King Caltrain station with the first halfway useful advice. Apparently, the Transit Store at the Powell Street BART station accepts Commuter Check cards in exchange for Caltrain monthly passes!</p>
<h3>BART Transit Store</h3>
<p>Amazingly, the magazine-stand-style booths at some of the major BART stations along Market Street <strong>do</strong> take Commuter Check cards. Unfortunately, some open late / close early, and asking a BART employee for their hours results in very inaccurate information. Also, for some reason the Civic Center booth (where I was advised to go because it opens earlier than most others) does not take Commuter Check cards (anymore). Where can you use them? Based on my experience,<strong> I know for sure that you can use them at the Embarcadero BART station Transit Store booth</strong> and should be able to use them at the Powell Street station if you can get there when they are open.</p>
<h3>Bad Cards</h3>
<p>When I finally reached a person with the ability to accept Commuter Check cards, I found out that one of my two cards was &#8220;malfunctioning.&#8221; According to the clerk, this happens all the time. I happily paid the remainder of my monthly pass with my own credit card in order to end the whole painful ordeal! I returned the bad card to my company and recommended that they no longer issue Commuter Check cards (see Warning). In any case, I won&#8217;t be using them again.</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/observations/how-to-use-a-commuter-check-card-to-purchase-a-caltrain-monthly-pass/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Pulse Wins Apple Design Award and Raises $9 Million Series A</title>
		<link>http://gbayer.com/pulse/pulse-wins-apple-design-award-and-raises-9-million-series-a/</link>
		<comments>http://gbayer.com/pulse/pulse-wins-apple-design-award-and-raises-9-million-series-a/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 22:33:50 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Pulse]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Awards]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[UI Design]]></category>
		<category><![CDATA[WWDC]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=582</guid>
		<description><![CDATA[
I&#8217;m very excited to share that Pulse has announced its Series A funding round! All of us are still fired up about last week&#8217;s Apple Design Award at WWDC and our recent 4 million user milestone, not to mention that today is our co-founder Ankit&#8217;s birthday. Thanks to the team for their tireless work and to everyone who [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/06/alphonso_winner.png"><img class="alignnone size-full wp-image-585" title="Pulse Wins Apple Design Award" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/06/alphonso_winner.png" alt="" width="500" height="300" /></a></p>
<p>I&#8217;m very excited to share that <a title="Pulse News" href="http://pulse.me" target="_blank">Pulse</a> has announced its Series A funding round! All of us are still fired up about last week&#8217;s <a title="Pulse Apple Design Award" href="http://developer.apple.com/wwdc/ada/#pulse" target="_blank">Apple Design Award</a> at WWDC and our recent 4 million user milestone, not to mention that today is our co-founder Ankit&#8217;s <a href="https://twitter.com/#!/gankit/status/81469608279805952" target="_blank">birthday</a>. Thanks to the team for their <a href="http://gbayer.com/observations/working-hard-with-no-regrets/" target="_self">tireless work</a> and to everyone who has helped us get here!</p>
<p>Check out some of today&#8217;s press:</p>
<p><a title="Pulse Blog - Announcing Our Series A Financing" href="http://blog.pulse.me/pulse-raises-9-million-in-series-a-financing" target="_blank">Pulse Blog &#8211; Announcing Our Series A Financing</a><br />
<a title="TechCrunch - 4 Million Users Strong And Apple Design Award In Hand, Pulse Grabs $9 Million Series A" href="http://techcrunch.com/2011/06/16/4-million-users-strong-and-apple-design-award-in-hand-pulse-grabs-9-million-series-a/" target="_blank">TechCrunch &#8211; 4 Million Users Strong And Apple Design Award In Hand, Pulse Grabs $9 Million Series A<br />
</a><a title="WSJ - Pulse Taps $9M To Win Battle For Mobile-News Consumers" href="http://blogs.wsj.com/venturecapital/2011/06/16/pulse-taps-9m-to-win-battle-for-mobile-news-consumers/" target="_blank">WSJ &#8211; Pulse Taps $9M To Win Battle For Mobile-News Consumers</a><a title="Forbes - News Reader Pulse Raises $9 Million" href="http://blogs.forbes.com/tomiogeron/2011/06/16/news-reader-pulse-raises-9-million/" target="_blank"><br />
Forbes &#8211; News Reader Pulse Raises $9 Million<br />
</a><a title="Mashable - Pulse Passes 4 Million Users, Raises $9 Million for Visual News Reader" href="http://mashable.com/2011/06/16/pulse-funding/" target="_blank">Mashable &#8211; Pulse Passes 4 Million Users, Raises $9 Million for Visual News Reader</a></p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/pulse/pulse-wins-apple-design-award-and-raises-9-million-series-a/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Working Hard With No Regrets</title>
		<link>http://gbayer.com/observations/working-hard-with-no-regrets/</link>
		<comments>http://gbayer.com/observations/working-hard-with-no-regrets/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 21:31:20 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Observations]]></category>
		<category><![CDATA[Dreams]]></category>
		<category><![CDATA[Family]]></category>
		<category><![CDATA[Friends]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[Pulse]]></category>
		<category><![CDATA[Regrets]]></category>
		<category><![CDATA[Relationships]]></category>
		<category><![CDATA[Startups]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[Work–Life Balance]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=563</guid>
		<description><![CDATA[Working for a startup usually means putting in more hours than others. Recently, I spent two days on less than 3 hours of sleep in order to push out our new Pulse.me release. This doesn&#8217;t seem strange to me and didn&#8217;t make me unhappy. In fact, it was one of the most exciting and fun things [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/06/work-life-balance.png"><img class="size-full wp-image-570 alignleft" title="work-life-balance" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/06/work-life-balance.png" alt="" width="200" height="275" /></a><a href="http://gbayer.com/pulse/pulse-news-is-hiring/">Working for a startup</a> usually means putting in more hours than others. Recently, I spent two days on less than 3 hours of sleep in order to push out our new <a title="Pulse.me" href="http://www.pulse.me" target="_blank">Pulse.me</a> release. This doesn&#8217;t seem strange to me and didn&#8217;t make me unhappy. In fact, it was one of the most exciting and fun things I&#8217;ve done in a while. However, after mentioning it to some friends, I realized not everyone understands why it can be good to spend so much time &#8220;working&#8221; to build something you believe in.</p>
<p>Upon hearing about my sleep-deprived state, my friend sent me a link to the <a title="top 5 regrets people make on their deathbed" href="http://ohdarling.posterous.com/nurse-reveals-the-top-5-regrets-people-make-o" target="_blank">top 5 regrets people make on their deathbed</a> along with the comment &#8220;you might need this.&#8221;  I appreciated the link and enjoyed the reminder to live life to the fullest, especially with regard to keeping in touch with friends and loved ones. I also realized that my friend didn&#8217;t understand that for me the long hours I put in are all about fulfilling my dreams of creating new technology and impacting the world in a positive way. According to the article, not chasing after dreams is people&#8217;s #1 regret.</p>
<p>Of course there is an opportunity cost to time spent on any endeavor and this inevitably contributes to spending less time with friends and loved ones (regret #4). I believe maintaining a healthy balance between the two is critical. Simply &#8220;working less&#8221; (regret #2) would not make me happier. Chasing after dreams is an essential part of my life. The feeling of fulfillment I get from doing so makes me a much happier / more content person, and this in turn positively affects my relationships.</p>
<p>However, sometimes I do get caught up in chasing my dreams and forget to make time for friends and family. Just like realizing dreams, successful relationships are built on quality time spent together. I always appreciate being reminded to dedicate more time to this essential part of life, as I was today. I&#8217;d love to hear your thoughts or personal experiences on achieving the right balance.</p>
<p>Disclaimer: This post was written in a sleep-deprived state.</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/observations/working-hard-with-no-regrets/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>New Eng Blog / Using Data Analysis to Discover Top Stories</title>
		<link>http://gbayer.com/big-data/new-eng-blog-using-data-analysis-to-discover-top-stories/</link>
		<comments>http://gbayer.com/big-data/new-eng-blog-using-data-analysis-to-discover-top-stories/#comments</comments>
		<pubDate>Thu, 26 May 2011 22:31:52 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Pulse]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[AppEngine]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Top Stories]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=549</guid>
		<description><![CDATA[In addition to the main Pulse Blog, where we regularly share updates about our latest features and new content, Pulse now has an Engineering Blog!  The goal is to share some of the exciting engineering work that goes into bringing users the Pulse experience they&#8217;ve come to expect. To kick things off I added a post about Using [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-551 alignleft" title="Pulse Engineering Blog" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2011/05/Pulse-Engineering-Blog.png" alt="" width="191" height="54" />In addition to the main <a title="Pulse Blog" href="http://blog.alphonsolabs.com" target="_blank">Pulse Blog</a>, where we regularly share updates about our latest features and new content, <a title="Pulse News" href="http://alphonsolabs.com" target="_blank">Pulse</a> now has an <a title="Pulse Engineering Blog" href="http://eng.alphonsolabs.com" target="_blank">Engineering Blog</a>!  The goal is to share some of the exciting engineering work that goes into bringing users the Pulse experience they&#8217;ve come to expect. To kick things off I added a post about <a title="Using Data Analysis to Discover Top Stories" href="http://eng.alphonsolabs.com/using-data-analysis-to-discover-top-stories/" target="_blank">Using Data Analysis to Discover Top Stories</a>.  In the post I share a bit about how we use AWS to collect and analyse our data, along with how we serve up the feeds we build via AppEngine.  Check it out!</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/big-data/new-eng-blog-using-data-analysis-to-discover-top-stories/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Moving Files from one Git Repository to Another, Preserving History</title>
		<link>http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/</link>
		<comments>http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/#comments</comments>
		<pubDate>Tue, 17 May 2011 07:25:51 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[Refactoring]]></category>
		<category><![CDATA[Version Control]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=526</guid>
		<description><![CDATA[If you use multiple git repositories, it&#8217;s only a matter of time until you&#8217;ll want to refactor some files from one project to another.  Today at Pulse we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.
After reading some [...]]]></description>
			<content:encoded><![CDATA[<p>If you use multiple git repositories, it&#8217;s only a matter of time until you&#8217;ll want to refactor some files from one project to another.  Today at <a href="http://alphonsolabs.com" target="_blank">Pulse</a> we reached the point where it was time to split up a very large repository that was starting to be used for too many different sub-projects.</p>
<p>After reading some suggested approaches, I spent more time than I would have liked fighting with Git to actually make it happen.  In the hopes of helping someone else avoid the same trouble, here&#8217;s the solution that ended up working best. The solution is primarily based on ebneter&#8217;s <a href="http://stackoverflow.com/questions/1365541/how-to-move-files-from-one-git-repo-to-another-not-a-clone-preserving-history" target="_blank">excellent question</a> on Stack Overflow.</p>
<p>Another solution is Linus Torvalds&#8217;s &#8220;<a href="http://thread.gmane.org/gmane.comp.version-control.git/5126/" target="_blank">The coolest merge, EVER!</a>&#8221; Unfortunately, his approach seems to require more manual fiddling than I would like and results in a repository with two roots. I don&#8217;t completely understand the implications of this, so I opted for something more like a standard merge.</p>
<h3>Goal:</h3>
<ul>
<li>Move directory 1 from Git repository A to Git repository B.</li>
</ul>
<h3>Constraints:</h3>
<ul>
<li>Git repository A contains other directories that we don&#8217;t want to move.</li>
<li>We&#8217;d like to preserve the Git commit history for the directory we are moving.</li>
</ul>
<h3>Get files ready for the move:</h3>
<p>Make a copy of repository A so you can mess with it without worrying about mistakes too much.  It&#8217;s also a good idea to delete the link to the original repository to avoid accidentally making any remote changes (line 3).  Line 4 is the critical step here.  It goes through your history and files, removing anything that is not in directory 1.  The result is the contents of directory 1 spewed out into the base of repository A.  You probably want to import these files into repository B within a directory, so move them into one now (lines 5/6). Commit your changes and we&#8217;re ready to merge these files into the new repository.</p>
<pre class="brush:shell">git clone &lt;git repository A url&gt;
cd &lt;git repository A directory&gt;
git remote rm origin
git filter-branch --subdirectory-filter &lt;directory 1&gt; -- --all
mkdir &lt;directory 1&gt;
mv * &lt;directory 1&gt;
git add .
git commit</pre>
<h3>Merge files into new repository:</h3>
<p>Make a copy of repository B if you don&#8217;t have one already.  On line 3, you&#8217;ll create a remote connection to repository A as a branch in repository B.  Then simply pull from this branch (containing only the directory you want to move) into repository B.  The pull copies both files and history.  Note: You can use a merge instead of a pull, but pull worked better for me. Finally, you probably want to clean up a bit by removing the remote connection to repository A, and you&#8217;re all set.</p>
<pre class="brush:shell">git clone &lt;git repository B url&gt;
cd &lt;git repository B directory&gt;
git remote add repo-A-branch &lt;git repository A directory&gt;
git pull repo-A-branch master
git remote rm repo-A-branch</pre>
<p><em><span style="color: #c0c0c0;">Update: Removed final commit thanks to Von&#8217;s comment.</span></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Pulse News is Hiring!</title>
		<link>http://gbayer.com/pulse/pulse-news-is-hiring/</link>
		<comments>http://gbayer.com/pulse/pulse-news-is-hiring/#comments</comments>
		<pubDate>Fri, 10 Dec 2010 21:48:06 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Pulse]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[GWT]]></category>
		<category><![CDATA[Hiring]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[iOS]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Jobs]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[RoR]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=437</guid>
		<description><![CDATA[
A few months ago I mentioned that I left the government/research world (Sandia Labs) and joined an exciting new startup.   I&#8217;d like to share a bit more about my experience so far and announce that we are hiring!
Those who have worked at a large company and then moved to a startup can probably relate to my experience. [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/uJEmUaFXWuA?fs=1&amp;hl=en_US" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/uJEmUaFXWuA?fs=1&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>A few months ago I <a title="Recently Joined Pulse!" href="http://gbayer.com/general/recently-joined-pulse/">mentioned</a> that I left the government/research world (Sandia Labs) and joined an exciting new startup.   I&#8217;d like to share a bit more about my experience so far and announce that we are hiring!</p>
<p>Those who have worked at a large company and then moved to a startup can probably relate to my experience.  First, without a doubt, the most motivating and fun part about working at Pulse is seeing the <i>impact of my work. </i> And I don&#8217;t mean just having someone say &#8220;Good Job&#8221; or receiving a strong performance review; I mean seeing thousands of people USE the results of your work and submit feedback about how it benefited their lives.  At Pulse, this experience is magnified by the fact that we release new product features every two weeks, not every quarter or every year!</p>
<p>In addition to really seeing the impact of your work at Pulse, another benefit I never knew I was missing in the corporate world is the feeling of being a part of a tight-knit team where everyone is giving 110% towards reaching the same goal.  This is something that is rarely found outside the startup world and should not be underestimated in its ability to improve productivity and genuine enjoyment of work. Imagine you and your coworkers are always committed to the <i>same goals</i> and never distracted by office politics or personal agendas.  How often does that happen in your current job?</p>
<p>Finally, at Pulse both the impact of your work and great team dynamics are <i>highly leveraged</i>.  In the corporate world, great project outcomes or solid team building commonly result in a plus mark on the rating form and a promotion in three to five years.  How motivating is this for the most capable employees?  The impact of their work, which in the computer science world can easily be many times that of other employees, is muffled by rigid performance review and compensation structures.  In a startup, this muffling effect is removed.  Everyone accepts slightly lower fixed compensation up front in exchange for the chance to turn hard work into a big reward down the line.  This reward is directly tied to the impact your work has and how well the team works together.  Now everyone is properly incentivized to give their all, and giving your all for something you believe in is inherently rewarding!</p>
<p>Check out the official <a title="Pulse Blog" href="http://blog.alphonsolabs.com" target="_blank">Pulse Blog</a> and our <a title="We're Hiring!" href="http://blog.alphonsolabs.com/a-day-at-the-pulse-news-hq-were-hiring" target="_blank">hiring post</a> there.  Here are some of the positions we are currently looking to fill.</p>
<p><span id="more-437"></span><br />
OPEN POSITIONS:<strong><br />
</strong></p>
<p><strong>iOS developer: </strong>Our iPad/iPhone platform continues to be our most active platform. We&#8217;re looking for a developer to join our iOS team, which currently consists of Ankit (our co-founder) and Dr. Tyler (yes, he has a PhD from NYU).</p>
<ul>
<li>We <i>don&#8217;t care </i>much about your college, your major or your GPA.</li>
<li>We <i>do care</i> about your experience with the iOS platform. Ideally, you have already developed and launched an app in the store. You prototype features rapidly and iterate on the design even more rapidly &#8211; while writing clean code.</li>
</ul>
<p><strong>Web developer: </strong>Pulse currently does not have much of a web presence. We hacked a WordPress theme to build our current website. We use Posterous for our blog. For the rest, we depend on Get Satisfaction, Facebook, Twitter and the like. You will be leading our charge on the web!</p>
<ul>
<li>We <i>don&#8217;t care </i>much about your college, your major or your GPA.</li>
<li>We <i>do care</i> about your experience with the web. Ideally, you are experienced in writing HTML/CSS/JavaScript (GWT is a bonus!). You&#8217;re also familiar with Django, PHP/Python, AppEngine, RoR or other frameworks. You love making beautiful websites or web applications. Huge bonus if you&#8217;ve already made some.</li>
</ul>
<p>Drop us a line at <a href="mailto:jobs@alphonsolabs.com">jobs@alphonsolabs.com</a> or feel free to ping me directly at gb [at] alphonsolabs.com</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/pulse/pulse-news-is-hiring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Map(Reduce) Analytics on Google AppEngine</title>
		<link>http://gbayer.com/big-data/mapreduce-analytics-on-google-appengine/</link>
		<comments>http://gbayer.com/big-data/mapreduce-analytics-on-google-appengine/#comments</comments>
		<pubDate>Fri, 29 Oct 2010 10:52:57 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[Counters]]></category>
		<category><![CDATA[In-Memory]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Memcache]]></category>
		<category><![CDATA[Sharded]]></category>
		<category><![CDATA[Task Queues]]></category>
		<category><![CDATA[Write-Behind]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=377</guid>
		<description><![CDATA[Google AppEngine is a great tool for building simple web applications which are automatically scalable. All of the basic building blocks are readily available and accessible from both Python and Java.  This includes a database, a caching layer, and support for background tasks.
What about the big data analytics and informatics that made Google famous? [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/appengine-analytics.jpg"><img class="size-full wp-image-392 alignright" title="appengine-analytics" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/appengine-analytics.jpg" alt="AppEngine Analytics" width="300" height="166" /></a>Google AppEngine is a great tool for building simple web applications which are automatically scalable. All of the basic building blocks are readily available and accessible from both Python and Java.  This includes a database, a caching layer, and support for background tasks.</p>
<p>What about the big data analytics and informatics that made Google famous?  Does AppEngine help us there as well?  The answer is yes, although with some serious limitations.</p>
<p><span id="more-377"></span></p>
<h3>AppEngine MapReduce</h3>
<p>Recently <a href="http://googleappengine.blogspot.com/2010/07/introducing-mapper-api.html" target="_blank">Mike Aizatsky and others at Google</a> started working on a great add-on to AppEngine called <a href="http://code.google.com/p/appengine-mapreduce/" target="_blank">MapReduce</a>. This is, of course, conceptually the same MapReduce originally published by Jeffrey Dean and Sanjay Ghemawat <a href="http://labs.google.com/papers/mapreduce.html" target="_blank">here</a> and the same MapReduce upon which <a href="http://hadoop.apache.org/" target="_blank">Hadoop</a> is based.</p>
<p>Unfortunately, for <a href="http://www.youtube.com/watch?v=_7fJotosrNQ" target="_blank">various reasons</a>, Google decided not to make their internal MapReduce infrastructure available.  Instead, they have decided to develop an entirely new system within the sandboxed world of AppEngine.  This allows them to provide some additional batch processing support, while protecting their infrastructure.</p>
<p>For AppEngine developers, this means that a limited set of analytics functionality is currently available out-of-the-box.  The new functionality amounts to support for &#8220;map-only&#8221; jobs at medium scale (64 shards).  Reduce functionality, which is critical for most analytics, is not yet available.  If your application requires reduce-like aggregates, your options include: waiting for Google to release more features, switching to Amazon EMR or EC2, or working around Google&#8217;s limitations.</p>
<p>Assuming you choose option number three, there are several ways to implement aggregates on top of AppEngine&#8217;s map jobs.   As an example, let&#8217;s look at a simple word count.</p>
<h3>Built-in Counters</h3>
<p>To count how often a few words (up to about a thousand or so) appear in a large set of documents, we can use the platform&#8217;s built-in counters.  The code looks something like <a href="http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInPython" target="_blank">this</a> or this:</p>
<pre class="brush:py">from mapreduce import operation as op

def process(entity):
    # Mapper: called once per datastore entity.
    # Assumes entity.content is a sequence of words.
    word1 = entity.content[0]
    word2 = entity.content[1]
    # Each yielded operation increments the named built-in counter.
    yield op.counters.Increment(word1)
    yield op.counters.Increment(word2)</pre>
<p>Unfortunately,  for large aggregates this won&#8217;t work.  Assume now that you want to count all of the words in the documents.  There will be too many counters to use the built-in facility.  Since AppEngine does not support reduce operations, the usual approach of allowing a reducer to sum up counts from each mapper is also unavailable.  What remains is to implement an efficient, scalable counter.</p>
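<p>For reference, the reducer-based approach that is unavailable here can be sketched in plain Python.  This is only an illustration of the general map/reduce pattern, not AppEngine code; the function names <code>map_words</code> and <code>reduce_counts</code> are made up for this example.</p>
<pre class="brush:py">from collections import defaultdict

def map_words(document):
    # Map step: emit a (word, 1) pair for every word in one document.
    for word in document.split():
        yield word, 1

def reduce_counts(pairs):
    # Reduce step: sum the emitted counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["to be or not to be", "to do"]
pairs = [pair for doc in documents for pair in map_words(doc)]
totals = reduce_counts(pairs)
print(totals["to"])  # 3</pre>
<p>Without a reduce phase, the summing done by <code>reduce_counts</code> has to happen somewhere else, which is what the counter designs below attempt.</p>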
<h3>Sharded Counters</h3>
<p>Because of the parallel nature of many mappers counting at the same time, one reasonable approach is to implement <a href="http://code.google.com/appengine/articles/sharding_counters.html" target="_blank">sharded counters</a>.  The goal is to avoid database contention by splitting a single counter into several shards.  These shards can then be incremented independently and summed up to get a total counter value.  This technique can be quite effective, but still requires a database read and write for every counter increment.  In practice I found that this severely limited the throughput of my MapReduce jobs on AppEngine.</p>
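<p>A minimal sketch of the sharding idea, assuming a plain dict as a stand-in for the datastore so that it runs anywhere.  The shard count and the key format are arbitrary choices for illustration, not values taken from the linked article.</p>
<pre class="brush:py">import random

NUM_SHARDS = 20  # hypothetical shard count

# A plain dict stands in for the datastore in this sketch.
datastore = {}

def increment(name, delta=1):
    # Pick one shard at random so concurrent writers rarely collide.
    shard_key = "%s-%d" % (name, random.randrange(NUM_SHARDS))
    # One read and one write per increment, as described above.
    datastore[shard_key] = datastore.get(shard_key, 0) + delta

def get_count(name):
    # Sum every shard to recover the total counter value.
    keys = ["%s-%d" % (name, i) for i in range(NUM_SHARDS)]
    return sum(datastore.get(key, 0) for key in keys)

for _ in range(1000):
    increment("hello")
print(get_count("hello"))  # 1000</pre>
<p>More shards mean less contention per shard, but every increment still pays for a storage round trip, which is the throughput limit mentioned above.</p>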
<h3>In-Memory Counters</h3>
<p>To avoid blocking on database reads and writes, a logical approach is to consider using the low latency <a href="http://code.google.com/appengine/docs/python/memcache/usingmemcache.html" target="_blank">memcache</a> layer AppEngine provides to implement completely in-memory counters.  This works great for short jobs with a small to medium number of counters.  Unfortunately, when running a long job with many counters, it is very likely that some (sometimes many) counters will be evicted from the cache before the job completes.  Since AppEngine only allows cache eviction policy hints and shares one big cache among all apps, there is no way to prevent this.</p>
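<p>The eviction problem can be illustrated with a toy cache.  The <code>TinyCache</code> class below is a made-up stand-in that evicts the least recently used key once it is full; it is not the memcache API, and real eviction behavior is outside the application&#8217;s control.</p>
<pre class="brush:py">from collections import OrderedDict

class TinyCache(object):
    """Toy stand-in for memcache: holds at most `capacity` keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def incr(self, key, delta=1):
        # A missing (or evicted) key silently restarts from zero.
        value = self.data.pop(key, 0) + delta
        self.data[key] = value  # re-insert as most recently used
        if len(self.data) == self.capacity + 1:
            self.data.popitem(last=False)  # evict the oldest key
        return value

    def get(self, key):
        return self.data.get(key)

cache = TinyCache(capacity=2)
for word in ["a", "b", "a", "c", "b"]:
    cache.incr(word)

print(cache.get("a"))  # None: evicted, its count of 2 is gone
print(cache.get("b"))  # 1: evicted once, so the true count of 2 is lost</pre>
<p>With only two slots and three distinct words, counts are dropped without any error being raised, which is why purely in-memory counters are risky for long jobs.</p>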
<h3>Write-Behind Counters</h3>
<p>For long-running jobs with many counters, we need a combination of low latency (in-memory) counters and durability to survive cache evictions.  One elegant solution is to use AppEngine&#8217;s task queues to write counter values to the datastore periodically.  This allows counter increments to continue at high throughput without blocking on datastore interactions. Instead, updates to the datastore are batched and written back less frequently.  This works well because the memcache is unlikely to evict recently incremented counters before the background task has a chance to store the counter&#8217;s value.</p>
<p>Note: With this approach there is a small chance that a counter value will be lost.   However, for many applications this is acceptable and well worth the increased mapper throughput.</p>
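<p>A rough sketch of the write-behind pattern, again using plain dicts in place of memcache and the datastore.  The <code>flush</code> function plays the role of the periodic background task; its name, the <code>dirty</code> set, and the choice to keep only unflushed deltas in the cache are assumptions made for this illustration.</p>
<pre class="brush:py">cache = {}      # stand-in for memcache: fast, but volatile
datastore = {}  # stand-in for the datastore: slow, but durable
dirty = set()   # counters changed since the last flush

def increment(name, delta=1):
    # Fast path: touch only the in-memory cache.
    cache[name] = cache.get(name, 0) + delta
    dirty.add(name)

def flush():
    # Would run as a periodic background task.
    # Each dirty counter costs one datastore write, however often it changed.
    writes = 0
    for name in sorted(dirty):
        datastore[name] = datastore.get(name, 0) + cache.pop(name)
        writes += 1
    dirty.clear()
    return writes

for word in "the cat and the hat and the bat".split():
    increment(word)
writes = flush()
print(writes)            # 5 datastore writes for 8 increments
print(datastore["the"])  # 3</pre>
<p>Because the cache holds only the changes made since the last flush, an eviction can lose at most that unflushed portion of a counter.</p>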
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/big-data/mapreduce-analytics-on-google-appengine/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Recently Joined Pulse!</title>
		<link>http://gbayer.com/pulse/recently-joined-pulse/</link>
		<comments>http://gbayer.com/pulse/recently-joined-pulse/#comments</comments>
		<pubDate>Mon, 11 Oct 2010 04:00:25 +0000</pubDate>
		<dc:creator>Greg Bayer</dc:creator>
				<category><![CDATA[Pulse]]></category>
		<category><![CDATA[Alphonso Labs]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[App Engine]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Government]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[iPhone]]></category>

		<guid isPermaLink="false">http://gbayer.com/?p=378</guid>
		<description><![CDATA[After a year and a half of big data research for the government and quite a bit of fun with Hadoop, I&#8217;ve decided to join some good friends at an early-stage startup called Alphonso Labs.
Pulse is currently the #1 news reader on the iPad, iPhone, and Android app stores.  I&#8217;ll be leading the development of our backend data [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/pulse-logo-black.jpg"><img class="size-full wp-image-380 alignleft" title="Pulse Logo" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/pulse-logo-black.jpg" alt="Pulse by Alphonso Labs" width="250" height="110" /></a>After a year and a half of big data research for the <a href="http://sandia.gov" target="_blank">government</a> and quite a bit of fun with Hadoop, I&#8217;ve decided to join some good friends at an early-stage startup called <a title="Alphonso Labs" href="http://www.alphonsolabs.com" target="_blank">Alphonso Labs</a>.</p>
<p><a title="Pulse" href="http://www.alphonsolabs.com/products" target="_blank">Pulse</a> is currently the #1 news reader on the iPad, iPhone, and Android app stores.  I&#8217;ll be leading the development of our backend data platform and working with a great <a title="Pulse Team" href="http://www.alphonsolabs.com/team" target="_blank">team</a>.</p>
<p>As we start to build out Pulse&#8217;s backend, I&#8217;ll be continuing to experiment with Google App Engine.  Stay tuned for more posts in that regard.</p>
<p><a href="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/pulse-ipad.jpg"><img class="alignnone size-full wp-image-379" title="Pulse iPad" src="http://gbayer.com/v2-wordpress/wp-content/uploads/2010/10/pulse-ipad.jpg" alt="Pulse on the iPad" width="620" height="764" /></a></p>
<p><span id="more-378"></span>Thoughts?  Please share your comments below.</p>
]]></content:encoded>
			<wfw:commentRss>http://gbayer.com/pulse/recently-joined-pulse/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

