<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Something People Want</title>
	<atom:link href="http://www.olark.com/spw/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.olark.com/spw</link>
	<description>Our blog on startups, code, and design.</description>
	<lastBuildDate>Mon, 09 Apr 2012 17:27:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Don&#8217;t break the Internet with your Javascript</title>
		<link>http://www.olark.com/spw/2012/03/dont-break-the-internet-with-your-javascript/</link>
		<comments>http://www.olark.com/spw/2012/03/dont-break-the-internet-with-your-javascript/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 06:39:45 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[developers]]></category>

		<guid isPermaLink="false">http://www.olark.com/spw/?p=100</guid>
		<description><![CDATA[Every day, Olark sees more than 3 million visits across thousands of different websites and browsers. We constantly have to ask ourselves: are we causing any issues or slowdowns on our customers&#8217; &#8230; <a href="http://www.olark.com/spw/2012/03/dont-break-the-internet-with-your-javascript/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<style>
.entry-content h1 {font-size: 1.2em; border-bottom: 1px solid #ddd}
</style>
<p>Every day, Olark sees more than 3 million visits across thousands of different websites and browsers.  We constantly have to ask ourselves: <em>are we causing any issues or slowdowns on our customers&#8217; websites?</em>  <strong>Knowing the answers to these questions is critical to our success</strong>.  A casual misstep means we could be silently breaking thousands of websites, halting transactions, angering the world, causing the next recession&#8230;you know, the usual.</p>
<p>Back in October, we <a target="_blank" href="http://www.olark.com/spw/2011/10/lightningjs-safe-fast-and-asynchronous-third-party-javascript/">wrote briefly</a> about the challenges of providing third-party Javascript as a service.  This time around, we&#8217;ll talk a bit more about these challenges, and what we do at Olark to thrive in spite of them.</p>
<h1>Why is providing 3rd-party Javascript so challenging?</h1>
<p>Living inside other companies&#8217; websites means a lot less control&#8230;and sometimes it feels a lot like running through a minefield. While we can sidestep some challenges with <a target="_blank" href="http://lightningjs.com/">recent techniques</a>, the fact remains that our code executes in a hostile environment.  Here is a short list of real problems that we have seen in the wild:</p>
<ul>
<li><strong>Poorly sandboxed Javascript</strong>
<ul>
<li>overridden builtins (e.g. custom <code>window.escape</code>)</li>
<li>overridden global prototypes (e.g. <code>toJSON</code>)</li>
<li><a target="_blank" href="http://www.olark.com/customer/portal/articles/334307-using-olark-with-jquery-prettyphoto">jQuery plugins</a> that capture all keyboard events</li>
</ul>
</li>
<li><strong>Cookie idiosyncrasies</strong>
<ul>
<li>too many cookies on the host page</li>
<li>cookies disallowed inside of iframed websites</li>
</ul>
</li>
<li><strong>Inconsistent browser caching</strong>
<ul>
<li>out-of-date caches lasting longer than expected</li>
<li>sporadic <a href="http://code.google.com/p/chromium/issues/detail?id=117408" target="_blank">&#64;font-face re-downloading</a></li>
</ul>
</li>
<li><strong>Broken CSS</strong>
<ul>
<li>different HTML doctypes</li>
<li>conflicting (or overly broad) CSS rules</li>
</ul>
</li>
<li><strong>Old versions of the embed code</strong>
<ul>
<li>CMS plugins that continue to use our old embed code (e.g. for WordPress, Joomla, etc)</li>
</ul>
</li>
</ul>
<p>With this long list of variables that we cannot control, unit and functional testing has slowly become less effective at uncovering real-world issues.  Ever seen a browser testing tool that offers &#8220;Internet Explorer 7 with XHTML doctype plus custom JSON overrides&#8221; as one of its environments? Neither have we <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
<h1>So how can you survive (and thrive) across thousands of websites?</h1>
<p>One word: <strong>monitoring</strong>.  Deep, application-level monitoring.</p>
<p>At Olark, we have developed a collection of tools to do application monitoring via log analysis.  Our monitoring architecture looks something like this:</p>
<p><center><img src="http://www.olark.com/spw/wp-content/uploads/2012/03/monitoring-architecture.png"/></center></p>
<p>In the end, we simply write code like this in Javascript:</p>
<pre>log("something weird happened #warn")</pre>
<p>&#8230;this will track &#8220;warn&#8221; events.  Notice that we use <strong>#hashtags</strong> to name the events we want to show up in our metrics.</p>
<p>Our system collects these log messages from Javascript and aggregates them into a central log server.  Then, our <a target="_blank" href="https://github.com/olark/hashmonitor">hashmonitor</a> tool parses the hashtags and counts their occurrences, and finally sends the calculated metrics into <a target="_blank" href="https://github.com/steiza/tinyfeedback">tinyfeedback</a> for viewing:</p>
<p><img src="http://www.olark.com/spw/wp-content/uploads/2012/03/warn-counter-example.png"/></p>
<p>&#8230;this is the simplest example of how we track issues in the wild.</p>
<p>One thing we have found incredibly useful about this approach is that <strong>we can always go back to investigate the original log messages</strong> when we see a spike in any of our metrics.  This allows us to quickly roll back a deployment, while still having enough data to dig into why the spike happened.</p>
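<p>To make the counting step concrete, here is a minimal sketch of the idea.  It is not the actual hashmonitor code: the names <code>parseHashtags</code> and <code>countEvents</code> are made up for illustration, and we assume an event name is simply a <code>#</code> followed by letters, digits, or underscores.</p>
<pre><code>// Hypothetical helper: return the event names tagged in one log message.
function parseHashtags(message) {
    var names = [];
    var pattern = /#([A-Za-z_][A-Za-z0-9_]*)/g;
    var match;
    while ((match = pattern.exec(message)) !== null) {
        names.push(match[1]);
    }
    return names;
}

// Hypothetical aggregator: count how often each event name occurs
// across a batch of log messages.
function countEvents(messages) {
    var counts = {};
    messages.forEach(function (message) {
        parseHashtags(message).forEach(function (name) {
            counts[name] = (counts[name] || 0) + 1;
        });
    });
    return counts;
}
</code></pre>
<p>Because the original messages are kept alongside the counts, a spike in a counter can always be traced back to the log lines that produced it.</p>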
<h1>How do we break down errors and warnings?</h1>
<p>Sometimes we want to dig deeper into a particular warning, so we need a special event name for it.  To accomplish this, our monitoring system allows multiple #hashtags in a single log message.  For example, we keep track of cookie issues:</p>
<pre>log("cookie problems #nocookies_for_session #warn")</pre>
<p>&#8230;which gives us a way to break out these specific warnings in a more detailed way:</p>
<p><img src="http://www.olark.com/spw/wp-content/uploads/2012/03/broken-cookies-example.png"/></p>
<p>In particular, these cookie metrics influenced our decision to test cookie-setting before booting Olark, preventing strange behaviors when cookies could not be read on subsequent pages.</p>
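<p>A cookie test like the one mentioned above could be sketched roughly as follows.  This is an illustration rather than Olark&#8217;s actual boot code: <code>canSetCookie</code> and <code>bootIfCookiesWork</code> are hypothetical names, and <code>log</code> and <code>boot</code> are passed in as stand-ins for the real functions.</p>
<pre><code>// Hypothetical check: write a throwaway cookie, read it back, then expire it.
// `doc` is the page's document, passed in so the check is easy to exercise.
function canSetCookie(doc) {
    var name = "cookie_test_" + new Date().getTime();
    doc.cookie = name + "=1; path=/";
    var readable = doc.cookie.indexOf(name + "=1") !== -1;
    // expire the throwaway cookie so it does not linger
    doc.cookie = name + "=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT";
    return readable;
}

// Hypothetical boot wrapper: only start up when cookies can be read back,
// otherwise record the problem with the same hashtags as above.
function bootIfCookiesWork(doc, log, boot) {
    if (canSetCookie(doc)) {
        boot();
    } else {
        log("cookie problems #nocookies_for_session #warn");
    }
}
</code></pre>
<p>In a browser this would be called as <code>bootIfCookiesWork(document, log, boot)</code>.</p>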
<h1>How do we monitor performance?</h1>
<p>Our monitoring system also allows value-based metrics.  For example, we track the time when configuration assets are downloaded:</p>
<pre>log("received account configuration #perf_assets=200")</pre>
<p>&#8230;and our hashmonitor will automatically parse this as a <strong>value-based metric</strong> and calculate values for its distribution:</p>
<ul>
<li>average</li>
<li>median</li>
<li>1st/10th and 90th/99th percentile</li>
<li>standard deviation</li>
</ul>
<p><img src="http://www.olark.com/spw/wp-content/uploads/2012/03/assets-distribution-example.png"/></p>
<p>We have used this data to make important performance decisions, like adding these configuration assets to our CDN.  We were able to boost overall speed, and also tighten up the 90th-percentile load time by having geolocated CDN delivery.</p>
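<p>For the curious, the distribution values listed above can be computed along these lines.  This is only a sketch under our own assumptions, not how hashmonitor itself is written: the function names are invented for illustration, percentiles use the simple nearest-rank definition, and the standard deviation is the population form.</p>
<pre><code>// Hypothetical parser: extract the number from a "#name=value" hashtag,
// or return null when the message does not carry that metric.
function parseValueMetric(message, name) {
    var match = new RegExp("#" + name + "=(-?\\d+(?:\\.\\d+)?)").exec(message);
    return match ? parseFloat(match[1]) : null;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted, p) {
    var rank = Math.ceil(p * sorted.length / 100);
    return sorted[Math.max(rank, 1) - 1];
}

// Summarize a non-empty list of values for one metric.
function summarize(values) {
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var sum = sorted.reduce(function (total, v) { return total + v; }, 0);
    var mean = sum / sorted.length;
    var squares = sorted.reduce(function (total, v) {
        return total + (v - mean) * (v - mean);
    }, 0);
    var mid = Math.floor(sorted.length / 2);
    return {
        average: mean,
        median: sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2,
        p10: percentile(sorted, 10),
        p90: percentile(sorted, 90),
        stddev: Math.sqrt(squares / sorted.length)
    };
}
</code></pre>
<p>Tracking the 90th percentile alongside the median is what makes it possible to see the slow tail tighten up, even when the typical load time barely moves.</p>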
<h1>Does this really matter that much for user experience?</h1>
<p>Definitely.  To measure &#8220;soft&#8221; metrics like user experience, we look at the number of conversations that begin on Olark every minute.  This tells us that visitors are engaging with Olark and hopefully generating more sales opportunities for our customers.</p>
<p>Recently, we made some improvements that whittled down median load time.  As a result, we improved conversation volume by nearly 10%:</p>
<p><img src="http://www.olark.com/spw/wp-content/uploads/2012/03/convo-improvement-example.png"/></p>
<p>Having this deep monitoring has really helped us to effectively measure when our code changes positively impact the real world (and real people!).  We wouldn&#8217;t have it any other way.</p>
<p style="padding: 5px; background-color: #ffc"><em>If this stuff sounds interesting to you, come <a href="https://twitter.com/#!/mjpizz" target="_blank">find me</a> at <a target="_blank" href="http://2012.jsconf.us/#/schedule">JSConf</a> next week!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.olark.com/spw/2012/03/dont-break-the-internet-with-your-javascript/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Ninja MySQL Backups: Your Silent Guardians Against Interweb Oni</title>
		<link>http://www.olark.com/spw/2012/02/ninja-mysql-backups-your-silent-guardians-against-interweb-oni/</link>
		<comments>http://www.olark.com/spw/2012/02/ninja-mysql-backups-your-silent-guardians-against-interweb-oni/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 10:55:32 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.olark.com/spw/?p=86</guid>
		<description><![CDATA[Hey there, guys! Aaron Wilson here, the ever-present but ever-invisible Olark Ruby Ninja Warrior. I&#8217;m coming out of the shadows to tell you a little bit about our fun journey with database backups. A good backup never lets you know &#8230; <a href="http://www.olark.com/spw/2012/02/ninja-mysql-backups-your-silent-guardians-against-interweb-oni/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Hey there, guys! Aaron Wilson here, the ever-present but ever-invisible Olark Ruby Ninja Warrior. I&#8217;m coming out of the shadows to tell you a little bit about our fun journey with database backups.</p>
<h2>A good backup never lets you know it&#8217;s there&#8230;</h2>
<p>The core of any startup is data; how it&#8217;s stored, how it&#8217;s processed, and how it&#8217;s interpreted are the very essentials of computing. Olark is no exception, and between document stores, keystores, relational databases, message queues, and so on, we&#8217;ve got a lot of information to manage. One of our most important datastores (although becoming less so, which could probably fill its own blog post) is a collection of MySQL databases that store, among other things, user information, user relationships and site configuration. A lot of these collections of bits are key pieces of product data that keep everything else running. If we lost this datastore, it would be a huge setback for us as a company.</p>
<p>And so, as we&#8217;ve focused over the last nine months or so on eliminating single points of failure from our system, making the databases redundant and backing up this data were two items on the list of problems to solve&#8211;not only does having backup data create peace of mind, it also gives us an easy source of staging data to test with (and destroy) before deployment. At the time, it was about 4.5GB of data to manage (now, it&#8217;s more). Most of this data was in a single database, the datastore for our Rails website, clocking in at around 95% of the data. The backups had these criteria to meet:</p>
<ol>
<li><strong>Compact:</strong> Keeping successive backups that are each ~4.5GB in size quickly adds up, even with storage space as cheap as it is these days. Compressing these with gzip is pretty effective, bringing it down to roughly 30% of that, but it still adds up.</li>
<li><strong>Quick to restore:</strong> If this database goes down, important parts of our system become completely unusable. Even worse, with certain failure modes, future use can become unstable and need corrective maintenance. Minimizing these effects means minimizing the time it takes to get the restored data in place.</li>
<li><strong>Current:</strong> If our latest backup is from a week ago, that&#8217;s a week of interactions to recreate. Even a single missed day of data can have a huge impact, so our backups need to be current.</li>
<li><strong>Non-blocking:</strong> Obviously, if your backup process interrupts availability, you&#8217;re asking for problems to solve later. While backups are best taken far away from peak load, availability is always important, especially when your customers are global.</li>
<li><strong>Tested:</strong> If you&#8217;ve never restored from your backup, you don&#8217;t have a backup!</li>
</ol>
<h2>Backup Dojo: Forge me into a sword, that I might slay my demons</h2>
<p>MySQL has built-in capabilities that solve part of these problems. Timely backups can be kept with <a href="http://dev.mysql.com/doc/refman/5.0/en/binary-log.html">binary logs</a>, which have the ability to replay all the SQL actions taken in a given time. Binary logs churn quickly, though, and should be kept locally to keep MySQL write actions from piling up and potentially hogging resources from other things. Since a lot of activity occurs (and actions are sometimes redundant), the size of these logs becomes unwieldy very quickly&#8211;we found that keeping more than a couple of days&#8217; worth wasn&#8217;t feasible. We settled, then, on snapshots, which would store state from given points in time that could be synched to present day with whatever binlogs we had on hand.</p>
<p>By default, MySQL doesn&#8217;t make this easy on any database larger than a few hundred megs (at least, not without <a href="http://www.mysql.com/products/enterprise/backup.html">paying for a license</a>). The go-to backup tool for MySQL, <a href="http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html">mysqldump</a>, is fine for small databases, but for us was taking close to an <em>hour</em> to take a full snapshot of the main database (and similar time to load). That&#8217;s bad, without any other qualification. All sorts of awful things happen in much less time, and having such a huge window for the snapshot to be interrupted is asking for trouble.</p>
<p>Luckily, the community has stepped up to fill in this gap in (the free version of) MySQL&#8217;s functionality: <a href="http://www.percona.com/software/percona-xtrabackup/">Percona XtraBackup</a>, a free, non-blocking, and blazing fast backup tool for InnoDB and XtraDB databases (we, incidentally, store all of this in InnoDB&#8211;anything MyISAM might serve us better for, we don&#8217;t store in MySQL). XtraBackup works by making use of InnoDB crash recovery; it essentially simulates a crash, copies the raw datafiles manually to the backup location, and uses crash recovery to validate backup integrity and play up to the binlogs that happened during the file copy. The install and usage of the product aren&#8217;t completely trivial, but they&#8217;re not bad, and <a href="http://www.databasejournal.com/features/mysql/article.php/3912176/MySQL-Hotbackups-with-XtraBackup.htm">a wonderful article</a> by Sean Hull covers the essentials, so I won&#8217;t. Using XtraBackup cut the backup time from an hour to an astonishing five minutes, during which time the database was completely available (although not without caveats, which I&#8217;ll talk about in a bit). The restore process takes the same amount of time, and some steps of the restore process can be &#8220;pre-loaded&#8221; to make restoring from a particular backup take about half as long (more precisely, it can allow about half of the restore process to run in the background while the restored database is available for writes; details below). All of these steps I encapsulated in two Rake tasks&#8211;one for backup, and one for restore&#8211;which were then managed by a Python script that would bundle these backups with the others. At a high level, the backup Rake task looks like this:</p>
<ol>
<li>Load the Rails environment and grab the database credentials</li>
<li>Define some things, and look for/create a lock file&#8211;this is a low-cost, easy-to-implement way to make sure you&#8217;re not blindly re-running the backup process after it&#8217;s failed, or running the backup more than once concurrently.</li>
<li>Run innobackupex through Rake&#8217;s sh command, using the <code>--slave-info</code> option, which saves a bunch of useful auxiliary data that makes spinning up a <a href="http://dev.mysql.com/doc/refman/5.0/en/replication.html">replicated DB easier</a>.</li>
<li>Make sure the backup actually exists, and run innobackupex with the <code>--apply-log</code> option to run InnoDB&#8217;s crash recovery process</li>
<li>Create a &#8220;prepared copy&#8221; of the backup. Since the restore of the database involves turning off the DB, replacing the data directory with the backup, and turning the database back on, we want to cut down on the amount of time it takes to replace those files, which means using &#8220;mv&#8221; rather than &#8220;cp&#8221; (shifting disk references to 4GB on the machine in question takes mere seconds&#8211;actually copying the files took, at the time, up to five minutes). If you mv the backed up directory, though, then you only get to restore your backup once, after which it no longer exists. That would be pretty silly. To solve this, we make a redundant copy of the backup directory and designate it the &#8220;prepared&#8221; one&#8211;whenever we run a restore, we&#8217;ll mv this directory in, and then after we turn the DB back on with the backup, we start a cp in the background to create a new prepared directory.</li>
<li>Delete all but some number of past backups. We actually only keep a day&#8217;s worth of backups in an uncompressed state&#8211;the above-mentioned Python script takes care of compressing older backups and moving them around to keep our disk from filling up. This task, though, only manages them before they&#8217;re compressed.</li>
<li>Assuming everything completed successfully, remove the lock file to signal that we&#8217;re open for business.</li>
</ol>
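<p>The lock file logic in the second and last steps above is simple enough to sketch. The real task lives in Rake; what follows is only a Node.js analogue for illustration, with <code>withLockFile</code> as a made-up name. The one detail that matters is that the lock is removed only after a successful run, so a failed backup leaves it behind and blocks blind re-runs.</p>
<pre><code>var fs = require("fs");

// Hypothetical wrapper: runs `task` only if `lockPath` does not already exist.
// Returns false (without running the task) when a lock is already present.
function withLockFile(lockPath, task) {
    var fd;
    try {
        // "wx" creates the file and fails with EEXIST if it is already there
        fd = fs.openSync(lockPath, "wx");
    } catch (err) {
        if (err.code === "EEXIST") {
            return false; // an earlier run is in progress, or failed and left its lock
        }
        throw err;
    }
    fs.closeSync(fd);
    task(); // if this throws, the lock stays behind on purpose
    fs.unlinkSync(lockPath); // only a successful run clears the lock
    return true;
}
</code></pre>
<p>A stale lock then has to be cleared by hand, which is the point: somebody should look at why the last run failed before the next one starts.</p>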
<p>Pretty straightforward. The restore task is similar: Find the prepared copy (create one if it doesn&#8217;t exist), turn off MySQL, move the files in, turn MySQL back on. The only weirdness, here, is that a restore might be happening in a different environment than the one where the backup originated&#8211;in particular, we completely restore the staging database every day from the last day&#8217;s production data, and those databases have different credentials, and, crucially, different database names. Since the backup data is binary, there&#8217;s no simple way to change the name of the database on the backups themselves, meaning the renames have to be performed in MySQL. The commands look like this:</p>
<pre><code>STOP SLAVE; #if we took this backup from a slave in a replication setup, we don't want it to continue trying to run as a slave
SET SESSION group_concat_max_len=4096; #We need to generate a very long query, so we want to make sure it doesn't get truncated by the default max
SELECT @stmt := CONCAT('RENAME TABLE ',GROUP_CONCAT(table_schema,'.',table_name,' TO ','&lt;current_env_db_name&gt;.',table_name),';') FROM information_schema.TABLES WHERE table_schema LIKE '&lt;previous_env_db_name&gt;' GROUP BY table_schema;
PREPARE rename_schema FROM @stmt;
EXECUTE rename_schema;
</code></pre>
<p>&#8230;then appropriate revokes and grants are executed to make sure the Rails environment can properly execute, and that there aren&#8217;t any old environment users hanging around to make things confusing. Piece of cake, right?</p>
<h2>Pain is my greatest teacher, my scars my greatest strength</h2>
<p>There are caveats to this process. One fun thing that I discovered while creating the above task is that XtraBackup (understandably) is murder on disk I/O. The database is still available for reads, but writes will hang. This won&#8217;t crash MySQL unless you&#8217;re at high load, but if your subsidiary services that cause writes don&#8217;t handle timeouts gracefully, they may crash/hang/explode. The best solution to this is to run a replicated setup, and have XtraBackup run on one of your read-only slaves. For redundancy, we actually have multiple slaves, including one that doesn&#8217;t process any reads or writes in production; it simply keeps itself dutifully rolled up to the master. Setting up the backup process to run on that server suited us just fine.</p>
<p>And so, those Rake tasks are run by the Python script I&#8217;ve already mentioned, which is scheduled by cron. The Python script also runs mysqldump backups of the smaller MySQL databases, manages compressing/deleting old copies of the database, performs a staging data restore (which has the added benefit of being a daily test of our restore process, since our staging environment is intentionally as close to identical to our production environment as possible), and finally, sends copies of our backups completely offsite, in case of a meteor attack on Rackspace (we&#8217;re never safe until the <a href="http://smbc.myshopify.com/collections/frontpage/products/revenge-13x9-poster">dinosaurs can fight back</a>). If any step of the process fails, monitoring systems send us an email.</p>
<p>And that&#8217;s that! A few incantations, and a whole lot of peace of mind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olark.com/spw/2012/02/ninja-mysql-backups-your-silent-guardians-against-interweb-oni/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LightningJS: safe, fast, and asynchronous third-party Javascript</title>
		<link>http://www.olark.com/spw/2011/10/lightningjs-safe-fast-and-asynchronous-third-party-javascript/</link>
		<comments>http://www.olark.com/spw/2011/10/lightningjs-safe-fast-and-asynchronous-third-party-javascript/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 14:00:29 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.olark.com/spw/?p=46</guid>
		<description><![CDATA[Why do we care about third-party Javascript embedding? Over the last 5 years or so, embedding Javascript code has become the norm. Much of this code is delivered by third-party services like Google Analytics and others. In fact, I just &#8230; <a href="http://www.olark.com/spw/2011/10/lightningjs-safe-fast-and-asynchronous-third-party-javascript/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1 id="why_do_we_care_about_third_party_javascript_embedding">Why do we care about third-party Javascript embedding?</h1>
<p>Over the last 5 years or so, embedding Javascript code has become the norm. Much of this code is delivered by third-party services like Google Analytics and others. In fact, I just checked this morning and the Olark website embeds at least six separate third-party Javascript tools ranging from <a href="http://www.google.com/analytics/">website</a> <a href="http://mixpanel.com">analytics</a>, to <a href="http://www.optimizely.com/">A/B testing frameworks</a>, to <a href="http://disqus.com">commenting systems</a>…and of course our very own <a href="http://www.olark.com">chat box</a>.</p>
<p>The advantages are obvious: we didn’t have to write a single line of code. Nor did we have the operational headache of spinning up those services on our own machines. By simply dropping in a bit of embedded Javascript, the third-party code connects to its own services and “just works”. The process is easy enough that even non-technical people can usually add embed code easily via their CMS admin panels (e.g. WordPress).</p>
<p>The disadvantage is that each of these embedded Javascript libraries can add overhead to the original website. Slowdowns can (and do) happen if the third-party servers are slow to deliver the code to the browser. Even asynchronous embed techniques will still block the <code>window.onload</code> event <a href="http://www.webkit.org/blog/1395/running-scripts-in-webkit">until the third-party code finishes downloading</a>.</p>
<p>Additionally, as a third-party Javascript provider, you need to worry about whether your customers have embedded other libraries that might conflict with yours. These conflicts can range from changing globals to adding prototypes, and even overriding native functions &#8211; at Olark we have seen websites that override both <code>window.escape</code> and the native JSON decoders.</p>
<h1 id="what_can_we_do">What can we do?</h1>
<p>We need a way to embed Javascript code that gives us the following benefits:</p>
<ul>
<li><strong>Safe:</strong> gives our code a context that is safe from Javascript conflicts</li>
<li><strong>Fast:</strong> does not affect the loading speed of the parent page (including <code>window.onload</code>)</li>
<li><strong>Asynchronous:</strong> still allows our Javascript functions to be called easily</li>
</ul>
<p>Fortunately, Meebo already <a href="http://blog.meebo.com/?p=2956">blogged in detail</a> about their solution nearly a year ago. Awesome! There were a few things missing from the example code though:</p>
<ul>
<li>relied on the Meebo build system</li>
<li>missing public tests and benchmarks</li>
<li>left out the “other half” of the system (the bootstrapping portion for the actual library)</li>
</ul>
<p>We even built upon this embed code internally at Olark, adding a few minor fixes and the bootstrapping necessary for our asynchronous API calls. Last week we spent some time extracting the core concepts and distilling them down into a single reusable codebase…and we are pumped to finally release it to the community <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h1 id="introducing_lightningjs">Introducing LightningJS</h1>
<p>LightningJS allows third-party providers to deliver their Javascript in a way that is <strong>safe</strong> (each library gets its own <code>window</code> context while still having access to the original document), <strong>fast</strong> (does not block <code>window.onload</code>), and <strong>asynchronous</strong> (exposes an easy way to asynchronously call methods). You can get more detailed information and source code from <a href="http://lightningjs.com">lightningjs.com</a>.</p>
<p>Here is a brief look at LightningJS in action…</p>
<h2 id="what_does_it_look_like">What about an example?</h2>
<p>Let’s say that we are Pirates Incorporated, purveyors of all things <a href="http://www.pirateglossary.com/">piratey</a> on the interwebs. When using LightningJS, we can tell our customers to paste code like this into their HTML page:</p>
<pre><code>&lt;script type="text/javascript"&gt;
/*** the code from lightningjs-embed.min.js goes here ***/
window.piratelib = lightningjs.require("piratelib", "//static.piratelib.com/piratelib.js");
&lt;/script&gt;
</code></pre>
<p>Our customers can call methods on <code>piratelib</code> immediately, even though none of our code has actually loaded yet:</p>
<pre><code>piratelib("fireWarningShot", {direction: "starboard"})
</code></pre>
<p>This calls the <code>fireWarningShot</code> method on our API. At some point, we decide to return a value to our customers that indicates whether the warning shot was seen. We also decide to throw exceptions in cases where the warning shot failed. Since LightningJS already implements the <a href="http://wiki.commonjs.org/wiki/Promises/A">CommonJS Promise API</a>, we can use the <code>.then(fulfillmentCallback, errorCallback)</code> method to handle return values and exceptions:</p>
<pre><code>piratelib("fireWarningShot", {direction: "starboard"}).then(function(didSee) {
    if (!didSee) {
        // arrr, those landlubbers didn't see our warning shot...we're no
        // scallywags, so run another shot across the bow
        piratelib("fireWarningShot", {direction: "starboard"});
    }
}, function(error) {
    if (error.toString() == "crew refused") {
        // blimey! it's mutiny!
    }
})
</code></pre>
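<p>How can <code>piratelib</code> accept calls before any of its code has loaded? One common approach is to hand out a small stub that queues calls and replays them once the real library arrives. The sketch below shows only that queueing idea and is not the actual LightningJS source: <code>createStub</code> and <code>attach</code> are hypothetical names, and it leaves out the promise return values shown above.</p>
<pre><code>// Hypothetical stub factory: returns a function that queues method calls
// until the real library is attached, then forwards them directly.
function createStub() {
    var queue = [];
    var target = null;
    function stub(methodName) {
        var args = Array.prototype.slice.call(arguments, 1);
        if (target) {
            target[methodName].apply(target, args);
        } else {
            queue.push([methodName, args]);
        }
    }
    // called once the real library has finished loading
    stub.attach = function (library) {
        target = library;
        while (queue.length > 0) {
            var call = queue.shift();
            target[call[0]].apply(target, call[1]);
        }
    };
    return stub;
}
</code></pre>
<p>Calls made before <code>attach</code> are replayed in their original order, so customers never need to know whether the library has finished downloading.</p>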
<h2 id="what_about_the_hard_data">What about the hard data?</h2>
<p>Exhaustive browser support is probably our most important measure. To that end, the included test cases pass in every browser we could get our hands on:</p>
<ul>
<li>Firefox 2+ (tested in 2.0, 3.0, 3.6, 4.0, 5.0, 6.0, 7.0)</li>
<li>Chrome 12+ (tested in 12, 13, 14, 15)</li>
<li>Internet Explorer 6+ (tested in 6, 7, 8, 9)</li>
<li>Safari 4+ (tested in 4.0, 5.0, 5.1)</li>
<li>Opera 10+ (tested in 10, 11.5)</li>
<li>Mobile Safari 5+ (tested in 5.0, 5.1)</li>
</ul>
<p>&#8230;and for all you practical folks out there, it should help to know that the embed techniques used in LightningJS have been battle-tested in the wild by both Olark and Meebo across thousands of websites and browsers.</p>
<p>We also attempted to benchmark the performance of the LightningJS embed code under the worst-case scenario where third-party server performance is the bottleneck. To achieve this, we contrived a page with built-in delays that would ideally:</p>
<ul>
<li>fire <code>document.ready</code> after ~1s</li>
<li>fire <code>window.onload</code> after ~2s</li>
<li>finish downloading the third-party code after ~5s</li>
</ul>
<p>Timing this benchmark was a bit difficult over a tunneled connection (we used the otherwise excellent <a href="http://www.browserstack.com">BrowserStack</a> to run them), but the results demonstrated that LightningJS always performed as well as or better than traditional embed codes.</p>
<p>In the modern browsers we tested (all versions of Firefox, Chrome, Safari, and Mobile Safari), LightningJS always bested the traditional asynchronous embed code by not blocking the <code>window.onload</code> event:</p>
<table>
<tbody>
<tr>
<td style="font-weight: bold; background-color: #ddd;">Event</td>
<td style="font-weight: bold; background-color: #ddd;">Traditional Synchronous</td>
<td style="font-weight: bold; background-color: #ddd;">Traditional Asynchronous</td>
<td style="font-weight: bold; background-color: #ddd;">LightningJS</td>
</tr>
<tr>
<td>document.ready</td>
<td style="background-color: #aa0200; color: white;">~5s</td>
<td style="background-color: #00aa13; color: white;">~1s</td>
<td style="background-color: #00aa13; color: white;">~1s</td>
</tr>
<tr>
<td>window.onload</td>
<td style="background-color: #aa0200; color: white;">~5s</td>
<td style="background-color: #aa0200; color: white;">~5s</td>
<td style="background-color: #00aa13; color: white;">~2s</td>
</tr>
<tr>
<td>third-party loaded</td>
<td style="background-color: #00aa13; color: white;">~5s</td>
<td style="background-color: #00aa13; color: white;">~5s</td>
<td style="background-color: #00aa13; color: white;">~5s</td>
</tr>
</tbody>
</table>
<p>We saw similar improvements in Internet Explorer, though due to caching we could not measure whether LightningJS was better or equal to the traditional asynchronous approach.</p>
<p>In Opera, the results were even better &#8211; it appears that traditional asynchronous code actually blocks <code>document.ready</code> as well. LightningJS never blocked <code>document.ready</code>, though it seems that none of the embed codes can avoid blocking <code>window.onload</code> in Opera.</p>
<h1 id="what8217s_next">What’s next?</h1>
<p>There are a lot of third-party services out there. We certainly hope that they will take some notice of LightningJS and start taking advantage of the benefits it provides. As a customer, it probably wouldn’t hurt to try asking them <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>If you have ideas on how to tighten up the compressed embed code, or any other improvements, don’t forget to <a href="https://github.com/olark/lightningjs">fork LightningJS on GitHub</a>. Even better, get in touch with us here at Olark…<a href="http://www.olark.com/jobs">we’re hiring</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olark.com/spw/2011/10/lightningjs-safe-fast-and-asynchronous-third-party-javascript/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>You can list a directory containing 8 million files! But not with ls..</title>
		<link>http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/</link>
		<comments>http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 19:37:45 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[developers]]></category>

		<guid isPermaLink="false">http://www.olark.com/spw/?p=14</guid>
		<description><![CDATA[I needed to list all files in a directory, but ls, find, and os.listdir all hung. This is my story. NOTE: there is no good reason that you should ever have 8 million files in the same directory, but if &#8230; <a href="http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I needed to list all files in a directory, but <code>ls</code>, <code>find</code>, and <code>os.listdir</code> all hung.  This is my story.</p>
<p>NOTE: there is no good reason that you should ever have 8 million files in the same directory, but if you do, this is your solution <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p>TLDR: Write a C program that calls the <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html" target="_blank">getdents</a> syscall directly with a large buffer size, and ignore entries with inode == 0.</p>
<p><strong>Why doesn’t ls work?</strong></p>
<p><code>ls</code> and practically every other method of listing a directory (including Python’s <code>os.listdir</code> and <code>find .</code>) rely on libc <code>readdir()</code>. However, <code>readdir()</code> only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory (e.g. 500M of directory entries) it is going to take an insanely long time to read all the directory entries, especially on a slow disk.  For directories containing a large number of files, you’ll need to dig deeper than tools that rely on <code>readdir()</code>.  You will need to use the <code>getdents()</code> syscall directly, rather than helper methods from libc.</p>
<p><strong>How to quickly list a directory with 8 million files</strong></p>
<p>The trick is to understand <code>getdents()</code>, the low-level system call that reads directory entries from disk and returns them as directory entry (dirent) data structures.</p>
<p><code>GETDENTS (filehandle for directory entries, *directory entry pointer, number of bytes to read)</code></p>
<p>Luckily the man page has a lot of detail, and provides C source code to list all the files in a directory.  Read it <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html" target="_blank">here</a> and copy the source code to a file, like listdir.c.</p>
<p>There are two modifications you will need to make in order to quickly list all the files in a directory.  First, increase the buffer size from X to something like 5 megabytes (the parentheses keep the macro safe to use inside larger expressions).<br />
<code>#define BUF_SIZE (1024*1024*5)</code></p>
<p>Then modify the main loop, where it prints out the information about each file in the directory, to skip entries with inode == 0.  I did this by adding<br />
<code>if (d-&gt;d_ino != 0) printf(...);</code></p>
<p>In my case I really only cared about the file names in the directory, so I also rewrote the printf() statement to print only the filename.<br />
<code>if (d-&gt;d_ino) printf("%s\n", (char *) d-&gt;d_name);</code></p>
<p>Compile it (it doesn’t need any external libraries, so it’s super simple to do)<br />
<code>gcc listdir.c -o listdir</code></p>
<p>Now just run<br />
<code>./listdir [directory with insane number of files]</code></p>
<p>Presto, you should see all the files in your insanely large directory <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p>In my case I did<br />
<code>./listdir [directory with insane number of files] &gt; output.txt</code><br />
and then used the contents of output.txt to process the files that were previously “unlistable”.</p>
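<p>For readers who would rather not compile anything, the same idea can be sketched in Python by calling the syscall through <code>ctypes</code>. This is a minimal sketch rather than the C program described above. It assumes Linux, uses <code>getdents64</code> instead of the <code>getdents</code> call from the man page example, and hard-codes syscall numbers for x86_64 and aarch64 only. The names <code>parse_dirents</code> and <code>list_directory</code> are made up for illustration.</p>

```python
import ctypes
import os
import platform
import struct

# Assumption: getdents64 syscall numbers, which differ per architecture.
SYS_GETDENTS64 = {"x86_64": 217, "aarch64": 61}

BUF_SIZE = 5 * 1024 * 1024  # 5 MB read buffer, as in the C version

# Fixed header of struct linux_dirent64:
# d_ino (u64), d_off (s64), d_reclen (u16), d_type (u8); d_name follows.
DIRENT_HEADER = struct.Struct("=QqHB")


def parse_dirents(buf):
    """Yield file names from raw getdents64 bytes, skipping inode 0."""
    pos = 0
    while pos < len(buf):
        d_ino, _d_off, d_reclen, _d_type = DIRENT_HEADER.unpack_from(buf, pos)
        if d_reclen == 0:  # malformed record; avoid looping forever
            break
        name_start = pos + DIRENT_HEADER.size
        name_end = buf.index(b"\0", name_start)  # names are NUL-terminated
        if d_ino != 0:
            yield os.fsdecode(buf[name_start:name_end])
        pos += d_reclen


def list_directory(path, buf_size=BUF_SIZE):
    """Yield the names in `path`, reading entries buf_size bytes at a time."""
    nr = SYS_GETDENTS64[platform.machine()]  # KeyError on other architectures
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(buf_size)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        while True:
            nread = libc.syscall(ctypes.c_long(nr), ctypes.c_long(fd),
                                 buf, ctypes.c_long(buf_size))
            if nread == -1:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            if nread == 0:  # end of directory
                break
            yield from parse_dirents(buf[:nread])
    finally:
        os.close(fd)
```

<p>Looping over <code>list_directory(".")</code> and printing each name gives roughly the same output as the C program, including the <code>.</code> and <code>..</code> entries. Because it is a generator, names can be processed as they arrive instead of waiting for the whole directory to be read.</p>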
<p><strong>Why did we run into this problem in the first place? </strong></p>
<p>Without going into too many details, we needed a simple datastore where we could find an entry based on a key, append values to each key, and expire keys that were older than a certain threshold, while performing a cleanup operation on the stored values.  The filesystem actually works really well for this case as long as there aren’t that many keys active at the same time (on the file system, listing all keys is the slowest operation, so if there aren’t many keys there’s nothing to worry about).</p>
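<p>To make that description concrete, here is a hypothetical sketch of such a store, not our actual code. It assumes one file per key, values stored as lines, keys that are safe to use as file names, and expiry based on the file’s modification time. The class and method names are invented for illustration.</p>

```python
import os
import time


class FileStore:
    """Hypothetical sketch: one file per key, values appended as lines."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are already safe to use as file names.
        return os.path.join(self.root, key)

    def append(self, key, value):
        with open(self._path(key), "a") as f:
            f.write(value + "\n")

    def get(self, key):
        try:
            with open(self._path(key)) as f:
                return f.read().splitlines()
        except FileNotFoundError:
            return []

    def sweep(self, max_age_seconds, cleanup, now=None):
        """Expire keys whose file was last modified too long ago.

        Listing every key (os.listdir) is the slow step: it is the call
        that stops returning in reasonable time once the directory is huge.
        """
        now = time.time() if now is None else now
        expired = 0
        for key in os.listdir(self.root):
            path = self._path(key)
            if now - os.path.getmtime(path) > max_age_seconds:
                cleanup(key, self.get(key))  # caller-supplied cleanup hook
                os.remove(path)
                expired += 1
        return expired
```

<p>In a design like this, lookups and appends only ever touch one file, so they stay fast no matter how many keys exist. Only <code>sweep</code> has to enumerate the whole directory, which is why a failing sweeper is the part that snowballs.</p>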
<p>Something got screwed up and caused the expiration action to fail.  So instead of keeping a small number of files in the directory, we started generating a lot of entries that were not being cleaned up.  The next time the cleanup operation ran, it could no longer list all the keys in a reasonable amount of time (reading 32K of directory entries at a time).  This compounded the problem and led to a state where new files were being created, but old files were not cleaned up.  Soon we had 8 million files in a single directory.  We were able to fix the root cause fairly quickly by resolving a bug in our “sweeper” and creating a new cache directory, but we still had 8 million files that needed to be processed.  Keep in mind the timescale here is a matter of hours.  (Luckily this problem happened on a weekend, was caught fast, and had very little effect on our customers.)  One important thing to keep in mind is that at this point we had no idea how many files needed to be processed; all we knew was that <code>ls</code> and <code>os.listdir()</code> were both hanging when trying to list all the files in the directory.</p>
<p>I asked in IRC and tried searching Google without much luck.  Other people had run into this problem before, but no one had a good solution.  In fact I recall running into a similar problem listing a mail queue in Qmail many years ago, but I have no idea how we solved it.  (If you like solving this sort of problem, we are hiring: <a href="mailto:jobs@olark.com">jobs@olark.com</a>)</p>
<p><strong>Diagnosing the unknown</strong></p>
<p>Whenever I see something hang without debugging output I turn to <code>strace</code>.  strace lets you watch the system calls made during program execution, so you can see what’s going on even when no output is being printed.</p>
<p>I tried<br />
<code> strace find .</code><br />
<code> strace ls</code><br />
and<br />
<code> strace python<br />
import os<br />
os.listdir(".")</code></p>
<p>All of these commands produced basically the same strace output:</p>
<p>First they open the file containing information about a directory<br />
<code> open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5<br />
</code></p>
<p>Then they make the getdents syscall which returns directory entries<br />
<code> getdents(5, /* 586 entries */, 32768)  = 32752<br />
</code></p>
<p>It took me a while to figure this out, but basically the <code>getdents</code> call<br />
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html" target="_blank">http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html</a></p>
<p>looks like this:<br />
<code>int getdents(unsigned int fd, struct linux_dirent *dirp, unsigned int count);</code></p>
<p>where <code>count</code> is really the size of the buffer (in this case 32K).</p>
<p>Of course I didn&#8217;t notice this until someone in IRC mentioned that you could do<br />
<code>ls -dl . </code><br />
to see the size of the file storing directory entries.</p>
<p>In my directory I got:<br />
<code>drwxr-xr-x 2 root root 537919488 Jul 29 04:55 . (513M)</code></p>
<p>Putting two and two together I could see that the reason it was taking forever to list the directory was because <code>ls</code> was reading the directory entries file 32K at a time, and the file was 513M.  So it would take around 16416 system calls of <code>getdents()</code> to list the directory.  That is a lot of calls, especially on a slow virtualized disk.</p>
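<p>The arithmetic is easy to check. The sketch below assumes every <code>getdents()</code> call returns a completely full buffer, so the results are lower bounds (the strace output above shows 32752 of 32768 bytes actually used). The function name is invented for illustration.</p>

```python
def getdents_calls(dir_size_bytes, buf_size_bytes):
    """Minimum number of getdents() calls needed to read the directory file."""
    # Ceiling division using integers only.
    return -(-dir_size_bytes // buf_size_bytes)


DIR_SIZE = 537919488  # size reported by `ls -dl .`

print(getdents_calls(DIR_SIZE, 32 * 1024))        # 16416 calls with the 32K buffer
print(getdents_calls(DIR_SIZE, 5 * 1024 * 1024))  # 103 calls with a 5 MB buffer
```

<p>Going from a 32K buffer to a 5 MB buffer cuts the number of system calls by a factor of roughly 160.</p>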
<p>This led me to the solution above: increase the read buffer size for <code>getdents()</code> to decrease the number of system calls and speed up all the disk access <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p><strong>Takeaways:</strong></p>
<p>1) It is possible to list a directory with 8 million files in it.</p>
<p>2) strace is your friend</p>
<p>3) Don&#8217;t be afraid to compile code and modify it (hell, simple C compiles so fast it could be interpreted)</p>
<p>4) There is no good reason to have 8 million files in a directory <img src='http://www.olark.com/spw/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> , but this was a good learning experience (and possibly a good interview question).</p>
<p><strong>Appendix:</strong></p>
<p>After all this I was still a little curious about where the 32K buffer size in <code>ls</code> came from, so I downloaded the source for coreutils (which contains <code>ls</code>) and poked around.  (NOTE: these are mostly just some notes I took, not a detailed analysis.)</p>
<p>Search through the code, and you’ll find that <code>ls</code> calls <code>readdir()</code> with a directory pointer.<br />
<code>while (1)<br />
{<br />
/* Set errno to zero so we can distinguish between a readdir failure<br />
and when readdir simply finds that there are no more entries.  */<br />
errno = 0;<br />
next = readdir (dirp);<br />
if (next)<br />
</code></p>
<p><code>readdir()</code> is defined in libc.  So I continued my search and downloaded the source to libc to figure out how the buffer size for <code>readdir()</code> was determined.</p>
<p>getdents() is called inside of readdir() (as expected)<br />
<code> bytes = __GETDENTS (dirp-&gt;fd, dirp-&gt;data, maxread);<br />
if (bytes &lt;= 0)<br />
</code></p>
<p>And the byte size comes from the size of the directory entry struct, or the <code>dirp-&gt;allocation</code>.  Given that we were reading multiple entries, I am pretty sure the maxread variable was being set from <code>dirp-&gt;allocation</code>.<br />
<code>/* Fixed-size struct; must read one at a time (see below).  */<br />
maxread = sizeof *dp;<br />
#else<br />
maxread = dirp-&gt;allocation;<br />
#endif</code></p>
<p>If you poke around in<br />
<code>sysdeps/unix/dirstream.h<br />
sysdeps/unix/opendir.c<br />
</code></p>
<p>You can see where this value gets set. However, it certainly doesn’t appear to change based on the size of the directory entry file.  (Perhaps the buffer should be dynamically set based on the size of the directory entry file.)</p>
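<p>As a rough illustration of that last idea, and not something libc does, a caller could pick its own buffer size from the size of the directory file. The sketch below assumes a filesystem where <code>st_size</code> of a directory reflects the size of its entry data (the same number <code>ls -dl .</code> printed above), which is not true everywhere. The 32K floor and 5 MB ceiling are arbitrary choices, and the function names are invented.</p>

```python
import os

MIN_BUF = 32 * 1024        # the default read size seen in strace
MAX_BUF = 5 * 1024 * 1024  # arbitrary cap, matching the buffer used earlier


def clamp_buffer_size(dir_size, min_buf=MIN_BUF, max_buf=MAX_BUF):
    """Use the directory file size as the buffer size, within bounds."""
    return max(min_buf, min(dir_size, max_buf))


def pick_buffer_size(dir_path):
    """Choose a read buffer size for dir_path from its reported size."""
    return clamp_buffer_size(os.stat(dir_path).st_size)
```

<p>A small directory keeps the usual 32K buffer, while the 513M directory from this story would get the full 5 MB.</p>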
<p>Finally, I learned the trick for skipping deleted files by noticing this line inside of <code>ls</code>:<br />
<code> if (dp-&gt;d_ino == 0) ..</code><br />
It is used to keep deleted files from appearing when a directory is listed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Motivation: freedom</title>
		<link>http://www.olark.com/spw/2011/07/motivation-freedom/</link>
		<comments>http://www.olark.com/spw/2011/07/motivation-freedom/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 08:33:23 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.olark.com/spw/?p=6</guid>
		<description><![CDATA[For me building companies was never about money.  It has always been about creating self sustaining organizations where I could hang out with my friends doing something that we all enjoyed. <a href="http://www.olark.com/spw/2011/07/motivation-freedom/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<blockquote><p>“Choose a job you love, and you will never have to work a day in your life.”<br />
—   Confucius</p></blockquote>
<p>For me, building companies was never about money.  It has always been about creating self-sustaining organizations where I could hang out with my friends doing something that we all enjoyed.</p>
<p>I grew up in the greater Washington DC area, a suburban sprawl where almost everyone I knew grew up to work for the federal government in one way or another.  A job was a paycheck, and work was work.  In this world of 9-5 jobs, my father was a countercultural example: a professor with a flexible schedule and the freedom to spend his free time thinking and writing about his interests.  Before I knew I wanted to start businesses, I knew that I wanted to create a path that would provide me the flexibility and freedom to choose my own direction.</p>
<blockquote><p>&#8220;If you want to understand the entrepreneur, study the juvenile delinquent. The delinquent is saying with his actions, &#8216;This sucks. I&#8217;m going to do my own thing.&#8217;&#8221;<br />
—Yvon Chouinard, Founder of Patagonia</p></blockquote>
<p>I always felt that the biggest risk we took as founders was that if somehow we were not successful we might find ourselves merely working a job for a paycheck.  At best we would be working on someone else’s dream, at worst we’d forget that we had dreams of our own.  This risk has been a guiding principle for the growth of Olark (<a href="http://www.olark.com">http://www.olark.com</a>), a company I cofounded with a few close friends.  As we expand we are looking to grow our team by adding other individuals who share our passions and are fulfilled by helping shape a company that they also own.</p>
<p>I remember my first entrepreneurial success.  At the end of middle school I decided to spend my time doing something useful and to stop wasting my time playing computer games and building pointless websites.  I somehow managed to convince a few people to hire me to do web development work, installing scripts and building simple websites for dollars an hour (it was a really good deal for them &#8211; and for me, as I was basically being paid to learn).  Shortly after dabbling with custom development work, my cousin Roland and I started Nethernet Consulting with a $100 loan from his parents to buy the nethernet.com domain name.  (In retrospect, we should have bought something like search.com, as 1997 was still the wild west of domain names.)  Just a few months after getting started we landed a development contract that allowed me to bring on my friend Kevin as a third partner in Nethernet.  The success was that Kevin was able to quit his mindless job typing up the newsletter for his mom’s church and instead do what he loved: write computer code at 100 WPM.  I had liberated my first friend.</p>
<p>Around my junior year of high school, Nethernet the consulting company morphed into Netherweb the web hosting company.  There were a variety of reasons for this decision, but the main reason was that consulting really wasn’t that much fun: you were stuck either finding new jobs or, in the best case, just getting paid to do more work for someone else.  In 1998 we invested some of our profits from Nethernet into our first web server, Davinci.  (As was the fad at the time, we named our first servers after Renaissance painters.)  For three high school students the best thing about running a web hosting company was unlimited access to computer hardware.  We built our own servers, taught ourselves how to manage Cisco switches bought from eBay, and learned all we could about how the Internet worked.  The added stress and sense of accomplishment from running our own company helped us move so much faster than what is possible in the classroom.</p>
<p>Kevin, Roland, and I were fascinated with computers and the Internet.  The reason we were able to work so hard at starting a company in high school was because we loved what we were doing.  Extrinsic rewards can never match the intrinsic reward of doing what you really enjoy.</p>
<p>Netherweb never was amazingly successful, but it sure beat the jobs my friends had in high school and undergrad.  As college freshmen we rented an office at Virginia Tech’s corporate research center.  We decked it out with whiteboards and the cheapest chairs and folding tables we could find (we took a similar approach for Olark).  In our minds the office added legitimacy to what we were doing: we had a fancy address, 2000 Kraft Drive, and access to a fancy board room, but we didn’t have the same dedication as in high school.  We stopped doing customer service ourselves and hired a few of our friends and some outside contractors to keep our customers happy.  Losing touch with our customers was one of the biggest mistakes we made with Netherweb, and it was an important learning experience.  I will never let that happen again.  At Olark every employee does a bi-weekly rotation on support, from the CEO to the most junior engineer; we’ve ingrained a call to serve our customers into our culture.</p>
<blockquote><p>“Customer Service Isn’t Just A Department!”<br />
- Tony Hsieh, CEO Zappos</p></blockquote>
<p>Rome wasn’t built in a day, and Netherweb didn’t die in a day either. In fact, when we as founders stopped paying attention to support, Netherweb was still a growing company.  We learned so much through failure.</p>
<p>Netherweb was run as a 4-hour workweek company long before Tim Ferriss coined the term.  One night a week Kevin and I, accompanied by one of our friends (usually Alpha), would head out to the office and hack late into the night.  In those days we pumped most of our revenue back into the company, so on the days we worked in the office, instead of paying ourselves an hourly wage, we would go out for a really nice company-sponsored dinner.  Taking our friends out to a nicer dinner after a day of hard work was much more appreciated than the equivalent salary, and much more fun.  In my experience fringe benefits are almost always valued at more than their cash equivalent.  Understanding how to hack reward systems to make your team happy is just part of the fun of running a company.</p>
<p>We also learned how to hack code.  In web hosting three things are important: uptime, speed, and customer service.  When you are running a web hosting company on the side, the first thing you’ll realize is that you have to do a lot of customer service when the servers go down or are slow.  If you can keep things fast, and the servers on, you won’t need to do as much customer support.  If you can make it so that your customers can order your service and be set up instantly, you can make money in your sleep.  Netherweb was one of the first companies to launch a clustered hosting solution. We were too dumb to know how to market this effectively, but by hosting our customers’ sites across multiple servers we were able to eliminate downtime, deal with busy sites, and never have to wake up in the middle of the night when a server went down.  We were also one of the first companies to completely automate the web hosting order process: our customers could sign up for web hosting, buy a plan, and be live on a server in minutes. Believe it or not, it used to take days for some web hosting companies to create a new account.  In retrospect we were much more intrigued by the technology and by building a cool product than we were with running a business.  It’s as if the primary purpose of the business was to enable us to play around with cool technology, rather than provide our customers with a service.  I still love playing around with cool technology to build awesome products, although now the awesomeness of the product is a function of how much our customers love it, rather than its technical coolness.</p>
<p>Even while serving thousands of customers we never fully committed to Netherweb.  It was a fun summer job, a great learning experience, and a good story, but it always was just something we did on the side while pursuing other goals.  We sold Netherweb in December of 2008 to avoid making the same mistake with our current venture, Olark.  Roland and I founded Olark Live Chat (<a href="http://www.olark.com">http://www.olark.com</a>) from the ashes of Netherweb, adding Matt and Zach as early founders to help us build the new company.  From the beginning we committed much more to Olark than we ever did to Netherweb. Where Netherweb was do or do school work, Olark was do or get a real job.  Where Netherweb was one of many juggled ideas, Olark has become the one idea.  Olark has become the vehicle for fulfilling my motivations.  If there is anything to learn from this, it’s that you can get pretty far working on something part-time, but it’s only once you fully commit to it that you will know where it can take you.</p>
<p>I love my wife. I love my life. I love my job. I love building a great product that customers love.  I love building a great company where each and every team member has the flexibility and freedom to do what they love, with ownership over the fruits of their labor.  I love building a company that delivers happiness to our employees, our customers, and our customers’ customers.  I love continuously learning from both success and failure to iterate, improve, and try again.  I love creating a company where I want to work.  The freedom to create my path and choose my direction while making the world a better place is my sappy reason for doing what I do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olark.com/spw/2011/07/motivation-freedom/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

