Don't break the Internet with your Javascript

Every day, Olark sees more than 3 million visits across thousands of different websites and browsers. We constantly have to ask ourselves: are we causing any issues or slowdowns on our customers' websites?

Knowing the answer to that question is critical to our success.

A casual misstep means we could be silently breaking thousands of websites, halting transactions, angering the world, causing the next recession...you know, the usual.

Previously we wrote briefly about the challenges of providing third-party Javascript as a service. This time around, we'll talk a bit more about these challenges and what we do at Olark to thrive in spite of them.

Why is providing 3rd-party Javascript so challenging?

Living inside other companies' websites means a lot less control...and sometimes it feels a lot like running through a minefield. While we can sidestep some challenges with recent techniques, the fact remains that our code executes in a hostile environment. Here is a short list of real problems that we have seen in the wild:

Poorly sandboxed Javascript
- overridden builtins (e.g. custom window.escape)
- overridden global prototypes (e.g. toJSON)
- jQuery plugins that capture all keyboard events
Cookie idiosyncrasies
- too many cookies on the host page
- cookies disallowed inside of iframed websites
Inconsistent browser caching
- out-of-date caches lasting longer than expected
- sporadic @font-face re-downloading
Broken CSS
- different HTML doctypes
- conflicting (or overly broad) CSS rules
Old versions of the embed code
- CMS plugins that continue to use our old embed code (e.g. for WordPress, Joomla, etc)

With this long list of variables that we cannot control, unit and functional testing have slowly become less effective at uncovering real-world issues. Ever seen a browser testing tool that offers "Internet Explorer 7 with XHTML doctype plus custom JSON overrides" as one of its environments? Neither have we :P

So how can you survive (and thrive) across thousands of websites?

One word: monitoring. Deep, application-level monitoring.

At Olark, we have developed a collection of tools for application monitoring via log analysis. Our monitoring architecture is essentially a pipeline from the visitor's browser to a metrics dashboard.

In the end, we simply write code like this in Javascript:

log("something weird happened #warn")

…this will track "warn" events. Notice that we use #hashtags to name the events we want to show up in our metrics.
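Olark's own log() isn't shown in this post. As a minimal sketch, assuming a pluggable `send` transport and a default batch size of 20 (both our choices, not Olark's), a client-side logger could buffer messages and ship them in batches:

```javascript
// Minimal sketch of a client-side log() that buffers messages and ships them
// in batches. `send` (the transport) and `maxBatch` are assumptions.
function createLogger(send, maxBatch = 20) {
  let buffer = [];

  // Send everything buffered so far, then start a fresh buffer.
  function flush() {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    try {
      // Newline-joined plain text rather than JSON.stringify, since host pages
      // sometimes override toJSON on global prototypes.
      send(batch.join("\n"));
    } catch (e) {
      // Swallow transport errors: monitoring must never break the host page.
    }
  }

  // Record one message; embedded newlines are flattened to keep one message per line.
  function log(message) {
    buffer.push(String(message).replace(/[\r\n]+/g, " "));
    if (buffer.length >= maxBatch) flush();
  }

  return { log, flush };
}
```

Avoiding JSON.stringify sidesteps the overridden-toJSON problem listed earlier. In a browser, `send` might wrap `navigator.sendBeacon` pointed at a log-collection URL of your own, with `flush` also called on `pagehide`.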

Our system collects these log messages from Javascript and aggregates them on a central log server. Then our hashmonitor tool parses the hashtags, counts their occurrences, and sends the calculated metrics to tinyfeedback for viewing.

…this is the simplest example of how we track issues in the wild.
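We haven't described hashmonitor's internals here, so the following is only a rough sketch of the counting step, written in Javascript for consistency. It assumes a hashtag is `#name` or `#name=number` with word characters in the name; `parseHashtags` and `countHashtags` are hypothetical helpers, and counts cover whatever batch of lines you pass in (for example, one minute of logs):

```javascript
// Rough sketch of hashmonitor-style counting (not the real tool's code).
// A hashtag is "#name" or "#name=number", where name uses word characters.
const HASHTAG = /#(\w+)(?:=(-?\d+(?:\.\d+)?))?/g;

// Extract every hashtag from one log line, with a numeric value if present.
function parseHashtags(line) {
  const tags = [];
  for (const match of line.matchAll(HASHTAG)) {
    const value = match[2] === undefined ? null : Number(match[2]);
    tags.push({ name: match[1], value });
  }
  return tags;
}

// Count counter-style hashtags (those without "=value") across a batch of lines.
function countHashtags(lines) {
  const counts = new Map();
  for (const line of lines) {
    for (const { name, value } of parseHashtags(line)) {
      if (value !== null) continue; // value-based metrics are summarized separately
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}
```

A message carrying several hashtags simply increments each of its counters, while value-based tags are left for a separate summary step.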

One thing we have found incredibly useful about this approach is that we can always go back to the original log messages when we see a spike in any of our metrics. That lets us roll back a deployment quickly while still keeping enough data to dig into what caused the spike.

How do we break down errors and warnings?

Sometimes we want to dig deeper into a particular warning, so we need a special event name for it. To accomplish this, our monitoring system allows multiple #hashtags in a single log message. For example, we keep track of cookie issues:

log("cookie problems #nocookies_for_session #warn")

…which lets us break out these specific warnings in more detail.

In particular, these cookie metrics influenced our decision to test cookie-setting before booting Olark, which prevents strange behaviors when cookies cannot be read on subsequent pages.
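Here is a minimal sketch of such a pre-boot check, assuming a hypothetical cookie name (`olark_cookie_test`) and an injected `boot` function; the `doc` parameter stands in for `document` so the check can also run outside a browser. It writes a random token, reads it back, cleans up, and logs the same hashtags as the cookie example above if the round trip fails:

```javascript
// Minimal sketch of testing cookies before booting; not Olark's actual code.
// TEST_COOKIE is a hypothetical name. `doc` stands in for `document`.
const TEST_COOKIE = "olark_cookie_test";

function cookiesWork(doc) {
  const token = Math.random().toString(36).slice(2);
  try {
    doc.cookie = `${TEST_COOKIE}=${token}; path=/`;
    const ok = doc.cookie
      .split(";")
      .some((pair) => pair.trim() === `${TEST_COOKIE}=${token}`);
    // Expire the test cookie so it doesn't add to the host page's cookie load.
    doc.cookie = `${TEST_COOKIE}=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT`;
    return ok;
  } catch (e) {
    // Some sandboxed iframes throw a SecurityError on document.cookie access.
    return false;
  }
}

// Boot only when cookies round-trip; otherwise report the problem and bail out.
function bootIfCookiesWork(doc, boot, log) {
  if (!cookiesWork(doc)) {
    log("cookie problems #nocookies_for_session #warn");
    return false;
  }
  boot();
  return true;
}
```

In a page this might be called as `bootIfCookiesWork(document, startChat, log)`, where `startChat` is a hypothetical boot function.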

How do we monitor performance?

Our monitoring system also allows value-based metrics. For example, we track how long it takes to download configuration assets:

log("received account configuration #perf_assets=200")

…and our hashmonitor will automatically parse this as a value-based metric and calculate values for its distribution:

- average
- median
- 1st, 10th, 90th, and 99th percentiles
- standard deviation
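As a sketch of that summary step (not hashmonitor's actual code), the following collects `#name=value` tags and computes the statistics listed above. The nearest-rank percentile method and the population standard deviation are our assumptions; hashmonitor may compute them differently:

```javascript
// Sketch of summarizing value-based hashtags such as "#perf_assets=200".
// Nearest-rank percentiles and population standard deviation are our choices.
const VALUE_TAG = /#(\w+)=(-?\d+(?:\.\d+)?)/g;

// Group numeric values by hashtag name across a batch of log lines.
function collectValues(lines) {
  const values = new Map();
  for (const line of lines) {
    for (const [, name, raw] of line.matchAll(VALUE_TAG)) {
      if (!values.has(name)) values.set(name, []);
      values.get(name).push(Number(raw));
    }
  }
  return values;
}

// Nearest-rank percentile of an ascending array, for p in (0, 100].
function percentile(sorted, p) {
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Distribution summary for one metric; returns null when there is no data.
function summarize(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return {
    count: n,
    average: mean,
    median,
    p1: percentile(sorted, 1),
    p10: percentile(sorted, 10),
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
    stddev: Math.sqrt(variance),
  };
}
```

Counter-style tags like `#warn` have no `=value`, so `VALUE_TAG` ignores them.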

We have used this data to make important performance decisions, like adding these configuration assets to our CDN. Geolocated CDN delivery boosted overall speed and also tightened up the 90th-percentile load time.
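The post doesn't say how the `#perf_assets` value is measured in the browser. One plausible approach, sketched here with a hypothetical `loadConfig` function that returns a Promise, is to time the request with `performance.now()` and log the elapsed time; treating the value as milliseconds is our assumption about the unit:

```javascript
// Sketch: time an async configuration load and report it as a value-based hashtag.
// `loadConfig` and `log` are injected; both are hypothetical stand-ins.
async function loadConfigWithTiming(loadConfig, log) {
  const start = performance.now(); // high-resolution timer, in milliseconds
  const config = await loadConfig();
  const elapsedMs = Math.round(performance.now() - start);
  log(`received account configuration #perf_assets=${elapsedMs}`);
  return config;
}
```

If `loadConfig` rejects, nothing is logged, so failures would need their own counter-style hashtag.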

Does this really matter that much for user experience?

Definitely. To measure "soft" metrics like user experience, we look at the number of conversations that begin on Olark every minute. This tells us that visitors are engaging with Olark and hopefully generating more sales opportunities for our customers.

Recently, we made some improvements that whittled down median load time. As a result, we improved conversation volume by nearly 10%.

Having this deep monitoring has really helped us to effectively measure when our code changes positively impact the real world (and real people!). We wouldn't have it any other way.