Post Mortem report: 19 October 2016

On Wednesday 19 October, Envato Market sites suffered a prolonged incident and were intermittently unavailable for over eight hours. The incident began at 01:56 AEDT (Tuesday, 18 October 2016, 14:56 UTC) and ended at 10:22 AEDT (Tuesday, 18 October 2016, 23:22 UTC). During this time, users would have seen our “Maintenance” page intermittently and therefore would not have been able to interact with the sites. The issue was caused by an inaccessible directory on a shared filesystem, which in turn was caused by a volume filling to capacity. The incident duration was 8 hours 26 minutes; total downtime of the sites was 2 hours 56 minutes.

We’re sorry this happened. During the periods of downtime, the site was completely unavailable. Users couldn’t find or purchase items, and authors couldn’t add or manage their items. We’ve let our users down and let ourselves down too. We aim higher than this and are working to ensure it doesn’t happen again.

In the spirit of our “Tell it like it is” company value, we are sharing the details of this incident with the public.

To The Cloud in-depth

In a previous post, Envato Market: To The Cloud!, we discussed why we moved the Envato Market websites to Amazon Web Services (AWS) and a little bit about how we did it. In this post we’ll explore more of the technologies we used, why we chose them and the pros and cons we’ve found along the way.

To begin with, there are a few key aspects of our design that we feel helped modernise the Market infrastructure and allowed us to take advantage of running in a cloud environment.

  • Where possible, everything should be an artefact
    • Source code for the Market site
    • Servers
    • System packages (services and libraries)
  • Everything is defined by code
    • Amazon Machine Images (AMIs) are built from code that lives in source control
    • Infrastructure is built entirely using code that lives in source control
    • The Market site is bundled into a tarball using scripts
  • Performance and resiliency testing
    • Form hypotheses about our infrastructure and then define mechanisms to prove them

We made a few technical decisions along the way to achieve these goals. Here we’ll lay out those decisions and why they worked for us, as well as some caveats we discovered, but first…

Envato Market: To The Cloud!

This is the story of how we moved Envato’s Market sites to the cloud. Envato Market is a family of seven themed websites selling digital assets. We’re busy; our sites operate to the tune of 25,000 requests per minute on average, serving up roughly 140 million pageviews per month. We have nearly eleven million unique items for sale and seven million users. We recently picked this site up out of its home for the past six years and moved to Amazon Web Services (AWS). Read on to learn why we did it, how we did it, and what we learned!

Getting Envato Market HTTPS everywhere

Last month we announced that we had finally completed the move to HTTPS everywhere for Envato Market. This was no easy feat: we serve over 170 million page views a month across about 10 million listed products, all of which are user-generated content. Along the way we learnt many valuable lessons that we want to share with the wider community, in the hope of making other HTTPS moves easier and encouraging wider adoption of HTTPS everywhere.

How we tracked down Ruby heap corruption in amongst 35 million daily requests

Back in November 2015, one of the Envato Market developers made a startling discovery: our exception tracker was overrun with occurrences of undefined method exceptions with the target classes being NilClass and FalseClass. These types of exceptions are often a symptom that you’ve written some Ruby code and not accounted for a particular case where the data you are accessing is nil or false. For our users, this would manifest itself as our robot error page letting you know that we encountered an issue. This was a particularly hairy scenario to be in because the exceptions we were seeing were not legitimate failures: replaying the requests never reproduced the error, and code inspection showed the values could never be set to nil or false.
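To illustrate the ordinary, legitimate form of this error, here is a minimal sketch that is not taken from the Market codebase. The helpers `find_item`, `item_title` and `item_title_or_default`, along with the sample data, are hypothetical; they only show how an unhandled nil raises an undefined method exception on NilClass, and how that case is usually guarded.

```ruby
# Hypothetical helpers and data, for illustration only.

# Array#find returns nil when no element matches.
def find_item(items, id)
  items.find { |item| item[:id] == id }
end

# Unsafe: assumes find_item always returns a hash.
def item_title(items, id)
  find_item(items, id)[:title]
end

# Safe: the safe-navigation operator yields nil instead of raising,
# so a default can be substituted.
def item_title_or_default(items, id, default = "Unknown item")
  find_item(items, id)&.fetch(:title) || default
end

items = [{ id: 1, title: "Theme A" }]

begin
  item_title(items, 2) # no item with id 2, so nil[:title] is attempted
rescue NoMethodError => e
  puts "Raised #{e.class}"
end

puts item_title_or_default(items, 2)
```

In the incident described here, the puzzling part was that the code paths involved already handled these cases, which is why the exceptions did not look like failures of this ordinary kind.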

Running Headless JavaScript Testing with Electron On Any CI Server

Background

Since the end of 2015, the Envato Front End team has been working on bringing a modern development workflow to our stack. Our main project repo powers sites like themeforest.net and serves around 150 million page views a month, so it is quite a challenge to re-architect our front end while maintaining a stable site. In addition, the codebase is 9 years old, so it contains code from many developers and multiple approaches.

We recently introduced our first React based component into the code base when we developed an autosuggest search feature on the homepage of themeforest.net and videohive.net. The React component was written with ES6, and uses Webpack to bundle the JavaScript code.

As I mentioned above, it’s a 9 year old code base and nobody can guarantee that introducing something new won’t break the code, so we began all the work with tests in mind. This post documents our experiences developing the framework for testing the React based autosuggestion component.

Introducing StackMaster - The missing CloudFormation tool

StackMaster

CloudFormation is an Amazon (AWS) service for provisioning infrastructure as “stacks”, described in a JSON template. We use it a lot at Envato, and initially I hated it. Typing out JSON is just painful (literally!), and the APIs exposed in the AWS CLI are very asynchronous and low level. I wanted something to hold my hand and provide more visibility into stack updates.
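For readers who have not seen one, the following is a minimal sketch of the kind of JSON template CloudFormation accepts, built from a Ruby hash rather than typed by hand. It is only an illustration of the template shape and says nothing about how StackMaster itself works; the `example_template` helper and the logical resource name `ExampleBucket` are hypothetical.

```ruby
require "json"

# A minimal CloudFormation template expressed as a Ruby hash.
# "ExampleBucket" is a hypothetical logical name chosen for this sketch.
def example_template
  {
    "AWSTemplateFormatVersion" => "2010-09-09",
    "Description" => "Illustrative stack with a single S3 bucket",
    "Resources" => {
      "ExampleBucket" => { "Type" => "AWS::S3::Bucket" }
    }
  }
end

# Serialise the hash to the JSON document that describes the stack.
puts JSON.pretty_generate(example_template)
```

Even this one-resource stack needs several levels of nesting, which gives a sense of why hand-written templates for real infrastructure become tedious.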

Today I’d like to introduce a project we’ve recently open-sourced: StackMaster is a tool to make working with multiple CloudFormation stacks a lot simpler. It solves some of the problems we’ve experienced while working with the CloudFormation CLI directly. The project is a refinement of some existing tooling that we have been using internally at Envato for most of this year, and it was built during one of Envato’s previous “Hack Fortnights”.

How Envato defined the expectations of our developers

The Envato development team has always had a strong sense of what we stand for, how we work together and what we expect of each other … at least that is what many of us thought. Around 9 months ago our company participated in the Great Places to Work survey, which gauges how our employees feel about Envato as a place to work. Each department received a breakdown of their feedback, and whilst much of our feedback was great, one statement was a clear outlier: “Management makes its expectations clear”. This was a trigger to question our assumptions about those expectations. This post tells the story of that journey.

How to organise i18n without losing your translation_not_found

I’ve written before about Working with Locales and Time Zones in Rails, but I often feel the i18n library (short for internationalisation) is underused (underappreciated?). Perhaps it is even avoided because of the perception that it takes more effort to develop with and is harder to maintain.

This article will, I hope, open your mind to the idea that you will be better off using i18n in your application (even for a single language) and that it can be maintainable with some simple organisational pointers.
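As a rough illustration of what "organised" translations can look like, here is a toy sketch using only Ruby's standard library. It is not the i18n gem's API: the `sample_locales` and `lookup` helpers, the key names and the missing-key message are all assumptions made for this example. It shows keys nested by feature and a lookup that makes a missing translation visible instead of failing silently.

```ruby
require "yaml"

# Hypothetical locale data, nested by feature so related keys stay together.
def sample_locales
  YAML.safe_load(<<~LOCALES)
    en:
      items:
        index:
          title: "All items"
        show:
          buy: "Buy now"
  LOCALES
end

# Toy lookup: walks a dotted key through the nested hash and returns a
# visible marker when any part of the path is missing.
def lookup(translations, locale, key)
  path = [locale, *key.split(".")]
  value = path.reduce(translations) do |node, part|
    node.is_a?(Hash) ? node[part] : nil
  end
  value || "translation missing: #{path.join('.')}"
end

translations = sample_locales
puts lookup(translations, "en", "items.index.title")
puts lookup(translations, "en", "items.index.subtitle")
```

Grouping keys under the feature that uses them keeps each file small and makes an unused or missing key easier to spot.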

Envato Market Structure Styleguide

Today we have released the Envato Market ‘Structure Styleguide’ to the public.

https://market.styleguide.envato.com

A Structure Styleguide is a special breed of living styleguide, designed to document an application’s UI logic and all of the possible permutations of the UI.

The goal is to have a complete and shared understanding of how an application behaves in any situation, even in the most uncommon edge cases, without having to laboriously recreate each scenario by hand.