Unknown to our users, we recently migrated edge network providers. This involved some particularly interesting problems that we needed to solve in order to migrate without impacting availability or integrity of our services.
An ELK stack combines three components (Elasticsearch, Logstash and Kibana) to form a log aggregation system. It is one of the most popular ways to make log files from various services and servers easily visible and searchable. While there are many great SaaS solutions available, many companies still choose to build their own.
While working on the new Envato Market Shopfront app, the team agreed to always keep all the dependencies in the project up to date. Sometimes this was just a straightforward patch or minor version upgrade, but sometimes it involved breaking changes that needed a whole lot of thought. The upgrade to
react-router v4 happened to be a good example.
To expose our internal services to the outside world, we use what is known as an API Gateway. This is a central point of contact for the outside world to access the services Envato Market uses behind the scenes. Taking this approach allows authors to leverage the information and functionality Envato provides on its marketplaces within their own applications without duplicating or managing it themselves. It also benefits customers who want to programmatically interact with Envato Market for their purchases instead of using a web browser.
You may have recently heard reports or seen news about a security bug called “Cloudbleed” affecting sites served by Cloudflare. Envato delivers some websites using services provided by Cloudflare; however, Cloudflare have confirmed that none of our websites are directly affected by this security bug. Cloudflare published a detailed explanation of what the bug is and how it came to be, which you can read on their blog.
UPDATE: Since the original publication of this post, Cloudflare have released a follow-up blog post with information they have learned in their investigations. The second article focuses more on explaining the real-world impact of the bug, rather than the technical details.
On Wednesday 19 October, Envato Market sites suffered a prolonged incident and were intermittently unavailable for over eight hours. The incident began at 01:56 AEDT (Tuesday, 18 October 2016, 14:56 UTC) and ended at 10:22 AEDT (Tuesday, 18 October 2016, 23:22 UTC). During this time, users would have seen our “Maintenance” page intermittently and therefore would not have been able to interact with the sites. The issue was caused by an inaccessible directory on a shared filesystem, which in turn was caused by a volume filling to capacity. The incident duration was 8 hours 26 minutes; total downtime of the sites was 2 hours 56 minutes.
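The duration figures above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, assuming the local (AEDT) start and end times as stated; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Incident window in local time (AEDT), as stated in the summary above.
incident_start = datetime(2016, 10, 19, 1, 56)
incident_end = datetime(2016, 10, 19, 10, 22)

# Total downtime within that window, as stated in the summary above.
downtime = timedelta(hours=2, minutes=56)

# Length of the incident window, and the share of it spent fully down.
incident_duration = incident_end - incident_start
downtime_share = downtime / incident_duration

print(incident_duration)        # 8:26:00
print(f"{downtime_share:.1%}")  # 34.8%
```

In other words, the sites were fully down for roughly a third of the incident window.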
We’re sorry this happened. During the periods of downtime, the site was completely unavailable. Users couldn’t find or purchase items, and authors couldn’t add or manage their items. We’ve let our users down and let ourselves down too. We aim higher than this and are working to ensure it doesn’t happen again.
In the spirit of our “Tell it like it is” company value, we are sharing the details of this incident with the public.
In a previous post, Envato Market: To The Cloud!, we discussed why we moved the Envato Market websites to Amazon Web Services (AWS) and a little bit about how we did it. In this post we’ll explore more of the technologies we used, why we chose them and the pros and cons we’ve found along the way.
To begin with, there are a few key aspects to our design that we feel helped modernise the Market infrastructure and allowed us to take advantage of running in a cloud environment.
- Where possible, everything should be an artefact
  - Source code for the Market site
  - System packages (services and libraries)
- Everything is defined by code
  - Amazon Machine Images (AMIs) are built from code that lives in source control
  - Infrastructure is built entirely using code that lives in source control
  - The Market site is bundled into a tarball using scripts
- Performance and resiliency testing
  - Form hypotheses about our infrastructure and then define mechanisms to prove them
We made a few technical decisions along the way to achieve these goals. Here we’ll lay out those decisions and why they worked for us, as well as some caveats we discovered, but first.
This is the story of how we moved Envato’s Market sites to the cloud. Envato Market is a family of seven themed websites selling digital assets. We’re busy; our sites operate to the tune of 25,000 requests per minute on average, serving up roughly 140 million pageviews per month. We have nearly eleven million unique items for sale and seven million users. We recently picked this site up out of its home for the past six years and moved to Amazon Web Services (AWS). Read on to learn why we did it, how we did it, and what we learned!
Last month we announced that we had finally completed the move to HTTPS everywhere for Envato Market. This was no easy feat, since we serve over 170 million page views a month across about 10 million listed products, all of which are user-generated content. Along the way we have learnt many valuable lessons that we want to share with the wider community, in the hope of making other HTTPS moves easier and encouraging wider adoption of HTTPS everywhere.
Back in November 2015, one of the Envato Market developers made a
startling discovery - our exception tracker was overrun with occurrences
of undefined method exceptions with the target classes being
FalseClass. These types of exceptions are often a
symptom that you’ve written some Ruby code and not accounted for a
particular case where the data you are accessing is returning
false. For our users, this would manifest itself as our robot error
page letting you know that we encountered an issue. This was a
particularly hairy scenario to be in because the exceptions we were
seeing were not legitimate failures: replaying the requests never
reproduced the error, and code inspection showed values could never be