Too Much of a Good Thing?
(That's lava, not my actual servers. It's a metaphor, get it?)
A funny thing happened over the last few weeks. Our traffic has been steadily increasing! At the same time, I've been running some A/B tests and tinkering with page layouts so that people (A) browse more and (B) create more previews. The thinking on my end is pretty straightforward: the more people who make free previews, the more people will buy HD renders. E-commerce funnels 101, right?
It turns out that might not always be the case. In mid-July, the site was processing around 500-600 free previews a day. Over the last month that number climbed, peaking at 1,346 a few days ago.
More than double the previews, more than double the revenue—that's how this works, right? Time to party?
Except sales actually declined.
It turns out that my poor render servers can't keep up with demand! The average render time for free previews has gone from around 2:15-2:30 to more like 4:30-5:00. I rolled back some of my site changes to ease off on the number of previews being created, but the servers didn't recover straight away. By that point, one of the machines was in a death spiral: its render times climbed to 12:00 yesterday before the entire machine hard-crashed. After a reboot it seems to be humming along, which points to two separate problems. Problem 1 is that I just don't have enough servers running to keep up with demand. Problem 2 is that render server performance degrades over time.
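For a rough sense of why doubling the previews hurts so much, here is a back-of-the-envelope sketch in Python. It is not how the render farm actually measures anything. It assumes that each server renders one preview at a time, that demand is spread evenly across the day, and that 2:15 is the uncontended render time. None of those is guaranteed, and `min_servers` is a name made up for this example. Real traffic is bursty, so treat the result as a lower bound.

```python
import math

MINUTES_PER_DAY = 24 * 60  # 1440 render-minutes available per server per day


def min_servers(previews_per_day: float, render_minutes: float) -> int:
    """Lower bound on servers needed, assuming one render at a time per
    server and perfectly even demand (both are simplifying assumptions)."""
    offered_load = previews_per_day * render_minutes  # render-minutes per day
    return math.ceil(offered_load / MINUTES_PER_DAY)


# Figures from the post: ~600 previews/day in mid-July, 1,346 at the peak,
# with 2:15 (2.25 minutes) per render before things slowed down.
print(min_servers(600, 2.25))   # 1
print(min_servers(1346, 2.25))  # 3
```

Under those assumptions the peak day needs roughly three times the render capacity of a mid-July day, before accounting for bursts or the slowdown itself.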
Problem 1 could be solved in the short term by just adding more servers, but the current system is pretty brittle and it takes quite a while to get a new server configured and deployed... and that wouldn't solve Problem 2. I'm hoping that the still-in-progress server re-architecture will solve both problems. So, in the short term, I think the best solution is to double down on rebuilding the render servers and manually restart the existing servers once they start to bog down.
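To make "once they start to bog down" a bit less hand-wavy, here is a minimal sketch of one possible rule of thumb. It is only an illustration, not something running on the servers today. The function name, the 150-second baseline (about 2:30), the 2x factor, and the five-sample minimum are all assumptions picked for the example. It only makes the decision; the restart itself stays manual.

```python
from statistics import median


def should_restart(recent_render_seconds, baseline_seconds=150.0,
                   factor=2.0, min_samples=5):
    """Return True when the median of recent render times exceeds
    factor * baseline. All thresholds here are hypothetical defaults."""
    if len(recent_render_seconds) < min_samples:
        return False  # not enough data to judge
    return median(recent_render_seconds) > factor * baseline_seconds


# A healthy server: renders around 2:15-2:30.
print(should_restart([140, 150, 145, 135, 150]))  # False
# A server bogging down: renders around 5:00.
print(should_restart([280, 300, 320, 310, 330]))  # True
```

The median is used rather than the mean so that a single unusually slow render doesn't trigger a restart on its own.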
Scale is a good problem to have, so thanks for bearing with me as I figure out how to scale up to meet this new influx of traffic!