An Expensive And Common Cloud Analytics Mistake
The move to the cloud is continuing to accelerate and most organizations I deal with are at minimum incorporating cloud platforms and processing into their architectures … if not pressing to move largely to the cloud. While there are many advantages to the cloud, it is also necessary to use caution to make sure that the risks of the cloud are mitigated while pursuing the advantages. One approach that can make a migration to the cloud quite costly is to transfer analytic code and processes as-is to the cloud instead of greatly increasing focus on efficiency.
Efficiency? Our Code Is “Efficient Enough”!
In a classic on-premises environment, analytics and data science teams aren’t known for the efficiency of their processes. There was little reason to be, because processing was effectively “free”: the equipment was on the floor and ready to be used. In fact, analytical processes were often run at off-peak times and so made use of what would otherwise have been idle capacity. This was a win for all.
Traditionally, the primary concern when it came to analytics efficiency was that a process was “efficient enough” to meet two relatively low bars:
The process would finish within the timeframe needed
The process wasn’t so inefficient that it caused problems with other critical processes that also needed resources
If someone wrote a horribly inefficient process that used a lot of extra disk space and CPU, nobody really cared as long as it completed and released the resources before other processes needed the capacity. Even for near real time analytics, as long as the process ran “fast enough” for its purpose and didn’t inhibit other ongoing processes, everyone was happy. Hence, efficiency only needed to be “good enough” and analytical process builders only paid enough attention to efficiency to cross that low bar. I can speak from the perspective of both personal experience and guilt on this one!
Why “Good Enough” Isn’t “Good Enough” On The Cloud
The above approach was fine for many years, but it is not fine in a cloud environment. Why? Because in a cloud environment you’ll pay for every byte you store and every CPU cycle you use. One of the big advantages of the cloud is the ability to access powerful systems and only pay for what you use. A big disadvantage is that you’ll explicitly pay for everything you use. Suddenly, those “good enough” processes have a hard, tangible cost that can really sting.
A while back, I had a meeting with the leader of an analytics team within a major cloud provider. She said that they were given the mandate to move everything to the cloud so that they could set an example for the company’s clients. So, she and her team made the migration. At first it seemed to have been fairly painless and seamless. The data was moved, people started doing their work on the cloud, and they would be billed at an internal rate for the resources used (keep in mind that customers would have paid more for the same resources). All seemed great … until she got her first bill!
She explained that what they hadn’t properly accounted for was exactly how much their work would cost under the new model. Before the cloud, she was charged a fixed monthly fee to access internal systems, where resource usage was then “all you can eat”. As with many teams, hers did not worry about efficiency beyond getting to “good enough”.
Her first month’s bill ate a huge percentage of her entire annual budget, causing somewhat of a panic. Her team realized that they couldn’t go about business as usual on the cloud because costs were no longer fixed and extra processing was no longer “free”. The team took immediate action: they started testing processes on small samples, had efficiency experts double-check code before deploying, and generally thought hard about what running a process would cost before hitting “submit”. In doing so, they gradually got the costs back under control. But it took some pain to get there.
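To make the "test on a small sample, then think about cost before hitting submit" habit concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the team in the story: the function names, the per-CPU-hour price, and especially the assumption that CPU time scales linearly with row count. Many joins and sorts scale worse than linearly, so treat the result as a rough lower bound.

```python
def estimate_full_run_cost(sample_rows, sample_cpu_seconds,
                           total_rows, price_per_cpu_hour):
    """Extrapolate the compute cost of a full run from a timed sample run.

    Assumes CPU time grows linearly with the number of rows processed,
    which is a simplification; real workloads may scale worse.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    # Scale the sample's CPU time up to the full data volume.
    projected_cpu_seconds = sample_cpu_seconds * total_rows / sample_rows
    projected_cpu_hours = projected_cpu_seconds / 3600
    return projected_cpu_hours * price_per_cpu_hour


def within_budget(estimated_cost, remaining_budget):
    """Return True if the projected cost fits in the remaining budget."""
    return estimated_cost <= remaining_budget


if __name__ == "__main__":
    # Hypothetical numbers: a 100,000-row sample took 90 CPU-seconds,
    # the full table has 2 billion rows, and the rate is a made-up
    # 0.05 currency units per CPU-hour.
    cost = estimate_full_run_cost(
        sample_rows=100_000,
        sample_cpu_seconds=90,
        total_rows=2_000_000_000,
        price_per_cpu_hour=0.05,
    )
    print(f"Projected compute cost: {cost:.2f}")
    print("OK to submit" if within_budget(cost, remaining_budget=20.0)
          else "Over budget: tune the process first")
```

A check like this costs almost nothing to run, and it turns "what will this cost?" from an afterthought into a gate that a process must pass before it touches the full data.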
Focus On Efficiency Before Migrating To The Cloud
The moral of this story is that if an organization is going to push more processing into the cloud, then those building analytical processes must start to take efficiency much more seriously. “Good enough” can bust budgets and cost leaders their jobs. It isn’t that people have altogether ignored efficiency of processing in the past; it’s simply that it wasn’t usually necessary to elevate it to a major priority. On the cloud, every byte and every cycle costs money, and therefore efficiency becomes absolutely crucial.
Consider providing training for an analytics and data science organization on how to be more efficient. It is also a good idea to designate efficiency-focused employees who bless and tune any process before it is released. Some people can focus on getting the analytical logic laid out, while others can focus on optimizing the process. The worst thing to do is to move to the cloud without accounting for this fundamental shift. That can be a very costly mistake!
Originally published by the International Institute for Analytics.