How to build an algorithm in 6 steps
The word “algorithm” is like “artificial intelligence” or “machine learning”: it’s catchy and sounds important and useful. But does anyone other than a scientist know what the word ‘algorithm’ actually means?
In my opinion, too little attention is given to practical use cases written in a language that non-technical professionals can understand. This article about algorithms is my attempt to change that.
The following piece will give you a quick, non-technical overview of the steps we took at Cervinodata to get our first algorithm to work.
If you are a technical reader, there is a section at the bottom with more details about the technology we used to make our first algorithm work.
Step 1: Determine the goal of the algorithm
Before you even start thinking about technology or methodology, you need to determine the goal you wish to achieve. Ask yourself, “what do I want to get done that requires an algorithm?”
In our case we asked ourselves, “what do our clients need us to get done that requires an algorithm?”
We actually had two requests from multiple clients, resulting in two specific goals:
Based on these goals, we decided to build an algorithm that can do two things:
Why the cost per click (CPC)?
Cost per click is a leading indicator for many marketers. The costs and the ‘created campaigns’ are the input variables, and the clicks are the result (the output variable).
The CPC connects the input and the output.
The reason we did not use the cost per conversion, the cost per transaction, or return on ad spend (ROAS) is that there are many more clicks than conversions or transactions. Using cost per click allows us to get more accurate predictions with less data. Once the predictions are to our liking, we can use the same algorithm to test on the cost per transaction or ROAS with confidence.
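For readers who like to see the arithmetic: CPC is simply cost divided by clicks over a period. The sketch below uses made-up daily figures and hypothetical field names; it only illustrates the metric and is not how Cervinodata stores its data.

```python
def cost_per_click(cost, clicks):
    """Return cost per click, or None when there were no clicks."""
    if clicks == 0:
        return None  # CPC is undefined on a day without clicks
    return cost / clicks


# Hypothetical daily figures for one campaign (illustrative values only).
daily = [
    {"date": "2019-10-01", "cost": 120.0, "clicks": 300},
    {"date": "2019-10-02", "cost": 90.0, "clicks": 200},
    {"date": "2019-10-03", "cost": 15.0, "clicks": 0},
]

daily_cpc = [cost_per_click(d["cost"], d["clicks"]) for d in daily]
```

Days without clicks are kept as None rather than zero, so they are not mistaken for a very cheap day.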
Step 2: Access historic and current data
For any algorithm, input data is essential. We need sufficient historic data to be able to separate test data from control data. The control data is set aside to check whether our algorithm predicts the CPC correctly. We use data from multiple clients and multiple platforms so that we have multiple checks. This improves the end result.
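A minimal sketch of what setting data aside can look like for a daily series: the most recent days are held back as control data and the older days are used to make the prediction. The function name and the numbers are assumptions made for illustration.

```python
def chronological_split(series, holdout):
    """Split a time-ordered list into (history, control) without shuffling."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


# 59 made-up daily CPC values, oldest first.
daily_cpc = [0.40 + 0.01 * (i % 7) for i in range(59)]

# Keep the latest 14 days apart to compare against the prediction later.
history, control = chronological_split(daily_cpc, holdout=14)
```

The split is by time rather than at random, because a prediction may only use data from before the period it predicts.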
Step 3: Choose the right model(s)
There are many models available online, but the question is: which one(s) do you need to reach your goal?
After testing multiple complex and less complex models, we found the ARIMA model best suited for our purpose. This relatively simple model does not need a lot of data or external variables (weather, for example) to make a prediction, which makes it more practical.
The model you use should always strike a balance between simplicity and output. More complex models might give you more accurate results, but generally take more time to get right.
Results per platform
Originally we created an average cost per click for all platforms combined before predicting the cost per click, but that didn’t give us the desired results.
We decided to predict the CPC for each platform separately for two reasons: First, the characteristics of each platform are different so blending the platforms also meant reducing insights. Second, we decided to include an anomaly detection model which allowed us to give more specific alerts to the user. We now include Google Ads, Facebook Ads, Adform and LinkedIn Ads. Each platform has noticeably different characteristics, as shown in the graph below.
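To illustrate why blending hides information, the sketch below computes the CPC per platform and the blended CPC from the same rows. All figures are invented for the example.

```python
from collections import defaultdict

# (platform, cost, clicks): invented numbers for illustration.
rows = [
    ("Google Ads", 100.0, 400),
    ("Facebook Ads", 60.0, 300),
    ("LinkedIn Ads", 150.0, 50),
    ("Google Ads", 120.0, 400),
]

totals = defaultdict(lambda: [0.0, 0])  # platform -> [cost, clicks]
for platform, cost, clicks in rows:
    totals[platform][0] += cost
    totals[platform][1] += clicks

# One CPC per platform, skipping platforms without clicks.
cpc_per_platform = {
    platform: cost / clicks
    for platform, (cost, clicks) in totals.items()
    if clicks
}

# One CPC for everything combined.
total_cost = sum(cost for cost, _ in totals.values())
total_clicks = sum(clicks for _, clicks in totals.values())
blended_cpc = total_cost / total_clicks
```

In this made-up data the blended CPC sits close to the cheaper, high-volume platforms, so a large change on the expensive platform would barely move it.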
How confident are you in the model?
No model today can predict the future perfectly. Therefore, it is wise to work with a confidence interval (CI) (since the screenshots are in Dutch, CI = BI). For these kinds of predictions, a 95% CI is sufficient. This means that we are 95% sure that the actual CPC will fall between the lower and upper thresholds of the bandwidth.
The further in the future your prediction is, the wider the bandwidth becomes. The screenshot below provides a relevant representation.
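One way to see why the bandwidth widens is the simplest possible forecast, a random walk (ARIMA(0,1,0)), where tomorrow's best guess is today's value. This is a simplifying assumption for illustration, not necessarily the configuration used in our model. The uncertainty then grows with the square root of the number of days ahead.

```python
import math
import statistics


def random_walk_interval(series, horizon, z=1.96):
    """Approximate 95% interval for a random-walk forecast `horizon` days ahead."""
    steps = [b - a for a, b in zip(series, series[1:])]
    sigma = statistics.stdev(steps)  # spread of the day-to-day changes
    point = series[-1]               # random walk: forecast equals last value
    margin = z * sigma * math.sqrt(horizon)
    return point - margin, point, point + margin


# Made-up daily CPC values, oldest first.
cpc = [0.40, 0.42, 0.39, 0.41, 0.43, 0.40, 0.44, 0.42]

bands = [random_walk_interval(cpc, h) for h in range(1, 15)]
widths = [upper - lower for lower, _, upper in bands]
```

Each entry in widths is larger than the one before it, which is the widening you see in the screenshot.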
Step 4: Fine tuning
Our algorithm has generic parameters or settings (the same for all clients) and specific parameters per platform. These platform specific parameters are still the same for all our clients, but we are working on more flexibility. It’s important to consider this when building your own algorithm.
Keeping your algorithm working requires constant tweaks and maintenance. Building an algorithm is never a one-off activity; it needs to be part of your long-term strategy.
Prediction vs reality
In the screenshots below, you can see some examples of the Cervinodata CPC prediction and how it relates to reality. The solid line shows the actual CPC for the last couple of weeks; the dotted lines above and below the solid line show the previous prediction.
As you can see, it was accurate in some cases and inaccurate in others. A miss in prediction can usually be explained by a change in budget (after the prediction was executed).
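A simple way to put a number on "accurate" is to compare the earlier prediction with what actually happened: how far off was it on average, and how often did the actual CPC stay inside the predicted band? The sketch below does this with invented values; the function name is an assumption for the example.

```python
def evaluate(actual, predicted, lower, upper):
    """Return (mean absolute error, share of days inside the band)."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return mae, inside / n


# Invented values: day 3 shows a jump, e.g. after a budget change.
actual = [0.40, 0.42, 0.55, 0.41]
predicted = [0.41, 0.41, 0.42, 0.42]
lower = [0.36, 0.36, 0.37, 0.37]
upper = [0.46, 0.46, 0.47, 0.47]

mae, coverage = evaluate(actual, predicted, lower, upper)
```

A day that falls outside the band is a natural starting point for asking what changed, such as a budget adjustment made after the prediction ran.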
Predicting human behavior is still out of reach for us. :-)
Use your head
Even with a great model, it is still a good idea to keep using your own brain power and gut feeling when interpreting the data. There will always be context that has an impact on the outcome of the algorithm that was not taken into account in the model. Think, for instance, about a major news event, an exceptionally hot day, etc. Trusting your intuition will allow you to spot interesting variables that you can add to later versions of your model and improve its accuracy.
Step 5: Visualise your results
Once you have the final output you need, it is wise to think hard about how (and where) you want to present it. There are many dashboard solutions out there, yet we still prefer Klipfolio for this. Klipfolio offers a TV screen view, a desktop view, and a mobile view, and has nifty indicators that allow you to highlight specific parts of your data that need extra attention.
Step 6: Running your algorithm continuously
After we ran multiple successful tests and shared the results with our clients, we were confident enough to start working on the infrastructure that allows us to run the algorithm continuously, without manual work.
Bringing your algorithm to a place that is robust and permanent is easier said than done. It takes quite a bit of technical work to set it up. But once the work is done, you can reap the benefits and scale the application. This was essential for us because we anticipate needing this for more clients and with more data. But even if you do not need to scale, you still need to set up the right infrastructure to be able to re-use the algorithm easily (i.e., you need scripts that fetch the data from a place, run the model(s) and return the results to be used in a dashboard).
We decided to implement the algorithm on an application server we call “The Predictor”. Our Cervinodata engine collects the data needed. Every night, the Cervinodata engine makes the data available to the Predictor, gets the results back and makes them available via a secure REST URL. In the screenshots you can see we use Klipfolio to present the results. The REST URL is natively connected to Klipfolio. This is the Cervinodata integration. Because Klipfolio is capable of refreshing the data automatically, we can present our clients with a fresh prediction in their dashboard every morning.
Technology stack details for the techies
For the prediction of the cost per click we use the ARIMA models: “AutoRegressive Integrated Moving Average”. See more here.
With the ARIMA model, two predictions are executed: one for the CPC from today until 13 days into the future and one from 14 days ago until yesterday. The prediction of the past is used to verify the prediction (because we can immediately compare it to the actual CPC).
For both predictions, we use the 45 data points preceding the first day of the prediction. In other words, for the future prediction we use the data from 45 days ago until yesterday to predict today until 13 days into the future. And for the verification prediction, we use the data from 59 days ago until 15 days ago.
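The date arithmetic for the two runs can be written out as a small sketch. The function names and the example date are hypothetical; only the day offsets come from the description above.

```python
from datetime import date, timedelta


def prediction_windows(today):
    """Return inclusive (start, end) date ranges for both prediction runs."""
    def day(offset):
        return today + timedelta(days=offset)

    return {
        # Train on 45 days up to yesterday, predict today .. 13 days ahead.
        "future": {"train": (day(-45), day(-1)), "predict": (day(0), day(13))},
        # Train on 59 .. 15 days ago, predict 14 days ago .. yesterday.
        "verification": {
            "train": (day(-59), day(-15)),
            "predict": (day(-14), day(-1)),
        },
    }


def days_in(window):
    """Number of days in an inclusive (start, end) range."""
    start, end = window
    return (end - start).days + 1


windows = prediction_windows(date(2019, 10, 28))
```

Both training windows contain 45 days and both prediction windows contain 14 days, so the verification run is the same setup shifted 14 days back.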
For the anomaly detection we use scipy.signal.find_peaks. See more here.
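A minimal sketch of how scipy.signal.find_peaks can flag a spike in a daily CPC series. The data and the prominence value of 0.3 are assumptions for illustration, not our tuned settings.

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up daily CPC values with one obvious spike on day index 4.
cpc = np.array([0.40, 0.41, 0.39, 0.42, 0.95, 0.41, 0.40, 0.43, 0.38, 0.41])

# A peak counts as an anomaly when it stands out from its surroundings
# by at least `prominence`; 0.3 is an assumed threshold for this example.
spikes, properties = find_peaks(cpc, prominence=0.3)

# find_peaks only looks for maxima, so negate the series to find drops.
drops, _ = find_peaks(-cpc, prominence=0.3)
```

Here spikes contains only index 4, and drops is empty because no day falls far enough below its neighbours. The small day-to-day wiggles are ignored because their prominence is well under the threshold.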
For the infrastructure of the Predictor we use a Google Kubernetes Engine cluster. See more here.
For the data collection and preparation, we use Cervinodata and for visualisation we use Klipfolio.
Furthermore, we use the Klipfolio API to copy the dashboards, so we can deploy them for multiple clients quickly and easily.
Cervinodata makes it easy for online marketing agencies and mid-sized organisations to gain better and faster insight in the performance of their advertising campaigns and websites.
We add intelligence to our products in a simple-to-use way, so that not only the 1% who are data scientists can use it, but also the 99% of non-technical professionals who need it for their decision making.
We believe that when marketers combine their gut feeling with the right numbers, they can dramatically increase their performance. Feel free to contact me at email@example.com if you want to know more about our CPC prediction or if you want to try it yourself.
Cervinodata is a product of Cervino Marketing. Cervino Marketing has been a Klipfolio partner since early 2014.
Originally published October 28, 2019, updated November 26, 2019