Extract, Transform, Load!

Before starting to work on a dashboard, it’s important to know what exactly you want to report on. Reporting and building dashboards are, after all, simply a way to answer a general question such as “how many sales did we make this week?” and “how many calls are still in the queue?”. However, what you can report on depends heavily on the type of data you can source, how it’s formatted, and how you pipe it to your dashboard. This blog post discusses the concept of ETL (extract, transform, load) and how we deal with it at A. Works Enterprises.

Extracting source data

Let’s set aside for a minute that one must always start by considering the source of one’s data, in that it should be qualitative, relatively recent, and actually provide an answer to the question being asked. After all, a data set with irrelevant, old data points is not going to provide accurate answers to any questions. This post won’t be going into assessing the quality of a data set.

Two important considerations one must have when assessing the source of a data is the format the data comes in, and how it is being transferred.

The source format pertains to the actual structure of the data. For example is it a list of names, an Excel sheet, CSV (comma separated values), JSON (JavaScript Object Notation), XML (Extensible markup language), etc. The format is important because you’ll need to keep this in mind when loading data into the reporting tool as not all tools accept all data formats.

The transfer method is also important as some data sets are kept behind passwords or encryption schemes. Also, not all reporting tools have pre-built integration modules for all types of data. For example, PowerBI does support JSON formatted data, but does not come with a MongoDB module built in. If your goal is to extract data from a MongoDB setup you’ll need to find a way to bridge this data to PowerBI (which, by the way, is one of the things we do here 😉).

Transform to make data useful

Source data is not always ready to be used. Take for example the below (fictitious) excerpt of an export from an e-commerce platform that contains all purchases made in the past year.

"order id": "0001",
"purchaser": "a---@a---.com",
"total order amount": 2700,
"items": [{
     "item": "macbook pro",
     "price": 2600
     }, {
     "item": "logitech mx 3",
     "price": 100
}, {
"order id": "0002",
"purchaser": "b---@b---.com",
"total order amount": 300,
"items": [{
     "item": "Beats headphones",
     "price": 300

The example shows a JSON data structure with the “order id” key as the identifying item per node. Sending this straight through to a reporting tool – assuming the connection could be made considering the security and privacy implications – you’d be able to quite easily count the number of orders and the sum up the total order amount. You could even add a date filter to see the number of orders and total order amount in a given period.

But what if you’d like to see how many individual items were purchased? Or how many of a given product type were purchased? Or your revenue of headphone sales in a given period? To be able to display such information you’ll need to transform your data set into something that can be visualized, for example in to a list format instead of JSON notation, as illustrated below.

"order id","date","item","price"
"0001","03-05-2020T09:26:49","macbook pro",2600
"0001","03-05-2020T09:26:49","logitech mx 3",100
"0002","13-05-2020T6:42:23","Beats headphones",300

Depending on the reporting tool you’re using, and arguably also the amount of effort you’re willing to put into the transformation exercise, you could import the above JSON data and convert it on the fly in your tool. The downside to this is that it can take quite a bit of patience (and experience) to figure out how to correctly convert the data within your tool, not to mention how much time it’ll take your device to process and format potentially millions of lines of data.

Using an effective ETL tool such as ours not just extracts data from a(ny) source but also transforms data into the final format needed to do reporting. This means that a local device doesn’t need to do the heavy lifting, nor will you have to spend hours trying to bending yourself to the limits of your reporting software, and can quickly load the most recent data and display it. It’s fast and efficient (and with us for just one low monthly flat fee).

Load data into your tool

Having extracted and transformed source data, you’ll now need to pipe it to your tool. Limitations again occur when considering the import restrictions of the reporting tool you use. While many tools let you import transformed data in a variety of formats, it’s often not in an automated fashion.

Continuing with the above example, imagine having to download an export from an e-commerce platform, transforming the data in Excel, and then importing it into your reporting tool (hopefully just replacing an existing data set and not needing to make the entire dashboard again!). If you’re a marketing manager and need this sort of information on regular basis to make decisions, you’d be doing yourself a disservice by not automating the process. Even worse if you run something where live data is important, such as a call-center, and would have to manually see the status of your call pipeline 😅.

Again, an effective ETL tool (and again, just like ours is) will not just gather the source data and transform it, but also automate piping it to a dashboard so all your dash needs to do is refresh from time to time. After a simple setup process our tools completely automate the entire process from source to dashboard giving you live and ready-to-use data.

Bringing it all together

In other words, ETL is simply the process of taking data from one point and sending it somewhere else. Things get a bit more complicated when data needs to be transformed to fit a certain standard or transfer mechanism, and certainly when insights need to be found on a regular basis.

A good ETL setup will make your life a lot easier when it comes to reporting, and even make your reports a lot better and more relevant.