Updated: Jan 24
In a world where coding is becoming less essential for development, mastering automation tools and their logic is increasingly important. Tools like Zapier, Make, and others offer a great way to execute ideas and develop processes, just as you would with code.
The primary benefit of mastering an automation tool is that it can save you a lot of time. These tools come with pre-defined integrations, triggers, and actions that you can select with a few clicks. Instead of spending time studying a platform’s API documentation, you can simply sign up for an automation platform and start developing immediately.
Of course, there are some disadvantages to choosing an automation tool over classic coding development. Which is mainly the lack of flexibility. However, for certain situations, an automation tool can provide all the necessary components for a successful project.
It’s really life-changing for those who enjoy playing with automations. There are unlimited ways to track data from all over the internet, but also for day-to-day work: developing processes, creating alerts, communicating between platforms, etc. Automation tools can also help us develop ETL processes, which are essential for every business that works with data (and what business doesn’t work with data?).
Let’s start with the basics. ETL stands for Extract, Transform, and Load. These are the three main steps the data goes through when moving from one place to another. In some cases, the ‘Transform’ step comes after the ‘Load’, resulting in ‘ELT’: Extract, Load, and then Transform. This is valid, as long as the data is delivered in the correct format.
As mentioned, today you don’t need to be an expert in Python or any other coding language to develop ETL processes. You can use one of the great automation tools available to build the perfect ETL process from start to finish without any coding knowledge. Let’s see how you can develop a process, for example, to get automated reports from Random Data API and create a fake beers collection in our private Google spreadsheet.
We’ll keep it simple. The process will include only three main modules, which are the three steps we discussed earlier: Extract, Transform, and Load. So, the scenario on Make will look something like this:
Extract data from the source using a HTTP request.
Transform the data and modify it if needed.
Load the data to the spreadsheet in a certain structure.
Let’s go ahead and start with the development process.
To begin, we must establish a connection with the data source so we can extract data from it. Since it is dummy data from an open API source, we do not need any special authentication; we just need to use the URL and structure provided. In our case, we want to get data about imaginary beers, so we will use a URL that looks like this (all information is provided in the Random Data documentation):
We would like to limit our results to 50 per API call. The Random Data API provides a limit parameter for the HTTP request, as follows:
We would like to format the data in JSON. The source offers both JSON and XML formats, so our request will be:
After creating the desired URL, add it to the scenario using the “HTTP Request” module. Set the “Method” to “GET” and leave all other fields blank. It should look like this:
Now, we can run the scenario once to view its results. If it runs successfully, we’ll be able to expand the “Data” tab. If the scenario fails, we’ll see an error instead of the data.
We can click the plus icon next to “Data” to view the request result. Don’t worry if it’s confusing; it’s raw data in JSON format. We need to transform and organize it before we can use it. The goal is to make sure the raw data is received, so we can move on to the next step.
Then, we need to connect this module with the HTTP request we already have, and add the “Data” field in the parse.
Now that we have parsed the data from a long JSON query into separate values, we can begin transforming it. First, we need to map all fields we receive from the source. To do this, we can run the scenario once and observe the results.
ID - Number
UID - Text
Brand - Text
Name - Text
Style - Text
Hop - Text
Yeast - Text
Malts - Text
IBU - Text
Alcohol - Text
BLG - Text
For our example, we want all fields to appear in our destination. To transform the data, we’ll use the “Set Multiple Variables” module, found in the “Tools” modules.
Now, we can populate the correct values into the appropriate fields. Most of the fields will be filled in as we receive them, but we have also decided to configure the following fields as numeric:
We currently receive them with accompanying text. As seen earlier, they appear as follows:
IBU = “67 IBU” (The acronym "IBU" is associated with the number.)
Alcohol = “4.5%” (The percentage is attached to the number.)
BLG = “7.6°Blg” (The degree sign (°) and the word "Blg" are attached.)
To display those values numerically, we will use a function to convert it.
I used the parseNumber() function to remove all non-numeric characters from the values received. This function parses a string as a number. To convert the alcohol percentage, I multiplied it by 0.01 to get the correct format. This enables us to perform calculations on that field later.
We can now run the scenario and observe the data we receive.
We are now receiving the numbers in the desired formats; they are clean and ready to use.
Please note — this is not the only way to transform the data received in the HTTP request. We can also transform the data straight in the Google Sheet’s module, saving a step in our scenario. However, for this example, we’ll do it the classic way: completely transform the data and only then populate it directly to the destination.
The final step in our process, as the ETL guide us, is loading the data to our destination.
We will store data in Google Sheets, according to the fields we receive. Before using it, we need to create the spreadsheet and add column headers, such as:
We need to consider what to do when we receive data on a beer that is already in our spreadsheet. We don’t want to have duplicated records. We have two options then:
Before adding a new record, we'll check for the beer's ID. If it exists, we will update the row instead of creating a new one.
We will add rows in all cases, but will include a timestamp column to detect the newest row and use only that.
For our example, we'll use the first option mentioned. We'll look for the beer in our destination and use a router for two scenarios: if it exists, and if it doesn't. For that, we’ll use the module “Search Rows” of Google Sheets.
When configuring the module, we’ll use the field “UID” to search for the specific record in our destination. Every beer has a unique UID, so we cannot have duplicated values of those. The module’s configuration should look like this:
The maximum number of return rows should be "1", as there are no duplicates.
We need to add a router to detect whether the last module found records or not. If it found a record, it should update the row. If it didn't, it should create a new one.
Our condition for the router is the value of the "Total number of bundles" generated by the Google Sheets "Search Rows" module. If it found the record, the number of bundles must be greater than zero. If it didn't find anything, it must be 0.
When using a certain condition in a router, there's no need to configure it in both routes. We can just mark one as the "Fallback Route". To do this, simply select the "Yes" button.
Once the first condition is not met, the other route is executed by default. We now have two routes connected to the router, so we can set them up.
Since our data is already in the required format, we just need to map the values to the desired locations in the spreadsheet for both modules. The only difference between the two is that for the "Update a Row" module, we need to add the row number from the previous module (Search Rows) so the system knows which row to modify.
We will use the values from the "Set Multiple Variables" module as is, as follows:
We'll now repeat the same process for the "Add a Row" module.
We're finished. When we run the scenario, the data will be retrieved from the source, transformed by our automation, and sent to the spreadsheet. Let's give it a go.
We can see that the data arrived at its destination in the desired state: clean, organized, and ready for analysis.
This is a basic example of what Make (but basically any other cool automation platform) can do. Of course, it can become much more complex, but the logic remains the same: obtain the data you need in any way that works, transform and clean it so it is suitable for your destination, and finally, load it to your destination.
As mentioned before, this is not the only way to execute the automation we just did. We could do the same in an “ELT” process, first loading it, then transforming the data on the spreadsheet’s side. Keep in mind that there is no wrong or right here, as long as the process works. Get the data you need from one place to another in an organized and readable way. You can optimize every step of the process later on, even after the automation is going live.
Do you use an automation tool to create interesting ETL processes? Do you have any cool ideas for an automation or data set? Please contact me; I would love to hear about them.