Data Engineering with dbt Core
Background Previously, I watched and blogged a...
As I’m looking for ways to learn more about data analytics and engineering, I stumbled upon the End-to-End Data Engineering Project course on LinkedIn Learning. The course is very practical and shows how to build a data pipeline using modern tools and techniques. The author uses macOS and I’m using Windows, so as I followed along with the course, I ran into issues setting up the same pipeline. Because of this, I decided to write this article to show the steps for setting up the data pipeline on Windows.
The data pipeline is basically made up of the following: a Postgres database running in Docker, dbt Core for transforming the data, and Dagster for orchestrating the pipeline.
First, you have to download Docker Desktop for Windows; the complete instructions are on this page. Make sure that the Docker-WSL integration is enabled. WSL (Windows Subsystem for Linux) lets us run Linux on Windows. I used the default distribution included with WSL, which is Ubuntu 22.04 on my machine.
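As a quick sanity check, here is a minimal sketch (assuming the default Ubuntu distribution) for confirming that WSL and the Docker-WSL integration are working:

# From Windows PowerShell: list the WSL distributions and confirm Ubuntu runs on WSL 2
$ wsl --list --verbose

# From the Ubuntu terminal: confirm the Docker CLI is available through the integration
$ docker --version
$ docker run hello-world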
Next, we set up our Python environment to use dbt and Dagster for the project. Again, we use the Ubuntu terminal, as dbt throws some errors when running on Windows.
The tutorial uses venv to create a virtual environment. I decided to use virtualenv and virtualenvwrapper instead, as I want my Ubuntu installation to have the usual Python setup that I use. This way, I can create and switch between different virtual environments inside my Ubuntu installation in case I work on another project.
I followed this tutorial in order to set up my Python environment on Ubuntu.
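For reference, here is a rough sketch of the environment setup I ended up with. The exact path to virtualenvwrapper.sh and the package names are assumptions that may differ on your machine; dbt-postgres and the Dagster packages reflect the stack used in the course:

# Install virtualenv and virtualenvwrapper for the Ubuntu user
$ pip3 install --user virtualenv virtualenvwrapper

# Add these lines to ~/.bashrc (the virtualenvwrapper.sh path may vary), then restart the shell
$ export WORKON_HOME=$HOME/.virtualenvs
$ export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
$ source ~/.local/bin/virtualenvwrapper.sh

# Create and activate a virtual environment for the project
$ mkvirtualenv dbt-dagster

# Install dbt and Dagster inside the environment
$ pip install dbt-core dbt-postgres dagster dagster-webserver dagster-dbt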
After setting up Ubuntu and Python, the remaining steps are basically the same as in the tutorial. However, I ran all of the author's scripts and commands from the Ubuntu terminal instead of the Windows command prompt.
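For example, the dbt and Dagster commands from the course run the same way inside the Ubuntu terminal; the environment name below is just the one I created earlier:

# Activate the project's virtual environment
$ workon dbt-dagster

# From the dbt project directory: verify the connection, then build the models
$ dbt debug
$ dbt run

# Start the Dagster UI (by default at http://localhost:3000)
$ dagster dev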
Take note also that the author provided a different Docker image for the Postgres database, as the original image used in the tutorial is built for Docker on Apple Silicon. The alternative image can be pulled and run as follows:
$ docker run --name big-star-container -d -p 5432:5432 lilearningproject/big-star-postgres-multi:latest -c "wal_level=logical"
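To check that the container came up and that the wal_level setting took effect, something like the following should work (the postgres superuser name is an assumption; check the course material for the image's actual credentials):

# Confirm the container is running and the port is published
$ docker ps --filter "name=big-star-container"

# Run psql inside the container and confirm logical replication is enabled
$ docker exec -it big-star-container psql -U postgres -c "SHOW wal_level;"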
Using Ubuntu on Windows allowed me to follow along with the tutorial and build a data pipeline using a modern data engineering stack.
I’m looking to apply what I learned from this tutorial to build another data pipeline using other tools like Fivetran, AWS, etc. Of course, I’ll write up the steps and possibly create a tutorial to showcase alternative architectures.
This article is originally published in this...