Title: CI/CD in Data Engineering
Date: 2023-06-15 20:00
Modified: 2023-06-15 20:00
Category: Data Engineering
Tags: data engineering, DBT, Terraform, IAC
Slug: CI/CD in Data and Data Infrastructure
Authors: Andrew Ridgway
Summary: When to use IaC CI/CD techniques or Software CI/CD techniques in Data Architecture

Data Engineering has traditionally been considered the bastard step child of work that would once have been considered administrative in the tech world. Predominantly we write SQL and then deploy that SQL onto one or more databases. In fact, a lot of the traditional methodologies around data almost assume this is the core of how an organisation manages the majority of its data. In the last couple of years, though, there has been a very steady move towards bringing Software Engineering techniques to the Data Engineering SQL workload. With the popularity of tools like [DBT](https://www.dbtlabs.com) and the latest newcomer on the block, [SQL-MESH](https://www.sql-mesh.com), the opportunity has started to arise to align our Data Engineering workloads with different environments and move much more efficiently towards a Continuous Integration and Deployment methodology in our workflows.
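To make the "different environments" idea concrete, here's a minimal sketch of what that looks like in DBT: a `profiles.yml` with separate `dev` and `prod` targets, so the same SQL models can be built against a development schema locally and a production schema from a CI/CD pipeline. The project name, hosts, and credentials below are hypothetical placeholders, not a prescription:

```yaml
# profiles.yml -- hypothetical project; adjust type/host/creds to your warehouse
my_analytics_project:
  target: dev          # default target for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      user: analyst
      password: "{{ env_var('DBT_PASSWORD') }}"   # keep secrets out of the file
      port: 5432
      dbname: analytics
      schema: dbt_dev    # each dev builds into an isolated schema
      threads: 4
    prod:
      type: postgres
      host: prod-db.internal
      user: dbt_ci
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: analytics  # production schema, deployed only via the pipeline
      threads: 8
```

A CI job then just runs `dbt build --target prod`, which is what lets the deployment step look like any other software release rather than hand-run SQL.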
For the Data Engineering space, the move to the cloud has been a breath of fresh air (not so for some other IT disciplines). I am relatively young, so I don't remember it all first hand, but my experience has taught me that not so long ago there were three options:
_Expensive:_
+ SAS
+ SSIS/SSRS
+ COGNOS/TM1
_Rickety:_
+ Just write stored procedures!
+ Startup script on my laptop XD
+ "Don't touch that machine over there. No one knows what it does, but if it's turned off our financial reports don't work" (this is a third-hand story I heard, seriously!)
_Hard:_
+ Hadoop
+ Spark (hadoop but whatever)
+ Python
+ R
_(The reason I've listed these as hard is that self-hosting Hadoop/Spark and managing a truckload of Python or R scripts, whilst it could have been "cheap", required a team of devs who really, really knew what they were doing... so not really cheap, and also **really hard**.)_