Title: CI/CD in Data Engineering
Date: 2023-06-15 20:00
Modified: 2023-06-15 20:00
Category: Data Engineering
Tags: data engineering, DBT, Terraform, IAC
Slug: CI/CD in Data and Data Infrastructure
Authors: Andrew Ridgway
Summary: When to use IaC CI/CD techniques or Software CI/CD techniques in Data Architecture

Data Engineering has traditionally been considered the bastard step child of work that would once have been considered administrative in the tech world. Predominantly we write SQL and then deploy that SQL onto one or more databases. In fact, a lot of the traditional methodologies around data almost assume this is the core of how an organisation manages the majority of its data. In the last couple of years, though, there has been a very steady move towards bringing Software Engineering techniques to the Data Engineering SQL workload. With the popularity of tools like [DBT](https://www.dbtlabs.com) and the latest newcomer on the block, [SQL-MESH](https://www.sql-mesh.com), the opportunity has started to arise to align our Data Engineering workloads with different environments and move much more efficiently towards a Continuous Integration and Deployment methodology in our workflows.
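To make the "different environments" idea concrete, here's a minimal sketch of what that looks like in DBT: a `profiles.yml` with separate `dev` and `prod` targets, so the same SQL models can be built against a development schema locally and a production schema from a CI/CD pipeline. The project name, hosts, and credentials below are hypothetical placeholders, not a prescription:

```yaml
# profiles.yml -- hypothetical project; adjust type/host/creds to your warehouse
my_analytics_project:
  target: dev          # default target for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      user: analyst
      password: "{{ env_var('DBT_PASSWORD') }}"   # keep secrets out of the file
      port: 5432
      dbname: analytics
      schema: dbt_dev    # each dev builds into an isolated schema
      threads: 4
    prod:
      type: postgres
      host: prod-db.internal
      user: dbt_ci
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: analytics  # production schema, deployed only via the pipeline
      threads: 8
```

A CI job then just runs `dbt build --target prod`, which is what lets the deployment step look like any other software release rather than hand-run SQL.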
For the Data Engineering space, the move to the cloud has been a breath of fresh air (not so for some other IT disciplines). I am relatively young, so I don't remember it all first hand, but my experience has taught me that not so long ago there were three options:
_Expensive:_
+ SAS
+ SSIS/SSRS
+ COGNOS/TM1
_Rickety:_
+ Just write stored procedures!
+ Startup script on my laptop XD
+ "Don't touch that machine over there. No one knows what it does, but if it's turned off our financial reports don't work" (this is a third-hand story I heard, seriously!)
_Hard:_
+ Hadoop
+ Spark (hadoop but whatever)
+ Python
+ R
_(The reason I've listed these as hard is that self-hosting Hadoop/Spark and managing a truckload of Python or R scripts, whilst it could have been "cheap", required a team of devs who really, really knew what they were doing... so not really cheap, and also **really hard**.)_