From b6cec40d8ba18817889d535ae53a30c2995c4529 Mon Sep 17 00:00:00 2001
From: Andrew Ridgway
Date: Thu, 25 Jul 2024 13:46:43 +1000
Subject: [PATCH] switching branches

---
 src/__pycache__/devpelconf.cpython-311.pyc | Bin 687 -> 687 bytes
 src/content/datahub_dbt_sources.md         |  10 +
 src/output/archives.html                   |   4 +-
 src/output/author/andrew-ridgway.html      |  15 +-
 src/output/authors.html                    |   2 +-
 .../category/business-intelligence.html    |   2 +-
 src/output/category/data-engineering.html  |  13 ++
 src/output/datahub-dbt-sources.html        | 172 ++++++++++++++++++
 src/output/feeds/all-en.atom.xml           |   2 +-
 src/output/feeds/all.atom.xml              |   2 +-
 src/output/feeds/andrew-ridgway.atom.xml   |   2 +-
 src/output/feeds/andrew-ridgway.rss.xml    |   2 +-
 .../feeds/business-intelligence.atom.xml   |   2 +-
 src/output/feeds/data-engineering.atom.xml |   2 +-
 src/output/index.html                      |  15 +-
 src/output/metabase-duckdb.html            |   4 +-
 src/output/tag/datahub.html                |   0
 src/output/tag/dbt.html                    |   0
 src/output/tags.html                       |   4 +-
 19 files changed, 239 insertions(+), 14 deletions(-)
 create mode 100644 src/content/datahub_dbt_sources.md
 create mode 100644 src/output/datahub-dbt-sources.html
 create mode 100644 src/output/tag/datahub.html
 create mode 100644 src/output/tag/dbt.html

diff --git a/src/__pycache__/devpelconf.cpython-311.pyc b/src/__pycache__/devpelconf.cpython-311.pyc
index 42fc1debba4b53d73bfa00cd6e0499e35e207af9..08bf8753d7b4944c4d349bfbf8fa93c53ed44347 100644
GIT binary patch
delta 20
acmZ3_x}KGLIWI340}xa!3EjxOgb4sN3I!4X

delta 20
acmZ3_x}KGLIWI340}$8>8*SuX!UO;=SOg;g

diff --git a/src/content/datahub_dbt_sources.md b/src/content/datahub_dbt_sources.md
new file mode 100644
index 0000000..4c4862d
--- /dev/null
+++ b/src/content/datahub_dbt_sources.md
@@ -0,0 +1,10 @@
+Title: Dynamically Generating a DBT sources.yml With Datahub
+Date: 2023-12-15 20:00
+Modified: 2023-12-15 20:00
+Category: Data Engineering
+Tags: data engineering, dbt, datahub
+Slug: datahub-dbt-sources
+Authors: Andrew Ridgway
+Summary: Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml
+
+I find that in our space the terms data catalog, data governance and data definitions can be dirty words. I challenge any data professional to deny that these are, at best, afterthoughts in a stack. Normally the technologies that govern these areas of a business's data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try to get multiple people in a room and have them agree on any given metric. As my current boss is fond of saying, "You get 3 people in a room to define how we measure a sale, and I will give you 3 completely different and yet valid answers". This is the core of the problem with data governance, catalogs and definitions... no one can agree, and the result is that engineers either ignore it... or put it on the back burner because it's going to be an awful experience.
diff --git a/src/output/archives.html b/src/output/archives.html
index 67e0951..94b91f5 100644
--- a/src/output/archives.html
+++ b/src/output/archives.html
@@ -82,7 +82,9 @@
-
Wed 18 October 2023
+
Fri 15 December 2023
+
Dynamically Generating a DBT sources.yml With Datahub
+
Wed 15 November 2023
Metabase and DuckDB
Tue 23 May 2023
Implmenting Appflow in a Production Datalake
diff --git a/src/output/author/andrew-ridgway.html b/src/output/author/andrew-ridgway.html index aa82a0f..bf46a24 100644 --- a/src/output/author/andrew-ridgway.html +++ b/src/output/author/andrew-ridgway.html @@ -81,6 +81,19 @@
+
+ +

+ Dynamically Generating a DBT sources.yml With Datahub +

+
+

Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml

+ +
+

@@ -90,7 +103,7 @@

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible


diff --git a/src/output/authors.html b/src/output/authors.html index 7861d42..634e2af 100644 --- a/src/output/authors.html +++ b/src/output/authors.html @@ -84,7 +84,7 @@

- Andrew Ridgway (3) + Andrew Ridgway (4)

diff --git a/src/output/category/business-intelligence.html b/src/output/category/business-intelligence.html index 834ab3e..bb4911e 100644 --- a/src/output/category/business-intelligence.html +++ b/src/output/category/business-intelligence.html @@ -91,7 +91,7 @@

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible


diff --git a/src/output/category/data-engineering.html b/src/output/category/data-engineering.html index a22f256..de5c21f 100644 --- a/src/output/category/data-engineering.html +++ b/src/output/category/data-engineering.html @@ -82,6 +82,19 @@
+
+ +

+ Dynamically Generating a DBT sources.yml With Datahub +

+
+

Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml

+ +
+

diff --git a/src/output/datahub-dbt-sources.html b/src/output/datahub-dbt-sources.html new file mode 100644 index 0000000..0c00b7c --- /dev/null +++ b/src/output/datahub-dbt-sources.html @@ -0,0 +1,172 @@ + + + + + + + + + + + Andrew Ridgway's Blog + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

Dynamically Generating a DBT sources.yml With Datahub

+ Posted by + Andrew Ridgway + on Fri 15 December 2023 + + +
+
+
+
+
+ + +
+
+
+ +
+

I find that in our space the terms data catalog, data governance and data definitions can be dirty words. I challenge any data professional to deny that these are, at best, afterthoughts in a stack. Normally the technologies that govern these areas of a business's data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try to get multiple people in a room and have them agree on any given metric. As my current boss is fond of saying, "You get 3 people in a room to define how we measure a sale, and I will give you 3 completely different and yet valid answers". This is the core of the problem with data governance, catalogs and definitions... no one can agree, and the result is that engineers either ignore it... or put it on the back burner because it's going to be an awful experience.

+
+ +
+ +
+
+
+ +
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/output/feeds/all-en.atom.xml b/src/output/feeds/all-en.atom.xml index fb2318e..0c981af 100644 --- a/src/output/feeds/all-en.atom.xml +++ b/src/output/feeds/all-en.atom.xml @@ -1,5 +1,5 @@ -Andrew Ridgway's Bloghttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +Andrew Ridgway's Bloghttp://localhost:8000/2023-12-15T20:00:00+10:00Dynamically Generating a DBT sources.yml With Datahub2023-12-15T20:00:00+10:002023-12-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-12-15:/datahub-dbt-sources.html<p>Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml</p><p>I find that in our space the terms data catalog, data governance and data definitions can be dirty terms. I challenge any data professional to not say that these are at best after thoughts in a stack. Normally technologies that govern these areas of businesses data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try and get multiple people in the room and get them to agree on any given metric. As my current boss is and has been fond of saying, "You get 3 people in the room to define how we measure a sale I will give you 3 completely different and yet valid answers". This is the core of the problem with Data governance, Catalogs and Definitions... no one can agree, and the result of this is engineers either ignore it... or put it on the back burner because its going to be an awful experience</p>Metabase and DuckDB2023-11-15T20:00:00+10:002023-11-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-11-15:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. 
It might even be fast enough that it could be deployed and embedded </p> <p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> <h3>The Beginnings of an Idea</h3> <p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> diff --git a/src/output/feeds/all.atom.xml b/src/output/feeds/all.atom.xml index 6c9a868..383ec8a 100644 --- a/src/output/feeds/all.atom.xml +++ b/src/output/feeds/all.atom.xml @@ -1,5 +1,5 @@ -Andrew Ridgway's Bloghttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +Andrew Ridgway's Bloghttp://localhost:8000/2023-12-15T20:00:00+10:00Dynamically Generating a DBT sources.yml With Datahub2023-12-15T20:00:00+10:002023-12-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-12-15:/datahub-dbt-sources.html<p>Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml</p><p>I find that in our space the terms data catalog, data governance and data definitions can be dirty terms. I challenge any data professional to not say that these are at best after thoughts in a stack. Normally technologies that govern these areas of businesses data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try and get multiple people in the room and get them to agree on any given metric. As my current boss is and has been fond of saying, "You get 3 people in the room to define how we measure a sale I will give you 3 completely different and yet valid answers". This is the core of the problem with Data governance, Catalogs and Definitions... no one can agree, and the result of this is engineers either ignore it... 
or put it on the back burner because its going to be an awful experience</p>Metabase and DuckDB2023-11-15T20:00:00+10:002023-11-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-11-15:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> <p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> <h3>The Beginnings of an Idea</h3> <p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> diff --git a/src/output/feeds/andrew-ridgway.atom.xml b/src/output/feeds/andrew-ridgway.atom.xml index 1ca8a78..0b533ab 100644 --- a/src/output/feeds/andrew-ridgway.atom.xml +++ b/src/output/feeds/andrew-ridgway.atom.xml @@ -1,5 +1,5 @@ -Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/2023-12-15T20:00:00+10:00Dynamically Generating a DBT sources.yml With Datahub2023-12-15T20:00:00+10:002023-12-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-12-15:/datahub-dbt-sources.html<p>Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml</p><p>I find that in our space the terms data catalog, data governance and data definitions can be dirty terms. 
I challenge any data professional to not say that these are at best after thoughts in a stack. Normally technologies that govern these areas of businesses data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try and get multiple people in the room and get them to agree on any given metric. As my current boss is and has been fond of saying, "You get 3 people in the room to define how we measure a sale I will give you 3 completely different and yet valid answers". This is the core of the problem with Data governance, Catalogs and Definitions... no one can agree, and the result of this is engineers either ignore it... or put it on the back burner because its going to be an awful experience</p>Metabase and DuckDB2023-11-15T20:00:00+10:002023-11-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-11-15:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> <p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> <h3>The Beginnings of an Idea</h3> <p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. 
But first lets take a quick look at the architecture we'll be employing here </p> diff --git a/src/output/feeds/andrew-ridgway.rss.xml b/src/output/feeds/andrew-ridgway.rss.xml index ffdc260..9f4c46e 100644 --- a/src/output/feeds/andrew-ridgway.rss.xml +++ b/src/output/feeds/andrew-ridgway.rss.xml @@ -1,2 +1,2 @@ -Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/Wed, 18 Oct 2023 20:00:00 +1000Metabase and DuckDBhttp://localhost:8000/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p>Andrew RidgwayWed, 18 Oct 2023 20:00:00 +1000tag:localhost,2023-10-18:/metabase-duckdb.htmlBusiness Intelligencedata engineeringMetabaseDuckDBembeddedImplmenting Appflow in a Production Datalakehttp://localhost:8000/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p>Andrew RidgwayTue, 23 May 2023 20:00:00 +1000tag:localhost,2023-05-23:/appflow-production.htmlData Engineeringdata engineeringAmazonManaged ServicesDawn of another blog attempthttp://localhost:8000/how-i-built-the-damn-thing.html<p>Containers and How I take my learnings from home and apply them to work</p>Andrew RidgwayWed, 10 May 2023 20:00:00 +1000tag:localhost,2023-05-10:/how-i-built-the-damn-thing.htmlData Engineeringdata engineeringcontainers \ No newline at end of file +Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/Fri, 15 Dec 2023 20:00:00 +1000Dynamically Generating a DBT sources.yml With Datahubhttp://localhost:8000/datahub-dbt-sources.html<p>Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml</p>Andrew RidgwayFri, 15 Dec 2023 20:00:00 +1000tag:localhost,2023-12-15:/datahub-dbt-sources.htmlData Engineeringdata engineeringdbtdatahubMetabase and DuckDBhttp://localhost:8000/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p>Andrew RidgwayWed, 15 Nov 2023 20:00:00 +1000tag:localhost,2023-11-15:/metabase-duckdb.htmlBusiness Intelligencedata engineeringMetabaseDuckDBembeddedImplmenting Appflow in a Production Datalakehttp://localhost:8000/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p>Andrew RidgwayTue, 23 May 2023 20:00:00 +1000tag:localhost,2023-05-23:/appflow-production.htmlData Engineeringdata engineeringAmazonManaged ServicesDawn of another blog attempthttp://localhost:8000/how-i-built-the-damn-thing.html<p>Containers and How I take my learnings from home and apply them to work</p>Andrew RidgwayWed, 10 May 2023 20:00:00 +1000tag:localhost,2023-05-10:/how-i-built-the-damn-thing.htmlData Engineeringdata engineeringcontainers \ No newline at end of file diff --git a/src/output/feeds/business-intelligence.atom.xml b/src/output/feeds/business-intelligence.atom.xml index aa85b8e..18b2805 100644 --- a/src/output/feeds/business-intelligence.atom.xml +++ b/src/output/feeds/business-intelligence.atom.xml @@ -1,5 +1,5 @@ -Andrew Ridgway's Blog - Business Intelligencehttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been 
hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +Andrew Ridgway's Blog - Business Intelligencehttp://localhost:8000/2023-11-15T20:00:00+10:00Metabase and DuckDB2023-11-15T20:00:00+10:002023-11-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-11-15:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> <p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> <h3>The Beginnings of an Idea</h3> <p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> diff --git a/src/output/feeds/data-engineering.atom.xml b/src/output/feeds/data-engineering.atom.xml index e8f8ab3..070b770 100644 --- a/src/output/feeds/data-engineering.atom.xml +++ b/src/output/feeds/data-engineering.atom.xml @@ -1,5 +1,5 @@ -Andrew Ridgway's Blog - Data Engineeringhttp://localhost:8000/2023-05-23T20:00:00+10:00Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. 
This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> +Andrew Ridgway's Blog - Data Engineeringhttp://localhost:8000/2023-12-15T20:00:00+10:00Dynamically Generating a DBT sources.yml With Datahub2023-12-15T20:00:00+10:002023-12-15T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-12-15:/datahub-dbt-sources.html<p>Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml</p><p>I find that in our space the terms data catalog, data governance and data definitions can be dirty terms. I challenge any data professional to not say that these are at best after thoughts in a stack. Normally technologies that govern these areas of businesses data architecture are the unsexy ones, and there are good reasons for this. It is not fun to try and get multiple people in the room and get them to agree on any given metric. As my current boss is and has been fond of saying, "You get 3 people in the room to define how we measure a sale I will give you 3 completely different and yet valid answers". This is the core of the problem with Data governance, Catalogs and Definitions... no one can agree, and the result of this is engineers either ignore it... or put it on the back burner because its going to be an awful experience</p>Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> <p>This was particularly interesting to me because I had recently finished creating and s3 datalake in AWS for the company I work for. Today, I finally put my first Appflow integration to the Datalake into production and I have to say there are some rough edges to the deployment but it has been more or less as described on the box. </p> <p>Over the course of the next few paragraphs I'd like to explain the thinking I had as I investigated the product and then ultimately why I chose a managed service for this over implementing something myself in python using Dagster which I have also spun up within our cluster on AWS.</p> <h3>Datalake Extraction Layer</h3> diff --git a/src/output/index.html b/src/output/index.html index beaf745..4ded18b 100644 --- a/src/output/index.html +++ b/src/output/index.html @@ -84,6 +84,19 @@
+
+ +

+ Dynamically Generating a DBT sources.yml With Datahub +

+
+

Leveraging the power of Datahub schemas to dynamically generate dbt sources.yml

+ +
+

@@ -93,7 +106,7 @@

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible


diff --git a/src/output/metabase-duckdb.html b/src/output/metabase-duckdb.html index 635b725..8310b88 100644 --- a/src/output/metabase-duckdb.html +++ b/src/output/metabase-duckdb.html @@ -52,7 +52,7 @@ - + @@ -91,7 +91,7 @@

Metabase and DuckDB

Posted by Andrew Ridgway - on Wed 18 October 2023 + on Wed 15 November 2023
diff --git a/src/output/tag/datahub.html b/src/output/tag/datahub.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tag/dbt.html b/src/output/tag/dbt.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tags.html b/src/output/tags.html index 3bb320e..39774c6 100644 --- a/src/output/tags.html +++ b/src/output/tags.html @@ -83,7 +83,9 @@

Tags for Andrew Ridgway's Blog

  • Amazon (1)
  • containers (1)
  • -
  • data engineering (3)
  • +
  • data engineering (4)
  • +
  • datahub (1)
  • +
  • dbt (1)
  • DuckDB (1)
  • embedded (1)
  • Managed Services (1)