diff --git a/requirements.txt b/requirements.txt index f7544ea..f274a1c 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +1,2 @@ pelican[markdown] +markdown-markup-emoji diff --git a/src/__pycache__/devpelconf.cpython-311.pyc b/src/__pycache__/devpelconf.cpython-311.pyc index 9a3ff74..42fc1de 100644 Binary files a/src/__pycache__/devpelconf.cpython-311.pyc and b/src/__pycache__/devpelconf.cpython-311.pyc differ diff --git a/src/content/images/metabase_duckdb.png b/src/content/images/metabase_duckdb.png new file mode 100644 index 0000000..f2e9917 Binary files /dev/null and b/src/content/images/metabase_duckdb.png differ diff --git a/src/content/metabase_duckdb.md b/src/content/metabase_duckdb.md new file mode 100644 index 0000000..c5b9b7c --- /dev/null +++ b/src/content/metabase_duckdb.md @@ -0,0 +1,24 @@ +Title: Metabase and DuckDB +Date: 2023-10-18 20:00 +Modified: 2023-10-18 20:00 +Category: Business Intelligence +Tags: data engineering, Metabase, DuckDB, embedded +Slug: metabase-duckdb +Authors: Andrew Ridgway +Summary: Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible + +Ahhhh [DuckDB](https://duckdb.org/), if you're even partly floating around in the data space you've probably been hearing a LOT about it and its _"Datawarehouse on your laptop"_ mantra. However, the OTHER application that sometimes gets missed is _"SQLite for OLAP workloads"_, and it was this concept that, once I grasped it, gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to the presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded. + +However, for this to work we need some form of containerised reporting application.... lucky for us there is [Metabase](https://www.metabase.com/), which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has an admin UI accessible over a web page, all whilst keeping the data locked to our network? + +### The Beginnings of an Idea +Ok so... Big first question. Can DuckDB and Metabase talk? Well... not quite. But first let's take a quick look at the architecture we'll be employing here. + +Duckdb Architecture + +But you'll notice this pretty glossed-over line, "Connector", and that right there is the clincher. So what is this "Connector"? + +To deep dive into this would take a whole blog, so to give you something to quickly wrap your head around: it's the glue that will make Metabase able to query your data source. + + + diff --git a/src/content/notebook_or_reporting.md b/src/content/notebook_or_reporting.md deleted file mode 100644 index c8ebb87..0000000 --- a/src/content/notebook_or_reporting.md +++ /dev/null @@ -1,13 +0,0 @@ -Title: Notebook or BI, What is the most appropiate communication medium -Date: 2023-07-13 20:00 -Modified: 2023-07-13 20:00 -Category: Data Analytics -Tags: data engineering, Data Analytics -Slug: notebook-or-bi -Authors: Andrew Ridgway -Summary: When is a notebook enough or when do we need a dashboard - -I want to preface this post by saying I think "Dashboards" or "BI" as terms are wayyyyyyyyyyyyyyyyy over saturated in the market.
There seems to be a belief that any question answerable in data deserves the work associated with a dashboard when in fact a simple one off report, or notebook, would be more than enough. - - - diff --git a/src/devpelconf.py b/src/devpelconf.py index 40ca6f9..5a00b6b 100644 --- a/src/devpelconf.py +++ b/src/devpelconf.py @@ -16,6 +16,5 @@ TWITTER_URL = 'https://twitter.com/ar17787' FACEBOOK_URL = 'https://facebook.com/ar17787' DEFAULT_PAGINATION = 10 - # Uncomment following line if you want document-relative URLs when developing #RELATIVE_URLS = True diff --git a/src/output/archives.html b/src/output/archives.html index 9afb705..67e0951 100644 --- a/src/output/archives.html +++ b/src/output/archives.html @@ -82,6 +82,8 @@
+
Wed 18 October 2023
+
Metabase and DuckDB
Tue 23 May 2023
Implmenting Appflow in a Production Datalake
Wed 10 May 2023
diff --git a/src/output/author/andrew-ridgway.html b/src/output/author/andrew-ridgway.html index 532c985..aa82a0f 100644 --- a/src/output/author/andrew-ridgway.html +++ b/src/output/author/andrew-ridgway.html @@ -81,6 +81,19 @@
+
+ +

+ Metabase and DuckDB +

+
+

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible

+ +
+

diff --git a/src/output/authors.html b/src/output/authors.html index 873b0ce..7861d42 100644 --- a/src/output/authors.html +++ b/src/output/authors.html @@ -84,7 +84,7 @@

- Andrew Ridgway (2) + Andrew Ridgway (3)

diff --git a/src/output/categories.html b/src/output/categories.html index f85145b..0ea8d98 100644 --- a/src/output/categories.html +++ b/src/output/categories.html @@ -82,6 +82,7 @@
diff --git a/src/output/category/business-intelligence.html b/src/output/category/business-intelligence.html new file mode 100644 index 0000000..834ab3e --- /dev/null +++ b/src/output/category/business-intelligence.html @@ -0,0 +1,165 @@ + + + + + + + + + + + Andrew Ridgway's Blog - Articles in the Business Intelligence category + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

Articles in the Business Intelligence category

+
+
+
+
+
+ + +
+
+
+
+ +

+ Metabase and DuckDB +

+
+

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible

+ +
+
+ + +
    + +
+ Page 1 / 1 +
+
+
+
+ +
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/output/category/data-analytics.html b/src/output/category/data-analytics.html new file mode 100644 index 0000000..17ca3ee --- /dev/null +++ b/src/output/category/data-analytics.html @@ -0,0 +1,165 @@ + + + + + + + + + + + Andrew Ridgway's Blog - Articles in the Data Analytics category + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

Articles in the Data Analytics category

+
+
+
+
+
+ + +
+
+
+
+ +

+ Notebook or BI, What is the most appropiate communication medium +

+
+

When is a notebook enough or when do we need a dashboard

+ +
+
+ + +
    + +
+ Page 1 / 1 +
+
+
+
+ +
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/output/feeds/all-en.atom.xml b/src/output/feeds/all-en.atom.xml index caf81de..4be07e6 100644 --- a/src/output/feeds/all-en.atom.xml +++ b/src/output/feeds/all-en.atom.xml @@ -1,5 +1,11 @@ -Andrew Ridgway's Bloghttp://localhost:8000/2023-05-23T20:00:00+10:00Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> +Andrew Ridgway's Bloghttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +<p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> +<h3>The Beginnings of an Idea</h3> +<p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> +<p><img alt="Duckdb Architecture" height="auto" width="100%" src="http://localhost:8000/images/metabase_duckdb.png"></p> +<p>But you'll notice this pretty glossed over line, "Connector", that right there is the clincher. So what is this "Connector"?. </p> +<p>To Deep dive into this would take a whole blog so to give you something to quickly wrap your head around its the glue that will make metabase be able to query your data source. 
</p>Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> <p>This was particularly interesting to me because I had recently finished creating and s3 datalake in AWS for the company I work for. Today, I finally put my first Appflow integration to the Datalake into production and I have to say there are some rough edges to the deployment but it has been more or less as described on the box. </p> <p>Over the course of the next few paragraphs I'd like to explain the thinking I had as I investigated the product and then ultimately why I chose a managed service for this over implementing something myself in python using Dagster which I have also spun up within our cluster on AWS.</p> <h3>Datalake Extraction Layer</h3> diff --git a/src/output/feeds/all.atom.xml b/src/output/feeds/all.atom.xml index f58c080..b8aa0b8 100644 --- a/src/output/feeds/all.atom.xml +++ b/src/output/feeds/all.atom.xml @@ -1,5 +1,11 @@ -Andrew Ridgway's Bloghttp://localhost:8000/2023-05-23T20:00:00+10:00Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> +Andrew Ridgway's Bloghttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. 
It might even be fast enough that it could be deployed and embedded </p> +<p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> +<h3>The Beginnings of an Idea</h3> +<p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> +<p><img alt="Duckdb Architecture" height="auto" width="100%" src="http://localhost:8000/images/metabase_duckdb.png"></p> +<p>But you'll notice this pretty glossed over line, "Connector", that right there is the clincher. So what is this "Connector"?. </p> +<p>To Deep dive into this would take a whole blog so to give you something to quickly wrap your head around its the glue that will make metabase be able to query your data source. </p>Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> <p>This was particularly interesting to me because I had recently finished creating and s3 datalake in AWS for the company I work for. Today, I finally put my first Appflow integration to the Datalake into production and I have to say there are some rough edges to the deployment but it has been more or less as described on the box. </p> <p>Over the course of the next few paragraphs I'd like to explain the thinking I had as I investigated the product and then ultimately why I chose a managed service for this over implementing something myself in python using Dagster which I have also spun up within our cluster on AWS.</p> <h3>Datalake Extraction Layer</h3> diff --git a/src/output/feeds/andrew-ridgway.atom.xml b/src/output/feeds/andrew-ridgway.atom.xml index 5cd016e..0cab1e8 100644 --- a/src/output/feeds/andrew-ridgway.atom.xml +++ b/src/output/feeds/andrew-ridgway.atom.xml @@ -1,5 +1,11 @@ -Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/2023-05-23T20:00:00+10:00Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. 
At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> +Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +<p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> +<h3>The Beginnings of an Idea</h3> +<p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> +<p><img alt="Duckdb Architecture" height="auto" width="100%" src="http://localhost:8000/images/metabase_duckdb.png"></p> +<p>But you'll notice this pretty glossed over line, "Connector", that right there is the clincher. So what is this "Connector"?. </p> +<p>To Deep dive into this would take a whole blog so to give you something to quickly wrap your head around its the glue that will make metabase be able to query your data source. </p>Implmenting Appflow in a Production Datalake2023-05-23T20:00:00+10:002023-05-17T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-05-23:/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p><p>I recently attended a meetup where there was a talk by an AWS spokesperson. Now don't get me wrong, I normally take these things with a grain of salt. At this talk there was this tiny tiny little segment about a product that AWS had released called <a href="https://aws.amazon.com/appflow/">Amazon Appflow</a>. 
This product <em>claimed</em> to be able to automate and make easy the link between different API endpoints, REST or otherwise and send that data to another point, whether that is Redshift, Aurora, a general relational db in RDS or otherwise or s3.</p> <p>This was particularly interesting to me because I had recently finished creating and s3 datalake in AWS for the company I work for. Today, I finally put my first Appflow integration to the Datalake into production and I have to say there are some rough edges to the deployment but it has been more or less as described on the box. </p> <p>Over the course of the next few paragraphs I'd like to explain the thinking I had as I investigated the product and then ultimately why I chose a managed service for this over implementing something myself in python using Dagster which I have also spun up within our cluster on AWS.</p> <h3>Datalake Extraction Layer</h3> diff --git a/src/output/feeds/andrew-ridgway.rss.xml b/src/output/feeds/andrew-ridgway.rss.xml index 0e44639..ffdc260 100644 --- a/src/output/feeds/andrew-ridgway.rss.xml +++ b/src/output/feeds/andrew-ridgway.rss.xml @@ -1,2 +1,2 @@ -Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/Tue, 23 May 2023 20:00:00 +1000Implmenting Appflow in a Production Datalakehttp://localhost:8000/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p>Andrew RidgwayTue, 23 May 2023 20:00:00 +1000tag:localhost,2023-05-23:/appflow-production.htmlData Engineeringdata engineeringAmazonManaged ServicesDawn of another blog attempthttp://localhost:8000/how-i-built-the-damn-thing.html<p>Containers and How I take my learnings from home and apply them to work</p>Andrew RidgwayWed, 10 May 2023 20:00:00 +1000tag:localhost,2023-05-10:/how-i-built-the-damn-thing.htmlData Engineeringdata engineeringcontainers \ No newline at end of file +Andrew Ridgway's Blog - Andrew Ridgwayhttp://localhost:8000/Wed, 18 Oct 2023 20:00:00 +1000Metabase and DuckDBhttp://localhost:8000/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible</p>Andrew RidgwayWed, 18 Oct 2023 20:00:00 +1000tag:localhost,2023-10-18:/metabase-duckdb.htmlBusiness Intelligencedata engineeringMetabaseDuckDBembeddedImplmenting Appflow in a Production Datalakehttp://localhost:8000/appflow-production.html<p>How Appflow simplified a major extract layer and when I choose Managed Services</p>Andrew RidgwayTue, 23 May 2023 20:00:00 +1000tag:localhost,2023-05-23:/appflow-production.htmlData Engineeringdata engineeringAmazonManaged ServicesDawn of another blog attempthttp://localhost:8000/how-i-built-the-damn-thing.html<p>Containers and How I take my learnings from home and apply them to work</p>Andrew RidgwayWed, 10 May 2023 20:00:00 +1000tag:localhost,2023-05-10:/how-i-built-the-damn-thing.htmlData Engineeringdata engineeringcontainers \ No newline at end of file diff --git a/src/output/feeds/business-intelligence.atom.xml b/src/output/feeds/business-intelligence.atom.xml new file mode 100644 index 0000000..8e3cee6 --- /dev/null +++ b/src/output/feeds/business-intelligence.atom.xml @@ -0,0 +1,8 @@ + +Andrew Ridgway's Blog - Business Intelligencehttp://localhost:8000/2023-10-18T20:00:00+10:00Metabase and DuckDB2023-10-18T20:00:00+10:002023-10-18T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-10-18:/metabase-duckdb.html<p>Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the 
report as possible</p><p>Ahhhh <a href="https://duckdb.org/">DuckDB</a> if you're even partly floating around in the data space you've probably been hearing ALOT about it and it's <em>"Datawarehouse on your laptop"</em> mantra. However, the OTHER application that sometimes gets missed is <em>"SQLite for OLAP workloads"</em> and it was this concept that once I grasped it gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded </p> +<p>However, for this to work we need some form of conatinerised reporting application.... lucky for us there is <a href="https://www.metabase.com/">Metabase</a> which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has a admin UI accesible over a web page all whilst keeping the data locked to our network?</p> +<h3>The Beginnings of an Idea</h3> +<p>Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. But first lets take a quick look at the architecture we'll be employing here </p> +<p><img alt="Duckdb Architecture" height="auto" width="100%" src="http://localhost:8000/images/metabase_duckdb.png"></p> +<p>But you'll notice this pretty glossed over line, "Connector", that right there is the clincher. So what is this "Connector"?. </p> +<p>To Deep dive into this would take a whole blog so to give you something to quickly wrap your head around its the glue that will make metabase be able to query your data source. </p> \ No newline at end of file diff --git a/src/output/feeds/data-analytics.atom.xml b/src/output/feeds/data-analytics.atom.xml new file mode 100644 index 0000000..71aa539 --- /dev/null +++ b/src/output/feeds/data-analytics.atom.xml @@ -0,0 +1,2 @@ + +Andrew Ridgway's Blog - Data Analyticshttp://localhost:8000/2023-07-13T20:00:00+10:00Notebook or BI, What is the most appropiate communication medium2023-07-13T20:00:00+10:002023-07-13T20:00:00+10:00Andrew Ridgwaytag:localhost,2023-07-13:/notebook-or-bi.html<p>When is a notebook enough or when do we need a dashboard</p><p>I want to preface this post by saying I think "Dashboards" or "BI" as terms are wayyyyyyyyyyyyyyyyy over saturated in the market. There seems to be a belief that any question answerable in data deserves the work associated with a dashboard when in fact a simple one off report, or notebook, would be more than enough.</p> \ No newline at end of file diff --git a/src/output/images/metabase_duckdb.png b/src/output/images/metabase_duckdb.png new file mode 100644 index 0000000..f2e9917 Binary files /dev/null and b/src/output/images/metabase_duckdb.png differ diff --git a/src/output/index.html b/src/output/index.html index 649a84a..beaf745 100644 --- a/src/output/index.html +++ b/src/output/index.html @@ -84,6 +84,19 @@
+
+ +

+ Metabase and DuckDB +

+
+

Using Metabase and DuckDB to create an embedded Reporting Container bringing the data as close to the report as possible

+ +
+

diff --git a/src/output/metabase-duckdb.html b/src/output/metabase-duckdb.html new file mode 100644 index 0000000..210a0d0 --- /dev/null +++ b/src/output/metabase-duckdb.html @@ -0,0 +1,179 @@ + + + + + + + + + + + Andrew Ridgway's Blog + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

Metabase and DuckDB

+ Posted by + Andrew Ridgway + on Wed 18 October 2023 + + +
+
+
+
+
+ + +
+
+
+ +
+

Ahhhh DuckDB, if you're even partly floating around in the data space you've probably been hearing a LOT about it and its "Datawarehouse on your laptop" mantra. However, the OTHER application that sometimes gets missed is "SQLite for OLAP workloads", and it was this concept that, once I grasped it, gave me a very interesting idea.... What if we could take the very pretty Aggregate Layer of our Data(warehouse/LakeHouse/Lake) and put that data right next to the presentation layer of the lake, reducing network latency and... hopefully... have presentation reports running over very large workloads in the blink of an eye. It might even be fast enough that it could be deployed and embedded.

+
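To make the "aggregate layer right next to the report" idea concrete, here is a minimal sketch (in Python, using the duckdb package) of how that single-file aggregate layer might be built before it gets baked into the reporting container. The bucket path, table and column names are placeholders, not anything from a real pipeline, and S3 credential setup is omitted.

```python
# Sketch: materialise the aggregate layer into one DuckDB file that can be
# shipped alongside the reporting container. Paths and names are placeholders.
import duckdb

con = duckdb.connect("aggregates.duckdb")  # creates the file if it doesn't exist

# Reach out to the lake once, at build time (credential config omitted)...
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# ...and land the pre-aggregated table locally, next to the report.
con.execute("""
    CREATE OR REPLACE TABLE daily_sales AS
    SELECT order_date, region, SUM(amount) AS total_amount
    FROM read_parquet('s3://example-lake/aggregates/sales/*.parquet')
    GROUP BY order_date, region
""")

con.close()
```

From there the `.duckdb` file can be copied into the Metabase image or a shared volume, so report queries never have to leave the container.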

However, for this to work we need some form of containerised reporting application.... lucky for us there is Metabase, which is a fantastic little reporting application that has an open core. So this got me thinking... Can I put these two applications together and create a Reporting Layer with report embedding capabilities that is deployable in the cluster and has an admin UI accessible over a web page, all whilst keeping the data locked to our network?

+

The Beginnings of an Idea

+

Ok so... Big first question. Can DuckDB and Metabase talk? Well... not quite. But first let's take a quick look at the architecture we'll be employing here.

+

Duckdb Architecture

+

But you'll notice this pretty glossed-over line, "Connector", and that right there is the clincher. So what is this "Connector"?

+

To deep dive into this would take a whole blog, so to give you something to quickly wrap your head around: it's the glue that will make Metabase able to query your data source.

+
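To give a feel for what that "Connector" buys you in practice: once a community DuckDB driver jar is sitting in Metabase's plugins directory, the baked-in file can be registered like any other database. The sketch below uses Metabase's standard `/api/session` and `/api/database` REST endpoints; the `duckdb` engine name and the `database_file` detail key depend entirely on which driver you install, so treat them as assumptions rather than gospel.

```python
# Sketch: register the baked-in DuckDB file with a running Metabase instance.
# Assumes a DuckDB driver plugin is already installed; the engine name and
# details keys are driver-dependent assumptions.
import requests

METABASE_URL = "http://localhost:3000"

# Authenticate and grab a session token
session_id = requests.post(
    f"{METABASE_URL}/api/session",
    json={"username": "admin@example.com", "password": "change-me"},
).json()["id"]

# Add the DuckDB file as a database Metabase can query
resp = requests.post(
    f"{METABASE_URL}/api/database",
    headers={"X-Metabase-Session": session_id},
    json={
        "name": "Aggregate Layer",
        "engine": "duckdb",  # assumption: depends on the installed driver
        "details": {"database_file": "/data/aggregates.duckdb"},  # assumption
    },
)
resp.raise_for_status()
```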
+ +
+ +
+
+
+ +
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/output/notebook-or-bi.html b/src/output/notebook-or-bi.html new file mode 100644 index 0000000..96d3a99 --- /dev/null +++ b/src/output/notebook-or-bi.html @@ -0,0 +1,171 @@ + + + + + + + + + + + Andrew Ridgway's Blog + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+

Notebook or BI, What is the most appropiate communication medium

+ Posted by + Andrew Ridgway + on Thu 13 July 2023 + + +
+
+
+
+
+ + +
+
+
+ +
+

I want to preface this post by saying I think "Dashboards" or "BI" as terms are wayyyyyyyyyyyyyyyyy over saturated in the market. There seems to be a belief that any question answerable in data deserves the work associated with a dashboard when in fact a simple one off report, or notebook, would be more than enough.

+
+ +
+ +
+
+
+ +
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/output/tag/data-analytics.html b/src/output/tag/data-analytics.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tag/duckdb.html b/src/output/tag/duckdb.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tag/embedded.html b/src/output/tag/embedded.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tag/metabase.html b/src/output/tag/metabase.html new file mode 100644 index 0000000..e69de29 diff --git a/src/output/tags.html b/src/output/tags.html index 5a4dc2d..3bb320e 100644 --- a/src/output/tags.html +++ b/src/output/tags.html @@ -83,8 +83,11 @@

Tags for Andrew Ridgway's Blog

  • Amazon (1)
  • containers (1)
  • -
  • data engineering (2)
  • +
  • data engineering (3)
  • +
  • DuckDB (1)
  • +
  • embedded (1)
  • Managed Services (1)
  • +
  • Metabase (1)