updates to content

This commit is contained in:
Andrew Ridgway 2023-10-28 10:34:43 +10:00
parent 236c1bec1e
commit d45891ffd9
6 changed files with 109 additions and 6 deletions

View File

@ -18,7 +18,25 @@ Ok so... Big first question. Can Duckdb and Metabase talk? Well... not quite. Bu
But you'll notice this pretty glossed-over line, "Connector". That right there is the clincher. So what is this "Connector"?
To deep dive into this would take a whole blog, so to give you something to quickly wrap your head around: it's the glue that lets Metabase query your data source. The reality is it's a JDBC driver compiled against Metabase.
Thankfully Metabase points you to a [community driver](https://github.com/AlexR2D2/metabase_duckdb_driver) for linking to DuckDB (hopefully it will be brought into Metabase proper sooner rather than later).
Now, the current release of this driver is still compiled against DuckDB 0.8 while 0.9 is the latest stable, but hopefully the [PR](https://github.com/AlexR2D2/metabase_duckdb_driver/pull/19) for this will land very soon, giving a quick way to link to the latest and greatest in DuckDB from Metabase.
### But How do we get Data?
Brilliant. Using the recommended Dockerfile we can load up a Metabase container with the DuckDB driver pre-built:
```dockerfile
FROM openjdk:19-buster

# Tell Metabase where to look for third-party driver plugins
ENV MB_PLUGINS_DIR=/home/plugins/

# Pull down the Metabase jar and the community DuckDB driver
ADD https://downloads.metabase.com/v0.46.2/metabase.jar /home
ADD https://github.com/AlexR2D2/metabase_duckdb_driver/releases/download/0.1.6/duckdb.metabase-driver.jar /home/plugins/

# Make sure the driver jar is readable by the Metabase process
RUN chmod 744 /home/plugins/duckdb.metabase-driver.jar

CMD ["java", "-jar", "/home/metabase.jar"]
```
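To sanity check the image locally, something like the following should do the trick (a minimal sketch, assuming the file above is saved as `Dockerfile`; the image and container names are just examples). Metabase listens on port 3000 by default:

```bash
# build the image with the DuckDB driver baked in
docker build -t metabase-duckdb .

# run it and expose Metabase's default port
docker run -d --name metabase-duckdb -p 3000:3000 metabase-duckdb
```

Once it's up, DuckDB should show as an option when adding a new database in the Metabase admin screens.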
Great. Now the big question: how do we get the data into the damn thing? Interestingly, when I was initially designing this I had the thought of leveraging the in-memory capabilities of DuckDB and pulling in from the parquet on S3 directly as needed. After all, the cluster is on AWS, so the S3 API requests should be unbelievably fast anyway, so why bother with a persistent database?
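For reference, that in-memory, query-S3-directly idea looks roughly like this with the DuckDB Python client and its httpfs extension (a sketch only; the bucket path and region here are placeholders, not the real cluster config):

```python
import duckdb

# purely in-memory database, nothing persisted to disk
con = duckdb.connect(database=":memory:")

# httpfs gives DuckDB the ability to read s3:// paths directly
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='ap-southeast-2';")  # placeholder region
# credentials would also need setting, e.g. via s3_access_key_id / s3_secret_access_key

# query the parquet in the lake without copying it locally first
df = con.execute("""
    SELECT count(*) AS row_count
    FROM read_parquet('s3://example-datalake/some/table/*.parquet')
""").df()

print(df)
```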
