Finally, a better way to deploy DBT on Google Cloud!

All the code for this article can be found in the template repository linked below.

Code: github.com/matthh9797/dbt-cloud-run-template
You can also read this article on Medium: medium.com/@matthh9797/finally-a-better-way-to-deploy-dbt-on-google-cloud

For the last year or so I’ve been looking for a good way to productionise DBT pipelines on Google Cloud Platform, but I’ve been frustrated by the solutions I’ve found on the web. Either they seem far too complicated or they involve multiple programming languages. I am not totally against using multiple programming languages in one project, but since DBT is a Python command line tool it would surely make sense to use a Python solution to host it.

To my delight, I recently stumbled upon a new feature: as of DBT 1.5 you can invoke DBT from a Python programme (see programmatic invocations). This means that we can create a relatively simple Flask application, deployed as a Cloud Run service, to host our DBT pipelines. Finally, a better way to deploy DBT on Google Cloud!

Pre-requisites

There are several ways to recreate this article, but to follow along line for line you will need:

The gcloud CLI installed and configured.
Anaconda installed and configured.
A billable GCP project set up.
Docker
VS Code
VS Code Extension - Cloud Code (For local development)

Getting Started

Let’s create a new conda environment with dbt-bigquery, google-cloud-logging and Flask.

conda create -n dbt-cloud-run pip
conda activate dbt-cloud-run
pip install dbt-bigquery
pip install google-cloud-logging
pip install Flask
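
Before moving on, it can be worth confirming that the three packages really landed in the active environment. The snippet below is only a small sanity check I would run from inside the conda environment; the helper name missing_packages is my own and is not part of the template repository.

```python
from importlib import metadata

# Distribution names installed by the pip commands above.
REQUIRED = ["dbt-bigquery", "google-cloud-logging", "Flask"]


def missing_packages(names):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    print("All packages installed" if not absent else f"Missing: {absent}")
```

If anything is reported as missing, re-run the matching pip install command with the dbt-cloud-run environment activated.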

Hello World Cloud Run App

Now create a new Python Flask project using Cloud Code by clicking ‘Create New Cloud Run App’. This will create a project with a folder structure for a Flask application. Start a Docker daemon (you can do this by opening Docker Desktop), then check that the Hello World app is working by clicking ‘Run App on Local Cloud Run Emulator’ in the Cloud Code extension.

If your app is working properly you should be able to view the home page at http://localhost:8080/. You can view detailed logs, including HTTP requests, on the VS Code OUTPUT tab by selecting ‘Cloud Run: Run/Debug Locally - Detailed’.

You can remove the folders/files for the Getting Started app after testing. However, personally I like to keep them for a simple way to check if the app is running.

Add DBT to your Flask App

Update (2023-11-14): This section has been updated from the original version of the article to be more secure. Instead of creating a temporary service account, we can use the GCP Auth addon to connect with GCP locally.

You can follow the steps in my other article, Local Environment Setup For DBT With GCP Using Conda, to set up your local DBT project. I like to name the folder containing my DBT project dbt. Note that the name of this folder does not affect your DBT project configuration.

For production deployment, create a file called profiles.yml inside your DBT project directory and copy the project configuration from the default DBT profiles at ~/.dbt/profiles.yml.

YOUR_DBT_PROJECT:
  target: dev
  outputs:
    dev:
      dataset: YOUR_DATASET
      job_execution_timeout_seconds: 300
      job_retries: 1
      location: EU
      method: oauth
      priority: interactive
      project: YOUR_PROJECT
      threads: 4
      type: bigquery
    prod:
      dataset: YOUR_DATASET
      job_execution_timeout_seconds: 300
      job_retries: 1
      location: EU
      method: oauth
      priority: interactive
      project: YOUR_PROJECT
      threads: 4
      type: bigquery

Usually you will have at least one more target besides dev, for example prod for your production data, as in the configuration above.

Authenticate with GCP from your Local Container

To authenticate your local Cloud Code container with GCP, first log in with your gcloud credentials and set your project.

gcloud auth application-default login
gcloud config set project YOUR_PROJECT

Next, enable the GCP Auth addon in your VS Code workspace settings.

{
    "cloudcode.useGcloudAuthSkaffold": true
}

Add DBT Deployment Scripts

In app.py, add the following imports to the top of your script and set up Google Cloud Logging. This will ensure that your DBT logs are sent to Google Cloud Logging.

import os
import logging
import json

from flask import Flask, request, render_template
# escape is no longer exported by recent Flask releases; import it from markupsafe
from markupsafe import escape
import google.cloud.logging
from dbt.cli.main import dbtRunner, dbtRunnerResult


client = google.cloud.logging.Client()
client.setup_logging()
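
If you ever run app.py somewhere without GCP credentials (for example in a quick unit test), the Client() call above will fail. The sketch below shows one possible way to fall back to console logging in that case. The function name configure_logging and its return values are my own invention for illustration and are not part of the template.

```python
import logging


def configure_logging():
    """Send logs to Cloud Logging if possible, otherwise to the console.

    Returns "cloud" or "local" so the caller can tell which one was set up.
    """
    try:
        # Imported lazily so a missing package also triggers the fallback.
        import google.cloud.logging

        client = google.cloud.logging.Client()
        client.setup_logging()
        return "cloud"
    except Exception as exc:  # missing package or missing credentials
        logging.basicConfig(level=logging.INFO)
        logging.warning("Cloud Logging unavailable, logging locally: %s", exc)
        return "local"
```

Catching every exception is deliberately broad here; in a real service you may prefer to fail loudly in production so that missing logs do not go unnoticed.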

Now add an endpoint to your app to run a certain set of DBT commands programmatically. The following snippet takes a POST request and reads the optional parameter target, which defaults to prod, then runs dbt source freshness and dbt build.

You might want to build on this by, for example, adding another endpoint called weekly, which is invoked every week and adds the --full-refresh flag to fully refresh any incremental models.

@app.route('/daily', methods=['POST'])
def daily():
    """DBT Daily Runner."""

    try:

        # force=True parses the body as JSON regardless of the Content-Type header
        # https://stackoverflow.com/questions/53216177/http-triggering-cloud-function-with-cloud-scheduler/60615210#60615210
        payload = request.get_json(force=True, silent=True) or {}
        target = str(escape(payload.get('target', 'prod')))

        # initialize
        dbt = dbtRunner()

        # create CLI args as a list of strings
        cli_args = ["--project-dir", "dbt", "--profiles-dir", "dbt"]
        target_arg = ['--target', target]

        logging.info('Running: dbt source freshness')
        res: dbtRunnerResult = dbt.invoke(['source', 'freshness'] + cli_args + target_arg)
        if not res.success:
            logging.warning('dbt source freshness did not succeed')

        logging.info('Running: dbt build')
        res = dbt.invoke(['build'] + cli_args + target_arg)
        # invoke() reports failures on the result rather than raising
        if not res.success:
            raise RuntimeError('dbt build failed, see the logs for details')

        ok = 'DBT Run Successfully'
        logging.info(ok)
        return ok

    except Exception as e:
        logging.exception(e)
        # return an error status so the caller can see that the run failed
        return str(e), 500
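
For the weekly idea mentioned above, one option is to move the argument building into a small helper so that the daily and weekly endpoints share it. This is only a sketch under my own assumptions: the helper name build_invocations and the schedule strings are hypothetical, and the project and profiles directories are assumed to be dbt as in the rest of the article.

```python
# Arguments shared by every dbt command in this app.
COMMON_ARGS = ["--project-dir", "dbt", "--profiles-dir", "dbt"]


def build_invocations(schedule, target):
    """Return the dbt argument lists to run for a schedule ("daily" or "weekly")."""
    if schedule not in ("daily", "weekly"):
        raise ValueError(f"Unknown schedule: {schedule}")

    build = ["build"]
    if schedule == "weekly":
        # Rebuild incremental models from scratch once a week.
        build.append("--full-refresh")

    target_arg = ["--target", target]
    return [
        ["source", "freshness"] + COMMON_ARGS + target_arg,
        build + COMMON_ARGS + target_arg,
    ]


if __name__ == "__main__":
    for args in build_invocations("weekly", "dev"):
        print("dbt " + " ".join(args))
```

Each list returned by the helper can be passed straight to dbtRunner().invoke(...) inside the corresponding Flask endpoint.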

Amend your Dockerfile to use the official dbt Labs image as the base image, and add a line to run dbt deps after the COPY . . line.

# Python image to use.
FROM ghcr.io/dbt-labs/dbt-bigquery:1.5.6

...

# Download dbt dependencies
RUN dbt deps --profiles-dir dbt --project-dir dbt

To test that package downloads are working, you can add a package to the dbt/packages.yml file like so.

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1 # Update to current version if required

Test the Deployment Locally

To check that the deployment works, run your application again on the Cloud Run Emulator. When the app is running you can test it by invoking the local endpoint; I like using https://reqbin.com/ for this. Send a POST request to http://localhost:8080/daily with the body {"target": "dev"}. If your application has run successfully, the success message will be returned. You can check in the detailed logs that log entries are being sent to the google-cloud-logging API.

DBT Run Successfully
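
If you would rather send the test request from Python than from a browser tool, a minimal sketch using only the standard library could look like the following. The helper names build_daily_request and send are my own, and the URL assumes the emulator is listening on port 8080 as described above.

```python
import json
import urllib.request


def build_daily_request(base_url="http://localhost:8080", target="dev"):
    """Build a POST request for the /daily endpoint with a JSON body."""
    body = json.dumps({"target": target}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/daily",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=600):
    """Send the request and return (status code, response text)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# With the emulator running:
# status, text = send(build_daily_request())
```

The generous timeout is there because the request only returns once the DBT commands have finished.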

It’s not often the case that everything works first time. For debugging, I would recommend using the ‘Debug App on Local Cloud Run Emulator’ option in Cloud Code with breakpoints.

Deploy Your App into Production

Now that the app is working locally, you can deploy it to Cloud Run by running the following bash script. If you have a Windows machine you may need to run this in the GCP Cloud Shell. Once your Cloud Run service is deployed, you can easily set up Cloud Scheduler to invoke it on a daily basis. For more information on that I would recommend checking out Data Science on GCP.

SERVICE_ACCOUNT=YOUR_SERVICE_ACCOUNT
SERVICE_NAME=YOUR_SERVICE_NAME # e.g. dbt-daily
REGION=YOUR_REGION # e.g. europe-west2

gcloud builds submit \
  --tag gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}

gcloud run deploy ${SERVICE_NAME} --region $REGION \
    --image gcr.io/$(gcloud config get-value project)/${SERVICE_NAME} \
    --service-account ${SERVICE_ACCOUNT}@$(gcloud config get-value project).iam.gserviceaccount.com \
    --platform managed \
    --no-allow-unauthenticated
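
Because the service is deployed with --no-allow-unauthenticated, a plain POST to the deployed URL will be rejected. Callers need to send an identity token in the Authorization header. The sketch below shows one way to do that from Python, assuming the google-auth and requests packages are installed and that credentials able to mint an ID token are available (as far as I know, that means service account credentials rather than a personal gcloud login). The function names and the service URL are placeholders of my own.

```python
import json
import urllib.request


def fetch_token(audience):
    """Fetch an ID token for `audience`, the URL of the Cloud Run service."""
    # Imported lazily; requires the google-auth and requests packages.
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_request, audience)


def build_authenticated_request(service_url, token, target="prod"):
    """Build a POST to /daily carrying the ID token as a bearer token."""
    return urllib.request.Request(
        url=f"{service_url}/daily",
        data=json.dumps({"target": target}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical usage, replacing YOUR_SERVICE_URL with the URL printed by gcloud run deploy:
# req = build_authenticated_request(YOUR_SERVICE_URL, fetch_token(YOUR_SERVICE_URL))
# urllib.request.urlopen(req, timeout=600)
```

Cloud Scheduler can attach a token like this for you when the job is configured with a service account that is allowed to invoke the service.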

Conclusion

We have just tested and deployed a simple Flask application to wrap up a DBT pipeline and run it in production. I believe this deployment is an improvement on other popular solutions I have found on the internet because, first, it is simpler and, second, it is a Python application. One of the main advantages of it being a Python application is that, in my opinion, most people who know DBT are more likely to be familiar with Python than with Go, for example. This means we can easily extend this application to handle the results and add things like monitoring. I plan to add more detail on this in the future, so stay tuned!
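
As a taste of what handling the results could look like, here is a rough sketch that turns the object returned by dbtRunner().invoke(...) into a one-line summary. It relies on the success, exception and result attributes of dbtRunnerResult and assumes result is iterable with a status on each item, as it is for dbt build. The function name summarise is my own, and the example uses stand-in objects rather than a real DBT run.

```python
from collections import Counter
from types import SimpleNamespace


def summarise(res):
    """Summarise a dbtRunnerResult-like object as a short message."""
    if res.exception is not None:
        return f"dbt could not run: {res.exception}"
    # Count how many nodes ended in each status (e.g. success, error, skipped).
    counts = Counter(str(r.status) for r in (res.result or []))
    detail = ", ".join(f"{status}={n}" for status, n in sorted(counts.items()))
    outcome = "succeeded" if res.success else "failed"
    return f"dbt {outcome} ({detail})"


# Stand-in for a real result, just to show the shape of the output.
fake = SimpleNamespace(
    success=False,
    exception=None,
    result=[
        SimpleNamespace(status="success"),
        SimpleNamespace(status="error"),
        SimpleNamespace(status="success"),
    ],
)
print(summarise(fake))  # dbt failed (error=1, success=2)
```

A summary like this could be written to the logs or returned from the endpoint, which makes it a natural starting point for alerting.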

Image by tawatchai07 on Freepik

Matthew Henderson