Running Python in Tableau with Snowflake

April 25, 2024
January 10, 2023
5
min read

Introduction

Python User-Defined-Functions (UDFs) have become available in Public Preview in Snowflake. They allow you to write Python code and call it within the Snowflake UI, just as if you were writing and executing SQL queries.

Although it is likely that most Snowflake users are more familiar with SQL than with Python, there are several advantages to using Python UDFs. According to Snowflake’s documentation, Python UDFs are particularly appropriate if:

  • You already have Python code (source or compiled) that you can re-use.
  • Your code uses functions that already exist in standard Python packages.
  • You know Python as well as or better than the other languages that support UDFs.
  • You want to take advantage of Python’s rich 3rd-party ecosystem.

Being able to leverage the versatility of Python as a programming language within the Snowflake platform opens a world of possibilities. We will explore some of the possibilities in this blog post, through simple examples.

Additionally, because Python UDFs are called through a SQL query, we will also show here how you can use Tableau to call these functions in Snowflake. This is a nice feature that allows end-users without programming knowledge to leverage the power of preexisting Python code, in a code-free visualization environment they are more familiar with.

Follow along

If you would like a more hands-on experience and have the possibility of using Tableau, you can set up a Snowflake trial account and follow along with our examples.

You might also find it helpful to read our previous blog posts: “From JSON to insight with Snowflake” and “Writing into Snowflake using Tableau Prep” - for an introduction to Snowflake and overview of data loading.

Initial setup

We will start by setting up the environment in our Snowflake trial account, with the creation of a warehouse, a database and a table with a single dummy column:

CREATE OR REPLACE WAREHOUSE my_warehouse WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
USE WAREHOUSE my_warehouse;
CREATE OR REPLACE DATABASE my_database;
USE DATABASE my_database;
USE SCHEMA public;
CREATE OR REPLACE TABLE sales (dummy int);

We will be playing with Tableau’s well-known Superstore dataset, which we will load onto our newly created Snowflake database using Tableau Prep Builder and following the instructions in our previous blog post “Writing into Snowflake using Tableau Prep”. So let us launch Tableau Prep Builder, and open the Superstore sample flow available at the bottom of the window:

To keep things simple, we will write the USCA orders only onto our Snowflake sales table. We have therefore slightly modified the Superstore sample flow and added a new output step at the relevant point, where we filled in our Snowflake trial account information:

Once we run this new output step, a simple select statement in Snowflake shows our table is populated with sales data:

SELECT * FROM sales LIMIT 10;

Our goal now is to analyze the sales data in terms of holidays. For example, we want to try to answer the question – do we sell more over the holidays? We do not immediately have the information in our dataset to answer this question. We know the dates in which the orders were placed, but not all holidays occur on fixed dates every year. Furthermore, we know which states in the US the orders were placed in, but not all states observe the same holidays.

Python UDFs in Snowflake

Snowflake has partnered with Anaconda to provide several open-source third-party Python packages, to be used out-of-the-box inside Snowflake at no additional cost. To use these packages you need to acknowledge the Snowflake Third Party Terms by following the instructions here on your Snowflake UI. In particular:

  1. Click the dropdown menu next to your login name, then click Switch Role > ORGADMIN to change to the organization administrator role.
  2. Click Admin > Billing > Terms & Billing.
  3. Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.
  4. Click the link to review the Snowflake Third Party Terms.
  5. If you agree to the terms, click the Acknowledge & Continue button.

Once you have followed all the steps, don’t forget to switch back to the role you were previously using.

Back in our worksheet, we will use a SQL query to show which Python packages are available:

SELECT * FROM INFORMATION_SCHEMA.packages WHERE LANGUAGE = 'python';

One of the available packages is holidays, a fast and efficient Python library for generating country, province, and state specific holidays, making it easy to determine whether a specified date is a holiday. As you can see in the documentation (always check which version of the package is available in Snowflake and check the documentation for that version) using this package is straightforward – here are a few example lines of code:

import holidays
us_pr_holidays = holidays.UnitedStates(state='PR') # or holidays.US(...)
us_pr_holidays.get('2014-01-01')  # "New Year's Day"
'2018-01-06' in us_pr_holidays  # True

Let us now investigate the basic syntax to create a python UDF in Snowflake (keep in mind arguments, packages and imports are optional):

CREATE OR REPLACE FUNCTION <UDFname>(<argument list>)
RETURNS <data type>
LANGUAGE python
RUNTIME_VERSION = '3.8'
PACKAGES = (‘<package list>’)
IMPORTS = (‘<files on a stage>’)
HANDLER = '<Python function name>'
AS $$
def <Python function name> (<arguments>):
	# your Python code goes here
return <output>
$$;

To call this function in Snowflake you would just run the following SQL query in a worksheet:

SELECT <UDFname>(<argument list>);

At this point we have all the necessary ingredients to create a UDF that uses the holidays package to let us know if a specific date is a holiday in a specific US state:

CREATE OR REPLACE FUNCTION holiday(mydate STRING, mystate STRING)
RETURNS string
LANGUAGE python
RUNTIME_VERSION = 3.8
PACKAGES = ('holidays')
HANDLER = 'id_holiday'
AS $$
import holidays
def id_holiday(mydate, mystate):
	us_holidays = holidays.US(state=mystate)
    return us_holidays.get(mydate)
$$;

Here is an example output for a few dates and states:

SELECT holiday('31-03-2022', 'CA'), holiday('31-03-2022', 'MI'), holiday('19-06-2022', 'MI');

We are not quite ready yet to use this new UDF on our sales data. As you can see, the holidays package takes in the two-letter code for the US States as argument, not the full name, which is what is available to us in our sales table. So, to use the holidays package on our sales data, we will first need to translate the state names. For this we will use Python once again.

Here is a simple python dictionary that you can find online for this exact purpose. You can download the dictionary and place it anywhere in your computer. In our example we put it in a temp folder: C:\temp\us_state_abbrev.py. To make this dictionary available in Snowflake we need to upload it onto a stage using SnowSQL, Snowflake’s command line client.

To keep things simple, we uploaded the file directly onto the user stage (no need to create a new dedicated stage). We also made sure to use AUTO_COMPRESS = FALSE, to be able to use the dictionary directly in the UDF:

PUT file://C:\temp\us_state_abbrev.py @~ AUTO_COMPRESS = FALSE OVERWRITE = TRUE;

Back in the Snowflake worksheet, we can see the file is available in our user stage:

LIST @~;

We can now write a new Python UDF that imports the dictionary from the stage and uses it to translate the US state name into their corresponding two letter codes (if the state name does not exist in the dictionary, which is possible because our sales table contains data from Canada, we will simply get the state name back):

CREATE OR REPLACE FUNCTION us_state_code(state string)
RETURNS string
LANGUAGE python
RUNTIME_VERSION = 3.8
IMPORTS = ('@~/us_state_abbrev.py')
HANDLER = 'get_code'
AS $$
import us_state_abbrev
def get_code(state):
	code = ''
	if state in us_state_abbrev.us_state_to_abbrev:
		code = us_state_abbrev.us_state_to_abbrev[state]
	else:
		code = state
	return code
$$;

We can test out our new UDF directly with data from our sales table. You can see we use the UDF to translate the data in the “State/Province” column - California into CA, Wisconsin into WI, Utah into UT, etc.:

SELECT "Order ID", "Sales", "Order Date", "State/Province", us_state_code("State/Province") AS "State Code" FROM sales LIMIT 10;

And now that we have the two letter codes for the states, we can use our holidays UDF on the sales data. Here’s an example query that already allows us to spot a sale in Wisconsin on Veterans Day:

WITH sales_coded AS(
	SELECT "Order ID", "Sales", "Order Date", "State/Province", us_state_code("State/Province") AS "State Code"
	FROM sales)
SELECT "Order ID", "Sales", "Order Date", "State/Province", "State Code", holiday("Order Date", "State Code")
FROM sales_coded
LIMIT 10;

Running Python UDFs in Snowflake through Tableau

We will move to Tableau Desktop for the rest of this exercise. There, we can connect to our Snowflake account, and bring in the sales table for analysis. For the purpose of this exercise, we will add a data source filter to focus on sales from the United States:

Here is a simple bar chart showing the sum of sales in each US state:

Let us now attempt to call our Snowflake Python UDFs using Tableau. For that, we will make use of Tableau’s passthrough SQL functions, which allow us to send SQL expressions directly to the database. For example:

RAWSQL_STR("select public.us_state_code(%1)", [State/Province])

This simple calculation uses the RAWSQL_STR function in Tableau to send the following SQL query to Snowflake:

SELECT public.us_state_code(%1)

where “public” specifies the database schema (this needs to be specified), and “%1” specifies a value in the database, in our case the state name. This is exactly the same query we would use in the Snowflake worksheet.

If we add the State Code calculated field to our bar chart (in the Text Marks), we get the following:

Keep in mind the State Code itself does not exist in our data. We are using Tableau to run python code in Snowflake that translates the State Name (which does exist in our table) into the State Code!

We can do something similar and use the holiday UDF to retrieve the holidays based on the State Code and Order Date:

RAWSQL_str(    "select public.holiday(%1,%2)", str([Order Date]), [State code])

With a few additional calculations we can modify our original visualization to show the average daily sales in each state, split into holidays and non-holidays:

not ISNULL([Holiday])
AVG({ FIXED [Order Date], [State/Province]: sum([Sales])})

This new bar chart shows that states like California and New York make, on average, more daily sales during holidays. On the other hand, states like Virginia and Michigan make significantly less sales, on average, during holidays:

We can make another simple visualization using our previously calculated Holiday field, to explore which holidays are more popular. This chart shows Election Day, the Day after Thanksgiving and Christmas Day are particularly popular days, where average daily sales in each state are the highest:

Again, keep in mind the holidays do not exist on our original sales table. It is information obtained through Python code in Snowflake that we do not see and only interact with via Tableau!

Hope you could follow along. Please feel free to send comments or questions at info@argusa.ch.

Tableau
Snowflake
Tableau
Snowflake
Tableau
Snowflake

Introduction

Python User-Defined-Functions (UDFs) have become available in Public Preview in Snowflake. They allow you to write Python code and call it within the Snowflake UI, just as if you were writing and executing SQL queries.

Although it is likely that most Snowflake users are more familiar with SQL than with Python, there are several advantages to using Python UDFs. According to Snowflake’s documentation, Python UDFs are particularly appropriate if:

  • You already have Python code (source or compiled) that you can re-use.
  • Your code uses functions that already exist in standard Python packages.
  • You know Python as well as or better than the other languages that support UDFs.
  • You want to take advantage of Python’s rich 3rd-party ecosystem.

Being able to leverage the versatility of Python as a programming language within the Snowflake platform opens a world of possibilities. We will explore some of the possibilities in this blog post, through simple examples.

Additionally, because Python UDFs are called through a SQL query, we will also show here how you can use Tableau to call these functions in Snowflake. This is a nice feature that allows end-users without programming knowledge to leverage the power of preexisting Python code, in a code-free visualization environment they are more familiar with.

Follow along

If you would like a more hands-on experience and have the possibility of using Tableau, you can set up a Snowflake trial account and follow along with our examples.

You might also find it helpful to read our previous blog posts: “From JSON to insight with Snowflake” and “Writing into Snowflake using Tableau Prep” - for an introduction to Snowflake and overview of data loading.

Initial setup

We will start by setting up the environment in our Snowflake trial account, with the creation of a warehouse, a database and a table with a single dummy column:

CREATE OR REPLACE WAREHOUSE my_warehouse WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
USE WAREHOUSE my_warehouse;
CREATE OR REPLACE DATABASE my_database;
USE DATABASE my_database;
USE SCHEMA public;
CREATE OR REPLACE TABLE sales (dummy int);

We will be playing with Tableau’s well-known Superstore dataset, which we will load onto our newly created Snowflake database using Tableau Prep Builder and following the instructions in our previous blog post “Writing into Snowflake using Tableau Prep”. So let us launch Tableau Prep Builder, and open the Superstore sample flow available at the bottom of the window:

To keep things simple, we will write the USCA orders only onto our Snowflake sales table. We have therefore slightly modified the Superstore sample flow and added a new output step at the relevant point, where we filled in our Snowflake trial account information:

Once we run this new output step, a simple select statement in Snowflake shows our table is populated with sales data:

SELECT * FROM sales LIMIT 10;

Our goal now is to analyze the sales data in terms of holidays. For example, we want to try to answer the question – do we sell more over the holidays? We do not immediately have the information in our dataset to answer this question. We know the dates in which the orders were placed, but not all holidays occur on fixed dates every year. Furthermore, we know which states in the US the orders were placed in, but not all states observe the same holidays.

Python UDFs in Snowflake

Snowflake has partnered with Anaconda to provide several open-source third-party Python packages, to be used out-of-the-box inside Snowflake at no additional cost. To use these packages you need to acknowledge the Snowflake Third Party Terms by following the instructions here on your Snowflake UI. In particular:

  1. Click the dropdown menu next to your login name, then click Switch Role > ORGADMIN to change to the organization administrator role.
  2. Click Admin > Billing > Terms & Billing.
  3. Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.
  4. Click the link to review the Snowflake Third Party Terms.
  5. If you agree to the terms, click the Acknowledge & Continue button.

Once you have followed all the steps, don’t forget to switch back to the role you were previously using.

Back in our worksheet, we will use a SQL query to show which Python packages are available:

SELECT * FROM INFORMATION_SCHEMA.packages WHERE LANGUAGE = 'python';

One of the available packages is holidays, a fast and efficient Python library for generating country, province, and state specific holidays, making it easy to determine whether a specified date is a holiday. As you can see in the documentation (always check which version of the package is available in Snowflake and check the documentation for that version) using this package is straightforward – here are a few example lines of code:

import holidays
us_pr_holidays = holidays.UnitedStates(state='PR') # or holidays.US(...)
us_pr_holidays.get('2014-01-01')  # "New Year's Day"
'2018-01-06' in us_pr_holidays  # True

Let us now investigate the basic syntax to create a python UDF in Snowflake (keep in mind arguments, packages and imports are optional):

CREATE OR REPLACE FUNCTION <UDFname>(<argument list>)
RETURNS <data type>
LANGUAGE python
RUNTIME_VERSION = '3.8'
PACKAGES = (‘<package list>’)
IMPORTS = (‘<files on a stage>’)
HANDLER = '<Python function name>'
AS $$
def <Python function name> (<arguments>):
	# your Python code goes here
return <output>
$$;

To call this function in Snowflake you would just run the following SQL query in a worksheet:

SELECT <UDFname>(<argument list>);

At this point we have all the necessary ingredients to create a UDF that uses the holidays package to let us know if a specific date is a holiday in a specific US state:

CREATE OR REPLACE FUNCTION holiday(mydate STRING, mystate STRING)
RETURNS string
LANGUAGE python
RUNTIME_VERSION = 3.8
PACKAGES = ('holidays')
HANDLER = 'id_holiday'
AS $$
import holidays
def id_holiday(mydate, mystate):
	us_holidays = holidays.US(state=mystate)
    return us_holidays.get(mydate)
$$;

Here is an example output for a few dates and states:

SELECT holiday('31-03-2022', 'CA'), holiday('31-03-2022', 'MI'), holiday('19-06-2022', 'MI');

We are not quite ready yet to use this new UDF on our sales data. As you can see, the holidays package takes in the two-letter code for the US States as argument, not the full name, which is what is available to us in our sales table. So, to use the holidays package on our sales data, we will first need to translate the state names. For this we will use Python once again.

Here is a simple python dictionary that you can find online for this exact purpose. You can download the dictionary and place it anywhere in your computer. In our example we put it in a temp folder: C:\temp\us_state_abbrev.py. To make this dictionary available in Snowflake we need to upload it onto a stage using SnowSQL, Snowflake’s command line client.

To keep things simple, we uploaded the file directly onto the user stage (no need to create a new dedicated stage). We also made sure to use AUTO_COMPRESS = FALSE, to be able to use the dictionary directly in the UDF:

PUT file://C:\temp\us_state_abbrev.py @~ AUTO_COMPRESS = FALSE OVERWRITE = TRUE;

Back in the Snowflake worksheet, we can see the file is available in our user stage:

LIST @~;

We can now write a new Python UDF that imports the dictionary from the stage and uses it to translate the US state name into their corresponding two letter codes (if the state name does not exist in the dictionary, which is possible because our sales table contains data from Canada, we will simply get the state name back):

CREATE OR REPLACE FUNCTION us_state_code(state string)
RETURNS string
LANGUAGE python
RUNTIME_VERSION = 3.8
IMPORTS = ('@~/us_state_abbrev.py')
HANDLER = 'get_code'
AS $$
import us_state_abbrev
def get_code(state):
	code = ''
	if state in us_state_abbrev.us_state_to_abbrev:
		code = us_state_abbrev.us_state_to_abbrev[state]
	else:
		code = state
	return code
$$;

We can test out our new UDF directly with data from our sales table. You can see we use the UDF to translate the data in the “State/Province” column - California into CA, Wisconsin into WI, Utah into UT, etc.:

SELECT "Order ID", "Sales", "Order Date", "State/Province", us_state_code("State/Province") AS "State Code" FROM sales LIMIT 10;

And now that we have the two letter codes for the states, we can use our holidays UDF on the sales data. Here’s an example query that already allows us to spot a sale in Wisconsin on Veterans Day:

WITH sales_coded AS(
	SELECT "Order ID", "Sales", "Order Date", "State/Province", us_state_code("State/Province") AS "State Code"
	FROM sales)
SELECT "Order ID", "Sales", "Order Date", "State/Province", "State Code", holiday("Order Date", "State Code")
FROM sales_coded
LIMIT 10;

Running Python UDFs in Snowflake through Tableau

We will move to Tableau Desktop for the rest of this exercise. There, we can connect to our Snowflake account, and bring in the sales table for analysis. For the purpose of this exercise, we will add a data source filter to focus on sales from the United States:

Here is a simple bar chart showing the sum of sales in each US state:

Let us now attempt to call our Snowflake Python UDFs using Tableau. For that, we will make use of Tableau’s passthrough SQL functions, which allow us to send SQL expressions directly to the database. For example:

RAWSQL_STR("select public.us_state_code(%1)", [State/Province])

This simple calculation uses the RAWSQL_STR function in Tableau to send the following SQL query to Snowflake:

SELECT public.us_state_code(%1)

where “public” specifies the database schema (this needs to be specified), and “%1” specifies a value in the database, in our case the state name. This is exactly the same query we would use in the Snowflake worksheet.

If we add the State Code calculated field to our bar chart (in the Text Marks), we get the following:

Keep in mind the State Code itself does not exist in our data. We are using Tableau to run python code in Snowflake that translates the State Name (which does exist in our table) into the State Code!

We can do something similar and use the holiday UDF to retrieve the holidays based on the State Code and Order Date:

RAWSQL_str(    "select public.holiday(%1,%2)", str([Order Date]), [State code])

With a few additional calculations we can modify our original visualization to show the average daily sales in each state, split into holidays and non-holidays:

not ISNULL([Holiday])
AVG({ FIXED [Order Date], [State/Province]: sum([Sales])})

This new bar chart shows that states like California and New York make, on average, more daily sales during holidays. On the other hand, states like Virginia and Michigan make significantly less sales, on average, during holidays:

We can make another simple visualization using our previously calculated Holiday field, to explore which holidays are more popular. This chart shows Election Day, the Day after Thanksgiving and Christmas Day are particularly popular days, where average daily sales in each state are the highest:

Again, keep in mind the holidays do not exist on our original sales table. It is information obtained through Python code in Snowflake that we do not see and only interact with via Tableau!

Hope you could follow along. Please feel free to send comments or questions at info@argusa.ch.

By clicking “Accept All Cookies”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.