Relationships in Tableau 2020.3
Tableau rolled out their new data model in their 2020.2 release. This data model introduced the concept of physical and logical tables in Tableau desktop, as well as of ‘relationships’ between logical tables. With 2020.3, Tableau has improved the user experience of the interface and rendered the relationship more flexible.
Use case 1: Simulate recursive joins
With 2020.3, Tableau has made available “inequality” operators for relationships. An excellent use case for this powerful feature is to simulate recursive joins. Let us demonstrate the feature with an example.
Before we dive into the business requirement for a recursive join, we will prepare the basic dataset we will need: we join the Books dataset with the Info dataset in order to access the Series and Volume of a given book.
The resulting dataset looks as follows.
The business requirement is to show, in front of each book, the books of the same series but of previous volumes. So, for Series BS-CHP, the BookID BF374 is the last book in the series and it has three ‘previous’ books. The BookID BR858 is the 3rd book in the series and has two ‘previous’ books, and so on. This desired result is shown in the table below.
In order to create this result, we would have to make multiple relationships or joins of the same dataset with itself, each time matching on SeriesID and VolumeID = VolumeID+1. We would also need to know the maximum number of books a series can have, in order to implement the number of relationships/joins necessary to extract the information required.
With the “inequality” operator for Tableau relationships, we can achieve this result with one single relationship. The operator is recursive and will apply the logic of VolumeID > VolumeID(2) between the two datasets, until all volumes of a given series have been treated. The picture above shows the output using the following relationship conditions in Tableau desktop.
Use case 2: Dealing with datasets of different granularities
Before the introduction of relationships, if we joined datasets with different granularities, the effect was to multiply the number of records for each dataset in the join.
Let us take an example with a dataset containing books, a dataset containing the number of checkouts per month for each book, and a dataset containing ratings. All the three datasets have BookID in common, so it can be used to join them.
For the example BookID AD222, we expect to see 384 as the total number of checkouts. But since we have a rating dataset that contains multiple rows with the same BookID, each row of the checkouts dataset is also multiplied that many times.
In the present case, we will see 11328 (944 ratings x 12 months) records for this BookID, and 362496 (384 checkouts x 944 ratings) checkouts. This is clearly not a valid result.
Before the introduction of relationships in the Tableau data model, would have two options:
1. Create LoD formulas to obtain the correct number of checkouts
LoDs are great but can impact performance on large datasets. In addition, this is an extra formula to maintain in the data model.
2. Create two separate data sources, Checkouts and Ratings, and use a blend on BookID.
With this solution, we would not be able to access the number of checkouts per month, but only the sum over all months, for each BookID.
With the introduction of relationships (Tableau 2020.2), we do not need to use LoDs or blends to achieve the desired result in this scenario.
Using relationships between the three datasets that we are using for this example; we can achieve the correct numeric results without any LoD formulas or data source manipulation:
1. Instead of a global “Number of records” field, we now have access to the number of records in the individual datasets. This is 12 and 944 for the checkout and ratings dataset for BookID AD222.
2. The relationship works as a ‘full outer join’ meaning that it brings lines into the dataset that did not have a correspondence in the Books dataset. In the previous example, with joins, we chose an inner join and we lost vision of such cases. This behavior of relationships is useful for analytics as well as data quality purposes.
3. We can choose to bring in the “Checkout Month” field in the view and this will not distort the numerical results. You can check out this functionality on our published workbook.
Use case 3: Empower users with comprehensive datasets without compromising on performance
Even if you do not have datasets with different granularities or the need to perform recursive joins, the new Tableau data model has one big advantage: performance. When we use relationships, only the parts of the data model used in the visualization are queried. For example, using the comprehensive data model shown above, if we inspect books and their sales, Tableau will not attempt to query the entire dataset, including the information from Authors, Awards, checkouts, etc. Instead, Tableau will only execute the query on the datasets Books, Editions and Sales.
As an organization, this allows you to get rid of specific data sources in favor of more complete data sources, and helps you reduce the number of published data sources on the server. Users interested in Sales will not be hit by the performance penalty of having to query authors and vice-versa.
You can deep dive into how the new data model (relationships) creates queries as compared to the join model with this article.
We hope you enjoyed our top three uses cases for relationships in Tableau. To know more about Tableau and Argusa, and how we can help, please contact us at email@example.com.