Debbie King - Aug 25, 2014

Updating Tableau Extracts in the Cloud

Whether your association is using DSK’s cloud solution or another cloud option to host your business intelligence environment (see our post Tableau Online for Associations?), there are times when you may want to connect to a data extract instead of a live connection to a data mart.  Extracts are saved subsets of a data source that can improve performance, provide advanced capabilities, and allow offline analysis.  However, there are technical challenges to consider when updating extracts.  After implementing the solution multiple times, we created this short guide to explain how to address these challenges.

The Problem

While an on-premise implementation of Tableau Server allows the server to connect directly to data sources and update data extracts, in the public cloud (such as AWS), Tableau Server is not inside your organization’s network.  Therefore, in order to maintain self-updating extracts, DSK Solutions implements a process to “push” the data extract out to the cloud, rather than having the server “pull” the information via scheduled tasks.  Although the entire issue can be avoided by placing both the data mart and the Tableau Server within the same virtual private cloud (in fact, that is the architecture used at DSK), there are many situations where the two machines are on different networks (see Figure 1 for a sample).

Figure 1:  Sample architecture where Tableau Server is outside the association network.



The Process

1. Install Tableau Desktop on the data mart machine: The process requires the “refreshextract” utility within a licensed Tableau Desktop.

2. Publish extracts to the cloud server: Using Desktop, create your data extract by connecting to your data source and generating the extract within Tableau. I recommend not using the default name for the extract because it contains spaces and characters that are not always friendly with Windows. For this example, I named my extract simply “Membership” as opposed to "DSK_Membership_Detail + DSK_Persons (DSK_Datamart)".  Next, publish the extract as a data connection on your cloud server (Data-->Extract_Name-->Publish to Server…)
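If you would rather script this step than use the Desktop menu, the tabcmd utility that ships with Tableau Server can also publish an extract file as a data source. A rough sketch, assuming the extract was saved locally as Membership.tde; the server URL and credentials below are placeholders:

```shell
tabcmd login -s https://your-cloud-server --username admin --password yourpassword
tabcmd publish "Membership.tde" --name "Membership" --project "Default" --overwrite
tabcmd logout
```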

3. Create an extract update configuration file: On the data mart machine, create a folder to house the process files (e.g., C:\DataExtract\TableauServerDataSync). Next, create a .txt file containing the information for connecting to the cloud server as well as the original data source being used to update the extract (in this case I’m connected to a SQL Server data mart). I saved this file as “TableauExport_Membership.txt”.  Pro tip:  do not add spaces -->  cmd cannot handle a space around the '=':
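The configuration file itself is not reproduced here, but refreshextract expects a plain list of key=value pairs. A sketch of what TableauExport_Membership.txt might contain for a SQL Server source; every server name, user name, and password below is a placeholder, and the exact keys available can vary by Tableau version:

```text
server=https://your-cloud-server
username=tableau_admin
password=tableau_password
datasource=Membership
source-username=datamart_user
source-password=datamart_password
```

Per the pro tip above, note there are no spaces around the '='.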


**Obviously, it's not ideal to have your passwords in a clear text file.  If it is a shared machine or this is against internal policy, it is necessary to take additional security steps such as adding encryption.

4. Create the Windows batch file to run the update: Create this file in Notepad, but save it as a .bat (Windows batch file). I saved this file as “Update_Extracts.bat”. You can test that the batch and configuration files work by double-clicking the .bat file. It will then run in cmd and give you the message “Data source refresh completed. XXXX rows uploaded”.  Pro tip:  change the path in the change directory (cd) line to correspond with your specific version of Tableau Desktop:

cd C:\Program Files\Tableau\Tableau 8.2\bin
tableau refreshextract --config-file C:\DataExtract\TableauServerDataSync\TableauExport_Membership.txt
timeout /t 30

5. Finally, schedule the job to run at set intervals: The method DSK uses is to incorporate the .bat file into the SSIS jobs that also update the data mart from the original sources. This ensures the data mart is updated before the Tableau extracts are refreshed. However, the simplest method is to create a Windows task to run the batch file on a nightly basis.
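For the Windows-task route, the built-in schtasks command can register the batch file as a nightly job. A sketch, assuming the folder from step 3 and a placeholder 2:00 AM start time:

```shell
schtasks /create /tn "Update Tableau Extracts" /tr "C:\DataExtract\TableauServerDataSync\Update_Extracts.bat" /sc DAILY /st 02:00
```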

In many cases DSK recommends having power users develop off of the published extracts rather than connecting directly to the associated data source (in this use case, the SQL data mart).  This ensures both that the data is being updated and that power users are working from the single version of the truth designed by your association.


Written by Debbie King