
Tips for Optimizing Data Source Queries in Snowflake

  • July 20, 2023

By now, we all know the reasons you want to put your data in the cloud, right? Cloud storage is cheap, and tools like Snowflake let you store even less of it (one copy of your data, “zero-copy cloned” to provide development and test environments). On top of that, it’s fast.
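That “zero-copy clone” piece is a single statement in Snowflake. As a quick illustration (the database names here are just placeholders), cloning production into a dev environment looks like this, and it doesn’t duplicate the underlying storage:

    -- Hypothetical names: clone prod into a dev environment without copying storage
    CREATE DATABASE dev_db CLONE prod_db;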

So, what happens when what’s supposed to be cheap and fast…isn’t? 

Recently I was working with a client to combine three of their largest data sources into one monster source for a Tableau report. Each of these data sources was a view built in Snowflake; each queried dozens of tables and returned tens of millions of rows. On their own, they’d take a minute or two to return results, but I was joining the three together and returning a subset. When I created my Tableau data source based on a custom query, I was looking at nearly 10 minutes to return data.


Not cool, man. Not cool. 

So, what to do? After rewriting portions of my query to better optimize it, I was still looking at data that took eight minutes to load. Still not nearly fast enough. 

I went to a colleague, Andy Redeker, who had a bit more Snowflake experience, to see if he’d ever run into a similar problem and, if so, how he resolved it.

The answer was the opposite of what my old-school SQL brain would have thought: create a task that runs my custom SQL query and writes the results to a table, then use that table as my data source.

It makes sense in the realm of Snowflake, however. Instead of waiting for my custom query to churn through its subqueries and comb through nearly sixty tables, I was querying a single table. This made returning more than 45 million records like cutting through butter with a hot knife. It was quick and painless. 
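In practice, the pattern is a scheduled task whose body is the original custom query, materialized into a table. The sketch below uses made-up warehouse, view, table, and column names, and a once-a-day schedule, but the shape is the same:

    -- Hypothetical names throughout; the task body is the original custom query,
    -- materialized into a single table once a day.
    CREATE OR REPLACE TASK refresh_combined_source
      WAREHOUSE = etl_wh                                   -- the larger warehouse that can handle the monster query
      SCHEDULE  = 'USING CRON 0 5 * * * America/Chicago'   -- once a day, before business hours
    AS
      CREATE OR REPLACE TABLE reporting.combined_source AS
      SELECT s.*, o.order_total, f.fiscal_period
      FROM   sales_vw   s
      JOIN   orders_vw  o ON o.sale_id  = s.sale_id
      JOIN   finance_vw f ON f.order_id = o.order_id;

    -- Tasks are created in a suspended state, so remember to resume it.
    ALTER TASK refresh_combined_source RESUME;

Once the task has run, the Tableau data source points at that one table instead of the custom query.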

And oh, by the way, since your consumption of Snowflake credits is also based on compute power, this saves money as well. In this instance, it was acceptable to update the data once a day, so my client would consume credits to run that monster query once, then only consume credits on a smaller warehouse that could take better advantage of cached data. In the end, this data would refresh in Tableau in mere seconds, without the user ever being aware of its size.
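For the reporting side, a small warehouse with a short auto-suspend is usually enough, since it only ever scans the one pre-built table. Again, the names and settings here are illustrative:

    -- Hypothetical sizing: a small, auto-suspending warehouse for the Tableau connection
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND   = 60      -- seconds of inactivity before suspending
      AUTO_RESUME    = TRUE;

    -- The Tableau connection uses this warehouse and simply reads the pre-built table
    USE WAREHOUSE reporting_wh;
    SELECT * FROM reporting.combined_source;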

If you find yourself in a similar situation, I’d invite you to try this. If it still doesn’t work? Give us a call. I’m sure we’ve got someone at Continuus who can help. 
