In this blog I am going to show you how you can optimize a simple measure involving CROSSJOIN and in the process we are also going to learn how to read DAX query plans.
By the end of the blog you will be able to transform a code that runs in 40 seconds to run into approximately 10 milliseconds as shown in the below image:
The first thing we have to do is to open DAX Studio and connect it to a Power BI file, or simply open DAX Studio from inside the PBI.
Before we start writing the code in DAX Studio we need to ensure 3 things are checked:
Clear Cach then Run - Clears data cache from memory if it exists because of the similar quries sent previously
Query Plan - Shows detailed view of Logical Query Plan and Physical Query Plan
Server Timings - Shows xmSQL code and time spent in Formula Engine vs Storage Engine
Now let's create the first query which retrieves DISTINCT Colors from the Products table.
EVALUATE VALUES ( Products[Color] )
This code retrieves a list of 16 distinct colors.
For getting this list of colors a query is sent to the Storage Engine that is shown in form of xmSQL, which nothing but pseudo-SQL code converted into human readable format.
The logical query plan shows the operator that is used to extract the Color, which is Scan_Vertipaq and the Physical query plan has an operator called IterCols which means the operation is going to return a table, which in this case is Products[Color]
Now the same extraction needs to be done for a different column which is CountryRegion in Customer table:
And then we are going to perform a CROSSJOIN of these 2 fields, i.e Color and Country Region using the below code:
And for this query 2 Storage Engine queries are executed, which are:
For the same CROSSJOIN the following Logical and Physical query plan is prepared and executed.
Pay attention to the above image, the CROSSJOIN in Logical Query Plan corresponds to CrossApply in Physical Query Plan.
To answer the query that we wrote, what DAX engine does is:
Formula Engine prepares the logical query plan and then might partially execute it to get the data cache for optimizing the code
Formula Engine asks for data caches from the Storage Engine
xmSQL code is executed to retrieve uncompressed in-memory data cache
The data caches sent by Storage Engine are then used by the Formula Engine to answer the query
Then the Physical Query Plan is executed that uses the data caches as the leaf level nodes
If you pay attention to the Line 2 of the Physical Query Plan you can see that right after CrossApply we see 2 child operations at line 3 and line 6, if you read the whole line you will see IterCols operator which basically returns a table and on the line 3 it returns Color and on line 6 it returns CountryRegion, then these 2 tables are used by the CrossApply operator to perform a cartesian product and returns 464 rows ( 16 * 29 )
So far everything is all good, now let's start making the code complex and understand how to not to write code involving CROSSJOIN.
The next thing to do is to add a Sales Amount column to the result of CROSSJOIN.
The following xmSQL queries are executed for the ADDCOLUMNS and CROSSJOIN operation:
The below query gets Total Sales grouped by Customer[CountryRegion] and Products[Color]
The below query gets the Customer[CountryRegion] for the CROSSJOIN
The below query gets the Products[Color] for the CROSSJOIN
The result of the ADDCOLUMNS and CROSSJOIN construct is a table of 464 rows that contains sales amount for the existing combination in the model and BLANK for the non-existing combinations.
The Physical and Logical plans are :
Everything that we have seen earlier has now been encapsulated inside ADDCOLUMNS in both Physical and Logical plans, after the CrossApply operator creates cartesian product of Color * CountryRegion i.e. 16 * 29, it then uses these combinations and does a lookup against the data cache that contains 214 rows which contains Color, CountryRegion and Total Sales.
Now next what we want is to remove the blank rows so that the result doesn't show any non-existing combinations.
The code to use is the following one:
The above code returns only 214 rows with the sales amount for the combinations that have an existing combination in the sales table.
The xmSQL code remains the same however the Physical and Logical query plans have changed because we have included FILTER, NOT, and ISBLANK into the calculation.
Physical Query Plan:
Logical Query Plan:
Now the problem with the way we have written the code so far is that the engine is producing a cartesian product which is then stored so that it can populate the sales amount on it, but the problem with this way of writing the code is that it can be extremely slow as soon as you start including columns that are of high cardinality.
To confirm this now I am going to use 2 different columns that are:
Customer Key with 18,869 unique values
Product Key with 2,517 unique values
Let's run the same construct on the new columns and see how much time it takes:
Please note I have to use ROW to just get the COUNT of the rows because the cartesian product of 18,869 * 2,517 is 47,493,273 and it will take forever to populate the result in the pane of DAX Studio.
Take a look at the result of CROSSJOIN in the Physical Query Plan after the engine generates 47,493,273 combinations it has to do a Lookup between this new table and the table of 4,272 rows containing Sales Amount grouped by Customer[CustomerKey] and Products[ProductKey] which is highlighted in Blue.
The code runs in around 18 seconds which is way too much for this simple looking code, also notice that everything is getting computed in Formula Engine which is a slower engine than the Storage Engine ( Vertipaq in this case )
Now let's see how we can improve the performance drastically, the idea behind the optimization is that instead of keeping those 47 Million combinations throught the context transition and the filtering why not just simply remove the non-existing combination before using ADDCOLUMNS?
The code is now modified to include FILTER after CROSSJOIN and before ADDCOLUMNS
The Physical Query Plan doesn't have any CrossApply operator and there is only one data cache that contains Total Sales grouped by CustomerKey and ProductKey, which then is later counted because of the COUNTROWS function as shown in the Line 4 of the Physical Query Plan.
The time taken in the execution of this code is simply unreal, it runs in 3 ms which blows my mind!
So now you have seen how we can improve the performance of a code involving CROSSJOIN by filtering non-existing combinations before doing any other operation.
After this I thought I should try to mimic a real report where we are slicing the data by Products[Color] and the results are surprising:
When I include FILTER as the top level function the query runs in 40 seconds, it takes double the time we have seen earlier because of the ROLLUPADDISSUBTOTAL, and this is because at the grand total the calculation is performed once again for the grand totals
When I include FILTER right after CROSSJOIN, it runs in 9 milliseconds and performance is as expected.
CROSSJOIN is a DAX function that utilizes Formula Engine, which is although a robust but a slow engine compared to Storage Engine which can utilize multiple cores in parallel.
If SUMMARIZE function is extremely slow in your code, becasue of the time taken by Storage Engine to scan Vertipaq, then you can push the computation towards the Formula Engine using CROSSJOIN but you need to ensure that the cartesian product isn't colossal otherwise it will bog down the performance.
Filtering right after generating cartesian product is strongly preferred rather than relying on Filtering toward the end of the calculation.
Cardinality matters, before writing DAX, you need to know the structure of the columns, DAX Engines have to scan partitions of data, if your data is highly compressed and optimally sorted, the scan would be exponentially fast. Based on these factors develop your DAX and test works best for you.