top of page

Applying a Filter on Dimension vs Fact in DAX

While working with DAX, a common mistake that anyone makes (that I also made) is to think that applying a filter on a column of a Dimension table should produce the same result as of applying the same filter to a related column in the Fact table.


For example let's say you work with these 2 tables:

And you produce a report that slices some measure by Products[ProductKey], and you also add a measure that shows the Transactions for the ProductKey = 8 by applying the filter on the Products table, which is a Dimension table here.

You might also wonder that why not use the ProductKey column from the Sales table and do the same operation? If you try that...

You get a number only when the ProductKey is 8 but for others it returns a blank, why? Understanding the why is difficult because you need to mentally map things, but I will try to make it easy.


When we write a predicate statement in the filter argument of CALCULATE like Products[ProductKey] = 8, internally it expands to the bold highlighted part below.

Transactions PK 8 = 
CALCULATE ( 
    [Sales Transactions],
    FILTER ( 
        ALL ( Products[ProductKey] ),
        Products[ProductKey] = 8
    )
)

That means when we are at a different ProductKey such as 1, the engine iterates the DISTINCT values of Products[ProductKey] filters out the row where it is 8 and then applies it to the existing filter context.


The existing filter context outside CALCULATE is Products[ProductKey] = 1, since CALCULATE has a new filter it OVERWRITES any filter on the same column and for that reason you will see the Measure value of 8 everywhere in the report.


However, when we use Sales[ProductKey] = 8, this time the filter is applied to a different column, the code expands to this:

Transactions FK 8 = 
CALCULATE ( 
    [Sales Transactions],
    FILTER ( 
        ALL ( Sales[ProductKey] ),
        Sales[ProductKey] = 8
    )
)

But this time CALCULATE applies the value 8 to the Sales[ProductKey] and there is already a Filter on Products[ProductKey] which is 1, in this particular scenario CALCULATE doesn't OVERWRITES the existing filter context, simply because the filters are on different columns and it has to respect both.


Because there are two filters Sales[ProductKey] = 8 & Products[ProductKey] = 1, there is an AND operation here, there are no rows in the Sales table that belong to both 8 and 1 at the same time, otherwise that's a data modelling mistake. There can be rows that belong to 8 and 1 but they will be different rows, here the engine is requesting to return rows where each row in 8 & 1 at the same time and that's not possible.


Since it can't find any row it returns a blank, and only when the filter on Sales[ProductKey] = 8 & Products[ProductKey] = 8 the engine returns a value.


Now you can easily fix the second scenario by removing the filter context from the Products table by using a REMOVEFILTERS so that only 1 filter is visible in the filter context.

But this is a bad practice, you should not apply filters on Fact table unless you are in a scenario where you de-normalized a small dimension with 1-2 columns into Fact table, other than that I recommend to always apply filter on Dimension.


You can verify for yourself if a predicate statement actually expands or not with the help of DAX Studio:

As you can see Logical Query Plan indicates how the code that we wrote is further expanded and simplified for further processing by the DAX Engine, and the lines 6-8 show the expansion of the predicate argument.


Now for the second argument that I made on Sales[ProductKey] and how there is an AND operation here.

  1. At line 2 we see that there is a Scan_Vertipaq event and the DAX Engine has extracted Products[ProductKey] column which marked with 0 number.

  2. At line 7 we see there is Scan_Vertipaq operation which extracts the Sales[ProductKey] column which is marked with 1

  3. As you know that CALCULATE/CALCULATETABLE first prepare and applies the filter context and then evaluates the measure/table respectively, similarly in the image first the filter is prepared and applied on the Sales[ProductKey] and 8 is shown on line 8.

  4. After CALCULATE applies the filter, the operation on line 5 is performed, this one Scans the data but it is not dependent on Sales[ProductKey] it is Dependent on Products[ProductKey] so that means in a filter context where we specify filter on Sales the engine is looking for a row based on the filter on Products[ProductKey] and this is your AND operation, if both matches then we get a result otherwise a blank is returned.

And when we introduce the REMOVEFILTERS in the Sales[ProductKey] code, we completely remove the dependency from the Products table, during the Scan_Vertipaq event.

And there is another scenario where your code is:

Transactions PK 8 = 
CALCULATE ( 
    [Sales Transactions],
    Products[ProductKey] = 8
)

But you don't want to the same value everywhere, and show a value only when the row in the report is 8 and for that you can use KEEPFILTERS which basically INTERSECTS both filter context outside and inside CALCULATE.

Or you can also use:

Transactions PK 8 = 
CALCULATE ( 
    [Sales Transactions],
    FILTER ( 
        VALUES ( Products[ProductKey] ),
        Products[ProductKey] = 8 
    )
)

As you have seen how your calculations can be wrong and if you don't know why it is not working identifying that can be tricky and would require you to mentally layout each step, therefore don't apply filters to Fact table always try to work with Dimensions, and let's only aggregate values from Fact tables.

3,013 views0 comments

Comments


bottom of page