DAX language of Power BI and SSAS Tabular doesn't have a function for Full Outer Join so in this blog we are going to see a work around to implement that functionality.
Here are 2 tables that we are going to use:
Customers:
Orders:
If I do a FULL OUTER JOIN on these two tables, we get this: Matching rows based on the Join condition and all the rows that do not match.
Now we need to implement the same logic in DAX in Power BI, so let's begin.
We are going to simulate 2 scenarios, one where you are able to create relationship and one where you can't.
Without Relationship
Since there is no OUTER JOIN function in DAX what we can do is use the next best option, that is NATURALLEFTOUTERJOIN
But if you try that you will get an error that:
The issue is both tables have different data lineage and for NATURALLEFTOUTERJOIN you need to have the same data lineage and same name, the names of the columns are same, i.e. CustomerKey but the tables are two different object, hence they have a different lineage.
So what we can actually do is break this lineage completely so that we do not fall into the error, one way of breaking the lieage is to use any expression tagged with the column name that is to be used for joining and the easiest expression is double quotes.
But to be able to break the lineage we need to extend the code by using SELECTCOLUMNS.
Same thing needs to be done for Orders
Now if we try NATURALLEFTOUTERJOIN, it works:
But you can see that only 1 CustomerKey column is returned and the Column CustomerKey from Orders Table isn't returned and to fix that we need to add a seprate CustomerKey Column in the Orders Table so that we can get CustomerKey of Orders Table
Now that we have all the rows of left(Customers) table and the matching rows of the right(Orders) table we can begin to think how to insert non matching rows.
And for that we need to find out all the CustomerKeys that exists in Orders Table but not in Customers Table and we can do that with the EXCEPT function:
Finally we are going to use the result of EXCEPT to inject a filter over Orders table and return all the rows for CustomerKey 8 & 10, by using CALCULATETABLE we can achieve that:
Now we can simply append ExistsInOrdersButNotInCustomers to LeftJoin Variable but that is not possible because both variables/tables do not have the same number of columns as well as same order.
So, first we need to add empty column to the ExistsInOrdersButNotInCustomers variable and for that we are going to use SELECTCOLUMNS, we can use ADDCOLUMNS as well, however then later we will have to reorder using SELECTCOLUMNS, that's why I will use SELECTCOLUMNS to add new column and reduce redundancy in the code.
I have used BLANK() instead of "" otherwise we won't be able to change the format of Birth Date Column to Date data type.
Now, you can see that we have the columns in the same order as we have in the LeftJoin variable.
Now, in the final variable all we need to do is use UNION to append one table to another.
And you can see that we have achieved a FULL OUTER JOIN condition in DAX and here is the result we got in SQL Server:
Here is the complete code used:
FULL OUTER JOIN =
VAR CustomersTable =
SELECTCOLUMNS (
Customers,
"CustomerKey",
Customers[CustomerKey] & "",
"Name", Customers[Name],
"Birth Date", Customers[Birth Date],
"Gender", Customers[Gender]
)
VAR OrdersTable =
SELECTCOLUMNS (
Orders,
"CustomerKey",
Orders[CustomerKey] & "",
"CustomerKey Orders", Orders[CustomerKey],
"Order Date", Orders[Order Date],
"Quantity", Orders[Quantity],
"Net Price", Orders[Net Price]
)
VAR LeftJoin =
NATURALLEFTOUTERJOIN (
CustomersTable,
OrdersTable
)
VAR ExistsInOrdersButNotInCustomers =
CALCULATETABLE (
Orders,
EXCEPT (
DISTINCT ( Orders[CustomerKey] ),
DISTINCT ( Customers[CustomerKey] )
)
)
VAR AddAndArrangeCols =
SELECTCOLUMNS (
ExistsInOrdersButNotInCustomers,
"CustomerKey", BLANK (),
"Name", BLANK (),
"Birth Date", BLANK (),
"Gender", BLANK (),
"CustomerKey Orders", Orders[CustomerKey],
"Order Date", Orders[Order Date],
"Quantity", Orders[Quantity],
"Net Price", Orders[Net Price]
)
VAR Result =
UNION (
LeftJoin,
AddAndArrangeCols
)
RETURN
Result
With Relationship
Now let's take a look at how we can simplify the code by creating a One to Many relationship:
So in a scenario where we have a relationship we can use GENERATE function and pair it with CALCULATETABLE to perform context transition over Orders, so let's try that.
But if you do that you can see we get an errors, the reason is the result will contain 2 columns with same name, i.e. CustomerKey, so we need to rename the CustomerKey Column in the Orders table.
If we rename the column, we can see that the code works
But now there is a problem, we are returning 20 rows which is like a CROSSJOIN, 4 Customers * 5 Orders, to fix this we need to introduce CALCULATETABLE to perform Context Transition.
Once we introduce CALCULATETABLE we have another problem, i.e. GENERATE is only returning those rows that have a match in the Orders Table, so how can we fix this? We can use GENERATEALL for this purpose as GENERATEALL still return all the rows of the Customers table even if the Orders table is EMPTY for that row.
And now you can see that we get all the rows of Customers table and the Matching rows of Orders table.
Now we can move ahead and implement the logic of identifying the rows that exists in Orders but not in Customers, and we are going to use a different technique than shown in the without relationship section.
To get the intended result we need to keep Orders in the first argument of GENERATE and Customers in the second, but you will see we get 20 (4 * 5) and that's because we are trying to send the Filter from Orders to Customers which is not allowed since the Cross-Filter direction is from Customers to Orders.
To fix this we need to use CROSSFILTER inside CALCULATETABLE.
And now you can see it works!
But you can see we get 1 & 2 and we need to Filter this out so that we only keep rows where CustomerKey from Customer table is BLANK and for that we are going to use FILTER and ISBLANK.
Finally, we need to arrange the cols using SELECTCOLUMNS so that we can append both tables using UNION.
Final result:
So that was a somewhat long article on how you can mimic the FULL OUTER JOIN behaviour of SQL in DAX.
Full code for the with relationship scenario:
FULL OUTER JOIN =
VAR LeftJoin =
GENERATEALL (
Customers,
SELECTCOLUMNS (
CALCULATETABLE ( Orders ),
"CustomerKey Orders", Orders[CustomerKey],
"Order Date", Orders[Order Date],
"Quantity", Orders[Quantity],
"Net Price", Orders[Net Price]
)
)
VAR ExistsInOrdersButNotInCustomers =
FILTER (
GENERATEALL (
SELECTCOLUMNS (
Orders,
"CustomerKey Orders", Orders[CustomerKey],
"Order Date", Orders[Order Date],
"Quantity", Orders[Quantity],
"Net Price", Orders[Net Price]
),
CALCULATETABLE (
Customers,
CROSSFILTER (
Orders[CustomerKey],
Customers[CustomerKey],
BOTH
)
)
),
ISBLANK ( [CustomerKey] )
)
VAR RearrageCols =
SELECTCOLUMNS (
ExistsInOrdersButNotInCustomers,
"CustomerKey", Customers[CustomerKey],
"Name", Customers[Name],
"Birth Date", Customers[Birth Date],
"Gender", Customers[Gender],
"CustomerKey Orders", [CustomerKey Orders],
"Order Date", [Order Date],
"Quantity", [Quantity],
"Net Price", [Net Price]
)
VAR Result =
UNION ( LeftJoin, RearrageCols )
RETURN
Result
Comments