Storing a column as String or Integer in Analysis Services
Updated: Dec 28, 2021
Recently someone asked a question that whether they should store a column in a Tabular model in String or Integer format and in this short blog I just want to help you decide it by yourself.
In Power Query I have create 2 tables with different data types
Let's load these tables in the data model and open DAX Studio and view the vertipaq Analyzer metrics.
We can see that both column have a Dictionary encoding:
The size of the dictionary varies however, the overall size mentioned in the Data field remains the same, why? the reason being Tabular models are data type independent.
If you are working with Power BI, Power Pivot or SSAS, from the Column perspective alone it doesn't matter if you store a particular column as Integer or String, the only object that grows or reduces in size is the Dictionary that contains unique values and an Index number that points to a value in the column.
In case of Hash Encoding Analysis Services produces a dictionary that stores unique values and an Index number, so let's assume you have some colors in a column then the below image explains how that column will get stored.
Please note that after the Dictionary Encoding Run Length Encoding is applied to further compress the column, but to make it easier to understand I have only shown Dictionary Encoding.
Therefore if you use String over Integer then only the size of the dictionary will increase and not the size of the column. And this is applicable to other data types as well.
To prove the point I loaded the same column in both DateTime and Decimal number format and changed the encoding hint to Hash and you can see that only the size of the Dictionary increases and not the column size.