top of page

Extract numbers from an alphanumeric string using DAX

Updated: Oct 22, 2022

Extracting numbers from an alphanumeric string like "b52h1l1h8gyv3kb7qi3" and then summing or just concatenating those values is really an easy task in Power Query, but have you ever tried doing it with DAX? The solution is much more complex than you would imagine.


In this article I am going to show you how you can use DAX to extraxt numbers from the aforementioned string and then we will be able to sum those numbers of just concatenate them. So the end result would look like this


Sum - 30

Concatenate - 52118373


Here is some mock up data that we are going to use and adjacent to the first column is the result of SUM and CONCATENATE:

Now its time to write some DAX code and initially I am not going to create a column in the table with data as shown in the above image, what I want is to be able to view the result of the itermediate calculations that we are going to store in variables and we are going to use a lot of them.


The first thing is to create a new calculated table, I have named it as Extract Numbers and have create first variable which hold the alphanumeric string on which we are going to operate and extract numbers:

VAR CurrentText =      "b52h1l1h8gyv3kb7qi3"

I am not going to return the result of this variable because it is self explanatory, but if you want to do you will have to RETURN the variable wrapped inside Curly brackets, i.e. { CurrentText }


The next variable that we need to create is going to retrieve the length of the string that we have supplied in the first variable:

VAR StringLength =     LEN ( CurrentText ) 

Now what we need to do is create a series starting from 1 till the length of the alphanumeric string and for that we can use GENERATESERIES.

VAR StringLengthSeries =    GENERATESERIES ( 1, StringLength, 1 ) 

Let's see what this variable is going to return:

We get list of numbers starting from 1 to 19 with an increment of 1. 19 is the length of the alphanumeric string.


"Value" is probably not a good name for the column from the readability point of view so let's rename this column by using SELECTCOLUMNS

VAR StringLengthSeries =
    SELECTCOLUMNS ( 
        GENERATESERIES ( 1, StringLength, 1 ), 
        "@Number", [Value]
    )

Next what we need to do is to assign the alphanumeric string for each row of the numbers that we have received. And we can do that with either ADDCOLUMNS or GENERATE/ROW construct

VAR JoinStringAndNumberSeries =
    ADDCOLUMNS ( 
        StringLengthSeries, 
        "@String", 
        CurrentText
    )

or

VAR JoinStringAndNumberSeries =
    GENERATE ( 
        StringLengthSeries, 
        ROW ( "@String", CurrentText )
    )

The below image show the result of the JoinStringAndNumberSeries variable


Now what we will do is utilize the numbers on each row to extract the corresponding value based on position from our alphanumeric string. So, if we are at 11, I want the character at the 11th position. And for that we are going to use MID and then use it to add a new column to the previous variable.

VAR SplitString =
    ADDCOLUMNS (
        JoinStringAndNumberSeries,
        "@Single Character", MID ( [@String], [@Number], 1 )
    )

This is the output of the SplitString variable:

Now what we will do is extract numbers from the @Single Character Column and for that we will have to do some complex operations


The first thing we will do is add another column to the SplitString variable and name it as "@String to Numbers" this column will have a nested variable


The first variable CurrentCharacter gets the value of the [@Single Character] in the currently iterated row supplied because of the row context created by ADDCOLUMNS


Then the IF function in the Result variable checks if conversion of the CurrentCharacter to numeric result in error or not, if it is not an error then do the conversion and we simply store the value else return BLANK

VAR ConvertNumeric =
    ADDCOLUMNS (
        SplitString,
        "String to Numbers",
            VAR CurrentCharacter = [@Single Character]
            VAR Result =
                IF ( NOT ISERROR ( VALUE ( CurrentCharacter ) ), VALUE ( CurrentCharacter ) )
            RETURN
                Result
    )

Result of this variable:

Now that we have our Integers all we can do is either concatenate them or sum them.


Let's contenate first by creating another variable:

VAR NumberString =
    CONCATENATEX ( 
        ConvertNumeric, 
        [@String to Numbers], 
        "", 
        [@Number], 
        ASC
    )

Result of NumberString Variable:












In the above variable CONCATENATEX simply combines all of the values by iterating row by row, the result returned by CONCATENATEX is still a text data type because it is left aligned


We need to use CONVERT to convert the data type of the column to Whole Number














Now that we have combined the numbers what we can do is sum those and for that what we need to do is use SUMX

VAR SumOfNumbers  =     
	SUMX ( ConvertNumeric, [@String to Numbers] ) 




Now that we have both solution that we wanted we can go back to the original table add 2 Calculated columns for concatenate and sum.


The only change that we need to make in the code that we have written so far is that instead of specifiying an explicit string in the CurrentText variable we are going to let the row context do the Job, so instead of writing

VAR CurrentText = "b52h1l1h8gyv3kb7qi3"

we will write

VAR CurrentText = Data[Text]

Here is the compiled code for concatenating numbers:

Numer Concatenate =
// The Value in the currently iterated row of the Column Text
VAR CurrentText = Data[Text] 
VAR StringLength =
    LEN ( CurrentText )
VAR StringLengthSeries =
    SELECTCOLUMNS ( 
    	GENERATESERIES ( 1, StringLength, 1 ), 
    	"@Number", [Value]
    )
VAR JoinStringAndNumberSeries =
    ADDCOLUMNS ( 
    	StringLengthSeries, 
    	"@String", CurrentText
    )
VAR SplitString =
    ADDCOLUMNS (
        JoinStringAndNumberSeries,
        "@Single Character", MID ( [@String], [@Number], 1 )
    )
VAR ConvertNumeric =
    ADDCOLUMNS (
        SplitString,
        "@String to Numbers",
            VAR CurrentCharacter = [@Single Character]
            VAR Result =
                IF ( 
                    NOT ISERROR ( VALUE ( CurrentCharacter ) ), 
                    VALUE ( CurrentCharacter ) 
            	)
            RETURN
                Result
    )
VAR NumberString =
    CONVERT (
        CONCATENATEX ( 
            ConvertNumeric, 
            [@String to Numbers], "", [@Number], 
            ASC
        ),
        INTEGER
    )
RETURN
    NumberString

And the compiled code for summing those numbers:

Numer Concatenate =
// The Value in the currently iterated row of the Column Text
VAR CurrentText = Data[Text] 
VAR StringLength =
    LEN ( CurrentText )
VAR StringLengthSeries =
    SELECTCOLUMNS ( 
    	GENERATESERIES ( 1, StringLength, 1 ), 
    	"@Number", [Value]
    )
VAR JoinStringAndNumberSeries =
    ADDCOLUMNS ( 
    	StringLengthSeries, 
    	"@String", CurrentText
    )
VAR SplitString =
    ADDCOLUMNS (
        JoinStringAndNumberSeries,
        "@Single Character", MID ( [@String], [@Number], 1 )
    )
VAR ConvertNumeric =
    ADDCOLUMNS (
        SplitString,
        "@String to Numbers",
            VAR CurrentCharacter = [@Single Character]
            VAR Result =
                IF ( 
                    NOT ISERROR ( VALUE ( CurrentCharacter ) ), 
                    VALUE ( CurrentCharacter ) 
            	)
            RETURN
                Result
    )
VAR SumOfNumbers =
    SUMX ( 
        ConvertNumeric, 
        [@String to Numbers]
    )
RETURN
    SumOfNumbers

11,258 views0 comments

Comments


bottom of page