Daily DAX : Day 203 UNICODE
The UNICODE function in Power BI DAX returns the Unicode code point (numeric value) of the first character in a given text string. It is useful for text analysis, data cleansing, or scenarios where you need to work with character encodings.
Syntax
UNICODE(<text>)
<text>: A string or a column containing text. The function evaluates only the first character of the input.
Return Value
An integer representing the Unicode code point of the first character.
If the input is empty or null, it returns a null value.
Use Case
The UNICODE function is typically used in:
Text Analysis: To identify or compare characters based on their Unicode values (e.g., distinguishing between similar-looking characters like "A" and "А").
Data Validation/Cleansing: To detect special characters, non-printable characters, or specific symbols in a dataset.
Conditional Formatting: To apply rules based on character types (e.g., filtering out rows starting with certain characters).
Integration with Other Systems: When mapping or transforming text data for systems that rely on Unicode standards.
Example
Suppose you have a column [ProductName] in a table, and you want to find the Unicode value of the first character of each product name.
DAX
UnicodeValue = UNICODE([ProductName])
Sample Data
ProductName UnicodeValue
Apple 65
Banana 66
Çherry 199
123Item 49
For "Apple", the first character is "A", and UNICODE("A") returns 65 (Unicode code point for "A").
For "Çherry", the first character is "Ç", and UNICODE("Ç") returns 199.
Practical Scenario
Use Case: Detecting special characters in a dataset.
You can create a calculated column to flag rows where the first character is a non-standard letter (e.g., Unicode value outside the range for A-Z, 65–90).
DAX
SpecialCharFlag =
IF(
UNICODE([ProductName]) < 65 || UNICODE([ProductName]) > 90,
"Special/Non-Letter",
"Standard Letter"
)
This helps identify entries starting with numbers, symbols, or non-ASCII characters for further data cleansing.
Notes
The function only evaluates the first character. To analyze other characters, you may need to combine it with functions like MID or LEFT.
UNICODE is case-sensitive: UNICODE("A") (65) differs from UNICODE("a") (97).
It aligns with the Unicode standard, making it reliable for internationalization tasks.
For more complex text processing, combine UNICODE with other DAX functions like FIND, SEARCH, or SUBSTITUTE.
Comments
Post a Comment