Daily DAX : Day 147 INFO.COLUMNSTORAGES
The INFO.COLUMNSTORAGES function in Power BI's Data Analysis Expressions (DAX) is one of the newer "INFO" functions introduced to provide metadata about a semantic model. Specifically, INFO.COLUMNSTORAGES retrieves information about the storage properties of columns within a table in your Power BI data model. This function is particularly useful for advanced users, such as data modelers or administrators, who need to optimize model performance or troubleshoot storage-related issues.
Syntax
The INFO.COLUMNSTORAGES function does not take any arguments. It is typically used within a DAX query, such as in the DAX Query View in Power BI Desktop, and requires an EVALUATE statement to return its results. The basic syntax is:
DAX
EVALUATE
INFO.COLUMNSTORAGES()
When executed, it returns a table containing detailed metadata about column storage in the model.
Returned Columns
The table returned by INFO.COLUMNSTORAGES includes several columns that describe the storage characteristics of each column in the model. While the exact column names and details may evolve with Power BI updates, based on the general pattern of INFO functions (modeled after Dynamic Management Views or DMVs like TMSCHEMA), it typically includes fields such as:
TableID: A unique identifier for the table containing the column.
ColumnID: A unique identifier for the column within the table.
TableName: The name of the table (may require joining with INFO.TABLES to resolve).
ColumnName: The name of the column.
StorageType: Indicates how the column data is stored (e.g., in-memory, compressed, etc.).
DataType: The data type of the column (e.g., Integer, String, DateTime).
SegmentCount: The number of segments used to store the column data, which relates to how the data is partitioned internally.
DictionarySize: The size of the dictionary used for compression (if applicable).
DataSize: The size of the actual data stored in the column.
IsCompressed: A Boolean indicating whether the column is compressed.
These fields provide a deep dive into how Power BI's VertiPaq engine (the in-memory analytics engine) manages and stores column data.
Use Case
The primary use case for INFO.COLUMNSTORAGES is performance optimization and model diagnostics. Here’s how it can be applied:
Identifying Storage Inefficiencies:
By examining the DataSize and DictionarySize, you can identify columns that consume disproportionate amounts of memory. For example, a column with high cardinality (many unique values) might have a large dictionary size, indicating poor compression and potential memory inefficiency.
Use this insight to decide whether to reduce cardinality (e.g., by splitting a column into multiple lower-cardinality columns) or change data types to optimize storage.
Analyzing Compression Effectiveness:
The IsCompressed and StorageType fields reveal whether a column benefits from VertiPaq’s compression algorithms. If a column isn’t compressed effectively, you might consider transforming the data (e.g., normalizing text values) to improve compression ratios.
Optimizing Refresh Performance:
Columns with a high SegmentCount may indicate fragmentation, which can slow down data refresh operations. This could prompt you to investigate partitioning strategies or reduce the number of rows processed during refreshes.
Model Documentation:
You can use INFO.COLUMNSTORAGES to document the storage characteristics of your model, providing a detailed report for stakeholders or for future reference. For instance, export the results to Excel for analysis or visualization.
Practical Example
Suppose you’re optimizing a Power BI model with a large sales dataset. You suspect that some columns, like a free-text "Comments" field, are bloating the model size. You run the following DAX query:
DAX
EVALUATE
INFO.COLUMNSTORAGES()
The result might show that the "Comments" column has a DictionarySize of 500 MB and a DataSize of 1 GB, with IsCompressed set to FALSE. This indicates that the column’s high cardinality prevents effective compression, consuming significant memory. Based on this, you could decide to:
Remove the column if it’s not critical to reports.
Extract key terms into a separate table to reduce cardinality.
Use Power Query to preprocess the data before loading it into the model.
To make the output more readable, you might join it with INFO.TABLES to get table names:
DAX
EVALUATE
SELECTCOLUMNS(
NATURALLEFTOUTERJOIN(
INFO.TABLES(),
INFO.COLUMNSTORAGES()
),
"Table Name", [Name],
"Column Name", [ColumnName],
"Data Size (Bytes)", [DataSize],
"Dictionary Size (Bytes)", [DictionarySize],
"Compressed", [IsCompressed]
)
ORDER BY [DataSize] DESC
This query provides a sorted list of columns by data size, helping you quickly spot the biggest memory consumers.
When to Use It
Performance Tuning: When your model is slow to refresh or consumes too much memory.
Debugging: When investigating why certain visuals or queries perform poorly.
Learning: To understand how Power BI’s VertiPaq engine handles your data.
Limitations
Preview Feature: As of March 08, 2025, INFO functions are relatively new (introduced in late 2023) and may require enabling preview features in Power BI Desktop.
Technical Audience: The function is geared toward advanced users familiar with data modeling and the VertiPaq engine.
Query-Only: It’s designed for DAX queries (e.g., in DAX Query View or external tools like DAX Studio), not for creating measures or calculated columns in the model.
Conclusion
INFO.COLUMNSTORAGES is a powerful tool for peering into the internals of your Power BI model’s column storage. Its primary use case lies in optimizing performance by identifying memory-heavy or poorly compressed columns, making it invaluable for large or complex models. By leveraging this function, you can make informed decisions to streamline your data model, ensuring faster refreshes and more efficient resource usage—all without leaving the Power BI environment.
Comments
Post a Comment