This article describes how to set type and status for each column of a Snowflake table to properly map it to a dataset in your Data Hub.
Once you have selected a dataset and a table to connect to your Data Hub, you need to determine the dataset structure by setting the type and status for each field. The platform determines the structure automatically based on column content, but it is recommended to review the default settings and make changes manually if necessary. This article will help you doing this by giving definitions of available column types and categories (statuses), their intersections, as well as some common mapping cases. Use links below to jump to a corresponding section:
Column type
'Type' setting defines what kind of content is contained in a column. There are the following column types:
- Date - this type corresponds to category detail or date;
- Text - this type can be mapped to categories dimension, detail, attribute, or date;
- Number - this type can be mapped to categories dimension, detail, attribute or measure;
- Currency - this type has specific decimal type. It can be mapped to category measure;
- Latitude - coordinate value for geo data;
- Longitude - coordinate value for geo data;
- Array - for snowflake ARRAY type, displaying value is comma joined string.
Column category (status)
- Date - column to build timeseries (x-axis), filter by date. Values in rows should be valid dates in format:
- yyyy
- yyyy MMMM
- "yyyy MMM"
- "MMM-yy"
- "M/d/yyyy"
- "yyyy-MM-dd HH:mm:ss.fffZ"
- default format of system (usually it’s yyyy-MM-dd)
- statistical date (e.g. 2020Q1, 2020FQ3, 2021M1 and other formats with frequency in date)
- Dimension - column which is supposed to be used in visualizations either as an axis category, element of a legend, or a filter.
Note: If count of unique elements of a column is more than 150,000 it is not recommended to set this column as a dimension. Instead, it is advised to keep this column's category as detail (see below). Also note that at least one column should be mapped to a dimension category but there cannot be more than 15 dimensions per dataset.
- Measure - numeric values of data points. Please, note that invalid numbers will cause error, while empty data points will be skipped. You should provide unit of measurement for each 'Measure' column using Details column.
- Detail - any value for data point. The content of these columns cannot be used to build visualizations except for a grid table.
- Attribute - dimension element attributes which can be viewed in the Dataset Viewer as presented on a screenshot below. Please, note that you cannot have different values for the same attribute of the same element.
Column details
The 'Details' setting should include semicolon-separated settings for columns including the following ones:
- Frequency - a property of date-type of columns.
Frequency=<constant frequency: A, H, FQ, Q, W, D, T> (e.g. Frequency=FQ)
Frequency=[Frequency column], in this case “Frequency column” should be mapped as detail
- Unit - a property of measure type of columns.
Unit=<constant unit(e.g. $ or Number)> (e.g. Unit=USD)
Unit could be mapped to the column e.g. Unit=[Unit column], in this case “Unit column” should be mapped as detail
- Expression - snowflake expression which will be applied to column. It can be used, for example, to convert text column to date.
E.g. Expression=TO_DATE(DATE, 'YYYY/MM/DD'). DATE is text column. Expression works only for columns with category Date, Frequency or Unit.
- Default date - x-axis column which will be used to build timeseries, other date columns will be details.
DefaultDate=<true or false>
Colum groups
Columns can be grouped to a dimension-column. It is used for mapping column to another column as attribute (property) or as hierarchy element (selected dimension will be parent for grouped column). An example of a hierarchical list is presented on a screenshot below.