
Databricks mapping columns

Supported data types

| Data Integration Datatype | Databricks Datatype | Mode | Is Splitted |
| --- | --- | --- | --- |
| STRING | STRING | Nullable | x |
| TEXT | STRING | Nullable | x |
| INTEGER | BIGINT | Nullable | x |
| SMALLINT | BIGINT | Nullable | x |
| BIGINT | BIGINT | Nullable | x |
| DATE | DATE | Nullable | x |
| TIME | TIME | Nullable | x |
| CHAR | STRING | Nullable | x |
| JSON | STRING | Nullable | x |
| RECORD | STRING | Nullable | x |
| VARIANT | STRING | Nullable | x |
| ARRAY | STRING | Repeated | x |
| DATETIME | TIMESTAMP | Nullable | x |
| TINYINT | INTEGER | Nullable | x |
| BOOLEAN | BOOLEAN | Nullable | x |
| BIT | BOOLEAN | Nullable | x |
| REAL | FLOAT | Nullable | x |
| FLOAT | FLOAT | Nullable | x |
| NUMBER | INTEGER | Nullable | x |
| DOUBLE | FLOAT | Nullable | x |
| OBJECT | STRING | Nullable | x |
| TIMESTAMP | TIMESTAMP | Nullable | x |
| DECIMAL | FLOAT | Nullable | x |

Loading unstructured data into a Databricks table

Databricks has powerful JSON extraction functions and also supports STRUCT/MAP data types. To avoid hitting limitations or overly rigid data types, and to support dynamic loading from unstructured sources (especially sources with dynamic keys, such as MongoDB), keep any unstructured column in the mapping as a STRING data type in the target table. As a result, the table will contain structured data types, such as FLOAT, INTEGER, or TIMESTAMP, alongside unstructured data types represented as STRING, such as arrays and records (objects).

To query JSON columns in the target table, Databricks provides a number of JSON functions.

Here is an example of handling JSON data in Databricks:

/*
The actions table has an action_calc JSON column with the following value:
{"name": "click", "amount": 4 }
*/

SELECT
  get_json_object(`action_calc`, '$.name') AS `name`,
  CAST(get_json_object(`action_calc`, '$.amount') AS INTEGER) AS `amount`,
  get_json_object(`action_calc`, '$.nosuchcol') AS `missing_col`
FROM actions;

/* Result:
| name | amount | missing_col |
| 'click' | 4 | null |
*/
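Beyond `get_json_object`, JSON stored as STRING can be parsed into a typed STRUCT with `from_json`, which accepts a DDL-style schema string. A minimal sketch, assuming the same actions table as above:

```sql
-- Parse the action_calc STRING column into a STRUCT with named, typed fields.
SELECT
  from_json(`action_calc`, 'name STRING, amount INT') AS `parsed`
FROM actions;

-- Individual fields are then addressable as parsed.name and parsed.amount.
```

This is useful when several fields of the same JSON column are needed, since the document is parsed once instead of once per `get_json_object` call.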

Expressions

A column can also serve as the target of an expression. In that case, use the Expression definition on the right-hand side of the mapping table. When you define an expression, the platform ignores the source column.

The expression can reference other columns in the Target section of the mapping table, as well as any valid expression supported in a Databricks SQL clause.

For example, an expression can record the date on which data integration loaded the table, targeted at the ods_update_date column.
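The source does not show the expression itself; a minimal sketch of such a load-date expression, assuming ods_update_date is a TIMESTAMP column, could be:

```sql
-- Hypothetical expression for the ods_update_date target column:
-- stamps each loaded row with the load time.
current_timestamp()
```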

note

By default, Databricks treats empty fields in CSV files as null values. Because it lacks a direct null if configuration, it loads both NULLs and empty strings from the source as empty strings.
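When downstream queries need genuine NULLs rather than empty strings, one workaround is to normalize at query time with the standard `NULLIF` function. A sketch, using hypothetical column and table names:

```sql
-- Convert empty strings loaded from CSV back to NULL at query time.
-- `some_col` and `my_table` are placeholders for your own names.
SELECT NULLIF(`some_col`, '') AS `some_col`
FROM my_table;
```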
