Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Exported file in a spreadsheet editor


(warning) It is a good practice to also include the primary key in the export definition, otherwise it can lead to duplicated records in the result of the export process. The duplicated records can occur because the dataset's columns and metric values needs to be joined and if there is no primary key, then the join is performed using all available columns. And if any of the columns contains some NULL values, the join process is not able to correctly match the records, because of the NULL values inequality. 

Let's have an example. We have a dataset of branches like this:

branch_idbranch_namecity
10Branch PraguePrague
20Branch LondonLondon
30Branch New YorkNULL


The correct definition of the export will look like this:

Code Block
"content": {
    "properties": [
        "branches.branch_id",
        "branches.branch_name",
        "branches.city"
    ]
}


In this case the metric values are correctly joined based on the primary key branches.branch_id. The result will look like this:

branch_idbranch_namecitynumber_of_employees
10Branch PraguePrague100
20Branch LondonLondon250
30Branch New YorkNULL300


The incorrect definition of the export will look like this:

Code Block
"content": {
    "properties": [
        "branches.branch_name",
        "branches.city"
    ]
}


In this case the metric values needs to be joined using columns branches.branch_name and branches.branch_city, because there is no primary key defined. And because the column city contains NULL value on the third row, the records are not correctly matched in the case of "Branch New York" and it results to duplicated records.

The result will look like this:

branch_idbranch_namecitynumber_of_employees
10Branch PraguePrague100
20Branch LondonLondon250
30Branch New YorkNULLNULL
NULLNULLNULL300