Creating a dataset and uploading the data

Let's have a look at the available data before we define the corresponding dataset object.

Customers table

The data we will visualize in this tutorial represents our customers. The anonymized table contains each customer's internal ID, city, address, sex, age group and, most importantly, the latitude and longitude of the address. The table also contains the code of the neighborhood to which the address belongs, which we'll use later.

The data in this tutorial is already geocoded: we know the latitude and longitude of each address.

This is not always the case. If you need to geocode your data, use one of the web services, e.g. OpenCage Geocoder, or contact us.
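
If your addresses are not geocoded yet, a lookup against a geocoding web service could look roughly like the sketch below. It assumes the OpenCage Geocoder HTTP API: the endpoint, the q and key parameters, and the results[0].geometry response shape should all be verified against the service's current documentation. The helper names are hypothetical, and you need your own API key.

```python
import json
import urllib.parse
import urllib.request

# Assumed OpenCage endpoint; verify it against the service's documentation.
OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a single address lookup."""
    query = urllib.parse.urlencode({"q": address, "key": api_key})
    return f"{OPENCAGE_URL}?{query}"


def extract_lat_lng(response):
    """Return (lat, lng) of the first result, or None if nothing matched."""
    results = response.get("results", [])
    if not results:
        return None
    geometry = results[0]["geometry"]
    return geometry["lat"], geometry["lng"]


def geocode(address, api_key):
    """Call the web service and return (lat, lng), or None if not found."""
    with urllib.request.urlopen(build_geocode_url(address, api_key)) as resp:
        return extract_lat_lng(json.load(resp))
```

The resulting values would then go into the lat and lng columns of the CSV file.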

The CSV file can be downloaded here: customers.csv

Name               Title                 Data type
customer_id        Customer ID           integer
neighborhood_code  Neighborhood code     string
city               Customer's city       string
address            Customer's address    string
sex                Customer's sex        string
age_group          Customer's age group  string
lat                Address latitude      latitude
lng                Address longitude     longitude

Download the CSV file and put it in the /data folder of your dump.
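
Before uploading, it can be worth checking that the values in the file look sane. The following is a minimal sketch, assuming the CSV is comma-delimited and starts with a header row using the column names from the table above. The function name and the specific checks (numeric and unique customer_id, coordinates within valid ranges) are illustrative choices, not something the platform requires you to run.

```python
import csv


def check_customer_rows(lines):
    """Return a list of problems found in the rows; empty means none found.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)
    problems = []
    seen_ids = set()
    for row_no, row in enumerate(reader, start=1):
        try:
            customer_id = int(row.get("customer_id"))
            lat = float(row.get("lat"))
            lng = float(row.get("lng"))
        except (TypeError, ValueError):
            problems.append(f"row {row_no}: customer_id, lat or lng is not numeric")
            continue
        if customer_id in seen_ids:
            problems.append(f"row {row_no}: duplicate customer_id {customer_id}")
        seen_ids.add(customer_id)
        # Valid latitudes lie in [-90, 90], valid longitudes in [-180, 180].
        if not (-90 <= lat <= 90 and -180 <= lng <= 180):
            problems.append(f"row {row_no}: coordinates out of range")
    return problems
```

To run it against the downloaded file, open data/customers.csv with newline="" and pass the file object to check_customer_rows.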

Creating a dataset

Now, we will create the corresponding dataset. The dataset object has some specifics that distinguish it from other metadata objects:

  • it contains properties with featureTitle and featureSubtitle settings
  • it has a ref object instead of content

The properties.featureTitle and properties.featureSubtitle properties specify the content of the tooltip shown when hovering over the dataset's features in the map. In this case, it will be the customer_id and the address.
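
To illustrate how these two settings relate to the data, the sketch below resolves them against a single row. It only mimics the idea of the lookup and is not how the application renders tooltips. The tooltip_for function and the sample row are made up for this example.

```python
def tooltip_for(row, dataset_properties):
    """Resolve featureTitle/featureSubtitle against one data row."""
    title_column = dataset_properties["featureTitle"]["value"]
    subtitle_column = dataset_properties["featureSubtitle"]["value"]
    return {
        "title": str(row[title_column]),
        "subtitle": str(row[subtitle_column]),
    }


# The "properties" part of the dataset object described above.
properties = {
    "featureTitle": {"type": "property", "value": "customer_id"},
    "featureSubtitle": {"type": "property", "value": "address"},
}

# Hypothetical row, invented for illustration only.
row = {"customer_id": 1001, "address": "Example Street 1"}

print(tooltip_for(row, properties))
# {'title': '1001', 'subtitle': 'Example Street 1'}
```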

Now, to the ref object. The type of the dataset is dwh, and the subtype is geometryPoint, because the table represents customers' addresses (points) that have a latitude and longitude. The table's primaryKey is the customer_id property. In the visualizations array, we say that we want to visualize it as a dotmap and a heatmap (the only two available for geometryPoint). In the syntax below, the dataset is marked as categorizable, and only the sex and age_group properties are filterable; the remaining properties will not appear in filters (more about filters later). Its data is also excluded from full-text search, which is controlled by the fullTextIndex property.
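
To see at a glance what a dataset file actually declares in its ref object, a small helper along these lines can be handy. It is only a sketch: summarize_ref and the primaryKeyDeclared key are hypothetical names, and the function reads nothing beyond the keys that appear in the dataset syntax in this tutorial.

```python
def summarize_ref(dataset):
    """Collect the main settings of a dataset's ref object into one dict."""
    ref = dataset["ref"]
    property_names = [p["name"] for p in ref["properties"]]
    return {
        "type": ref["type"],
        "subtype": ref["subtype"],
        "primaryKey": ref["primaryKey"],
        # True if the primary key refers to one of the declared properties.
        "primaryKeyDeclared": ref["primaryKey"] in property_names,
        "visualizations": [v["type"] for v in ref.get("visualizations", [])],
        "categorizable": ref.get("categorizable"),
        "fullTextIndex": ref.get("fullTextIndex"),
        # Names of the properties that would be offered in filters.
        "filterable": [p["name"] for p in ref["properties"] if p.get("filterable")],
    }
```

Loading customers.json with json.load and passing the result to summarize_ref shows the visualizations, the filterable properties and the other flags in one place.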

The zoom object at the end can be used to modify the zoom levels for the dotmap visualization. This can be handy when there are many dots, which could cause performance problems.

The dwh ref.properties list must correspond to the columns in the CSV file, including their order and data types.
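
This correspondence can be checked locally before uploading. The sketch below compares the column values declared in ref.properties with a CSV header, position by position. It assumes the CSV has a header row, and it checks names and order only, not data types. Both function names are hypothetical.

```python
import csv
from itertools import zip_longest


def read_csv_header(path):
    """Return the first row of the CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def compare_columns(dataset, csv_header):
    """Return mismatches between ref.properties columns and the CSV header."""
    declared = [p["column"] for p in dataset["ref"]["properties"]]
    mismatches = []
    pairs = zip_longest(declared, csv_header)
    for position, (want, got) in enumerate(pairs, start=1):
        if want != got:
            mismatches.append(
                f"column {position}: dataset declares {want!r}, CSV has {got!r}"
            )
    return mismatches
```

An empty list means the declared columns and the header agree in both name and order; a missing column on either side shows up as None in the message.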

Customers dataset syntax
{
    "name": "customers",
    "type": "dataset",
    "title": "Customers",
    "description": "Customers registered in the loyalty program.",
    "properties": {
        "featureTitle": {
            "type": "property",
            "value": "customer_id"
        },
        "featureSubtitle": {
            "type": "property",
            "value": "address"
        }
    },
    "ref": {
        "type": "dwh",
        "subtype": "geometryPoint",
        "visualizations": [
            {
                "type": "dotmap"
            },
            {
                "type": "heatmap"
            }
        ],
        "primaryKey": "customer_id",
        "categorizable": true,
        "fullTextIndex": false,
        "properties": [
            {
                "name": "customer_id",
                "title": "Customer ID",
                "column": "customer_id",
                "type": "integer",
                "filterable": false
            },
            {
                "name": "neighborhood_code",
                "title": "Neighborhood code",
                "column": "neighborhood_code",
                "type": "string",
                "filterable": false
            },
            {
                "name": "city",
                "title": "City",
                "column": "city",
                "type": "string",
                "filterable": false
            },
            {
                "name": "address",
                "title": "Address",
                "column": "address",
                "type": "string",
                "filterable": false
            },
            {
                "name": "sex",
                "title": "Sex",
                "column": "sex",
                "type": "string",
                "filterable": true
            },
            {
                "name": "age_group",
                "title": "Age group",
                "column": "age_group",
                "type": "string",
                "filterable": true
            },
            {
                "name": "lat",
                "title": "Address latitude",
                "column": "lat",
                "type": "latitude",
                "filterable": false
            },
            {
                "name": "lng",
                "title": "Address longitude",
                "column": "lng",
                "type": "longitude",
                "filterable": false
            }
        ],
        "zoom": {
            "min": 7,
            "optimal": 9,
            "max": 18
        }
    }
}

Using your text editor, save this dataset as customers.json to the /metadata/datasets subdirectory in your dump directory.

Run the status command. The dataset and the corresponding CSV file will be listed as new.

Use addMetadata to add the dataset to the project, and pushProject to upload the CSV file.

tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:2020-06-23_17-48-00$ status 
Checking status of project k5t8mf2a80tay2ng (First project) against dump 2020-06-23_17-48-00...

No files have been modified locally

No files have been modified on the server

2 new files have been detected:
	/var/local/metadata/k5t8mf2a80tay2ng/2020-06-23_17-48-00/metadata/datasets/customers.json
	/var/local/metadata/k5t8mf2a80tay2ng/2020-06-23_17-48-00/data/customers.csv

tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:2020-06-23_17-48-00$ addMetadata 
Adding all new objects to the server...

Added object customers.json

1 new object has been successfully uploaded to project k5t8mf2a80tay2ng

tomas.schmidl@secure.clevermaps.io/project:k5t8mf2a80tay2ng/dump:2020-06-23_17-48-00$ pushProject 
No metadata objects were changed - nothing to push

Asynchronous data upload started...

CSV file customers.csv successfully loaded into dataset customers (4822 rows loaded)

DWH data of project k5t8mf2a80tay2ng successfully updated from dump 2020-06-23_17-48-00

Checking model integrity of project k5t8mf2a80tay2ng... OK

That's it! In the next chapter of this tutorial, we will define a metric and an indicator to finally see the data in the map.