Importing Your Data

Qubit provides a method to import data from external sources for use throughout the Qubit platform, offering a pathway to join data from your CRM and offline business systems with the data generated by your websites, mobile apps, and other devices.

By importing data from otherwise independent, and sometimes disparate, sources and joining it with data from online sources via Segments, you can get the full picture of visitor behavior and the visitor journey, better identify and understand trends, and be far better placed to make informed business decisions based on the entirety of the data collected across your business.

Imported data can be used immediately in the Qubit platform to create new segments and enhance existing ones, deliver highly personalized experiences via the Import API, create reports that also include offline data, and perform analytics in Live Tap. See Next steps for more information.

Key features

  • Ability to import your data according to predefined schema templates or using a custom schema
  • Ingest CSV files with multiple columns and many millions of rows
  • Set up automated file ingest via Google Cloud Storage (GCS), connecting your backend systems with Qubit
  • Update previous imports with further ingests to align data from external sources with the data held in the Qubit platform

Before you begin

Please observe the following requirements:

If importing manually:

  • The ingested file must be a .CSV file
  • The file headers should correspond to the defined schema
  • The file size must be less than 10GB

If importing programmatically:

  • The file to be transferred must be a .CSV file placed in the relevant bucket location
  • Access to the GCS bucket location via an authentication key is required

Getting started

Select Data tools and then Import from the side menu. On opening, you will find a list of your previous imports, including a timestamp that indicates the last time data was ingested into the import.

WARNING: An errors marker against an import indicates that issues were found the last time data was ingested. See Error reporting for more information.

INFO: If you are also using Derived Datasets, your previous imports are shown in the Ingested data tab.

To get more in-depth information about a previous import or to ingest data, you will need to open the import. To do this, simply select it from the list.

Alternatively, you can shortcut straight to a new ingest from the list view. See Ingesting data into an import for details of ingesting data manually or programmatically.

On opening an import, you will be presented with key information, organized into 3 main views:

  • Preview - shows the keys and values for your import's PRIMARY KEY column
  • Details - activity associated with the import and the information you will need to use the data once it has been ingested:

    • In Import activity, we show details of each data ingest since the import was created

    • In Info, we provide the import Id. When using the Import API to serve imported data in an experience, you will need to pass this Id in your GET request

    • In Lookup Access, we provide the namespace and endpoint for the import. You can use this information to query the import using the Import API. You can copy the endpoint by selecting copy key and then paste it into a browser window, replacing id=<key> with an actual key from your import (see the example after this list)

    • In Live Tap Access, we provide the project and table name for the imported data; you will need this information to explore the data in Live Tap
  • Schema - shows the import's schema, including, importantly, which fields in the schema are available for lookup. See A focus on lookup availability for more information
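
As an illustration, you can make the same lookup from the command line with curl. This is a minimal sketch only: the endpoint, namespace, and key below are hypothetical placeholders, so substitute the real endpoint copied from Lookup Access and an actual key from your import:

# Hypothetical endpoint and key; copy the real endpoint from Lookup Access
curl 'https://lookup.example.com/your-namespace?id=user_123'

If the field is available for lookup, the response returns the values stored against that key.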

Creating a new import

To import your data into the Qubit platform, you must first set up the import by defining a unique name and then choosing a schema and the ingest method: manual or programmatic.

Step 1

Select New import in the list view and enter a name for the new import. This must be unique and without special characters:

datasets

INFO: If you are also using Derived Datasets, when you select New import you will need to select Ingested data.

Step 2

Select one of the pre-built schema templates, which represent the most popular schemas, or create a custom one

In the following example, the user has selected the option Custom to create a custom schema:

custom schema

WARNING: It is not possible to remove fields from pre-built schemas, only append new ones.

WARNING: A custom schema must contain at least two fields to be valid. Importantly, when selecting a custom schema you must define the name of the primary key.

primary key
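
For example, a minimal custom schema, using hypothetical field names, might define a primary key plus a single attribute:

  • user_id (String) - the primary key
  • email (String)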

Step 3

If required, you can add additional fields to align the schema with the data you are importing. To do this, select Add new attribute, enter a name for the attribute, and select a type: String, Integer, Float, Timestamp, or Boolean

In the following example, the user has added an attribute called email and left the default setting to make the field unavailable for lookup:

add new attribute

WARNING: You can remove any attributes you added to a schema by selecting the delete icon.

Step 4

If you want the field to be available for lookup, select the toggle and then select Enable lookup to confirm

DANGER: Please pay particular attention to the disclaimer, which outlines the consequences of making a field available for lookup.

See A focus on lookup availability for more information.

Step 5

Select Save to finish. At this point, you are ready to ingest data into the import

WARNING: Once you have saved the import, you cannot make any changes to the import name or schema. If you have selected the wrong schema by mistake, we recommend you delete the import and start again. To delete, select the menu icon and then Delete.


A focus on lookup availability

You can specify which fields you want to make available for lookup. Only fields available for lookup can be used when building experiences.

Fields will only be available for lookup if the primary key is also available.

INFO: Segment resolutions are done server-side so you can use any fields when building segments, irrespective of whether they are made available for lookup or not.

DANGER: If a field is marked as available for lookup, the data it contains can be retrieved over the public Internet without any authentication. Fields which contain personal data should therefore not be marked as available for lookup.

DANGER: You should not mark a field as available for lookup if you are unclear what this means, or if you are not authorized to do so. Please reach out to Customer Support at Qubit for more information.


Ingesting data into an import

Once you have created the import and defined the schema, you can ingest your data.

You have 2 options:

  1. Manual CSV upload - a one-time manual upload of a .CSV file, recommended for clients that wish to import data that will not change over time
  2. Programmatic batch upload - automated file uploads via Google Cloud Storage, recommended for clients that wish to import data that is likely to change over time

Manual CSV upload

This option can be used for a one-time manual upload of data through a .CSV file. When choosing this option, please be aware of the following conditions:

  • The ingested file must be a .CSV file
  • The file headers should correspond to the defined schema
  • The file size must be less than 10GB

To ensure that the file headers correspond to the chosen schema, you can download a template file in .CSV file format. To do this, select the template link in the Import new data window.
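
As a quick sanity check before uploading, you can inspect the header row of your file from the command line and compare it against the schema fields. This is a sketch with a hypothetical file name and columns:

# Print the header row of the CSV; it should match the schema field names
head -n 1 my_import.csv
# e.g. user_id,email,loyalty_points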

Step 1

If your import is not already open, select it from your list and then select Import data. The Import new data window displays

Step 2

Select Manual upload and either drag and drop your .CSV file into the space provided or select inside the space and browse to the file in a local or server directory

Step 3

Select Upload

WARNING: Any data you upload into an import will overwrite any previously uploaded data. Data in an upload is not joined to a previous upload.

Programmatic batch upload

This method allows you to set up automated file uploads through Google Cloud Storage (GCS). You can perform the upload to GCS either manually or programmatically. In both cases, the file will be uploaded to our system.

Before uploading to GCS, you will need an authentication key. You can either use an existing key or generate a new one. See Authentication Keys if you are not sure how to do this.

INFO: The file to be transferred must be a .CSV file.

INFO: Before getting started, you will need to download and install gsutil. See the gsutil documentation for installation instructions.

Step 1

If your import is not already open, select it from your list and then select Import data. The Import new data window displays

Select Programmatic Batch

Step 2

Open your key file, locate the key client_email, and copy the key value, for example:

client-36902-22422219017643717@qubit-client-36902.iam.gserviceaccount.com
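
If you have the common jq JSON tool installed, you can extract this value from the command line instead; the key file name here is hypothetical:

# Print the client_email value from the downloaded key file
jq -r '.client_email' qubit-key.json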

Step 3

Open a terminal window and enter:

gcloud auth activate-service-account [email] --key-file [file]

Where:

  • [email] is the key value from step 2
  • [file] is the path to the key file you downloaded
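
Using the example key value from step 2 and a hypothetical key file name, the complete command would look like this:

gcloud auth activate-service-account client-36902-22422219017643717@qubit-client-36902.iam.gserviceaccount.com --key-file ./qubit-key.json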

Step 4

You can now upload the file to the GCS bucket location shown in the Import new data window using the following command:

gsutil cp [your data] [path]/[file]

Where:

  • [your data] is the name of the CSV file you want to upload, e.g. 20180323.csv
  • [file] is the name of the file you want to create on GCS
  • [path] is the GCS bucket location shown in the Import new data window, as shown in the following example:

GCS location

DANGER: Please do not use the location shown in the above example. You will find the correct location in the Import new data window.

In our example, the command would be:

gsutil cp 20180323.csv gs://qubit-client-36902-kn8-aux-processing/kn8/tone_test/my_first_upload.csv

The upload will now begin. If you see an error that begins with AccessDeniedException: 403, you must enable programmatic file transfer for the authentication key. See Configuring An Existing Key For Programmatic File Transfer.
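
To confirm that the transfer completed, you can list the contents of the bucket location, using your own [path] from the Import new data window:

# List the files at the GCS bucket location
gsutil ls [path]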

INFO: To ensure that the file can be automatically retrieved by Qubit, you must adhere to the location hierarchy given in the transfer details. The CSV needs to be in the correct format before you can transfer it.

WARNING: Any data you upload into an import will overwrite any previously uploaded data. Data in an upload is not joined to a previous upload.

Error reporting

As mentioned earlier, in your list of imports, an errors marker displays when an error was encountered the last time data was ingested into the import.

You can get more details by opening the import and looking in the Details tab.

In Import activity, you will find details of each of the data imports. In the following example, we see that the last two imports failed:

import fail

When you select one of the items in the activity log, you can find additional details relating to the failure:

import failure details

Typically, failure is caused either by issues with the imported CSV file, for example, file headers that do not correspond to the defined schema, or by problems in one of Qubit's internal services. If you see two consecutive failures, we recommend reaching out to Customer Support.

Next steps

Using your data

How you use your imported data in the Qubit platform depends on your personalization goals.

One option is to create new or enhance existing segments, using your offline data to deliver one-to-many personalizations. This rules-based approach targets specific groups of visitors based on loosely-aligned preferences and behaviors. See Using Imported Data to Create Segments for more information.

A more powerful and flexible option is the Import API, which can deliver one-to-few and one-to-one personalizations that target smaller subsets of visitors, and even individual visitors, based on individual behavioral patterns and interactions. This approach offers a greater connection between online and offline campaign messaging than can be achieved with segments.

One of the most powerful features of the API is that it provides an endpoint that can be directly called in an experience to target visitors. It supports complex data types and per field filtering.

Analyzing your data

All imported data is available instantly in Live Tap so you can get started right away with your analysis, dashboards, or ad-hoc queries.

The data is stored alongside all the collected behavioral event data, so it can be joined in a query to further understand your customers, for example, by including CRM data when analyzing transactions.

Last updated: April 2020