Load Data Into Cosmos DB with ADF


In this lab, you will populate an Azure Cosmos DB container from an existing set of data using tools built into Azure. After importing, you will use the Azure portal to view your imported data.

Before you start this lab, you will need to create an Azure Cosmos DB database and container that you will use throughout the lab. You will then use Azure Data Factory (ADF) to import existing data into your container.

Create Azure Cosmos DB Database and Container

You will now create a database and container within your Azure Cosmos DB account.

  1. On the left side of the portal, click the Resource groups link.

  2. In the Resource groups blade, locate and select the cosmoslabs Resource Group.

  3. In the cosmoslabs blade, select the Azure Cosmos DB account you recently created.

  4. In the Azure Cosmos DB blade, locate and click the Overview link on the left side of the blade. At the top of the blade, click the Add Container button.

  5. In the Add Container popup, perform the following actions:

    1. In the Database id field, select the Create new option and enter the value ImportDatabase.

    2. Do not check the Provision database throughput option.

      Provisioning throughput for a database allows you to share that throughput among all the containers that belong to the database. Within a single Azure Cosmos DB database, you can have a set of containers that share the database throughput as well as containers that have their own dedicated throughput.

    3. In the Container Id field, enter the value FoodCollection.

    4. In the Partition key field, enter the value /foodGroup.

    5. In the Throughput field, enter the value 11000.

    6. Click the OK button.

  6. Wait for the creation of the new database and container to finish before moving on with this lab.
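
The /foodGroup partition key entered above determines how items are grouped: items that share a foodGroup value belong to the same logical partition. The following Python sketch illustrates that grouping locally. It does not call Azure. The sample documents, their description fields, and the helper names are hypothetical; only the /foodGroup path comes from this lab.

```python
from collections import defaultdict

# Hypothetical sample documents. Only the foodGroup property name comes from the lab.
SAMPLE_DOCS = [
    {"id": "1", "description": "Butter, salted", "foodGroup": "Dairy and Egg Products"},
    {"id": "2", "description": "Cheese, blue", "foodGroup": "Dairy and Egg Products"},
    {"id": "3", "description": "Apples, raw", "foodGroup": "Fruits and Fruit Juices"},
]


def partition_key_value(doc, path="/foodGroup"):
    """Follow a partition key path such as /foodGroup into a document."""
    value = doc
    for segment in path.strip("/").split("/"):
        value = value[segment]
    return value


def group_by_partition_key(docs, path="/foodGroup"):
    """Group document ids by partition key value, one group per logical partition."""
    groups = defaultdict(list)
    for doc in docs:
        groups[partition_key_value(doc, path)].append(doc["id"])
    return dict(groups)


if __name__ == "__main__":
    for key, ids in group_by_partition_key(SAMPLE_DOCS).items():
        print(key, ids)
```

A property with many distinct values, such as a food group, spreads items across many logical partitions, which is the usual reason for choosing it as a partition key.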

Import Lab Data Into Container

You will use Azure Data Factory (ADF) to import the JSON array stored in the nutrition.json file from Azure Blob Storage.
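
As a rough illustration of what a JSON array of objects looks like to an importer, the Python sketch below parses a small, hypothetical stand-in for nutrition.json and treats each array element as one document. The sample records and the load_documents helper are assumptions for illustration; the real file's fields are not shown in this lab.

```python
import json

# Hypothetical miniature of nutrition.json: one top-level array, one object per document.
ARRAY_OF_OBJECTS = """
[
  {"id": "1", "foodGroup": "Dairy and Egg Products"},
  {"id": "2", "foodGroup": "Fruits and Fruit Juices"}
]
"""


def load_documents(text):
    """Parse a JSON array of objects and return its elements as a list of dicts."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"element {index} is not a JSON object")
    return data


if __name__ == "__main__":
    for doc in load_documents(ARRAY_OF_OBJECTS):
        print(doc["id"], doc["foodGroup"])
```

A file holding a single object, or one object per line, would fail the top-level array check here, which mirrors why the file pattern has to be chosen to match the file.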

  1. On the left side of the portal, click the Resource groups link.

    To learn more about copying data to Cosmos DB with ADF, please read ADF’s documentation.

  2. In the Resource groups blade, locate and select the cosmoslabs Resource Group.

  3. Click Add to add a new resource.

  4. Search for Data Factory and select it, then create a new Data Factory. Name this data factory importnutritiondata with a unique number appended, and select the relevant Azure subscription. Make sure that your existing cosmoslabs resource group and Version V2 are selected. Select East US as the region, then click Create.

  5. After creation, open your newly created Data Factory. Select Author & Monitor to launch ADF, then select Copy Data. We will be using ADF for a one-time copy of data from a source JSON file on Azure Blob Storage to a database in Cosmos DB’s SQL API. ADF can also be used for more frequent data transfers from Cosmos DB to other data stores.

  6. Edit the basic properties for this data copy. Name the task ImportNutrition and select Run once now. Do not select Enable git.

  7. Create a new connection and select Azure Blob Storage. We will import data from a JSON file on Azure Blob Storage. In addition to Blob Storage, you can use ADF to migrate from a wide variety of sources. We will not cover migration from these sources in this tutorial.

  8. Name the source NutritionJson and select SAS URI as the Authentication method. Please use the following SAS URI for read-only access to this Blob Storage container: https://cosmosdblabsv3.blob.core.windows.net/?sv=2018-03-28&ss=bfqt&srt=sco&sp=rlp&se=2022-01-01T04:55:28Z&st=2019-08-05T20:02:28Z&spr=https&sig=%2FVbismlTQ7INplqo6WfU8o266le72o2bFdZt1Y51PZo%3D

  9. Click Next and then Browse to select the nutrition folder.

  10. Do not check Copy file recursively or Binary Copy. Also ensure that other fields are empty.

  11. Select the file format as JSON format. You should also make sure you select Array of Objects as the File pattern.

  12. You have now successfully connected the Blob Storage container with the nutrition.json file as the source.

  13. For the Destination data store, add the Cosmos DB target data store by selecting Create new connection and then selecting Azure Cosmos DB (SQL API).

  14. Name the linked service targetcosmosdb and select your Azure subscription and Cosmos DB account. You should also select the Cosmos DB ImportDatabase that you created earlier.

  15. Select your newly created targetcosmosdb connection as the Destination data store.

  16. Select your FoodCollection container from the drop-down menu. You will map your Blob storage file to the correct Cosmos DB container. Select Skip column mapping for all tables before continuing.

  17. You should have selected to skip column mappings in a previous step. Click through this screen.

  18. There is no need to change any settings. Click Next.

  19. Click Next to begin deployment. After deployment is complete, select Monitor.

  20. After a few minutes, refresh the page and the status for the ImportNutrition pipeline should be listed as Succeeded.

  21. Once the import process has completed, close ADF. You will now proceed to validate your imported data.
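
The SAS URI used for the Blob Storage source carries its own access rules in the query string. The sketch below, using only the Python standard library, pulls out the permissions (sp), start time (st), and expiry (se) so that you can check whether a token is inside its validity window. The describe_sas and is_valid_at helpers are hypothetical names introduced for illustration; the URI itself is copied from the step above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# SAS URI copied verbatim from the lab step.
SAS_URI = (
    "https://cosmosdblabsv3.blob.core.windows.net/?sv=2018-03-28&ss=bfqt&srt=sco"
    "&sp=rlp&se=2022-01-01T04:55:28Z&st=2019-08-05T20:02:28Z&spr=https"
    "&sig=%2FVbismlTQ7INplqo6WfU8o266le72o2bFdZt1Y51PZo%3D"
)


def _parse_utc(stamp):
    """Parse a timestamp such as 2022-01-01T04:55:28Z into an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def describe_sas(uri):
    """Return the host, permissions, and validity window encoded in a SAS URI."""
    parts = urlsplit(uri)
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc,
        "permissions": params["sp"],
        "starts": _parse_utc(params["st"]),
        "expires": _parse_utc(params["se"]),
    }


def is_valid_at(uri, moment):
    """Check whether the given moment falls inside the token's validity window."""
    info = describe_sas(uri)
    return info["starts"] <= moment <= info["expires"]


if __name__ == "__main__":
    info = describe_sas(SAS_URI)
    print(info["host"], info["permissions"], info["expires"].isoformat())
    print("valid now:", is_valid_at(SAS_URI, datetime.now(timezone.utc)))
```

With the URI as printed, the expiry is 2022-01-01, so a copy that fails to authenticate against the source may be hitting an expired token rather than a configuration mistake.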

Validate Imported Data

The Azure Cosmos DB Data Explorer allows you to view documents and run queries directly within the Azure Portal. In this exercise, you will use the Data Explorer to view the data stored in your container.

You will validate that the data was successfully imported into your container using the Items view in the Data Explorer.

  1. Return to the Azure Portal (http://portal.azure.com).

  2. On the left side of the portal, click the Resource groups link.

  3. In the Resource groups blade, locate and select the cosmoslabs Resource Group.

  4. In the cosmoslabs blade, select the Azure Cosmos DB account you recently created.

  5. In the Azure Cosmos DB blade, locate and click the Data Explorer link on the left side of the blade.

  6. In the Data Explorer section, expand the ImportDatabase database node and then expand the FoodCollection container node.

  7. Within the FoodCollection node, click the Items link to view a subset of the various documents in the container. Select a few of the documents and observe the properties and structure of the documents.

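
Beyond browsing items, a query is a quick way to confirm the import. The sketch below holds two query texts as Python values: a count of all items, and a filter on the partition key in the parameterized form that the Cosmos DB SQL API accepts (a query string plus a list of name and value pairs). The helper name and the sample food group value are hypothetical. Data Explorer itself takes plain SQL text, so there you would type the literal value in place of @foodGroup.

```python
# Counts every item in the container; can be pasted into a Data Explorer query tab.
COUNT_QUERY = "SELECT VALUE COUNT(1) FROM c"


def food_group_query(food_group):
    """Build a parameterized query spec that filters on the /foodGroup partition key."""
    return {
        "query": "SELECT * FROM c WHERE c.foodGroup = @foodGroup",
        "parameters": [{"name": "@foodGroup", "value": food_group}],
    }


if __name__ == "__main__":
    print(COUNT_QUERY)
    # "Dairy and Egg Products" is an assumed example value, not taken from the lab.
    print(food_group_query("Dairy and Egg Products"))
```

Filtering on the partition key lets the query target a single logical partition instead of fanning out across all of them.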