Creating a Multi-Partition Solution using Azure Cosmos DB


In this lab, you will create multiple Azure Cosmos DB containers. Some of the containers will be unlimited and configured with a partition key, while others will be fixed-sized. You will then use the SQL API and .NET SDK to query specific containers using a single partition key or across multiple partition keys.

Log in to the Azure Portal

  1. In a new window, sign in to the Azure Portal (http://portal.azure.com).

  2. Once you have logged in, you may be prompted to start a tour of the Azure portal. You can safely skip this step.

Setup

Before you start this lab, you will need to create an Azure Cosmos DB database and collection that you will use throughout the lab. The .NET SDK requires credentials to connect to your Azure Cosmos DB account. You will collect and store these credentials for use throughout the lab.

Retrieve Account Credentials

  1. On the left side of the portal, click the Resource groups link.

  2. In the Resource groups blade, locate and select the cosmosgroup-lab Resource Group.

  3. In the cosmosgroup-lab blade, select the Azure Cosmos DB account you recently created.

  4. In the Azure Cosmos DB blade, locate the Settings section and click the Keys link.

  5. In the Keys pane, record the values in the CONNECTION STRING, URI and PRIMARY KEY fields. You will use these values later in this lab.

Create Containers using the .NET SDK

You will start by using the .NET SDK to create both fixed-size and unlimited containers to use in the lab.

Create a .NET Core Project

  1. On your local machine, create a new folder that will be used to contain the content of your .NET Core project.

  2. Right-click the new folder and select the Open with Code menu option.

    Alternatively, you can run a command prompt in your current directory and execute the code . command.

  3. In the Visual Studio Code window that appears, right-click the Explorer pane and select the Open in Command Prompt menu option.

  4. In the open terminal pane, enter and execute the following command:

     dotnet new console --output .
    

    This command creates a new .NET Core console project. Because you used the --output . option, the project is created in the current directory. The target framework (for example, netcoreapp2.0 or netcoreapp2.1) depends on the version of the .NET Core SDK installed on your machine.

  5. Visual Studio Code will most likely prompt you to install various extensions related to .NET Core or Azure Cosmos DB development. None of these extensions are required to complete the labs.

  6. In the terminal pane, enter and execute the following command:

     dotnet add package Microsoft.Azure.DocumentDB.Core --version 1.9.1
    

    This command will add the Microsoft.Azure.DocumentDB.Core NuGet package as a project dependency. The lab instructions have been tested using the 1.9.1 version of this NuGet package.

  7. In the terminal pane, enter and execute the following command:

     dotnet add package Bogus --version 22.0.8
    

    This command will add the Bogus NuGet package as a project dependency. This library will allow us to quickly generate test data using a fluent syntax and minimal code. We will use this library to generate test documents to upload to our Azure Cosmos DB instance. The lab instructions have been tested using the 22.0.8 version of this NuGet package.

  8. In the terminal pane, enter and execute the following command:

     dotnet restore
    

    This command will restore all packages specified as dependencies in the project.

  9. In the terminal pane, enter and execute the following command:

     dotnet build
    

    This command will build the project.

  10. Click the 🗙 symbol to close the terminal pane.

  11. Observe the Program.cs and [folder name].csproj files created by the .NET Core CLI.

  12. Double-click the [folder name].csproj link in the Explorer pane to open the file in the editor.

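  13. We will now add a new PropertyGroup XML element to the project configuration within the Project element. This element sets LangVersion to latest, which is required because the Main method you will write is async, a feature that needs C# 7.1 or later. To add the new PropertyGroup, insert the following lines of code under the line that reads <Project Sdk="Microsoft.NET.Sdk">: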

     <PropertyGroup>
         <LangVersion>latest</LangVersion>
     </PropertyGroup>
    
  14. Your project file should now look similar to this (the TargetFramework value depends on your installed .NET Core SDK):

     <Project Sdk="Microsoft.NET.Sdk">        
         <PropertyGroup>
             <LangVersion>latest</LangVersion>
         </PropertyGroup>        
         <PropertyGroup>
             <OutputType>Exe</OutputType>
             <TargetFramework>netcoreapp2.0</TargetFramework>
         </PropertyGroup>        
         <ItemGroup>
             <PackageReference Include="Bogus" Version="22.0.8" />
             <PackageReference Include="Microsoft.Azure.DocumentDB.Core" Version="1.9.1" />
         </ItemGroup>        
     </Project>
    
  15. Double-click the Program.cs link in the Explorer pane to open the file in the editor.

Create DocumentClient Instance

The DocumentClient class is the main “entry point” to the SQL API in Azure Cosmos DB. You will create an instance of DocumentClient by passing connection metadata to its constructor, and then use this instance throughout the lab.

  1. Within the Program.cs editor tab, add the following using directives to the top of the file, below the existing using System; directive:

     using System.Collections.Generic;
     using System.Collections.ObjectModel;
     using System.Linq;
     using System.Net;
     using System.Threading.Tasks;
     using Microsoft.Azure.Documents;
     using Microsoft.Azure.Documents.Client;
     using Microsoft.Azure.Documents.Linq;
    
  2. Locate the Program class and replace it with the following class:

     public class Program
     {
         public static async Task Main(string[] args)
         {         
         }
     }
    
  3. Within the Program class, add the following lines of code to create variables for your connection information:

     private static readonly Uri _endpointUri = new Uri("");
     private static readonly string _primaryKey = "";
    
  4. For the _endpointUri variable, replace the placeholder value with the URI value from your Azure Cosmos DB account that you recorded earlier in this lab:

    For example, if your URI is https://cosmosacct.documents.azure.com:443/, your new variable assignment will look like this: private static readonly Uri _endpointUri = new Uri("https://cosmosacct.documents.azure.com:443/");.

    Keep the URI value recorded; you will use it again later in this lab.

  5. For the _primaryKey variable, replace the placeholder value with the PRIMARY KEY value from your Azure Cosmos DB account that you recorded earlier in this lab:

    For example, if your primary key is elzirrKCnXlacvh1CRAnQdYVbVLspmYHQyYrhx0PltHi8wn5lHVHFnd1Xm3ad5cn4TUcH4U0MSeHsVykkFPHpQ==, your new variable assignment will look like this: private static readonly string _primaryKey = "elzirrKCnXlacvh1CRAnQdYVbVLspmYHQyYrhx0PltHi8wn5lHVHFnd1Xm3ad5cn4TUcH4U0MSeHsVykkFPHpQ==";.

    Keep the PRIMARY KEY value recorded; you will use it again later in this lab.

  6. Locate the Main method:

     public static async Task Main(string[] args)
     { 
     }
    
  7. Within the Main method, add the following lines of code to author a using block that creates and disposes a DocumentClient instance:

     using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
     {        
     }
    
  8. Your Program class definition should now look like this:

     public class Program
     { 
         private static readonly Uri _endpointUri = new Uri("<your uri>");
         private static readonly string _primaryKey = "<your key>";
         public static async Task Main(string[] args)
         {    
             using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
             {
             }     
         }
     }
    

    We will now execute a build of the application to make sure our code compiles successfully.

  9. Save all of your open editor tabs.

  10. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  11. In the open terminal pane, enter and execute the following command:

     dotnet build
    

    This command will build the console project.

  12. Click the 🗙 symbol to close the terminal pane.

  13. Close all open editor tabs.
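Hard-coding the PRIMARY KEY in Program.cs keeps this lab simple, but it means the key travels with your source code. One alternative is to read the same two values from environment variables at startup and fail with a clear message when one is missing. The sketch below assumes you define two environment variables with the hypothetical names COSMOS_ENDPOINT and COSMOS_KEY. CosmosSettings is also a hypothetical name.

```csharp
using System;

// Hypothetical helper: reads Cosmos DB connection values from environment variables.
public static class CosmosSettings
{
    // Hypothetical variable names; choose whatever names fit your environment.
    public const string EndpointVariable = "COSMOS_ENDPOINT";
    public const string KeyVariable = "COSMOS_KEY";

    // Reads a required environment variable and throws a descriptive error when it is
    // missing, instead of passing an empty string on to DocumentClient.
    public static string ReadRequired(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        }
        return value;
    }

    // The Uri constructor throws UriFormatException if the value is not a valid URI.
    public static Uri ReadEndpoint() => new Uri(ReadRequired(EndpointVariable));

    public static string ReadPrimaryKey() => ReadRequired(KeyVariable);
}
```

You could then create the client at the start of Main with new DocumentClient(CosmosSettings.ReadEndpoint(), CosmosSettings.ReadPrimaryKey()) instead of using the two hard-coded fields. If you read the values inside Main rather than in static field initializers, a missing variable surfaces as a plain InvalidOperationException instead of being wrapped in a TypeInitializationException.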

Create Database using the SDK

  1. Locate the using block within the Main method:

     using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
     {                        
     }
    
  2. Add the following code to the method to create a new Database instance:

     Database targetDatabase = new Database { Id = "EntertainmentDatabase" };
    
  3. Add the following code to create a new database instance if one does not already exist:

     targetDatabase = await client.CreateDatabaseIfNotExistsAsync(targetDatabase);
    

    This code checks whether a database with the specified id exists in your Azure Cosmos DB account. If it does not, the SDK creates a new database.

  4. Add the following code to print out the self-link of the database:

     await Console.Out.WriteLineAsync($"Database Self-Link:\t{targetDatabase.SelfLink}");
    

    The targetDatabase variable will have metadata about the database whether a new database is created or an existing one is read.

  5. Save all of your open editor tabs.

  6. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  7. In the open terminal pane, enter and execute the following command:

     dotnet run
    

    This command will build and execute the console project.

  8. Observe the output of the running command.

    In the console window, you will see the self-link string for the database resource in your Azure Cosmos DB account.

  9. In the open terminal pane, enter and execute the following command again:

     dotnet run
    

    This command will build and execute the console project.

  10. Again, observe the output of the running command.

    Since the database already exists, you will see the same self-link on both executions of the console application. This simply means that the SDK detected that the database already exists and used the existing database instance instead of creating a new instance of the database.

  11. Click the 🗙 symbol to close the terminal pane.
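You may want the console output to say whether the database was created or already existed. The ResourceResponse<Database> returned by CreateDatabaseIfNotExistsAsync carries an HTTP status code that tells you: 201 (Created) when a new database was created and 200 (OK) when an existing one was read. The sketch below shows that check. DatabaseSetup, DescribeStatus, and EnsureDatabaseAsync are hypothetical names, and the code assumes the Microsoft.Azure.DocumentDB.Core package used in this lab.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper around CreateDatabaseIfNotExistsAsync.
public static class DatabaseSetup
{
    // Translates the status of an "if not exists" call into a readable message:
    // 201 Created means a new database was created, 200 OK means an existing one was read.
    public static string DescribeStatus(HttpStatusCode status)
    {
        switch (status)
        {
            case HttpStatusCode.Created: return "created";
            case HttpStatusCode.OK: return "already existed";
            default: return $"returned unexpected status {(int)status}";
        }
    }

    // Creates the database if needed, reports which case occurred and the
    // request charge, and returns the database resource.
    public static async Task<Database> EnsureDatabaseAsync(DocumentClient client, string databaseId)
    {
        ResourceResponse<Database> response =
            await client.CreateDatabaseIfNotExistsAsync(new Database { Id = databaseId });
        await Console.Out.WriteLineAsync(
            $"Database {databaseId} {DescribeStatus(response.StatusCode)} ({response.RequestCharge} RUs)");
        return response.Resource;
    }
}
```

Inside the using block, Database targetDatabase = await DatabaseSetup.EnsureDatabaseAsync(client, "EntertainmentDatabase"); could replace the database lines added in the steps above.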

Create an Unlimited Collection using the SDK

Unlimited containers have higher storage and throughput limits. To create a container as unlimited, you must specify a partition key and a minimum throughput of 1,000 RU/s. You will specify those values when creating a container in this task. A partition key is a logical hint for distributing data across a scaled-out set of underlying physical partitions and for efficiently routing queries to the appropriate partition. To learn more, refer to https://docs.microsoft.com/azure/cosmos-db/partition-data.

  1. Locate the using block within the Main method and delete any existing code:

     using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
     {                        
     }
    
  2. Add the following code to the method to open a connection to the database asynchronously:

     await client.OpenAsync();
    

    By default, the first request has higher latency because the client has to fetch the address routing table. To avoid this startup latency on the first request, call OpenAsync() once during initialization.

  3. Add the following code to the method to create a URI that refers to the existing database by its id:

     Uri databaseLink = UriFactory.CreateDatabaseUri("EntertainmentDatabase");
    
  4. Add the following code to create a new IndexingPolicy instance with a custom indexing policy configured:

     IndexingPolicy indexingPolicy = new IndexingPolicy
     {
         IndexingMode = IndexingMode.Consistent,
         Automatic = true,
         IncludedPaths = new Collection<IncludedPath>
         {
             new IncludedPath
             {
                 Path = "/*",
                 Indexes = new Collection<Index>
                 {
                     new RangeIndex(DataType.Number, -1),
                     new RangeIndex(DataType.String, -1)                           
                 }
             }
         }
     };
    

    By default, all Azure Cosmos DB data is indexed. Although many customers are happy to let Azure Cosmos DB handle all aspects of indexing automatically, you can specify a custom indexing policy for a collection. This indexing policy is very similar to the default indexing policy created by the SDK, but it uses a Range index on strings instead of a Hash index.

  5. Add the following code to create a new PartitionKeyDefinition instance with a single partition key of /type defined:

     PartitionKeyDefinition partitionKeyDefinition = new PartitionKeyDefinition
     {
         Paths = new Collection<string> { "/type" }
     };
    

    This definition will create a partition key on the /type path. Partition key paths are case sensitive. This is especially important when you consider JSON property casing in the context of .NET CLR object to JSON object serialization.

  6. Add the following lines of code to create a new DocumentCollection instance where you specify values for multiple properties:

     DocumentCollection customCollection = new DocumentCollection
     {
         Id = "CustomCollection",
         PartitionKey = partitionKeyDefinition,
         IndexingPolicy = indexingPolicy
     };   
    

    We are going to explicitly specify various values for a collection created using the .NET SDK.

  7. Add the following code to create a new RequestOptions instance that sets the throughput for the collection:

     RequestOptions requestOptions = new RequestOptions
     {
         OfferThroughput = 10000
     };
    

    Here is where we can specify the RU/s allocated for the collection. If this is not specified, the SDK has a default value for RU/s assigned to a collection.

  8. Add the following code to create a new collection instance if one does not already exist within your database:

     customCollection = await client.CreateDocumentCollectionIfNotExistsAsync(databaseLink, customCollection, requestOptions);         
    

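    This code checks whether a collection with the specified id exists in your database. If it does not, the SDK creates a new collection with the partition key, indexing policy, and throughput you specified. If a collection with that id already exists, the existing collection is returned unchanged, even if its settings differ from the ones you passed in.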

  9. Add the following code to print out the self-link of the collection:

     await Console.Out.WriteLineAsync($"Custom Collection Self-Link:\t{customCollection.SelfLink}");  
    

    The customCollection variable will have metadata about the collection whether a new collection is created or an existing one is read.

  10. Save all of your open editor tabs.

  11. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  12. In the open terminal pane, enter and execute the following command:

     dotnet run
    

    This command will build and execute the console project.

  13. Observe the output of the running command.

  14. Click the 🗙 symbol to close the terminal pane.

  15. Close all open editor tabs.
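CreateDocumentCollectionIfNotExistsAsync looks up a collection by id only. If a CustomCollection already exists with a different partition key, you get the existing collection back rather than an error. As a defensive measure, you could verify the partition key of the returned collection before loading data into it. The sketch below does this. CollectionChecks and IsPartitionedOn are hypothetical names, and the code assumes the Microsoft.Azure.DocumentDB.Core package.

```csharp
using System;
using System.Collections.ObjectModel;
using Microsoft.Azure.Documents;

// Hypothetical helper for sanity-checking a collection returned by an "if not exists" call.
public static class CollectionChecks
{
    // Returns true when the collection is partitioned on exactly one path that matches
    // expectedPath. The comparison is ordinal (case sensitive) because partition key
    // paths are case sensitive.
    public static bool IsPartitionedOn(DocumentCollection collection, string expectedPath)
    {
        Collection<string> paths = collection?.PartitionKey?.Paths;
        return paths != null
            && paths.Count == 1
            && string.Equals(paths[0], expectedPath, StringComparison.Ordinal);
    }
}
```

For example, right after the CreateDocumentCollectionIfNotExistsAsync call, you could throw an InvalidOperationException when CollectionChecks.IsPartitionedOn(customCollection, "/type") returns false.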

Populate a Collection with Documents using the SDK

You will now use the .NET SDK to populate your collection with various documents of varying schemas. These documents will be serialized instances of multiple C# classes that you will create in your project.

Create Classes

  1. In the Visual Studio Code window, right-click the Explorer pane and select the New File menu option.

  2. Name the new file IInteraction.cs. The editor tab will automatically open for the new file.

  3. Paste in the following code for the IInteraction interface:

     public interface IInteraction
     {
         string type { get; }
     }
    
  4. In the Visual Studio Code window, right-click the Explorer pane and select the New File menu option.

  5. Name the new file PurchaseFoodOrBeverage.cs. The editor tab will automatically open for the new file.

  6. Paste in the following code for the PurchaseFoodOrBeverage class:

     public class PurchaseFoodOrBeverage : IInteraction
     {
         public decimal unitPrice { get; set; }
         public decimal totalPrice { get; set; }
         public int quantity { get; set; }
         public string type { get; set; }
     }
    
  7. In the Visual Studio Code window, right-click the Explorer pane and select the New File menu option.

  8. Name the new file ViewMap.cs. The editor tab will automatically open for the new file.

  9. Paste in the following code for the ViewMap class:

     public class ViewMap : IInteraction
     {	
         public int minutesViewed { get; set; }
         public string type { get; set; }
     }
    
  10. In the Visual Studio Code window, right-click the Explorer pane and select the New File menu option.

  11. Name the new file WatchLiveTelevisionChannel.cs. The editor tab will automatically open for the new file.

  12. Paste in the following code for the WatchLiveTelevisionChannel class:

     public class WatchLiveTelevisionChannel : IInteraction
     {
         public string channelName { get; set; }
         public int minutesViewed { get; set; }
         public string type { get; set; }
     }
    
  13. Observe your newly created files in the Explorer pane.

  14. Save all of your open editor tabs.

  15. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  16. In the open terminal pane, enter and execute the following command:

     dotnet build
    

    This command will build the console project.

  17. Click the 🗙 symbol to close the terminal pane.

  18. Close all open editor tabs.
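The classes above deliberately use lowercase property names such as type. Json.NET writes property names exactly as they are declared by default, and the /type partition key path is case sensitive. If you would rather follow the usual PascalCase convention in C#, you can keep the JSON names lowercase with Json.NET's [JsonProperty] attribute, as sketched below. This assumes the Newtonsoft.Json package, which the DocumentDB SDK depends on and uses for serialization. ViewMapPascal and CasingDemo are hypothetical names.

```csharp
using Newtonsoft.Json;

// Hypothetical PascalCase variant of the lab's ViewMap class, for illustration only.
public class ViewMapPascal
{
    // Serialized as "type" so the document still matches the /type partition key path.
    [JsonProperty("type")]
    public string Type { get; set; }

    // Serialized as "minutesViewed" to keep the same JSON shape as ViewMap.
    [JsonProperty("minutesViewed")]
    public int MinutesViewed { get; set; }
}

public static class CasingDemo
{
    // Serializes with Json.NET default settings, which is enough to show the property names.
    public static string ToJson(object item) => JsonConvert.SerializeObject(item);
}
```

Serializing new ViewMapPascal { Type = "ViewMap", MinutesViewed = 12 } produces {"type":"ViewMap","minutesViewed":12}. Without the attributes, Json.NET would emit "Type" and "MinutesViewed", so those documents would have no value at the case-sensitive /type path.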

Populate Unlimited Collection with Data

  1. Double-click the Program.cs link in the Explorer pane to open the file in the editor.

  2. Locate the using block within the Main method and delete any existing code:

     using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
     {                        
     }
    
  3. Add the following code to the method to open the client connection asynchronously:

     await client.OpenAsync();
    
  4. Add the following code to the method to create a URI that refers to the existing collection by its id:

     Uri collectionLink = UriFactory.CreateDocumentCollectionUri("EntertainmentDatabase", "CustomCollection");
    
  5. Observe the code in the Main method.

    For the next few instructions, we will use the Bogus library to create test data. This library allows you to create a collection of objects with fake data set on each object’s property. For this lab, our intent is to focus on Azure Cosmos DB instead of this library. With that intent in mind, the next set of instructions will expedite the process of creating test data.

  6. Add the following code to create a collection of PurchaseFoodOrBeverage instances:

     var foodInteractions = new Bogus.Faker<PurchaseFoodOrBeverage>()
         .RuleFor(i => i.type, (fake) => nameof(PurchaseFoodOrBeverage))
         .RuleFor(i => i.unitPrice, (fake) => Math.Round(fake.Random.Decimal(1.99m, 15.99m), 2))
         .RuleFor(i => i.quantity, (fake) => fake.Random.Number(1, 5))
         .RuleFor(i => i.totalPrice, (fake, user) => Math.Round(user.unitPrice * user.quantity, 2))
         .Generate(500);
    

    As a reminder, the Bogus library generates a set of test data. In this example, you are creating 500 items using the Bogus library and the rules listed above. The Generate method creates all 500 items immediately and returns them as a List<PurchaseFoodOrBeverage>, which is what makes the IndexOf call used later in this section possible. (Bogus also provides a GenerateLazy method, which returns a deferred IEnumerable whose items are not created until the sequence is iterated.)

  7. Add the following foreach block to iterate over the PurchaseFoodOrBeverage instances:

     foreach(var interaction in foodInteractions)
     {
     }
    
  8. Within the foreach block, add the following line of code to asynchronously create a document and save the result of the creation task to a variable:

     ResourceResponse<Document> result = await client.CreateDocumentAsync(collectionLink, interaction);
    

    The CreateDocumentAsync method of the DocumentClient class takes a link to a collection (here, the URI created by UriFactory) and an object that you would like to serialize into JSON and store as a document within that collection.

  9. Still within the foreach block, add the following line of code to write the value of the newly created resource’s id property to the console:

     await Console.Out.WriteLineAsync($"Document #{foodInteractions.IndexOf(interaction):000} Created\t{result.Resource.Id}");
    

    The ResourceResponse type has a property named Resource that gives you access to interesting data about the document, such as its unique id, time-to-live value, self-link, ETag, timestamp, and attachments.

  10. Your Main method should look like this:

     public static async Task Main(string[] args)
     {    
         using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
         {
             await client.OpenAsync();
             Uri collectionLink = UriFactory.CreateDocumentCollectionUri("EntertainmentDatabase", "CustomCollection");
             var foodInteractions = new Bogus.Faker<PurchaseFoodOrBeverage>()
                 .RuleFor(i => i.type, (fake) => nameof(PurchaseFoodOrBeverage))
                 .RuleFor(i => i.unitPrice, (fake) => Math.Round(fake.Random.Decimal(1.99m, 15.99m), 2))
                 .RuleFor(i => i.quantity, (fake) => fake.Random.Number(1, 5))
                 .RuleFor(i => i.totalPrice, (fake, user) => Math.Round(user.unitPrice * user.quantity, 2))
                 .Generate(500);
             foreach(var interaction in foodInteractions)
             {
                 ResourceResponse<Document> result = await client.CreateDocumentAsync(collectionLink, interaction);
                 await Console.Out.WriteLineAsync($"Document #{foodInteractions.IndexOf(interaction):000} Created\t{result.Resource.Id}");
             }
         }     
     }
    

    As a reminder, the Bogus library generates a set of test data. In this example, you are creating 500 items using the Bogus library and the rules listed above. Because the Generate method returns a fully populated List<PurchaseFoodOrBeverage>, all 500 items exist before the loop starts. The foreach loop at the end of this code block iterates over the list and creates one document in Azure Cosmos DB for each item.

  11. Save all of your open editor tabs.

  12. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  13. In the open terminal pane, enter and execute the following command:

     dotnet run
    

    This command will build and execute the console project.

  14. Observe the output of the console application.

    You should see a list of document ids associated with new documents that are being created by this tool.

  15. Click the 🗙 symbol to close the terminal pane.
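Bogus offers both an eager Generate method, used in this lab, and a deferred GenerateLazy method. The difference mirrors the difference between a LINQ query that has been materialized with ToList and one that has not. The sketch below uses plain LINQ instead of Bogus, so it makes no assumptions about the Bogus API. It counts how many times the item factory runs at each stage. DeferredDemo and CountCreations are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo of deferred versus eager evaluation with LINQ.
public static class DeferredDemo
{
    // Returns how many items had been built right after defining the query,
    // and again after materializing it with ToList().
    public static (int afterDefine, int afterToList) CountCreations(int count)
    {
        int created = 0;

        // Deferred: defining the query does not run the factory lambda yet.
        IEnumerable<string> lazy = Enumerable.Range(0, count)
            .Select(i =>
            {
                created++;
                return $"item-{i}";
            });
        int afterDefine = created;

        // Eager: ToList() enumerates the query once and builds every item.
        lazy.ToList();
        int afterToList = created;

        return (afterDefine, afterToList);
    }
}
```

CountCreations(500) returns (0, 500): defining the query builds nothing, and ToList builds all 500 items. This is also why the lab's loop can call IndexOf: that method exists on List<T>, not on IEnumerable<T>.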

Populate Unlimited Collection with Data of Different Types

  1. Locate the Main method and delete any existing code:

     public static async Task Main(string[] args)
     {                           
     }
    
  2. Replace the Main method with the following implementation:

     public static async Task Main(string[] args)
     {  
         using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
         {
             await client.OpenAsync();
             Uri collectionLink = UriFactory.CreateDocumentCollectionUri("EntertainmentDatabase", "CustomCollection");
             var tvInteractions = new Bogus.Faker<WatchLiveTelevisionChannel>()
                 .RuleFor(i => i.type, (fake) => nameof(WatchLiveTelevisionChannel))
                 .RuleFor(i => i.minutesViewed, (fake) => fake.Random.Number(1, 45))
                 .RuleFor(i => i.channelName, (fake) => fake.PickRandom(new List<string> { "NEWS-6", "DRAMA-15", "ACTION-12", "DOCUMENTARY-4", "SPORTS-8" }))
                 .Generate(500);
             foreach(var interaction in tvInteractions)
             {
                 ResourceResponse<Document> result = await client.CreateDocumentAsync(collectionLink, interaction);
                 await Console.Out.WriteLineAsync($"Document #{tvInteractions.IndexOf(interaction):000} Created\t{result.Resource.Id}");
             }
         }
     }
    

    As a reminder, the Bogus library generates a set of test data. In this example, you are creating 500 items using the Bogus library and the rules listed above. Because the Generate method returns a fully populated List<WatchLiveTelevisionChannel>, all 500 items exist before the loop starts. The foreach loop at the end of this code block iterates over the list and creates one document in Azure Cosmos DB for each item.

  3. Save all of your open editor tabs.

  4. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  5. In the open terminal pane, enter and execute the following command:

     dotnet run
    

    This command will build and execute the console project.

  6. Observe the output of the console application.

    You should see a list of document ids associated with new documents that are being created.

  7. Click the 🗙 symbol to close the terminal pane.

  8. Locate the Main method and delete any existing code:

     public static async Task Main(string[] args)
     {                            
     }
    
  9. Replace the Main method with the following implementation:

     public static async Task Main(string[] args)
     {  
         using (DocumentClient client = new DocumentClient(_endpointUri, _primaryKey))
         {
             await client.OpenAsync();
             Uri collectionLink = UriFactory.CreateDocumentCollectionUri("EntertainmentDatabase", "CustomCollection");
             var mapInteractions = new Bogus.Faker<ViewMap>()
                 .RuleFor(i => i.type, (fake) => nameof(ViewMap))
                 .RuleFor(i => i.minutesViewed, (fake) => fake.Random.Number(1, 45))
                 .Generate(500);
             foreach(var interaction in mapInteractions)
             {
                 ResourceResponse<Document> result = await client.CreateDocumentAsync(collectionLink, interaction);
                 await Console.Out.WriteLineAsync($"Document #{mapInteractions.IndexOf(interaction):000} Created\t{result.Resource.Id}");
             }
         }
     }
    

    As a reminder, the Bogus library generates a set of test data. In this example, you are creating 500 items using the Bogus library and the rules listed above. Because the Generate method returns a fully populated List<ViewMap>, all 500 items exist before the loop starts. The foreach loop at the end of this code block iterates over the list and creates one document in Azure Cosmos DB for each item.

  10. Save all of your open editor tabs.

  11. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  12. In the open terminal pane, enter and execute the following command:

     dotnet run
    

    This command will build and execute the console project.

  13. Observe the output of the console application.

    You should see a list of document ids associated with new documents that are being created.

  14. Click the 🗙 symbol to close the terminal pane.

  15. Close all open editor tabs.

  16. Close the Visual Studio Code application.
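The three versions of Main in this section differ only in their Faker rules; the insert loop is the same each time. The sketch below shows a shared helper for that loop. It tracks the position with a loop index instead of calling IndexOf, which searches the list on every iteration. It also totals the request units reported in each response's RequestCharge property. InteractionLoader, FormatProgress, and InsertAllAsync are hypothetical names, and the code assumes the Microsoft.Azure.DocumentDB.Core package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper that replaces the repeated foreach insert loop.
public static class InteractionLoader
{
    // Formats the same progress line the lab prints, e.g. "Document #007 Created\t<id>".
    public static string FormatProgress(int index, string id) => $"Document #{index:000} Created\t{id}";

    // Inserts every item into the collection and returns the total request units consumed.
    // The loop index replaces IndexOf, so each item is located in constant time.
    public static async Task<double> InsertAllAsync<T>(DocumentClient client, Uri collectionLink, IReadOnlyList<T> items)
    {
        double totalRequestUnits = 0;
        for (int i = 0; i < items.Count; i++)
        {
            ResourceResponse<Document> result = await client.CreateDocumentAsync(collectionLink, items[i]);
            totalRequestUnits += result.RequestCharge;
            await Console.Out.WriteLineAsync(FormatProgress(i, result.Resource.Id));
        }
        return totalRequestUnits;
    }
}
```

For example, double totalCharge = await InteractionLoader.InsertAllAsync(client, collectionLink, mapInteractions); could replace the foreach block in the last version of Main, because the List<ViewMap> returned by Generate implements IReadOnlyList<ViewMap>.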

Benchmark a Simple Collection using a .NET Core Application

In the next part of this lab, you will compare various partition key choices for a large dataset using a special benchmarking tool available on GitHub.com. First, you will learn how to use the benchmarking tool using a simple collection and partition key.

Clone Existing .NET Core Project

  1. On your local machine, create a new folder that will be used to contain the content of your new .NET Core project.

  2. Right-click the new folder and select the Open with Code menu option.

    Alternatively, you can run a command prompt in your current directory and execute the code . command.

  3. In the Visual Studio Code window that appears, right-click the Explorer pane and select the Open in Command Prompt menu option.

  4. In the open terminal pane, enter and execute the following command:

     git clone https://github.com/seesharprun/cosmos-benchmark.git .
    

    This command will create a copy of a .NET Core project located on GitHub (https://github.com/seesharprun/cosmos-benchmark) in your local folder.

  5. Visual Studio Code will most likely prompt you to install various extensions related to .NET Core or Azure Cosmos DB development. None of these extensions are required to complete the labs.

  6. In the terminal pane, enter and execute the following command:

     dotnet restore
    

    This command will restore all packages specified as dependencies in the project.

  7. In the terminal pane, enter and execute the following command:

     dotnet build
    

    This command will build the project.

  8. Click the 🗙 symbol to close the terminal pane.

  9. Observe the Program.cs and benchmark.csproj files included in the cloned project.

  10. Double-click the sample.json link in the Explorer pane to open the file in the editor.

  11. Observe the sample JSON file.

    This file will show you a sample of the types of JSON documents that will be uploaded to your collection. Pay close attention to the Submit* fields, the DeviceId field and the LocationId field.

Update the Application’s Settings

  1. Double-click the appsettings.json link in the Explorer pane to open the file in the editor.

  2. Locate the /cosmosSettings.endpointUri JSON path:

     "endpointUri": ""
    

    Update the endpointUri property by setting its value to the URI value from your Azure Cosmos DB account that you recorded earlier in this lab:

    For example, if your uri is https://cosmosacct.documents.azure.com:443/, your new property will look like this: "endpointUri": "https://cosmosacct.documents.azure.com:443/".

  3. Locate the /cosmosSettings.primaryKey JSON path:

     "primaryKey": ""
    

    Update the primaryKey property by setting its value to the PRIMARY KEY value from your Azure Cosmos DB account that you recorded earlier in this lab:

    For example, if your primary key is elzirrKCnXlacvh1CRAnQdYVbVLspmYHQyYrhx0PltHi8wn5lHVHFnd1Xm3ad5cn4TUcH4U0MSeHsVykkFPHpQ==, your new property will look like this: "primaryKey": "elzirrKCnXlacvh1CRAnQdYVbVLspmYHQyYrhx0PltHi8wn5lHVHFnd1Xm3ad5cn4TUcH4U0MSeHsVykkFPHpQ==".

Configure a Simple Collection for Benchmarking

  1. Double-click the appsettings.json link in the Explorer pane to open the file in the editor.

  2. Locate the /collectionSettings JSON path:

     "collectionSettings": [],
    

    Update the collectionSettings property by setting its value to the following array of JSON objects:

     "collectionSettings": [
         {
             "id": "CollectionWithHourKey",
             "throughput": 10000,
             "partitionKeys": [ "/SubmitHour" ]
         }
     ],
    

    The object above instructs the benchmark tool to create a single collection and set its throughput and partition key to the specified values. For this simple demo, we will use the hour in which an IoT device recording was submitted as our partition key.

    Collection Name          Throughput (RU/s)   Partition Key
    CollectionWithHourKey    10000               /SubmitHour

  3. Save all of your open editor tabs.

Run the Benchmark Application

  1. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  2. In the open terminal pane, enter and execute the following command:

     dotnet run
    
  3. Observe the results of the application’s execution. Your output should look very similar to the following:

     DocumentDBBenchmark starting...
     Database Validated:     dbs/MOEFAA==/
     Collection Validated:   dbs/MOEFAA==/colls/MOEFAN6FoQU=/
     Summary:
     ---------------------------------------------------------------------
     Endpoint:               https://cosmosacct.documents.azure.com/
     Database                IoTDeviceData
     Collection              CollectionWithHourKey
     Partition Key:          /SubmitHour
     Throughput:             10000 Request Units per Second (RU/s)
     Insert Operation:       100 Tasks Inserting 1000 Documents Total
     ---------------------------------------------------------------------
    
     Starting Inserts with 100 tasks
     Inserted 1000 docs @ 997 writes/s, 7220 RU/s (19B max monthly 1KB reads)
    
     Summary:
     ---------------------------------------------------------------------
     Total Time Elapsed:     00:00:01.0047125
     Inserted 1000 docs @ 995 writes/s, 7209 RU/s (19B max monthly 1KB reads)
     ---------------------------------------------------------------------
    

    The benchmark tool tells you how long it takes to write a specific number of documents to your collection. It also reports useful metadata such as the RU/s consumed and the total execution time. We are not tuning our partition key choice yet; we are simply learning to use the tool.

  4. Press the ENTER key to complete the execution of the console application.
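The figures on the "Inserted ..." summary line can be recomputed from the document count, the elapsed time and the RU/s, which makes it easier to compare runs. The Python sketch below assumes that writes/s is documents divided by elapsed seconds and that the monthly figure is RU/s multiplied by the seconds in a 30-day month, treating a 1 KB read as costing about 1 RU. These formulas are inferred from the sample output, not taken from the benchmark tool's source, and the function name is hypothetical.

```python
# Assumption: a 30-day month; this reproduces the "max monthly 1KB reads" figures in the sample output.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000


def summarize(docs_inserted, elapsed_seconds, ru_per_second):
    """Recompute the numbers on the benchmark's "Inserted ..." summary line."""
    writes_per_second = docs_inserted / elapsed_seconds
    # Average request charge of one insert, in RUs.
    ru_per_write = ru_per_second / writes_per_second
    # Assumption: a 1 KB point read costs about 1 RU, so RU/s sustained for a
    # month approximates how many 1 KB reads that throughput could serve.
    monthly_reads_billions = ru_per_second * SECONDS_PER_MONTH / 1e9
    return writes_per_second, ru_per_write, monthly_reads_billions


if __name__ == "__main__":
    # Figures from the final Summary block above: 1000 docs in 00:00:01.0047125 at 7209 RU/s.
    writes, ru_each, reads = summarize(1000, 1.0047125, 7209)
    print(f"{writes:.0f} writes/s, {ru_each:.1f} RU per insert, {reads:.0f}B max monthly 1KB reads")
```

For the sample run this gives 995 writes/s and 19B monthly reads, matching the tool's output, and shows that each insert cost roughly 7.2 RU. The same function also reproduces the writes/s and monthly-read figures of the 50,000-document results later in this lab.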

Update the Application’s Settings

  1. Double-click the appsettings.json link in the Explorer pane to open the file in the editor.

  2. Locate the /cosmosSettings.numberOfDocumentsToInsert JSON path:

     "numberOfDocumentsToInsert": 1000
    

    Update the numberOfDocumentsToInsert property by setting its value to 50000:

     "numberOfDocumentsToInsert": 50000
    
  3. Save all of your open editor tabs.

Run the Benchmark Application

  1. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  2. In the open terminal pane, enter and execute the following command:

     dotnet run
    
  3. Observe the results of the application’s execution.

    Note how long it takes to import 50,000 documents compared with the 1,000 documents inserted in the previous run.

  4. Press the ENTER key to complete the execution of the console application.

Benchmark Various Partition Key Choices using a .NET Core Application

Now you will use multiple collections and partition key options to compare various strategies for partitioning a large dataset.

Configure Multiple Collections for Benchmarking

  1. Double-click the appsettings.json link in the Explorer pane to open the file in the editor.

  2. Locate the /collectionSettings JSON path:

     "collectionSettings": [],
    

    Update the collectionSettings property by setting its value to the following array of JSON objects:

     "collectionSettings": [
         {
             "id": "CollectionWithMinuteKey",
             "throughput": 10000,
             "partitionKeys": [ "/SubmitMinute" ]
         },
         {
             "id": "CollectionWithDeviceKey",
             "throughput": 10000,
             "partitionKeys": [ "/DeviceId" ]
         }
     ],
    

    The array above instructs the benchmark tool to create two collections and set their throughput and partition keys to the specified values. For this demo, we will compare the results for each partition key.

    Collection Name            Throughput (RU/s)   Partition Key
    CollectionWithMinuteKey    10000               /SubmitMinute
    CollectionWithDeviceKey    10000               /DeviceId

  3. Save all of your open editor tabs.
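Because each entry sets its own throughput, this configuration reserves 10,000 RU/s per collection, or 20,000 RU/s in total. The Python sketch below is not part of the lab's project, and its helper names are hypothetical. It parses the same collectionSettings array and prints a summary table with that total, as a quick check before running the tool.

```python
import json

# The collectionSettings array from appsettings.json, as configured above.
COLLECTION_SETTINGS = json.loads("""
[
    {"id": "CollectionWithMinuteKey", "throughput": 10000, "partitionKeys": ["/SubmitMinute"]},
    {"id": "CollectionWithDeviceKey", "throughput": 10000, "partitionKeys": ["/DeviceId"]}
]
""")


def describe(settings):
    """Render one row per collection: name, throughput and partition key path(s)."""
    rows = [f"{'Collection Name':<26}{'Throughput (RU/s)':>18}  Partition Key"]
    for c in settings:
        rows.append(f"{c['id']:<26}{c['throughput']:>18}  {', '.join(c['partitionKeys'])}")
    return "\n".join(rows)


def total_throughput(settings):
    # Throughput is set per collection here, so the totals simply add up.
    return sum(c["throughput"] for c in settings)


if __name__ == "__main__":
    print(describe(COLLECTION_SETTINGS))
    print(f"Total provisioned: {total_throughput(COLLECTION_SETTINGS)} RU/s")
```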

Run the Benchmark Application

  1. In the Visual Studio Code window, right-click the Explorer pane and select the Open in Command Prompt menu option.

  2. In the open terminal pane, enter and execute the following command:

     dotnet run
    
  3. Observe the results of the application’s execution.

    The timestamp on these IoT records is based on the time when the record was created. Because records are submitted as soon as they are created, there is very little latency between the client and server timestamps. Most of the records will therefore be submitted within the same minute and share the same SubmitMinute partition key value. This creates a hot partition key, which constrains throughput. In this context, a hot partition key is one whose requests are all routed to the same physical partition and exceed the throughput available to it, so they are rate-limited. A hot partition key also concentrates a large volume of data within the same partition. Such an uneven distribution is inefficient. In this demo, you should expect a total time of more than 20 seconds.

     ---------------------------------------------------------------------
     Collection              CollectionWithMinuteKey
     Partition Key:          /SubmitMinute
     Total Time Elapsed:     00:00:57.4233616
     Inserted 50000 docs @ 871 writes/s, 6304 RU/s (16B max monthly 1KB reads)
     ---------------------------------------------------------------------
    

    The run against the SubmitMinute collection will most likely take longer than the run against the DeviceId collection. Because the documents carry many different DeviceId values, the DeviceId partition key distributes requests more evenly across partition key values, so you should notice markedly better performance.

     ---------------------------------------------------------------------
     Collection              CollectionWithDeviceKey
     Partition Key:          /DeviceId
     Total Time Elapsed:     00:00:27.2769234
     Inserted 50000 docs @ 1833 writes/s, 13272 RU/s (34B max monthly 1KB reads)
     ---------------------------------------------------------------------
    
  4. Compare the RU/s and total time for both collections.

  5. Press the ENTER key to complete the execution of the console application.
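The difference between the two partition keys can be illustrated without an Azure account. The following Python sketch is a simplified, hypothetical model rather than the benchmark tool's logic. It assumes 50,000 records created evenly over 30 seconds by 1,000 devices in round-robin order (numbers chosen for illustration only), uses made-up key formats, and counts how many writes each partition key value would receive.

```python
from collections import Counter
from datetime import datetime, timedelta


def simulate(num_docs=50_000, num_devices=1_000, burst_seconds=30):
    """Count documents per partition key value for SubmitMinute vs DeviceId keys.

    Hypothetical workload: num_docs records created evenly over burst_seconds,
    each from one of num_devices devices in round-robin order.
    """
    start = datetime(2024, 1, 1, 12, 0, 0)  # hypothetical start time
    by_minute, by_device = Counter(), Counter()
    for i in range(num_docs):
        created = start + timedelta(seconds=burst_seconds * i / num_docs)
        # Hypothetical key formats; the real SubmitMinute/DeviceId values in sample.json may differ.
        by_minute[created.strftime("%Y-%m-%dT%H:%M")] += 1
        by_device[f"device-{i % num_devices}"] += 1
    return by_minute, by_device


def hottest_share(counts):
    """Fraction of all writes that land on the busiest partition key value."""
    return max(counts.values()) / sum(counts.values())


if __name__ == "__main__":
    by_minute, by_device = simulate()
    for name, counts in (("/SubmitMinute", by_minute), ("/DeviceId", by_device)):
        print(f"{name:<14} {len(counts):>5} distinct keys, busiest key gets {hottest_share(counts):.1%} of writes")
```

Under these assumptions every write shares a single SubmitMinute value, while each DeviceId value receives 0.1% of the writes. Real workloads are less uniform, but the contrast is the kind the two benchmark runs above measure.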

Observe the New Collections and Database in the Azure Portal

  1. Return to the Azure Portal (http://portal.azure.com).

  2. On the left side of the portal, click the Resource groups link.

  3. In the Resource groups blade, locate and select the cosmosgroup-lab Resource Group.

  4. In the cosmosgroup-lab blade, select the Azure Cosmos DB account you recently created.

  5. In the Azure Cosmos DB blade, locate and click the Data Explorer link on the left side of the blade.

  6. In the Data Explorer section, expand the IoTDeviceData database node and then observe the various collection nodes.

  7. Expand the CollectionWithDeviceKey node. Within the node, click the Scale & Settings link.

  8. Observe the following properties of the collection:

    • Storage Capacity

    • Assigned Throughput

    • Indexing Policy

  9. Click the New SQL Query button at the top of the Data Explorer section.

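  10. In the query tab, replace the contents of the query editor with the following SQL query. Here, recordings is only an alias; the query runs against the collection the query tab was opened for: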

     SELECT VALUE COUNT(1) FROM recordings
    
  11. Click the Execute Query button in the query tab to run the query.

  12. In the Results pane, observe the results of your query.

  13. Back in the Data Explorer section, right-click the IoTDeviceData database node and select the Delete Database option.

    Since you created multiple collections in this database with high throughput, it makes sense to dispose of the database immediately to minimize your Azure subscription consumption.

  14. In the Delete Database popup, enter the name of the database (IoTDeviceData) in the field and then press the OK button.

  15. Close your browser application.