Free Certification Practice Questions

MICROSOFT-DP203

Microsoft's DP-203

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ The data engineers must share a cluster.
✑ The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.
Does this meet the goal?
#91
You are designing an Azure Databricks cluster that runs user-defined local processes. You need to recommend a cluster configuration that meets the following requirements:
✑ Minimize query latency.
✑ Maximize the number of users that can run queries on the cluster at the same time.
✑ Reduce overall costs without compromising other requirements.
Which cluster type should you recommend?
#92
You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL. Which switch should you use to switch between languages?
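For context: in a Databricks notebook, an individual cell can override the notebook's default language with a language magic command (`%scala`, `%sql`, `%python`, `%r`) on the cell's first line. A sketch of how cells in an R-default notebook might look (cell contents are invented for illustration):

```
%scala
val df = spark.range(10)   // this cell runs as Scala

%sql
SELECT COUNT(*) FROM events  -- this cell runs as SQL

# A cell with no magic runs in the notebook's default language (R here).
head(mtcars)
```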
#93
You have an Azure Data Factory pipeline that performs an incremental load of source data to an Azure Data Lake Storage Gen2 account. Data to be loaded is identified by a column named LastUpdatedDate in the source table. You plan to execute the pipeline every four hours. You need to ensure that the pipeline execution meets the following requirements:
✑ Automatically retries the execution when the pipeline run fails due to concurrency or throttling limits.
✑ Supports backfilling existing data in the table.
Which type of trigger should you use?
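For context: among ADF trigger types, the tumbling window trigger is the one that carries a built-in retry policy and supports backfilling past windows. A minimal sketch of such a trigger definition, assuming a four-hour window (all names and values below are illustrative, not taken from the question):

```json
{
  "name": "TumblingWindowTrigger1",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 4,
      "startTime": "2024-01-01T00:00:00Z",
      "retryPolicy": { "count": 3, "intervalInSeconds": 300 },
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": { "referenceName": "IncrementalLoadPipeline", "type": "PipelineReference" },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```

Setting `startTime` in the past is what drives the backfill: the trigger generates one window per interval from that point forward.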
#94
You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account. The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/. You need to design a daily Azure Data Factory data load to minimize the data transfer between the two accounts. Which two configurations should you include in the design? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
#95
You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table. Which output mode should you use?
#96
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1. You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1. You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: In an Azure Synapse Analytics pipeline, you use a data flow that contains a Derived Column transformation.
Does this meet the goal?
#97
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1. You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1. You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime column.
Does this meet the goal?
#98
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1. You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1. You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that has an additional DateTime column.
Does this meet the goal?
#99
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1. You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1. You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.
Solution: In an Azure Synapse Analytics pipeline, you use a Get Metadata activity that retrieves the DateTime of the files.
Does this meet the goal?
#100
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone. You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure Databricks notebook, and then inserts the data into the data warehouse.
Does this meet the goal?
#101
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone. You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes a mapping data flow, and then inserts the data into the data warehouse.
Does this meet the goal?
#102
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone. You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You schedule an Azure Databricks job that executes an R notebook, and then inserts the data into the data warehouse.
Does this meet the goal?
#103
You plan to create an Azure Data Factory pipeline that will include a mapping data flow. You have JSON data containing objects that have nested arrays. You need to transform the JSON-formatted data into a tabular dataset. The dataset must have one row for each item in the arrays. Which transformation method should you use in the mapping data flow?
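The shape of the required transformation, one output row per array element with the parent columns repeated, is what mapping data flows call unrolling an array. It can be sketched in plain Python; the sample records below are invented for illustration and the snippet only mimics what such a transformation produces:

```python
# Nested input: each order carries an array of line items.
orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "C", "qty": 5}]},
]

# Unroll: one flat row per array item, parent fields duplicated onto each row.
rows = [
    {"order_id": o["order_id"], "sku": item["sku"], "qty": item["qty"]}
    for o in orders
    for item in o["items"]
]

for r in rows:
    print(r)
```

Two input objects with three array items between them therefore yield three tabular rows.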
#104
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account. You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once. Which windowing function should you use?
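The "counted exactly once" requirement points at windows that partition time into fixed, non-overlapping intervals (tumbling windows, in Stream Analytics terms), so every event lands in exactly one bucket. A plain-Python sketch of five-minute tumbling buckets; the timestamps are illustrative, in seconds since an arbitrary epoch:

```python
from collections import Counter

WINDOW = 5 * 60  # five minutes, in seconds

# Illustrative tweet arrival times.
tweet_times = [10, 290, 300, 310, 899, 900]

# Each event is assigned to exactly one bucket: the window containing it.
counts = Counter((t // WINDOW) * WINDOW for t in tweet_times)

for window_start in sorted(counts):
    print(f"[{window_start}, {window_start + WINDOW}): {counts[window_start]} tweets")
```

Because the windows do not overlap, the per-window counts always sum to the total number of events, which is exactly the "each tweet counted once" property.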
#105
You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:
✑ The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
✑ Line total sales amount and line total tax amount will be aggregated in Databricks.
✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data. What should you recommend?
#106
You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1. You need to determine the size of the transaction log file for each distribution of DW1. What should you do?
#107
You are designing an anomaly detection solution for streaming data from an Azure IoT hub. The solution must meet the following requirements:
✑ Send the output to Azure Synapse.
✑ Identify spikes and dips in time series data.
✑ Minimize development and configuration effort.
What should you include in the solution?
#108
A company uses Azure Stream Analytics to monitor devices. The company plans to double the number of devices that are monitored. You need to monitor a Stream Analytics job to ensure that there are enough processing resources to handle the additional load. Which metric should you monitor?
#109
You have an Azure Stream Analytics job. You need to ensure that the job has enough streaming units provisioned. You configure monitoring of the SU % Utilization metric. Which two additional metrics should you monitor? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. Late Input Events
#110
You have an activity in an Azure Data Factory pipeline. The activity calls a stored procedure in a data warehouse in Azure Synapse Analytics and runs daily. You need to verify the duration of the activity when it ran last. What should you use?
#111
You have an Azure Data Factory pipeline that is triggered hourly. The pipeline has had 100% success for the past seven days. The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error:
ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'. Account: 'contosoproduksouth'. Filesystem: wwi. Path: 'BIKES/CARBON/year=2021/month=01/day=10/hour=06'. ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'. RequestId: '6d269b78-901f-001b-4924-e7a7bc000000'. TimeStamp: 'Sun, 10 Jan 2021 07:45:05
What is a possible cause of the error?
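Note how the failing path encodes the data slice being read: the hour-level partition for 06:00 on 10 January 2021, while the error was raised at 07:45 that same morning, which suggests the run was reading a folder the upstream process had not written (or had removed). How such a partition path is derived from a window start time can be sketched as follows (the helper below is hypothetical, written for illustration):

```python
from datetime import datetime

def partition_path(prefix: str, window_start: datetime) -> str:
    """Build the year/month/day/hour folder path for one hourly window."""
    return (f"{prefix}/year={window_start:%Y}/month={window_start:%m}/"
            f"day={window_start:%d}/hour={window_start:%H}")

# The 06:00 window on 10 Jan 2021 maps to the exact path from the error.
path = partition_path("BIKES/CARBON", datetime(2021, 1, 10, 6))
print(path)  # BIKES/CARBON/year=2021/month=01/day=10/hour=06
```

If no file lands for a given hour, the folder for that hour never exists, and a read against it fails with PathNotFound no matter how many times it is retried.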
#112
You have an Azure Synapse Analytics job that uses Scala. You need to view the status of the job. What should you do?
#113
You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes directly to ADF1. You need to implement version control for the changes made to pipeline artifacts. The solution must ensure that you can apply version control to the resources currently defined in the UX Authoring canvas for ADF1. Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. From the UX Authoring canvas, select Publish.
F. From the UX Authoring canvas, run Publish All.
#114
You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named SQLPool1. SQLPool1 is currently paused. You need to restore the current state of SQLPool1 to a new SQL pool. What should you do first?
#115
You are designing an Azure Synapse Analytics workspace. You need to recommend a solution to provide double encryption of all the data at rest. Which two components should you include in the recommendation? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. an Azure key vault that has purge protection enabled
#116
You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowBlobPublicAccess property is disabled for storage1. You need to create an external data source that can be used by Azure Active Directory (Azure AD) users to access storage from Pool1. What should you create first?
#117
You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 contains a copy activity that sends data to an Azure Data Lake Storage Gen2 account. Pipeline1 is executed by a schedule trigger. You change the copy activity sink to a new storage account and merge the changes into the collaboration branch. After Pipeline1 executes, you discover that data is NOT copied to the new storage account. You need to ensure that the data is copied to the new storage account. What should you do?
#118
You have an Azure Data Factory pipeline named pipeline1 that is invoked by a tumbling window trigger named Trigger1. Trigger1 has a recurrence of 60 minutes. You need to ensure that pipeline1 will execute only if the previous execution completes successfully. How should you configure the self-dependency for Trigger1?
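For context: a tumbling window trigger expresses "run only after the previous window succeeded" through a self-dependency entry in its dependsOn list, with a negative offset equal to one window. A hedged sketch of the relevant trigger fragment (property names follow the ADF tumbling window trigger JSON schema; treat the exact values as illustrative):

```json
"typeProperties": {
  "frequency": "Minute",
  "interval": 60,
  "dependsOn": [
    {
      "type": "SelfDependencyTumblingWindowTriggerReference",
      "offset": "-01:00",
      "size": "01:00"
    }
  ]
}
```

The offset must be negative for a self-dependency, since a window can only depend on an earlier window of the same trigger.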
#119
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ The data engineers must share a cluster.
✑ The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.
Does this meet the goal?
#120