Free Certification Practice Questions

MICROSOFT-DP203

Microsoft's DP-203

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ The data engineers must share a cluster.
✑ The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a High Concurrency cluster for the jobs.

Does this meet the goal?
#121
You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data.

You need to recommend a folder structure that meets the following requirements:
✑ Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools
✑ Supports fast data retrieval for data from the current month
✑ Simplifies data security management by department

Which folder structure should you recommend?
#122
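For context, layouts like this commonly put the security boundary (department) at the top of the hierarchy, so ACLs can be assigned once per department, and the date parts last, so serverless SQL pools can prune whole year/month/day subtrees. A minimal sketch of such a path builder; the segment order and names here are illustrative assumptions, not the question's answer:

```python
from datetime import date

def folder_path(department: str, data_source: str, day: date) -> str:
    """Build a hypothetical ADLS Gen2 folder path: department first (ACLs
    set once per department), date parts last (year/month/day subtrees can
    be skipped during partition elimination)."""
    return f"/{department}/{data_source}/{day:%Y}/{day:%m}/{day:%d}/"

print(folder_path("Sales", "Invoices", date(2024, 3, 7)))
# /Sales/Invoices/2024/03/07/
```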
You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 receives new data once every 24 hours.

You have the following function.

You have the following query.

The query is executed once every 15 minutes and the @parameter value is set to the current date.

You need to minimize the time it takes for the query to return results.

Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

E. Change the table distribution to replicate.
#123
You need to design a solution that will process streaming data from an Azure Event Hub and output the data to Azure Data Lake Storage. The solution must ensure that analysts can interactively query the streaming data.

What should you use?
#124
You are creating an Apache Spark job in Azure Databricks that will ingest JSON-formatted data. You need to convert a nested JSON string into a DataFrame that will contain multiple rows.

Which Spark SQL function should you use?
#125
You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container.

Which type of trigger should you use?
#126
You have an Azure subscription that contains an Azure SQL database named DB1 and a storage account named storage1. The storage1 account contains a file named File1.txt. File1.txt contains the names of selected tables in DB1.

You need to use an Azure Synapse pipeline to copy data from the selected tables in DB1 to the files in storage1. The solution must meet the following requirements:
• The Copy activity in the pipeline must be parameterized to use the data in File1.txt to identify the source and destination of the copy.
• Copy activities must occur in parallel as often as possible.

Which two pipeline activities should you include in the pipeline? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
#127
You have an Azure data factory that connects to a Microsoft Purview account. The data factory is registered in Microsoft Purview.

You update a Data Factory pipeline.

You need to ensure that the updated lineage is available in Microsoft Purview.

What should you do first?
#128
You have a Microsoft Purview account. The Lineage view of a CSV file is shown in the following exhibit.

How is the data for the lineage populated?
#129
You have an Azure subscription that contains a Microsoft Purview account named MP1, an Azure data factory named DF1, and a storage account named storage1. MP1 is configured to scan storage1. DF1 is connected to MP1 and contains a dataset named DS1. DS1 references a file in storage1.

In DF1, you plan to create a pipeline that will process data from DS1.

You need to review the schema and lineage information in MP1 for the data referenced by DS1.

Which two features can you use to locate the information? Each correct answer presents a complete solution.
NOTE: Each correct answer is worth one point.
#130
You use Azure Data Factory to create data pipelines. You are evaluating whether to integrate Data Factory and GitHub for source and version control.

What are two advantages of the integration? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
#131
You have two Azure Blob Storage accounts named account1 and account2. You plan to create an Azure Data Factory pipeline that will use scheduled intervals to replicate newly created or modified blobs from account1 to account2.

You need to recommend a solution to implement the pipeline. The solution must meet the following requirements:
• Ensure that the pipeline only copies blobs that were created or modified since the most recent replication event.
• Minimize the effort to create the pipeline.

What should you recommend?
#132
You have an Azure Data Factory pipeline named pipeline1 that contains a data flow activity named activity1.

You need to run pipeline1.

Which runtime will be used to run activity1?
#133
You have an Azure data factory named ADF1 and an Azure Synapse Analytics workspace that contains a pipeline named SynPipeLine1. SynPipeLine1 includes a Notebook activity.

You create a pipeline in ADF1 named ADFPipeline1.

You need to invoke SynPipeLine1 from ADFPipeline1.

Which type of activity should you use?
#134
You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use the returned result set as the input for a downstream activity. The solution must minimize development effort.

Which type of activity should you use in the pipeline?
#135
You have an Azure SQL database named DB1 and an Azure Data Factory data pipeline named pipeline1. From Data Factory, you configure a linked service to DB1. In DB1, you create a stored procedure named SP1. SP1 returns a single row of data that has four columns.

You need to add an activity to pipeline1 to execute SP1. The solution must ensure that the values in the columns are stored as pipeline variables.

Which two types of activities can you use to execute SP1? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
#136
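For context, an activity that surfaces a result set in its output (for example, one that executes SP1 and returns rows) exposes the first row to later Set Variable activities through pipeline expressions. A hedged sketch of such expressions; the activity and column names are illustrative assumptions, not from the question:

```python
# Hedged sketch: ADF pipeline expressions a Set Variable activity could use
# to capture columns from the first row returned by an activity that ran SP1.
# 'RunSP1' and the column names are illustrative, not from the question.
set_variable_expressions = {
    "Var1": "@activity('RunSP1').output.firstRow.Column1",
    "Var2": "@activity('RunSP1').output.firstRow.Column2",
}
print(set_variable_expressions["Var1"])
```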
You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes directly to ADF1.

You need to implement version control for the changes made to pipeline artifacts. The solution must ensure that you can apply version control to the resources currently defined in the Azure Data Factory Studio for ADF1.

Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

E. From the Azure Data Factory Studio, select Set up code repository.
F. From the Azure Data Factory Studio, select Publish.
#137
You have an Azure data factory named ADF1 that contains a pipeline named Pipeline1. Pipeline1 must execute every 30 minutes with a 15-minute offset.

You need to create a trigger for Pipeline1. The trigger must meet the following requirements:
• Backfill data from the beginning of the day to the current time.
• If Pipeline1 fails, ensure that the pipeline can re-execute within the same 30-minute period.
• Ensure that only one concurrent pipeline execution can occur.
• Minimize development and configuration effort.

Which type of trigger should you create?
#138
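As background for requirements like these: ADF tumbling window triggers natively support backfill from a past startTime, per-window retry, and a concurrency cap. A hedged sketch of how such properties fit together, expressed as a Python dict; the startTime value is an illustrative assumption:

```python
# Hedged sketch of tumbling-window-style trigger properties matching the
# stated requirements (property names follow the ADF tumbling window trigger
# schema; the startTime value is illustrative).
trigger = {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
        "frequency": "Minute",
        "interval": 30,                       # one window every 30 minutes
        "startTime": "2024-01-01T00:15:00Z",  # 15-minute offset; past start enables backfill
        "maxConcurrency": 1,                  # only one concurrent pipeline execution
        "retryPolicy": {"count": 1},          # re-execute within the window on failure
    },
}
print(trigger["typeProperties"]["maxConcurrency"])  # 1
```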
You have an Azure Data Lake Storage Gen2 account named account1 and an Azure event hub named Hub1. Data is written to account1 by using Event Hubs Capture. You plan to query account1 by using an Apache Spark pool in Azure Synapse Analytics.

You need to create a notebook and ingest the data from account1. The solution must meet the following requirements:
• Retrieve multiple rows of records in their entirety.
• Minimize query execution time.
• Minimize data processing.

Which data format should you use?
#139
You have an Azure Blob Storage account named blob1 and an Azure Data Factory pipeline named pipeline1.

You need to ensure that pipeline1 runs when a file is deleted from a container in blob1. The solution must minimize development effort.

Which type of trigger should you use?
#140
You are building a data flow in Azure Data Factory that upserts data into a table in an Azure Synapse Analytics dedicated SQL pool.

You need to add a transformation to the data flow. The transformation must specify logic indicating when a row from the input data must be upserted into the sink.

Which type of transformation should you add to the data flow?
#141
You have an on-premises database named db1 and a self-hosted integration runtime. You have an Azure subscription that contains an Azure Data Lake Storage account named dl1.

You need to develop four data pipeline projects that will use Microsoft Power Query to copy data from db1 to dl1. The solution must meet the following requirements:
• All pipelines must use the self-hosted integration runtime.
• Each project must be stored in a separate Git repository.
• Development effort must be minimized.

What should you use?
#142
You have the Azure Synapse Analytics pipeline shown in the following exhibit.

You need to add a set variable activity to the pipeline to ensure that after the pipeline's completion, the status of the pipeline is always successful.

What should you configure for the set variable activity?
#143
You have an on-premises Linux server that contains a database named DB1. You have an Azure subscription that contains an Azure data factory named ADF1 and an Azure Data Lake Storage account named ADLS1.

You need to create a pipeline in ADF1 that will copy data from DB1 to ADLS1.

Which type of integration runtime should you use to read the data from DB1?
#144
You have an Azure Data Factory pipeline named P1.

You need to schedule P1 to run at 10:15 AM, 12:15 PM, 2:15 PM, and 4:15 PM every day.

Which frequency and interval should you configure for the scheduled trigger?
#145
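The four run times above are evenly spaced: starting at 10:15 and stepping by 2 hours reproduces the whole schedule, which a quick stdlib check confirms (the date used is arbitrary):

```python
from datetime import datetime, timedelta

# The requested run times are evenly spaced: starting at 10:15 and adding
# 2 hours three more times reproduces the full daily schedule.
start = datetime(2024, 1, 1, 10, 15)
runs = [(start + timedelta(hours=2 * i)).strftime("%H:%M") for i in range(4)]
print(runs)  # ['10:15', '12:15', '14:15', '16:15']
```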
You are creating an Azure Data Factory pipeline.

You need to add an activity to the pipeline. The activity must execute a Transact-SQL stored procedure that has the following characteristics:
• Returns the number of sales invoices for the current date
• Does NOT require input parameters

Which type of activity should you use?
#146
You have an Azure subscription that contains a Microsoft Purview account.

You need to search the Microsoft Purview Data Catalog to identify assets that have an assetType property of Table or View.

Which query should you run?
#147
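For context, Data Catalog searches are submitted as a request body with keywords plus a filter over properties such as assetType; an OR of two values matches either. A hedged sketch of such a body as a Python dict; the exact filter grammar is an assumption and should be checked against the Purview search API reference:

```python
# Hedged sketch of a Purview Data Catalog search request body filtering on
# the assetType property (filter shape assumed; verify against the
# Purview search API reference before use).
query = {
    "keywords": "*",
    "filter": {
        "or": [
            {"assetType": "Table"},
            {"assetType": "View"},
        ]
    },
}
print(len(query["filter"]["or"]))  # 2
```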
You have an Azure subscription that contains an Azure Synapse Analytics account. The account is integrated with an Azure Repos repository named Repo1 and contains a pipeline named Pipeline1. Repo1 contains the branches shown in the following table.

From featuredev, you develop and test changes to Pipeline1.

You need to publish the changes.

What should you do first?
#148
You plan to create an Azure Synapse Analytics dedicated SQL pool.

You need to minimize the time it takes to identify queries that return confidential information as defined by the company's data privacy regulations, and the users who executed the queries.

Which two components should you include in the solution? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
#149
You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a table named Customers. Customers will contain credit card information.

You need to recommend a solution to provide salespeople with the ability to view all the entries in Customers. The solution must prevent all the salespeople from viewing or inferring the credit card information.

What should you include in the recommendation?
#150