Free Certification Practice Questions

MICROSOFT-DP203

Microsoft's DP-203
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.
You need to alter the table to meet the following requirements:
✑ Ensure that users can identify the current manager of employees.
✑ Support creating an employee reporting hierarchy for your entire company.
✑ Provide fast lookup of the managers' attributes such as name and job title.
Which column should you add to the table?
#1
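The pattern this question probes can be sketched as a self-referencing column. The table and column names below are illustrative only; they are not taken from the exam exhibit.

```sql
-- Illustrative sketch (hypothetical names): a manager column that references
-- another row of the same table supports both hierarchy traversal and direct
-- lookup of the manager's attributes.
ALTER TABLE dbo.DimEmployee
ADD ManagerEmployeeID int NULL;

-- Example lookup of a manager's name and job title via a self-join:
SELECT e.EmployeeName,
       m.EmployeeName AS ManagerName,
       m.JobTitle     AS ManagerJobTitle
FROM dbo.DimEmployee AS e
LEFT JOIN dbo.DimEmployee AS m
    ON e.ManagerEmployeeID = m.EmployeeID;
```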
Microsoft's DP-203
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb.
You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.

CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet

You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.
One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE EmployeeName = 'Alice';

What will be returned by the query?
#2
Microsoft's DP-203
You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit.
You create an external table named ExtTable that has LOCATION='/topfolder/'.
When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
#3
Microsoft's DP-203
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month.
Which folder structure should you recommend to support fast queries and simplified folder security?
#4
Microsoft's DP-203
You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following requirements:
✑ Can return an employee record from a given point in time.
✑ Maintains the latest employee information.
✑ Minimizes query complexity.
How should you model the employee data?
#5
Microsoft's DP-203
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake.
Your company has a sales team. All the members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake.
You plan to load data to the SQL pool every hour.
You need to ensure that the SQL pool can load the sales data from the data lake.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. Use the shared access signature (SAS) as the credentials for the data load process.
F. Create a managed identity.
#6
Microsoft's DP-203
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.
You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.
Which type of data redundancy should you use?
#7
Microsoft's DP-203
You plan to implement an Azure Data Lake Gen 2 storage account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs.
Which type of replication should you use for the storage account?
#8
Microsoft's DP-203
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
    AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized

Which table distribution will minimize query times?
#9
Microsoft's DP-203
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.
Does this meet the goal?
#10
Microsoft's DP-203
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index.
Does this meet the goal?
#11
Microsoft's DP-203
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB.
Does this meet the goal?
#12
Microsoft's DP-203
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool.
Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily.
You need to implement a solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
#13
Microsoft's DP-203
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1.
You plan to create a database named DB1 in Pool1.
You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool.
Which format should you use for the tables in DB1?
#14
Microsoft's DP-203
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java.
Which service should you recommend using to process the streaming data?
#15
Microsoft's DP-203
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour.
File sizes range from 4 KB to 5 GB.
You need to ensure that the files stored in the container are optimized for batch processing.
What should you do?
#16
Microsoft's DP-203
You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
✑ TransactionType: 40 million rows per transaction type
✑ CustomerSegment: 4 million rows per customer segment
✑ TransactionMonth: 65 million rows per month
✑ AccountType: 500 million rows per account type
You have the following query requirements:
✑ Analysts will most commonly analyze transactions for a given month.
✑ Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you recommend partitioning the table?
#17
Microsoft's DP-203
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.
You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.
What should you recommend?
#18
Microsoft's DP-203
You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching table and partition definitions.
You need to overwrite the content of the first partition in dbo.Sales with the content of the same partition in stg.Sales. The solution must minimize load times.
What should you do?
#19
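The technique this scenario centers on, switching a staging partition into a fact table, is a metadata-only operation and can be sketched as follows. Partition number 1 is assumed from the question; the WITH clause shown is the dedicated SQL pool option for discarding the target partition's existing rows during the switch.

```sql
-- Illustrative sketch: replace partition 1 of dbo.Sales with partition 1 of
-- stg.Sales. TRUNCATE_TARGET = ON empties the target partition as part of the
-- switch, so no separate delete/load of the fact data is needed.
ALTER TABLE stg.Sales
SWITCH PARTITION 1
TO dbo.Sales PARTITION 1
WITH (TRUNCATE_TARGET = ON);
```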
Microsoft's DP-203
You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.
You plan to keep a record of changes to the available fields.
The supplier data contains the following columns.
Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. effective end date
F. foreign key
#20
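The shape of a Type 2 SCD can be sketched as below. All column names here are hypothetical, not taken from the question's column list: the point is that each change inserts a new row, so the dimension needs a surrogate key plus validity-tracking columns alongside the business key.

```sql
-- Illustrative Type 2 SCD table (hypothetical names): history is preserved by
-- adding rows, with each row carrying its own validity window.
CREATE TABLE dbo.DimSupplier
(
    SupplierSK          int IDENTITY(1,1) NOT NULL,  -- surrogate key
    SupplierBusinessKey int NOT NULL,                -- natural/business key
    SupplierName        nvarchar(100) NOT NULL,
    EffectiveStartDate  date NOT NULL,
    EffectiveEndDate    date NULL,                   -- NULL = current version
    IsCurrent           bit NOT NULL
);
```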
Microsoft's DP-203
You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:
✑ Contain sales data for 20,000 products.
✑ Use hash distribution on a column named ProductID.
✑ Contain 2.4 billion records for the years 2019 and 2020.
Which number of partition ranges provides optimal compression and performance for the clustered columnstore index?
#21
Microsoft's DP-203
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
    AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey

Which table distribution will minimize query times?
#22
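The distribution choice this question probes can be sketched in DDL. The statement below is illustrative only, with an abbreviated column list; hash distribution on a high-cardinality key used in joins and GROUP BY clauses is shown as one example, not as the exam answer.

```sql
-- Illustrative sketch: a hash-distributed fact table with a clustered
-- columnstore index. Hash-distributing on a grouping/join key co-locates rows
-- that are aggregated together, avoiding data movement at query time.
CREATE TABLE dbo.FactPurchase
(
    PurchaseKey  bigint NOT NULL,
    DateKey      int NOT NULL,
    SupplierKey  int NOT NULL,
    StockItemKey int NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(StockItemKey),  -- example key; any high-cardinality,
                                        -- evenly distributed column qualifies
    CLUSTERED COLUMNSTORE INDEX
);
```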
Microsoft's DP-203
You are implementing a batch dataset in the Parquet format.
Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.
You need to minimize storage costs for the solution.
What should you do?
#23
Microsoft's DP-203
You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions.
From a source system, you have a flat extract that has the following fields:
✑ EmployeeID
✑ FirstName
✑ LastName
✑ Recipient
✑ GrossAmount
✑ TransactionID
✑ GovernmentID
✑ NetAmountPaid
✑ TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart.
Which two tables should you create? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
E. a fact table for Transaction
#24
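A star-schema split of a flat extract like the one above can be sketched as a dimension for descriptive attributes and a fact table for measures. Table names, data types, and distribution choices below are illustrative assumptions, not part of the question.

```sql
-- Illustrative sketch (hypothetical names/types): descriptive employee fields
-- go to a dimension table, numeric measures and the transaction grain go to a
-- fact table that references the dimension by surrogate key.
CREATE TABLE dbo.DimEmployee
(
    EmployeeSK   int IDENTITY(1,1) NOT NULL,
    EmployeeID   int NOT NULL,
    FirstName    nvarchar(50) NOT NULL,
    LastName     nvarchar(50) NOT NULL,
    GovernmentID nvarchar(20) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);

CREATE TABLE dbo.FactTransaction
(
    TransactionID   bigint NOT NULL,
    EmployeeSK      int NOT NULL,       -- references DimEmployee.EmployeeSK
    TransactionDate date NOT NULL,
    GrossAmount     decimal(19,4) NOT NULL,
    NetAmountPaid   decimal(19,4) NOT NULL
)
WITH (DISTRIBUTION = HASH(TransactionID), CLUSTERED COLUMNSTORE INDEX);
```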
Microsoft's DP-203
You are designing a dimension table for a data warehouse. The table will track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes.
Which type of slowly changing dimension (SCD) should you use?
#25
Microsoft's DP-203
You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics serverless SQL pool.
You execute the Transact-SQL query shown in the following exhibit.
What do the query results include?
#26
Microsoft's DP-203
You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?
#27
Microsoft's DP-203
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. Workspace1 contains an all-purpose cluster named cluster1.
You need to reduce the time it takes for cluster1 to start and scale up. The solution must minimize costs.
What should you do first?
#28
Microsoft's DP-203
You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1.
You need to store data in storage1. The data will be read by Pool1. The solution must meet the following requirements:
✑ Enable Pool1 to skip columns and rows that are unnecessary in a query.
✑ Automatically create column statistics.
✑ Minimize the size of files.
Which type of file should you use?
#29
Microsoft's DP-203
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:
✑ One billion rows
✑ A clustered columnstore index
✑ A hash-distributed column named Product Key
✑ A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month.
You need to partition Table1 based on the Sales Date column. The solution must optimize query performance and data loading.
How often should you create a partition?
#30
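Date-based partitioning of the kind described above can be sketched in DDL. Boundary values and column spellings below are illustrative assumptions; the real table's columns are named with spaces in the question.

```sql
-- Illustrative sketch: RANGE RIGHT partitioning on the date column, one
-- boundary per period. With a dedicated SQL pool's 60 distributions, the
-- partition grain should keep each distribution's partition large enough
-- (ideally ~1 million+ rows) for well-compressed columnstore rowgroups.
CREATE TABLE dbo.Table1
(
    ProductKey int NOT NULL,
    SalesDate  date NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (SalesDate RANGE RIGHT FOR VALUES
        ('2021-01-01', '2021-02-01', '2021-03-01'))  -- hypothetical boundaries
);
```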