DP-201 Exam Dumps Pass with Updated Jan-2022 Tests Dumps [Q28-Q44]


DP-201 practice exam questions, updated for 2022 (207 questions)

Topic 3. Designing for Data Security & Compliance (25-30%)

Designing security for data standards and policies: this covers tasks such as designing data encryption for data in transit and at rest; designing for data masking and data auditing; designing for data categorization and data privacy; designing a data retention policy; planning an archiving methodology; and planning to purge data in accordance with business needs.
Designing security for source data access: the skills measured in this subsection include planning for secure endpoints (public/private) and choosing the proper authentication technique, such as shared access signatures (SAS), access keys, or Azure Active Directory (Azure AD).

Potential Candidates and Prerequisites

The Microsoft DP-201 exam is intended mainly for Azure data engineers. These professionals work closely with business stakeholders to determine and satisfy data requirements and to design data solutions that use Azure data services. They are also responsible for designing Azure data storage solutions that use relational and non-relational data stores, real-time and batch data processing solutions, and data security and compliance solutions.

To successfully complete the DP-201 test, the applicants should have competence in designing data solutions that utilize Azure services, including Azure SQL Database, Azure Cosmos DB, Azure Data Lake Storage, Azure Synapse Analytics, Azure Stream Analytics, Azure Data Factory, Azure Blob storage, and Azure Databricks.

NEW QUESTION 28
You have a MongoDB database that you plan to migrate to an Azure Cosmos DB account that uses the MongoDB API.
During testing, you discover that the migration takes longer than expected.
You need to recommend a solution that will reduce the amount of time it takes to migrate the data.
What are two possible recommendations to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Increase the Request Units (RUs).
B. Create unique indexes.
C. Add a write region.
D. Create compound indexes.
E. Turn off indexing.

Answer: A,E

Explanation:
A: Increase the throughput during the migration by increasing the Request Units (RUs).
For customers that are migrating many collections within a database, it is strongly recommended to configure database-level throughput. You must make this choice when you create the database. The minimum database-level throughput capacity is 400 RU/sec, and each collection sharing database-level throughput requires at least 100 RU/sec.
E: By default, Azure Cosmos DB indexes all your data fields upon ingestion. You can modify the indexing policy in Azure Cosmos DB at any time. In fact, it is often recommended to turn off indexing while migrating data and then turn it back on once the data is in Cosmos DB.
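The two throughput minimums quoted above combine into a simple lower bound for a database that shares throughput across collections. The minimal Python sketch below assumes those quoted figures (400 RU/s per database, 100 RU/s per collection) still apply; `min_shared_throughput` is a hypothetical helper, and current limits should be confirmed in the Azure Cosmos DB documentation.

```python
# Minimums quoted in the explanation above (assumed; verify against current docs).
DB_MIN_RU_PER_SEC = 400
PER_COLLECTION_MIN_RU_PER_SEC = 100


def min_shared_throughput(collection_count: int) -> int:
    """Smallest database-level RU/s that satisfies both quoted minimums."""
    if collection_count < 1:
        raise ValueError("a shared-throughput database needs at least one collection")
    return max(DB_MIN_RU_PER_SEC, PER_COLLECTION_MIN_RU_PER_SEC * collection_count)


for n in (1, 4, 12):
    print(n, min_shared_throughput(n))  # 400, 400, 1200
```

Provisioning above this floor during the migration and scaling back down afterward is the idea behind answer A.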
References:
https://docs.microsoft.com/bs-latn-ba/Azure/cosmos-db/mongodb-pre-migration

NEW QUESTION 29
Which Azure data storage solution should you recommend for each application? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Health Review: Azure SQL Database
Scenario: ADatum identifies the following requirements for the Health Review application:
* Ensure that sensitive health data is encrypted at rest and in transit.
* Tag all the sensitive health data in Health Review. The data will be used for auditing.
Health Interface: Azure Cosmos DB
ADatum identifies the following requirements for the Health Interface application:
* Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must display the most recent committed version of an item.
* Reduce the amount of time it takes to add data from new hospitals to Health Interface.
* Support a more scalable batch processing solution in Azure.
* Reduce the amount of development effort to rewrite existing SQL queries.
Health Insights: Azure SQL Data Warehouse
Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution.
You can access Azure SQL Data Warehouse (SQL DW) from Databricks by using the SQL Data Warehouse connector (the SQL DW connector), a data source implementation for Apache Spark that uses Azure Blob storage and PolyBase in SQL DW to transfer large volumes of data efficiently between a Databricks cluster and a SQL DW instance.
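As a rough illustration of how the connector is configured, the sketch below only builds the reader options; it does not start Spark. It assumes the Databricks connector's `url`, `tempDir`, `forwardSparkAzureStorageCredentials`, and `dbTable` options, and `sqldw_read_options` plus all argument values are hypothetical.

```python
def sqldw_read_options(jdbc_url: str, temp_dir: str, table: str) -> dict:
    """Options for reading a SQL DW table through the Databricks SQL DW connector."""
    return {
        "url": jdbc_url,                                # JDBC URL of the SQL DW instance
        "tempDir": temp_dir,                            # Blob storage staging area used with PolyBase
        "forwardSparkAzureStorageCredentials": "true",  # reuse the storage key set in the Spark session
        "dbTable": table,                               # table to read
    }


opts = sqldw_read_options(
    "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>",
    "wasbs://<container>@<account>.blob.core.windows.net/tmp",
    "dbo.FactEvents",
)
```

On a Databricks cluster, the dictionary would then be passed as `spark.read.format("com.databricks.spark.sqldw").options(**opts).load()`, with the storage account key already configured in the Spark session.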
Scenario: ADatum identifies the following requirements for the Health Insights application:
* The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.
References:
https://docs.databricks.com/data/data-sources/azure/sql-data-warehouse.html

NEW QUESTION 30
A company purchases IoT devices to monitor manufacturing machinery. The company uses an Azure IoT Hub to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?

A. Azure Data Factory instance using Microsoft Visual Studio
B. Azure Analysis Services using Microsoft Visual Studio
C. Azure Stream Analytics Edge application using Microsoft Visual Studio
D. Azure Data Factory instance using Azure Portal

Answer: C

Explanation:
Azure Stream Analytics (ASA) on IoT Edge empowers developers to deploy near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data.
You can use the Azure Stream Analytics tools for Visual Studio to create an ASA Edge job.
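Stream Analytics jobs, including Edge jobs, express near-real-time monitoring as SQL-like queries; a common pattern is averaging each device's readings over fixed tumbling windows, written in the query language as `GROUP BY DeviceId, TumblingWindow(second, 5)`. The Python below is not Stream Analytics code; it only models the semantics of such a window locally, assuming hypothetical events shaped as `(device_id, timestamp_seconds, value)`.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_average(
    events: Iterable[tuple[str, float, float]], window_s: float
) -> dict[tuple[str, float], float]:
    """Average value per (device, window start) over fixed, non-overlapping windows."""
    sums: dict[tuple[str, float], float] = defaultdict(float)
    counts: dict[tuple[str, float], int] = defaultdict(int)
    for device_id, ts, value in events:
        window_start = (ts // window_s) * window_s  # each event lands in exactly one window
        key = (device_id, window_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


readings = [("m1", 0.0, 10.0), ("m1", 4.0, 20.0), ("m1", 6.0, 30.0), ("m2", 1.0, 5.0)]
print(tumbling_average(readings, 5.0))
```

Running the aggregation on the Edge device means only these per-window summaries need to leave the site, which is the benefit the explanation describes.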
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-edge

NEW QUESTION 31
You are designing a storage solution for streaming data that is processed by Azure Databricks. The solution must meet the following requirements:
* The data schema must be fluid.
* The source data must have a high throughput.
* The data must be available in multiple Azure regions as quickly as possible.
What should you include in the solution to meet the requirements?

A. Azure SQL Database
B. Azure Cosmos DB
C. Azure Synapse Analytics
D. Azure Data Lake Storage

Answer: B

Explanation:
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure's geographic regions.
It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs).
You can read data from and write data to Azure Cosmos DB using Databricks.
Note on fluid schema:
If you are managing data whose structures are constantly changing at a high rate, particularly if transactions can come from external sources where it is difficult to enforce conformity across the database, you may want to consider a more schema-agnostic approach using a managed NoSQL database service like Azure Cosmos DB.
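Schema-agnostic storage means each document carries its own shape and readers discover fields at read time instead of relying on a fixed table definition. The plain-Python sketch below illustrates that idea with hypothetical telemetry documents; it does not use the Cosmos DB SDK, and the helper names are made up for the example.

```python
# Hypothetical documents whose shape changes over time; no schema change is needed.
docs = [
    {"id": "1", "deviceId": "a", "temp": 21.5},
    {"id": "2", "deviceId": "b", "temp": 19.0, "humidity": 40},
    {"id": "3", "deviceId": "a", "firmware": "2.1"},
]


def observed_fields(documents: list[dict]) -> list[str]:
    """Union of field names seen across all documents (schema inferred on read)."""
    fields: set[str] = set()
    for doc in documents:
        fields.update(doc)
    return sorted(fields)


def project(documents: list[dict], field: str) -> list:
    """Read one field, tolerating documents that do not have it."""
    return [doc.get(field) for doc in documents]


print(observed_fields(docs))
print(project(docs, "temp"))
```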
Reference:
https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html
https://docs.microsoft.com/en-us/azure/cosmos-db/relational-nosql

NEW QUESTION 32
A company is evaluating data storage solutions.
You need to recommend a data storage solution that meets the following requirements:
* Minimize costs for storing blob objects.
* Optimize access for data that is infrequently accessed.
* Data must be stored for at least 30 days.
* Data availability must be at least 99 percent.
What should you recommend?

A. Premium
B. Archive
C. Hot
D. Cool

Answer: D

Explanation:
Azure's cool storage tier, also known as Azure cool Blob storage, is for infrequently accessed data that needs to be stored for a minimum of 30 days. Typical use cases include backing up data before tiering to archival systems, legal data, media files, system audit information, datasets used for big data analysis, and more.
The storage cost for the cool tier is lower than that of the hot tier. Because data in this tier is expected to be accessed less frequently, its data access charges are higher than those of the hot tier. No application changes are required, because the cool tier is accessed through the same Azure Storage APIs as the hot tier.
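The 30-day minimum matters for billing: a blob deleted or moved out of the cool tier before 30 days incurs a prorated early-deletion charge for the remaining days. The sketch below computes those remaining days; `early_deletion_days` is a hypothetical helper, and the 180-day value in the example is the archive tier's minimum.

```python
COOL_MIN_DAYS = 30      # minimum storage duration for the cool tier
ARCHIVE_MIN_DAYS = 180  # minimum storage duration for the archive tier


def early_deletion_days(days_stored: int, min_days: int = COOL_MIN_DAYS) -> int:
    """Days still billed if a blob leaves the tier before its minimum period."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    return max(0, min_days - days_stored)


print(early_deletion_days(10))                    # 20
print(early_deletion_days(45))                    # 0
print(early_deletion_days(45, ARCHIVE_MIN_DAYS))  # 135
```

Because the scenario keeps data for at least 30 days, the cool tier's early-deletion charge does not apply.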
Reference:
https://cloud.netapp.com/blog/low-cost-storage-options-on-azure

NEW QUESTION 33
You are designing an Azure data factory that will copy data from Azure Blob storage to a data warehouse in Azure Synapse Analytics.
You need to recommend an authentication mechanism that meets the following requirements:
* Identities must be validated by using Azure Active Directory (Azure AD).
* Development and maintenance effort must be minimized.
Which authentication mechanism should you recommend for each service? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:

Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-msi
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse
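The answer area for this question is missing. The references point at managed identities for Azure resources. Assuming that is the intended mechanism for both services, the sketch below builds the two Data Factory linked service definitions as Python dictionaries. No key, SAS token, or password appears in either definition. The linked service names, account, server, and database values are hypothetical. The factory's managed identity would also need a data-plane role on the storage account, such as Storage Blob Data Reader, and a database user created with `CREATE USER [<data factory name>] FROM EXTERNAL PROVIDER;`.

```python
import json


def blob_linked_service(name: str, account: str) -> dict:
    # Only serviceEndpoint is set, so Data Factory authenticates with its managed identity.
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"serviceEndpoint": f"https://{account}.blob.core.windows.net/"},
        },
    }


def synapse_linked_service(name: str, server: str, database: str) -> dict:
    # No user name or password in the connection string: the managed identity is used.
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDW",
            "typeProperties": {
                "connectionString": (
                    f"Server=tcp:{server}.database.windows.net,1433;"
                    f"Database={database};Connection Timeout=30"
                )
            },
        },
    }


print(json.dumps(synapse_linked_service("SynapseLS", "myserver", "mydw"), indent=2))
```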

NEW QUESTION 34
Which Azure Data Factory components should you recommend using together to import the customer data from Salesforce to Data Lake Storage? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:

NEW QUESTION 35
You are designing an Azure Data Factory pipeline for processing data. The pipeline will process data that is stored in general-purpose standard Azure storage.
You need to ensure that the compute environment is created on-demand and removed when the process is completed.
Which type of activity should you recommend?

A. Databricks Python activity
B. Databricks Jar activity
C. HDInsight Pig activity
D. Data Lake Analytics U-SQL activity

Answer: C

Explanation:
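The HDInsight Pig activity in a Data Factory pipeline executes Pig queries on your own HDInsight cluster or on an on-demand HDInsight cluster. With an on-demand HDInsight linked service, Data Factory creates the cluster when the activity runs and deletes it after processing completes and the cluster's time-to-live expires.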
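A sketch of the two pieces involved, written as Python dictionaries that mirror Data Factory's JSON: an `HDInsightOnDemand` linked service and an `HDInsightPig` activity that runs on it. All angle-bracket values, names, and the script path are hypothetical placeholders, and the property set is trimmed to the essentials rather than being a complete definition.

```python
import json

on_demand_hdinsight = {
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "hadoop",
            "clusterSize": 1,          # number of worker nodes
            "timeToLive": "00:15:00",  # idle time allowed before the cluster is deleted
            "version": "<HDInsight version>",
            "osType": "Linux",
            "hostSubscriptionId": "<subscription id>",
            "clusterResourceGroup": "<resource group>",
            "tenant": "<tenant id>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {"type": "SecureString", "value": "<key>"},
            # General-purpose storage account that holds the data being processed.
            "linkedServiceName": {
                "referenceName": "AzureStorageLinkedService",
                "type": "LinkedServiceReference",
            },
        },
    },
}

pig_activity = {
    "name": "RunPigScript",
    "type": "HDInsightPig",
    # Runs on the on-demand cluster defined above.
    "linkedServiceName": {
        "referenceName": on_demand_hdinsight["name"],
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "scriptPath": "<container>/scripts/transform.pig",
        "scriptLinkedService": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference",
        },
    },
}

print(json.dumps(pig_activity, indent=2))
```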
References:
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-hadoop-pig

NEW QUESTION 36
What should you recommend using to secure sensitive customer contact information?

A. row-level security
B. data labels
C. column-level security
D. Transparent Data Encryption (TDE)

Answer: C

Explanation:
Scenario: Limit the business analysts' access to customer contact information, such as phone numbers, because this type of data is not analytically relevant.
Column-level security lets you grant a user or role SELECT permission on only specific columns of a table, so business analysts can query customer data without being able to read the contact columns.
Always Encrypted is a related feature designed to protect sensitive data stored in specific database columns (for example, credit card numbers, national identification numbers, or data on a need-to-know basis) from access.
This includes access by database administrators or other privileged users who are authorized to access the database to perform management tasks but have no business need to access the data in the encrypted columns. The data is always encrypted, which means it is decrypted only for processing by client applications that have access to the encryption key.
Incorrect Answers:
D: Transparent Data Encryption (TDE) encrypts SQL Server, Azure SQL Database, and Azure Synapse Analytics data files, known as encrypting data at rest. TDE does not provide encryption across communication channels.
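Column-level security is applied with an ordinary T-SQL GRANT that lists the permitted columns. The Python helper below only builds that statement as a string; the table, column, and role names are hypothetical, and the helper does not validate identifiers, so it should only be given trusted names.

```python
def column_grant(table: str, columns: list[str], principal: str) -> str:
    """Build a T-SQL GRANT that exposes only the listed columns of a table."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(f"[{name}]" for name in columns)
    return f"GRANT SELECT ON {table} ({column_list}) TO [{principal}];"


# Hypothetical table and role: analysts see every column except the contact columns.
print(column_grant("dbo.Customer", ["CustomerId", "City", "Segment"], "BusinessAnalysts"))
# GRANT SELECT ON dbo.Customer ([CustomerId], [City], [Segment]) TO [BusinessAnalysts];
```

A query by a member of the role that references an ungranted column, such as a phone number column, would then fail with a permission error.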
Reference:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-security-overview
Design for data security and compliance
Testlet 2
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Background
Current environment
The company has the following virtual machines (VMs):

Requirements
Storage and processing
You must be able to use a file system view of data stored in a blob.
You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store. The architecture will need to support data files, libraries, and images. Additionally, it must provide a web-based interface to documents that contain runnable commands, visualizations, and narrative text, such as a notebook.
CONT_SQL3 requires an initial scale of 35000 IOPS.
CONT_SQL1 and CONT_SQL2 must use the vCore model and should include replicas. The solution must support 8000 IOPS.
The storage should be configured to optimize storage for database OLTP workloads.
Migration
* You must be able to independently scale compute and storage resources.
* You must migrate all SQL Server workloads to Azure. You must identify related machines in the on-premises environment and get disk size and data usage information.
* Data from SQL Server must include zone redundant storage.
* You need to ensure that app components can reside on-premises while interacting with components that run in the Azure public cloud.
* SAP data must remain on-premises.
* The Azure Site Recovery (ASR) results should contain per-machine data.
Business requirements
* You must design a regional disaster recovery topology.
* The database backups have regulatory purposes and must be retained for seven years.
* CONT_SQL1 stores customer sales data that requires ETL operations for data analysis. A solution is required that reads data from SQL, performs ETL, and outputs to Power BI. The solution should use managed clusters to minimize costs. To optimize logistics, Contoso needs to analyze customer sales data to see if certain products are tied to specific times in the year.
* The analytics solution for customer sales data must be available during a regional outage.
Security and auditing
* Contoso requires all corporate computers to enable Windows Firewall.
* Azure servers should be able to ping other Contoso Azure servers.
* Employee PII must be encrypted in memory, in motion, and at rest. Any data encrypted by SQL Server must support equality searches, grouping, indexing, and joining on the encrypted data.
* Keys must be secured by using hardware security modules (HSMs).
* CONT_SQL3 must not communicate over the default ports
Cost
* All solutions must minimize cost and resources.
* The organization does not want any unexpected charges.
* The data engineers must set the SQL Data Warehouse compute resources to consume 300 DWUs.
* CONT_SQL2 is not fully utilized during non-peak hours. You must minimize resource costs during non-peak hours.
Design for data security and compliance
Testlet 3
Case study
Overview
General Overview
ADatum Corporation is a medical company that has 5,000 physicians located in more than 300 hospitals across the US. The company has a medical department, a sales department, a marketing department, a medical research department, and a human resources department.
You are redesigning the application environment of ADatum.
Physical Locations
ADatum has three main offices in New York, Dallas, and Los Angeles. The offices connect to each other by using a WAN link. Each office connects directly to the Internet. The Los Angeles office also has a datacenter that hosts all the company's applications.
Existing Environment
Health Review
ADatum has a critical OLTP web application named Health Review that physicians use to track billing, patient care, and overall physician best practices.
Health Interface
ADatum has a critical application named Health Interface that receives hospital messages related to patient care and status updates. The messages are sent in batches by each hospital's enterprise relationship management (ERM) system by using a VPN. The data sent from each hospital can have varying columns and formats.
Currently, a custom C# application is used to send the data to Health Interface. The application uses deprecated libraries and a new solution must be designed for this functionality.
Health Insights
ADatum has a web-based reporting system named Health Insights that shows hospital and patient insights to physicians and business users. The data is created from the data in Health Review and Health Interface, as well as manual entries.
Database Platform
Currently, the databases for all three applications are hosted on an out-of-date VMware cluster that has a single instance of Microsoft SQL Server 2012.
Problem Statements
ADatum identifies the following issues in its current environment:
* Over time, the data received by Health Interface from the hospitals has slowed, and the number of messages has increased.
* When a new hospital joins ADatum, Health Interface requires a schema modification due to the lack of data standardization.
* The speed of batch data processing is inconsistent.
Business Requirements
Business Goals
ADatum identifies the following business goals:
* Migrate the applications to Azure whenever possible.
* Minimize the development effort required to perform data movement.
* Provide continuous integration and deployment for development, test, and production environments.
* Provide faster access to the applications and the data and provide more consistent application performance.
* Minimize the number of services required to perform data processing, development, scheduling, monitoring, and the operationalizing of pipelines.
Health Review Requirements
ADatum identifies the following requirements for the Health Review application:
* Ensure that sensitive health data is encrypted at rest and in transit.
* Tag all the sensitive health data in Health Review. The data will be used for auditing.
Health Interface Requirements
ADatum identifies the following requirements for the Health Interface application:
* Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must display the most recent committed version of an item.
* Reduce the amount of time it takes to add data from new hospitals to Health Interface.
* Support a more scalable batch processing solution in Azure.
* Reduce the amount of development effort to rewrite existing SQL queries.
Health Insights Requirements
ADatum identifies the following requirements for the Health Insights application:
* The analysis of events must be performed over time by using an organizational date dimension table.
* The data from Health Interface and Health Review must be available in Health Insights within 15 minutes of being committed.
* The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.

NEW QUESTION 37
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
A company is developing a solution to manage inventory data for a group of automotive repair shops. The solution will use Azure Synapse Analytics as the data store.
Shops will upload data every 10 days.
Data corruption checks must run each time data is uploaded. If corruption is detected, the corrupted data must be removed.
You need to ensure that upload processes and data corruption checks do not impact reporting and analytics processes that use the data warehouse.
Proposed solution: Configure database-level auditing in Azure Synapse Analytics and set retention to 10 days.
Does the solution meet the goal?

A. No
B. Yes

Answer: A

Explanation:
Instead, create a user-defined restore point before data is uploaded. Delete the restore point after data corruption checks complete.
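The recommended pattern can be sketched as orchestration logic. All five callables are hypothetical stand-ins for the upload step, the corruption check, and whatever tooling creates, restores, and deletes restore points, for example PowerShell or the REST API. In Azure Synapse, restoring from a restore point typically produces a restored data warehouse rather than rolling back in place, and the `restore` callable hides that detail.

```python
from typing import Callable


def guarded_upload(
    create_restore_point: Callable[[], str],
    upload: Callable[[], None],
    is_corrupt: Callable[[], bool],
    restore: Callable[[str], None],
    delete_restore_point: Callable[[str], None],
) -> bool:
    """Return True if the uploaded data is kept, False if it was rolled back."""
    point = create_restore_point()  # snapshot taken before new data lands
    try:
        try:
            upload()
        except Exception:
            restore(point)  # treat a failed load like corrupted data
            raise
        if is_corrupt():
            restore(point)  # remove corrupted data by going back to the snapshot
            return False
        return True
    finally:
        delete_restore_point(point)  # clean up once the checks are complete
```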
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/backup-and-restore

NEW QUESTION 38
A company stores data in multiple types of cloud-based databases.
You need to design a solution to consolidate data into a single relational database. Ingestion of data will occur at set times each day.
What should you recommend?

A. Azure Database Migration Service
B. Data Migration Assistant
C. SQL Server Migration Assistant
D. Azure Data Factory
E. SQL Data Sync

Answer: D

Explanation:
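Azure Data Factory is a cloud-based data integration service that can copy data from many types of data stores into a single relational store and can run its pipelines on a schedule, such as at set times each day.
Incorrect Answers:
A: Azure Database Migration Service is used to migrate on-premises SQL Server databases to the cloud.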
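For the "set times each day" requirement, Data Factory uses a schedule trigger attached to the consolidation pipeline. The sketch below builds such a trigger definition as a Python dictionary mirroring the JSON; the trigger name, pipeline name, hours, and start time are hypothetical, and UTC is an assumed time zone.

```python
import json


def daily_schedule_trigger(name: str, pipeline: str, hours: list[int], start_time: str) -> dict:
    """Schedule trigger that runs `pipeline` every day at the given hours (minute 0)."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                    "schedule": {"hours": sorted(hours), "minutes": [0]},
                }
            },
            "pipelines": [
                {"pipelineReference": {"type": "PipelineReference", "referenceName": pipeline}}
            ],
        },
    }


trigger = daily_schedule_trigger("DailyIngest", "ConsolidatePipeline", [18, 6], "2022-01-01T00:00:00Z")
print(json.dumps(trigger, indent=2))
```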
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/introduction
https://azure.microsoft.com/en-us/blog/operationalize-azure-databricks-notebooks-using-data-factory/
https://azure.microsoft.com/en-us/blog/data-ingestion-into-azure-at-scale-made-easier-with-latest-enhancements-to-adf-copy-data-tool/

NEW QUESTION 39
You are designing a solution for a company. You plan to use Azure Databricks.
You need to recommend workloads and tiers to meet the following requirements:
* Provide managed clusters for running production jobs.
* Provide persistent clusters that support auto-scaling for analytics processes.
* Provide role-based access control (RBAC) support for Notebooks.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:

Box 1: Data Engineering Only
Box 2: Data Engineering and Data Analytics
Box 3: Standard
Box 4: Data Analytics only
Box 5: Premium
Premium is required for RBAC. The Data Analytics Premium tier provides interactive workloads to analyze data collaboratively with notebooks.
References:
https://azure.microsoft.com/en-us/pricing/details/databricks/
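A persistent, auto-scaling analytics cluster corresponds to a cluster definition with an `autoscale` range and auto-termination disabled. The sketch below builds such a specification as a Python dictionary using field names from the Databricks Clusters API (`autoscale.min_workers`, `autoscale.max_workers`, and `autotermination_minutes`, where 0 disables automatic termination). The runtime version and node type are placeholders, and `autoscaling_cluster_spec` is a hypothetical helper.

```python
def autoscaling_cluster_spec(name: str, min_workers: int, max_workers: int) -> dict:
    """Cluster definition for an always-on cluster that scales between the given bounds."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("require 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "<Databricks runtime version>",  # placeholder
        "node_type_id": "<Azure VM size>",                # placeholder
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": 0,  # 0 disables auto-termination, so the cluster persists
    }


spec = autoscaling_cluster_spec("analytics-shared", 2, 8)
```

Job clusters for production workloads would instead be created per run by the job definition, which is what makes them "managed" in the first requirement.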

NEW QUESTION 40
You need to recommend a storage solution to store flat files and columnar optimized files. The solution must meet the following requirements:
* Store standardized data that data scientists will explore in a curated folder.
* Ensure that applications cannot access the curated folder.
* Store staged data for import to applications in a raw folder.
* Provide data scientists with access to specific folders in the raw folder and all the content the curated folder.
Which storage solution should you recommend?

A. Azure SQL Database
B. Azure Synapse Analytics
C. Azure Blob storage
D. Azure Data Lake Storage Gen2

Answer: C

Explanation:
Azure Blob Storage containers is a general purpose object store for a wide variety of storage scenarios. Blobs are stored in containers, which are similar to folders.
Incorrect Answers:
D: Azure Data Lake Storage is storage optimized for big data analytics workloads.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-storage

NEW QUESTION 41
You need to recommend a solution for storing the image tagging data.
What should you recommend?

A. Azure SQL Database
B. Azure Blob Storage
C. Azure SQL Data Warehouse
D. Azure Cosmos DB
E. Azure File Storage

Answer: B

Explanation:
Image data must be stored in a single data store at minimum cost.
Note: Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that does not adhere to a particular data model or definition, such as text or binary data.
Blob storage is designed for:
* Serving images or documents directly to a browser.
* Storing files for distributed access.
* Streaming video and audio.
* Writing to log files.
* Storing data for backup and restore, disaster recovery, and archiving.
* Storing data for analysis by an on-premises or Azure-hosted service.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

NEW QUESTION 42
You have an on-premises data warehouse that includes the following fact tables. Both tables have the following columns: DataKey, ProductKey, RegionKey. There are 120 unique product keys and 65 unique region keys.

Queries that use the data warehouse take a long time to complete.
You plan to migrate the solution to use Azure SQL Data Warehouse. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:

Box 1: Hash-distributed
Box 2: ProductKey
ProductKey is used extensively in joins.
Hash-distributed tables improve query performance on large fact tables.
Box 3: Round-robin
Box 4: RegionKey
Round-robin tables are useful for improving loading speed.
Consider using the round-robin distribution for your table in the following scenarios:
* When getting started as a simple starting point since it is the default
* If there is no obvious joining key
* If there is no good candidate column for hash distributing the table
* If the table does not share a common join key with other tables
* If the join is less significant than other joins in the query
* When the table is a temporary staging table
Note: A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
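The skew argument can be made concrete with a small simulation. In Azure SQL Data Warehouse, the choice is made in CREATE TABLE with `WITH (DISTRIBUTION = HASH(ProductKey))` or `WITH (DISTRIBUTION = ROUND_ROBIN)`. The Python below only models where rows land. It assumes a hypothetical 1,000 rows per key and uses `key % 60` as a stand-in for the service's actual hash function, so real distributions will be less tidy than this model.

```python
from collections import Counter

DISTRIBUTIONS = 60  # rows of a distributed table are spread over 60 distributions


def hash_distribute(keys, distributions=DISTRIBUTIONS):
    """Rows per distribution when each row is placed by its key (key % n stands in for the hash)."""
    counts = Counter(key % distributions for key in keys)
    return [counts[d] for d in range(distributions)]


def round_robin_distribute(row_count, distributions=DISTRIBUTIONS):
    """Rows per distribution when rows are dealt out in turn, ignoring keys."""
    counts = Counter(i % distributions for i in range(row_count))
    return [counts[d] for d in range(distributions)]


def skew(rows_per_distribution):
    """Largest distribution divided by the average; 1.0 means perfectly even."""
    average = sum(rows_per_distribution) / len(rows_per_distribution)
    return max(rows_per_distribution) / average


# Hypothetical data: 1,000 rows for each of the 120 product keys and the 65 region keys.
product_rows = [k for k in range(1, 121) for _ in range(1000)]
region_rows = [k for k in range(1, 66) for _ in range(1000)]

print(round(skew(hash_distribute(product_rows)), 2))             # 1.0
print(round(skew(hash_distribute(region_rows)), 2))              # 1.85
print(round(skew(round_robin_distribute(len(region_rows))), 2))  # 1.0
```

Under this simplified model, 120 product keys fill the 60 distributions evenly, while hashing on 65 region keys leaves five distributions with nearly twice the average load. Round-robin placement stays even regardless of the keys.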
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute