
Professional-Data-Engineer Exam Questions - Online Test



Want to know about Examcollection Professional-Data-Engineer Exam practice test features? Want to learn more about the Google Professional Data Engineer Exam certification experience? Study free Google Professional-Data-Engineer answers to the latest Professional-Data-Engineer questions at Examcollection. Get an absolute guarantee to pass the Google Professional-Data-Engineer (Google Professional Data Engineer Exam) test on your first attempt.

Here are some free Professional-Data-Engineer dump questions for you:

NEW QUESTION 1

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channels log data. How should you set up the log data transfer into Google Cloud?

  • A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
  • B. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.
  • C. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
  • D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.

Answer: B

NEW QUESTION 2

Your United States-based company has created an application for assessing and responding to user actions. The primary table’s data volume grows by 250,000 records per second. Many third parties use your application’s APIs to build the functionality into their own frontend applications. Your application’s APIs should comply with the following requirements:
• Single global endpoint
• ANSI SQL support
• Consistent access to the most up-to-date data
What should you do?

  • A. Implement BigQuery with no region selected for storage or processing.
  • B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
  • C. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.
  • D. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.

Answer: B

NEW QUESTION 3

You need to migrate a 2TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database and cost to operate is of primary concern.
Which service do you select for storing and serving your data?

  • A. Cloud Spanner
  • B. Cloud Bigtable
  • C. Cloud Firestore
  • D. Cloud SQL

Answer: D

NEW QUESTION 4

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

  • A. An hourly watermark
  • B. An event time trigger
  • C. The withAllowedLateness method
  • D. A processing time trigger

Answer: D

Explanation:
When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window.
Processing time triggers. These triggers operate on the processing time – the time when the data element is processed at any given stage in the pipeline.
Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam’s default trigger is event time-based.
Reference: https://beam.apache.org/documentation/programming-guide/#triggers
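
To make the distinction concrete, here is a minimal Apache Beam Python sketch (the Pub/Sub topic name is hypothetical, and this is an illustration rather than a reference solution) that uses a processing time trigger to emit an aggregate roughly every hour based on when elements arrive in the pipeline, ignoring their event timestamps:

    # Minimal sketch: hourly aggregation on processing time with the Beam Python SDK.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import trigger, window

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        hourly_counts = (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/channel-logs")
            | "Window" >> beam.WindowInto(
                window.GlobalWindows(),
                # Fire repeatedly, one hour of *processing* time after the first
                # element of each pane arrives; event timestamps play no role here.
                trigger=trigger.Repeatedly(trigger.AfterProcessingTime(60 * 60)),
                accumulation_mode=trigger.AccumulationMode.DISCARDING)
            | "Count" >> beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults()
        )

Swapping the trigger for an event time trigger (Beam’s default) would instead emit results based on the timestamps carried by the elements themselves.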

NEW QUESTION 5

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

  • A. Supervised learning to determine which transactions are most likely to be fraudulent.
  • B. Unsupervised learning to determine which transactions are most likely to be fraudulent.
  • C. Clustering to divide the transactions into N categories based on feature similarity.
  • D. Supervised learning to predict the location of a transaction.
  • E. Reinforcement learning to predict the location of a transaction.
  • F. Unsupervised learning to predict the location of a transaction.

Answer: BCE

NEW QUESTION 6

Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

  • A. Use Google Stackdriver Audit Logs to review data access.
  • B. Get the Identity and Access Management (IAM) policy of each table.
  • C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
  • D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Answer: C

NEW QUESTION 7

Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=____.

  • A. details
  • B. value
  • C. null
  • D. id

Answer: B

Explanation:
To make updating files and properties easy, the --properties command uses a special format to specify the configuration file and the property and value within the file that should be updated. The formatting is as follows: file_prefix:property=value.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-properties#formatting
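
The same file_prefix:property=value pairs can also be set programmatically. Below is a rough sketch (assuming the google-cloud-dataproc Python client library; the project, region, and cluster names are hypothetical) showing how they map to SoftwareConfig.properties when a cluster is created, mirroring the --properties flag:

    # Sketch only: create a cluster with a spark-defaults.conf override.
    from google.cloud import dataproc_v1

    region = "us-central1"
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"})

    cluster = {
        "cluster_name": "my-cluster",
        "config": {
            "software_config": {
                # "spark" prefix targets spark-defaults.conf; the rest is property=value.
                "properties": {"spark:spark.executor.memory": "4g"},
            }
        },
    }
    operation = client.create_cluster(
        request={"project_id": "my-project", "region": region, "cluster": cluster})
    operation.result()  # wait for cluster creation to finish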

NEW QUESTION 8

Which of these sources can you not load data into BigQuery from?

  • A. File upload
  • B. Google Drive
  • C. Google Cloud Storage
  • D. Google Cloud SQL

Answer: D

Explanation:
You can load data into BigQuery from a file upload, Google Cloud Storage, Google Drive, or Google Cloud Bigtable. It is not possible to load data into BigQuery directly from Google Cloud SQL. One way to get data from Cloud SQL to BigQuery would be to export data from Cloud SQL to Cloud Storage and then load it from there.
Reference: https://cloud.google.com/bigquery/loading-data
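
As an illustration of that workaround, once the Cloud SQL data has been exported to a CSV file in Cloud Storage, it can be loaded with a job like the following sketch (google-cloud-bigquery client library assumed; the bucket, dataset, and table names are hypothetical):

    # Sketch only: load a CSV exported from Cloud SQL (via Cloud Storage) into BigQuery.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,  # infer the schema from the file header and values
    )
    load_job = client.load_table_from_uri(
        "gs://my-bucket/cloudsql-export/orders.csv",
        "my-project.my_dataset.orders",
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes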

NEW QUESTION 9

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?

  • A. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.
  • B. Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.
  • C. Use the NOW() function in BigQuery to record the event’s time.
  • D. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Answer: B

NEW QUESTION 10

Which of the following is not true about Dataflow pipelines?

  • A. Pipelines are a set of operations
  • B. Pipelines represent a data processing job
  • C. Pipelines represent a directed graph of steps
  • D. Pipelines can share data between instances

Answer: D

Explanation:
The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms
Reference: https://cloud.google.com/dataflow/model/pipelines
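
A small Beam Python sketch of what that ownership means in practice (values are illustrative): branching a PCollection inside one pipeline is fine, but a second pipeline cannot consume it and has to create its own sources.

    # Sketch only: PCollections and transforms belong to the pipeline that created them.
    import apache_beam as beam

    with beam.Pipeline() as p1:
        nums = p1 | "Create" >> beam.Create([1, 2, 3])
        evens = nums | "Evens" >> beam.Filter(lambda x: x % 2 == 0)   # branching within p1: OK
        doubled = nums | "Doubled" >> beam.Map(lambda x: x * 2)       # another branch: OK

    with beam.Pipeline() as p2:
        # p2 cannot read `nums` from p1; it must create or read its own data.
        own_nums = p2 | "Create" >> beam.Create([4, 5, 6])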

NEW QUESTION 11

You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that prohibit transmitting recipients’ personally identifiable information (PII) to analytics systems, but the scanners are currently sending PII downstream, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?

  • A. Create an authorized view in BigQuery to restrict access to tables with sensitive data.
  • B. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.
  • C. Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.
  • D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API.
  • E. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.

Answer: A

NEW QUESTION 12

Your financial services company is moving to cloud technology and wants to store 50 TB of financial timeseries data in the cloud. This data is updated frequently and new data will be streaming in all the time. Your company also wants to move their existing Apache Hadoop jobs to the cloud to get insights into this data.
Which product should they use to store the data?

  • A. Cloud Bigtable
  • B. Google BigQuery
  • C. Google Cloud Storage
  • D. Google Cloud Datastore

Answer: A

Explanation:
Reference: https://cloud.google.com/bigtable/docs/schema-design-time-series

NEW QUESTION 13

You are responsible for writing your company’s ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?

  • A. PigLatin using Pig
  • B. HiveQL using Hive
  • C. Java using MapReduce
  • D. Python using MapReduce

Answer: D

NEW QUESTION 14

Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

  • A. Enable data access logs in each Data Analyst’s project.
  • B. Restrict access to Stackdriver Logging via Cloud IAM roles.
  • C. Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts’ projects.
  • D. Restrict access to the Cloud Storage bucket.
  • E. Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs.
  • F. Restrict access to the project with the exported logs.
  • G. Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs.
  • H. Restrict access to the project that contains the exported logs.

Answer: D

NEW QUESTION 15

You need to choose a database for a new project that has the following requirements:
• Fully managed
• Able to automatically scale up
• Transactionally consistent
• Able to scale up to 6 TB
• Able to be queried using SQL
Which database do you choose?

  • A. Cloud SQL
  • B. Cloud Bigtable
  • C. Cloud Spanner
  • D. Cloud Datastore

Answer: C

NEW QUESTION 16

You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?

  • A. The current epoch time
  • B. A concatenation of the product name and the current epoch time
  • C. A random universally unique identifier number (version 4 UUID)
  • D. The original order identification number from the sales system, which is a monotonically increasing integer

Answer: C
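
For context on option C (a sketch only, with hypothetical table and column names, assuming the google-cloud-spanner client library): a random version 4 UUID spreads writes across the key space, whereas a timestamp or a monotonically increasing order number concentrates writes at the end of the key range and can create hotspots.

    # Sketch only: insert a row keyed by a random v4 UUID.
    import uuid
    from google.cloud import spanner

    client = spanner.Client(project="my-project")
    database = client.instance("my-instance").database("sales-db")

    with database.batch() as batch:
        batch.insert(
            table="ProductSales",
            columns=("SaleId", "ProductName", "AmountCents"),
            values=[(str(uuid.uuid4()), "widget", 1299)],
        )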

NEW QUESTION 17

Your company produces 20,000 files every hour. Each data file is formatted as a comma separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.
You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

  • A. Introduce data compression for each file to increase the rate of file transfer.
  • B. Contact your Internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
  • C. Redesign the data ingestion process to use the gsutil tool to send the CSV files to a storage bucket in parallel.
  • D. Assemble 1,000 files into a tape archive (TAR) file.
  • E. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
  • F. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.

Answer: CE
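
A rough back-of-the-envelope check on the figures stated in the question above (the 4 KB file size, 20,000 files per hour, 200 ms latency, and 50 Mbps link come straight from the scenario; the one-round-trip-per-file assumption is a simplification) shows why bandwidth utilization is low and why per-file overhead is the real bottleneck:

    # Sketch only: throughput arithmetic for the scenario above.
    files_per_hour = 20_000
    file_size_bytes = 4 * 1024          # each CSV is under 4 KB; treat 4 KB as an upper bound
    link_mbps = 50

    payload_mbps = files_per_hour * file_size_bytes * 8 / 3600 / 1_000_000
    print(f"Payload rate: ~{payload_mbps:.2f} Mbps of a {link_mbps} Mbps link "
          f"({payload_mbps / link_mbps:.1%} utilization)")

    # If each file costs at least one 200 ms round trip on a single serial connection,
    # that connection moves at most ~5 files/second, i.e. ~18,000 files/hour, which is
    # already below today's 20,000/hour and far below the expected 40,000/hour. Sending
    # files in parallel (for example with gsutil -m) or batching many small files into
    # one archive attacks this overhead directly.
    rtt_s = 0.2
    serial_files_per_hour = (1 / rtt_s) * 3600
    print(f"Serial, one round trip per file: ~{serial_files_per_hour:,.0f} files/hour")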

NEW QUESTION 18

You are building an application to share financial market data with consumers, who will receive data feeds. Data is collected from the markets in real time. Consumers will receive the data in the following ways:
• Real-time event stream
• ANSI SQL access to real-time stream and historical data
• Batch historical exports
Which solution should you use?

  • A. Cloud Dataflow, Cloud SQL, Cloud Spanner
  • B. Cloud Pub/Sub, Cloud Storage, BigQuery
  • C. Cloud Dataproc, Cloud Dataflow, BigQuery
  • D. Cloud Pub/Sub, Cloud Dataproc, Cloud SQL

Answer: A

NEW QUESTION 19

You’re using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You’ve recently identified an additional use case and need to perform an hourly analytical job to calculate certain statistics across the whole database. You need to ensure both the reliability of your production application as well as the analytical workload.
What should you do?

  • A. Export Bigtable dump to GCS and run your analytical job on top of the exported files.
  • B. Add a second cluster to an existing instance with a multi-cluster routing, use live-traffic app profile for your regular workload and batch-analytics profile for the analytics workload.
  • C. Add a second cluster to an existing instance with a single-cluster routing, use live-traffic app profile for your regular workload and batch-analytics profile for the analytics workload.
  • D. Increase the size of your existing cluster twice and execute your analytics workload on your new resized cluster.

Answer: B

NEW QUESTION 20

Which of these is not a supported method of putting data into a partitioned table?

  • A. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
  • B. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
  • C. Create a partitioned table and stream new records to it every day.
  • D. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".

Answer: D

Explanation:
You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch. Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
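
For example, a single day's file can be loaded into the matching partition of an ingestion-time partitioned table by appending the "$YYYYMMDD" decorator to the destination table name. The sketch below assumes the google-cloud-bigquery client library and uses hypothetical bucket, dataset, and table names:

    # Sketch only: load one day's data into the 2024-01-15 partition.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")
    job = client.load_table_from_uri(
        "gs://my-bucket/sales/2024-01-15.csv",
        "my-project.my_dataset.sales$20240115",   # partition decorator on the table name
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            time_partitioning=bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY),
        ),
    )
    job.result()  # wait for the load to complete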

NEW QUESTION 21

You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts.
What should you do?

  • A. Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.
  • B. Place the MariaDB instances in an Instance Group with a Health Check.
  • C. Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.
  • D. Install the StackDriver Agent and configure the MySQL plugin.

Answer: C

NEW QUESTION 22

The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster ____.

  • A. application node
  • B. conditional node
  • C. master node
  • D. worker node

Answer: C

Explanation:
The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster master node. The cluster master-host-name is the name of your Cloud Dataproc cluster followed by an -m suffix—for example, if your cluster is named "my-cluster", the master-host-name would be "my-cluster-m".
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces
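
As a small illustration (the cluster name is hypothetical; the ports shown are the defaults documented for current Dataproc image versions), the interface URLs can be derived from the cluster name:

    # Sketch only: derive the master-node web interface URLs from the cluster name.
    cluster_name = "my-cluster"
    master_host = f"{cluster_name}-m"
    yarn_ui = f"http://{master_host}:8088"       # YARN ResourceManager
    namenode_ui = f"http://{master_host}:9870"   # HDFS NameNode
    print(yarn_ui)
    print(namenode_ui)

In practice these interfaces are usually reached through an SSH tunnel or the Component Gateway rather than directly over the internet.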

NEW QUESTION 23

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

  • A. Subsample your test dataset.
  • B. Subsample your training dataset.
  • C. Increase the number of input features to your model.
  • D. Increase the number of layers in your neural network.

Answer: D

Explanation:
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d

NEW QUESTION 24
......

P.S. Downloadfreepdf.net is now offering a 100% pass guarantee on Professional-Data-Engineer dumps! All Professional-Data-Engineer exam questions have been updated with correct answers: https://www.downloadfreepdf.net/Professional-Data-Engineer-pdf-download.html (239 New Questions)