
Professional-Data-Engineer Exam Questions - Online Test



Passleader Professional-Data-Engineer questions are updated, and all Professional-Data-Engineer answers are verified by experts. Once you have fully prepared with our Professional-Data-Engineer exam prep kits, you will be ready for the real Professional-Data-Engineer exam without a problem. We have an up-to-date Google Professional-Data-Engineer dumps study guide. Passed Professional-Data-Engineer on the first attempt! Here is what I did.

Check Professional-Data-Engineer free dumps before getting the full version:

NEW QUESTION 1

You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?

  • A. Batch job, PubSubIO, side-inputs
  • B. Streaming job, PubSubIO, JdbcIO, side-outputs
  • C. Streaming job, PubSubIO, BigQueryIO, side-inputs
  • D. Streaming job, PubSubIO, BigQueryIO, side-outputs

Answer: A
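
For reference, the side-input enrichment pattern that the options describe can be sketched with the Beam Python SDK roughly as follows. This is a minimal sketch, assuming illustrative topic, table, and field names; the small BigQuery reference table is read once and broadcast to every worker as a side input:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Small, static reference data read from BigQuery and turned into a dict side input.
    reference = (p
                 | "ReadReference" >> beam.io.ReadFromBigQuery(
                       query="SELECT id, label FROM `my-project.my_dataset.reference`",
                       use_standard_sql=True)
                 | "ToKV" >> beam.Map(lambda row: (row["id"], row["label"])))

    (p
     | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
     | "Parse" >> beam.Map(json.loads)
     | "Enrich" >> beam.Map(
           lambda event, ref: {**event, "label": ref.get(event["id"])},
           ref=beam.pvalue.AsDict(reference))
     | "WriteEnriched" >> beam.io.WriteToBigQuery("my-project:my_dataset.enriched"))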

NEW QUESTION 2

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and has asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

  • A. Increase the CPU size on your server.
  • B. Increase the size of the Google Persistent Disk on your server.
  • C. Increase your network bandwidth from your datacenter to GCP.
  • D. Increase your network bandwidth from Compute Engine to Cloud Storage.

Answer: C

NEW QUESTION 3

You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the “Trust No One” (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?

  • A. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.
  • B. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.
  • C. Specify a customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.
  • D. Specify a customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.

Answer: B
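
The gcloud kms and gsutil workflow that options A and B describe can also be scripted with the Cloud client libraries. A minimal sketch, assuming made-up project, key ring, and bucket names; note that a direct Cloud KMS encrypt call accepts at most 64 KiB of plaintext, so real archival files would normally use envelope encryption:

from google.cloud import kms, storage

# Hypothetical resource names; replace with your own.
KEY_NAME = "projects/my-project/locations/global/keyRings/archive/cryptoKeys/archive-key"
BUCKET = "my-archive-bucket"

with open("archive-2024-01.tar", "rb") as f:
    plaintext = f.read()  # must be <= 64 KiB for a direct KMS encrypt call

# Encrypt with the KMS key plus per-file AAD that is kept outside Google Cloud.
response = kms.KeyManagementServiceClient().encrypt(
    request={"name": KEY_NAME,
             "plaintext": plaintext,
             "additional_authenticated_data": b"archive-2024-01"})

storage.Client().bucket(BUCKET).blob("archive-2024-01.tar.enc") \
    .upload_from_string(response.ciphertext)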

NEW QUESTION 4

How can you get a neural network to learn about relationships between categories in a categorical feature?

  • A. Create a multi-hot column
  • B. Create a one-hot column
  • C. Create a hash bucket
  • D. Create an embedding column

Answer: D

Explanation:
There are two problems with one-hot encoding. First, it has high dimensionality, meaning that instead of having just one value, like a continuous feature, it has many values, or dimensions. This makes computation more time-consuming, especially if a feature has a very large number of categories. The second problem is that it doesn’t encode any relationships between the categories. They are completely independent from each other, so the network has no way of knowing which ones are similar to each other.
Both of these problems can be solved by representing a categorical feature with an embedding column. The idea is that each category has a smaller vector with, let’s say, 5 values in it. But unlike a one-hot vector, the values are not usually 0. The values are weights, similar to the weights that are used for basic features in a neural network. The difference is that each category has a set of weights (5 of them in this case).
You can think of each value in the embedding vector as a feature of the category. So, if two categories are very similar to each other, then their embedding vectors should be very similar too.
Reference:
https://cloudacademy.com/google/introduction-to-google-cloud-machine-learning-engine-course/a-wide-and-dee
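
A minimal TensorFlow sketch of the difference described above, using an illustrative feature name and vocabulary: the indicator (one-hot) column treats every category as independent, while the embedding column gives each category a learned 5-value weight vector:

import tensorflow as tf

category = tf.feature_column.categorical_column_with_vocabulary_list(
    "product_category", ["books", "music", "movies", "games", "toys"])

one_hot = tf.feature_column.indicator_column(category)                 # independent categories
embedded = tf.feature_column.embedding_column(category, dimension=5)   # 5 learned weights each

# The embedding column feeds the network like any other dense feature.
model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures([embedded]),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1)])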

NEW QUESTION 5

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

  • A. Linear regression
  • B. Logistic classification
  • C. Recurrent neural network
  • D. Feedforward neural network

Answer: A
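
As a rough illustration of why the linear model is the lightweight choice here, a regression like the following trains in milliseconds on a single small VM; the feature and price values are made up:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: [square_feet, bedrooms] -> sale price.
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 405000])

model = LinearRegression().fit(X, y)
print(model.predict([[2000, 4]]))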

NEW QUESTION 6

You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

  • A. Add a SideInput that returns a Boolean if the element is corrupt.
  • B. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.
  • C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.
  • D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.

Answer: B
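
A ParDo that discards corrupt elements can be as small as the following sketch, where "corrupt" simply means the payload fails to parse as JSON; this stands in for whatever validation the real pipeline applies:

import json
import apache_beam as beam

class DiscardCorrupt(beam.DoFn):
    """Emit only messages that parse as JSON; drop corrupt ones."""
    def process(self, element):
        try:
            yield json.loads(element)
        except (ValueError, TypeError):
            return  # corrupt element: emit nothing

with beam.Pipeline() as p:
    cleaned = (p
               | beam.Create(['{"id": 1}', "not json", '{"id": 2}'])
               | beam.ParDo(DiscardCorrupt()))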

NEW QUESTION 7

You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?

  • A. Use BigQuery machine learning to be able to train the model at scale, so you can analyze the packages in batches.
  • B. Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.
  • C. Use the Cloud Vision API to detect damage, and raise an alert through Cloud Functions. Integrate the package tracking applications with this function.
  • D. Use TensorFlow to create a model that is trained on your corpus of images. Create a Python notebook in Cloud Datalab that uses this model so you can analyze for damaged packages.

Answer: A

NEW QUESTION 8

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

  • A. Check the dashboard application to see if it is not displaying correctly.
  • B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
  • C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
  • D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.

Answer: B
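
Option B is essentially a pipeline test: run a fixed, known dataset through the same transforms and compare the output against what the dashboard should display. A minimal sketch with the Beam testing utilities; the aggregation shown is a stand-in for the real dashboard logic:

import json
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def test_dashboard_totals():
    fixed_messages = ['{"amount": 10}', '{"amount": 5}']   # known input
    with TestPipeline() as p:
        totals = (p
                  | beam.Create(fixed_messages)
                  | beam.Map(lambda m: json.loads(m)["amount"])
                  | beam.CombineGlobally(sum))
        assert_that(totals, equal_to([15]))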

NEW QUESTION 9

You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in Cloud Bigtable where the datetime of the stock trade is the beginning of the row key. Your application has thousands of concurrent users, and you notice that performance is starting to degrade as more stocks are added. What should you do to improve the performance of your application?

  • A. Change the row key syntax in your Cloud Bigtable table to begin with the stock symbol.
  • B. Change the row key syntax in your Cloud Bigtable table to begin with a random number per second.
  • C. Change the data pipeline to use BigQuery for storing stock trades, and update your application.
  • D. Use Cloud Dataflow to write summary of each day’s stock trades to an Avro file on Cloud Storage.Update your application to read from Cloud Storage and Cloud Bigtable to compute the responses.

Answer: A
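
Option A means leading the row key with the stock symbol, for example GOOG#2024-01-05T14:30:00Z, so concurrent reads for different stocks spread across tablets instead of piling onto the tablet holding the newest timestamps. A hedged sketch with the Cloud Bigtable Python client; project, instance, table, and column family names are made up:

from google.cloud import bigtable

def make_row_key(symbol: str, trade_time_iso: str) -> bytes:
    # Symbol first, then time, e.g. b"GOOG#2024-01-05T14:30:00Z"
    return f"{symbol}#{trade_time_iso}".encode()

table = bigtable.Client(project="my-project").instance("trades").table("stock_trades")
row = table.direct_row(make_row_key("GOOG", "2024-01-05T14:30:00Z"))
row.set_cell("trade", "price", "142.17")
row.commit()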

NEW QUESTION 10

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.
Which Google database service should you use?

  • A. Cloud SQL
  • B. BigQuery
  • C. Cloud Bigtable
  • D. Cloud Datastore

Answer: A

NEW QUESTION 11

How would you query specific partitions in a BigQuery table?

  • A. Use the DAY column in the WHERE clause
  • B. Use the EXTRACT(DAY) clause
  • C. Use the PARTITIONTIME pseudo-column in the WHERE clause
  • D. Use DATE BETWEEN in the WHERE clause

Answer: C

Explanation:
Partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as Jan 1st and 2nd of 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column
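
The same partition-pruned query can be issued from the BigQuery Python client; the project, dataset, and column names below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT field1
    FROM `my-project.my_dataset.partitioned_table`
    WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
"""
for row in client.query(sql).result():
    print(row)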

NEW QUESTION 12

If you're running a performance test that depends upon Cloud Bigtable, all but one of the choices below are recommended steps. Which one is NOT a recommended step to follow?

  • A. Do not use a production instance.
  • B. Run your test for at least 10 minutes.
  • C. Before you test, run a heavy pre-test for several minutes.
  • D. Use at least 300 GB of data.

Answer: A

Explanation:
If you're running a performance test that depends upon Cloud Bigtable, be sure to follow these steps as you plan and execute your test:
Use a production instance. A development instance will not give you an accurate sense of how a production instance performs under load.
Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use 100 GB of data per node.
Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.
Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
Reference: https://cloud.google.com/bigtable/docs/performance

NEW QUESTION 13

Cloud Dataproc is a managed Apache Hadoop and Apache ____ service.

  • A. Blaze
  • B. Spark
  • C. Fire
  • D. Ignite

Answer: B

Explanation:
Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.
Reference: https://cloud.google.com/dataproc/docs/

NEW QUESTION 14

You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

  • A. Create a Direct Acyclic Graph in Cloud Composer to schedule and monitor the jobs.
  • B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
  • C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
  • D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

Answer: D
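
Option A refers to a Cloud Composer (Airflow) DAG that schedules the three jobs, shows their status in the Airflow UI, and can be triggered manually. A minimal sketch; the DAG id, shell scripts, and dependency order are illustrative, and in practice the Dataflow steps would usually use the dedicated Dataflow operators:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("daily_data_jobs",
         start_date=datetime(2024, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    ingest = BashOperator(task_id="ingest_onprem_to_gcs",
                          bash_command="./upload_from_onprem.sh")
    third_party = BashOperator(task_id="fetch_third_party_to_gcs",
                               bash_command="./launch_third_party_dataflow.sh")
    transform = BashOperator(task_id="transform_gcs_to_bigquery",
                             bash_command="./launch_transform_dataflow.sh")
    [ingest, third_party] >> transform  # transform runs only after both uploads land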

NEW QUESTION 15

You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent throughput. You want to monitor and alert on the behavior of the pipeline with Cloud Stackdriver to ensure that it is processing data. Which Stackdriver alerts should you create?

  • A. An alert based on a decrease of subscription/num_undelivered_messages for the source and a rate of change increase of instance/storage/used_bytes for the destination
  • B. An alert based on an increase of subscription/num_undelivered_messages for the source and a rate of change decrease of instance/storage/used_bytes for the destination
  • C. An alert based on a decrease of instance/storage/used_bytes for the source and a rate of change increase of subscription/num_undelivered_messages for the destination
  • D. An alert based on an increase of instance/storage/used_bytes for the source and a rate of change decrease of subscription/num_undelivered_messages for the destination

Answer: B

NEW QUESTION 16

You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm.
[Exhibit: scatter plot of the data, not reproduced here]
To do this you need to add a synthetic feature. What should the value of that feature be?

  • A. X^2+Y^2
  • B. X^2
  • C. Y^2
  • D. cos(X)

Answer: D
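
Since the exhibit is not reproduced here, the following toy example assumes the commonly used version of this scenario, in which one class sits inside a circle and the other outside it; it shows how a squared-radius feature such as X^2+Y^2 (option A) lets a single linear threshold separate the classes:

import numpy as np

rng = np.random.default_rng(0)
# Synthetic, radially separated data: class 1 inside radius 1, class 0 between radii 2 and 3.
r = np.concatenate([rng.uniform(0, 1, 200), rng.uniform(2, 3, 200)])
theta = rng.uniform(0, 2 * np.pi, 400)
X, Y = r * np.cos(theta), r * np.sin(theta)
label = (r < 1).astype(int)

radius_sq = X**2 + Y**2                       # the synthetic feature
print(np.mean((radius_sq < 2.0) == label))    # a single threshold classifies this toy data perfectly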

NEW QUESTION 17

Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?

  • A. Preemptible workers cannot use persistent disk.
  • B. Preemptible workers cannot store data.
  • C. If a preemptible worker is reclaimed, then a replacement worker must be added manually.
  • D. A Dataproc cluster cannot have only preemptible workers.

Answer: BD

Explanation:
The following rules will apply when you use preemptible workers with a Cloud Dataproc cluster:
Processing only. Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
No preemptible-only clusters. To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
Persistent disk size. As a default, all preemptible workers are created with the smaller of 100GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
Reference: https://cloud.google.com/dataproc/docs/concepts/preemptible-vms

NEW QUESTION 18

You are a head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don’t get slots to execute their query and you need to correct this. You’d like to avoid introducing new projects to your account.
What should you do?

  • A. Convert your batch BQ queries into interactive BQ queries.
  • B. Create an additional project to overcome the 2K on-demand per-project quota.
  • C. Switch to flat-rate pricing and establish a hierarchical priority model for your projects.
  • D. Increase the amount of concurrent slots per project at the Quotas page at the Cloud Console.

Answer: C

Explanation:
Reference https://cloud.google.com/blog/products/gcp/busting-12-myths-about-bigquery

NEW QUESTION 19

You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

  • A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
  • B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
  • C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
  • D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.

Answer: B

NEW QUESTION 20

Which is not a valid reason for poor Cloud Bigtable performance?

  • A. The workload isn't appropriate for Cloud Bigtable.
  • B. The table's schema is not designed correctly.
  • C. The Cloud Bigtable cluster has too many nodes.
  • D. There are issues with the network connection.

Answer: C

Explanation:
A valid reason would be that the Cloud Bigtable cluster doesn't have enough nodes: if your Cloud Bigtable cluster is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.
Reference: https://cloud.google.com/bigtable/docs/performance

NEW QUESTION 21

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

  • A. Cloud Dataflow
  • B. Cloud Composer
  • C. Cloud Dataprep
  • D. Cloud Dataproc

Answer: D

NEW QUESTION 22

An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

  • A. Create and share an authorized view that provides the aggregate results.
  • B. Create and share a new dataset and view that provides the aggregate results.
  • C. Create and share a new dataset and table that contains the aggregate results.
  • D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.

Answer: D

Explanation:
Reference: https://cloud.google.com/bigquery/docs/access-control
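
Option A, the authorized view pattern, can be set up with the BigQuery Python client: create the aggregate view in a dataset that is shared with the other projects, then authorize that view to read the private dataset so consumers never get direct access to the user-level tables. Project, dataset, table, and column names below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# 1. Create the aggregate view in a shareable dataset.
view = bigquery.Table("my-project.shared_aggregates.daily_user_counts")
view.view_query = """
    SELECT country, DATE(signup_time) AS day, COUNT(*) AS users
    FROM `my-project.private_data.users`
    GROUP BY country, day
"""
client.create_table(view)

# 2. Authorize the view against the private dataset.
private = client.get_dataset("my-project.private_data")
entries = list(private.access_entries)
entries.append(bigquery.AccessEntry(None, "view", {
    "projectId": "my-project",
    "datasetId": "shared_aggregates",
    "tableId": "daily_user_counts"}))
private.access_entries = entries
client.update_dataset(private, ["access_entries"])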

NEW QUESTION 23

You need to compose visualization for operations teams with the following requirements:
  • Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
  • The report must not be more than 3 hours delayed from live data.
  • The actionable report should only show suboptimal links.
  • Most suboptimal links should be sorted to the top.
  • Suboptimal links can be grouped and filtered by regional geography.
  • User response time to load the report must be <5 seconds.
You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?

  • A. Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.
  • B. Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.
  • C. Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.
  • D. Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.

Answer: B

NEW QUESTION 24
......

P.S. Easily pass Professional-Data-Engineer Exam with 239 Q&As DumpSolutions.com Dumps & pdf Version, Welcome to Download the Newest DumpSolutions.com Professional-Data-Engineer Dumps: https://www.dumpsolutions.com/Professional-Data-Engineer-dumps/ (239 New Questions)