[2026] Google Certified Professional Data Engineer - Professional-Data-Engineer Free Exam Questions

Question 1

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

A. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.

B. Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.

C. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.

D. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.

Answer: D
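
One way to picture option D: with one table per month, only the months touched by a faulty ETL run need to be restored from their compressed exports. The sketch below only builds the table names and Cloud Storage URIs for such a backup plan; the dataset, table prefix, bucket name, and URI layout are hypothetical, and the export itself would be run separately as a BigQuery extract job with compression enabled.

```python
from datetime import date


def month_suffixes(start: date, end: date):
    """Yield YYYYMM suffixes for every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def export_plan(dataset: str, prefix: str, bucket: str, start: date, end: date):
    """Pair each monthly table with a gzip-compressed export URI (hypothetical layout)."""
    plan = []
    for suffix in month_suffixes(start, end):
        table = f"{dataset}.{prefix}_{suffix}"
        uri = f"gs://{bucket}/{prefix}/{suffix}/part-*.csv.gz"
        plan.append((table, uri))
    return plan


# Hypothetical names, used only for illustration.
plan = export_plan("analytics", "sales", "example-backup-bucket",
                   date(2025, 11, 1), date(2026, 1, 31))
for table, uri in plan:
    print(table, "->", uri)
```

Keeping one export per monthly table means a corruption found two weeks late can be repaired by reloading only the affected months.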

Question 2

Your new customer has requested daily reports that show their net consumption of Google Cloud compute resources and who used the resources. You need to quickly and efficiently generate these daily reports. What should you do?

A. Do daily exports of Cloud Logging data to BigQuery. Create views filtering by project, log type, resource, and user.

B. Filter data in Cloud Logging by project, resource, and user; then export the data in CSV format.

C. Export Cloud Logging data to Cloud Storage in CSV format. Cleanse the data using Dataprep, filtering by project, resource, and user.

D. Filter data in Cloud Logging by project, log type, resource, and user, then import the data into BigQuery.

Answer: A

Question 3

The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Spark

B. Apache Hadoop

C. Apache Kafka

D. Apache Beam

Answer: D

Question 4

You have a table that contains millions of rows of sales data, partitioned by date. Various applications and users query this data many times a minute. The query requires aggregating values by using AVG, MAX, and SUM, and does not require joining to other tables. The required aggregations are only computed over the past year of data, though you need to retain full historical data in the base tables. You want to ensure that the query results always include the latest data from the tables, while also reducing computation cost, maintenance overhead, and duration. What should you do?

A. Create a materialized view to aggregate the base table data. Configure a partition expiration on the base table to retain only the last one year of partitions.

B. Create a new table that aggregates the base table data. Include a filter clause to specify the last year of partitions. Set up a scheduled query to recreate the new table every hour.

C. Create a materialized view to aggregate the base table data. Include a filter clause to specify the last one year of partitions.

D. Create a view to aggregate the base table data. Include a filter clause to specify the last year of partitions.

Answer: C

Question 5

You're using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You've recently identified an additional use case and need to run an hourly analytical job to calculate certain statistics across the whole database. You need to ensure the reliability of both your production application and the analytical workload. What should you do?

A. Export Bigtable dump to GCS and run your analytical job on top of the exported files.

B. Add a second cluster to an existing instance with a single-cluster routing, use live-traffic app profile for your regular workload and batch-analytics profile for the analytics workload.

C. Add a second cluster to an existing instance with a multi-cluster routing, use live-traffic app profile for your regular workload and batch-analytics profile for the analytics workload.

D. Double the size of your existing cluster and execute your analytics workload on the resized cluster.

Answer: B

Question 6

You are designing a pipeline that publishes application events to a Pub/Sub topic. Although message ordering is not important, you need to be able to aggregate events across disjoint hourly intervals before loading the results to BigQuery for analysis. What technology should you use to process and load this data to BigQuery while ensuring that it will scale with large volumes of events?

A. Create a Cloud Function to perform the necessary data processing that executes using the Pub/Sub trigger every time a new message is published to the topic.

B. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs aggregations using tumbling windows.

C. Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.

D. Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.

Answer: B
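
To illustrate what "tumbling windows" means in option B, here is a minimal sketch in plain Python rather than a real Dataflow pipeline. It assigns each event to a fixed, non-overlapping one-hour window by its timestamp and counts events per window, which is why arrival order does not matter. The event tuples and function names are hypothetical. In Apache Beam, the same idea is expressed with a `FixedWindows` windowing step.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour tumbling (fixed, non-overlapping) windows


def window_start(event_ts: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed window that contains event_ts (epoch seconds)."""
    return event_ts - (event_ts % size)


def aggregate_hourly(events):
    """Count events per (window_start, event_type), regardless of arrival order."""
    counts = defaultdict(int)
    for ts, event_type in events:
        counts[(window_start(ts), event_type)] += 1
    return dict(counts)


# Hypothetical events as (epoch_seconds, event_type), deliberately out of order.
events = [(7300, "click"), (10, "click"), (3599, "view"), (3600, "click"), (7200, "view")]
result = aggregate_hourly(events)
for key in sorted(result):
    print(key, result[key])
```

Because every timestamp maps to exactly one window, the hourly intervals are disjoint, matching the requirement in the question.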

Question 7

You are designing a cloud-native historical data processing system to meet the following conditions:
- The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.
- A streaming data pipeline stores new data daily.
- Performance is not a factor in the solution.
- The solution design should maximize availability.
How should you design data storage for this solution?

A. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.

B. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

C. Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.

D. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.

Answer: A

Question 8

The data analyst team at your company uses BigQuery for ad-hoc queries and scheduled SQL pipelines in a Google Cloud project with a slot reservation of 2000 slots. However, with the recent introduction of hundreds of new non-time-sensitive SQL pipelines, the team is encountering frequent quota errors. You examine the logs and notice that approximately 1500 queries are being triggered concurrently during peak time. You need to resolve the concurrency issue. What should you do?

A. Increase the slot capacity of the project with baseline as 2000 and maximum reservation size as 3000.

B. Update SQL pipelines and ad-hoc queries to run as interactive query jobs.

C. Increase the slot capacity of the project with baseline as 0 and maximum reservation size as 3000.

D. Update SQL pipelines to run as a batch query, and run ad-hoc queries as interactive query jobs.

Answer: D

Question 9

You are building a report-only data warehouse where the data is streamed into BigQuery via the streaming API. Following Google's best practices, you have both a staging and a production table for the data. How should you design your data loading to ensure that there is only one master dataset without affecting performance on either the ingestion or reporting pieces?

A. Have a staging table that is an append-only model, and then update the production table every three hours with the changes written to staging

B. Have a staging table that is an append-only model, and then update the production table every ninety minutes with the changes written to staging

C. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every thirty minutes

D. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every three hours

Answer: D

Question 10

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery.
What should you do?

A. Implement clustering in BigQuery on the ingest date column.

B. Tier older data onto Google Cloud Storage files and create a BigQuery table using GCS as an external data source.

C. Re-create the table using data partitioning on the package delivery date.

D. Implement clustering in BigQuery on the package-tracking ID column.

Answer: D
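
The intuition behind clustering on the package-tracking ID can be shown with a toy model. This is a simplified illustration, not BigQuery's actual storage engine: rows are sorted by the clustering key and split into blocks, and a lookup reads only the blocks whose key range could contain the requested ID. The row layout, block size, and function names are all hypothetical.

```python
def build_blocks(rows, key, block_size):
    """Sort rows by the clustering key and split them into fixed-size blocks."""
    ordered = sorted(rows, key=lambda r: r[key])
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def lookup(blocks, key, value):
    """Return matching rows and the number of blocks that had to be scanned."""
    scanned, matches = 0, []
    for block in blocks:
        lo, hi = block[0][key], block[-1][key]
        if lo <= value <= hi:  # block may contain the value, so it must be read
            scanned += 1
            matches.extend(r for r in block if r[key] == value)
    return matches, scanned


# Hypothetical tracking events: 100 packages, 4 events each, interleaved in arrival order.
rows = [{"tracking_id": f"PKG{p:03d}", "step": s} for s in range(4) for p in range(100)]
blocks = build_blocks(rows, "tracking_id", block_size=40)
matches, scanned = lookup(blocks, "tracking_id", "PKG042")
print(f"scanned {scanned} of {len(blocks)} blocks, found {len(matches)} rows")
```

In this toy setup, all events for one package end up in the same block, so a per-package lifecycle query touches a small fraction of the data instead of every block.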

Question 11

Which of the following job types are supported by Cloud Dataproc (select 3 answers)?

A. Hive

B. Pig

C. Spark

D. YARN

Answer: A, B, C
