Google announced that PerfKit Benchmarker, its open-source tool for measuring and comparing cloud offerings, can also be used to run performance tests on Dataflow jobs.
Google describes Dataflow as a managed service for executing a wide variety of data processing patterns. PerfKit Benchmarker, launched in 2015, automates the benchmarking workflow: selecting tests, running benchmarks, gathering data, cleaning up cloud resources and producing actionable reports.
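The workflow described above can be sketched as a small lifecycle loop. This is a minimal, hypothetical Python illustration of the general pattern (select tests, provision, run and collect metrics, always clean up, then report); the `BenchmarkRun` class and the `execute` function are invented for this example and are not part of PerfKit Benchmarker's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BenchmarkRun:
    """Hypothetical description of one benchmark (not PKB's real API)."""
    name: str
    provision: Callable[[], None]            # create cloud resources
    run: Callable[[], Dict[str, float]]      # execute and return metrics
    cleanup: Callable[[], None]              # delete cloud resources
    samples: Dict[str, float] = field(default_factory=dict)


def execute(benchmarks: List[BenchmarkRun], selected: List[str]) -> List[str]:
    """Run the selected benchmarks and return CSV-style report lines."""
    report: List[str] = []
    for bench in benchmarks:
        if bench.name not in selected:       # test selection
            continue
        try:
            bench.provision()
            bench.samples = bench.run()      # gather data
        finally:
            bench.cleanup()                  # runs even if provision or run raises
        for metric, value in sorted(bench.samples.items()):
            report.append(f"{bench.name},{metric},{value}")  # reporting
    return report


if __name__ == "__main__":
    demo = BenchmarkRun(
        name="dataflow_demo",
        provision=lambda: print("provisioning"),
        run=lambda: {"elapsed_seconds": 42.0},
        cleanup=lambda: print("cleaning up"),
    )
    print(execute([demo], selected=["dataflow_demo"]))
```

The `try`/`finally` is the key design choice: resources are released even when a run fails, which is what makes automated cleanup safe to rely on.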
Performance benchmarking helps ensure that data pipelines are sized correctly to handle the expected data volume without exceeding cost budgets or hitting capacity limits.
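To make the sizing goal concrete, here is a hypothetical back-of-the-envelope check, assuming a benchmark has already measured how much data one worker can process per second. The function name, parameters and numbers are illustrative assumptions, not Dataflow settings or real pricing.

```python
import math


def size_pipeline(expected_mb_per_s: float, mb_per_s_per_worker: float,
                  max_workers: int, cost_per_worker_hour: float,
                  hourly_budget: float) -> dict:
    """Check a worker count against a capacity limit and a cost budget.

    mb_per_s_per_worker is assumed to come from a benchmark run; all
    parameter names here are hypothetical.
    """
    # Round up: a fractional worker still means provisioning a whole one.
    workers = math.ceil(expected_mb_per_s / mb_per_s_per_worker)
    hourly_cost = workers * cost_per_worker_hour
    return {
        "workers": workers,
        "hourly_cost": hourly_cost,
        "within_capacity": workers <= max_workers,
        "within_budget": hourly_cost <= hourly_budget,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB/s expected, 12 MB/s per worker.
    print(size_pipeline(100, 12, max_workers=10,
                        cost_per_worker_hour=0.5, hourly_budget=5.0))
    # {'workers': 9, 'hourly_cost': 4.5, 'within_capacity': True, 'within_budget': True}
```

If the expected volume doubled to 200 MB/s, the same check would call for 17 workers and flag both the capacity limit and the budget, which is the kind of problem benchmarking is meant to surface before production.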
How to use PerfKit Benchmarker
The PerfKit Benchmarker (PKB) documentation covers everything new users need to get started.
Those who prefer video tutorials can follow a series of videos covering PKB setup, the main PKB commands and how to visualize benchmark results in Data Studio. The repository also includes numerous example PKB configuration files for running a series of performance benchmarks.
To run the example configurations, users must replace the <MY_BUCKET> and <MY_PROJECT> placeholders with their own bucket and GCP project. They must also create an input Pub/Sub subscription pre-provisioned with their own copy of the test data. According to Google, PKB can create and restore snapshots of the Pub/Sub subscription so that every repeated test run starts from the same data.
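The setup steps above can be approximated by hand. The Python sketch below fills the `<MY_PROJECT>` and `<MY_BUCKET>` placeholders in a config template and builds the `gcloud pubsub snapshots create` and `gcloud pubsub subscriptions seek` commands that snapshot a pre-provisioned subscription and rewind it before each run. The helper names, the sample template and the project, subscription and snapshot IDs are hypothetical, and PKB's own snapshot handling may work differently.

```python
import shlex
import subprocess
from typing import List


def fill_placeholders(template: str, project: str, bucket: str) -> str:
    """Swap the <MY_PROJECT> and <MY_BUCKET> placeholders for real values."""
    return template.replace("<MY_PROJECT>", project).replace("<MY_BUCKET>", bucket)


def snapshot_cmd(project: str, subscription: str, snapshot: str) -> List[str]:
    """Build a gcloud command that snapshots the pre-provisioned subscription."""
    return ["gcloud", "pubsub", "snapshots", "create", snapshot,
            f"--subscription={subscription}", f"--project={project}"]


def seek_cmd(project: str, subscription: str, snapshot: str) -> List[str]:
    """Build a gcloud command that rewinds the subscription to the snapshot."""
    return ["gcloud", "pubsub", "subscriptions", "seek", subscription,
            f"--snapshot={snapshot}", f"--project={project}"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command and execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical template line; not PKB's actual config schema.
    print(fill_placeholders("gs://<MY_BUCKET>/temp in <MY_PROJECT>",
                            "my-project", "my-bucket"))
    # Snapshot once after loading the test data, then seek before each run.
    run(snapshot_cmd("my-project", "benchmark-input-sub", "benchmark-input-snap"))
    for _ in range(2):
        run(seek_cmd("my-project", "benchmark-input-sub", "benchmark-input-snap"))
```

Commands are only printed by default (`dry_run=True`), so they can be inspected before anything touches a real project.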