How I cracked the Google Certified Professional Data Engineer exam



It's no wonder cloud computing is taking over the IT industry at a rapid pace, and in no time it will become the norm. So why not get our hands dirty learning about cloud infrastructure and its offerings?

Why Google Cloud Platform?

The answer is simple: it's Google. Though GCP is not the first choice in many cases, it's growing every day. With its product catalog, ease of access, and well-structured documentation, I expect it to steal the top spot before long. At the moment, I believe it's hands-down the best cloud platform for Big Data and Machine Learning. With the TensorFlow + GPU/TPU combination, I don't see how any other cloud platform can compete with GCP.

What is the Data Engineer exam all about?

People often confuse the Pro Data Engineer exam with the Pro Architect exam. "Hey, how many questions did you see related to Compute Engine? How was Networking tested? How about Load Balancing?" NO. This exam doesn't deal with ARCHITECTURE in general. Of course, we can expect a couple of questions related to Storage and Transfer Services, but not many.


I think of the Data Engineer exam as an "Architect exam for Data Science/ML". Which product to use when? Would you go with on-prem services or switch to cloud offerings? Could it be hybrid? And so on. It's all about the products dealing with Big Data and ML.

How did I prepare for the big day?

As with any technology, we understand it better when someone explains it, like a blueprint. So I enrolled in the GCP courses offered by Google on Coursera. That gives us an overall idea of the concepts. Then comes the building part: reading the sophisticated documentation (I read the whole documentation at least twice). Of the various blocks in the documentation, I focused on "CONCEPTS" and "HOW TOs". The final step, to make the learning concrete, is to use Qwiklabs quests.

Various bloggers have posted about the other resources they used, including "Linux Academy", "A Cloud Guru", etc. If you have ample time, try them out. I had a 2-month target, so I did just 4 things:


1. Completed the Data Engineer Specialization on Coursera

2. Read the documentation thoroughly and took notes

3. Experimented a lot in Qwiklabs

4. Read about the use cases in GCP blogs and other resources. Trust me, this will give us an overall understanding of how IDEAS are made into PRODUCTS and implemented per requirement.

Surprises in the exam?

THERE ARE NO DUMPS ANYWHERE... SO DON'T LOOK FOR ONE. People often ask me, "Did you get any dumps?" Now why would we need dumps when we have prepared well enough? There are 25 practice questions on the Data Engineer certification website. That should give us first-hand experience of how the questions are structured. Still, I wish we had more practice exams.


A surprise for me was the questions about Kafka integration and Cloud Workflow. I hadn't expected direct Kafka integration questions. It's good to understand what Kafka is. Also, if there is a product in GCP, try to learn about the equivalent open-source products or other proprietary products that work in a similar way. Just an overview should be good enough.

Expect questions on:
  • Data Loss Prevention
  • Cloud Workflow
  • On-prem Kafka with Pub/Sub

Primary focus is given to:
  • BigQuery (don't forget to read about quotas, authorized views, and IAM roles)
  • Dataflow - you will see it everywhere
  • Machine Learning (understand the concepts and products well)

What not to ignore while reading docs?

Reading the entire documentation feels overwhelming. But we have to realize that NO VIDEO TUTORIAL CAN COVER EVERYTHING. The best way to understand the internals is to READ THE DOCS. Not all product docs are lengthy. For instance, Data Studio, Dataprep and Datalab are short, while BigQuery, Dataflow and Pub/Sub are lengthy because they have important roles to play in the architecture.

Don't skip "QUOTAS" and "IAM ROLES". There were a couple of questions relating the products to their quotas or roles.

A piece of advice:

Initially it will look difficult, with 3-6 line questions full of tech jargon. Try to break each question down and relate it to the products in GCP. Pay attention to details; teeny tiny details are easy to miss while reading lengthy questions. The best way to crack the questions is TO UNDERSTAND EACH SENTENCE. This is of prime importance. UNDERSTAND the question clearly.

I finished the exam in an hour and 15 minutes, keeping 45 minutes for revision. It helps a lot to review the answers; you may find a couple of wrong ones and fix them. Go at your own pace, but make sure to REVIEW before submitting.
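
If it helps to plan the pacing, here is a rough sketch of that time split in Python. The 75 and 45 minute figures are from my own run above; the question count of 50 is purely an assumption for illustration (check the official exam guide for the real number), and the `pace` helper is just a made-up name.

```python
# Rough time-budget sketch for a 75-minute first pass plus a 45-minute review.
# ASSUMED_QUESTIONS is a hypothetical figure, not something stated in this post.
ANSWER_MINUTES = 75
REVIEW_MINUTES = 45
ASSUMED_QUESTIONS = 50


def pace(minutes: float, questions: int) -> float:
    """Return the average number of minutes available per question."""
    return minutes / questions


total_minutes = ANSWER_MINUTES + REVIEW_MINUTES
first_pass = pace(ANSWER_MINUTES, ASSUMED_QUESTIONS)
review_pass = pace(REVIEW_MINUTES, ASSUMED_QUESTIONS)

print(f"Total time used: {total_minutes} minutes")
print(f"First pass: {first_pass:.1f} min per question")
print(f"Review: {review_pass:.1f} min per question")
```

Swap in your own numbers; the point is only to know, before you start, roughly how long you can spend on one question and still leave room for a full review.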

I can't emphasize this enough: "READ DOCUMENTATION" and "PRACTICE". It's not an easy exam to crack unless we understand the concepts. For more insight, read the blogs and testimonies of engineers who cracked the exam, along with their best practices.





