|1. Introduction to Series- Spark, SparkCore, and SparkSQL||15mins 8s|
|2. Getting Started with Spark- A General Introduction to Spark||13mins 20s|
|3. Programming using Spark and Eclipse||23mins 26s|
|4. Resilient Distributed Datasets-RDD||20mins 16s|
|5. Fly with RDDs||30mins 15s|
|6. Key Value Stores/Pair RDDs||37mins 5s|
|7. Loading and Saving Data using Spark||25mins 41s|
|8. Accumulators and Broadcasters||13mins 40s|
|9. SparkSQL-I – SparkSQL using native FileSystems||18mins 17s|
|10. SparkSQL-II – SparkSQL using Hadoop||27mins 50s|
|11. Spark Properties||15mins 28s|
|12. HandsOn 1 – Getting Started with HDP SandBox||19mins 27s|
|13. HandsOn 2 – First Spark App -XML Validator||13mins 29s|
|14. HandsOn 3 – Persistence and Partitioning||25mins 44s|
The HDPCD Spark Developer Certification is a hands-on, performance-based certification for Apache Spark developers on the Hortonworks Data Platform. Apache Spark is a fast, in-memory data computation engine with expressive APIs that support Data Science, Machine Learning, streaming applications, and iterative data access. It is a highly sought-after technology currently used by data-driven companies such as Samsung, TripAdvisor, Yahoo!, eBay, and many others. HDPCD Spark certified developers have an edge over the rest because examinees perform a set number of tasks on a live installation platform provided by Hortonworks rather than simply answering questions. Memorizing and reciting concepts by heart does not work with HDPCD; these are developers who get work done, and the world sees them in a different light altogether.
For this certification, Hortonworks recognizes that, given Spark's extremely wide application base, performing every kind of task on a live cluster would be daunting. They therefore mandate that aspirants work only on SparkCore and SparkSQL applications before appearing for the exam. Developers may choose Scala or Python as the programming language and create their applications using it. This Whizlabs course recommends Scala as the preferred analytics language due to its simple, LINQ-like syntax.
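As a taste of that concise Scala style, here is a minimal sketch of the classic word-count application (a hypothetical example, not part of the course materials; it assumes Spark is on the classpath and is run via spark-submit, and `input.txt` / `counts_out` are placeholder paths):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: word count with SparkCore, assuming a local Spark install.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("input.txt")   // RDD of lines
      .flatMap(_.split("\\s+"))             // RDD of words
      .map(word => (word, 1))               // pair RDD of (word, 1)
      .reduceByKey(_ + _)                   // aggregate counts per word

    counts.saveAsTextFile("counts_out")
    sc.stop()
  }
}
```

Each chained call above is a transformation on an RDD; `saveAsTextFile` is the action that triggers execution. These concepts are covered in depth in sections 4 to 6 of the course.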
No, there are no MCQs; instead, a live, performance-based test is conducted to gauge the application of concepts. Usually, 7-8 tasks are provided, of which a candidate must complete at least 6. The exam lasts 2 hours and costs 250 USD per attempt.
Note that the exam vouchers are valid for 1 year from the date of purchase.
The HDPCD Spark Certification is valid for a particular version of Spark. For example, if at the time of appearing you worked on Spark v2.2 (the current version), your certification holds good for as long as Spark v2.2 is in use.
This certification is open to all; aspirants who wish to make Data Science their career path should pursue it. Going by usual trends, Analysts and Developers alike appear for this certification.
A minimally qualified HDPCD Spark candidate should be aware of the following concepts:
Yes. We write frequently about certification preparation tips on our blog. See our post on how to prepare for the Spark Developer Certification (HDPCD) exam.
If you have any queries related to this course, payments, etc., please feel free to contact us at Whizlabs Helpdesk. A member of our support staff will respond to you as soon as possible.
|1. Introduction to Series- Spark, SparkCore and SparkSQL||a) What is Apache Spark and why is the world after it?
b) Topics Covered in the Tutorial- Spark Architecture and Functioning, Spark Core Programming, and SparkSQL Programming in Scala
c) Discussing the Apache Spark Stack
d) Spark users and HDPCD Pre-requisites.
|2. Getting Started with Spark- A General Introduction to Spark||a) Spark Application Categories
b) Basic Terminologies used in Spark Programs
c) Installing Spark on a Windows system as a standalone installation
|3. Programming using Spark and Eclipse||a) Downloading and installing Eclipse and ScalaIDE
b) Creating your first Spark Scala Program
c) Exporting your project as a JAR and running it using spark-submit
|4. Resilient Distributed Datasets-RDD||a) What are RDDs?
b) RDD Operations- Transformations
c) RDD Operations- Actions
d) Taking Transformations and Actions to Eclipse
|5. Fly with RDDs||a) Advanced Transformations
b) Advanced Actions
c) Programming Advanced RDD Operations in Scala
|6. Key Value Stores/Pair RDDs||a) What are pair RDDs?
b) Transformations with Pair RDDs
c) Actions with Pair RDDs
d) Partitioning Data
e) Working on data on a per-partition basis
|7. Loading and Saving Data using Spark||a) Interacting with Filesystems (NAS/AWS S3/HDFS/EXT4/NTFS)
b) Interacting with databases – JDBC
c) Interacting with HiveContext
|8. Accumulators and Broadcasters||a) Accumulators and their execution procedures
b) Broadcasters and their execution procedures
c) Numeric RDD Operations
|9. SparkSQL-I – SparkSQL using native FileSystems||a) Discussing what Datasets and Dataframes are in Spark
b) Creation of Dataframes from text file data
c) Operations with Dataframes
d) Program involving SparkSQL Dataframes
|10. SparkSQL-II – SparkSQL using Hadoop||a) Connecting your program to the Hadoop cluster
b) Creating Dataframes using Hive Tables
c) Comparing the Hive vs. text-file procedures for creating Dataframes
d) SparkSQL Hive Operations
e) SparkSQL Hive Application
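The SparkSQL material in sections 9 and 10 revolves around Dataframes and SQL queries over them. A rough Scala sketch of that workflow (a hypothetical example, not taken from the course; it assumes Spark 2.x and a local session, with a small in-memory Dataset standing in for file or Hive data):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the SparkSQL workflow: build a Dataframe and query it.
object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // In the course, this data would come from a text file or a Hive table.
    val people = Seq(("Ana", 31), ("Bo", 25), ("Cy", 40)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // Run a SQL query against the registered temporary view.
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```

In the Hadoop-backed variant (section 10), the same `spark.sql` calls run against Hive tables once the session is configured with Hive support.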