Data Engineering Essentials using SQL, Python, and PySpark Course & PDF Guides
Learn key Data Engineering skills such as SQL, Python, and Apache Spark (Spark SQL and PySpark) with exercises and projects
What you’ll learn
- Set up the Development Environment to learn how to build Data Engineering Applications on GCP
- Database Essentials for Data Engineering using Postgres such as creating tables, indexes, running SQL Queries, using important pre-defined functions, etc.
- Data Engineering Programming Essentials using Python such as basic programming constructs, collections, Pandas, Database Programming, etc. (see the database programming sketch after this list)
- Data Engineering using Spark Data Frame APIs (PySpark). Learn all important Spark Data Frame APIs such as select, filter, groupBy, orderBy, etc. (see the Data Frame and Spark SQL sketch after this list)
- Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high-quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.
- Relevance of Spark Metastore and integration of Dataframes and Spark SQL
- Ability to build Data Engineering Pipelines using Spark, leveraging Python as the programming language
- Use of different file formats such as Parquet, JSON, CSV, etc. in building Data Engineering Pipelines (see the file formats sketch after this list)
- Set up a self-supported single-node Hadoop and Spark cluster to get enough practice with HDFS and YARN
- Understanding the complete Spark Application Development Life Cycle to build Spark Applications using PySpark. Review the applications using the Spark UI.
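To give a flavour of the database programming covered above, here is a minimal sketch using Python with psycopg2 and Pandas. The connection details and the orders table are hypothetical placeholders, not the course's actual lab setup; adjust them to your own environment.

```python
# Minimal sketch: database programming with Postgres and Pandas.
# The connection details and the "orders" table are hypothetical placeholders.
import psycopg2
import pandas as pd

conn = psycopg2.connect(
    host="localhost", dbname="retail_db", user="retail_user", password="changeme"
)
cur = conn.cursor()

# Create a simple table, then run a query that uses a pre-defined function (count).
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id SERIAL PRIMARY KEY,
        order_date DATE,
        order_status VARCHAR(30)
    )
""")
conn.commit()

cur.execute("SELECT order_status, count(*) FROM orders GROUP BY order_status")
print(cur.fetchall())

# Load query results into a Pandas DataFrame for further analysis.
cur.execute("SELECT * FROM orders")
df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])
print(df.head())

conn.close()
```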
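The Data Frame API and Spark SQL bullets above translate to code like the following. This is a minimal sketch with hypothetical data and column names; the APIs themselves (select, filter, groupBy, orderBy, spark.sql, temporary views) are the ones listed in the bullets.

```python
# Minimal sketch: DataFrame APIs and Spark SQL on hypothetical order data.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count

spark = SparkSession.builder.appName("Data Engineering Essentials").getOrCreate()

orders = spark.createDataFrame(
    [(1, "2024-01-01", "COMPLETE"),
     (2, "2024-01-01", "PENDING"),
     (3, "2024-01-02", "COMPLETE")],
    ["order_id", "order_date", "order_status"],
)

# Data Frame APIs: select, filter, groupBy, orderBy
(orders
    .select("order_date", "order_status")
    .filter(col("order_status") == "COMPLETE")
    .groupBy("order_date")
    .agg(count("*").alias("order_count"))
    .orderBy("order_date")
    .show())

# The same logic using Spark SQL: register a temporary view and query it.
orders.createOrReplaceTempView("orders")
spark.sql("""
    SELECT order_date, count(*) AS order_count
    FROM orders
    WHERE order_status = 'COMPLETE'
    GROUP BY order_date
    ORDER BY order_date
""").show()
```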
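Similarly, here is a minimal sketch of working with the file formats mentioned above. The paths are placeholders for wherever your data lives.

```python
# Minimal sketch: reading CSV and writing Parquet and JSON with Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("File Formats").getOrCreate()

# Read a CSV file with a header row, letting Spark infer the column types.
orders = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/retail/orders.csv"))

# Write the same data as Parquet (columnar, schema-preserving) and JSON.
orders.write.mode("overwrite").parquet("/data/retail/orders_parquet")
orders.write.mode("overwrite").json("/data/retail/orders_json")

# Parquet keeps the schema, so reading it back needs no extra options.
spark.read.parquet("/data/retail/orders_parquet").printSchema()
```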
Requirements
- Laptop with a decent configuration (minimum 4 GB RAM and a dual-core processor)
- Sign up for GCP using the available credit, or have AWS access
- Set up a self-supported lab on a cloud platform (you might have to pay the applicable cloud fees unless you have credit)
- A CS or IT degree or prior IT experience is highly desired
Description
As part of this course, you will learn all the Data Engineering essentials related to building Data Pipelines using SQL, Python, Hadoop, Hive, and Spark SQL, as well as PySpark Data Frame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker, and of PySpark applications on multi-node clusters. You will also gain basic knowledge of reviewing Spark Jobs using the Spark UI.
About Data Engineering
Data Engineering is nothing but processing data to meet downstream needs. We need to build different pipelines such as Batch Pipelines, Streaming Pipelines, etc., as part of Data Engineering. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, these roles were known as ETL Development, Data Warehouse Development, etc.
Here are some of the challenges learners face while acquiring key Data Engineering skills such as Python, SQL, PySpark, etc.
- Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc., working together.
- Good quality content with proper support.
- Enough tasks and exercises for practice
This course is designed to address these key challenges so that professionals at all levels can acquire the required Data Engineering skills (Python, SQL, and Apache Spark).
To make sure you spend time learning rather than struggling with technical challenges, here is what we have done.
- Make sure you have a system with the right configuration, then quickly set up the lab using Docker with all the required Python, SQL, PySpark, and Spark SQL material. This addresses a lot of the pain points related to networking, database integration, etc. Feel free to reach out to us via Udemy Q&A in case you get stuck while setting up the environment.
- You will start with foundational skills such as Python and SQL using a Jupyter-based environment. Most of the lectures include quite a few tasks, and at the end of each module there are enough exercises or practice tests to evaluate the skills taught.
- Once you are comfortable with programming using Python and SQL, you will learn how to quickly set up and access a single-node Hadoop and Spark cluster.
- The content is streamlined so that you can practice it using learner-friendly interfaces such as Jupyter Lab.
If you end up signing up for the course, do not forget to rate us 5* if you like the content. If not, feel free to reach out to us and we will address your concerns.
Content Details
As part of this course, you will learn Data Engineering essentials such as SQL and programming using Python and Apache Spark. Here is the detailed agenda for the course.