Big Data Fundamentals with PySpark

Course overview

Provider
Datacamp
Course type
Free trial available
Deadline
Flexible
Duration
4 hours
Certificate
Available on completion
Course author
Upendra Kumar Devisetty

Description

Learn the fundamentals of working with big data with PySpark.
There's been a lot of buzz about Big Data over the past few years, and it's finally become mainstream for many companies. But what is Big Data? This course covers the fundamentals of Big Data via PySpark. Spark is a "lightning fast cluster computing" framework for Big Data. It provides a general-purpose data processing engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop MapReduce. You’ll use PySpark, a Python package for Spark programming, and its powerful, higher-level libraries such as Spark SQL and MLlib (for machine learning). You will explore the works of William Shakespeare, analyze FIFA 2018 data, and perform clustering on genomic datasets. By the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.

Similar courses

Datacamp
  • Flexible deadline
  • 4 hours
  • Certificate
Datacamp
  • Flexible deadline
  • 4 hours
  • Certificate
Datacamp
  • Flexible deadline
  • 4 hours
  • Certificate
  • English language