In this course you will learn how to get the most from your data by combining statistical analysis, data mining, and machine learning on your Big Data resources. You will gain hands-on, real-life lab experience that will let you start exploring your own data as soon as you finish the course.
We’ll start with the overall process of working with Big Data and the main challenges in collecting, storing, and processing it. You will learn how open-source solutions from the Apache Hadoop ecosystem can be used in your organization. Next, we will dive into solving practical problems with those tools, starting from scratch and building a cutting-edge solution. First, we will prepare an environment for collecting Big Data from multiple sources and storing it. You will learn how to use ZooKeeper to centralize management of your Hadoop solution and how to manage a large number of nodes from the command shell. Next, we will focus on managing data storage and working with data: how to migrate, transform, and filter it.
After that, we will focus on processing data with workflows, and especially on how to incorporate MapReduce to get results in distributed systems. Then you will learn how to analyze data in a distributed environment with Python, Impala, and Hive.
The next module focuses on ETL, warehousing, and data mining in a Big Data environment. Finally, we will go beyond the classic approach to data analysis and use machine learning and data science techniques to get the most out of your data.
Our goal is to teach you how to handle Big Data in different solutions and how to gain additional insight into your business.
This course is intended for data analysts, data scientists, big data analysts, developers, and IT professionals who want to gain deep knowledge and skills in processing big data in the Hadoop ecosystem.
Hours
18
Language
English
Target Audience
Data analysts, database analysts, big data analysts, data scientists, and IT professionals who want to master big data management and analysis.
Prerequisites
To attend this training, you should have experience with basic statistical analysis. It is also recommended that participants understand the basic concepts of object-oriented programming languages: control flow statements such as IF, FOR, and FOREACH, and the concepts of variables, data types, collections, and datasets.