Setup Big Data Development Environment

Big Data technology is open source, and there are many technologies one needs to learn to be proficient with Big Data ecosystem tools such as Hadoop, Spark, Hive, Pig, and Sqoop. This course covers how to set up a development environment on a personal computer or laptop using distributions such as Cloudera or Hortonworks. Both Cloudera and Hortonworks provide virtual machine images with all the Big Data ecosystem tools packaged together. This free course covers:
  • Comparison of virtualization software such as VirtualBox and VMWare
  • Step-by-step instructions to set up virtualization software (VirtualBox or VMWare)
  • Choosing a Cloudera or Hortonworks image
  • Step-by-step instructions to set up a VM using the chosen image
  • Setup of additional components such as a MySQL database and a log generation tool
  • Review of HDFS, MapReduce, Sqoop, Pig, Hive, Spark, and more
  • Introduction
    • Getting Started
    • Overview of Big Data sandboxes or virtual machine images
    • Pre-requisites
    • Choosing Virtualization Software (very important)
    • Installing VMWare Fusion on Mac
    • Installing Oracle VirtualBox on Mac
  • Cloudera Quickstart VM on VMWare Fusion
    • Setup Cloudera Quickstart VM - VMWare image
    • Review retail_db and gen_logs in Cloudera Quickstart VM
  • Cloudera Quickstart VM on Virtual Box
    • Download Cloudera Quickstart VM for Virtualbox
    • Setup Cloudera Quickstart VM for Virtualbox
    • Review retail_db and gen_logs in Cloudera Quickstart VM
  • Hortonworks Sandbox on VMWare Fusion
    • Setup Hortonworks Sandbox on VMWare - Mac
    • Setup MySQL Database - retail_db
    • Setup gen_logs application to generate logs
  • Hortonworks Sandbox on Virtual Box
    • Setup Hortonworks Sandbox on Virtual Box
    • Reset admin password
    • Setup MySQL Database - retail_db
    • Setup gen_logs application to generate logs
  • Setup IDE for Map Reduce
    • Setup Eclipse with Maven Plugin - Introduction

      As part of this topic we will see how to set up and validate an IDE to develop MapReduce applications:
      • Pre-requisites
      • Download Eclipse with the Maven plugin
      • Install Eclipse with the Maven plugin
      • Create a Java application as a Maven project
      • Run the default program of the simple Java application

    • Setup Eclipse with Maven Plugin

      Following are the installation steps to set up Eclipse with the Maven plugin:
      • Make sure Java 1.7 or later is installed (1.8 recommended)
      • Set up Eclipse
      • Set up Maven
      • STS (Spring Tool Suite) comes as Eclipse bundled with the Maven plugin; we recommend STS
      • If you already have Eclipse, just add the Maven plugin from the Marketplace

    • Create java application using Maven Project

      This lesson walks through creating a simple Maven project with Eclipse:
      • Open Eclipse with the Maven plugin (STS)
      • On first launch, create a new workspace named simpleapps
      • File -> New -> Maven Project
      • Give the artifact id and group id; make sure you give the correct package name
      • It will create a Maven project with App.java
      • Run the application and validate
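      For reference, Maven's quickstart archetype generates a default App.java close to the sketch below. The greeting() helper is an illustrative addition for easy validation; the generated file simply prints from main.

```java
// Sketch of the default application a Maven quickstart project generates.
// greeting() is an illustrative addition, not part of the generated file.
public class App {
    static String greeting() {
        return "Hello World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

      Running it from Eclipse (Run As -> Java Application) should print the greeting to the console.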

    • Develop word count program introduction

      This is an introduction to developing a word count program with Hadoop MapReduce, using Java and Eclipse:
      • Create a new workspace and a new Maven project
      • Update the pom file with dependencies
      • Generate test data
      • Copy an existing map reduce job for word count
      • Go to "Run Configurations" and add parameters
      • Run the program and validate the results
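      The dependency update mentioned above is not reproduced in this outline; for a Hadoop MapReduce project it typically means adding the Hadoop client artifact to pom.xml, roughly as follows (the version shown is illustrative and should match your cluster or VM):

```xml
<!-- Illustrative dependency section for a Hadoop MapReduce project -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
  </dependency>
</dependencies>
```

      After saving pom.xml, Maven downloads the required jars into the local repository.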

    • Develop word count program

      Following are the steps:
      • Create a new workspace directory bigdata-mr for all MapReduce applications
      • Launch STS with the new workspace directory
      • Create a new Maven project: groupId org.itversity, artifactId mr, name demomr
      • Open pom.xml; if the <name> tag shows something else, make sure to replace it with demomr
      • Also rename the project to demomr (from mr)
      • Define repositories in pom.xml (if necessary)
      • Define dependencies in pom.xml
      • Save and wait so that Maven downloads all the necessary packages; make sure there are no failures
      • Develop the word count program: create a package wordcount and a Java program WordCount in that package
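      The course's exact code is not reproduced in this outline; the standard Hadoop word count program follows the shape below. This is a sketch that requires the hadoop-client dependency on the classpath and is not runnable on its own.

```java
package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input path argument
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path argument
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

      The reducer doubles as a combiner here because summing partial counts is associative; the input and output paths come in as program arguments, which is why the next lesson configures them under "Run Configurations".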

    • Run word count program

      Following are the steps to run the word count program:
      • Make sure there are no errors
      • Generate test data as demonstrated
      • Pass the input path and output path as arguments
      • Run the program
      • Go to the output path and validate the files created there

    • Setup github project - Introduction

      As part of this topic we will see how to download and configure a sample GitHub project covering the MapReduce APIs:
      • Understand the resources available to learn the MapReduce APIs in detail
      • Download the sample GitHub project
      • Import the GitHub project as a Maven project
      • Make sure there are no errors highlighted in Eclipse
      • Run and validate the GitHub project

    • Download and setup github project

      Following are the steps to download and set up the GitHub project:
      • Our project created earlier is named demomr; delete it from STS
      • Go to GitHub and download the repository, or run the git clone command
      • Make sure the downloaded directory is in the right location
      • Open STS pointing to the correct workspace
      • Import it as a new project
      • Make sure there are no errors

    • Validate github project

      Following are the steps to validate the GitHub project:
      • Make sure there are no errors
      • Run the word count program as demonstrated, using Eclipse
      • Go to the output directory and check whether files are created
      • Validate the output files

  • Setup IDE for Scala and Spark
    • Setup scala and sbt - Introduction

      Even though we have virtual machine images from Cloudera and Hortonworks with all the necessary tools installed, it is a good idea to set up a development environment on our PC along with an IDE. The following need to be installed to build Scala-based Spark applications:
      • Java
      • Scala
      • sbt
      • Eclipse with the Scala IDE

    • Setup and Validate Scala

      Here are the instructions to set up Scala:
      • Download the Scala binaries
      • Install (untar) the Scala binaries
      • Update the PATH environment variable
      • Launch the Scala interpreter/CLI and run a simple Scala program to validate the installation
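      The snippet to paste is not reproduced in this outline; any small program works for validating the interpreter, for example:

```scala
// Paste into the Scala REPL to validate the installation.
val numbers = (1 to 10).toList
val total = numbers.sum          // 55
println(s"Sum of 1..10 = $total")
```

      If the REPL echoes the values and prints the sum, the Scala installation and PATH setup are working.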

    • Run simple scala application

      Following are the steps to create a simple Scala application:
      • Make sure you are in the right directory
      • Create src/main/scala: mkdir -p src/main/scala
      • Create the file hw.scala under src/main/scala
      • Paste the hello world code, save, and exit
      • Run it using scala src/main/scala/hw.scala
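      A minimal hw.scala might look like the following; the object name is arbitrary. With Scala 2, the script runner executes the main method of a lone top-level object, and the same file later packages cleanly with sbt.

```scala
// src/main/scala/hw.scala
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")
  }
}
```

      Run it with scala src/main/scala/hw.scala and confirm the greeting is printed.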

    • Setup sbt and run scala application

      Here are the instructions to set up sbt:
      • Download sbt
      • Install sbt
      • Go to the directory containing your Scala source code
      • Create build.sbt
      • Package and run using sbt
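      A minimal build.sbt for the hello world application might look like this; the name and version numbers are illustrative:

```
name := "hw"

version := "1.0"

scalaVersion := "2.11.8"
```

      With these settings, sbt package builds target/scala-2.11/hw_2.11-1.0.jar, and sbt run executes the main class.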

    • Setup Scala IDE for Eclipse - Introduction

      We are in the process of setting up a development environment on our PC/Mac so that we can develop the assigned modules. The following tasks are completed:
      • Make sure Java is installed
      • Set up Eclipse (as part of Setup Eclipse with Maven)
      • Set up Scala
      • Set up sbt
      • Validate all the components
      As part of this topic we will see:
      • Installation of the Scala IDE for Eclipse
      • Developing a simple application using the Scala IDE
      • Adding the Eclipse plugin for sbt (sbteclipse)
      • Validating the integration of Eclipse, Scala, and sbt

    • Install Scala IDE for Eclipse

      Before setting up the Scala IDE, let us understand the advantages of having an IDE. The Scala IDE for Eclipse project lets you edit Scala code in Eclipse, with:
      • Syntax highlighting
      • Code completion
      • Debugging, and many other features
      It makes Scala development in Eclipse a pleasure. Steps to install the Scala IDE for Eclipse:
      • Launch Eclipse
      • Go to "Help" in the top menu -> "Eclipse Marketplace"
      • Search for Scala IDE
      • Click Install
      • Once installed, restart Eclipse
      • Go to File -> New -> and check whether "Scala Application" is available

    • Integrate sbt with Scala IDE for Eclipse
    • Develop Spark applications using Scala IDE - Introduction

      As part of this topic we will see how to:
      • Create an sbt project for Spark using Scala
      • Integrate it with Eclipse
      • Develop a simple Spark application using Scala
      • Run it on the cluster
      To perform this, we need:
      • A Hadoop cluster or virtual machine
      • The Scala IDE for Eclipse
      • Integration of sbt with the Scala IDE (sbteclipse plugin)
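      A simple Spark application of the kind developed here typically looks like the sketch below. The object name is illustrative, and the code assumes the spark-core dependency is declared in build.sbt; it is not runnable without a Spark installation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal Spark word count; input and output paths arrive as arguments.
object SparkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Word Count")
    val sc = new SparkContext(conf)
    val counts = sc.textFile(args(0))      // read input path
      .flatMap(_.split("\\s+"))            // split lines into words
      .map(word => (word, 1))              // pair each word with a count of 1
      .reduceByKey(_ + _)                  // sum the counts per word
    counts.saveAsTextFile(args(1))         // write results to output path
    sc.stop()
  }
}
```

      Leaving the master unset in the SparkConf lets spark-submit supply it at launch time, which is what makes the same jar runnable on a VM or a real cluster.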

    • Develop Spark applications using Scala IDE and sbt
    • Run Spark applications on cluster

      As the program has been developed successfully, we will see how to run it on the cluster:
      • Build the jar file using sbt: sbt package
      • Make sure you have the environment ready with a VM or a cluster; if not, follow the earlier setup instructions for a PC or AWS
      • scp the jar file to the target environment (a VM in this case)
      • Run it on the remote cluster
