Friday, June 15, 2012

Pig (I don't like it, eyaakhh..). OK... starting on Pig Latin scripting.


How to set up
http://pig.apache.org/docs/r0.10.0/start.html

Pig has two execution modes...

Running Pig

  You can run Pig in either mode using the "pig" command (the bin/pig script) or the "java" command
(java -cp pig.jar ...). You can run Pig (execute Pig Latin statements and Pig commands) in different execution modes, or exectypes, depending on whether you are working against a Hadoop cluster or standalone (local):
  • Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). So this is roughly like Hadoop's Local (Standalone) mode.
  • Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce). So this is roughly like Hadoop's distributed mode.
                    Local Mode    Mapreduce Mode
Interactive Mode    yes           yes
Batch Mode          yes           yes
 

Using the pig command

/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

Using the java command

/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...



Pig Commands Execution Mode: Interactive Mode
You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.



Example

These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or hadoop mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 
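
To check my own understanding of those three statements, here is a rough plain-Python analogy. This is not Pig and not how Pig runs things internally; the function names (load, first_field, user_ids) and the two sample passwd lines are made up for illustration.

```python
import io

# Two made-up lines in /etc/passwd format (sample data, not a real file).
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
)


def load(stream, delimiter=":"):
    """Rough analogy of: A = load 'passwd' using PigStorage(':');
    Each input line becomes one tuple of fields."""
    for line in stream:
        yield tuple(line.rstrip("\n").split(delimiter))


def first_field(records):
    """Rough analogy of: B = foreach A generate $0 as id;
    Keep only field 0 of every tuple."""
    for record in records:
        yield (record[0],)


def user_ids(text=SAMPLE_PASSWD):
    """Run the two steps over the sample text and collect the result."""
    return list(first_field(load(io.StringIO(text))))


if __name__ == "__main__":
    # Rough analogy of: dump B;
    for row in user_ids():
        print(row)
```

The point is the shape of the data flow: every statement takes a relation (a collection of tuples) and produces a new one, and nothing is shown until the final dump.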





A Hadoop cluster runs in one of three supported modes:
  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode





Apache Pig Introduction:



(This amazing introduction is from the Apache site.)


Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

The salient property of Pig programs (Pig data analysis programs) is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.


@ Pig's infrastructure layer  (at present) consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject).
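
To picture what one of those Map-Reduce programs looks like, here is a minimal single-process sketch in Python. It only shows the map, shuffle, reduce shape of a job; it is not what the Pig compiler generates, and the function names and the passwd-style sample lines are my own assumptions.

```python
from collections import defaultdict

# Made-up passwd-style records used as sample input.
LINES = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash",
]


def map_phase(lines):
    """Map: emit a (key, value) pair per record, here (login shell, 1)."""
    for line in lines:
        fields = line.split(":")
        yield fields[-1], 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_shell(lines=LINES):
    """Chain the three phases: how many users have each login shell?"""
    return reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster the map calls run on many machines at once and the shuffle moves data between them; here everything runs in one process just to show the flow.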


@ Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:
  • Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
  • Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility. Users can create their own functions to do special-purpose processing.
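
The "embarrassingly parallel" and "own functions" points above can be sketched in plain Python too. This is only an analogy under my own assumptions: normalize_id stands in for a hypothetical user-defined function, and a thread pool stands in for a cluster. Pig's real UDF mechanism is different.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_id(user_id):
    """Hypothetical user-defined function: special-purpose cleanup of one value."""
    return user_id.strip().upper()


def process_chunk(chunk):
    """Apply the function to every record in one chunk.
    No chunk needs anything from another chunk, which is what makes
    the task embarrassingly parallel."""
    return [normalize_id(user_id) for user_id in chunk]


def split(records, parts):
    """Cut the records into at most `parts` contiguous chunks."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_apply(records, workers=2):
    """Process the chunks concurrently and stitch the results back in order."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)  # map() keeps input order
        return [item for chunk in results for item in chunk]
```

Because each record is handled independently, adding more workers (or machines) does not change the answer, only how the work is divided.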

Now Questions!!!

What are data analysis programs?
What is the infrastructure for evaluating these programs?
What does it mean for a program's structure to be amenable to parallelization?
What is the Hadoop subproject?
What are multiple interrelated data transformations?
What is special-purpose processing?

Answers Coming Soon!! --->
x-----------------------------------------x