Friday, June 15, 2012

Pig (I don't like the name, yuck...). OK, starting on Pig Latin scripting.


How to set up
http://pig.apache.org/docs/r0.10.0/start.html

Pig has two execution modes: local and mapreduce.

Running Pig

  You can run Pig using the "pig" command (the bin/pig script) or the "java" command (java -cp pig.jar ...). Either way, you execute Pig Latin statements and Pig commands in one of two execution modes (exectypes), depending on whether you are working against a Hadoop cluster or on a single standalone (local) machine:
  • Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). This roughly corresponds to Hadoop's Local (Standalone) mode.
  • Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce). This roughly corresponds to Hadoop's distributed modes.
                   Local Mode   Mapreduce Mode
Interactive Mode   yes          yes
Batch Mode         yes          yes
 

Using the pig command

/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

Using the java command

/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...



Running Pig: Interactive Mode
You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.



Example

These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or mapreduce mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 
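
The table above also lists batch mode, so here is a rough sketch of the same example run as a script instead of typed at the grunt prompt. The file names id.pig and id.out are my own picks for illustration, and the two fallback passwd lines are made-up sample data. It assumes the pig command is on your PATH and skips the run if it is not.

```shell
#!/bin/sh
# Copy /etc/passwd into the working directory, as in the interactive example.
# If it is not readable here, fall back to two made-up sample lines.
cp /etc/passwd passwd 2>/dev/null ||
  printf 'root:x:0:0:root:/root:/bin/sh\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n' > passwd

# id.pig (hypothetical name): the same statements as the Grunt session,
# but with STORE instead of DUMP so the results go to files.
# The quoted 'EOF' keeps the shell from expanding $0 inside the script.
cat > id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');  -- split each line on ':'
B = foreach A generate $0 as id;          -- keep the first field as id
store B into 'id.out';                    -- write results to the id.out directory
EOF

# Run the script in local mode, but only if the pig command is installed.
if command -v pig >/dev/null 2>&1; then
  pig -x local id.pig
else
  echo "pig not found; wrote id.pig but did not run it"
fi
```

Drop the -x local part to run the same script in mapreduce mode; in that case the passwd file would have to be in HDFS rather than in the local directory.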





For reference, a Hadoop cluster runs in one of three supported modes:
  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode




