How to set up
http://pig.apache.org/docs/r0.10.0/start.html
Pig has two execution modes: local mode and mapreduce mode.
Running Pig
You can run Pig using either the "pig" command (the bin/pig script) or the "java" command (java -cp pig.jar ...). Either way, you can execute Pig Latin statements and Pig commands in one of two execution modes (exectypes), depending on whether you are working on a single standalone (local) machine or against a Hadoop cluster:
- Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). This corresponds to Hadoop's Local (Standalone) mode.
- Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and an HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce). This roughly corresponds to Hadoop's distributed modes.
| | Local Mode | Mapreduce Mode |
| Interactive Mode | yes | yes |
| Batch Mode | yes | yes |
Using the pig command

/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

Using the java command

/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...
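The invocations above differ only in the value passed to the -x flag, so the choice of mode can be wrapped in a small shell helper. The following is only a sketch: the function name pig_command is hypothetical, and it prints the command line instead of running it, so it can be tried on a machine without Pig installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): print the pig command line
# for a given execution mode. Only the -x flag and the values
# "local" and "mapreduce" are taken from the text above.
pig_command() {
    # Mapreduce is Pig's default mode, so use it when no mode is given.
    mode="${1:-mapreduce}"
    if [ "$#" -gt 0 ]; then shift; fi

    case "$mode" in
        local|mapreduce) ;;
        *)
            echo "unknown exectype: $mode" >&2
            return 1
            ;;
    esac

    # Rebuild the argument list as: pig -x <mode> <remaining arguments>
    set -- pig -x "$mode" "$@"
    echo "$*"
}
```

For example, pig_command local id.pig prints pig -x local id.pig, where id.pig stands for any script name of your choosing.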
Pig Command Execution Mode: Interactive Mode
You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.
Example
These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or hadoop mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
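The same three statements can also be run in batch mode, which the table earlier lists as supported in both execution modes. A minimal sketch, assuming a script file named id.pig (a name chosen here for illustration) and a copy of /etc/passwd in the working directory; the pig call is guarded so the snippet does nothing harmful on a machine where Pig is not installed:

```shell
#!/bin/sh
# Write the statements from the interactive example into a script file.
# The file name id.pig is an assumption made for this sketch.
# The quoted 'EOF' keeps the shell from expanding $0 inside the script.
cat > id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
EOF

# Run the script in local mode, but only if the pig command is on the
# PATH and the passwd copy exists in the current directory.
if command -v pig >/dev/null 2>&1 && [ -f passwd ]; then
    pig -x local id.pig
fi
```

Dropping -x local from the last command would run the same script in the default mapreduce mode, which requires access to a Hadoop cluster.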
A Hadoop cluster runs in one of three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode