hadoop
Friday, July 4, 2014
Hadoop Basic Terms (Random)
Tarball (Tape Archive ball)
Source File
Package (precompiled for a specific kernel and architecture)
Examples, respectively:
.src.rpm :- source code that will compile on your machine; this produces a .rpm file.
.tar.gz :- just plain zipped source code
.rpm :- precompiled binary package
Tarball (Tape Archive ball)
A tarball is a good way of installing new software; the downside of tarballs is that there is no good way of removing the software afterwards. Typically, when you build and install a tarball you run:
Code:
./configure
make
make install
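The full flow, from archive to install, usually looks like the sketch below. The package name foo-1.0 is hypothetical, and a tiny tarball is fabricated here so the extraction step can actually run; the configure/make steps are commented out because they need a real package's build system.

```shell
# Hypothetical package "foo-1.0": fabricate a tiny tarball so the
# extraction step can be demonstrated end to end.
mkdir -p /tmp/tarball-demo/foo-1.0
echo 'int main(void){return 0;}' > /tmp/tarball-demo/foo-1.0/main.c
tar -czf /tmp/tarball-demo/foo-1.0.tar.gz -C /tmp/tarball-demo foo-1.0

# Typical install flow for a real source tarball:
cd /tmp/tarball-demo
tar -xzf foo-1.0.tar.gz        # unpack the archive
cd foo-1.0
ls                             # inspect: look for a README/INSTALL first
# ./configure                  # (real packages) generate the Makefile
# make                         # compile
# sudo make install            # copy files into place, usually under /usr/local
```

Since make install just copies files around, there is no record of what went where, which is exactly why removal is hard.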
Source Package
.src.rpm files sit somewhere between .tar.gz files (called tarballs) and .rpm files.
A .src.rpm file is like a tarball, but with some additional information that makes it possible for 'rpm' to compile and build a .rpm package. This package is compiled for your machine and saved in an RPM directory.
For me it is
Code:
/usr/src/RPM/RPMS/i586/
This is nice because you can now install the .rpm file and get the program into your rpm database.
Using RPM
In its simplest form, RPM can be used to install packages:
rpm -i foobar-1.0-1.i386.rpm
The next simplest command is to uninstall a package:
rpm -e foobar
While these are simple commands, rpm can be used in a multitude of ways. To see which options are available in your version of RPM, type:
rpm --help
You can find more details on what those options do in the RPM man page, found by typing:
man rpm
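Because rpm keeps a database of what is installed, you can also query it. A few common queries as a sketch (the package name foobar is hypothetical, so on most systems those queries will simply report it is not installed; the block is guarded in case rpm itself is absent):

```shell
# Query the RPM database (guarded, since rpm may not exist on this machine).
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | head -n 5        # list a few installed packages
    rpm -q foobar || true      # is the (hypothetical) package installed?
    rpm -ql foobar || true     # list the files a package owns
    rpm -qf /bin/ls || true    # which package owns this file?
else
    echo "rpm not available here"
fi
demo_done=yes
```

These queries are what tarball installs cannot give you, since make install leaves no database behind.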
RPM Vs Tarball Or Source
RPMs are generally easier to manage, and as mentioned above, they are easier to remove, and also to keep track of updates (especially if you use something like yum to update), because rpm maintains a database of what's installed.
However, there are two downsides to rpms that come to mind. First, not every piece of software is in an rpm, or at least in an rpm for your distro. The other is that you are reliant on the packagers to produce updates in a timely fashion. You may, for example, want to update something like clamav as soon as it is released, rather than wait a week or two for a new package.
Some rpms also have quirks. There is, supposedly, an issue of compatibility between some repos; for example, the livna.org and freshrpms repos are said to have issues working together (something about renaming system files). I've also found occasionally that dependencies can be problematic for some packages.
BUILD RPM FROM SOURCE
Use the rpmbuild command to build an RPM from the source rpm, e.g. "rpmbuild -ba package.spec".
There are several options, such as "-bp" to just apply the patches, and "-bi" to build a package and install it.
See the "rpmbuild" man-page for details.
A source RPM will install a tarball into the SOURCES directory of your RPM build tree. The source RPM will contain an earlier version of the source, plus patches to make it current.
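As a sketch of what rpmbuild consumes, here is a minimal, hypothetical spec file; the section names (%prep, %build, %install, %files) are the standard ones rpmbuild looks for, and a real spec would also carry Source0:, patch entries, and build commands.

```shell
# Write a minimal, hypothetical spec file to illustrate the structure
# rpmbuild expects; it is not a buildable package by itself.
cat > /tmp/foobar.spec <<'EOF'
Name:     foobar
Version:  1.0
Release:  1
Summary:  Demo package (hypothetical)
License:  GPL

%description
A minimal spec used only to illustrate the file layout.

%prep
%build
%install
%files
EOF

# With a real source RPM installed you would then run, e.g.:
# rpmbuild -ba /tmp/foobar.spec    # -ba: build binary and source packages
# rpmbuild -bp /tmp/foobar.spec    # -bp: just unpack sources and apply patches
grep -c '^%' /tmp/foobar.spec
```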
FINAL: which type of installation is better, the rpm or the tar package?
If you use an rpm-based distro, then use the rpm package. If you use the tarball, you may have to configure it and compile it yourself.
courtesy :
http://www.linuxquestions.org/questions/linux-software-2/difference-between-src-rpm-and-tar-gz-packages-103683/
Monday, May 19, 2014
SVN Commands
* To update particular files to a specific revision:
svn update -r37325 /appl/abc/xyz/bvp_udf.jar
svn update -r37374 /appl/abc/xyz/ki/efg/run.sh
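A hedged sketch of the same idea: pin individual files to known-good revisions, then confirm what revision the working copy actually has. The commands are built as strings so the sketch runs even without a working copy; paths and revision numbers are the ones from above.

```shell
# With a real checkout you would execute these directly instead of echoing them.
cmd1='svn update -r37325 /appl/abc/xyz/bvp_udf.jar'
cmd2='svn update -r37374 /appl/abc/xyz/ki/efg/run.sh'
echo "$cmd1"
echo "$cmd2"
# Afterwards, verify the revision a file is at:
# svn info /appl/abc/xyz/bvp_udf.jar | grep '^Revision:'
```

Note that svn update moves the working copy to that revision; it does not commit anything by itself.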
Monday, May 12, 2014
sqoop 2
Errors
1) Error Message
java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
• Problem: Communications link failure caused by incorrect permissions.
• Solution:
– Verify that you can connect to the database from the node where you are running Sqoop:
$ mysql --host= --database=test --user= --password
– Add the network port for the server to your my.cnf file
– Set up a user account to connect via Sqoop
– Grant permissions to the user to access the database over the network:
• Log into MySQL as root:
mysql -u root -p
• Issue the following command:
mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword';
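Once the grant is in place, a Sqoop import over the network can be attempted. A hedged sketch, assembling the command line first (the host qa-srv, table name, and credentials are the example values from this post, not something to run as-is):

```shell
# Assemble a hypothetical sqoop import command; run it only where sqoop,
# the MySQL JDBC driver, and the database are actually reachable.
SQOOP_CMD="sqoop import \
 --connect jdbc:mysql://qa-srv:3306/test \
 --username testuser --password testpassword \
 --table mytable"
echo "$SQOOP_CMD"
# Sanity-check connectivity first, from the Sqoop node itself:
# mysql --host=qa-srv --database=test --user=testuser --password
```

If the mysql client cannot reach the database from the Sqoop node, Sqoop's map tasks will not be able to either, which is exactly the communications link failure above.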
Create a Proxy for the mysql host and port on the current host if not present already
mysql-proxy --proxy-backend-addresses=qa-srv:3308 --log-level=debug --proxy-address=127.0.0.1:3306
Now try logging in from another console via the proxy:
mysql -u -h 127.0.0.1
An alternative tool on Linux is:
rinetd – redirects TCP connections from one IP address and port to another.
Edit /etc/rinetd.conf and add the remote and local server binding:
# bindaddress bindport connectaddress connectport
192.168.2.1 80 192.168.2.3 80
192.168.2.1 443 192.168.2.3 443
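The same bindings can be staged in a scratch copy of the config first (the IP addresses are the example ones above); on the real host you would then put the file in place as root and restart rinetd:

```shell
# Stage the forwarding rules in a scratch file; on a real host these lines
# would go into /etc/rinetd.conf instead.
cat > /tmp/rinetd.conf <<'EOF'
# bindaddress bindport connectaddress connectport
192.168.2.1 80  192.168.2.3 80
192.168.2.1 443 192.168.2.3 443
EOF

# Then (as root, on the real machine; init script path may vary by distro):
# cp /tmp/rinetd.conf /etc/rinetd.conf && /etc/init.d/rinetd restart
grep -vc '^#' /tmp/rinetd.conf    # count the active (non-comment) rules
```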
Sqoop 1
Sqoop
- Sqoop from the local system, i.e. the system on which both the database and Hadoop run.
- Sqoop from a remote system. Here we need to grant permissions so that communication link failures and other issues are avoided, since the data will be accessed over the network; the database admin needs to grant the user permission to access the data over the network from a particular IP address.
Friday, June 15, 2012
PIG (I don't like it... eyaakhh). OK... starting on Pig Latin scripting.
How to setup
http://pig.apache.org/docs/r0.10.0/start.html
Pig has two execution modes...
Running Pig
You can run Pig using the "pig" command (the bin/pig Perl script) or the "java" command (java -cp pig.jar ...). You can run Pig (execute Pig Latin statements and Pig commands) in various execution modes, or exectypes, depending on whether you are working with a Hadoop cluster or standalone (local):
- Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). This corresponds to Hadoop's Local (Standalone) mode.
- Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce). This corresponds to Hadoop's Distributed mode.
                 | Local Mode | Mapreduce Mode
Interactive Mode | yes        | yes
Batch Mode       | yes        | yes
Using PIG Commands
/* local mode */
$ pig -x local ...
/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

Using JAVA Commands
/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...
/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...
PIG Commands Execution Mode : Interactive Mode
You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.
Example
These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or hadoop mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results on your terminal screen.
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
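The same three statements can also be run in batch mode by saving them to a script file and handing it to pig. A sketch (the script path /tmp/id.pig is arbitrary, and the pig invocation is commented out in case Pig is not installed here):

```shell
# Write the interactive statements from above into a batch script.
cat > /tmp/id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
EOF

# Run it in local mode (requires Pig on this machine):
# cp /etc/passwd ./passwd
# pig -x local /tmp/id.pig
grep -c ';' /tmp/id.pig    # the script holds three statements
```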
Hadoop cluster runs in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode