Friday, July 4, 2014


Hadoop Basic Terms - Random Notes



Tarball (Tape Archive ball)
Source File
Package (these are precompiled for a specific kernel and architecture)

Examples, respectively:
.src.rpm :- source code that will compile on your machine; this produces a .rpm file.
.tar.gz :- just plain zipped source code

.rpm :- a precompiled binary package

Tarball (Tape Archive ball)

A tarball is a good way of installing new software; however, the downside of tarballs is that there is no good way of removing the software. Typically, when you build and install a tarball you run
Code:
./configure
make
make install
If you want to remove this software you can, if you are lucky, run make remove or make uninstall. But for this to work the makefile has to include instructions on what to remove, and that is sometimes not the case.
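One way to keep a tarball install removable is to give configure its own prefix, so uninstalling is just deleting that one directory tree. A minimal sketch (foobar-1.0 is a hypothetical package name):
Code:
# unpack the hypothetical tarball and build it under its own prefix
tar -xzf foobar-1.0.tar.gz
cd foobar-1.0
./configure --prefix=/opt/foobar
make
make install
# later, to "uninstall", simply remove the tree:
# rm -rf /opt/foobar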

Source Package

.src.rpm files are sort of in between .tar.gz files (called tarballs) and .rpm files.

The .src.rpm files are like a tarball but with some additional information that makes it possible for rpm to compile and build a .rpm package. This package is compiled for your machine and saved in an RPM directory.
For me it is

Code:
/usr/src/RPM/RPMS/i586/
because I use the i586 arch.

This is nice because you can now install the .rpm file and get the program into your rpm database.

Using RPM

In its simplest form, RPM can be used to install packages: 

rpm -i foobar-1.0-1.i386.rpm

The next simplest command is to uninstall a package: 

rpm -e foobar

While these are simple commands, rpm can be used in a multitude of ways. To see which options are available in your version of RPM, type:
rpm --help
You can find more details on what those options do in the RPM man page, found by typing:
man rpm
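Beyond install and erase, a few query options come up all the time. A quick sketch, reusing the hypothetical foobar package from above:
Code:
rpm -q foobar                     # is the package installed, and which version?
rpm -ql foobar                    # list all files the package installed
rpm -qf /usr/bin/foobar           # which package owns this file?
rpm -Uvh foobar-1.1-1.i386.rpm    # upgrade (installs if not already present)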



RPM vs Tarball or Source
RPMs are generally easier to manage, and as mentioned above, they are easier to remove, and also to keep track of updates (especially if you use something like yum to update), because rpm maintains a database of what's installed.

However, there are two downsides to rpms that come to mind. First, not every piece of software is in an rpm, or at least in an rpm for your distro. The other is that you are reliant on the packagers to produce updates in a timely fashion. You may, for example, want to update something like clamav as soon as it is released, rather than wait a week or two for a new package.

Some rpms also have quirks. There is, supposedly, a compatibility issue between some repos; for example, the livna.org and freshrpms repos are said to have trouble working together (something about renaming system files). I've also occasionally found that dependencies can be problematic for some packages.

BUILD RPM FROM SOURCE

Use the rpmbuild command to build an RPM from the source rpm, e.g. "rpmbuild -ba package.spec".
There are several options, such as "-bp" to just unpack and patch the sources, and "-bi" to build the package and run its install stage.

See the "rpmbuild" man-page for details.

A source RPM will install a tarball in the SOURCES/ directory of the rpm build tree. It will also install any patches in the same directory, and the package's .spec file in the SPECS/ directory.

The source RPM will contain an earlier version of the source, and patches to make it current.
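Putting that together, a typical rebuild from a source RPM looks like this (a sketch; foobar is hypothetical, and the paths assume the /usr/src/RPM build tree mentioned earlier):
Code:
# unpack tarball + patches into SOURCES/ and the spec into SPECS/
rpm -i foobar-1.0-1.src.rpm
# build binary (and source) packages from the spec
rpmbuild -ba /usr/src/RPM/SPECS/foobar.spec
# install the freshly built binary rpm
rpm -i /usr/src/RPM/RPMS/i586/foobar-1.0-1.i586.rpm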

FINAL: which type of installation is better, an rpm or a tar package?


If you use an rpm-based distro, then use the rpm package. If you use the tarball, you may have to configure it and compile it yourself.

Courtesy:
http://www.linuxquestions.org/questions/linux-software-2/difference-between-src-rpm-and-tar-gz-packages-103683/

Monday, May 19, 2014

SVN Commands



* To update only particular files to a specific revision

svn update -r37325 /appl/abc/xyz/bvp_udf.jar
svn update -r37374 /appl/abc/xyz/ki/efg/run.sh
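To check which revision a working-copy file is actually at, before or after such an update (same example paths as above):

svn info /appl/abc/xyz/bvp_udf.jar | grep Revision
svn log -l 5 /appl/abc/xyz/ki/efg/run.sh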

Monday, May 12, 2014

sqoop 2

Errors

1) Error Message

java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Problem: Communications link failure, caused by incorrect permissions.
Solution:
- Verify that you can connect to the database from the node where you are running Sqoop:
  $ mysql --host= --database=test --user= --password
- Add the network port for the server to your my.cnf file.
- Set up a user account to connect via Sqoop, and grant that user permission to access the database over the network:
  Log into MySQL as root:
  mysql -u root -p
  Issue the following command:
  mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword';
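With the grant in place, re-running the import from the same node is a quick end-to-end check (a sketch; dbhost and mytable are placeholders, reusing the test database and testuser account from above):

sqoop import \
  --connect jdbc:mysql://dbhost:3306/test \
  --username testuser --password testpassword \
  --table mytable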

Create a proxy for the MySQL host and port on the current host, if one is not present already:


mysql-proxy --proxy-backend-addresses=qa-srv:3308 --log-level=debug --proxy-address=127.0.0.1:3306
Now try logging in from another console via the proxy:
mysql -u  -h 127.0.0.1
An alternative tool on Linux is:

Rinetd – redirects TCP connections from one IP address and port to another

 

Edit /etc/rinetd.conf and add the remote and local server bindings:

# bindaddress bindport connectaddress connectport
192.168.2.1 80 192.168.2.3 80
192.168.2.1 443 192.168.2.3 443
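After saving the config, restart rinetd so the new bindings take effect, then test one of the forwards (a sketch assuming rinetd was installed with an init script; addresses as in the example above):

/etc/init.d/rinetd restart
# this request should be redirected to 192.168.2.3:80
curl -I http://192.168.2.1/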

Sqoop 1

Sqoop

  • Sqoop from the local system, i.e. the system on which both the database and Hadoop run.
  • Sqoop from a remote system. Here we need to grant permissions so that communication link failures and other issues are avoided: since the data will be accessed over the network, the database admin needs to grant permission for the database to be accessed over the network from a particular IP address.
Sqoop -> Hive -> Indexing
Schemaless storage of the data, especially data coming from XML and JSON objects.
Zookeeper

Friday, June 15, 2012

PIG (I don't like it, eyaakhh..). OK... starting on Pig Latin scripting..


How to setup
http://pig.apache.org/docs/r0.10.0/start.html

Pig has two execution modes...

Running Pig

You can run Pig using the "pig" command (the bin/pig Perl script) or the "java" command (java -cp pig.jar ...). You can run Pig (execute Pig Latin statements and Pig commands) using various execution modes, or exectypes, depending on whether you are working against a Hadoop cluster or standalone (local):
  • Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local). This is akin to Hadoop's Local (Standalone) mode.
  • Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce). This is akin to Hadoop's Fully-Distributed mode.
                  Local Mode   Mapreduce Mode
Interactive Mode  yes          yes
Batch Mode        yes          yes
 

Using the pig command:

/* local mode */
$ pig -x local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

Using the java command:

/* local mode */
$ java -cp pig.jar org.apache.pig.Main -x local ...

/* mapreduce mode */
$ java -cp pig.jar org.apache.pig.Main ...
or
$ java -cp pig.jar org.apache.pig.Main -x mapreduce ...



PIG Commands Execution Mode: Interactive Mode
You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the "pig" command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.



Example

These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the "pig" command (in local or mapreduce mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results on your terminal screen.

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 
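The same statements can also be run in batch mode: save them into a script file (id.pig is a hypothetical name) and pass the file to pig:

$ pig -x local id.pig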





A Hadoop cluster runs in one of three supported modes:
  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode