All you need to know about Unit Testing | Minimal Post

Unit and integration testing are something most developers do on a daily basis. I had always wanted to write a post on this subject: something minimal that covers the essentials and gives you the terminology and tools you need to get going. This post is targeted at junior developers. So, here we go -

Rec­om­mend­ed Book — The Art of Unit Test­ing, 2nd Edi­tion

What is Inte­gra­tion Test­ing and what is Unit Test­ing?

Well, software testing comes in two flavors, if you will: unit testing and integration testing. What’s the basic difference, you ask? Let’s say you want to test some component X of the software. Mostly, this X is some method or function. X might depend on db calls, the file system, etc. If you use real external resources, like a real db call, to test X, then you are doing integration testing. If instead you fake or stub these external resources, then you are doing unit testing: testing just that unit X and faking most things it depends on. And it makes sense, you want to test one single piece of functionality without depending on anything else.

Using stubs to break depen­den­cies

Your code X ———> depends on some exter­nal depen­den­cy (any object like file sys­tem, threads, db calls, etc.)

But when you want to Unit test X, it would be some­thing like this -

Your code X ———> stub in place of exter­nal depen­den­cy

You might have heard about mocks, stubs, etc. Don’t get bogged down by terminology: stubs and mocks are both fakes. And, from The Art of Unit Testing: mocks are like stubs, except that you assert against a mock and you don’t assert against a stub. We’ll see all this soon.


public bool IsValidFileName(String fn)
{
    //read some conf file from file system
    //to see if extension is supported by your company or not.
}

What­ev­er code is there in place of com­ments above would qual­i­fy as exter­nal depen­den­cy. So, how do we test the above?
One way would be to actu­al­ly read a conf file and use it for test­ing and then destroy it. Of course, this would be time con­sum­ing and would be an inte­gra­tion test and not a Unit test.

So, how do we test this without resorting to an integration test? The code at hand is tied directly to the external dependency: it calls the fs directly. We first need to decouple it by putting a layer of indirection between our code and the fs.


If we do this, our code no longer depends directly on the fs but on something else that does. This something else, the fileSystemManager in the code below, is something that can wrap the real fs in production code and be swapped for a fake fs in tests.

Anoth­er ter­mi­nol­o­gy — Seams

Seams are opportunities in code where different functionality can be plugged in. One way to create a seam is an interface (IFileSystemManager below) whose implementation can be replaced. Seams follow from what’s called the Open-Closed Principle: a class should be open for extension but closed for modification.

So, for exam­ple, from the above we can have -

    public class FileSystemManager : IFileSystemManager
    {
        public bool IsValid(String fileName)
        {
            // ..some production code interacting with real fs
        }
    }

So, now your code looks something like this -

    public bool IsValidFileName(String fileName)
    {
        IFileSystemManager mgr = new FileSystemManager();
        return mgr.IsValid(fileName);
    }

So, that is your code but what to use in tests? — a stub!

    public class FakeFileSystemManager : IFileSystemManager
    {
        public bool IsValid(String fileName) => true; //no need to interact with real fs
    }

So, ok, good, we have created a way to separate production code from test code. But wait! Remember that we needed to test the IsValidFileName method. The issue is the following line of code -

IFileSystemManager mgr = new FileSystemManager();

We need to remove this instantiation of a concrete class with ‘new’. If we don’t, then no matter where we call IsValidFileName from (tests or actual code), it will always try to new up a FileSystemManager. This is where DI containers come in, but we won’t go into the details of DI here, so let’s take a look at something called ‘constructor-level injection’.

    public class ClassUnderTest
    {
        private IFileSystemManager _mgr;
        public ClassUnderTest(IFileSystemManager mgr)
        {
            _mgr = mgr;
        }

        public bool IsValidFileName(String fileName)
        {
            return _mgr.IsValid(fileName); //so just like that we got rid of new! and this
            //code is ready for testing - we will send in fake fileSystemManager in case of tests and
            //'real' fileSystem in case of production code
        }
    }

So, the actu­al test now would look some­thing like -

    public class ClassUnderTest_Tests
    {
        [Test] // NUnit attribute; the test name below is illustrative
        public void IsValidFileName_FakeSaysValid_ReturnsTrue()
        {
            FakeFileSystemManager myFakeManager =
                    new FakeFileSystemManager();
            myFakeManager.WillBeValid = true;
            ClassUnderTest obj = new ClassUnderTest(myFakeManager);
            bool result = obj.IsValidFileName("short.ext");
            Assert.True(result);
        }
    }

    internal class FakeFileSystemManager : IFileSystemManager
    {
        public bool WillBeValid = false;
        public bool IsValid(string fileName)
        {
            return WillBeValid;
        }
    }

Once you’ve understood the above basic concept, other stuff will come easily. So for instance, here’s one way to make your fakes throw an exception -

    class FakeFileSystemManager : IFileSystemManager
    {
        public bool WillBeValid = false;
        public Exception WillThrow = null;

        public bool IsValid(String fileName)
        {
            if (WillThrow != null) //where WillThrow can be configured from the calling code.
                throw WillThrow;
            return WillBeValid;
        }
    }
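
Here is a minimal sketch of how a test might drive such a fake. It assumes, purely for illustration, that the code under test should return false when the file system manager throws. That behaviour, the class name SafeFileNameChecker and the helper ThrowingFakeDemo are assumptions made for this sketch, not something stated above.

```csharp
using System;

public interface IFileSystemManager
{
    bool IsValid(string fileName);
}

// Same hand-written fake as above: the test decides what it returns or throws.
public class FakeFileSystemManager : IFileSystemManager
{
    public bool WillBeValid = false;
    public Exception WillThrow = null;

    public bool IsValid(string fileName)
    {
        if (WillThrow != null)
        {
            throw WillThrow;
        }
        return WillBeValid;
    }
}

// Hypothetical class under test: treats any failure of the manager as "not valid".
public class SafeFileNameChecker
{
    private readonly IFileSystemManager _mgr;

    public SafeFileNameChecker(IFileSystemManager mgr)
    {
        _mgr = mgr;
    }

    public bool IsValidFileName(string fileName)
    {
        try
        {
            return _mgr.IsValid(fileName);
        }
        catch (Exception)
        {
            return false; // assumed behaviour, chosen only for this sketch
        }
    }
}

public static class ThrowingFakeDemo
{
    // Arrange a fake that throws, act, and report whether the checker coped.
    public static bool ManagerThrows_ReturnsFalse()
    {
        var fake = new FakeFileSystemManager();
        fake.WillThrow = new Exception("this happened");
        var checker = new SafeFileNameChecker(fake);
        return checker.IsValidFileName("anything.anyextension") == false;
    }
}
```

In a real test project the body of ManagerThrows_ReturnsFalse would sit in a test method and end with an assert instead of returning a bool.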

Till now we have seen how to write our own fakes but let’s see how to avoid hand­writ­ten fakes.

But before that, I would like to broad­ly say that test­ing can be of three types (from Art of Unit Test­ing) -
1. Val­ue based — When you want to check whether the val­ue returned is good or not
2. State based — When you want to check whether the code being test­ed has expect­ed state or not (cer­tain vari­ables have been set cor­rect­ly or not)
3. Inter­ac­tion based — When you need to know if code in one object called anoth­er objec­t’s method cor­rect­ly or not. You use it when call­ing anoth­er object is the end result of your unit being test­ed. This is some­thing that you will encounter a lot in real life cod­ing.
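
The three styles can be sketched side by side. Everything below (ILogger, RecordingLogger, Basket, ThreeStylesDemo) is a made-up example, not taken from the book; each method returns true when its check passes, where a real test would use an assert.

```csharp
using System.Collections.Generic;

// All types here are hypothetical, made up only to show the three styles.
public interface ILogger
{
    void Log(string message);
}

// Hand-written fake that remembers every message it was given.
public class RecordingLogger : ILogger
{
    public List<string> Messages = new List<string>();

    public void Log(string message)
    {
        Messages.Add(message);
    }
}

public class Basket
{
    private readonly ILogger _logger;
    public int ItemCount { get; private set; }

    public Basket(ILogger logger)
    {
        _logger = logger;
    }

    // Returns the new total, changes ItemCount and tells the logger.
    public int Add(int quantity)
    {
        ItemCount += quantity;
        _logger.Log("added " + quantity);
        return ItemCount;
    }
}

public static class ThreeStylesDemo
{
    // 1. Value based: look only at what the method returns.
    public static bool ValueBased()
    {
        var basket = new Basket(new RecordingLogger());
        return basket.Add(2) == 2;
    }

    // 2. State based: look at the object's state after the calls.
    public static bool StateBased()
    {
        var basket = new Basket(new RecordingLogger());
        basket.Add(2);
        basket.Add(3);
        return basket.ItemCount == 5;
    }

    // 3. Interaction based: look at how the collaborator was called.
    public static bool InteractionBased()
    {
        var logger = new RecordingLogger();
        var basket = new Basket(logger);
        basket.Add(2);
        return logger.Messages.Count == 1 && logger.Messages[0] == "added 2";
    }
}
```

Note that in the first two checks the logger is only a stub (nothing is asserted against it), while in the third it plays the role of a mock.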

Let’s make for­mal dis­tinc­tion between mocks, fakes and stubs now. I will be quot­ing from The Art of Unit Test­ing.
“mock object is a fake object in the sys­tem that decides whether the unit test has passed or failed. It does so by ver­i­fy­ing whether the object under test called the fake object as expect­ed”

“A fake is a generic term that can be used to describe either a stub or a mock object (handwritten or otherwise), because they both look like the real object. Whether a fake is a stub or a mock depends on how it’s used in the current test. If it’s used to check an interaction (asserted against), it’s a mock object. Otherwise, it’s a stub.”

Let’s con­sid­er an exam­ple,

//an interface
public interface IWebService
{
    void LogError(string message);
}

and a hand-written fake

//note that this will only be a mock once we use it in some assert in some test
public class FakeWebService : IWebService
{
    public string LastError;
    public void LogError(string message)
    {
        LastError = message;
    }
}

What we want to test: when LogAnalyzer encounters an error (here, a file name that is too short), it should post the message to our web service. Our fake service’s LogError method saves the message it receives as ‘LastError’, so the test can check whether the call was made with the right message.

 [Test]
 public void Analyze_TooShortFileName_CallsWebService()
 {
     FakeWebService mockService = new FakeWebService();
     LogAnalyzer log = new LogAnalyzer(mockService);
     string tooShortFileName="abc.ext";
     log.Analyze(tooShortFileName);
     StringAssert.Contains("Filename too short:abc.ext",
                        mockService.LastError); // now mockService is a 'mock'
 }

 public class LogAnalyzer
 {
    private IWebService service;
    public LogAnalyzer(IWebService service)
    {
        this.service = service;
    }
    public void Analyze(string fileName)
    {
        if (fileName.Length < 8)
        {
           service.LogError("Filename too short:" + fileName);
        }
    }
 }

Let’s take anoth­er exam­ple, a bit more involved.

Say your code logs an error by calling service.LogError, as above. But now, let’s add some more complexity: if service.LogError throws some kind of exception, you want to catch it and send an email to someone to alert them that something is wrong with the service.

Something like this,

if (fileName.Length < 8)
{
    try
    {
        service.LogError("too short " + fileName);
    }
    catch (Exception e)
    {
        email.SendEmail("something wrong with service", e.Message);
    }
}

Now, there are two ques­tions in front of us -
1. What do we want to test?
2. How do we test it?

Let’s answer them -
What do we want to test -
1.1 When there is an error like a too-short file name, the web service should be called
1.2 The fake web service throws an exception when we tell it to
1.3 When the exception is thrown, an email is sent

How do we test it -
Our test code calls LogAnalyzer, injecting into it a fake web service that throws an exception when we tell it to. LogAnalyzer will also need an email service that should be called when the exception is thrown.

But how will we assert that the email service was called correctly? Set something in the mock email service that we can assert against later!

public class EmailInfo
{
    public string Body;
    public string To;
    public string Subject;
}

[Test]
public void Analyze_WebServiceThrows_SendsEmail()
{
    FakeWebService stubService = new FakeWebService();
    stubService.ToThrow = new Exception("fake exception");
    FakeEmailService mockEmail = new FakeEmailService();
    LogAnalyzer2 log = new LogAnalyzer2(stubService, mockEmail);
    string tooShortFileName = "abc.ext";
    log.Analyze(tooShortFileName);

    EmailInfo expectedEmail = new EmailInfo {
                                       Body = "fake exception",
                                       To = "",
                                       Subject = "can’t log" };
    // compared field by field because EmailInfo does not override Equals
    Assert.AreEqual(expectedEmail.Body, mockEmail.email.Body);
    Assert.AreEqual(expectedEmail.To, mockEmail.email.To);
    Assert.AreEqual(expectedEmail.Subject, mockEmail.email.Subject);
}

public class FakeEmailService : IEmailService
{
    public EmailInfo email = null;
    public void SendEmail(string to, string subject, string body)
    {
        // record what was sent so the test can assert against it
        email = new EmailInfo { To = to, Subject = subject, Body = body };
    }
}

public interface IEmailService
{
    void SendEmail(string to, string subject, string body);
}

public class LogAnalyzer2
{
    public LogAnalyzer2(IWebService service, IEmailService email)
    {
        Email = email;
        Service = service;
    }
    public IWebService Service { get; set; }
    public IEmailService Email { get; set; }

    public void Analyze(string fileName)
    {
        if (fileName.Length < 8)
        {
            try
            {
                Service.LogError("Filename too short:" + fileName);
            }
            catch (Exception e)
            {
                Email.SendEmail("",
                                "can’t log", e.Message);
            }
        }
    }
}

public class FakeWebService : IWebService
{
    public Exception ToThrow;
    public void LogError(string message)
    {
        if (ToThrow != null)
        {
            throw ToThrow;
        }
    }
}

Rule of thumb: one mock per test. More than one usually implies that you are testing more than one thing in a unit test.

Let’s say you have some code like -

String connString = GlobalUtil.Configuration.DBConfiguration.ConnectionString;

and you want to replace connString with one of your own during testing. You could set up a chain of stubs that eventually returns your value, but that would not be very maintainable, would it!
On the other hand, you could also make your code more testable by refactoring it to be something like -

String connString = GetConnectionString();
public String GetConnectionString()
{
    return GlobalUtil.Configuration.DBConfiguration.ConnectionString;
}

Now, instead of hav­ing to fake a chain of meth­ods, you would only need to do that for Get­Con­nec­tion­String() method.
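
One way to fake just that method is to make it virtual and override it in a test-only subclass (often called "extract and override"). The sketch below is only an illustration: the nested GlobalUtil classes are a stand-in so that the example compiles, and ReportRepository, TestableReportRepository and the connection strings are hypothetical.

```csharp
// Stand-in for the global configuration chain from the text.
// Everything in this block is a hypothetical shape, just enough to compile.
public static class GlobalUtil
{
    public static class Configuration
    {
        public static class DBConfiguration
        {
            public static string ConnectionString = "Server=prod;Database=app";
        }
    }
}

public class ReportRepository
{
    public string Describe()
    {
        string connString = GetConnectionString();
        return "Using " + connString;
    }

    // The seam: virtual, so a test can replace just this one call.
    public virtual string GetConnectionString()
    {
        return GlobalUtil.Configuration.DBConfiguration.ConnectionString;
    }
}

// Test-only subclass that overrides the seam instead of faking the whole chain.
public class TestableReportRepository : ReportRepository
{
    public string FakeConnectionString = "Server=test;Database=fake";

    public override string GetConnectionString()
    {
        return FakeConnectionString;
    }
}
```

A test would then create a TestableReportRepository, set FakeConnectionString and exercise Describe() without touching the real configuration.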

Prob­lem with hand­writ­ten mocks and stubs

1. Takes time to write
2. Dif­fi­cult to write for big or com­pli­cat­ed inter­faces and class­es
3. Main­te­nance of hand­writ­ten mocks and stubs as code changes
4. Hard to reuse in a lot of cas­es

Enter Iso­la­tion or Mock­ing frame­works

Def­i­n­i­tion — Frame­works to cre­ate and con­fig­ure fake objects at run­time (dynam­ic stubs and mocks)

So, let’s use one! We will be using NSubstitute with the Visual Studio for Mac Preview edition, a version of the VS IDE developed for the .NET Core community.

The code exam­ples men­tioned below are self suf­fi­cient and easy to read. Please go through them in the fol­low­ing order-

1. LogAnalyzerTests.cs — Shows com­par­i­son between writ­ing a hand­writ­ten fake and one using NSub­sti­tute. Also shows how to ver­i­fy the call made to a mock.
2. SimulateFakeReturns.cs — Shows how to make a stub return some­thing we want when­ev­er there is some par­tic­u­lar or gener­ic input.
3. LogAnalyzer2Example.cs — Again shows com­par­i­son between code writ­ten with and with­out a mock­ing frame­work. Shows how to throw excep­tion using NSub­sti­tute.

The above are sim­ple exam­ples show­cas­ing API capa­bil­i­ties of a mock­ing frame­work — things we’ve been doing by hand­writ­ten code till now.
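
For a rough idea of what such code looks like, here is a small sketch using NSubstitute. It is not the content of the three files listed above. It assumes the NSubstitute NuGet package is referenced, and the class NSubstituteSketch and its method names are made up for illustration.

```csharp
using System;
using NSubstitute;

public interface IWebService
{
    void LogError(string message);
}

public interface IFileSystemManager
{
    bool IsValid(string fileName);
}

public class LogAnalyzer
{
    private readonly IWebService _service;

    public LogAnalyzer(IWebService service)
    {
        _service = service;
    }

    public void Analyze(string fileName)
    {
        if (fileName.Length < 8)
        {
            _service.LogError("Filename too short:" + fileName);
        }
    }
}

public static class NSubstituteSketch
{
    // Mock: Received() throws if the expected call was never made.
    public static void VerifyCall()
    {
        IWebService mockService = Substitute.For<IWebService>();
        new LogAnalyzer(mockService).Analyze("abc.ext");
        mockService.Received().LogError("Filename too short:abc.ext");
    }

    // Stub: return a canned value for any string input.
    public static bool FakeReturn()
    {
        IFileSystemManager stub = Substitute.For<IFileSystemManager>();
        stub.IsValid(Arg.Any<string>()).Returns(true);
        return stub.IsValid("whatever.ext");
    }

    // Stub: throw an exception whenever LogError is called.
    public static void FakeThrow()
    {
        IWebService stub = Substitute.For<IWebService>();
        stub.When(x => x.LogError(Arg.Any<string>()))
            .Do(call => { throw new Exception("fake exception"); });
        stub.LogError("anything"); // the configured exception is thrown here
    }
}
```

No FakeWebService or FakeFileSystemManager class had to be written by hand here; the substitutes are created and configured inside each method.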

General statement on how mocking frameworks work: much the same way you write handwritten fakes, except that these frameworks generate the code at run time. At run time, they give us the opportunity to override interface methods or virtual methods, much the same way we’ve seen till now. Generating code at run time is nothing new in the programming world. However, some frameworks allow much more than this: they even allow you to fake concrete classes and override their methods. How? In these cases the mocking framework injects the code you want into the .class or .dll and runs it conditionally, something like an #ifdef. Of course, this capability requires understanding how to inject or weave code at run time, which in turn requires an understanding of the intermediate code targeting the runtime or VM.
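
To make the "code generated at run time" idea concrete, here is a tiny sketch that uses System.Reflection.DispatchProxy, which ships with .NET Core and later, to implement an interface at run time and record the calls made to it. This only illustrates the principle; it is not how any particular mocking framework is implemented, and RecordingProxy and RuntimeFakeDemo are made-up names.

```csharp
using System.Collections.Generic;
using System.Reflection;

public interface IWebService
{
    void LogError(string message);
}

// The runtime builds a class that implements the interface and
// routes every call on it to Invoke.
public class RecordingProxy : DispatchProxy
{
    public List<string> Calls = new List<string>();

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        Calls.Add(targetMethod.Name + "(" + string.Join(", ", args) + ")");
        return null; // enough for void methods; a real framework returns configured values
    }
}

public static class RuntimeFakeDemo
{
    // Creates the fake at run time, calls it once and returns what was recorded.
    public static List<string> Run()
    {
        IWebService fake = DispatchProxy.Create<IWebService, RecordingProxy>();
        fake.LogError("Filename too short:abc.ext");
        return ((RecordingProxy)(object)fake).Calls;
    }
}
```

The recorded call list is the raw material a framework would use for "was this method received?" style assertions.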

Git Cheatsheet

This serves as a quick git reference or cheatsheet. I used to maintain it when I was new to git. It is probably still helpful for beginners out there.

Discard all unstaged changes in tracked files
git checkout .

Adding/Staging files
git add .

Committing with message
git commit -m "my message"

Make new branch (local)
git checkout -b new_branch

Then make that branch on the server, ready to push
git push -u origin feature_branch_name
Or simply
git push --all -u (to push all your local branches to the server and set tracking for them too)

Change from one branch to another
git checkout another_branch_name

Merge branch­es
Suppose you have the branches ‘master’ and ‘feature1’ and you want to bring the contents of ‘master’ into ‘feature1’, meaning you want to update your ‘feature1’ branch. Then you do -

git checkout feature1
git merge master

If you want to bring the contents of ‘feature1’ into ‘master’, when your ‘feature1’ work is done, then you do -

git checkout master
git merge feature1

In fact, this two-step process is the better way to merge your feature branches into master

In order to resolve conflicts, you might have to do
git mergetool

After all this is done, commit the merge and push from master

Equiv­a­lent of hg out­go­ing
If you want to list commits that are on your local branch dev, but not on the remote branch origin/dev, do:
git fetch origin # Update origin/dev if needed
git log origin/dev..dev

Equivalent of hg incoming
git fetch origin # Update origin/dev if needed
git log dev..origin/dev

See the his­to­ry of a file

gitk /pathtofile/

When you want to set up a new repo on GitHub and already have some code locally

1. Create the remote repository and get its URL. Add a readme or .gitignore or whatever you want
2. Locally, at the root directory of your source, git init
3. git pull {URL from step 1}
4. git add . then git commit -m 'initial commit comment'
5. git remote add origin {URL from step 1}
6. git pull origin master
7. git push origin master

Pull all branch­es to local

git fetch --all
git pull --all

List all branches

git branch -a (lists all local and remote branches)
git branch -r (lists only remote branches)

and then do a simple git checkout fullbranchname to move into that branch

Pull certain branch from server

git pull origin branch-name

Setting up git mergetool on Windows

Download the kdiff3 exe and install it

Open the gitconfig file in C:\Program Files (x86)\Git\etc

To do that, open cmd in Admin mode and run the command notepad gitconfig

Add the following there -

[merge]
tool = kdiff3

[mergetool "kdiff3"]
path = C:/Program Files/KDiff3/kdiff3.exe
keepBackup = false
trustExitCode = false

Save it, close it

In git bash do
git config --global merge.tool kdiff3

Or just download and install kdiff3
and do the following

$ git config --global --add merge.tool kdiff3
$ git config --global --add mergetool.kdiff3.path "C:/Program Files/KDiff3/kdiff3.exe"
$ git config --global --add mergetool.kdiff3.trustExitCode false
$ git config --global --add diff.guitool kdiff3
$ git config --global --add difftool.kdiff3.path "C:/Program Files/KDiff3/kdiff3.exe"
$ git config --global --add difftool.kdiff3.trustExitCode false

Delete a branch from local

git branch -D branch_name

Delete a branch from remote

git push origin --delete <branchName>

Delete untracked file
git clean -f filenamewithpath
or git clean -f -d filenamewithpath

git clean -f -n to show what files will be removed
git clean -f to actually remove those

use git clean -fd to remove untracked directories

use git status to check whether something is left untracked or not

Reverse a committed push
git revert <commit's hash> will create a new commit that undoes it, which you’ll have to push

Cher­ry pick a com­mit
Let’s say you want to cherry pick commit 6a23b56 from a feature branch to master. You must be on master and then do
git cherry-pick -x 6a23b56
that’s all!

Removing Files
Say that you deleted a file from disk. It will now show as deleted in git status. How do you make that change on the server also?
git rm <file path and name>
git commit
git push

Mov­ing repo from one loca­tion to anoth­er (or dupli­cat­ing repo to some new loca­tion)
git remote add new_repo_name new_repo_url
Then push the content to the new location
git push new_repo_name master
Finally remove the old one
git remote rm origin
After that, edit the .git/config file to change new_repo_name to origin.

If you don’t remove the origin (original remote repository), you can simply push changes to the new repo with
git push new_repo_name master

If you delete a file in one branch and don’t commit or stash, the file will appear deleted on other branches too
Switching branches carries uncommitted changes with you. Either commit first, run git checkout . to undo them, or run git stash before switching. (You can get your changes back with git stash apply.)

Revert a specific file to some earlier git version
Find the commit where you went wrong, using either git log, git log -p or gitk
Find the commit hash
git checkout commitcode filepath

Now commit again

Commit a single file
git commit -m 'comments' filepath

SSH setup between your machine’s repo and the server repo
cd to your home dir
ssh-keygen -t rsa -C ""
clip < ~/.ssh/

Save on server

How to see my last n commits
git log -n 5 --author=vaibhavk

How to see contents of a particular commit
git show hashvalue

What’s A, B and C in Kdiff3
A is the orig­i­nal file, before any merge con­flicts hap­pened
B is your current file (including any uncommitted changes)
C is the incoming file that caused the merge conflict

How to see uncommitted changes for a specific file against earlier version(?)
git diff filepath

Undo git add
git reset filepath

See if a commit is in a branch, or which branches it is in
git branch -a --contains 4f08c85ad (remove -a to see only your local branches)

List git commits not pushed yet
git log origin/master..master
Or git log <since>..<until>
You can use this with grep to check for a specific, known commit:
git log <since>..<until> | grep <commit-hash>

Undo last commit

git commit -m "Something terribly misguided"
git reset HEAD~
<< edit files as necessary >>
git add <whatever>
git commit -c ORIG_HEAD

Making a Hadoop Cluster using Cloudera CDH 5.1.x | Experience

Let me start by saying that Linux, Open Source in general, the JVM ecosystem and Hadoop rock!


Installing a new clus­ter using Cloud­era 5.1.x

Note that IPs men­tioned here are just some dum­my sam­ple IPs cho­sen for this post!

  1. Install CentOS 6.5 on all the nodes. We chose CentOS but you could choose any other Linux flavour as well. Just make sure it is supported by Cloudera
  2. Loop in your IT/Networking team to give static IPs to the nodes. Note that the machine you intend to make the NameNode should have the following entries in its /etc/hosts file. This file is used by Cloudera and the other nodes to resolve IP addresses and host names. In our case we kept n37 as the NameNode -

    <static IP> n37
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    <static IP> n33
    <static IP> n36
    <static IP> n38
    <static IP> n39

    Note that for static IP setup you need to have at least the following entries in your /etc/sysconfig/network-scripts/ifcfg-eth0

    NAME="System eth0"

    Once such entries have been made you will need to restart the network using the command service network restart

  3. The other 4 entries towards the end are for the other data nodes (or slave machines). The data nodes should have the following entries in their /etc/hosts file. For example, our n33 node had the following -

    <static IP> n37
    <static IP> n33
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    Also, make sure that the entries in /etc/resolv.conf on all nodes (including the namenode) are -

  4. Setting up password-less logins - Next we set up ssh between the machines. Note that in our case we did everything by logging in as ‘root’. Other than that, our user was ‘cloudera’ and in our case, even the root password is ‘cloudera’. One thing I want to point out upfront is that you should keep the same password on all machines, because the Cloudera setup might need the password to get into different machines. Log into your name node as ‘root’.

    sudo yum install openssh-client (in Ubuntu’s case it will be ‘apt-get’ instead of ‘yum’)
    sudo yum install openssh-server

    ssh-keygen -t rsa -P "" -f ~/.ssh/id_dsa (generating the ssh key; note that the passphrase is an empty string)
    ssh-copy-id -i $HOME/.ssh/ username@slave-hostname (copy the public key over to the node/slave machines. So, in our case, one example would be root@)
    cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys (you need to do this if the same machine would need to ssh into itself. We did this too.)

    Note that the work is not done yet. We need to set up password-less login from the data nodes to the name node also. At this point you will be able to log into the data nodes from the namenode with ssh root@datanodeName/IP. So, log into the data nodes one by one and follow the procedure above to set up password-less ssh login from each node to the name node. Once this is done, restart all machines using the command init 6. It’s awesome when you control so many machines from one!

  5. Other configurations -
    Disable SELinux: vi /etc/selinux/config and set SELINUX=disabled
    chkconfig iptables off (a restart of the nodes will be needed after this)

    We have the following machines: n37 as the namenode and n33, n36, n38 and n39 as the datanodes

  6. Java installation. Check which Java versions are supported by Cloudera. In our case, we chose to install the latest version of Java 1.7. It was 1.7.0_71

    Do the following on all nodes -
    Download the java rpm for 64 bit java 1.7 on any of the machines. Use the scp command like this to transfer this rpm to all machines -
    scp /root/Downloads/jdk_xyz_1.7.0_71.rpm root@
    On all nodes -
    yum remove java (to remove earlier java versions)
    rpm -ivh /root/Downloads/jdk_xyz_1.7.0_71.rpm
    Running this rpm should install Java. Now we need to set the paths correctly. Note that Cloudera requires java to be installed in the folder /usr/java/jdk_xyz_1.7.0_nn, where nn is the version number (71 in our case).
    Now, there is no point in setting the environment variable with something like export JAVA_HOME=whatever. Why? You will see that this variable is reset in each new bash session. So, instead, make a file like this -

    vi /etc/profile.d/

    type the following in -

    export PATH JAVA_HOME
    export CLASSPATH=.
    save the file

    make the file executable by -
    chmod +x /etc/profile.d/

    run the file by
    source /etc/profile.d/

    Now, check java -version and which java. Everything should be good

    Refer to —‑1–8‑centos‑6–5/  and

  7. Now we are ready to begin the next bat­tle — actu­al Cloud­era Hadoop instal­la­tion!
    If you use Linux on a regular basis then reading documentation should already be second nature to you. In case it’s not, make it so
    No matter how complete any article is, you should always refer to the actual documentation. In this case, we are trying to set up CDH 5.1.x (Cloudera Distribution for Hadoop version 5.1.x), and its documentation is on the Cloudera site. If you can’t find it there, you would still be able to find it with a simple google search! We would be going for the automated installation of CDH (which is also what Cloudera recommends if you read their documentation, where it is documented as the ‘Path A’ installation).

  8. You should now have python 2.6 or 2.7 installed on your machine. Check if it’s already there by typing which python and python --version. If it’s not there, then su -c ‘rpm -Uvh‑5–4.noarch.rpm

    yum install python26
    Download cloudera-manager-installer.bin from the Cloudera site.
    chmod u+x cloudera-manager-installer.bin (give execute permission to the installer)
    sudo ./cloudera-manager-installer.bin (run the installer)

    When the installation completes, the complete URL for the Cloudera Manager Admin Console is displayed, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
    Username: admin Password: admin

  9. From this point onwards, you are on your own to set up the cluster. I will point out a few things here that I remember from our experience of cluster setup -

    The entry for server_host in /etc/cloudera-scm-agent/config.ini should be the IP of your namenode, and on the namenode itself it should be ‘localhost’. In most cases, we found that the problem was either with our /etc/hosts file (which is why I mentioned the exact entries we had for our namenode and datanodes) or with JAVA_HOME.

    We used the ‘password’ option when Cloudera asked us how it should log into the nodes (this is why we said to make all root passwords the same, ‘cloudera’ in our case). We also used the ‘embedded DBs’ option, which means that we did not install any PostgreSQL or MySQL databases on our own but let Cloudera do that for us, and we simply noted down the username, password, port numbers, etc. These databases are needed to hold metadata for things like Impala, Hive, etc. Other than this, we chose to install ‘Custom Services’ because we faced some problems installing Spark, and we learned later that this was due to a known issue in which we had to transfer a jar file from the spark folder into HDFS. So, we chose not to install Spark right away. We can anyway do it later. So, we went for the installation of HDFS, Impala, Sqoop2, Hive, YARN, ZooKeeper, Oozie, HBase and Solr.

    One problem we faced was that even though we had told Cloudera that we had our own Java installed, it still went ahead and installed its own java on our machines and used its path instead of ours. To fix this, go to the Cloudera Manager home page, click on the ‘Hosts’ tab, click on ‘Configuration’ and go to ‘Advanced’. There you will find an entry by the name ‘Java Home Directory’. Change it to point to the path of your installation. In our case, we pointed it to ‘/usr/java/jdk1.7.0_71’.

    Another thing I’d like to mention is that the CM web UI has some bugs. For example, when trying to re-configure the services we wanted to install, even though we had deselected a few services, it would still show them for installation. For this we had to simply delete the cluster and restart the whole process. In fact, we deleted and re-made the cluster many times during our cluster setup! So, don’t be afraid of this: go ahead and delete the whole cluster and redo it if you’re facing issues. Also, please restart the machines and the cluster often (or when in doubt), as you never know when something is not reflecting the latest changes.

    Our namenode directory points to /dfs/nn2 in hdfs rather than the default /dfs/nn, because for some reason our /dfs/nn was corrupted when we had tried to move a Spark assembly file to hdfs, and it prevented the namenode from getting formatted (even though we tried to delete this file).

    I’d also like to mention that we changed the node-wise configuration suggested by Cloudera. Basically, it was suggesting that we make 33 our namenode (even though there was no reason for this). Now, Cloudera had already been installed on the machine from which we were running Cloudera Manager. So, to avoid any issues, we just made sure that this machine has all the services that Cloudera was suggesting for 33. Basically, we reversed the roles for 33 and 37!

    As of now, our cluster is up and running. There might still be some glitches because we haven’t verified the cluster by running anything. I’ll post the changes we make as we make them.


Making your subdomain respect index.html, index.php, etc.


I am not sure about you but I faced an issue when I was try­ing to add my resume as a sep­a­rate sub­do­main on my domain. The prob­lem was that when I added the index.html page into the sub­do­main fold­er and tried to browse the site — it just did­n’t work! Although the main site also had an index.html and it worked per­fect­ly fine, the sub­do­main was not respect­ing the spe­cial posi­tion that’s to be enjoyed by index.html. After some search­ing on the web I found that I had to mod­i­fy the .htac­cess present in my root. I just added the fol­low­ing line -

DirectoryIndex index.html index.php

and then it all worked just fine.

How to debug your tests without Re# in a jiffy

Resharper is a crutch






1. In Visual Studio, right click on the NUnit Test project and select the Properties option.
2. Go to the Debug tab and select the ‘Start External Program’ radio button. Browse to packages\NUnit.Runners.2.6.2\tools\nunit.exe
3. In ‘Command line arguments’ enter the test DLL (for example, XYZ.ABC.Tests.dll). Don’t forget the .dll
4. Make the test project the startup project by right-clicking on it in the Solution Explorer and selecting Set as Startup Project.
5. Hit F5 to launch NUnit, select and run the appropriate test method to be debugged, and enjoy the debugging goodness.

Note - Due to incompatibilities between NUnit versions and .NET versions, you may still not be able to debug. In that case, go to packages\NUnit.Runners.2.6.2\tools, find nunit.exe.config and edit the <startup> element to be
<startup useLegacyV2RuntimeActivationPolicy="true">
<!-- Comment out the next line to force use of .NET 4.0 -->
<requiredRuntime version="4.0.30319" />
</startup>
and edit nunit-agent.exe.config’s <startup> to be
<startup useLegacyV2RuntimeActivationPolicy="false">
<supportedRuntime version="v4.0.30319" />
</startup>