hbase not-so-quick start

I wanted to play with Apache HBase so I downloaded v0.94.2 to a ubuntu VirtualBox and followed the quick start.  But it didn’t start at all.  The log file had exceptions similar to the following:

2012-11-15 09:37:28,728 INFO org.apache.hadoop.ipc.HBaseRPC: Server at localhost/ could not be reached after 1 tries, giving up.
 2012-11-15 09:37:28,732 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to localhost,40408,1352990244709, trying to assign elsewhere instead; retry=0
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to localhost/ after attempts=1
 at org.apache.hadoop.hbase.ipc.HBaseRPC.handleConnectionException(HBaseRPC.java:291)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:259)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1313)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1269)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1256)
 at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:550)
 at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:483)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1640)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1363)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1338)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1333)
 at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:2212)
 at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:632)
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:529)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
 at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
 at $Proxy12.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:335)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:312)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:364)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
 ... 15 more
 2012-11-15 09:37:28,732 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052

After trying a few different things, I found this blog post.  The post was using Cloudera releases, but the symptoms were very similar to what I experienced.  It presented two approaches to solving the issue — either commenting out the “ COMPNAME” line in /etc/hosts, or adding a property “-Djava.net.preferIPv4Stack=true” to HADOOP_OPTS in hadoop-env.sh.  I tried the first approach by commenting out all the COMPNAME lines in /etc/hosts.  In my case, there were 2 such lines, one resolved to localhost and the other resolved to evabuntu (the name I gave to my virtual machine).

When I tried to start HBase again, I got a different exception:

2012-11-15 10:06:22,062 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMasterevabuntu
at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:134)
at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:197)
at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:147)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:140)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1806)
Caused by: java.net.UnknownHostException: evabuntu: evabuntu
at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:185)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:241)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:215)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:131)
... 7 more
Caused by: java.net.UnknownHostException: evabuntu
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
at java.net.InetAddress.getLocalHost(InetAddress.java:1434)
... 15 more

This time, the exception was much easier to understand.  HBase picked up the name “evabuntu” from /etc/hostname but wasn’t able to resolve the IPv6 because there was no entry for it in /etc/hosts.  So all I had to do was add an IPv6 entry for it in /etc/hosts.

First use ifconfig to find the IPv6

> /sbin/ifconfig

eth0      Link encap:Ethernet  HWaddr 08:00:27:f4:8f:83
inet addr:  Bcast:  Mask:
inet6 addr: fe80::a00:27ff:fef4:8f83/64 Scope:Link
RX packets:66784 errors:0 dropped:0 overruns:0 frame:0
TX packets:38494 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:86804874 (86.8 MB)  TX bytes:2845363 (2.8 MB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:df:4d:e0
inet addr:  Bcast:  Mask:
inet6 addr: fe80::a00:27ff:fedf:4de0/64 Scope:Link
RX packets:93 errors:0 dropped:0 overruns:0 frame:0
TX packets:92 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:11166 (11.1 KB)  TX bytes:12582 (12.5 KB)

lo        Link encap:Local Loopback
inet addr:  Mask:
inet6 addr: ::1/128 Scope:Host
RX packets:9652 errors:0 dropped:0 overruns:0 frame:0
TX packets:9652 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:971333 (971.3 KB)  TX bytes:971333 (971.3 KB)

in my case, I wanted to use eth1, but you may pick the IPv6 from any interface.  So I copied and pasted the inet6 address fe80::a00:27ff:fedf:4de0 to /etc/hosts

> cat /etc/hosts

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
fe80::a00:27ff:fedf:4de0 evabuntu

and HBase started.

Then I tried the second approach outlined in that blog post, which was to force Java to use IPv4.  The example in the post added the property to HADOOP_OPTS in conf/hadoop-env.sh, but in the case of Apache HBase, it needed to be added to HBASE_OPTS in conf/hbase-env.sh.  But this alone wasn’t enough because as I mentioned earlier, the /etc/hostname says evabuntu, so I also needed to add an entry for evabuntu in /etc/hosts.  This would be the same drill as above, except this time, you pick out the IPv4 assigned to each network interface instead of IPv6.



The mystery of Windows Experience Index fail to assess disk performance

I have posted this before on another blog space, but decide to repost it here as I intend to use this blog exclusively from now on.

So I bought a new Dell XPS laptop with core i7 running 64-bit Windows 7 almost a year ago.  After getting everything up and running, I couldn’t get Windows Experience Index (WEI) to compute the score and it kept failing at the very last step which was assessing the disk performance, saying something about data being invalid.  Granted this number doesn’t mean much, but still I felt as if something was missing.

To add to the confusion, the disk went bad within just a week (says a lot about the quality of XPS, which is supposed to be a high-end product line).  So at first, it seemed to make sense and I thought no wonder, because the disk was bad.  Then Dell sent out a technician and replaced the disk with a refurbished one (just like buying a new car, as soon as the machine left their warehouse, it started to lose its value and would only deserve refurbished replacement parts), but WEI still would not work.

Searching around a bit online and found a couple of other folks encountered the same problem, but there was no clear explanations nor solutions.  Finally I decided to bite the bullet and sent a help request to Microsoft’s customer support team.  The guy who got back to me immediately started to blame third-party software.  There are three words exactly describing how I felt when I saw his response: predictable, ignorant, and funny.  I guess it was their SOP to immediately pointing the finger at some unspecified “3rd party”, but what he didn’t know was that every time I get a new computer, I wipe out what came with it and do a clean installation using my own OS (a sincere thank to the deal between my university and Microsoft, we get Windows Ultimate edition at an extremely low price), so it was a clean installation of Windows 7 without any other software on it.  Even if it were true, that is, some 3rd party software indeed caused the problem, it didn’t make WEI look any better because it just show how vulnerable WEI can be.

As everything else seemed to be working fine, I decided to let it be since this WEI number really doesn’t mean anything.  Then just a few month ago, I saw another post complaining about this problem, but this time, the gentleman also found the cause.

Just like what I typically do, this gentleman re-installed his own Windows 7, but more importantly, he also moved all User directories to a different partition than C:\, and created symlinks on C:\ to point to the actual directories.  As it turns out, while the symlinks work fine for other applications, it does not work for WEI.  Instead, WEI relies on two environment variables TMP, and TEMP, to dump its I/O data, and by default, both TMP and TEMP point to the temporary directory within the AppData directory under each user.  Since the user directories are relocated to a different partition, and WEI doesn’t honor symlinks, it is unable to find the physical directory pointed to by the env vars and thus the error.  As soon as I reconfigured these two env vars to point to the physical temporary directories instead of through the symlink, WEI started to work.

So yes, keep blaming the 3rd party software.