Permissions for both HDFS and local fileSystem paths

| Jul 18, 2014

Hi All,

Permission issues is one of the key error , while setting up Hadoop Cluster, while debugging some error found below table on http://hadoop.apache.org/ . It’s a good scorecard to keep handy.

 

Permissions for both HDFS and local fileSystem paths

The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:

Filesystem Path User:Group Permissions
local dfs.namenode.name.dir hdfs:hadoop drwx——
local dfs.datanode.data.dir (more...)

Need for Defining Reference Architecture For Big Data

Hi Fellow Big Data Admirers ,

With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools. 

My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about need of reference architecture in (more...)

How to find out a table type in Hive Metastore.

| Apr 10, 2014

Hi All

As Hive metastore is getting into the center of nervous system for the different type of  SQL engines like Shark and Impala. It getting equally difficult to distinguish type of table created in Hive metastore. Eg. if we create a impala table using impala shell you will see the same table on hive prompt and vice versa. See the below example

 

Step 1 : “Create Table” in Impala Shell and “Show Table” (more...)

How To Create External Hive Table on HBase

| Mar 28, 2014

Hi All,

While building a data flow for replacing one of the EDW’ workflow using Big Data technology stack , came across some interesting findings and issues.  Due to  UPSERT ( INSERT new records or UPDATE existing records depending) nature of data we had to use Hbase, but to expose the outbound feed we need to do some calculation on HBase and publish that to Hive as external. Even though conceptually , its easy to create an (more...)

Hbase : Co-relation between RegionServer and Region

| Mar 20, 2014

Hi All

While looking into HBase performance issue, one of the suggestion was to have more region for a larger table. There was some confusion around, “Region” vs “RegionServer” . While doing some digging, found a simple text written below.

The basic unit of scalability and load balancing in HBase is called a region. Regions are essentially contiguous ranges of rows stored together. They are dynamically split by the system when they become too large. Alternatively, they may (more...)

HDFS Free Space Command

| Mar 17, 2014

Hi All

With increasing data  volume , in HDFS space could be continued challenge. While running into some space related issue, following command came very handy, hence thought of sharing with extended virtual community.

hadoop dfsadmin -report

Post running the command, below is the result, it takes all the nodes in the cluster and gives the detail break-up based on the space availability and spaces used.


Configured Capacity: 13965170479105 (12.70 TB)
Present Capacity: 4208469598208  (more...)

Big Data : Right Approach Right Solution

 

Hi All,

Past few months I have been meeting with clients and discussing their potential need of Big Data. The discuss gets to the bottom of , do they really need the Big Data ? The below link to my ITNext article talks about As big data goes bigger,IT managers are challenged with the task of identifying data that qualifies for big and finding appropriate solutions to process it.

Click Here To Read Full Article (more...)

Open Source Big Data Technologies

| Jan 29, 2014

Hi All

While doing a comparison analysis for building a reference architecture for Big Data technology stumbled on a very impressive Open source Big Data Technology mashup . Thanks to http://www.bigdata-startups.com/ . The most impressive part of this mashup is breaking the whole Big Data operational paradigm into multiple stages and giving available opensource technology.

Open Source Big Data Techonologies

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Run Graph In Endeca Outside of Endeca Server

| Jan 15, 2014

Hi All

Recently at one of the client,  we had a situation , where in Hadoop was taking lot longer that anticipated time to generate a file. The graph needed the file as an input, but since file was not getting generated on time, Endeca graph was picking up partially created file, causing data issue.  After looking into the issue, the best bet was to have a task dependency, we looked into running clover ETL (more...)

How To retrieve/backup Views In Endeca

Hi All,

Last few weeks I have been engaged with a customer, helping them them with remediation of Endeca project. During remediation, faced a typical challenge, where all the graphs and EQLs were erroring out. After doing some research found out that its a known issue . I spent good amount (more...)