As the Hive metastore becomes the central nervous system for different SQL engines such as Shark and Impala, it is getting equally difficult to distinguish the type of a table created in the Hive metastore. For example, if we create an Impala table using the Impala shell, you will see the same table at the Hive prompt, and vice versa. See the example below.
Step 1 : “Create Table” in Impala Shell and “Show Table” (more...)
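The step above can be sketched from the command line. This is a minimal sketch assuming a cluster where Hive and Impala point at the same metastore; the table name `impala_t1` is made up for illustration:

```shell
# Sketch: a table created via impala-shell is visible from the Hive CLI,
# because both engines read table definitions from the one shared metastore.
# Hypothetical table name "impala_t1"; needs impala-shell and hive on PATH.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -q "CREATE TABLE impala_t1 (id INT, name STRING)"  # Step 1: create in Impala
  hive -e "SHOW TABLES LIKE 'impala_t1'"                          # Step 2: same table in Hive
else
  echo "impala-shell not on PATH; commands shown for illustration only"
fi
metastore_demo=done
```

The reverse also holds: a table created at the Hive prompt shows up in impala-shell (on many Impala releases only after an INVALIDATE METADATA).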
While building a data flow to replace one of the EDW workflows using the Big Data technology stack, I came across some interesting findings and issues. Due to the UPSERT (INSERT new records or UPDATE existing ones) nature of the data we had to use HBase, but to expose the outbound feed we needed to do some calculations on HBase and publish the result to Hive as an external table. Even though conceptually it is easy to create an (more...)
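The Hive side of that setup can be sketched as below, assuming the hive-hbase-handler jar is on Hive's classpath; the Hive table `feed_outbound`, HBase table `feed_hbase` and column family `cf` are hypothetical names:

```shell
# Expose an existing HBase table to Hive as an external table.
# Hypothetical names: Hive table "feed_outbound", HBase table "feed_hbase",
# column family "cf". The ":key" entry maps the HBase row key.
if command -v hive >/dev/null 2>&1; then
  hive -e "
    CREATE EXTERNAL TABLE feed_outbound (rowkey STRING, amount DOUBLE)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:amount')
    TBLPROPERTIES ('hbase.table.name' = 'feed_hbase');"
else
  echo "hive not on PATH; DDL shown for illustration only"
fi
hbase_hive_demo=done
```

Dropping the external table later removes only the Hive definition; the underlying HBase table and its data stay in place.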
While looking into an HBase performance issue, one of the suggestions was to have more regions for a larger table. There was some confusion around "Region" vs "RegionServer". While doing some digging, I found the simple explanation quoted below.
The basic unit of scalability and load balancing in HBase is called a region. Regions are essentially contiguous ranges of rows stored together. They are dynamically split by the system when they become too large. Alternatively, they may (more...)
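Rather than waiting for those dynamic splits, regions can also be created up front. A hedged sketch of pre-splitting a table into four regions from the HBase shell; the table name, column family and split points are made up for illustration:

```shell
# Pre-split a hypothetical "usertable" into four regions at fixed
# row-key boundaries, instead of waiting for HBase to split it
# dynamically as the table grows.
if command -v hbase >/dev/null 2>&1; then
  echo "create 'usertable', 'cf', SPLITS => ['row250','row500','row750']" | hbase shell
else
  echo "hbase not on PATH; command shown for illustration only"
fi
presplit_demo=done
```

With four regions from day one, writes can spread across up to four RegionServers immediately instead of hammering a single initial region.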
With increasing data volumes, space in HDFS can be a continuing challenge. While running into a space-related issue, the following command came in very handy, hence I thought of sharing it with the extended virtual community.
hadoop dfsadmin -report
Below is the result of running the command; it covers all the nodes in the cluster and gives a detailed break-up of the space available and the space used.
Configured Capacity: 13965170479105 (12.70 TB)
Present Capacity: 4208469598208 (more...)
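The raw byte counts are handy for quick arithmetic too. A small awk sketch that turns the two headline numbers from the report above into a utilisation percentage (sample text stands in for the live report output):

```shell
# Compute present capacity as a percentage of configured capacity,
# using the byte counts from a dfsadmin report (sample values above).
report="Configured Capacity: 13965170479105
Present Capacity: 4208469598208"

configured=$(printf '%s\n' "$report" | awk '/Configured Capacity/ {print $3}')
present=$(printf '%s\n' "$report" | awk '/Present Capacity/ {print $3}')
pct=$(awk -v c="$configured" -v p="$present" 'BEGIN {printf "%.0f", 100 * p / c}')
echo "Present capacity is ${pct}% of configured capacity"
```

On a live cluster you would pipe `hadoop dfsadmin -report` straight into the awk commands instead of the sample text.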
Over the past few months I have been meeting with clients to discuss their potential need for Big Data. The discussion gets to the bottom of: do they really need Big Data? My ITNext article linked below talks about how, as big data gets bigger, IT managers are challenged with the task of identifying data that qualifies as big and finding appropriate solutions to process it.
Click Here To Read Full Article (more...)
While doing a comparative analysis for building a Big Data reference architecture, I stumbled on a very impressive open-source Big Data technology mashup. Thanks to http://www.bigdata-startups.com/. The most impressive part of this mashup is that it breaks the whole Big Data operational paradigm into multiple stages and lists the available open-source technology for each.
Hope This Helps
Sunil S Ranka
“Superior BI is the antidote to Business Failure”
Recently, at one of our clients, we had a situation where Hadoop was taking a lot longer than anticipated to generate a file. The Endeca graph needed the file as an input, but since the file was not getting generated on time, the graph was picking up a partially created file, causing data issues. After looking into the issue, the best bet was to have a task dependency, so we looked into running CloverETL (more...)
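A common pattern for this class of problem is a write-then-rename handshake. The sketch below uses made-up paths on the local filesystem, but the same idea applies in HDFS, where a rename (`hadoop fs -mv`) is atomic, so the consumer never sees a half-written file:

```shell
# Producer: write to a temporary name, rename only once the file is
# complete, then drop a _SUCCESS marker for the consumer to wait on.
outdir=$(mktemp -d)

printf 'row1\nrow2\n' > "$outdir/feed.csv._tmp"  # still being written
mv "$outdir/feed.csv._tmp" "$outdir/feed.csv"    # atomic publish
touch "$outdir/_SUCCESS"                         # "file is ready" marker

# Consumer (e.g. the Endeca graph) reads only after the marker appears
if [ -f "$outdir/_SUCCESS" ]; then
  lines=$(wc -l < "$outdir/feed.csv" | tr -d ' ')
  echo "consumed $lines rows"
fi
```

This is the same convention MapReduce jobs follow: a `_SUCCESS` file in the output directory signals that the job finished writing.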
For the last few weeks I have been engaged with a customer, helping them with the remediation of an Endeca project. During remediation we faced a typical challenge, where all the graphs and EQLs were erroring out. After doing some research, I found out that it is a known issue. I spent a good amount (more...)
Wishing all you readers a very happy new year!! 2013 is over and the dawn of 2014 has arrived. It feels just like yesterday, and now here we are, sitting and waiting for the year number to change. As I am writing this blog, Australia, Mumbai and Dubai (more...)
With volume being the constant challenge in Big Data, as an administrator you will have to keep a tab on data growth, and at the same time make sure there is no surge in the growth of unwanted objects or folders. Typically you would want to be worried about the (more...)
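Two commands that help with exactly this kind of watch; the paths below are hypothetical:

```shell
# Roll up space used per top-level directory, and check quotas and
# file counts on a directory that tends to grow (hypothetical paths).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -du /user            # per-directory space break-up
  hadoop fs -count -q /user/etl  # quotas, directory and file counts, bytes
else
  echo "hadoop not on PATH; commands shown for illustration only"
fi
space_watch_demo=done
```

Running these on a schedule and diffing the numbers day over day makes a sudden growth spike in one folder stand out immediately.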