As Hive metastore is getting into the center of nervous system for the different type of SQL engines like Shark and Impala. It getting equally difficult to distinguish type of table created in Hive metastore. Eg. if we create a impala table using impala shell you will see the same table on hive prompt and vice versa. See the below example
Step 1 : “Create Table” in Impala Shell and “Show Table” (more...)
While building a data flow for replacing one of the EDW’ workflow using Big Data technology stack , came across some interesting findings and issues. Due to UPSERT ( INSERT new records or UPDATE existing records depending) nature of data we had to use Hbase, but to expose the outbound feed we need to do some calculation on HBase and publish that to Hive as external. Even though conceptually , its easy to create an (more...)
While looking into HBase performance issue, one of the suggestion was to have more region for a larger table. There was some confusion around, “Region” vs “RegionServer” . While doing some digging, found a simple text written below.
The basic unit of scalability and load balancing in HBase is called a region. Regions are essentially contiguous ranges of rows stored together. They are dynamically split by the system when they become too large. Alternatively, they may (more...)
With increasing data volume , in HDFS space could be continued challenge. While running into some space related issue, following command came very handy, hence thought of sharing with extended virtual community.
hadoop dfsadmin -report
Post running the command, below is the result, it takes all the nodes in the cluster and gives the detail break-up based on the space availability and spaces used.
Configured Capacity: 13965170479105 (12.70 TB)
Present Capacity: 4208469598208 (more...)
"Real-time" its a word that gets thrown about a lot in IT and its worth documenting a few of the different ways it gets used
This is what Real-time Java was created to address (along with Soft Real-time) what is this? Easiest way to say it is that often in Hard Real-time environments the following statement is true
If it doesn't finish in X milliseconds then people might die
There are various views going around on what a Data Scientist is and what their value is to an organisation and the salaries they command. To me however asking 'what is a Data Scientist?' is like asking 'What is a Physicist?' sure 'someone who studies Physics' might be a factually accurate but pointless definition. How does that separate someone who did Physics in High School from Albert
One of the things that always stuns me in IT is how people don't appear to like change. Whether it was the EAI folks pushing back on Web Services in 2000 in favour of their old-school approaches. The package guys pushing back against SaaS or now the BI guys pushing back against the new wave of BI technologies and approaches the message is always the same:
We are happy doing what we are doing,
I can smell a change coming, the last few years have seen cloud and SaaS on the rise and seen a fragmentation in application development (thanks in a large part to the appalling stewardship of Java) and a real focus of budgets around BI and 'vanilla' package approaches. Now this is a good thing, both because I jumped out of the Java boat onto the BI boat a few years ago but also because its
The end of the next Software Development wave will be when Software development against 'eats itself' as it did with with technologies like Hadoop showing a new value in information, with platforms like SFDC showing new pre-build services, where people like GoodData have turned BI into SaaS. So we will see the same evolution again and a new generation of commoditisation which drives
Is your data science providing you enough indications that challenge your existing compensation strategy? Does it reveal that the art of compensation distribution performed by your managers is not in accordance with your compensation strategy? Old habits die-hard, so you need to make sure that your plan for data-driven decision-making is not getting overridden by compensation managers’ belief system and they are not ignoring data science recommendations.
Even today challenge is to effectively distribute (more...)
Past few months I have been meeting with clients and discussing their potential need of Big Data. The discuss gets to the bottom of , do they really need the Big Data ? The below link to my ITNext article talks about As big data goes bigger,IT managers are challenged with the task of identifying data that qualifies for big and finding appropriate solutions to process it.
Click Here To Read Full Article (more...)
I’m wearing a Nike Fuelband – one of those fitness/activity tracker gizmos. Nike is offering both a website and an app showing my daily activity. As a customer, I am expecting these two to contain the same data. After all, my bank balance is the same in my mobile banking app, in an ATM or in a web browser.
Unfortunately, Nike does not have a proper infrastructure behind their gadget, so the numbers do not (more...)
There is an interesting article on Forbes where Paul Sonderegger from Oracle is making the case that you have to jump onto the “Big Data” bandwagon without delay if you want to avoid your big-data-using competitors crushing you.
But he would say that, wouldn’t he?
In reality, most companies already (more...)
The Oracle guys running the Big Data 4 the Enterprise Meetup
are always apologetic about marketing. The novelty is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose cool open source software (more...)
On his recent Forbes report, Greg Satell lays down 5 steps to get Big Data working in your business. The first four are very well captured, but it was the fifth that really caught my attention: “Adopt a Big Data Mindset“. This is exactly where I want to drill in (more...)
Yesterday's UKOUG Analytics event
was a mixture of presentations about OBIEE with sessions on the frontiers of data analysis. I'm not going to cover everything, just dipping into a few things which struck me during the day
During the day somebody described dashboards as "Fisher Price activity centres for managers". (more...)
This year’s R User Conference happened in Albacete (Spain), gathering R professionals and enthusiasts all over the world since 2004, when it first began in Vienna. The sponsors this year were REvolution analytics, Google, R-Studio, Oracle, and TIBCO. Other companies like OpenAnalytics and Mango Solutions were also present with a booth stand. Besides sponsoring the (more...)
Big Data – The New Information Before asking the crystal ball what can Big Data do for you, sit back and think about these four questions: Where’s the new information? Where could it be? If it was in the right place, what could happen? (Challenges of the main industries) What are the (more...)