The Hive metastore is becoming the central nervous system for different types of SQL engines such as Shark and Impala, and it is getting equally difficult to distinguish the type of table created in the Hive metastore. For example, if we create an Impala table using the Impala shell, you will see the same table at the Hive prompt, and vice versa. See the example below.
Step 1: “Create Table” in Impala Shell and “Show Table” (more...)
Oracle recently published their view on the Top Ten Trends in “Big Data” & “Analytics” for 2014. Find the list below:
1. Business Users Get Hooked on Mobile Analytics –> Oracle Business Intelligence Mobile App Designer
2. Analytics Take to the Cloud –> Oracle Applications Cloud
3. Hadoop-Based Data Reservoirs Unite with Data Warehouses –> Your Data Warehouse and Hadoop – […]
While building a data flow to replace one of the EDW workflows using the Big Data technology stack, I came across some interesting findings and issues. Due to the UPSERT (INSERT new records or UPDATE existing records, depending) nature of the data, we had to use HBase, but to expose the outbound feed we needed to do some calculation on HBase and publish that to Hive as an external table. Even though conceptually it's easy to create an (more...)
While looking into an HBase performance issue, one of the suggestions was to have more regions for a larger table. There was some confusion around “Region” vs “RegionServer”. While doing some digging, I found the simple explanation below.
The basic unit of scalability and load balancing in HBase is called a region. Regions are essentially contiguous ranges of rows stored together. They are dynamically split by the system when they become too large. Alternatively, they may (more...)
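To make the region idea a bit more concrete, here is a minimal toy sketch in Python. It is not HBase code: the `Region` and `Table` classes are hypothetical, and it splits on a row count (`max_rows_per_region`), which is an assumption made purely for illustration, since HBase itself decides splits based on the size of the stored data. It only shows the two properties described above: each region holds a contiguous range of sorted row keys, and a region is split in two once it grows too large.

```python
from bisect import insort


class Region:
    """Toy region: a contiguous, sorted run of row keys (hypothetical class)."""

    def __init__(self, rows=None):
        self.rows = sorted(rows or [])

    def key_range(self):
        # First and last row key held by this region, or None if empty.
        return (self.rows[0], self.rows[-1]) if self.rows else None


class Table:
    """Toy table that splits a region once it exceeds a row-count threshold.

    Assumption: the row-count threshold stands in for the size-based
    threshold a real system would use.
    """

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        self.regions = [Region()]  # a new table starts with a single region

    def _find_region(self, key):
        # Pick the last region whose first key is <= key (else the first one).
        target = 0
        for i, region in enumerate(self.regions):
            if region.rows and region.rows[0] <= key:
                target = i
        return target

    def put(self, key):
        idx = self._find_region(key)
        region = self.regions[idx]
        if key not in region.rows:
            insort(region.rows, key)  # keep row keys sorted inside the region
        if len(region.rows) > self.max_rows:
            # Split at the middle key into two adjacent regions.
            mid = len(region.rows) // 2
            left = Region(region.rows[:mid])
            right = Region(region.rows[mid:])
            self.regions[idx:idx + 1] = [left, right]


if __name__ == "__main__":
    table = Table(max_rows_per_region=4)
    for n in range(1, 11):
        table.put(f"row{n:02d}")
    for region in table.regions:
        print(region.key_range())
```

In this toy model, "more regions for a larger table" simply means the sorted key space is cut into more contiguous pieces; how those pieces are assigned to RegionServers is a separate question that the sketch does not model.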
With increasing data volume, space in HDFS can be a continual challenge. While running into a space-related issue, the following command came in very handy, so I thought of sharing it with the extended virtual community.
hadoop dfsadmin -report
After running the command, the result is shown below; it covers all the nodes in the cluster and gives a detailed breakdown of the space available and the space used.
"Real-time" is a word that gets thrown about a lot in IT, and it's worth documenting a few of the different ways it gets used.
This is what Real-time Java was created to address (along with Soft Real-time). What is this? The easiest way to say it is that often in Hard Real-time environments the following statement is true:
If it doesn't finish in X milliseconds then people might die
The Big Data presentation I gave yesterday is now available for download. In this presentation I define some common features of Big Data use cases, explain what the big deal about Big Data is all about and explore the impact of Big Data on the traditional data warehouse framework.
There are various views going around on what a Data Scientist is, what their value is to an organisation and the salaries they command. To me, however, asking 'What is a Data Scientist?' is like asking 'What is a Physicist?' Sure, 'someone who studies Physics' might be a factually accurate but pointless definition. How does that separate someone who did Physics in High School from Albert
One of the things that always stuns me in IT is how people don't appear to like change. Whether it was the EAI folks pushing back on Web Services in 2000 in favour of their old-school approaches, the package guys pushing back against SaaS, or now the BI guys pushing back against the new wave of BI technologies and approaches, the message is always the same:
We are happy doing what we are doing,
I can smell a change coming. The last few years have seen cloud and SaaS on the rise, a fragmentation in application development (thanks in large part to the appalling stewardship of Java) and a real focus of budgets around BI and 'vanilla' package approaches. Now this is a good thing, both because I jumped out of the Java boat onto the BI boat a few years ago but also because it's
The end of the next Software Development wave will be when software development again 'eats itself', as it did with technologies like Hadoop showing a new value in information, with platforms like SFDC showing new pre-built services, where people like GoodData have turned BI into SaaS. So we will see the same evolution again and a new generation of commoditisation which drives
This is the stage at which software development begins to commoditise itself; it's no surprise that underneath all that Salesforce.com scripting lurked rather a lot of Java code. This wave sees the rise of the libraries, the utilities and above all the commoditisation of software in a way that enables the majority of developers to be useful in the enterprise. This was the goal of Spring, JEE
The problem with Wave 1 was that it didn't scale. I mean, sure, lots of the personal developers claimed it did scale, often laughing at large-scale developments and going 'Me and four mates could do that in a couple of weeks'. Often they attempted to do that and suddenly realised that when you get a few people together it gets a bit more complicated, and when that few gets over 20 it begins to (more...)
This is the wave we are in at the moment, and it's the wave that we last saw in the late 90s. This is where technologies enabled single people to build small specific things really quickly. Java and its applets really were the peak of this first wave back then, but now we are seeing people use technologies such as R, Python and others to create small solutions that offer really good point value.
Is your data science providing you with enough indications to challenge your existing compensation strategy? Does it reveal that the art of compensation distribution performed by your managers is not in accordance with your compensation strategy? Old habits die hard, so you need to make sure that your plan for data-driven decision-making is not being overridden by compensation managers’ belief systems and that they are not ignoring data science recommendations.
Even today, the challenge is to effectively distribute (more...)
What’s the Big Deal about Big Data? Hear me speak at OUG Ireland. 11 March 2014. Convention Centre Dublin.
So what’s the Big Deal about Big Data? Oil has fueled the Industrial Revolution. Data will fuel the Information Revolution.
Not convinced? Did you know that Amazon has recently patented a technology based on a Big Data algorithm that will start the shipping process before you have completed your order? That’s right. Amazon knows that you (more...)
There continues to be a disproportionate amount of hype around 'NoSQL' data stores. By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'. I wrote before about 'how NoSQL became more SQL'. The point I made there is now more apparent the more I work with companies on Big Data challenges.
There are three worlds of data
Over the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion gets to the bottom of one question: do they really need Big Data? The link below to my ITNext article talks about this: as big data goes bigger, IT managers are challenged with the task of identifying data that qualifies as big and finding appropriate solutions to process it.
Which came first, Big Data or Fast Data? If you go from a hype perspective you'd be thinking Hadoop and Big Data came first, with in-memory and fast coming after. The reality, though, is the other way around and comes from a simple question:
Where do you think all that Big Data came from?
When you look around at the massive Big Data sources out there, Facebook, Twitter, sensor data,