Every data platform has its value, and deciding which one will work best for your big data objectives can be tricky. Alex Gorbachev, Oracle ACE Director, Cloudera Champion of Big Data, and Chief Technology Officer at Pythian, has recorded a series of videos that compare the various big data platforms and present use cases to help you identify which ones will best suit your needs.
Riak and Oracle are completely different platforms. Alex explains that “Oracle database (more...)
These days Hortonworks, with their IPO, and Cloudera, sitting on $1bn of cash, grab all the headlines. However, the real visionary in the field is someone else. Someone blasting the previous world record in TeraSort. A Hadoop distribution on both Amazon Web Services and the Google Compute Engine. A company that Google is invested in. While their competitors have been in skirmishes with each other, MapR has been quietly working away and innovating.
MapR-FS: Features and (more...)
How has the interest in Big Data, Hadoop, Business Intelligence, Analytics and Dashboards changed over the years?
One easy way to gauge the interest is to measure how much news is generated for each related term, and Google Trends allows you to do that very easily.
Plugging all of the above terms into Google Trends and analyzing the results further leads to the following visualizations.
Aggregating the results by year
It is very amazing to see (more...)
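The yearly aggregation mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not the analysis behind the original charts: it assumes the Google Trends data has already been reduced to (ISO date, interest score) pairs, and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_year(rows):
    """Average interest scores per calendar year.

    rows: iterable of (iso_date, score) pairs, e.g. ("2012-03-04", 41).
    Returns a dict mapping year -> mean score, ordered by year.
    """
    by_year = defaultdict(list)
    for iso_date, score in rows:
        year = int(iso_date[:4])  # "YYYY-MM-DD" -> YYYY
        by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


# Hypothetical sample values, for illustration only.
sample = [
    ("2012-01-01", 10),
    ("2012-06-03", 20),
    ("2013-01-06", 40),
    ("2013-07-07", 60),
]
yearly = aggregate_by_year(sample)
print(yearly)
```

Averaging is only one choice here; summing or taking the yearly peak would answer slightly different questions about how interest moved.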
In the past 2 years, I have met many developers and architects who are working on “big data” projects. This sounds amazing, but quite often the truth is not that amazing.
You believe that you have a big data project?
Do not start with the installation of a Hadoop cluster -- the "how"
Start by talking to business people to understand their problem -- the "why"
Understand the data you must
Information Technology units will continue to be challenged by the unbridled growth of their organization’s data stores. An ever-increasing amount of data needs to be extracted, cleansed, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity. New trends, products, and strategies, (more...)
Let us move on from Grass Eating Sauropods and talk about who’s who in the analytic space.
For every dime there are a dozen analytic companies. Everybody who provides a freaking dashboard is an analytic company. Anybody that merely mentions Google, Facebook, Hadoop, etc. in the same sentence is somehow into BigData. Haven’t you stumbled across company pages where they claim to be experts in analytics and big data but they want you to schedule a (more...)
Last week I attended Oracle OpenWorld 2014, and it was an outstanding event filled with great people, awesome sessions, and a few notable experiences.
Personally I thought the messaging behind the conference itself wasn’t as amazing and upbeat as OpenWorld 2013, but that’s almost to be expected. Last year there was a ton of buzz around the introduction of Oracle 12c, Big Data was a buzzword that people were totally excited (more...)
I will give a presentation on 24 September at the Jury’s Inn in Dublin on the next generation of Big Data 2.0 tools and architecture.
Over the last two years there have been significant changes and improvements in the various Big Data frameworks. With the release of YARN (Hadoop 2.0), the most popular of these platforms now allows you to run mixed workloads. Gone are the days when Hadoop was only good for (more...)
For an organization to respond in real-time it needs to acquire or develop systems that can respond in real-time. Such systems need to be able to rapidly determine that a response is required and determine also what the appropriate and relevant response should be – they need to decide when and how to act. These kinds of decision-making systems are known as Decision Management Systems. To ensure that a response is delivered in real-time, more (more…)
ODI and the Hadoop ecosystem share a common design philosophy: bring the processing to the data rather than the other way around. Sounds logical, doesn’t it? Why move terabytes of data around your network if you can process it all in one place? Why invest millions in additional servers and hardware just to transform and process your data?
In the ODI world this approach is known as ELT. ELT is a marketing concept (more...)
Permission issues are among the key errors encountered while setting up a Hadoop cluster. While debugging one such error, I found the table below on http://hadoop.apache.org/. It’s a good scorecard to keep handy.
Permissions for both HDFS and local filesystem paths
The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:
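A check against such a scorecard can be scripted for the local filesystem side. The sketch below is an illustration only and is not taken from the Hadoop documentation: the `EXPECTED` mapping, its path, and its mode are hypothetical placeholders that you would fill in from the table for your own cluster.

```python
import os
import stat
import tempfile


def find_permission_mismatches(expected):
    """Return {path: (actual_mode, wanted_mode)} for paths whose mode differs.

    expected: mapping of local path -> wanted permission bits, e.g. 0o755.
    Paths that do not exist are reported with actual_mode set to None.
    """
    mismatches = {}
    for path, wanted in expected.items():
        try:
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            mismatches[path] = (None, wanted)
            continue
        if actual != wanted:
            mismatches[path] = (actual, wanted)
    return mismatches


# Demo on a throwaway directory so the sketch runs anywhere (POSIX assumed).
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)

# Hypothetical expectation; replace with paths and modes from the table.
EXPECTED = {demo_dir: 0o755}

for path, (actual, wanted) in find_permission_mismatches(EXPECTED).items():
    shown = "missing" if actual is None else oct(actual)
    print(f"{path}: found {shown}, expected {oct(wanted)}")
```

This covers local paths only; HDFS paths would need a separate check through the HDFS tooling, which is outside the scope of this sketch.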
In every change there are hype machines that overplay and sages who call doom. Into the Big Data arena steps David Searls to proclaim, in an article over at ZDNet, that Big Data is a myth and simply hype that is set to burst.
But big data, he said, is nothing more than the myth that collecting vast amounts of data can help companies know customers better than those customers even know
My paper on NoSQL and Big Data won the Editor’s Choice award at ODTUG Kscope14. Here are some key points from the paper: The relational camp made serious mistakes that limited the performance and usefulness of the relational model. NoSQL is based on the incorrect premise that tables in the relational model must be mapped to […]
Oracle Scene (the publication of the United Kingdom Oracle Users Group) has published my article "Hadoop for Oracle Professionals", where I have attempted, like many others, to demystify terms such as Hadoop, Map/Reduce and Flume. If you are interested in Big Data and all that comes with understanding it, you might find it useful.
A PDF version of the article can be downloaded here: http://www.proligence.com/art/oracle_scene_summ14_hadoop.pdf
I'm going to state a sacrilegious position for a moment: the quality of data isn't a primary goal in Master Data Management.
Now, before the perfectly correct 'Garbage In, Garbage Out' statement, let me explain. Data Quality is certainly something that MDM can help with, but it's not actually the primary aim of MDM.
MDM is about enabling collaboration, collaboration is about the cross-reference
There is a massive amount of IT hype that is focused on what people see; it's about the agile delivery of interfaces, about reporting, visualisation and interactional models. If you could weigh hype then it is quite clear that 95% of all IT is about this area. It's why we need development teams working hand-in-hand with the business; it's why animations and visualisation are massively important.
Sqoop, Flume, Pig, Zookeeper. Do these mean anything to you? If they do then the odds are you are looking at Hadoop. The thing is that while that was cool a few years ago, it really is time to face the fact that HDFS is a commodity, MapReduce is interesting but not feasible for most users, and the real question is how we turn all that raw data in HDFS into something we can actually (more...)
Hi Fellow Big Data Admirers,
With big data and analytics playing an influential role in helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools.
My latest print media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about the need for a reference architecture in (more...)
Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away. The analyst predictions, however, have understandably been based on the mentality that the choice was between a traditional EDW/DW model and Hadoop. With the new 'Business Data Lake' type of hybrid approach it's pretty clear that the shift is underway for all vendors to have a hybrid
As the Hive metastore becomes the center of the nervous system for different types of SQL engines like Shark and Impala, it is getting equally difficult to distinguish the type of table created in the Hive metastore. E.g., if we create an Impala table using the Impala shell, you will see the same table at the Hive prompt, and vice versa. See the example below.
Step 1 : “Create Table” in Impala Shell and “Show Table” (more...)