Six things to make your Big Data project succeed

I wrote about why your Hadoop project will fail, so I think it's only right to follow up with some things you can do to make the Big Data project you take on actually succeed. The first thing you need to do is stop trying to make 'Big Data' succeed. Instead, focus on educating the business about the value of information, and then work out how to deliver new (more...)

Fast Generation of Test Rows

A thread in the OTN forum recently turned into a rather interesting discussion about the fastest way to generate a large number of test rows. The starting point was that the original poster was not satisfied with the performance of an INSERT that used connect by level to generate 10 million rows, and was looking for faster options. David Berger suggested a PL/SQL solution with collections and bulk insert, which on the OP's system actually led to (more...)
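The collection-plus-bulk-insert approach mentioned above can be sketched roughly as follows. This is not David Berger's code, which the excerpt does not show; the table name `test_data`, its single `id` column, and the batch size of 100,000 are assumptions chosen for illustration.

```sql
-- Hypothetical target table for the generated test rows
create table test_data ( id number );

declare
  -- Associative array holding one batch of generated ids
  type t_ids is table of number index by pls_integer;
  l_ids    t_ids;
  -- Assumed batch size: keeps the collection (and PGA usage) bounded
  c_batch  constant pls_integer := 100000;
  c_total  constant pls_integer := 10000000;
begin
  -- 100 batches of 100,000 rows each, ids 1 .. 10,000,000
  for b in 0 .. c_total / c_batch - 1 loop
    -- Fill the collection in memory
    for i in 1 .. c_batch loop
      l_ids(i) := b * c_batch + i;
    end loop;
    -- Send the whole batch to the SQL engine as one bulk-bound INSERT
    forall i in 1 .. c_batch
      insert into test_data ( id ) values ( l_ids(i) );
  end loop;
  commit;
end;
/
```

For comparison, the single-statement approach the thread started from is along the lines of `insert into test_data select level from dual connect by level <= 10000000`. Which of the two is faster depends on the system; the result reported in the thread applied to the OP's environment.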

A Restriction of the Cardinality Hint

Here is a restriction of the cardinality hint in conjunction with the materialize hint (note: both are undocumented but sometimes very useful):
we cannot tell the optimizer in the outer query (the one that uses the materialized subquery) about the cardinality of the materialization. This can only be done within the materializing query, and even there not always.

The example that shows this is stolen from Tom Kyte's presentation S13961_Best_Practices_for_Managing_Optimizer_Statistics_Short. (more...)

Issue with updatable views

It's sometimes amazing how many bugs there still are in elementary SQL.

Here is one concerning updatable views:

sokrates@12.1 > create table t ( v varchar2(30) );

Table created.

sokrates@12.1 > create view v as
  2  select v as dontdothatman, v as canbelostwheninserted
  3  from t; 

View created.

sokrates@12.1 > insert /* this is fine */ into v 
  2  values('fine', 'fine');

1 row created.

sokrates@12.1 > select * from v;

DONTDOTHATMAN		        (more...)

The Twelve Days of NoSQL: Day Twelve: Concluding Remarks

On the twelfth day of Christmas, my true love gave to me Twelve drummers drumming. The relational camp put productivity, ease-of-use, and logical elegance front and center. However, the mistakes and misconceptions of the relational camp prevent mainstream database management systems from achieving the performance level required by modern applications. For example, Dr. Codd forbade […]

Six reasons your Big Data Hadoop project will fail in 2014

OK, so Hadoop is the bomb, Hadoop is the schizzle, Hadoop is here to solve world hunger and all problems. Now, I've talked before about some of the challenges around Hadoop for enterprises, but here are six reasons why InformationWeek is right when it says that Hadoop projects are going to fail more often than not.

1. Hadoop is a Java thing, not a BI thing

The first is the most important

The Twelve Days of NoSQL: Day Eleven: Mistakes of the relational camp

On the eleventh day of Christmas, my true love gave to me Eleven pipers piping. Over a lifespan of four and a half decades, the relational camp made a series of strategic mistakes that made NoSQL and Big Data possible. The mistakes started very early. The biggest mistake is enshrined in the first sentence of […]

The Twelve Days of NoSQL: Day Ten: Big Data

On the tenth day of Christmas, my true love gave to me Ten lords a-leaping. The topic of Big Data often comes up when talking about NoSQL, so let's give it a nod. In 1998, Sergey Brin and Larry Page invented an algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search […]

The Twelve Days of NoSQL: Day Nine: NoSQL Taxonomy

On the ninth day of Christmas, my true love gave to me Nine ladies dancing. NoSQL databases can be classified into the following categories: Key-value stores: The archetype is Amazon Dynamo of which DynamoDB is the commercial successor. Key-value stores basically allow applications to “put” and “get” values but each (more...)

The Twelve Days of NoSQL: Day Seven: Schemaless Design

On the seventh day of Christmas, my true love gave to me Seven swans a-swimming. As we discussed on Day One, NoSQL consists of “disruptive innovations” that are gaining steam and moving upmarket. So far, we have discussed functional segmentation (the pivotal innovation), sharding, asynchronous replication, eventual consistency (resulting from (more...)

The Twelve Days of NoSQL: Day Six: The False Premise of NoSQL

On the sixth day of Christmas, my true love gave to me Six geese a-laying. The final hurdle was extreme performance, and that’s where the Dynamo developers went astray. The Dynamo developers believed that the relational model imposes a “join penalty” and therefore chose to store data as “blobs.” (more...)

The Twelve Days of NoSQL: Day Five: Replication and Eventual Consistency

On the fifth day of Christmas, my true love gave to me Five golden rings. By now, you must be wondering when I’m going to get around to explaining how to create a NoSQL database. When I was a junior programmer, quite early in my career, my friends and I were assigned (more...)

Sudden Changes in Query Run Times

Jonathan Lewis has published an interesting list of possible reasons why a query that previously ran considerably faster suddenly slows down. The reverse case is of course also possible, but it tends to generate fewer complaints.

SQL utils using XML

You may have previously seen a short post I did on a SQL statement to identify which statements are using dynamic sampling.

If not, quick recap:

SELECT p.sql_id, t.val
FROM   v$sql_plan p
,      xmltable('for $i in /other_xml/info
                 where $i/@type eq "dynamic_sampling"
                 return $i'
                passing xmltype(p.other_xml)
                columns attr  (more...)

Running PL/SQL Procedures in Parallel

As your data volumes increase, particularly as you evolve into the big data world, you will start to see your Oracle Data Mining scoring functions take longer and longer. Applying an Oracle Data Mining model to new data is a very quick process. (more...)

SQL Plan Management – 12C dumb feature

Oracle introduced SQL Plan Management (SPM) in 11g. It is excellent (I love it to bits). It allows you to create Baselines against SQL which lock down the SQL execution plan. No more plan flips. More consistency. Perfect.

Whenever some Baselined SQL is run, Oracle still parses it and compares (more...)
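As a hedged illustration of the baseline creation described above (not taken from the post itself), a baseline can be created for a statement that is already in the cursor cache with `DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE`. The `&sql_id` value is a hypothetical placeholder, a SQL*Plus substitution variable standing in for the SQL_ID of your own statement.

```sql
-- SQL*Plus: print DBMS_OUTPUT messages
set serveroutput on

declare
  l_loaded pls_integer;
begin
  -- Load the plan(s) the cursor cache currently holds for this statement
  -- into a SQL plan baseline; '&sql_id' is a hypothetical placeholder
  -- filled in by SQL*Plus substitution
  l_loaded := dbms_spm.load_plans_from_cursor_cache(sql_id => '&sql_id');
  dbms_output.put_line(l_loaded || ' plan(s) loaded');
end;
/

-- List the baselines, newest first
select sql_handle, plan_name, enabled, accepted, fixed, origin
from   dba_sql_plan_baselines
order  by created desc;
```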

My history with Big Data

Before I joined Cloudera, I hadn't had much formal experience with Big Data. But I had crossed paths with one of its major use cases before, so I found it easy to pick up the mindset. My previous big project involved a relational database hooked up to a web server. (more...)

Largest Tables Including Indexes and LOBs

Just a quick code snippet. I run a lot of Data Pump jobs to move schemas between different databases, for example to copy a schema to an internal database to try to reproduce a problem. Some of these schemas have some very large tables. The large tables aren't always (more...)
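The original snippet is cut off in this excerpt, so the following is an independent sketch of the idea rather than the author's code: it adds up the segment sizes of each table together with its index and LOB segments, using DBA_SEGMENTS, DBA_INDEXES and DBA_LOBS. It assumes access to the DBA_ views and uses FETCH FIRST, which requires 12c; the limit of 20 rows is arbitrary.

```sql
-- Total size per table: table segments + index segments + LOB segments
select owner, table_name, round(sum(bytes) / 1024 / 1024) as size_mb
from (
  -- Table (and table partition/subpartition) segments
  select s.owner, s.segment_name as table_name, s.bytes
  from   dba_segments s
  where  s.segment_type like 'TABLE%'
  union all
  -- Index segments, attributed to the table they belong to
  select i.table_owner, i.table_name, s.bytes
  from   dba_indexes i
  join   dba_segments s
    on   s.owner = i.owner
   and   s.segment_name = i.index_name
  where  s.segment_type like 'INDEX%'
  union all
  -- LOB data segments, attributed to their table
  -- (LOB index segments are not counted in this sketch)
  select l.owner, l.table_name, s.bytes
  from   dba_lobs l
  join   dba_segments s
    on   s.owner = l.owner
   and   s.segment_name = l.segment_name
  where  s.segment_type in ('LOBSEGMENT', 'LOB PARTITION', 'LOB SUBPARTITION')
)
group by owner, table_name
order by size_mb desc
fetch first 20 rows only;
```

To limit the report to the schema you are about to export, add an `owner = ...` predicate to each branch of the UNION ALL.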

Invisible Not Null Column

Yesterday I attended John King's (@royaltwit) session on Oracle 12c for developers. He gave an overview of all the goodies that are available to us as developers. The whole plugging and unplugging of databases, though very cool and exciting, is most likely not very relevant to most developers.
When he (more...)

Best Practice in 12c

Since PL/SQL is now closely integrated into SQL, we can happily state:
sokrates@12.1 > with function bestpractice return varchar2
  2  is
  3  begin
  4     return 'Do not use PL/SQL when it can be done with SQL alone !';
  5  end bestpractice;
  6   (more...)