March 26, 2008 by Kevin.Haas.
I was talking the other day with a colleague of mine in the marketing database services business, and he noted that there is increasing interest in column-oriented databases for rapid querying. This technology has been around for many years, and marketing database providers have used it for a while. Experian acquired a technology called Analytix that, for a time, was the dominant rapid query engine for marketing use. It blew the doors off traditional technologies and was generally loved by users, delivering quick counts from fairly complex queries.
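To make the "quick counts" idea concrete, here is a minimal sketch of why a column layout helps with count queries. It assumes a hypothetical marketing table with made-up fields (`customer_id`, `state`, `age`, `responded`) and is not how Analytix or any other product is implemented. The point is only that a column store keeps each field in its own array, so a count that filters on two fields reads just those two arrays. On disk, that means reading far fewer pages than scanning whole rows.

```python
from typing import Dict, List, Tuple

# Hypothetical marketing records: (customer_id, state, age, responded)
Row = Tuple[int, str, int, bool]

rows: List[Row] = [
    (1, "OH", 34, True),
    (2, "PA", 51, False),
    (3, "OH", 45, True),
    (4, "NY", 29, True),
    (5, "OH", 62, False),
]


def to_columns(records: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    names = ("customer_id", "state", "age", "responded")
    return {name: [r[i] for r in records] for i, name in enumerate(names)}


def count_row_store(records: List[Row]) -> int:
    # Row layout: every full record is visited, even fields the query ignores.
    return sum(1 for (_, state, age, _) in records if state == "OH" and age >= 40)


def count_column_store(cols: Dict[str, list]) -> int:
    # Column layout: only the two columns named in the predicate are read.
    return sum(
        1 for state, age in zip(cols["state"], cols["age"]) if state == "OH" and age >= 40
    )


columns = to_columns(rows)
print(count_row_store(rows), count_column_store(columns))  # prints "2 2"
```

Both functions return the same answer. The difference in a real engine is how much data has to come off disk to get it.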
As with anything, innovation continues, and other solutions to the same problem have popped up. Bitmap indexes became the norm in commercial databases. Other approaches, like clustered databases, solve the same problem; Greenplum, built on open source Postgres, is one very viable example. Additionally, as noted earlier, new column-oriented databases built on more open standards are gaining attention. There are commercial players like Alterian, but also a new class of vendors, like Infobright. Also look at Vertica, which is commercializing open source projects like C-Store. And there are other open source column databases, like Lucid and Monet, that are getting attention.
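Bitmap indexes attack the same counting problem from a different angle. The sketch below is a simplified illustration with hypothetical `state` and `responded` columns, not any vendor's implementation. It keeps one bitmap per distinct value, with bit *i* set when row *i* holds that value. A multi-criteria count then reduces to bitwise OR/AND operations followed by counting the set bits. Python integers stand in for the compressed bitmaps a real database would use.

```python
from collections import defaultdict
from typing import Dict, List


def build_bitmap_index(values: List[str]) -> Dict[str, int]:
    """Map each distinct value to an integer bitmap; bit i is set when row i has that value."""
    index: Dict[str, int] = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


# Hypothetical low-cardinality columns, one entry per customer row.
state = ["OH", "PA", "OH", "NY", "OH", "PA"]
responded = ["Y", "N", "Y", "Y", "N", "Y"]

state_idx = build_bitmap_index(state)
resp_idx = build_bitmap_index(responded)

# "How many Ohio or Pennsylvania customers responded?" becomes OR, then AND, then a popcount.
match = (state_idx["OH"] | state_idx["PA"]) & resp_idx["Y"]
count = bin(match).count("1")
print(count)  # prints 3
```

This works well for the low-cardinality fields common in marketing selects (state, gender, response flags), where the number of bitmaps per column stays small.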
There are many approaches here for solving performance constraints in “non-traditional” ways. But while many of these have lightning-fast performance getting data out, the biggest pain is getting data in: they can all be very slow at loading. In many cases, they rely on bulk loaders that handle data in massive chunks rather than individual rows. I tried one column-oriented database on my laptop and got a load speed (granted, out of the box, with no tuning) of a dismal 2 rows a second or so, versus hundreds or thousands of rows a second for traditional databases.
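One way to see why these engines favor bulk loaders: in a column layout, a single logical row fans out into one write per column. The toy below is a hypothetical `ToyColumnStore` class that simply counts write calls as a stand-in for per-call overhead such as I/O, locking, and re-encoding compressed segments. It compares row-at-a-time inserts with a single bulk load of the same 10,000 rows. Real products add sorting and compression on top, so the actual per-row penalty is usually much larger than one append. Treat this as an illustration of the shape of the problem, not a model of any specific database.

```python
from typing import Dict, List, Sequence


class ToyColumnStore:
    """Hypothetical in-memory column store that counts write calls into its column segments."""

    def __init__(self, column_names: Sequence[str]) -> None:
        self.columns: Dict[str, List[object]] = {name: [] for name in column_names}
        self.write_calls = 0  # stand-in for fixed per-call overhead

    def insert_row(self, row: Sequence[object]) -> None:
        # One logical row fans out into one write per column.
        for name, value in zip(self.columns, row):
            self.columns[name].append(value)
            self.write_calls += 1

    def bulk_load(self, rows: Sequence[Sequence[object]]) -> None:
        # Pivot the batch, then write each column exactly once.
        for i, name in enumerate(self.columns):
            self.columns[name].extend(r[i] for r in rows)
            self.write_calls += 1


names = ["customer_id", "state", "age", "responded"]
batch = [(i, "OH", 30 + i % 40, i % 3 == 0) for i in range(10_000)]

row_at_a_time = ToyColumnStore(names)
for row in batch:
    row_at_a_time.insert_row(row)

bulk = ToyColumnStore(names)
bulk.bulk_load(batch)

print(row_at_a_time.write_calls, bulk.write_calls)  # prints "40000 4"
```

Both stores end up holding identical data. Only the number of separate writes differs, and that is the cost that bulk loaders are designed to amortize.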
Certainly, you can speed this up by tuning and by applying the right kind of hardware, but if these databases don’t work well out of the box with common techniques, adoption of these kinds of technologies will be stifled. We’ve worked with some of these vendors and can overcome many of the gaps, but it can certainly be a little tricky.
There’s a lot of promise in each of these areas. And with the right attention, I’m optimistic that the open source versions of these databases can be as successful as other open source database projects, like MySQL.
Posted in Database