Question

Apparently the reason for the BigTable architecture is the difficulty of scaling relational databases across the massive number of servers that Google operates.

But technically speaking what exactly makes it difficult for relational databases to scale?

Large corporations seem to do this successfully in their enterprise data centers, so I'm wondering why the same approach can't simply be scaled up by an order of magnitude or more to run on Google's servers.

Solution

In addition to Mitch's answer, there's another facet: web apps are generally poorly suited to relational databases. Relational databases put emphasis on normalization, which essentially makes writes easier but reads harder (in terms of work done by the database, not necessarily by you). This works very well for OLAP and ad-hoc query situations, but not so well for web apps, which are generally massively weighted in favor of reads over writes.

The strategy taken by non-relational databases such as Bigtable is the reverse: denormalize, to make reads much easier, at the cost of making writes more expensive.
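To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables (`users`, `posts`, `posts_denorm`) and the helper functions are hypothetical names chosen for illustration, and the denormalized table only imitates the idea of copying data next to where it is read; it is not how Bigtable itself stores data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    -- Denormalized variant: the author's name is copied into every post.
    CREATE TABLE posts_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                               user_name TEXT, body TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
for post_id, body in [(1, "first"), (2, "second"), (3, "third")]:
    conn.execute("INSERT INTO posts VALUES (?, 1, ?)", (post_id, body))
    conn.execute("INSERT INTO posts_denorm VALUES (?, 1, 'alice', ?)",
                 (post_id, body))


def read_normalized():
    # Reading needs a join across two tables.
    return conn.execute(
        "SELECT u.name, p.body FROM posts p "
        "JOIN users u ON u.id = p.user_id ORDER BY p.id"
    ).fetchall()


def read_denormalized():
    # Reading is a scan of a single table, no join.
    return conn.execute(
        "SELECT user_name, body FROM posts_denorm ORDER BY id"
    ).fetchall()


def rename_normalized(new_name):
    # One row changes, however many posts the user has.
    cur = conn.execute("UPDATE users SET name = ? WHERE id = 1", (new_name,))
    return cur.rowcount


def rename_denormalized(new_name):
    # Every copy of the name has to be rewritten.
    cur = conn.execute(
        "UPDATE posts_denorm SET user_name = ? WHERE user_id = 1", (new_name,)
    )
    return cur.rowcount
```

Both read functions return the same rows, but the normalized rename touches one row while the denormalized rename touches one row per post. For a read-heavy workload, paying that extra write cost to avoid the join on every read can be a reasonable trade.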

OTHER TIPS

When you perform a query that involves relationships which are physically distributed, you have to pull that data for each relationship into a central place. That obviously won't scale well for large volumes of data.

A well set-up RDBMS server will perform the majority of its queries on hot pages in RAM, with little physical disk or network I/O.

If you are constrained by network I/O, then the benefits of relational data diminish.
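The cost of pulling distributed relationships into a central place can be sketched with a toy simulation. The two "shards" below are plain in-memory Python structures standing in for separate machines, and `distributed_join` is a hypothetical helper that simply counts how many rows a coordinator would have to fetch; real systems use smarter strategies, so treat this as an illustration of the naive case only.

```python
# Two hypothetical shards; in a real deployment each lives on its own machine.
USERS_SHARD = {1: "alice", 2: "bob", 3: "carol"}
ORDERS_SHARD = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},
]


def distributed_join(users, orders):
    """Join orders to users on a coordinator, counting rows shipped to it."""
    rows_transferred = 0

    # Pull every row of each side of the relationship to the coordinator.
    pulled_orders = list(orders)
    rows_transferred += len(pulled_orders)
    pulled_users = dict(users)
    rows_transferred += len(pulled_users)

    # Only now can the join itself run, locally on the coordinator.
    joined = [
        (order["order_id"], pulled_users[order["user_id"]])
        for order in pulled_orders
        if order["user_id"] in pulled_users
    ]
    return joined, rows_transferred


joined_rows, transferred = distributed_join(USERS_SHARD, ORDERS_SHARD)
```

Here six rows cross the (simulated) network to produce three joined rows. The transfer grows with the size of both tables rather than with the size of the result, which is why this approach gets expensive once the data no longer sits on one machine.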

The main reason, as stated, is physical location and network I/O. Additionally, even large corporations deal with only a fraction of the data that search engines deal with.

Think about the index on a standard database: it covers maybe a few fields. Search engines need fast text search on large text fields.
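One common structure for fast text search is an inverted index, which maps each word to the documents containing it. The answer above does not name a specific technique, so the following is only an assumed illustration; `build_inverted_index`, `search`, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


DOCS = {
    1: "relational databases scale vertically",
    2: "bigtable scales across many servers",
    3: "relational databases on many servers",
}
INDEX = build_inverted_index(DOCS)
```

With this index, `search(INDEX, "relational servers")` intersects two small sets instead of scanning the full text of every document, which is the kind of access pattern a column index over a few short fields is not designed for.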

Licensed under: CC-BY-SA with attribution