문제

I've read a few articles suggesting that Hadoop is only really designed to work on a cluster at a single physical location, not for a number of widely distributed nodes (e.g. running a distributed cluster over the internet from multiple sites).

Does anyone have any real experience trying to use Hadoop across mutliple sites? What kind of issues will I run into? Or am I better to just go with a different framework (e.g. BOINC).

도움이 되었습니까?

해결책

If there's any difference between executing on a set of relatively local nodes vs on a set of widely distributed nodes it would be in the increased time required to move large amounts of data back and forth between nodes. If you have a problem that involves crunching, aggregating and joining large amounts of data then you will necessarily be sending large amounts of data between your nodes. That means that no matter what platform you choose (hadoop, storm, etc) you will have to deal with this issue. BOINC or some other volunteer-based system may be cheaper, but your implementation will still be hit with high data transfer costs. Furthermore, you'll likely introduce node heterogeneity into the mix which will make your implementation even more interesting to develop and debug.

And by the way, hadoop and BOINC are two very different animals solving very different problems.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top