Question

I want to develop a multimedia system that needs to store millions of videos and images, so I am looking for a distributed storage subsystem. Can anyone give me some suggestions? Thanks!


Solution

@yi_H

You can configure your writes to be replicated to multiple nodes before they return to the client. Whether that is needed depends on the use case, and it definitely involves a performance hit: if you are implementing a write-heavy analytical database, it will have a significant impact on write throughput.
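To illustrate the trade-off, here is a minimal sketch of a write that only succeeds once enough nodes have acknowledged it. The `Node` class and `replicated_write` function are hypothetical in-memory stand-ins, not the API of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one storage node."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key: str, value: bytes) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


def replicated_write(nodes, key, value, required_acks):
    """Return True only if at least `required_acks` nodes stored the value."""
    acks = 0
    for node in nodes:  # every extra node contacted adds latency to the write
        if node.store(key, value):
            acks += 1
    return acks >= required_acks


nodes = [Node("a"), Node("b"), Node("c", up=False)]
ok_two = replicated_write(nodes, "img-1", b"...", required_acks=2)
ok_three = replicated_write(nodes, "img-1", b"...", required_acks=3)
print(ok_two, ok_three)
```

Requiring more acknowledgements makes the write safer but slower, and it fails outright when too few nodes are reachable.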

I second all the other points you make about the question, such as the lack of requirements.

Having a replicated file system with the metadata in a NoSQL database is a very common way of doing things. @why did you consider this kind of approach?

Have you taken a look at MongoDB GridFS? I have never used it, but it is something I would look at to see if it gives you any ideas.

OTHER TIPS

I guess the best option for 'millions of videos and images' is a content distribution/delivery network (CDN):

A CDN is a server setup which allows for faster, more efficient delivery of your media files. It does this by maintaining copies of your media at different points of presence (POPs) along a global network to ensure quick client access and the fastest delivery possible.

If you use a CDN, you do not need to care about many problems (distribution, fast access). Integration with a CDN should also be very simple.

You gave us (near) zero information about what your requirements are. E.g.:

  • Do you want atomic transactions?
  • Is the system read or write heavy?
  • Do you need fast queries or want to batch-process the data set?
  • How big are the videos?
  • Do you want to distribute data locally (on a LAN) or spanning multiple data centers / continents?

How are we supposed to pick the right tool if we don't know what it needs to support?

Without any knowledge of the system, I would advise using some kind of file system replication for the videos and images, and storing the metadata associated with the items in MongoDB, MySQL master-master or MySQL Cluster.
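A rough sketch of that split, under loud assumptions: local directories stand in for the replicated file system, and `sqlite3` stands in for the metadata database (MongoDB or MySQL in a real setup). The `store_media` helper, the `media` table and the hash-based path layout are made up for illustration.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large videos are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_media(src: Path, replica_roots, db: sqlite3.Connection) -> str:
    """Copy `src` to every replica root, then record its metadata."""
    digest = sha256_of(src)
    rel_path = Path(digest[:2]) / digest  # shard directories by hash prefix
    for root in replica_roots:
        target = Path(root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, target)
    db.execute(
        "INSERT OR REPLACE INTO media (sha256, name, size, path) VALUES (?, ?, ?, ?)",
        (digest, src.name, src.stat().st_size, rel_path.as_posix()),
    )
    db.commit()
    return digest


# Assumption: two local directories play the role of replicated storage.
workdir = Path(tempfile.mkdtemp())
replicas = [workdir / "replica1", workdir / "replica2"]
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE media (sha256 TEXT PRIMARY KEY, name TEXT, size INTEGER, path TEXT)"
)

sample = workdir / "cat.jpg"
sample.write_bytes(b"not really a jpeg")
digest = store_media(sample, replicas, db)
row = db.execute(
    "SELECT name, size, path FROM media WHERE sha256 = ?", (digest,)
).fetchone()
print(row)
```

The database only holds small rows that point at the files, so queries stay fast while the bulky media lives on the file system.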

Distributed in what sense?

If you mean distribution through replication:

MongoDB is restricted to master-slave replication, so only one node accepts writes at a time. For a really distributed system that node is a bottleneck and, unless failover is configured, a single point of failure. CouchDB is able to replicate peer-to-peer.

Find a very good comparison here and here, also compared with HBase.

With CouchDB you also have to be aware that you talk HTTP to the database and that it has built-in web services.

Regards, Chris

An alternative is to use MongoDB's GridFS, serving as a (very easily manageable) redundant and distributed filesystem.

Some will say that it's slow on reads (and it is, mostly because of the nature of its design), but that doesn't have to be a dealbreaker for your system as a whole: if you need performance later on, you could always put Varnish or Squid in front of the filesystem tier.

As far as I know, Squid also supports an on-disk cache for the less-hot files.
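The idea of putting a cache in front of a slow storage tier can be sketched in-process as a small read-through LRU cache. This is only an illustration of the principle, not how Varnish or Squid are configured; `ReadThroughCache` and `slow_backend` are hypothetical names.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve hot keys from memory and fall back to the slow store on a miss."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callable that reads from the slow tier
        self._capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


backend_reads = []


def slow_backend(key):
    """Stand-in for a read from the slow file store."""
    backend_reads.append(key)
    return f"bytes of {key}".encode()


cache = ReadThroughCache(slow_backend, capacity=2)
for name in ["a.jpg", "a.jpg", "b.jpg", "c.jpg", "a.jpg"]:
    cache.get(name)
print(cache.hits, cache.misses, backend_reads)
```

Only misses reach the backend, so the hotter the working set, the less the slow read path matters.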

Sources:

http://www.mongodb.org/display/DOCS/GridFS

http://www.squid-cache.org/Doc/config/cache_dir/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow