Question

I have a scenario that requires me to export a large mailing list (> 1m customers) to an external email system. I have source code and control over both applications.

I need a mechanism for transferring the data from one system to another that is:

  • Robust
  • Fast
  • Secure

So far, I have set up a standard MVC controller method that responds to a request (over https), performs some custom security checks, and then pulls the data from the DB.

As data is retrieved from the DB, the controller method iterates over the results and writes the response in plain-text format, flushing the response every 100 records or so. The receiver reads each row of the response and performs storage and processing logic.
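For illustration, a minimal sketch of what that streaming action might look like in ASP.NET MVC; the controller name, the GetCustomersReader helper, and the tab-separated row format are assumptions for the example, not details from my actual code:

    using System.Data;
    using System.Web.Mvc;

    public class ExportController : Controller
    {
        [RequireHttps]
        public void Customers()
        {
            // ... custom security checks would run here ...

            Response.ContentType = "text/plain";
            Response.BufferOutput = false; // stream rather than buffer the whole result

            var count = 0;
            using (IDataReader reader = GetCustomersReader())
            {
                while (reader.Read())
                {
                    Response.Output.WriteLine("{0}\t{1}", reader["Id"], reader["Email"]);
                    if (++count % 100 == 0)
                        Response.Flush(); // push the current batch to the client
                }
            }
            Response.Flush();
        }

        private IDataReader GetCustomersReader()
        {
            // Placeholder for the real data access: returns an open reader
            // over the mailing-list query.
            throw new System.NotImplementedException();
        }
    }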

I have chosen this method because it does not require persisting user data to a permanent file, and a client built in any language will be able to implement receiver logic without a dependency on any proprietary technology (e.g. WCF).

I am aware of other transport mechanisms that I can use with .NET, but none with an overall advantage, given the requirements listed above.

Any insight into which technologies might be better than my request / response solution?

Solution

Two suggestions come to mind; we had something similar to this happen at our company a little while ago (an acquired website with over 1 million monthly active users and its associated data needed a complete datacenter change, including a 180 GB DB that was still active).

We ended up setting up pull replication to it over SSH (SQL Server 2005). This is black magic at best, and it took us about a month to set up properly between research and failed configurations. There are various blog posts about it, but the key parts are:

1) Set up a named server alias in SQL Server Configuration Manager on the subscriber DB machine, specifying localhost:1234 (choose a better number).

2) Set up PuTTY to create an SSH tunnel between the subscriber's localhost:1234 from step 1 and the publisher DB's port 9876 (again, choose a better number); a sample command is shown after these steps. Also make sure you have an SSH server enabled on the publisher, keep the port a secret, and set a strong password for the SSH account.

3) Add a server alias on the publisher for port 9876 for the replicated DB.

4) If your data set is small enough, create the publications and try starting up the subscriber using snapshot initialization. If not, you need to create a publication with "initialize from backup" enabled and restore a partial backup at the subscriber, using FTP to transfer the backup file over. This method is much faster than snapshot initialization for larger datasets.
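As a rough illustration of step 2, the tunnel can be opened from the subscriber with PuTTY's command-line client, plink; the user name and host below are placeholders:

    rem Forward the subscriber's localhost:1234 to port 9876 on the publisher.
    rem repl_user and publisher.example.com stand in for your SSH account and host.
    plink -ssh -N -L 1234:localhost:9876 repl_user@publisher.example.com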

Pros: You don't need to worry about authentication for SQL Server, "just" the SSH tunnel. The publication can easily be modified if you realize you need more columns or schema changes. You save the time of writing an API that may only be temporary and might have more security issues.

Cons: It's weird; there's not much official documentation, and SSH on Windows is finicky. If you have a Linux-based load balancer, it may be easier. There are a lot of steps.

Second suggestion: use ServiceStack and protobuf.NET to create a very quick web service and expose it over https; a sketch follows the links below. If you know how to use ServiceStack, it should be very quick. If you don't, it will take a little time, because it operates on a different design philosophy from Web API and WCF. Protobuf.NET is currently the fastest and most compact serialization/deserialization wire format that is widely available. Links:

ServiceStack.NET

Protobuf.NET
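To make this concrete, here is a minimal sketch, assuming the ServiceStack.ProtoBuf plugin and ServiceStack v4-style namespaces; the DTOs and service are hypothetical names for the mailing-list export:

    using System.Collections.Generic;
    using Funq;
    using ProtoBuf;
    using ServiceStack;
    using ServiceStack.ProtoBuf;

    [ProtoContract]
    public class Customer
    {
        [ProtoMember(1)] public int Id { get; set; }
        [ProtoMember(2)] public string Email { get; set; }
    }

    [Route("/customers/export")]
    public class ExportCustomers : IReturn<List<Customer>> { }

    public class CustomerService : Service
    {
        public object Any(ExportCustomers request)
        {
            // Placeholder: pull the mailing list from the database here.
            return new List<Customer>();
        }
    }

    public class AppHost : AppHostHttpListenerBase
    {
        public AppHost() : base("Customer Export", typeof(CustomerService).Assembly) { }

        public override void Configure(Container container)
        {
            // Register the protobuf wire format from the ServiceStack.ProtoBuf package.
            Plugins.Add(new ProtoBufFormat());
        }
    }

A receiving client can then consume it with the matching typed client, e.g. new ProtoBufServiceClient(baseUrl).Get(new ExportCustomers()).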

Pros: You can handle security however you like. This is also a downside since you then have to worry about it. It's much better documented. You get to use or learn a great framework that will speed up the rest of your webservice-related projects for the rest of your life (or until something better comes along).

Cons: You have to write it. You have to test it. You have to support it.
