Question

I'm trying to figure out how to write an IQueryable data source that can pull and combine data from multiple sources (in this case Azure Table, Azure Blobs, and ElasticSearch). I'm really having a hard time figuring out where to start with this though.

The idea is that a web service (in this case an ASP.NET Web API) can present a queryable OData interface, but when it gets queried it pulls data from multiple sources depending on what is requested. So large queries might hit the indexing service (ElasticSearch), which wouldn't necessarily have the full object available, but calls to get an individual object would go directly to the Azure Tables. From the service user's perspective, though, it's always just accessing the same data source.

While I would like to just use the index as our search service and the tables as our backup, I have a design requirement that it has to pull data from multiple sources, which greatly complicates this whole thing.

I'm wondering if anyone has any guidance on this or can point me towards the right technologies. Some of the big issues I'm seeing are:

  • the back-end objects aren't necessarily the same as the front-end object being queried. Multiple back-end objects may get combined into a single front-end one, or it may have computed values. So a LINQ query would have to be translated or mapped
  • changing data sources based on query parameters

Here is a quick overview of the technology I'm working with:

  • ASP.NET Web API 2 web service running as an Azure Cloud service
  • ElasticSearch running on SUSE VMs (on Azure)
  • Azure Tables
  • Azure Blobs

Solution

First, you need to separate the data access from the Web API project. The Web API project is merely an interface, so remove it from the equation. The solution to the problem should be the same regardless of whether it is web API or an ASP.NET web page, an MVC solution, a WPF desktop application, etc.

You can then focus on the data problem. What you need is some form of "router" that picks the data source based on the parameters of the request. In this case, you are talking about one item = Azure, and more than one item = the index, with map-reduce when there is more than one item. (I would set up the rules as a strategy or similar, so you can swap them out if you find that 1 versus 2+ is not a good condition for changing the routing.)

Then you solve the data access problem for each methodology.

The system as a whole:

  1. User asks for data (user can be a real person or another system through the web api)
  2. Query is partially parsed to determine routing path
  3. Router sends data request to proper class that handles data access for the route
  4. Data is returned
  5. Data is routed back to the user via whatever user interface is used (in this case Web API - see paragraph 1 for other options)
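
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the answerer's implementation: `Query`, `TableSource`, `IndexSource`, `Router`, and `single_item_rule` are hypothetical names, and the two sources are in-memory stand-ins rather than real Azure Table or ElasticSearch clients. The routing rule is passed in as a plain function so it can be swapped out later.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Query:
    """Hypothetical parsed request: explicit ids, or free-text search."""
    ids: tuple[str, ...] = ()
    text: str = ""


class DataSource(Protocol):
    def fetch(self, query: Query) -> list[dict]: ...


class TableSource:
    """In-memory stand-in for a key lookup that returns full objects."""

    def __init__(self, rows: dict[str, dict]):
        self._rows = rows

    def fetch(self, query: Query) -> list[dict]:
        return [self._rows[i] for i in query.ids if i in self._rows]


class IndexSource:
    """In-memory stand-in for a search index holding partial documents."""

    def __init__(self, docs: list[dict]):
        self._docs = docs

    def fetch(self, query: Query) -> list[dict]:
        needle = query.text.lower()
        return [d for d in self._docs if needle in d["title"].lower()]


# The routing rule is a swappable strategy: any function Query -> route name.
RoutingRule = Callable[[Query], str]


def single_item_rule(query: Query) -> str:
    """Assumed rule from the answer: exactly one item goes to the table."""
    return "table" if len(query.ids) == 1 else "index"


class Router:
    def __init__(self, sources: dict[str, DataSource], rule: RoutingRule):
        self._sources = sources
        self._rule = rule

    def execute(self, query: Query) -> list[dict]:
        route = self._rule(query)                  # step 2: pick the route
        return self._sources[route].fetch(query)   # steps 3-4: fetch and return


router = Router(
    sources={
        "table": TableSource({"42": {"id": "42", "title": "Full record", "body": "..."}}),
        "index": IndexSource([
            {"id": "42", "title": "Full record"},
            {"id": "7", "title": "Other"},
        ]),
    },
    rule=single_item_rule,
)

single = router.execute(Query(ids=("42",)))   # routed to the table stand-in
many = router.execute(Query(text="record"))   # routed to the index stand-in
```

The caller (Web API or anything else) only ever talks to `Router.execute`, so changing the rule or adding a source does not touch the interface layer.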

One caution: don't try to mix all types of persistence, as a generic "I can pull data or a blob or a {name your favorite other persistent storage here}" often ends up becoming a garbage can.

OTHER TIPS

This post has been out a while. The 2nd / last paragraph is close, yet still restricted... Even a couple of years ago, this architecture was commonplace.

Whether it is WPF, ASP.NET, Java, or whatever the core interface is written in, the critical path is the result set based on a query for information. This is high-level, but I am already sharing more than I should, because of other details of a project I've been part of for several years.

Develop your core interface. We did a complete shell that replaced Windows/Linux entirely.

Develop a solution architecture wherein Providers are the source component. Providers are the publishing component.

Now - regardless of your query 'source' - it's just another Provider. The interfacing to that Provider - is abstract and consistent - regardless of the Provider::SourceAPI/ProviderSourceAPI::Interface

When the user wants to query for anything... literally anything... Criminal background checks... Just hit Google... Query these specific public libraries in SW somewhere USA/Anywhere USA for activity on checkouts or check-ins - it's really relevant. Step back and consider the objective. No solution is too small, and certainly none is too large, for this: abstract the objectives of the solution and code them.

All queries - regardless of what is being searched for - are simply queries.

All responses - regardless of the response/result-set - are results - the ResultantProviderModel / ResultantProviderController (no, I'm not referencing MVC specifically).

I cannot code you a literal example here, but I hope I challenge you to consider an approach and solution much more abstract and open than what I've read here. The physical implementation should be much simpler and VERY abstracted from any specific technology stack. The searched source? It MUST be abstract, and use a Provider architecture to implement. So, if I have a tool that my desktop users, or basically office workers, use, they query for something... What has John Doe written on physics???
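
As a rough illustration of the provider idea (a sketch under assumptions, not the project described above): every source implements the same abstract interface, every query is just a query, and every result has the same shape. `Provider`, `InMemoryProvider`, `Result`, and `query_all` are hypothetical names, and the data is made up for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Uniform result shape, whatever the underlying source is."""
    provider: str
    payload: str


class Provider(ABC):
    """Abstract source: callers never see the source-specific API."""
    name: str

    @abstractmethod
    def query(self, terms: str) -> list[Result]: ...


class InMemoryProvider(Provider):
    """Stand-in for any real source (search engine, library system, ...)."""

    def __init__(self, name: str, items: list[str]):
        self.name = name
        self._items = items

    def query(self, terms: str) -> list[Result]:
        needle = terms.lower()
        return [Result(self.name, item) for item in self._items
                if needle in item.lower()]


def query_all(providers: list[Provider], terms: str) -> list[Result]:
    """Send the same query to every provider and merge the results."""
    results: list[Result] = []
    for provider in providers:
        results.extend(provider.query(terms))
    return results


providers = [
    InMemoryProvider("library", ["John Doe: Notes on Physics", "Jane Roe: Cooking"]),
    InMemoryProvider("intranet", ["Physics seminar minutes", "Holiday schedule"]),
]
hits = query_all(providers, "physics")
```

Adding a new source then means writing one more `Provider` subclass; the querying side stays unchanged.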

In a corporation leveraging SharePoint and FAST Search? This is easy, out of the box stuff...

For a custom user-interfacing component, well, then you have the back-end plumbing to resolve. So abstract each piece/layer from an architecture approach. Pseudocode it out, however you choose to do that. Most important is that you do not get locked into a specific development paradigm, language, IDE, or whatever. If you can design the solution in the abstract and walk it through in pseudocode, and do this for each abstraction layer, then start coding it. The source is relative... The publishing aspect - is relative - consistent.

I do not know if you'll grasp this - but perhaps someone will - and it'll prove helpful.

HTH's...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow