Question

I'm currently struggling with a complex URL-handling design question. The application has a product property database table/collection holding all the different product property types (e.g. categories, colors, manufacturers, materials, etc.).

{_id:1,alias:"mercedes-benz",type:"brand"},
{_id:2,alias:"suv-cars",type:"category"},
{_id:3,alias:"cars",type:"category"},
{_id:4,alias:"toyota",type:"manufacturer"},
{_id:5,alias:"red",type:"color"},
{_id:6,alias:"yellow",type:"color"},
{_id:7,alias:"bmw",type:"manufacturer"},
{_id:8,alias:"leather",type:"material"}
...

The task is to handle URL requests in the style below, in every(!) possible order, and retrieve the product properties they contain. The only allowed separator character is the dash (a settled SEO requirement). Some properties can also contain dashes themselves, which I think is an important point: e.g. the category "suv-cars" or the manufacturer "mercedes-benz".

http://www.example.com/{category}-{color}-{manufacturer}-{material}
http://www.example.com/{color}-{manufacturer}
http://www.example.com/{color}-{category}-{material}-{manufacturer}
http://www.example.com/{category}-{color}-nonexistingproperty-{manufacturer}
http://www.example.com/{color}-{category}-{manufacturer}
http://www.example.com/{manufacturer}
http://www.example.com/{manufacturer}-{category}-{color}-{material}
http://www.example.com/{category}
http://www.example.com/{manufacturer}-nonexistingproperty-{category}-{color}-{material}
http://www.example.com/{color}-crap-{manufacturer}
...

So every order of the properties should be allowed. The result has to be the set of properties used in each URL request. (By the way: yes, duplicate content will be fixed by redirects and a predefined schema.) Tokens such as "nonexistingproperty" or "crap" can occur and should simply be ignored.

UPDATE:

Idea 1: Split the request string by dashes and analyze it value by value. The problem: because some properties consist of two, three, or more words, there are too many possible combinations and variations, which means a lot of queries. I think that kills this idea.

Idea 2: Build an alias/URL table containing all of the different combinations. In my opinion that table would be too large, and it is just an ugly workaround: with about 15,000 different properties, the number of aliases across all the different sort orders kills this idea.

Idea 3: It's your turn! Thanks for your thoughts and your time.


Solution

Your question is a bit broad, but here are some ideas. There isn't a single best answer unless you find a free or commercial engine that works exactly the way you want.

My approach to your problem is to treat the URL as a list of keywords.

  • Use Lucene as a keyword/tag system. It's good at the kinds of searches you describe, including phrases, stems, etc.
  • Store and index the data in the DB of your choice, but pull the keywords into memory and build a bit index of all keywords vs. items. Iterate through the keyword table to produce weighted results. If the order of keywords matters, you'll also need to make a pass through the result set to weight by word order. These types of searches always need to cap their result set early in order to return results quickly.
  • Cache the results of working matches aggressively, and give precedence to the results that users seem to click on most for a given URL.
  • Hit the database directly by using tag indexes in MongoDB. You'd still need to merge and weight the results. This is very intensive and probably not a good use of DB resources.
  • Read some of the academic papers on keyword search. It's a popular topic.
  • Build a table of words that contain dashes, and normalize/convert those before running your queries.
  • Always check for full exact matches first.
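As a rough illustration of the "keywords in memory" and "words with dashes" ideas above, here is a minimal Python sketch. It assumes the whole alias table fits in a dictionary (the `PROPERTIES` dict below is a hypothetical in-memory copy of the question's sample rows) and uses a greedy longest-match scan, which is just one possible strategy and not something the question or this answer prescribes. The names `PROPERTIES`, `MAX_WORDS` and `parse_slug` are made up for the example.

```python
# Hypothetical in-memory copy of the property table: alias -> type.
PROPERTIES = {
    "mercedes-benz": "brand",
    "suv-cars": "category",
    "cars": "category",
    "toyota": "manufacturer",
    "red": "color",
    "yellow": "color",
    "bmw": "manufacturer",
    "leather": "material",
}

# Length of the longest alias in dash-separated words; bounds the inner loop.
MAX_WORDS = max(alias.count("-") + 1 for alias in PROPERTIES)


def parse_slug(slug):
    """Return (alias, type) pairs found in the slug, skipping unknown words."""
    words = slug.lower().split("-")
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so "suv-cars" wins over "cars".
        for size in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = "-".join(words[i:i + size])
            if candidate in PROPERTIES:
                found.append((candidate, PROPERTIES[candidate]))
                i += size
                break
        else:
            # No alias starts at this word: treat it as noise and move on.
            i += 1
    return found


print(parse_slug("red-suv-cars-crap-mercedes-benz-leather"))
```

Each lookup is a dictionary access, so no database query is needed per candidate. A greedy scan can still pick the wrong split when aliases overlap (for example, if both "red-suv" and "suv-cars" existed), so treat this as a starting point rather than a complete parser.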

Other tips

The only way this can work is if you require all property values to be unique. That is, you build one set from categories + colors + manufacturers, etc., and every value in it has to be unique. This lets you determine which property a value belongs to. The data structure for this should be fairly simple:

{_id:ValueOfTheProperty, Property:TypeOfProperty}

Here are some possible samples:

{ _id: Red, Property: Color }
{ _id: Green, Property: Color }
{ _id: Boots, Property: Category }
{ _id: Shoes, Property: Category }
...

This way, the order does not matter, and you are able to convert them in a single pass to a map:

{ Color: Red, Category: Boots }

However, I expect some problems with ambiguous names here.
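To make the uniqueness requirement and the single-pass conversion concrete, here is a small Python sketch. The rows, the lowercase values, and the function names `build_lookup` and `to_property_map` are assumptions made for illustration. It also assumes the URL has already been split into candidate values, which is the hard part when values contain dashes.

```python
from collections import defaultdict

# Hypothetical rows shaped like the samples above: (value, property type).
ROWS = [
    ("red", "Color"),
    ("green", "Color"),
    ("boots", "Category"),
    ("shoes", "Category"),
]


def build_lookup(rows):
    """Map each value to its property type; reject values that are not unique."""
    types_by_value = defaultdict(set)
    for value, prop in rows:
        types_by_value[value].add(prop)
    ambiguous = {v: sorted(t) for v, t in types_by_value.items() if len(t) > 1}
    if ambiguous:
        raise ValueError(f"ambiguous values: {ambiguous}")
    return {value: next(iter(types)) for value, types in types_by_value.items()}


def to_property_map(values, lookup):
    """Single pass over the URL values; unknown values are ignored."""
    result = {}
    for value in values:
        prop = lookup.get(value)
        if prop is not None:
            # If a property type appears twice, the first value is kept.
            result.setdefault(prop, value)
    return result


lookup = build_lookup(ROWS)
print(to_property_map(["boots", "crap", "red"], lookup))
```

Raising an error at load time is only one way to surface the ambiguity problem. A value such as "orange" that is both a color and a brand would need a tie-breaking rule or a rename instead.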

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow