Question

I am trying to run a regular-expression search with pymongo. After the match, I want the matching documents to be appended to a list in the module. I thought I had everything set up, but whatever I use for the regex, the query returns 0 results. The code is below:

REGEX = '.*\.com'

def myModule(self, data):
    # after importing everything and setting up the collection in the DB, I call the following:
    cursor = collection.find({'multiple.layers.of.data' : REGEX})
    matches = []
    for x in cursor:
        matches.append(x)
    return matches

This is one of three modules I am using to filter a huge number of JSON files that have been stored in MongoDB. However, no matter how I change the format, for example writing the pattern as /.*.com/ or using the $regex operator, the query never finds my data, so nothing is appended to the list.

EDIT: Adding in the full code along with what I am trying to identify:

RegEx = '.*\.com' #Or RegEx = re.compile('.*\.com')

def filterData(self, data):
    db = self.client[self.dbName]
    collection = db[self.collectionName]
    cursor = collection.find({'data.item11.sub.level3': {'$regex': RegEx}})
    data = []
    for x in cursor:
        data.append(x)
    return data

I am attempting to parse through JSON data in a mongodb. The data is structured like so:

"data": {
    "0": {
        "item1": "something",
        "item2": 0,
        "item3": 000,
        "item4": 000000000,
        "item5": 000000000,
        "item6": "0000",
        "item7": 00,
        "item8": "0000",
        "item9": 00,
        "item10": "useful",
        "item11": {
            "0000": {
                "sub": {
                    "level": "letter",
                    "level1": 0000,
                    "level2": 0000000000,
                    "level3": "domain.com"
                },
                "more_data": "words"
            }
        }
    }
}

UPDATE: After further testing, it appears that I need to include every level of nesting in the path. The query should therefore look like this:

collection.find({'data.0.item11.0000.sub.level3': {'$regex': RegEx}})

However, the "0" can be any number from 1 to 50, and the "0000" is randomly generated. Is there a way to treat these keys as variables, so that the query steps into them whatever their value? They will always be numeric.


Solution

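You need to tell MongoDB that the string should be treated as a regular expression, using the $regex operator. A bare string value is matched by equality, so it only finds fields that contain the pattern text literally: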

cursor = collection.find({'multiple.layers.of.data' : {'$regex': REGEX}})

Replacing REGEX = '.*\.com' with import re; REGEX = re.compile('.*\.com') should also work, because the pymongo driver encodes compiled Python patterns as BSON regular expressions.
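As a rough sketch of the difference between the two forms, the snippet below builds the filter documents without connecting to a database. The helper name regex_filter and the tightened pattern \.com$ are my own assumptions, not part of the original code. $regex matches anywhere in the string, so the leading .* is not needed, and the $ anchor keeps values such as "domain.community" from matching. The last two lines only check the pattern locally with Python's re module, which behaves like MongoDB's engine for a pattern this simple.

```python
import re

# Raw string so the backslash reaches the regex engine unchanged.
# Assumption: the intent is "value ends in .com".
DOMAIN_PATTERN = r"\.com$"


def regex_filter(field_path, pattern):
    """Build a filter that matches field_path against pattern as a regex.

    Hypothetical helper, shown only to make the shape of the query explicit.
    """
    return {field_path: {"$regex": pattern}}


# A bare string is an equality test: only documents whose field is
# literally the text of the pattern would match.
equality_filter = {"multiple.layers.of.data": DOMAIN_PATTERN}

# Wrapping the pattern in $regex requests a regular expression match.
query = regex_filter("multiple.layers.of.data", DOMAIN_PATTERN)

# Alternative form with a compiled pattern as the value.
compiled_query = {"multiple.layers.of.data": re.compile(DOMAIN_PATTERN)}

# With a live connection (not run here): cursor = collection.find(query)

# Local sanity check of the pattern itself.
print(bool(re.search(DOMAIN_PATTERN, "domain.com")))  # True
print(bool(re.search(DOMAIN_PATTERN, "domain.org")))  # False
```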


EDIT:

Regarding the wildcard part of the question: The answer is no.

In a nutshell, values that are unknown should never be used as keys, because that makes querying very inefficient. There are no 'wild card' queries on key names.

It is better to restructure the database so that unknown values are not keys.

See:

MongoDB wildcard in the key of a query

http://groups.google.com/group/mongodb-user/browse_thread/thread/32b00d38d50bd858

https://groups.google.com/forum/#!topic/mongodb-user/TnAQMe-5ZGs
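The restructuring suggested above could look something like the sketch below. It moves the unknown keys ("0" and "0000" in the question's sample) into fields of array elements. The field names "index" and "id" and both helper functions are hypothetical, chosen only for illustration. Once the data is stored as arrays of subdocuments, dot notation steps through every array element, so a path like 'data.item11.sub.level3' with $regex no longer needs to name the dynamic keys.

```python
def keyed_to_list(mapping, key_field):
    """Turn {"k1": {...}, "k2": {...}} into [{key_field: "k1", ...}, ...]."""
    return [{key_field: key, **value} for key, value in mapping.items()]


def restructure(doc):
    """Return a copy of doc with the dynamic keys moved into array elements."""
    entries = []
    for entry in keyed_to_list(doc["data"], "index"):
        # "item11" is keyed by a randomly generated number in the question.
        entry["item11"] = keyed_to_list(entry.get("item11", {}), "id")
        entries.append(entry)
    return {**doc, "data": entries}


# Trimmed-down version of the document from the question.
sample = {
    "data": {
        "0": {
            "item10": "useful",
            "item11": {
                "0000": {"sub": {"level3": "domain.com"}, "more_data": "words"}
            },
        }
    }
}

reshaped = restructure(sample)
print(reshaped["data"][0]["item11"][0]["sub"]["level3"])  # domain.com
```

The original keys are kept as ordinary values ("index" and "id"), so nothing is lost and they can still be queried or indexed like any other field.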

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow