Question

I am using MongoDB and have a collection named Companies that contains documents like the following:

    {
        "_id":1,
        "name": "Innovative Software pvt. limited 1",
        "description": "This is a software company"
    }

    {
        "_id":2,
        "name": "Innovative Software pvt. limited 2",
        "description": "This is a software,company with <img src='' class='' alt='company logo' /> symbol"
    }

    {
        "_id":3,
        "name": "Innovative Software pvt. limited 3",
        "description": "This is a software, company with <img src='' class='' alt='company,logo' /> symbol"
    }

    {
        "_id":4,
        "name": "Innovative Software pvt. limited 4",
        "description": "This is a software, company with,<img src='' class='' alt='company, logo' /> symbol"
    }

Now I want a regular expression that finds all companies whose description field satisfies the following conditions:

1- There is no space between a comma and the letter/number/image that follows it.
2- Content written inside img tags should be ignored.

So in my case I want the following documents returned:

_id:2 ("description": "This is a software,company with...

_id:4 ("description": "This is a software, company with,<..

I want a query something like this:

    db.Companies.find({description:{$regex:'regular expression'}})

Can this be achieved in the query itself, or do I need to write the logic in my own code? I am using pymongo.

Solution

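Try this regex. Note that it is written against the document's JSON text (it starts by matching the "description" key) and uses JavaScript-style /.../gi delimiters, so it needs adapting before it can be passed as a $regex on the field value:

    /"description"\s*:\s*"(?:[^<'"]|\\")+?,(?=[<a-z]).+"/gi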

Demo

http://regex101.com/r/bN3uY7
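
Since the question asks about pymongo, here is a minimal sketch of how the same idea might be applied directly to the description value rather than to the JSON text. The pattern below is not the one from the answer above. It is an adapted variant that assumes tags are well formed and that attribute values never contain ">". The names PATTERN, has_unspaced_comma, find_companies and SAMPLES are made up for illustration. The pattern only uses constructs (non-capturing group, lazy quantifier, lookahead) that Python's re and MongoDB's PCRE-based $regex both support.

```python
import re

# Assumed pattern (hypothetical, adapted from the idea in the answer):
# from the start of the string, lazily skip either single non-"<" characters
# or whole tags, then require a comma directly followed by a letter, a digit,
# or the start of a tag. Commas inside a tag are consumed with the tag and
# therefore never count.
PATTERN = r"^(?:[^<]|<[^>]*>)*?,(?=[A-Za-z0-9<])"


def has_unspaced_comma(description):
    """Check one description string locally with Python's re module."""
    return re.search(PATTERN, description) is not None


def find_companies(collection):
    """Run the same pattern server-side.

    `collection` is expected to be a pymongo Collection, for example
    MongoClient()["mydb"]["Companies"] (the database name is an assumption).
    """
    return collection.find({"description": {"$regex": PATTERN}})


# The four descriptions from the question, keyed by _id.
SAMPLES = {
    1: "This is a software company",
    2: "This is a software,company with <img src='' class='' alt='company logo' /> symbol",
    3: "This is a software, company with <img src='' class='' alt='company,logo' /> symbol",
    4: "This is a software, company with,<img src='' class='' alt='company, logo' /> symbol",
}

matching_ids = [_id for _id, text in SAMPLES.items() if has_unspaced_comma(text)]
print(matching_ids)  # [2, 4]
```

Checking the pattern locally with re before sending it to the server makes it easier to confirm that documents 2 and 4 are selected and that the comma inside the alt attribute of document 3 is ignored. Because the pattern is anchored but starts with a lazy group, MongoDB will most likely have to scan the collection rather than use an index on description.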

Licensed under: CC-BY-SA with attribution