Question
I am using MongoDB and have a collection named Companies that contains documents like the following:
{
"_id":1,
"name": "Innovative Software pvt. limited 1",
"description": "This is a software company"
}
{
"_id":2,
"name": "Innovative Software pvt. limited 2",
"description": "This is a software,company with <img src='' class='' alt='company logo' /> symbol"
}
{
"_id":3,
"name": "Innovative Software pvt. limited 3",
"description": "This is a software, company with <img src='' class='' alt='company,logo' /> symbol"
}
{
"_id":4,
"name": "Innovative Software pvt. limited 4",
"description": "This is a software, company with,<img src='' class='' alt='company, logo' /> symbol"
}
Now I want a regular expression to find all companies whose description field satisfies the following conditions:
1- There is no space between a comma and the letter/number/image that follows it.
2- It shouldn't include content written inside img tags.
So in my case, I want the following documents returned:
_id:2("description": "This is a software,company with...,
_id:4("description": "This is a software, company with,<..
I want a query something like:
db.Companies.find({description:{$regex:'regular expression'}})
Can this be achieved in the query itself, or do I need to write the logic in my code? I am using pymongo.
Solution
Try this regex:
/"description"\s*:\s*"(?:[^<'"]|\\")+?,(?=[<a-z]).+"/gi
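Note that the pattern above is written against the raw JSON text, including the "description": key. In a $regex query the pattern is applied only to the field's value, so the key prefix would have to be dropped. Below is a minimal sketch of one possible adaptation, not the exact pattern from this answer: it skips whole <...> tags, also accepts digits after the comma, and is checked locally with Python's re module against the four sample descriptions. The names PATTERN, matching_ids and build_filter are made up for illustration, and it assumes no attribute value contains a ">" character.

```python
import re

# Assumed adaptation (hypothetical, not the answer's original regex):
#   ^(?:[^<]|<[^>]*>)*?   lazily walk over plain characters or whole <...> tags
#   ,(?=[A-Za-z0-9<])     a comma directly followed by a letter, digit or tag
# Commas inside a tag are swallowed by <[^>]*> and can never be the match point.
PATTERN = r"^(?:[^<]|<[^>]*>)*?,(?=[A-Za-z0-9<])"

# The four sample descriptions from the question, keyed by _id.
descriptions = {
    1: "This is a software company",
    2: "This is a software,company with <img src='' class='' alt='company logo' /> symbol",
    3: "This is a software, company with <img src='' class='' alt='company,logo' /> symbol",
    4: "This is a software, company with,<img src='' class='' alt='company, logo' /> symbol",
}


def matching_ids(docs):
    """Return the ids whose description has an unspaced comma outside any tag."""
    rx = re.compile(PATTERN)
    return [doc_id for doc_id, text in docs.items() if rx.search(text)]


def build_filter():
    """Build a filter document that could be passed to a pymongo find() call."""
    return {"description": {"$regex": PATTERN}}


print(matching_ids(descriptions))  # [2, 4]
```

With pymongo, the filter could then be used as db.Companies.find(build_filter()), where db is assumed to be an already connected database handle. MongoDB's $regex is PCRE-based, so the lazy quantifier and the lookahead should be accepted, but it is worth testing against your server version. If the descriptions can contain malformed HTML, filtering in Python after fetching the documents may be more reliable than a single regex.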