Question

I've written a function for Django that lets a user enter a word or phrase and fetches every instance of a specified model in which all of those words appear, in any order, across a set of specified fields. I chose to use the objects.raw method with custom SQL because there are problems building the correct query with Django's Q object.

def fuzzy_search(objmodel,columns,q='',limit=None,offset=0):
    """
        TEMPORARY PATCH version for fuzzy_search, gets around a native Django bug.
    """ 
    if len(q)<3:
        return [] #No results until you reach 3 chars
    words = q.strip().split(" ")
    #Get model table name:
    print "All results: %s" % objmodel.objects.all() 
    db_table = objmodel._meta.db_table
    print("DB_table = %s" % db_table)
    #Construct the fields into a list of keyword arguments!
    sql = "SELECT * FROM %s" % (db_table,)
    userparams = []
    whereands = []
    #Construct the SQL as one OR group per word, with all groups ANDed together
    for word in words:
        if len(word)<2:
            continue #Ignore computationally expensive single char strings
        whereors = []
        for col in columns:
            whereors.append('`%s`.`%s` LIKE "%s##*##"' % (db_table,col,"##P##"))    #STARTSWITH word... The third param will be converted via injection proof means 
            whereors.append('`%s`.`%s` LIKE "(%s##*##"' % (db_table,col,"##P##")) #STARTSWITH (word... The third param will be converted via injection proof means
            whereors.append('`%s`.`%s` LIKE "##*## %s##*##"' % (db_table,col,"##P##"))  #CONTAINS word... The third param will be converted via injection proof means 
            whereors.append('`%s`.`%s` LIKE "##*## (%s##*##"' % (db_table,col,"##P##")) #CONTAINS (word... The third param will be converted via injection proof means
        if whereors:
            whereorstr= "(" + " OR ".join(whereors) + ")"
            for i in range(0,len(whereors)):
                userparams.append(word) #Need to match the number of supplied params to the number of clauses
            whereands.append(whereorstr)    #Build into an SQL string
        else:
            continue
    #Build the final sql:
    results = []
    if whereands:
        sql+= " WHERE " + " AND ".join(whereands)
        sql = sql.replace("##P##","%s") #Necessary to get around %s persistence restriction
        sql = sql.replace("##*##","%%") #Makes everything a bit clearer!
        #For big datasets, it is more efficient to apply LIMITS and OFFSETS at SQL level:
        if limit:
            sql+= " LIMIT %s" % int(limit)  #This is injection proof as only ints are accepted
        if offset:
            sql+= " OFFSET %s" % int(offset)    #This is injection proof as only ints are accepted  
        #Perform the raw query, but with params carefully passed in via SQLi escaped method:
        ### objects.raw method ###
        resultsqset = objmodel.objects.raw(sql,userparams)
        print("Fuzzy SQL: \n%s\n" % resultsqset.query.__str__())    #View SQL
        results = list(resultsqset)
        print("Results: %s" % results)
        ### direct cursor method ###
        #cursor = connection.cursor()
        #cursor.execute(sql,userparams)
        #results = dictfetchall(cursor) #Ensures the results are fetched as a dict of fieldname => value
        return results
    return results

This function is being called like this:

from modules.documents.models import Data_icd10_en
results = fuzzy_search(Data_icd10_en,["code","long_label"],"diab mel",30)

The model is:

class Data_icd10_en(models.Model):
    code = models.CharField(max_length=10)
    short_label = models.CharField(max_length=100)
    long_label = models.CharField(max_length=100)

When I call the function, I can see the actual SQL dump in the console:

print("Fuzzy SQL: \n%s\n" % resultsqset.query.__str__())    #View SQL
Fuzzy SQL: 
<RawQuery: u'SELECT * FROM documents_data_icd10_en WHERE (`documents_data_icd10_en`.`code` LIKE "diabetes%" OR `documents_data_icd10_en`.`code` LIKE "(diabetes%" OR `documents_data_icd10_en`.`code` LIKE "% diabetes%" OR `documents_data_icd10_en`.`code` LIKE "% (diabetes%" OR `documents_data_icd10_en`.`long_label` LIKE "diabetes%" OR `documents_data_icd10_en`.`long_label` LIKE "(diabetes%" OR `documents_data_icd10_en`.`long_label` LIKE "% diabetes%" OR `documents_data_icd10_en`.`long_label` LIKE "% (diabetes%") AND (`documents_data_icd10_en`.`code` LIKE "mell%" OR `documents_data_icd10_en`.`code` LIKE "(mell%" OR `documents_data_icd10_en`.`code` LIKE "% mell%" OR `documents_data_icd10_en`.`code` LIKE "% (mell%" OR `documents_data_icd10_en`.`long_label` LIKE "mell%" OR `documents_data_icd10_en`.`long_label` LIKE "(mell%" OR `documents_data_icd10_en`.`long_label` LIKE "% mell%" OR `documents_data_icd10_en`.`long_label` LIKE "% (mell%") LIMIT 30'>

If I copy and paste this SQL directly into the database backend (MySQL), the correct results are returned (30 rows of variants of the diagnosis "Diabetes Mellitus"). However, the Python function itself fails to return anything (results is just an empty list). I have tried print(resultsqset), and this simply shows this RawQuerySet:

Results: <RawQuerySet: u'SELECT * FROM documents_data_icd10_en WHERE (`documents_data_icd10_en`.`code` LIKE "diab%" OR `documents_data_icd10_en`.`code` LIKE "(diab%" OR `documents_data_icd10_en`.`code` LIKE "% diab%" OR `documents_data_icd10_en`.`code` LIKE "% (diab%" OR `documents_data_icd10_en`.`long_label` LIKE "diab%" OR `documents_data_icd10_en`.`long_label` LIKE "(diab%" OR `documents_data_icd10_en`.`long_label` LIKE "% diab%" OR `documents_data_icd10_en`.`long_label` LIKE "% (diab%") AND (`documents_data_icd10_en`.`code` LIKE "mel%" OR `documents_data_icd10_en`.`code` LIKE "(mel%" OR `documents_data_icd10_en`.`code` LIKE "% mel%" OR `documents_data_icd10_en`.`code` LIKE "% (mel%" OR `documents_data_icd10_en`.`long_label` LIKE "mel%" OR `documents_data_icd10_en`.`long_label` LIKE "(mel%" OR `documents_data_icd10_en`.`long_label` LIKE "% mel%" OR `documents_data_icd10_en`.`long_label` LIKE "% (mel%") LIMIT 30'>

I have also tried casting the rawqueryset into a list, and also manually iterating over it and printing the lines. Both produce nothing.

Finally, to check that the model object is actually what I think it is, trying print "All results: %s" % objmodel.objects.all() gives me a list of 40 or so <Data_icd10_en: Data_icd10_en object> which is what I would expect.

So, what's going on here? Why does my code produce nothing when run via modelname.objects.raw(), even though exactly the same SQL fetches results when run in the database shell, and the same model returns all its rows correctly when they are fetched within that function?

---- EDIT ---- Tests confirm that yes, I am indeed accessing the same database via the Django app and via the shell. Also, simple raw queries all in one line do work.


Solution

After further investigation, which involved turning on MySQL logging and e-mailing the Django developers, it turns out that there is no problem with my code.

Rather, there is a minor native bug in QuerySet.query.__str__(): when it outputs the actual SQL content to the console, it fails to print the quotes that encapsulate the user-supplied parameters.

So when the console is stating:

<RawQuery: u'SELECT * FROM documents_data_icd10_en WHERE (`documents_data_icd10_en`.`code` LIKE "(diabetes%"...

What is actually getting executed is:

SELECT * FROM documents_data_icd10_en WHERE (`documents_data_icd10_en`.`code` LIKE "("diabetes%""...

...which is invalid.
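
To make the effect concrete, here is a minimal sketch of client-side parameter interpolation. It assumes the driver turns each parameter into an already-quoted SQL literal before formatting it into the query, which is roughly how MySQLdb behaves. The function simulate_interpolation and the table name t are hypothetical stand-ins, not the driver's real escaping routine, and the exact quote characters may differ from those shown.

```python
def simulate_interpolation(sql, params):
    """Hypothetical, simplified stand-in for a driver's client-side substitution.

    Each parameter becomes a quoted SQL string literal and is then formatted
    into the query with the % operator, so "%%" collapses to "%".
    """
    quoted = tuple("'" + str(p).replace("'", "''") + "'" for p in params)
    return sql % quoted


# Placeholder wrapped in the query's own quotes, as in the question's code.
broken = 'SELECT * FROM t WHERE `code` LIKE "(%s%%"'
# Bare placeholder, with the wildcard moved into the parameter itself.
fixed = "SELECT * FROM t WHERE `code` LIKE %s"

broken_sql = simulate_interpolation(broken, ["diabetes"])
fixed_sql = simulate_interpolation(fixed, ["(diabetes%"])

print(broken_sql)
print(fixed_sql)
```

In the first case the quotes added around the parameter end up inside the pattern, so the query no longer searches for the intended text. In the second case the driver's quotes are the only quotes, and the pattern is what was intended.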

Moral of the story: don't believe what QuerySet.query.__str__() tells you, and don't encapsulate your user-supplied strings in quotes for Model.objects.raw(sql,PARAMS), as this will be done for you.
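
Applied to the question's function, that advice might look like the sketch below. It is one possible rewrite rather than the author's final code: build_fuzzy_query is a hypothetical helper that only builds the SQL string and the parameter list, leaving every placeholder as a bare %s and moving the LIKE wildcards and the optional "(" prefix into the parameters themselves.

```python
def build_fuzzy_query(db_table, columns, q, limit=None, offset=0):
    """Return (sql, params) for the multi-word search, or (None, []) if empty.

    Hypothetical helper: placeholders are bare %s and the LIKE wildcards
    are part of each parameter, so the driver adds the only quotes.
    """
    # Starts with word, starts with "(word", contains " word", contains " (word".
    patterns = ["{0}%", "({0}%", "% {0}%", "% ({0}%"]
    where_ands = []
    params = []
    for word in q.split():
        if len(word) < 2:
            continue  # skip single-character words, as the question's code does
        where_ors = []
        for col in columns:
            for pattern in patterns:
                # "%%s" leaves a literal %s placeholder in the clause.
                where_ors.append("`%s`.`%s` LIKE %%s" % (db_table, col))
                params.append(pattern.format(word))
        if where_ors:
            where_ands.append("(" + " OR ".join(where_ors) + ")")
    if not where_ands:
        return None, []
    sql = "SELECT * FROM `%s` WHERE %s" % (db_table, " AND ".join(where_ands))
    if limit:
        sql += " LIMIT %d" % int(limit)  # only ints are accepted
    if offset:
        sql += " OFFSET %d" % int(offset)  # only ints are accepted
    return sql, params


sql, params = build_fuzzy_query(
    "documents_data_icd10_en", ["code", "long_label"], "diab mel", limit=30
)
print(sql)
print(params)
# With Django, the pair would then be passed on unchanged, for example:
# results = list(Data_icd10_en.objects.raw(sql, params))
```

The table and column names are still formatted into the SQL directly, so they should come from trusted code (such as objmodel._meta.db_table), never from user input.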

Other tips

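# This executes a raw query of any type (it can include a subquery, etc.).
from django.db import connection

cursor = connection.cursor()  # cursor on Django's default database connection
query = """this will be the raw query"""
cursor.execute(query)
answers = cursor.fetchall()  # rows come back as plain tuples
print("answers---", answers)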
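
With the cursor approach, rows come back as plain tuples rather than model instances. The commented-out code in the question mentions a dictfetchall helper for this. A minimal sketch of such a helper follows, using an in-memory sqlite3 database as a stand-in so it runs without a Django project. The table icd10 and its single sample row are made up for illustration.

```python
import sqlite3


def dictfetchall(cursor):
    """Return all remaining rows from a DB-API cursor as a list of dicts."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database (assumption): a Django cursor would come from
# django.db.connection.cursor() instead.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE icd10 (code TEXT, long_label TEXT)")
cursor.execute(
    "INSERT INTO icd10 VALUES (?, ?)", ("E11", "Type 2 diabetes mellitus")
)

# sqlite3 uses ? placeholders; Django's cursor uses %s instead.
# Either way the wildcard lives in the parameter and the placeholder is unquoted.
cursor.execute(
    "SELECT code, long_label FROM icd10 WHERE long_label LIKE ?", ("% diab%",)
)
answers = dictfetchall(cursor)
print(answers)
```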
Licensed under: CC-BY-SA with attribution