Question

The database design is far from optimal, but I have to deal with it, and now I'm really stuck.

Edit: I'm using cx_Oracle

OK so this is my query:

query="select degree, spectraldev.event.eventnumber \
       from spectraldev.degree \
       join spectraldev.alignment on \
            (spectraldev.version_id = alignment.version_id) \
       join spectraldev.event on \
            (alignment.timestamp between event.eventstart and event.eventstop) \
       join spectraldev.eventsetup on \
            (spectraldev.event.eventsetup = spectraldev.eventsetup.oid) \
       where spectraldev.event.eventnumber>=" + options.start + " AND spectraldev.event.eventnumber<=" + options.stop + " AND \
            HITS>=" + str(options.minimum_hits)+" \
       order by spectraldev.event.eventnumber"

db_cursor.execute(query)

This returns a set of degrees (12.34 etc.) for many events, each identified by a unique number (an eventnumber such as 346554).

So I get a table like this:

454544    45.2
454544    12.56
454544    41.1
454544    45.4
454600    22.3
454600    24.13
454600    21.32
454600    22.53
454600    54.51
454600    33.87
454610    32.7
454610    12.99

And so on…

Now I need to create a dictionary with the average degree for each event (summing up all corresponding floats and dividing by the number of them).

I think this could be done in SQL, but I just can't get it to work. At the moment I'm using Python to do this, but the fetch command takes 1-2 hours to complete about 2,000 events, which is far too slow, since I need to process about 1,000,000 events.

This is my fetching part, which takes so long:

_degrees = []
# same cursor that executed the query above
for degree, eventNumber in db_cursor.fetchall():
    _degrees.append([eventNumber, degree])

and then sorting (which is really fast, < 1sec) and calculating averages (also really fast):

_d={}
for eventNumber, degree in _degrees:
    _d.setdefault(eventNumber, []).append(degree)

meanDegrees = []
for event in events:
    # look up the grouped degrees in _d, not in the flat _degrees list
    _curDegree = _d[int(event)]
    _meanDegree = sum(_curDegree) / float(len(_curDegree))
    meanDegrees.append(_meanDegree)

Is there a way to do the Python part in SQL?


Solution

This is an aside, but an important one: you're wide open to SQL injection. It may not matter in your particular case, but it's best to always code for the worst.

You don't mention which module you're using, but assuming it's PEP 249 compliant (you're probably using cx_Oracle), you can pass a dictionary of named bind parameters. A typical query might look like this:

query = """select column1 from my_table where id = :my_id"""
bind_vars = {'my_id' : 1}

db_cursor.execute(query, bind_vars)

In your actual query you're converting some variables (options.start, for instance) to strings in Python but not quoting them in SQL, which means they're implicitly converted back to numbers. This almost certainly isn't needed.
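A minimal sketch of that point, assuming the values arrive as strings from the command line (as options.start and options.stop appear to): convert them to integers once in Python and pass them as bind variables, so nothing is concatenated into the SQL text. The standard-library sqlite3 module stands in for cx_Oracle here only so the snippet runs without a database; it accepts the same :name placeholders with a dictionary. The table event_demo and its rows are made up for illustration.

```python
import sqlite3

# sqlite3 is a stand-in for cx_Oracle so this runs anywhere; both accept
# :name placeholders together with a dict of bind values.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical demo table, not the real schema.
cur.execute("create table event_demo (eventnumber integer)")
cur.executemany("insert into event_demo values (?)",
                [(454544,), (454600,), (454610,)])

# Option values typically arrive as strings; convert them once, up front.
start, stop = "454544", "454600"
bind_vars = {"start": int(start), "stop": int(stop)}

query = """select eventnumber
             from event_demo
            where eventnumber >= :start
              and eventnumber <= :stop
            order by eventnumber"""
cur.execute(query, bind_vars)
event_numbers = [row[0] for row in cur.fetchall()]
print(event_numbers)
```

int() raises ValueError on non-numeric input, so malformed values are rejected before they ever reach the database.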


As for your actual problem: 1-2 hours to complete 2,000 events is, as you say, ridiculous. You haven't posted a schema, but my guess is that you're lacking indexes.

To get the average number of degrees per event number you should use the avg() function. This would make your query:

select spectraldev.event.eventnumber, avg(degree) as degree
  from spectraldev.degree
  join spectraldev.alignment 
        -- I think this is wrong on your query
    on (degree.version_id = alignment.version_id)
  join spectraldev.event 
    on (alignment.timestamp between event.eventstart and event.eventstop)
  join spectraldev.eventsetup 
    on (spectraldev.event.eventsetup = spectraldev.eventsetup.oid)
 where spectraldev.event.eventnumber >= :start
   and spectraldev.event.eventnumber <= :stop
   and hits >= :minimum_hits
 group by spectraldev.event.eventnumber
 order by spectraldev.event.eventnumber
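To get from that query to the dictionary you asked for, the rows can be turned into a mapping directly, since each row is already an (eventnumber, average) pair. The following is only a sketch: sqlite3 stands in for cx_Oracle so it runs without a database, and the single flattened table degree_demo is a made-up substitute for the four joined tables, filled with the sample rows from the question.

```python
import sqlite3

# Stand-in database; with cx_Oracle only the connection setup would differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical flattened table replacing the real joins, for illustration.
cur.execute("create table degree_demo (eventnumber integer, degree real)")
rows = [
    (454544, 45.2), (454544, 12.56), (454544, 41.1), (454544, 45.4),
    (454600, 22.3), (454600, 24.13), (454600, 21.32), (454600, 22.53),
    (454600, 54.51), (454600, 33.87),
    (454610, 32.7), (454610, 12.99),
]
cur.executemany("insert into degree_demo values (?, ?)", rows)

# The database does the averaging; one row comes back per event.
query = """select eventnumber, avg(degree) as degree
             from degree_demo
            where eventnumber >= :start
              and eventnumber <= :stop
            group by eventnumber"""
cur.execute(query, {"start": 454544, "stop": 454610})

# Each row is (eventnumber, average degree), so it maps straight to a dict.
mean_degrees = {event_number: mean for event_number, mean in cur}
print(mean_degrees)
```

Because a dictionary lookup does not depend on row order, this version leaves out the order by.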

I've formatted your query to make it slightly more readable (from my point of view) and to make it more obvious where you need the indexes.

Judging by this, you need an index on the following tables and columns:

  • EVENT - eventnumber, eventstart, eventstop, eventsetup
  • DEGREE - version_id
  • ALIGNMENT - version_id, timestamp
  • EVENTSETUP - oid

and wherever hits might be.
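As a rough sketch, the list above could be turned into create index statements like this. The index names are hypothetical, the column names are taken from the query, and putting all of a table's columns into one composite index is an assumption; separate single-column indexes may suit your data better. The column hits is left out because its table isn't known.

```python
# Columns taken from the joins and filters in the query.
# Assumption: one composite index per table.
index_columns = {
    "event": ["eventnumber", "eventstart", "eventstop", "eventsetup"],
    "degree": ["version_id"],
    "alignment": ["version_id", "timestamp"],
    "eventsetup": ["oid"],
}


def build_index_ddl(schema, columns_by_table):
    """Return one 'create index' statement per table."""
    statements = []
    for table, columns in columns_by_table.items():
        index_name = "ix_{}_demo".format(table)  # hypothetical index name
        statements.append(
            "create index {0}.{1} on {0}.{2} ({3})".format(
                schema, index_name, table, ", ".join(columns)
            )
        )
    return statements


ddl = build_index_ddl("spectraldev", index_columns)
for statement in ddl:
    print(statement)
```

Each statement would then be run once through db_cursor.execute(statement), after checking which of these indexes already exist.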

Having said all that, your problem may be the indexes themselves. You haven't provided your explain plan, the schema, or the number of rows, so this is going to be a guess. However, if you're selecting a significant proportion of the rows in a table, the CBO (cost-based optimizer) may be using the indexes when it shouldn't. Forcing a full table scan using the full hint, /*+ full(event) */ for instance, may solve your problem.
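One way to check what the optimizer is doing, with and without the hint, is to ask Oracle for the plan from Python. This is a sketch under assumptions: fetch_plan is a hypothetical helper name, it expects a PEP 249 style cursor such as the one cx_Oracle provides, and it relies on Oracle's explain plan statement and the dbms_xplan.display table function being available to your user.

```python
def fetch_plan(cursor, sql, bind_vars=None):
    """Return the execution plan of `sql` as a list of text lines.

    The statement is only explained, not executed.
    """
    if bind_vars:
        cursor.execute("explain plan for " + sql, bind_vars)
    else:
        cursor.execute("explain plan for " + sql)
    # dbms_xplan.display() formats the most recently explained statement.
    cursor.execute("select plan_table_output from table(dbms_xplan.display())")
    return [row[0] for row in cursor.fetchall()]


# Usage, assuming `db_cursor` is an open cx_Oracle cursor and `query`
# holds the SQL text (with or without the /*+ full(event) */ hint):
#
#     for line in fetch_plan(db_cursor, query, bind_vars):
#         print(line)
```

Comparing the output for the plain and the hinted query shows whether the hint actually changed the access path.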

Removing the order by, if it's not required, may also significantly speed up your query.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow