Question

In a previous program I was reading data from a CSV file like this:

AllData = np.genfromtxt(open("PSECSkew.csv", "rb"),
                        delimiter=',',
                        dtype=[('CalibrationDate', datetime),('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)],
                        converters={0: ConvertToDate, 1: ConvertToDate})

I'm now writing a very similar program, but this time I want to end up with a really similar data structure to AllData (except the floats will all be in a CSV string this time), sourced from SQL Server instead of a CSV file. What's the best approach?

pyodbc looks like it involves a lot of cursor work, which I'm not familiar with and would like to avoid. I just want to run the query and get the data back in a structure like the one above (or like a DataTable in C#).


Solution

Here's a minimal example, based on the other question that you linked to:

import pyodbc
import numpy

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=MyServer;Trusted_Connection=yes;')
cur = conn.cursor()
cur.execute('select object_id from sys.objects')
results = cur.fetchall()

# pyodbc returns a list of Row objects; pull out the single column
results_as_list = [i[0] for i in results]

# Build a NumPy array from the plain Python list
array = numpy.fromiter(results_as_list, dtype=numpy.int32)
print(array)
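
If you need several columns of mixed types rather than a single integer column, the same pattern extends to a structured array. A minimal sketch, assuming the PSECSkew table and float columns named in the question:

import pyodbc
import numpy as np

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=MyServer;Trusted_Connection=yes;')
cur = conn.cursor()
cur.execute('SELECT B0, B1, B2, ATMAdjustment FROM PSECSkew')  # table/columns assumed from the question

# Each pyodbc Row behaves like a tuple, so it can populate a structured dtype directly
dtype = [('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)]
data = np.array([tuple(row) for row in cur.fetchall()], dtype=dtype)

print(data['B0'])  # access a whole column by field name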

OTHER TIPS

In the meantime, a better way has come along. Check out the turbodbc package. To transform your result set into an OrderedDict of NumPy arrays, just do this:

import turbodbc
connection = turbodbc.connect(dsn="My data source name")
cursor = connection.cursor()
cursor.execute("SELECT 42")
results = cursor.fetchallnumpy()

It should also be much faster than pyodbc (depending on your database, a factor of 10 is absolutely possible).
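
For illustration, here is a hedged sketch of what fetchallnumpy() gives you back: an OrderedDict whose keys are the result set's column names and whose values are NumPy (masked) arrays. The PSECSkew table and its columns are assumptions carried over from the question:

cursor.execute("SELECT B0, B1 FROM PSECSkew")  # assumes the question's table exists in this data source
results = cursor.fetchallnumpy()

# results is an OrderedDict: column name -> NumPy array
for name, column in results.items():
    print(name, column.dtype, column[:5])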

How about using pandas? For example:

import psycopg2
import pandas

try:
    con = psycopg2.connect(
        host="host",
        database="innovate",
        user="username",
        password="password")
except psycopg2.Error:
    print("Could not connect to database.")

data = pandas.read_sql_query("SELECT * FROM table", con)
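
Since the question asks for a structured array rather than a DataFrame, note that pandas can hand one over directly. A minimal sketch, reusing the con connection above and assuming the question's PSECSkew table:

data = pandas.read_sql_query("SELECT * FROM PSECSkew", con)  # table name assumed from the question

# to_records() converts the DataFrame into a NumPy record array,
# which supports the same field-name indexing as a structured array
AllData = data.to_records(index=False)
print(AllData['B0'])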

In the end I just used pyodbc and iterated through the cursor / result set, putting each result into a manually constructed structured array, through a lot of trial and error. If there is a more direct way, I'm all ears!

import numpy as np
import pyodbc as SQL
from datetime import datetime


cxn = SQL.connect('Driver={SQL Server};Server=myServer; Database=myDB; UID=myUserName; PWD=myPassword')
c = cxn.cursor()

# Work out how many rows the query returns in order to initialise the structured array with the correct number of rows
num_rows = c.execute('SELECT count(*) FROM PSECSkew').fetchone()[0]

# Create the structured array
AllData = np.zeros(num_rows, dtype=[('CalibrationDate', datetime), ('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)])

ConvertToDate = lambda s: datetime.strptime(s, "%Y-%m-%d")

# Iterate using the cursor and fill the structured array
for r, row in enumerate(c.execute('SELECT * FROM PSECSkew ORDER BY CalibrationDate, Expiry')):
    # Note: if you don't need to manipulate the data (i.e. to convert the dates in my case) then just tuple(row) would have sufficed
    AllData[r] = (ConvertToDate(row[0]), ConvertToDate(row[1])) + tuple(row[2:])
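
For what it's worth, a slightly more direct route (a sketch under the same schema assumptions, continuing from the code above) is to let np.array build the structured array from a list of tuples, which drops the count(*) query and the preallocation:

# Build the structured array in one shot from a list of converted tuples
rows = c.execute('SELECT * FROM PSECSkew ORDER BY CalibrationDate, Expiry').fetchall()
AllData = np.array(
    [(ConvertToDate(row[0]), ConvertToDate(row[1])) + tuple(row[2:]) for row in rows],
    dtype=[('CalibrationDate', datetime), ('Expiry', datetime), ('B0', float),
           ('B1', float), ('B2', float), ('ATMAdjustment', float)])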
Licensed under: CC-BY-SA with attribution