I have a pandas df pulled from an ODBC connection:
import pandas as pd
import pandas.io.sql as psql
import pyodbc

handle = pyodbc.connect('...')
df1 = psql.frame_query("select * from Table1 where... [some queries on columns]", handle)
# below is a pandas df resulting from the above SQL query
df1 = pd.DataFrame([[1, 'F', 11111, 500, 60], [2, 'M', 22222, 400, 30], [3, 'M', 33333, 5400, 78], [4, 'F', 44444, 5400, 45], [5, 'M', 55555, 8914, 66]], columns = ['ID','Gender','ZipCd','Spend','Age'])
Now I want to run a separate query against a different table in the same database, and, as one of the criteria, extract only the rows whose IDs match those in df1 (e.g. the attempt below, which does not work):
df2 = psql.frame_query("select * from Table2 where ID = ? and StatusCd in ('104', '106', '112', '115')", df1['ID'])
# The ID variable is a common unique identifier between the 2 tables
My question is: how do I pass df1['ID'] as a list of values for the df2 query, i.e. something like ...where ID in (1,2,3,...), but using df1['ID'] as the object containing the list? This should return the records of df2 whose IDs match those of df1 and that also meet the other query criteria.
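One idea I've been toying with (a sketch only, not verified against my server) is building one ? placeholder per ID and passing the values as query parameters, rather than interpolating them into the SQL string:

```python
import pandas as pd

# df1 as constructed above
df1 = pd.DataFrame(
    [[1, 'F', 11111, 500, 60], [2, 'M', 22222, 400, 30],
     [3, 'M', 33333, 5400, 78], [4, 'F', 44444, 5400, 45],
     [5, 'M', 55555, 8914, 66]],
    columns=['ID', 'Gender', 'ZipCd', 'Spend', 'Age'])

ids = df1['ID'].tolist()
# one '?' placeholder per ID, e.g. '?, ?, ?, ?, ?' for five IDs
placeholders = ', '.join('?' * len(ids))
sql = ("select * from Table2 where ID in (%s) "
       "and StatusCd in ('104', '106', '112', '115')" % placeholders)
print(sql)

# then, assuming my pandas version's frame_query accepts a params argument
# (needs the live pyodbc connection from above):
# df2 = psql.frame_query(sql, handle, params=ids)
```

I don't know whether this is the idiomatic way, or whether frame_query takes params in all versions — corrections welcome.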
I am familiar with R syntax, so conceptually my question very closely resembles this one: Pass R variable to RODBC's sqlQuery?
At the end of the day, I'm interested in paring down Table1 so that it includes only records also found in Table2 (i.e. records that have one of the requisite StatusCds in Table2). In this respect, I'm certain there is a more efficient way to call in the data, probably in a single query, but I'm not literate enough in Python or SQL yet.
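Failing a single-query solution, I gather I could do the paring-down on the pandas side once both frames are pulled — something like the sketch below (the df2 here is hypothetical, just to illustrate the shape of Table2):

```python
import pandas as pd

# df1 as constructed above
df1 = pd.DataFrame(
    [[1, 'F', 11111, 500, 60], [2, 'M', 22222, 400, 30],
     [3, 'M', 33333, 5400, 78], [4, 'F', 44444, 5400, 45],
     [5, 'M', 55555, 8914, 66]],
    columns=['ID', 'Gender', 'ZipCd', 'Spend', 'Age'])

# hypothetical df2 standing in for the Table2 query result
df2 = pd.DataFrame({'ID': [2, 3, 9], 'StatusCd': ['104', '112', '106']})

# keep only the df1 records whose ID also appears in df2
df1_pared = df1[df1['ID'].isin(df2['ID'])]
print(df1_pared)
```

But I suspect an inner join in the SQL itself (on ID, filtered by StatusCd) would avoid pulling Table1 rows I'm going to throw away anyway.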
Further comment
I have pyodbc as a tag since I was originally pulling from my SQL servers using that module; maybe pyodbc is the more efficient method for this kind of task? But I'm an R/spreadsheet guy, and pandas has just been the easiest thing for me to learn so far.