I've found the following method useful. It creates a namedtuple whose fields are
the names of all the variables in the data frame, each one holding its own name as a string.
For example, consider the following data frame containing two variables called "variable_1" and "variable_2":
from collections import namedtuple
from pandas import DataFrame
import numpy as np
df = DataFrame({'variable_1':np.arange(5),'variable_2':np.arange(5)})
The following code creates a namedtuple called "var":
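def ntuples():
    # Reads the module-level data frame df defined above.
    list_of_names = df.columns.values
    # Map each column name to itself, so that var.<name> == '<name>'.
    list_of_names_dict = {x: x for x in list_of_names}
    Varnames = namedtuple('Varnames', list_of_names)
    return Varnames(**list_of_names_dict)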
var = ntuples()
In a notebook, when I type var. (including the trailing dot) and press Tab, the names of all the variables in the data frame df are displayed. Writing var.variable_1 is equivalent to writing 'variable_1', so the following works: df[var.variable_1]
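To make the equivalence concrete, here is a small self-contained check. It builds the namedtuple inline rather than through ntuples(), which is only a shortcut to keep the snippet standalone; the names same_string and same_column are introduced here for illustration.

```python
from collections import namedtuple

import numpy as np
from pandas import DataFrame

df = DataFrame({'variable_1': np.arange(5), 'variable_2': np.arange(5)})

# Same idea as ntuples(): one field per column, each holding its own name.
Varnames = namedtuple('Varnames', list(df.columns))
var = Varnames(*df.columns)

# The attribute is just the column name as a string ...
same_string = var.variable_1 == 'variable_1'
# ... so indexing with it selects the same column as the string literal.
same_column = df[var.variable_1].equals(df['variable_1'])
print(same_string, same_column)  # True True
```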
The reason I define a function for this is that you will often add new variables to a data frame. To make the new variables available in your namedtuple "var", call the function again and reassign the result, var = ntuples(), and you are good to go.
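A minimal sketch of that refresh step follows. It assumes a variant of the function that takes the data frame as an argument instead of reading the global df; the name column_names and the column "variable_3" are hypothetical and only used for illustration.

```python
from collections import namedtuple

import numpy as np
from pandas import DataFrame


def column_names(frame):
    """Hypothetical variant of ntuples() that takes the data frame as an argument."""
    names = [str(name) for name in frame.columns]
    Varnames = namedtuple('Varnames', names)
    return Varnames(*names)


df = DataFrame({'variable_1': np.arange(5), 'variable_2': np.arange(5)})
var = column_names(df)

# Add a new column; the existing namedtuple does not know about it yet.
df['variable_3'] = df['variable_1'] + df['variable_2']
has_new_before = hasattr(var, 'variable_3')

# Rebuild the namedtuple and reassign it to pick up the new column.
var = column_names(df)
has_new_after = hasattr(var, 'variable_3')
print(has_new_before, has_new_after)  # False True
```

Note that namedtuple only accepts field names that are valid Python identifiers, so this approach (in either form) raises a ValueError for column names containing spaces, starting with an underscore or a digit, or matching a keyword.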