Fewer lines of code for scaling and stacking columns in numpy

https://stackoverflow.com/questions/23120621

04-07-2023

Question

I just started using IPython and numpy, and I am sure the code below can be written more compactly. I would like to scale every column (each column is one dimension of a datapoint, and each row is one datapoint) to values between 0 and 1, then recombine the columns into one array with the same shape as the original.

import numpy as np

data = np.genfromtxt("wine_names.csv", dtype=float, delimiter=',', skip_header=1)

data.shape => (178, 14): the CSV file has 178 rows and 14 columns (178 datapoints in 14 dimensions).

data0 = (data[:,0] - np.amin((data[:,0]))) / (np.amax((data[:,0]))-np.amin((data[:,0])))
data1 = (data[:,1] - np.amin((data[:,1]))) / (np.amax((data[:,1]))-np.amin((data[:,1])))
data2 = (data[:,2] - np.amin((data[:,2]))) / (np.amax((data[:,2]))-np.amin((data[:,2])))

and so on for each of the n columns (14 in this case). I am sure this can be written with less code, but I don't know how.

data_all = np.column_stack([data0, data1, data2])

The same applies here, where every column has to be listed: np.column_stack([data0, data1, data2, ....., n])

Solution

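Take the minimum and maximum along axis 0, which gives one value per column. Broadcasting then applies those per-column values to every row, so the whole array is scaled in one line and np.column_stack is not needed.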

data_all = (data - np.min(data, axis=0))/(np.max(data, axis=0) - np.min(data, axis=0))
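To check that the one-liner matches the column-by-column version from the question, here is a minimal sketch on a small made-up array (the values and the helper name min_max_scale are illustrative assumptions, not part of the original data or answer):

```python
import numpy as np

def min_max_scale(data):
    """Scale each column of a 2-D array to the range [0, 1] (hypothetical helper)."""
    col_min = data.min(axis=0)               # one minimum per column, shape (n_columns,)
    col_range = data.max(axis=0) - col_min   # one range per column, shape (n_columns,)
    # Broadcasting stretches the per-column arrays across every row.
    return (data - col_min) / col_range

# Small made-up array standing in for the wine data (3 rows, 3 columns).
data = np.array([[1.0, 10.0, 100.0],
                 [2.0, 30.0, 400.0],
                 [4.0, 20.0, 200.0]])

scaled = min_max_scale(data)

# Column-by-column approach from the question, written as a loop for comparison.
columns = [(data[:, i] - np.amin(data[:, i])) /
           (np.amax(data[:, i]) - np.amin(data[:, i]))
           for i in range(data.shape[1])]
stacked = np.column_stack(columns)

print(np.allclose(scaled, stacked))
print(scaled.shape == data.shape)
```

One caveat: if a column is constant, its maximum equals its minimum and the division is 0/0, which yields nan for that column, so such columns may need separate handling.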

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow