You cannot merge two strings. I think you're confused about what os.path.join returns. It returns a string. You have to actually read in the DataFrames from the files named JJ and WW, then perform the merge.
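To make the point concrete, here is a minimal sketch. The directory name 'data' and the file name 'JJ.csv' are placeholders I made up, not paths from your code. It only shows that os.path.join builds a path string and never opens the file:

```python
import os

# Hypothetical directory and file name, used only for illustration.
jj_path = os.path.join('data', 'JJ.csv')

# os.path.join only glues path components together; it never touches the disk.
is_string = isinstance(jj_path, str)
has_merge = hasattr(jj_path, 'merge')

print(type(jj_path).__name__)  # str
print(has_merge)               # False: a str has no merge method
```

To get something you can merge, you would pass that string to read_csv and work with the DataFrame it returns.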
Here's a full example of writing two DataFrames to CSV files, reading them back with read_csv, and then merging them on the group column:
In [49]: df1 = DataFrame(randn(10, 1), columns=['a'])
In [50]: df1['group'] = np.random.choice(['b', 'c'], size=len(df1))
In [51]: df2 = DataFrame(randn(10, 1), columns=['b'])
In [52]: df2['group'] = np.random.choice(['b', 'c'], size=len(df2))
In [53]: df1.to_csv('df1.csv', index=False)
In [54]: cat df1.csv
a,group
-1.590035935931282,b
0.5496398501891229,c
-0.6484689548035797,b
0.19162302248253205,b
-0.9852064283582675,c
0.5975155551821989,b
0.29443634291217047,b
-0.7929994157215382,b
-1.9546460886048795,b
0.19195457928475546,c
In [55]: df2.to_csv('df2.csv', index=False)
In [56]: cat df2.csv
b,group
-1.2874060006117918,c
1.1037959548210117,b
0.47172389260467507,c
0.12802538607490285,c
-0.8753708425917293,b
-0.09187827793091947,b
1.140204215271196,c
0.4862940170888638,b
-1.1080430563137758,b
-1.3698112665693232,c
In [57]: df1_csv = read_csv('df1.csv', index_col=None)
In [58]: df2_csv = read_csv('df2.csv', index_col=None)
In [59]: df1_csv
Out[59]:
       a group
0 -1.590     b
1  0.550     c
2 -0.648     b
3  0.192     b
4 -0.985     c
5  0.598     b
6  0.294     b
7 -0.793     b
8 -1.955     b
9  0.192     c
In [60]: df2_csv
Out[60]:
       b group
0 -1.287     c
1  1.104     b
2  0.472     c
3  0.128     c
4 -0.875     b
5 -0.092     b
6  1.140     c
7  0.486     b
8 -1.108     b
9 -1.370     c
In [61]: df3 = pd.merge(df1_csv, df2_csv, on='group')
In [62]: df3
Out[62]:
        a group      b
0  -1.590     b  1.104
1  -1.590     b -0.875
2  -1.590     b -0.092
3  -1.590     b  0.486
4  -1.590     b -1.108
5  -0.648     b  1.104
6  -0.648     b -0.875
7  -0.648     b -0.092
8  -0.648     b  0.486
9  -0.648     b -1.108
10  0.192     b  1.104
11  0.192     b -0.875
12  0.192     b -0.092
13  0.192     b  0.486
14  0.192     b -1.108
15  0.598     b  1.104
16  0.598     b -0.875
17  0.598     b -0.092
18  0.598     b  0.486
19  0.598     b -1.108
20  0.294     b  1.104
21  0.294     b -0.875
22  0.294     b -0.092
23  0.294     b  0.486
24  0.294     b -1.108
25 -0.793     b  1.104
26 -0.793     b -0.875
27 -0.793     b -0.092
28 -0.793     b  0.486
29 -0.793     b -1.108
30 -1.955     b  1.104
31 -1.955     b -0.875
32 -1.955     b -0.092
33 -1.955     b  0.486
34 -1.955     b -1.108
35  0.550     c -1.287
36  0.550     c  0.472
37  0.550     c  0.128
38  0.550     c  1.140
39  0.550     c -1.370
40 -0.985     c -1.287
41 -0.985     c  0.472
42 -0.985     c  0.128
43 -0.985     c  1.140
44 -0.985     c -1.370
45  0.192     c -1.287
46  0.192     c  0.472
47  0.192     c  0.128
48  0.192     c  1.140
49  0.192     c -1.370
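If the 50 rows look surprising: merging on a key that repeats in both frames pairs every matching row on the left with every matching row on the right. The following sketch rebuilds just the group columns from the CSV files shown earlier (7 b and 3 c in df1, 5 b and 5 c in df2) and checks the row count. The names left, right and expected_rows are mine, not part of the session above:

```python
from collections import Counter

import pandas as pd

# group columns copied from df1.csv and df2.csv
left = pd.DataFrame({'group': list('bcbbcbbbbc')})
right = pd.DataFrame({'group': list('cbccbbcbbc')})

merged = pd.merge(left, right, on='group')

# Each key contributes (rows on the left) * (rows on the right) rows.
left_counts = Counter(left['group'])
right_counts = Counter(right['group'])
expected_rows = sum(left_counts[k] * right_counts[k] for k in left_counts)

print(len(merged), expected_rows)  # 50 50
```

So 7 * 5 rows for b plus 3 * 5 rows for c gives the 50 rows in df3. If you want one row per file row instead, you probably need a key that is unique in at least one of the frames.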
Couple of other things:
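Don't use is to compare objects for equality, use ==. The is operator tests identity (whether two names refer to the same object), not whether two values are equal. It happens to give the expected answer for small integers, but even then you shouldn't rely on it because that's an implementation detail of CPython.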
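A small sketch of the difference, using two lists that I made up for the purpose: they hold equal values but are separate objects, so == and is disagree.

```python
# Two separate list objects with equal contents (illustrative values only).
first = ['JJ.csv', 'WW.csv']
second = ['JJ.csv', 'WW.csv']

same_value = first == second   # compares contents
same_object = first is second  # compares identity

alias = first                  # a second name for the same object
alias_is_first = alias is first

print(same_value, same_object, alias_is_first)  # True False True
```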
Instead of checking the file name with str.endswith, just iterate over what you want by first globbing:
import glob
import os

# f is the full path returned by glob, so len(f) counts the directory part too
for f in glob.glob(os.path.join(path, '*J.csv')):
    if len(f) == 12:
        # do all the thingz!
        pass
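Here is a self-contained sketch of the globbing approach that you can run without your data. It creates a temporary directory with a few empty files (the file names and the helper find_j_files are invented for this example) and returns base names rather than full paths:

```python
import glob
import os
import tempfile


def find_j_files(directory):
    """Return sorted base names of files in directory matching '*J.csv'."""
    pattern = os.path.join(directory, '*J.csv')
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; the names are hypothetical.
    for name in ['JJ.csv', 'WW.csv', 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()
    found = find_j_files(tmp)

print(found)  # ['JJ.csv']
```

Working with the base name makes any length check independent of how long the directory part of the path is.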