문제

I have a datamining problem with a very large number of attributes (>15,000). I represent the data in multiple tables with a common Id field. Having deleted empty attributes, I want to merge the resulting smaller tables so I can calculate correlations to remove redundent attributes.
I can join the tables easily with:

select * from lrg_bin0 k0
  join lrg_bin1 k1 on k0.id = k1.id
  join lrg_bin2 k2 on k0.id = k2.id
  join lrg_bin3 k3 on k0.id = k3.id

but when I try to create a new table with:

Create table mrg0 as (
  select * from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 

Postgress complains with 'column "id" specified more than once'

The very large number of attributes means I am doing everything programatically so answers suggesting queries I can generate would be appreciated.

도움이 되었습니까?

해결책

Use using in instead of on

create table mrg0 as 
select *
from
    lrg_bin0 k0
    join lrg_bin1 k1 using (id)
    join lrg_bin2 k2 using (id)
    join lrg_bin3 k3 using (id)

From the manual:

Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both

다른 팁

The problem is that in SQL when you perform a join using on, the matching column will be included once per table it's found in. What you can do is specify which table you want the column from ala:

Create table mrg0 as (
  select k1.id, ... from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top