Question

I have a datamining problem with a very large number of attributes (>15,000). I represent the data in multiple tables with a common Id field. Having deleted empty attributes, I want to merge the resulting smaller tables so I can calculate correlations to remove redundent attributes.
I can join the tables easily with:

select * from lrg_bin0 k0
  join lrg_bin1 k1 on k0.id = k1.id
  join lrg_bin2 k2 on k0.id = k2.id
  join lrg_bin3 k3 on k0.id = k3.id

but when I try to create a new table with:

Create table mrg0 as (
  select * from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 

Postgress complains with 'column "id" specified more than once'

The very large number of attributes means I am doing everything programatically so answers suggesting queries I can generate would be appreciated.

Était-ce utile?

La solution

Use using in instead of on

create table mrg0 as 
select *
from
    lrg_bin0 k0
    join lrg_bin1 k1 using (id)
    join lrg_bin2 k2 using (id)
    join lrg_bin3 k3 using (id)

From the manual:

Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both

Autres conseils

The problem is that in SQL when you perform a join using on, the matching column will be included once per table it's found in. What you can do is specify which table you want the column from ala:

Create table mrg0 as (
  select k1.id, ... from lrg_bin0 k0
    join lrg_bin1 k1 on k0.id = k1.id
    join lrg_bin2 k2 on k0.id = k2.id
    join lrg_bin3 k3 on k0.id = k3.id
) 
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top