Question

I have combined two different tables; one side is named DynDom and the other is CATH. I am trying to remove duplicate rows from the combined table, such as these:

[Screenshot: duplicate rows in the combined table]

However, if I select DISTINCT on the DynDom pdbcode alone, it returns the distinct values of that pdbcode as expected.

[Screenshots: DynDom table and CATH table]

Based on the pictures above: when I comment out one side's columns (DynDom or CATH) and run the query separately for each side, it returns the values I need. I was wondering whether I can use two DISTINCT statements to return distinct rows of the entire table based on the pdbcode.

Here's my code:

select DISTINCT
    cath_dyndom_table_2."DYNDOM_DOMAINID",
    cath_dyndom_table_2."DYNDOM_DSTART",
    cath_dyndom_table_2."DYNDOM_DEND",
    cath_dyndom_table_2."DYNDOM_CONFORMERID",
    cath_dyndom_table_2.pdbcode,
    cath_dyndom_table_2."DYNDOM_ChainID",
    cath_dyndom_table_2.cath_pdbcode,
    cath_dyndom_table_2."CATH_BEGIN",
    cath_dyndom_table_2."CATH_END"
from 
    cath_dyndom_table_2 
where 
    pdbcode = '2hun'
order by 
    cath_dyndom_table_2."DYNDOM_DOMAINID",
    cath_dyndom_table_2."DYNDOM_DSTART",
    cath_dyndom_table_2."DYNDOM_DEND",
    cath_dyndom_table_2.pdbcode,
    cath_dyndom_table_2.cath_pdbcode,
    cath_dyndom_table_2."CATH_BEGIN",
    cath_dyndom_table_2."CATH_END";

Ultimately, I would like to search domains from DynDom and CATH by pdbcode and return the rows without duplicate values.

Thank you.

UPDATE:

This is the view I created:

CREATE VIEW cath_dyndom_table AS
SELECT
  r.domainid AS "DYNDOM_DOMAINID",
  r.DomainStart AS "DYNDOM_DSTART",
  r.Domain_End AS "DYNDOM_DEND",
  r.ddid AS "DYN_DDID",
  r.confid AS "DYNDOM_CONFORMERID",
  r.pdbcode,
  r.chainid AS "DYNDOM_ChainID",
  d.cath_pdbcode,
  d.cathbegin AS "CATH_BEGIN",
  d.cathend AS "CATH_END"
FROM dyndom_domain_table r
  FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode 
  ORDER BY confid ASC;

Solution

It sounds as though you want a UNION of the domain names and ranges from each table. This can be achieved like so:

SELECT r.domainid, r.DomainStart, r.Domain_End
FROM dyndom_domain_table r
UNION
SELECT RTRIM(d.cath_pdbcode), d.cathbegin, d.cathend
FROM cath_domains d;

This should eliminate exact duplicates (i.e. rows where the domain name, start and end are all identical), but it will not eliminate duplicate domain names with different ranges. If these exist, you will need to decide how to handle them: retain them as separate entries, combine them into the lowest start and highest end, or whatever other option is preferred.
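A small self-contained sketch of this behaviour, assuming PostgreSQL (the question's `::` casts suggest it). The tables `dyndom_names` and `cath_names` and their rows are hypothetical stand-ins, not the asker's real data. It shows that `UNION` drops a row only when every selected column matches, then applies one of the options above (merging same-name entries into the lowest start and highest end) with `GROUP BY`:

```sql
-- Hypothetical sample data (PostgreSQL); not the asker's real tables.
CREATE TEMP TABLE dyndom_names (domain_name text, dstart int, dend int);
CREATE TEMP TABLE cath_names   (domain_name text, dstart int, dend int);

INSERT INTO dyndom_names VALUES ('2hunA01', 1, 120), ('2hunA02', 121, 250);
INSERT INTO cath_names   VALUES ('2hunA01', 1, 120),    -- exact duplicate of a DynDom row
                                ('2hunA02', 118, 252);  -- same name, different range

-- UNION removes rows that are identical in every selected column.
CREATE TEMP TABLE union_result AS
SELECT domain_name, dstart, dend FROM dyndom_names
UNION
SELECT domain_name, dstart, dend FROM cath_names;
-- union_result: 2hunA01 once, 2hunA02 twice (its two ranges differ)

-- One way to handle same-name entries with different ranges:
-- merge them into the lowest start and highest end.
SELECT domain_name, MIN(dstart) AS dstart, MAX(dend) AS dend
FROM union_result
GROUP BY domain_name
ORDER BY domain_name;
-- 2hunA01 |   1 | 120
-- 2hunA02 | 118 | 252
```

Using `UNION ALL` instead would keep all four input rows, duplicates included.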

EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:

FULL OUTER JOIN cath_domains d 
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
   r.DomainStart <= d.cathbegin AND
   r.Domain_End >= d.cathend
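To see why the revised condition helps, here is a hedged PostgreSQL sketch comparing the view's original `ON` clause with the revised one. `dyndom_domains_demo` and `cath_domains_demo` are hypothetical stand-ins that reuse the view's column names, and `cath_pdbcode` is assumed to hold CATH domain IDs such as `'2hunA01'` (PDB code, chain, domain number), which is consistent with the truncating `::character(4)` and `::character(5)` casts:

```sql
-- Hypothetical stand-ins reusing the view's column names (PostgreSQL).
CREATE TEMP TABLE dyndom_domains_demo (
  domainid    text,
  pdbcode     text,
  chainid     text,
  DomainStart int,
  Domain_End  int
);
CREATE TEMP TABLE cath_domains_demo (
  cath_pdbcode text,  -- assumed format: PDB code + chain + domain number
  cathbegin    int,
  cathend      int
);

INSERT INTO dyndom_domains_demo VALUES
  ('D1', '2hun', 'A', 1, 130),
  ('D2', '2hun', 'A', 131, 260);
INSERT INTO cath_domains_demo VALUES
  ('2hunA01', 5, 120),
  ('2hunA02', 140, 255);

-- Original condition: only the 4-character PDB code is compared,
-- so each DynDom domain pairs with each CATH domain (2 x 2 = 4 rows).
CREATE TEMP TABLE old_join AS
SELECT r.domainid, d.cath_pdbcode
FROM dyndom_domains_demo r
FULL OUTER JOIN cath_domains_demo d
  ON d.cath_pdbcode::character(4) = r.pdbcode;

-- Revised condition: PDB code + chain must match and the CATH range
-- must lie inside the DynDom range.
CREATE TEMP TABLE new_join AS
SELECT r.domainid, d.cath_pdbcode
FROM dyndom_domains_demo r
FULL OUTER JOIN cath_domains_demo d
  ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid
 AND r.DomainStart <= d.cathbegin
 AND r.Domain_End  >= d.cathend;

SELECT * FROM new_join ORDER BY domainid;
-- D1 | 2hunA01
-- D2 | 2hunA02
```

Because this is a FULL OUTER JOIN, a CATH domain that crosses a DynDom boundary would not be dropped; it would show up as a row with NULL DynDom columns, so the containment test may need adjusting for real data.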

OTHER TIPS

What you are getting is the Cartesian product of the two tables: for each pdbcode, every DynDom row is paired with every CATH row.

To get one row per match without duplicates, you need a 1-to-1 relation between the two tables.
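A minimal illustration of this point, again assuming PostgreSQL and using hypothetical tables `dd_rows` and `cath_rows`: when one PDB entry has three DynDom domains and two CATH domains, joining on `pdbcode` alone yields 3 × 2 = 6 rows, and `DISTINCT` cannot collapse them because every pair of domains is different.

```sql
-- Hypothetical data (PostgreSQL): one PDB entry with 3 DynDom and 2 CATH domains.
CREATE TEMP TABLE dd_rows   (pdbcode text, domainid text);
CREATE TEMP TABLE cath_rows (pdbcode text, cath_domain text);
INSERT INTO dd_rows   VALUES ('2hun', 'D1'), ('2hun', 'D2'), ('2hun', 'D3');
INSERT INTO cath_rows VALUES ('2hun', '2hunA01'), ('2hun', '2hunA02');

-- Joining on pdbcode alone is many-to-many, so each DynDom row
-- is paired with each CATH row of the same entry.
CREATE TEMP TABLE many_to_many AS
SELECT DISTINCT r.domainid, d.cath_domain
FROM dd_rows r
JOIN cath_rows d ON d.pdbcode = r.pdbcode;

-- DISTINCT removes nothing here: all 6 (domainid, cath_domain) pairs differ.
SELECT count(*) AS joined_rows FROM many_to_many;  -- 6 = 3 x 2
```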


You can see HERE what Cartesian joins are and HERE how to avoid them.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow