Correct way to create a pivot table in PostgreSQL using CASE WHEN

https://stackoverflow.com/questions/2477231

21-09-2019

Question

I am trying to create a pivot-table type view in PostgreSQL and am nearly there! Here is the basic query:

select 
acc2tax_node.acc, tax_node.name, tax_node.rank 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';

And the data:

   acc    |          name           |     rank     
----------+-------------------------+--------------
 AJ012531 | Paromalostomum fusculum | species
 AJ012531 | Paromalostomum          | genus
 AJ012531 | Macrostomidae           | family
 AJ012531 | Macrostomida            | order
 AJ012531 | Macrostomorpha          | no rank
 AJ012531 | Turbellaria             | class
 AJ012531 | Platyhelminthes         | phylum
 AJ012531 | Acoelomata              | no rank
 AJ012531 | Bilateria               | no rank
 AJ012531 | Eumetazoa               | no rank
 AJ012531 | Metazoa                 | kingdom
 AJ012531 | Fungi/Metazoa group     | no rank
 AJ012531 | Eukaryota               | superkingdom
 AJ012531 | cellular organisms      | no rank

What I am trying to get is the following:

acc      | species                  | phylum
AJ012531 | Paromalostomum fusculum  | Platyhelminthes

I am trying to do this with CASE WHEN, so I have got as far as the following:

select 
acc2tax_node.acc, 
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as species, 
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as phylum 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';

Which gives me the output:

   acc    |         species         |     phylum      
----------+-------------------------+-----------------
 AJ012531 | Paromalostomum fusculum | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | Platyhelminthes
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         |

Now I know that I have to group by acc at some point, so I try

select 
acc2tax_node.acc, 
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as sp, 
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as ph 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' 
group by acc2tax_node.acc;

But I get the dreaded

ERROR:  column "tax_node.rank" must appear in the GROUP BY clause or be used in an aggregate function

All the previous examples I have been able to find use something like SUM() around the CASE statements, so I guess that is the aggregate function. I have tried using FIRST():

select 
acc2tax_node.acc, 
FIRST(CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END) as sp, 
FIRST(CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END) as ph 
from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;

but I get the error:

ERROR:  function first(character varying) does not exist

Can anyone offer any hints?

Solution

Use MAX() or MIN(), not FIRST(). In this scenario, each group has only NULLs in the column except for, at most, one non-NULL value. Aggregate functions ignore NULLs, so that single value is by definition both the MIN and the MAX of the set.
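
The claim that MIN and MAX coincide here can be checked on a trimmed copy of the question's data. The sketch below is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs without a server, and the flat tax table is a hypothetical substitute for the joined tax_node / acc2tax_node result.

```python
import sqlite3

# Hypothetical flat table standing in for the joined tax_node / acc2tax_node
# result from the question; only a few of the original rows are kept.
rows = [
    ("AJ012531", "Paromalostomum fusculum", "species"),
    ("AJ012531", "Paromalostomum", "genus"),
    ("AJ012531", "Platyhelminthes", "phylum"),
    ("AJ012531", "Bilateria", "no rank"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tax (acc TEXT, name TEXT, rank TEXT)")
conn.executemany("INSERT INTO tax VALUES (?, ?, ?)", rows)

# Each CASE yields NULL for every row except the one with the wanted rank.
# Aggregates skip NULLs, so MIN and MAX both return that single value.
query = """
    SELECT acc,
           MAX(CASE rank WHEN 'species' THEN name END) AS species_max,
           MIN(CASE rank WHEN 'species' THEN name END) AS species_min,
           MAX(CASE rank WHEN 'phylum'  THEN name END) AS phylum
    FROM tax
    GROUP BY acc
"""
result = conn.execute(query).fetchall()
print(result)
```

The group collapses to a single row per acc, and species_max and species_min hold the same name, which is why either aggregate can replace the missing FIRST().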

Other tips

PostgreSQL does have a couple of functions for pivot queries; see this article at Postgresonline. You can find these functions in contrib (the crosstab functions are part of the tablefunc module).

SELECT  atn.acc, ts.name AS species, tp.name AS phylum
FROM    acc2tax_node atn
LEFT JOIN
        tax_node ts
ON      ts.taxid = atn.taxid
        AND ts.rank = 'species'
LEFT JOIN
        tax_node tp
ON      tp.taxid = atn.taxid
        AND tp.rank = 'phylum'
WHERE   atn.acc = 'AJ012531';

For further info as requested (in an answer rather than a comment, for nicer formatting):

SELECT * FROM acc2tax_node WHERE acc = 'AJ012531';

   acc    | taxid  
----------+--------
 AJ012531 |  66400
 AJ012531 |  66399
 AJ012531 |  39216
 AJ012531 |  39215
 AJ012531 | 166235
 AJ012531 | 166384
 AJ012531 |   6157
 AJ012531 |  33214
 AJ012531 |  33213
 AJ012531 |   6072
 AJ012531 |  33208
 AJ012531 |  33154
 AJ012531 |   2759
 AJ012531 | 131567

Execute:

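-- crosstab() is provided by the tablefunc extension (CREATE EXTENSION tablefunc).
SELECT report.* FROM crosstab(
 -- The source query is passed as a string and must return
 -- (row name, category, value), ordered by row name.
 $$
 select 
 acc2tax_node.acc::text, tax_node.rank, tax_node.name 
 from 
 tax_node, acc2tax_node 
 where 
 tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531'
 order by 1
 $$
) AS report(acc text, species text, genus text, family text, ...)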

As Matthew Wood pointed out, use MIN() or MAX(), not FIRST():

SELECT 
    an.acc, 
    MAX(
        CASE tn.rank 
            WHEN 'species' THEN tn.name 
            ELSE NULL 
        END
    ) AS species, 
    MAX(
        CASE tn.rank 
            WHEN 'phylum' THEN tn.name 
            ELSE NULL 
        END
    ) AS phylum 
FROM tax_node tn, 
    acc2tax_node an
WHERE tn.taxid = an.taxid 
    AND an.acc = 'AJ012531' 
GROUP BY an.acc;
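
When more ranks are needed, the same MAX(CASE ...) column can be generated once per rank. The following is a minimal sketch under assumptions, not a definitive implementation: Python's built-in sqlite3 stands in for PostgreSQL, the flat tax table and the pivot_columns helper are hypothetical names, and the rows are a trimmed copy of the question's data.

```python
import sqlite3

# Hypothetical flat table standing in for the joined tax_node / acc2tax_node
# result; the rows are a trimmed copy of the question's data.
rows = [
    ("AJ012531", "Paromalostomum fusculum", "species"),
    ("AJ012531", "Paromalostomum", "genus"),
    ("AJ012531", "Macrostomidae", "family"),
    ("AJ012531", "Platyhelminthes", "phylum"),
    ("AJ012531", "Bilateria", "no rank"),
]


def pivot_columns(ranks):
    """Return one MAX(CASE ...) select-list entry per rank (hypothetical helper)."""
    cols = []
    for rank in ranks:
        # The rank value is bound as a parameter; the alias is quoted as an identifier.
        alias = '"' + rank.replace('"', '""') + '"'
        cols.append(f"MAX(CASE rank WHEN ? THEN name END) AS {alias}")
    return ", ".join(cols)


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute("CREATE TABLE tax (acc TEXT, name TEXT, rank TEXT)")
conn.executemany("INSERT INTO tax VALUES (?, ?, ?)", rows)

ranks = ["species", "genus", "family", "phylum"]
query = f"SELECT acc, {pivot_columns(ranks)} FROM tax GROUP BY acc"
pivot = [dict(row) for row in conn.execute(query, ranks)]
print(pivot)
```

Rows whose rank is 'no rank' are never matched by any CASE, so they contribute only NULLs and do not affect the pivoted row.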

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow