Question

I have a table that looks like this:

  Column A | Column B | Counter
  ---------+----------+--------
  A        | B        |      53
  B        | C        |      23
  A        | D        |      11
  C        | B        |      22

I need to remove the last row because it forms a cycle with the second row (B to C, then C back to B). I can't figure out how to do it.

EDIT

There is an indexed date field. This is for a Sankey diagram. The data in the sample table is actually the result of a query. The underlying table has:

date   | source node | target node | path count 

The query to build the table is:

SELECT source_node, target_node, COUNT(1) 
FROM sankey_table 
WHERE TO_CHAR(data_date, 'yyyy-mm-dd')='2013-08-19' 
GROUP BY source_node, target_node 

In the sample, the last row (C to B) goes backwards, and I need to ignore it or the Sankey diagram won't display. I need to show only forward paths.


Solution 2

If you can adjust how your table is populated, you can change the query you're using so that it only retrieves the values for the first direction seen (for that date) in the first place, with a little analytic manipulation:

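SELECT source_node, target_node, counter FROM (
  SELECT source_node,
    target_node,
    COUNT(*) OVER (PARTITION BY source_node, target_node,
      TRUNC(data_date)) AS counter,
    -- ROW_NUMBER rather than RANK, so rows sharing a timestamp
    -- can't all be ranked 1 and produce duplicate rows
    ROW_NUMBER() OVER (PARTITION BY GREATEST(source_node, target_node),
      LEAST(source_node, target_node), TRUNC(data_date)
        ORDER BY data_date) AS rnk
  FROM sankey_table
  -- a date range, unlike TO_CHAR(data_date, ...), can use the index on data_date
  WHERE data_date >= DATE '2013-08-19'
  AND data_date < DATE '2013-08-20'
)
WHERE rnk = 1;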

The inner query gets the same data you collect now but adds a ranking column, which will be 1 for the first row for any source/target pair, in either direction, on a given day. The outer query then discards everything else.
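To check the idea without an Oracle instance, here is a minimal sketch that runs a SQLite port of the query through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) and ISO-8601 text timestamps, and it uses SQLite's two-argument MAX/MIN and date() in place of Oracle's GREATEST/LEAST and TRUNC. The raw rows are hypothetical. It uses ROW_NUMBER() rather than RANK(), so rows sharing a timestamp still yield a single row per node pair.

```python
import sqlite3

# Hypothetical raw data: one row per observed path, with a timestamp.
# Table and column names follow the question; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sankey_table (data_date TEXT, source_node TEXT, target_node TEXT)"
)
rows = (
    [("2013-08-19 09:00:00", "A", "B")] * 3
    + [("2013-08-19 09:05:00", "B", "C")] * 2
    + [("2013-08-19 10:00:00", "C", "B")] * 2  # later, reverse direction
    + [("2013-08-19 11:00:00", "A", "D")]
)
conn.executemany("INSERT INTO sankey_table VALUES (?, ?, ?)", rows)

# SQLite port: two-argument MAX/MIN stand in for GREATEST/LEAST,
# and date(data_date) stands in for TRUNC(data_date).
query = """
SELECT source_node, target_node, counter FROM (
  SELECT source_node,
    target_node,
    COUNT(*) OVER (PARTITION BY source_node, target_node,
      date(data_date)) AS counter,
    ROW_NUMBER() OVER (PARTITION BY MAX(source_node, target_node),
      MIN(source_node, target_node), date(data_date)
        ORDER BY data_date) AS rnk
  FROM sankey_table
  WHERE data_date >= '2013-08-19' AND data_date < '2013-08-20'
)
WHERE rnk = 1
ORDER BY source_node, target_node
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 'B', 3), ('A', 'D', 1), ('B', 'C', 2)]
```

B to C is logged before C to B on that day, so B to C is kept with its own count and C to B is dropped; the three identical A to B rows still collapse to one output row.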

This might be a candidate for a materialised view if you're truncating and repopulating it daily.

If you can't change your intermediate table but can still see the underlying table, you could join back to it using the same idea. Assuming the table you're querying from is called sankey_agg_table:

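SELECT sat.source_node, sat.target_node, sat.counter
FROM sankey_agg_table sat
JOIN (SELECT source_node, target_node,
    ROW_NUMBER() OVER (PARTITION BY GREATEST(source_node, target_node),
      LEAST(source_node, target_node), TRUNC(data_date)
        ORDER BY data_date) AS rnk
  FROM sankey_table
  -- restrict to the day the aggregate table was built for; otherwise
  -- every other day adds another rnk = 1 match and duplicates rows
  WHERE data_date >= DATE '2013-08-19'
  AND data_date < DATE '2013-08-20') st
ON st.source_node = sat.source_node
AND st.target_node = sat.target_node
AND st.rnk = 1;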

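The join-back variant can be sketched the same way, as a SQLite port run from Python's sqlite3 module (same assumptions: SQLite 3.25 or later, ISO-8601 text timestamps, MAX/MIN and date() standing in for GREATEST/LEAST and TRUNC). The aggregate rows are the question's sample; the four raw rows are hypothetical and only serve to fix which direction of the B/C pair appears first that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sankey_table (data_date TEXT, source_node TEXT, target_node TEXT);
CREATE TABLE sankey_agg_table (source_node TEXT, target_node TEXT, counter INTEGER);
""")
# Hypothetical raw rows: B->C is seen before C->B on the day in question.
conn.executemany(
    "INSERT INTO sankey_table VALUES (?, ?, ?)",
    [
        ("2013-08-19 09:00:00", "A", "B"),
        ("2013-08-19 09:05:00", "B", "C"),
        ("2013-08-19 10:00:00", "C", "B"),
        ("2013-08-19 11:00:00", "A", "D"),
    ],
)
# Aggregated rows taken from the question's sample table.
conn.executemany(
    "INSERT INTO sankey_agg_table VALUES (?, ?, ?)",
    [("A", "B", 53), ("B", "C", 23), ("A", "D", 11), ("C", "B", 22)],
)

query = """
SELECT sat.source_node, sat.target_node, sat.counter
FROM sankey_agg_table sat
JOIN (SELECT source_node, target_node,
    ROW_NUMBER() OVER (PARTITION BY MAX(source_node, target_node),
      MIN(source_node, target_node), date(data_date)
        ORDER BY data_date) AS rnk
  FROM sankey_table
  WHERE data_date >= '2013-08-19' AND data_date < '2013-08-20') st
ON st.source_node = sat.source_node
AND st.target_node = sat.target_node
AND st.rnk = 1
ORDER BY sat.source_node, sat.target_node
"""
kept = conn.execute(query).fetchall()
print(kept)  # [('A', 'B', 53), ('A', 'D', 11), ('B', 'C', 23)]
```

The aggregate counts come through unchanged; the raw table is only consulted to decide which direction of each pair counts as forward.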

OTHER TIPS

Removing every edge from your graph where the tuple (source_node, target_node) is not in alphabetical order and the reverse row exists should give you what you want:

DELETE
FROM sankey_table t1
WHERE source_node > target_node
AND EXISTS (
  SELECT NULL FROM sankey_table t2
  WHERE t2.source_node = t1.target_node
    AND t2.target_node = t1.source_node);

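If you don't want to DELETE them, negate this condition in the query that generates the input for the diagram, i.e. WHERE NOT (source_node > target_node AND EXISTS (...)). Used as-is, the condition would select only the backward rows instead of filtering them out.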
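A minimal sketch of the SELECT form with the negated condition, run through Python's sqlite3 module. It assumes a hypothetical table holding the question's four aggregated rows under the name sankey_table (the real sankey_table holds raw rows, but the filter works the same way on either).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the question's sample rows.
conn.execute(
    "CREATE TABLE sankey_table (source_node TEXT, target_node TEXT, counter INTEGER)"
)
conn.executemany(
    "INSERT INTO sankey_table VALUES (?, ?, ?)",
    [("A", "B", 53), ("B", "C", 23), ("A", "D", 11), ("C", "B", 22)],
)

# Keep every edge EXCEPT those that are reverse-alphabetical
# and whose mirror edge also exists.
query = """
SELECT t1.source_node, t1.target_node, t1.counter
FROM sankey_table t1
WHERE NOT (
  t1.source_node > t1.target_node
  AND EXISTS (
    SELECT NULL FROM sankey_table t2
    WHERE t2.source_node = t1.target_node
      AND t2.target_node = t1.source_node)
)
ORDER BY t1.source_node, t1.target_node
"""
forward_edges = conn.execute(query).fetchall()
print(forward_edges)  # [('A', 'B', 53), ('A', 'D', 11), ('B', 'C', 23)]
```

C to B is the only edge that is out of alphabetical order with its mirror edge present, so it is the only one filtered out. Note that "forward" here means alphabetical, not first seen in time.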

DELETE FROM yourTable
WHERE "Column A" = 'C';

This works only if these four rows are all the rows in your table.

EDIT

I would recommend that you clean up your source data if you can, i.e. delete the rows that you call backwards, if those rows are incorrect as you state in your comments.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow