Question

I am really new to SQL and I want to extract 'SWAMP RIVER NEAR DOVER PLAINS NY' from the following string:

<a href='http://waterdata.usgs.gov/nwis/nwisman/?site_no=01199490'>01199490</a> SWAMP RIVER NEAR DOVER PLAINS NY</a>

The problem is that the length of the part I want to extract varies in each row.

I tried the following:

select substring (name, 80 , char_length(name) - 4 ) from stream_gages; 

But I get: SWAMP RIVER NEAR DOVER PLAINS NY</a> no matter what number I put after the minus sign.

Is there a way I can do this?

What I am actually trying to do is extract the code and name of each stream gage station into new columns, from strings such as:

<a href='http://waterdata.usgs.gov/nwis/nwisman/?site_no=01199490'>01199490</a> SWAMP RIVER NEAR DOVER PLAINS NY</a>

in order to do some GIS queries.

The code is the number starting at position 57, and the name starts at position 80 (when the code length does not vary, which is most of the cases); however, the length of the name varies in each row. The length of the code also varies in a few rows, but I can edit those manually if it is too hard to come up with SQL that handles it. For the name, however, the length is different in all 240 rows. Thanks.

Solution

You could use regexp_replace(), which avoids the hard-coded positions and lengths:

create TABLE bla
        ( id SERIAL NOT NULL PRIMARY KEY
        , body varchar
        );

select * from bla;

INSERT INTO bla(body) VALUES
(e'<a href=\'http://waterdata.usgs.gov/nwis/nwisman/?site_no=01199490\'>01199490</a> SWAMP RIVER NEAR DOVER PLAINS NY</a>' )
        ;

select id
  , regexp_replace(body, e'.+<\/a> ([^<]+)<\/a>.*', '\1')
from bla;

And the results:

CREATE TABLE
 id | body 
----+------
(0 rows)

INSERT 0 1
 id |          regexp_replace          
----+----------------------------------
  1 | SWAMP RIVER NEAR DOVER PLAINS NY
(1 row)
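
The question also asks for the code, and asks why the substring() attempt keeps returning the trailing </a>. The following is only a sketch, assuming PostgreSQL and the bla table and body column from the example above; the aliases code and name are illustrative. It uses substring(string from pattern), which returns the text matched by the first parenthesized group. The second query reuses the stream_gages table and the start position 80 from the question. The third argument of substring() is a length, not an end position, so the characters before the start position have to be subtracted as well; that is why changing the number after the minus sign made no visible difference.

```sql
-- Sketch: pull both pieces out with POSIX regular expressions.
-- Assumes every row has the shape <a href='...'>CODE</a> NAME</a>.
SELECT id
     , substring(body from '>([^<]+)</a>')           AS code  -- text inside the link
     , btrim(substring(body from '</a>([^<]+)</a>')) AS name  -- text after the first </a>
FROM bla;

-- Position-based variant of the query from the question.
-- Assumes the name really starts at position 80 in every row:
-- 79 characters come before it and 4 characters (</a>) come after it.
SELECT substring(name, 80, char_length(name) - 79 - 4)
FROM stream_gages;
```

The regular-expression version does not depend on the length of the code, so it should also cover the few rows where the code is longer or shorter. The position-based version only works for rows where the name starts at exactly the assumed position.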
Licensed under: CC-BY-SA with attribution