Question

In a Postgres database (9.3) I have strings like these from which I intend to delete the links:

'HV 3 STANKOVERLAST (+Inc,net: reg.inmeld+) , J.J. Cremerplein 46 AMSTERDAM [ ASD ] http:\/\/t.co\/qzmyMibvHn #p2000'
'A1 13105 AMSTERDAM Bickersgracht 270 http:\/\/t.co\/4oX6B5oAo4 #p2000'
'A1 13157 AMSTERDAM Argonautenstraat 54 3 http:\/\/t.co\/mmyjBcWEFY #p2000'
'A1 13122 AMSTERDAM Tweede Helmersstraat 6 Hotel Crystal http:\/\/t.co\/BWGj4R1noh #p2000'

To delete them, I used:

split_part(text, 'http', 1)

Unfortunately, not all of them are built the same way, with the link at the end:

'BR 2 BUITENBRAND (+http:\/\/t.co\/1x4jPyfA9e: reg.inmeld+) Ferdinand Bolstraat , Quellijnstraat AMSTERDAM [ ASD ] #p2000 #watiserloos'

Using split_part() here would delete big parts of this string.

I already looked for some kind of regex function, but couldn't find a solution for these dynamic links.


Solution

Generally you can use regexp_replace() or substring() with regular expressions to cut most anything from your strings, as long as you can define it clearly.
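As a rough illustration of the substring() variant: with a regular expression, substring() returns the matched part instead of removing it, which can be handy for checking what a pattern actually catches before cutting it. The column name string and the alias t are placeholders for illustration only; the sample row is taken from the question.

```sql
-- Minimal sketch: extract the first link instead of deleting it.
-- substring(... FROM pattern) returns the first match, or NULL if there is none.
-- "string" and "t" are illustrative names, not from the original schema.
SELECT substring(string FROM 'http:[^[:space:]]+') AS link
FROM  (
   VALUES ('A1 13105 AMSTERDAM Bickersgracht 270 http:\/\/t.co\/4oX6B5oAo4 #p2000')
   ) t(string);
```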

In your case, something like:

SELECT regexp_replace(string, 'http:[^[:space:]]+(?:\s*#p\d+)?', '') AS trimmed

Or simpler, according to your later comment:

"the part of string beginning with http until the next space"

(or end of string, I may add)

SELECT regexp_replace(string, 'http:[^[:space:]]+', '') AS trimmed

Replaces the first occurrence of the pattern. Add the global switch 'g', if there can be more.
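A minimal sketch of the global variant, run against two of the sample strings from the question. The VALUES list, the alias t and the column name string are only there to make the statement self-contained; in practice you would select from your own table.

```sql
-- The fourth argument 'g' replaces every match, not just the first one.
-- "t" and "string" are illustrative names; the rows are copied from the question.
SELECT regexp_replace(string, 'http:[^[:space:]]+', '', 'g') AS trimmed
FROM  (
   VALUES
     ('A1 13105 AMSTERDAM Bickersgracht 270 http:\/\/t.co\/4oX6B5oAo4 #p2000')
   , ('BR 2 BUITENBRAND (+http:\/\/t.co\/1x4jPyfA9e: reg.inmeld+) Ferdinand Bolstraat , Quellijnstraat AMSTERDAM [ ASD ] #p2000 #watiserloos')
   ) t(string);
```

Note that in the second row the colon directly after the link is removed as well, since it is a non-whitespace character and therefore part of the match. The space that preceded the link stays in place, so you may want to wrap the result in another regexp_replace() or trim() if doubled spaces matter.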

-> SQLfiddle

Explanation

[^[:space:]] .. bracket expression matching any single non-whitespace character.
^ .. negates the class when it is the first character inside the brackets.
[:space:] .. character class for white space characters as defined by your locale.
+ .. one or more of the preceding atom.

The first pattern cuts any part starting with http: up to the next white space character, plus, optionally, a dangling #p followed by a number.
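For what it's worth, Postgres regular expressions also offer the class-shorthand escape \S, which stands for [^[:space:]], so the pattern can be written more briefly. The following sketch compares both spellings on one sample string from the question; the names t, string and same_result are illustrative.

```sql
-- \S is the class-shorthand escape for [^[:space:]].
-- Assumes standard_conforming_strings = on (the default since Postgres 9.1),
-- so the backslash in a plain string literal reaches the regex engine unchanged.
SELECT regexp_replace(string, 'http:[^[:space:]]+', '', 'g')
     = regexp_replace(string, 'http:\S+', '', 'g') AS same_result
FROM  (
   VALUES ('A1 13157 AMSTERDAM Argonautenstraat 54 3 http:\/\/t.co\/mmyjBcWEFY #p2000')
   ) t(string);
```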

More explanation under this recent, related question:
Regex failing to match number and dash with letter (or space and letter)

Licensed under: CC-BY-SA with attribution