Question

I'm seeing very bizarre behavior with the REGEXP_MATCH function in Google BigQuery. The function appears to work perfectly fine on public data but is not working on my dataset. My dataset was imported from a CSV file whose first line is a header row; that row became the schema, with every column typed as a string. The header and the first data row are shown below. There is a lot more data, but this is the only part relevant to this case.

"id","common_name","botanical_name","low_hardiness_zone","high_hardiness_zone","type","exposure_min","exposure_max","moisture_min","moisture_max"
"plant1","Abelia","Abelia zanderi 'Conti (Confetti)'","5b","9a","Shrub","Partial Sun","Full Sun","Dry","Dry"

When I run the query:

SELECT * FROM [PlantLink_Plant_Types.plant_data_set] 
WHERE REGEXP_MATCH('common_name',r'.*')

I get every result.

However, when I run the query:

SELECT * FROM [PlantLink_Plant_Types.plant_data_set] 
WHERE REGEXP_MATCH('common_name',r'A.*')

I get no results, which is really weird because the plant common name Abelia starts with an A.

Now my regex magic is not that strong, but I am pretty sure the pattern is not at fault. Additionally I've run the public dataset test queries with REGEXP_MATCH and they run correctly. Does anyone have any clue why REGEXP_MATCH would not always function as advertised?


Solution

Note:

  • REGEXP_MATCH('common_name',r'.*') matches the string 'common_name'

while

  • REGEXP_MATCH(common_name,r'.*') matches a field in your table that is called common_name

The first one is always true, because the pattern is tested against the literal string rather than your data, so you get all results. I guess you wanted to refer to the content of the field, so you need to use the second one.

  • REGEXP_MATCH(common_name,r'A.*') should return all records whose common_name field contains an "A".

Hope this helps.
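
For reference, the corrected query might look like the sketch below. It reuses the table and column names from the question, and it assumes that REGEXP_MATCH succeeds when the pattern matches anywhere in the value (which is why r'A.*' behaves like "contains an A"). If the goal is names that start with "A", anchoring the pattern with ^ should be closer to the intent:

```sql
-- Legacy SQL sketch: pass the column itself (no quotes), not a string literal.
-- Assumption: REGEXP_MATCH does a partial match, so ^ is needed to anchor
-- the pattern at the start of the value.
SELECT *
FROM [PlantLink_Plant_Types.plant_data_set]
WHERE REGEXP_MATCH(common_name, r'^A')
```

With the sample row from the question, "Abelia" would be expected to pass this filter.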

OTHER TIPS

The issue is that the string literal 'common_name' does not contain an 'A'.

Check this:

  • REGEXP_MATCH('common_name',r'.*'): All results.
  • REGEXP_MATCH('common_name',r'A.*'): No results.
  • REGEXP_MATCH('common_name',r'c.*'): All results.
  • REGEXP_MATCH(common_name,r'A.*'): All rows whose common_name contains an 'A' somewhere.

:)
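
One way to see the difference for yourself is to put both forms side by side in the SELECT list. This is only a sketch using the table from the question; the output column names literal_matches and field_matches are made up for illustration:

```sql
-- Compare matching against the string literal vs. the actual column.
-- literal_matches tests the fixed text 'common_name' on every row;
-- field_matches tests each row's value in the common_name column.
SELECT
  common_name,
  REGEXP_MATCH('common_name', r'A.*') AS literal_matches,
  REGEXP_MATCH(common_name, r'A.*') AS field_matches
FROM [PlantLink_Plant_Types.plant_data_set]
LIMIT 10
```

The literal_matches column should be the same on every row, since its input never changes, while field_matches should vary with the data.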

Licensed under: CC-BY-SA with attribution