Bizarre Behavior with REGEXP_MATCH in Google BigQuery
21-12-2019
Question
I'm seeing very bizarre behavior with the REGEXP_MATCH function in Google BigQuery. The function appears to work perfectly fine for public data but is not working on my dataset. I have a dataset imported from a CSV file. The first line is the header row, which becomes the schema, with every field a string. There is a lot more data, but the first two lines below are the only ones relevant to this case.
"id","common_name","botanical_name","low_hardiness_zone","high_hardiness_zone","type","exposure_min","exposure_max","moisture_min","moisture_max"
"plant1","Abelia","Abelia zanderi 'Conti (Confetti)'","5b","9a","Shrub","Partial Sun","Full Sun","Dry","Dry"
When I run the query:
SELECT * FROM [PlantLink_Plant_Types.plant_data_set]
WHERE REGEXP_MATCH('common_name',r'.*')
I get every result.
However, when I run the query:
SELECT * FROM [PlantLink_Plant_Types.plant_data_set]
WHERE REGEXP_MATCH('common_name',r'A.*')
I get no results, which is really weird because the plant common name Abelia starts with an A.
Now my regex magic is not that strong, but I am pretty sure the pattern is not at fault. Additionally, I've run the public dataset test queries with REGEXP_MATCH and they run correctly. Does anyone have any clue why REGEXP_MATCH would not always function as advertised?
Solution
Note:
- REGEXP_MATCH('common_name',r'.*') matches the string 'common_name'
while
- REGEXP_MATCH(common_name,r'.*') matches a field in your table that is called common_name
The first one is always true, and therefore you get all results. I guess you wanted to refer to the content of the field, so you need to use the second one.
- REGEXP_MATCH(common_name,r'A.*') should return all records whose common_name field contains an "A". If you only want names that start with "A", anchoring the pattern as r'^A' should do it.
Hope this helps.
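To see the difference outside BigQuery, here is a minimal Python sketch. It assumes REGEXP_MATCH behaves like an unanchored search, modeled here with re.search. The helper regexp_match and every row other than "Abelia" are made up for illustration.

```python
import re

# Stand-in for the table. Only "Abelia" comes from the question;
# the other two rows are hypothetical, added for contrast.
rows = [
    {"id": "plant1", "common_name": "Abelia"},
    {"id": "plant2", "common_name": "Boxwood"},
    {"id": "plant3", "common_name": "Sweet Alyssum"},
]


def regexp_match(value, pattern):
    """Hypothetical helper: True if the pattern is found anywhere in value.

    Assumption: this mirrors REGEXP_MATCH as an unanchored search.
    """
    return re.search(pattern, value) is not None


# Quoted name: every row is tested against the literal text 'common_name'.
literal_any = [r["id"] for r in rows if regexp_match("common_name", r".*")]
literal_a = [r["id"] for r in rows if regexp_match("common_name", r"A.*")]

# Unquoted name: every row is tested against its own field value.
field_a = [r["id"] for r in rows if regexp_match(r["common_name"], r"A.*")]
field_starts_a = [r["id"] for r in rows if regexp_match(r["common_name"], r"^A")]

print(literal_any)     # ['plant1', 'plant2', 'plant3']
print(literal_a)       # []
print(field_a)         # ['plant1', 'plant3']
print(field_starts_a)  # ['plant1']
```

The quoted form gives the same answer for every row because the row's data never enters the comparison, which is why the original queries returned either everything or nothing.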
OTHER TIPS
The issue is that the string 'common_name' does not contain an 'A'.
Check this:
REGEXP_MATCH('common_name',r'.*') : All results.
REGEXP_MATCH('common_name',r'A.*') : No results.
REGEXP_MATCH('common_name',r'c.*') : All results.
REGEXP_MATCH(common_name,r'A.*') : All results that somewhere have an 'A'.
:)