Question

I am writing a bash script to detect certain classes of strings in a SQL query (all uppercase, all lowercase, all numeric, and so on). Before doing the classification, I want to extract all quoted strings, but I am having trouble writing a regex that properly extracts them from the query string. For example, take this query from the TPC-H benchmark:

select
o_year,
sum(case
    when nation = 'JAPAN' then volume
    else 0
end) / sum(volume) as mkt_share
from
(
    select
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) as volume,
        n2.n_name as nation
    from
        part,
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2,
        region
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = 'MEDIUM BRUSHED BRASS'
) as all_nations
group by
o_year
order by
o_year;

It's a complex query, but that is beside the point. I need to be able to extract all of the single-quoted strings from this file and print each one on its own line, i.e.:

'JAPAN'
'ASIA'
'1995-01-01'
'1996-12-31'
'MEDIUM BRUSHED BRASS'

Right now (I'm not very familiar with regex), all I have is:

printf '%s\n' $SQL_FILE_VARIABLE | grep -E "'*'"

But this doesn't support strings with spaces, and it doesn't work when multiple strings are on the same line of the file. Ideally, I can get this to work in my bash script, so preferably the solution will be grep/sed/perl. I have done some googling and have found solutions to similar problems, but I have not been able to get them to work for this in particular.

Any ideas how I can achieve this? Thanks.


Solution

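You want something like this (the -o flag makes grep print only the matched text instead of the whole line, and quoting the variable keeps strings that contain spaces in one piece):

printf '%s\n' "$SQL_FILE_VARIABLE" | grep -oE "'[^']*'"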
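To try the pattern in isolation, here is a minimal sketch. It assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it), and the function name extract_quoted is hypothetical, chosen only for this illustration. The sample text is a few lines taken from the query in the question.

```shell
#!/usr/bin/env bash

# extract_quoted is a hypothetical helper name, not part of the original answer.
# It reads SQL text on stdin and prints each single-quoted string on its own line.
extract_quoted() {
    # -o prints only the matched part of a line, one match per output line.
    # '[^']*' matches an opening quote, any run of non-quote characters,
    # and the closing quote, so spaces inside the string are allowed.
    grep -oE "'[^']*'"
}

# A few lines from the query in the question.
sql="and r_name = 'ASIA'
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'"

# Quote the variable so the shell does not word-split the text before grep sees it.
printf '%s\n' "$sql" | extract_quoted
```

For this sample the output should be 'ASIA', '1995-01-01', '1996-12-31' and 'MEDIUM BRUSHED BRASS', each on its own line. The pattern does not handle SQL's doubled-quote escape (as in 'it''s'), which it would split into two strings.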

Other tips

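Why not try /'(.*?)'/g? The lazy quantifier .*? matches as little as possible, so each match stops at the next closing quote and the text between the quotes is captured. A greedy .* would instead run from the first quote on a line to the last one. Note that lazy quantifiers need a Perl-compatible engine (perl or grep -P); POSIX extended regular expressions, as used by grep -E, do not define them.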

Licensed under: CC-BY-SA with attribution