Question

A Bloomberg futures ticker usually looks like:

MCDZ3 Curncy

where the root is MCD, the month letter and year are Z3, and the 'yellow key' is Curncy.

Note that the root can be of variable length: 2-4 letters, or 1 letter followed by 1 whitespace (e.g. S H4 Comdty). The letter-year part allows only the letters listed below in expr and can have a one- or two-digit year. Finally, the yellow key can be one of several security-type strings, but I am interested in (Curncy|Equity|Index|Comdty) only.

In Matlab I have the following regular expression

expr = '[FGHJKMNQUVXZ]\d{1,2} '; 
[rootyk, monthyear] = regexpi(bbergtickers, expr,'split','match','once');

where

rootyk{:}
ans = 
    'mcd'    'curncy'

and

monthyear = 
    'z3 '

I don't want to match the ' ' (space) in monthyear. How can I do that?

Solution

Assuming there is no leading or trailing whitespace and the root contains only uppercase letters, this should work:

^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$

You've got root in the first group, letter-year in the second, yellow key in the third.

I don't know Matlab, nor whether its regex engine is Perl-compatible. If the pattern fails, try a literal space instead of \s. Also, drop the ^...$ anchors if you'd like to extract tickers from a larger source text.
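
For what it's worth, here is a minimal MATLAB sketch of how the pattern above might be applied. It assumes regexp accepts the pattern unchanged; anchors, \s, counted quantifiers and alternation are all part of MATLAB's documented regex syntax. The variable names (tok, root, yellowkey, ...) are only illustrative.

```matlab
% Anchored pattern from the answer above; the three groups become three tokens.
expr = '^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$';

% 'tokens' returns the captured groups, 'once' returns only the first match.
tok = regexp('MCDZ3 Curncy', expr, 'tokens', 'once');
root      = tok{1};
monthyear = tok{2};
yellowkey = tok{3};

% One-letter root followed by a space, as in the question's second example.
tok2  = regexp('S H4 Comdty', expr, 'tokens', 'once');
root2 = strtrim(tok2{1});   % drop the padding space kept in the root token
```

Because regexp is case-sensitive, a lowercase ticker would not match here; regexpi could be swapped in if that matters.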

Other tips

The expression you're feeding regexpi with contains a space and is used as a pattern for 'match'. This is why the matched monthyear string also has a space [1].

If you want to keep it simple and let regexpi do the work for you (instead of postprocessing its output), try a different approach and capture tokens instead of matching, and ignore the intermediate space:

%//     <$1><----------$2---------> <$3>
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2}) (.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');

You can also simplify the expression to a more generic '(.+)(\w{1}\d{1,2})\s+(.+)', if you wish.

Example

bbergtickers = 'MCDZ3 Curncy';
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)'; 
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');

The result is:

tickinfo =
    'MCD'
    'Z3'
    'Curncy'
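
The same call can also be sketched for a whole list of tickers. The snippet below assumes bbergtickers is a cell array of strings and that every ticker matches the pattern (a non-matching entry would yield an empty token cell and be dropped by vertcat); the sample tickers and the name parts are made up for illustration.

```matlab
bbergtickers = {'MCDZ3 Curncy'; 'S H4 Comdty'; 'ESZ13 Index'};
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)';

% For cell input, regexpi returns one 1-by-3 token cell per ticker.
tok = regexpi(bbergtickers, expr, 'tokens', 'once');

% Stack the tokens into an N-by-3 cell array: root, month-year, yellow key.
parts = vertcat(tok{:});

% One-letter roots such as 'S ' keep their padding space, so trim the roots.
parts(:,1) = strtrim(parts(:,1));
```

Each row of parts then holds the root, the month-year code and the yellow key of one ticker.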

[1] This expression is also used as a delimiter for 'split'. Removing the trailing space from it won't help, as it will reappear in the rootyk output instead.
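
If only the month-year string is needed, one further option (not part of the answer above, so treat it as a sketch) is a lookahead, which MATLAB's regex syntax supports as (?=...). It requires the whitespace to follow the month-year code without making it part of the match:

```matlab
% (?=\s) checks that whitespace follows, but does not consume it.
expr = '[FGHJKMNQUVXZ]\d{1,2}(?=\s)';
monthyear = regexpi('MCDZ3 Curncy', expr, 'match', 'once');
```

Here monthyear comes back without the trailing space, so no postprocessing is needed for that output.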

Assuming you just want to get rid of leading and/or trailing spaces, there is a very simple command for that:

monthyear = strtrim(monthyear)

For removing all spaces, you can do:

monthyear(isspace(monthyear))=[]

Here is a completely different approach: find the digits of the year, then take the letter immediately before them:

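s = 'MCDZ3 Curncy'
p = regexp(s,'\d')     % indices of all digits in s
s(min(p)-1)            % month letter just before the first digit
s(min(p)-1:max(p))     % month letter plus the year digits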
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow