Question

I'd like to strip non-ASCII characters from a variable. More elegant methods haven't worked for me, so I'm using compress and listing the characters I'd like to keep (because I don't know which ones I'd like to remove). It works, except that I'd like to keep both " and ', and I can't pass both of these characters to the compress function correctly.

data _null_;
  _text='#AB'!!byte(13)!!'C"D';
  _text_select=compress(_text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ /-1234567890(),.'&?;=%:+><`[]*#","k");
  put _text;
  put _text_select;
run;

Solution

First off, if your concern is 'control' characters, the 'c' option is a good one.

compress(textstr,,'c');

That removes things in the early part of ASCII like line feeds, carriage returns, tabs, etc. (In standard ASCII the control characters are the first 32, '00'x to '1F'x, plus '7F'x; I've never seen SAS's exact definition spelled out, and it may depend on the session encoding.)

If you want to keep basically 'printable characters', the 'w' option is helpful.

compress(textstr,,'kw');
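
As a quick check, here is a minimal sketch that applies both modifiers to the sample string from the question. The variable names _no_ctrl and _printable are made up for illustration. Both calls should drop the carriage return and leave the double quote alone, though what counts as 'printable' above '7F'x may depend on your session encoding.

```sas
data _null_;
  _text = '#AB' !! byte(13) !! 'C"D';
  /* 'c' adds control characters to the list of characters to remove. */
  _no_ctrl = compress(_text, , 'c');
  /* 'k' keeps instead of removes; 'w' adds printable characters. */
  _printable = compress(_text, , 'kw');
  put _no_ctrl=;
  put _printable=;
run;
```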

Your method can also be made to work, if it's the only way you can get exactly what you want: inside a double-quoted string, escape the double quote by doubling it.

compress(_text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ /-1234567890(),.'&?;=%:+><`[]*#""","k");
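
The doubling rule works the same way for either quote style, so here is a small sketch showing it in isolation. The names _q1 and _q2 are hypothetical; each literal should hold exactly two characters, ' and ".

```sas
data _null_;
  /* Double-quoted literal: "" stands for one " character. */
  _q1 = "'""";
  /* Single-quoted literal: '' stands for one ' character. */
  _q2 = '''"';
  _len1 = length(_q1);
  _len2 = length(_q2);
  put _q1= _len1=;
  put _q2= _len2=;
  /* Either literal can be used as the keep list in compress. */
  _text = '#AB' !! byte(13) !! 'C"D' !! "'E";
  _quotes_only = compress(_text, _q1, 'k');
  put _quotes_only=;
run;
```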

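You could also use "p" to keep all punctuation marks and "n" to keep letters, digits, and the underscore, which shortens the list considerably. Note that "n" also keeps lowercase letters and the underscore, which your original list did not include.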

data _null_;
  _text='#AB'!!byte(13)!!'C"D';
  _text_select=compress(_text," /-()&=%+><` []*#","knp");
  put _text;
  put _text_select;
run;

I'm not entirely sure what officially counts as a 'punctuation mark'. The - very likely does, and possibly () as well, in which case they could be dropped from the list too.
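
One way to find out what "p" treats as punctuation in your own session is to feed it every printable ASCII character and keep only what "p" selects. This is a sketch, assuming an ASCII-based session; _chars and _punct are made-up names.

```sas
data _null_;
  length _chars _punct $94;
  /* Collect '21'x (!) through '7E'x (~): printable ASCII without the space. */
  do _t = 33 to 126;
    substr(_chars, _t - 32, 1) = byte(_t);
  end;
  /* Keep only the characters that the "p" modifier classifies as punctuation. */
  _punct = compress(_chars, , 'kp');
  put _punct=;
run;
```

Whatever shows up in _punct does not need to be listed explicitly when "p" is in the modifiers.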

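Edit: Here's a good way to test what's kept. The loop covers every single-byte value from '01'x to 'FF'x; the official ASCII set only goes up to '7F'x, so anything kept beyond that is non-ASCII: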

data test;
  length _text $255;
  do _t = 1 to 255;
    _text =byte(_t)||_text;
  end;
  _text_select=compress(_text," /-(),.'&""?;=%:+><`[]*#","kn");
  put _text=;
  put _text_select=;
run;

With "p" added to the modifiers, the test keeps a lot of stuff that's a bit weirder, some of which doesn't look like punctuation to me. That may come down to how the session encoding classifies characters above '7F'x rather than a bug, but it does mean "p" on its own isn't a reliable way to strip non-ASCII characters.
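
If the goal is strictly 'printable ASCII only', one option that avoids both the quoting problem and the question of what "p" covers is to build the keep list from character codes. This is a sketch, assuming an ASCII-based session (byte() follows the host's collating sequence); _keep is a made-up name.

```sas
data _null_;
  length _keep $95 _text_select $20;
  /* Build the keep list: '20'x (space) through '7E'x (~). */
  do _t = 32 to 126;
    substr(_keep, _t - 31, 1) = byte(_t);
  end;
  _text = '#AB' !! byte(13) !! 'C"D';
  /* 'k' keeps only the characters in _keep, so the carriage return is dropped. */
  _text_select = compress(_text, _keep, 'k');
  put _text_select=;
run;
```

Because both quote characters come from byte(), nothing has to be escaped in a literal.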

Licensed under: CC-BY-SA with attribution