Question

I am making a large catalogue of all the possible OS names supported by my particular version of VMware. Originally I was writing them out as they appear in the VMX files, but then I found a website that lists them all. The problem is that they are not cased the same way, so they won't give a "perfect" match. Would this be the right time to use the regex modifier for case insensitivity?

Also, as a side question: would it be possible to extract the list of OSs from the website? They appear to be in an HTML-formatted table. It would save me a lot of time typing them all out.

I looked at HTML::TableExtract, but I don't really understand how to use it. As far as the table is concerned, I was able to find that section in the website's source and copied it into a new HTML file so I have it on my desktop.

This is odd, and I am probably missing something, but I cannot get a case-insensitive match. When I end my regex with /xmi I get this output:

Use of uninitialized value $guest_os in concatenation (.) or string at discovery4.pl line 146.

From what I have found, this means there was no match, so nothing was assigned to the scalar I am trying to print.

Anyhow, I know the problem is that it won't match case-insensitively, because if I change winnetstandard to winNetStandard it works and prints "Windows Server 2003, Standard Edition", which is what it should say.


Solution

HTML::TableExtract can be helpful. As far as matching goes, I'm not sure what it is that you are trying to match; if you are just comparing two names, uc($foo) eq uc($bar) makes more sense. But if you have a regex and want the whole match to be case insensitive, /i will do that.
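For the lookup described in the question (a guestOS value from a VMX file mapped to a readable name), a hash keyed on the lowercased value avoids regexes entirely and is the same idea as comparing uc($foo) eq uc($bar). This is a minimal sketch: the %os_name table and the describe_guest_os sub are hypothetical, and only the one entry mentioned in the question is filled in.

```perl
use strict;
use warnings;

# Hypothetical lookup table: keys are stored lowercased so that the
# lookup below is effectively case insensitive.
my %os_name = (
    lc('winNetStandard') => 'Windows Server 2003, Standard Edition',
    # ... remaining entries from the catalogue ...
);

# Return the readable name for a guestOS value, or undef if it is unknown.
sub describe_guest_os {
    my ($guest_os) = @_;
    return $os_name{ lc $guest_os };
}

for my $value (qw(winNetStandard winnetstandard WINNETSTANDARD unknownOS)) {
    my $name = describe_guest_os($value);
    # Check for undef before printing, which avoids the
    # "Use of uninitialized value" warning from the question.
    print "$value => ", (defined $name ? $name : 'not in catalogue'), "\n";
}
```

Lowercasing once when the table is built means every lookup costs a single lc call, and unknown names come back as undef instead of triggering a warning.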

Ah, so you want to get the supported os names and assemble them into a regex and match using it? Then, given @osnames, you might want something like this:

```perl
my $osnames = join('|', map quotemeta, sort { length($b) <=> length($a) } @osnames);
my $regex = qr/guestOS\s*=\s*"(?i:$osnames)"/;
```

The (?i:...) group limits case insensitivity to just the OS names. Only if you also want guestOS to be case insensitive would you put /i on the whole regex (and use a plain (?:$osnames) group instead).
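Here is that regex put to work in a minimal sketch. It assumes @osnames holds the catalogue and that each VMX line looks like guestOS = "...". The names in @osnames and the sample lines are made up for illustration, and guest_os_from_line is a hypothetical helper. Unlike the two-line version, it wraps the alternation in a capturing group so the matched name ends up in $1, and it checks the match before using $1.

```perl
use strict;
use warnings;

# Hypothetical catalogue; in practice this comes from the extracted list.
my @osnames = qw(winNetStandard winNetEnterprise winXPPro);

# Same construction as in the answer: quote each name, longest first.
my $osnames = join '|', map quotemeta, sort { length($b) <=> length($a) } @osnames;

# Capture the OS name; only the name part is case insensitive.
my $regex = qr/guestOS\s*=\s*"((?i:$osnames))"/;

# Return the guestOS value found in a VMX line, or undef if none matched.
sub guest_os_from_line {
    my ($line) = @_;
    return $line =~ $regex ? $1 : undef;
}

for my $line ('guestOS = "winnetstandard"', 'GUESTOS = "winNetStandard"') {
    my $guest_os = guest_os_from_line($line);
    print defined $guest_os ? "matched: $guest_os\n" : "no match: $line\n";
}
```

The first sample line matches despite the lowercase name. The second does not, because the (?i:...) group leaves the guestOS key itself case sensitive.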

OTHER TIPS

This would be the right time to use the /i modifier, since ignoring case can't really harm anything here. To get the list of operating systems, I would copy the HTML of the section containing the list, run a regex over it to turn it into the format you need, and then use the resulting text.
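A minimal sketch of that regex approach, using only core Perl. It assumes the saved table has one row per OS, with the readable name in the first <td> and the guestOS value in the second. The HTML fragment and the extract_pairs sub are hypothetical, and the real page's markup may differ. Regexes like this break easily on nested tags or HTML entities (&amp; is not decoded here), which is where HTML::TableExtract is more robust.

```perl
use strict;
use warnings;

# Hypothetical fragment of the saved table; the real markup may differ.
my $html = <<'HTML';
<table>
<tr><th>Guest OS</th><th>guestOS value</th></tr>
<tr><td>Windows Server 2003, Standard Edition</td><td>winNetStandard</td></tr>
</table>
HTML

# Return [name, value] pairs from rows that contain exactly two <td> cells.
sub extract_pairs {
    my ($html) = @_;
    my @pairs;
    # Walk each <tr>...</tr>; /s lets . span newlines, /i accepts <TR> too.
    while ($html =~ m{<tr[^>]*>(.*?)</tr>}sig) {
        my $row   = $1;
        my @cells = $row =~ m{<td[^>]*>(.*?)</td>}sig;
        next unless @cells == 2;    # skips the header row, which uses <th>
        for (@cells) {
            s/<[^>]+>//g;           # drop any tags inside the cell
            s/^\s+|\s+$//g;         # trim surrounding whitespace
        }
        push @pairs, [@cells];
    }
    return @pairs;
}

for my $pair (extract_pairs($html)) {
    printf "%s\t%s\n", @$pair;
}
```

To use it on the file saved to the desktop, read the file's contents into $html instead of using the inline fragment.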

Licensed under: CC-BY-SA with attribution