Question

This is workfile.txt

 NC_001778

NC_005252

NC_004744

NC_003096

NC_005803

I want to read it into an array and keep only the strings, without spaces or blank lines. This code does what I want on my laptop, but it is not working on the Linux desktop!

@nodes = <nodefile>;
chomp @nodes;

foreach my $el (@nodes) {
    chop($el);
}
print Dumper @nodes;
#output: `bash-4.2$ perl main.pl
';AR1 = 'NC_000893
';AR2 = 'NC_001778
';AR3 = 'NC_005252
';AR4 = 'NC_004744
';AR5 = 'NC_003096
';AR6 = 'NC_005803

`

    #hexdump -C workfile.txt |head -20

00000000  4e 43 5f 30 30 30 38 39  33 0d 0d 0a 4e 43 5f 30  |NC_000893...NC_0|
00000010  30 31 37 37 38 0d 0d 0a  4e 43 5f 30 30 35 32 35  |01778...NC_00525|
00000020  32 0d 0d 0a 4e 43 5f 30  30 34 37 34 34 0d 0d 0a  |2...NC_004744...|
00000030  4e 43 5f 30 30 33 30 39  36 0d 0d 0a 4e 43 5f 30  |NC_003096...NC_0|
00000040  30 35 38 30 33 0d 0d 0a  4e 43 5f 30 30 36 35 33  |05803...NC_00653|
00000050  31 0d 0d 0a 4e 43 5f 30  30 34 34 31 37 0d 0d 0a  |1...NC_004417...|
00000060  4e 43 5f 30 31 33 36 33  33 0d 0d 0a 4e 43 5f 30  |NC_013633...NC_0|
00000070  31 33 36 31 38 0d 0d 0a  4e 43 5f 30 30 32 37 36  |13618...NC_00276|
00000080  31 0d 0d 0a 4e 43 5f 30  31 33 36 32 38 0d 0d 0a  |1...NC_013628...|
00000090  4e 43 5f 30 30 35 32 39  39 0d 0d 0a 4e 43 5f 30  |NC_005299...NC_0|
000000a0  31 33 36 30 39 0d 0d 0a  4e 43 5f 30 31 33 36 31  |13609...NC_01361|
000000b0  32 0d 0d 0a 4e 43 5f 30  30 32 36 34 36 0d 0d 0a  |2...NC_002646...|
000000c0  4e 43 5f 30 30 34 35 39  35 0d 0d 0a 4e 43 5f 30  |NC_004595...NC_0|
000000d0  30 32 37 33 34 0d 0d 0a  4e 43 5f 30 30 34 35 39  |02734...NC_00459|
000000e0  38 0d 0d 0a 4e 43 5f 30  30 34 35 39 34 0d 0d 0a  |8...NC_004594...|
000000f0  4e 43 5f 30 30 38 34 34  38 0d 0d 0a 4e 43 5f 30  |NC_008448...NC_0|
00000100  30 34 35 39 33 0d 0d 0a  4e 43 5f 30 30 32 36 34  |04593...NC_00264|
00000110  37 0d 0d 0a 4e 43 5f 30  30 32 36 37 34 0d 0d 0a  |7...NC_002674...|
00000120  4e 43 5f 30 30 33 31 36  33 0d 0d 0a 4e 43 5f 30  |NC_003163...NC_0|
00000130  30 33 31 36 34 0d 0d 0a  4e 43 5f 30 32 30 31 35  |03164...NC_02015|

Any suggestions? Thanks in advance.

Solution

The problem is that the file has Windows-style line endings, which is why chomp does not clean up the lines properly on Linux: it strips only the trailing \n. Your hexdump shows that each line actually ends in 0d 0d 0a (\r\r\n), so after chomp removes the \n and chop removes one \r, a second \r is still left at the end of every string.
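
To see what chomp and chop leave behind on such a line, here is a minimal sketch. It assumes the default input record separator ("\n", as on Linux) and uses a hard-coded string that mimics the 0d 0d 0a ending from the hexdump; the variable names are only for illustration.

```perl
use strict;
use warnings;

# One line as it appears in the hexdump: text followed by 0d 0d 0a.
my $line = "NC_005803\r\r\n";

chomp $line;   # removes the trailing "\n" (default $/ is "\n")
chop  $line;   # removes exactly one character: the last "\r"

# Check whether a carriage return is still attached to the string.
my $status = $line =~ /\r\z/ ? "still ends in CR" : "clean";
print "$status\n";   # prints "still ends in CR"
```

One chomp plus one chop removes only two of the three trailing characters, which matches the output shown in the question.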

Your output

';AR6 = 'NC_005803

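indicates that the last character in the string is in fact \r. When the terminal prints the carriage return, the cursor jumps back to the start of the line, so the closing '; is drawn over the beginning of $VAR6. This is not an actual problem with the string, just with the visual representation. If you want to see this character written out literally, you can use the option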

$Data::Dumper::Useqq = 1;

Which will then produce the output

$VAR6 = "NC_005803\r";

How to fix it?

A simple fix is to run the dos2unix utility on the file in Linux. To fix it in Perl, you can use any one of the following:

s/[\r\n]*\z// for @nodes;  # remove all \r and \n  from end of string
s/\s*\z// for @nodes;      # remove all whitespace from end of string
s/\r//g   for @nodes;      # remove all \r from string
tr/\r//d  for @nodes;      # same
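
Putting it together, here is a rough sketch of reading the file into an array with only the identifiers left. It uses an in-memory string as a stand-in for workfile.txt (an assumption, so the example runs on its own), and the leading-whitespace strip and the grep for empty lines are additions to cover the "without spaces or lines" part of the question.

```perl
use strict;
use warnings;

# Stand-in for workfile.txt: same \r\r\n endings, one blank line,
# and one line with a leading space.
my $data = " NC_001778\r\r\nNC_005252\r\r\n\r\r\nNC_004744\r\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @nodes = <$fh>;
close $fh;

s/\A\s+|\s+\z//g for @nodes;        # strip leading/trailing whitespace, incl. \r and \n
@nodes = grep { length } @nodes;    # drop lines that contained only whitespace

print "$_\n" for @nodes;
```

With a real file, replace the in-memory open with something like open my $fh, '<', 'workfile.txt'; the rest stays the same.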
Licensed under: CC-BY-SA with attribution