Question

I really need your help. I have a file that contains data as field:value pairs on a single line:

File.A

A:13 B:2 D:5 F:92 G:3 ...

I have created a file that contains the letters A to Z:

File.B

A B C D E F G H I J ...

I am trying to use a bash script to read the content and fix the output by inserting each missing field with a value of 0:

A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 ...

I have been thinking about this for over two days, but nothing has come to mind. Is there any way I can solve it?


Solution

Let's put brace expansion to work: {A..Z} expands to the full list of letters:

$ echo {A..Z}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Then we can loop through the letters, grepping the file for each one (this assumes one field:value pair per line). If a line matches, grep prints it; otherwise, we print letter:0.

for letter in {A..Z}
do
   grep "^$letter:" file || echo "$letter:0"
done

Test

$ for letter in {A..Z}; do grep "^$letter:" file || echo "$letter:0"; done
A:13
B:2
C:0
D:5
E:0
F:92
G:3
H:0
I:0
J:0
K:0
L:0
M:0
N:0
O:0
P:0
Q:0
R:0
S:0
T:0
U:0
V:0
W:0
X:0
Y:0
Z:0

Now that you have updated the question and the input file contains everything on a single line, you can use grep -o to print only the matching part:

grep -o "$word:[0-9]*" file

and then replace the newlines with spaces:

$ for word in {A..Z}; do grep -o "$word:[0-9]*" file || echo "$word:0"; done | tr '\n' ' '
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
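
The question also mentions File.B, which holds the list of field names. As a rough sketch, the same grep idea can take its keys from that file instead of from {A..Z}. The function name fill_from_keys and its two arguments are made up for illustration, and the sketch assumes the keys sit on one whitespace-separated line and that the values are plain digits:

```shell
#!/usr/bin/env bash
# fill_from_keys KEYS_FILE DATA_FILE   (hypothetical helper, not from the original answer)
# Prints one "key:value" per key listed in KEYS_FILE, using 0 when the key is missing.
fill_from_keys() {
  local keys_file=$1 data_file=$2 key match
  local keys=() out=()

  # Read the whitespace-separated keys from the first line of the key file.
  read -r -a keys < "$keys_file"

  for key in "${keys[@]}"; do
    # -o prints only the matched text; -w stops "A:1" from matching inside "AA:1".
    match=$(grep -ow -- "$key:[0-9]*" "$data_file" | head -n 1)
    # Fall back to "key:0" when grep found nothing.
    out+=("${match:-$key:0}")
  done

  # Join the results with single spaces and end with a newline.
  echo "${out[*]}"
}
```

Called as fill_from_keys File.B File.A, this should print the filled-in line without the trailing space that tr leaves behind.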

OTHER TIPS

If you fancy a bit of awk, you could try this:

awk -F: -vRS=" " '
{ c[$1] = $2 }
END{ 
  for(i=65;i<91;++i){ 
    a=sprintf("%c", i)
    printf("%c:%d ",i,c[a])
  }
}' A

where A is your file. The first block builds an array of all the values that have been set. Once the whole file has been read, the loop goes through the ASCII codes from A (65) to Z (90) and prints the value stored for each letter. Letters that are missing from the file are printed as 0.

Output:

A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
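
If you would rather not touch RS, here is a sketch of the same idea that leaves awk's default record and field splitting alone and splits each field on the colon instead. The wrapper name fill_with_awk is invented for this example; it assumes single-letter keys A to Z and numeric values, and it also copes with input spread over several lines:

```shell
#!/usr/bin/env bash
# fill_with_awk DATA_FILE   (hypothetical wrapper name)
fill_with_awk() {
  awk '
    {
      # Default splitting: every whitespace-separated field is one key:value pair.
      for (f = 1; f <= NF; f++) {
        split($f, kv, ":")
        val[kv[1]] = kv[2]
      }
    }
    END {
      # Walk the character codes of A (65) through Z (90).
      for (i = 65; i <= 90; i++) {
        k = sprintf("%c", i)
        # Unset entries are empty strings, which %d prints as 0.
        printf "%s%s:%d", (i > 65 ? " " : ""), k, val[k]
      }
      print ""
    }' "$1"
}
```

The separator is printed before every item except the first, so the line ends with a newline rather than a trailing space.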

Since everyone clearly can't get enough of my answer, here's another way you could do it, inspired by the {A..Z} range used in @fedorqui's answer:

awk -F: -vRS='[ \n]+' '
# RS is a regex so the trailing newline is not glued onto the last record (Z)
NR==FNR { a[i++] = $1; next }
{ b[$1] = $2 }
END{for(i=0;i<length(a);++i)printf("%c:%d ",a[i],b[a[i]])}' - <<<$(echo {A..Z}) A

The first block reads in all the letters of the alphabet, which removes the need to know their character codes. The second block builds an array from your file A. Once the file has been read, all the values are printed out, resulting in the same output as above.

Pure Bash, no external processes. For each letter, print the match if the letter is found in the line, or the letter followed by :0 otherwise.

IFS= read -r content < "$infile"

for letter in {A..Z}; do
  if [[ $content =~ ${letter}:[[:digit:]]+ ]] ; then
    echo "${BASH_REMATCH[0]}"
  else
    echo "${letter}:0"
  fi
done

Or, shorter:

for x in {A..Z}; do
  [[ $content =~ ${x}:[0-9]+ ]] && echo "${BASH_REMATCH[0]}" || echo "${x}:0"
done
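
Another pure Bash option, sketched here under a few assumptions: parse the line once into an associative array and then look each letter up, rather than running a regex match per letter. It needs Bash 4 or later for local -A. The function name fill_missing_assoc is hypothetical, and the sketch assumes every field has the form key:value:

```shell
#!/usr/bin/env bash
# fill_missing_assoc "A:13 B:2 ..."   (hypothetical helper; requires Bash 4+)
fill_missing_assoc() {
  local line=$1 field letter
  local fields=() out=()
  local -A value=()

  # Split the line on whitespace without triggering glob expansion.
  read -r -a fields <<< "$line"

  for field in "${fields[@]}"; do
    # Text before the first colon is the key, text after it is the value.
    value[${field%%:*}]=${field#*:}
  done

  for letter in {A..Z}; do
    # Use the stored value, or 0 when the letter never appeared.
    out+=("$letter:${value[$letter]:-0}")
  done

  echo "${out[*]}"
}

# Example call (input taken from the question):
# fill_missing_assoc "A:13 B:2 D:5 F:92 G:3"
```

With the content variable from above, fill_missing_assoc "$content" prints everything on one line, which is the layout the question asks for.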
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow