Finding punctuation marks and counting the occurrences of each from the Unix command line

https://stackoverflow.com/questions/8003456

21-02-2021

Question

I want to find all of the punctuation marks used in my .txt file and get a count of the number of occurrences of each one. How would I go about doing this? I am new to this, but I am trying to learn. This is not homework. So far I have been researching grep and sed.

Solution

$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }'  sometextfile.utf8

For example:

$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }' programming_perl_4th_edition.pod | sort -k2rn
, 21761
. 19578
; 10986
( 8856
) 8853
- 7606
: 7420
" 7300
_ 5305
’ 4906
/ 4528
{ 2966
} 2947
\ 2258
@ 2121
# 2070
* 1991
' 1715
“ 1406
” 1404
[ 1007
] 1003
% 881
! 838
? 824
& 555
— 330
‑ 72
– 41
‹ 16
› 16
‐ 10
⁂ 10
… 8
· 3
「 2
」 2
« 1
» 1
‒ 1
― 1
‘ 1
• 1
‥ 1
⁃ 1
･ 1

If you want not just punctuation but punctuation and symbols, use [\pP\pS] in your pattern. Don’t use old-style POSIX classes whatever you do, though.

Other tips

Use sed, tr, sort and uniq (and no Perl):

sed -E 's/[^[:punct:]]//g;s/(.)/\1x/g' myfile.txt | tr 'x' '\n' | sort | uniq -c

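I did it this way (sed + tr) so that it works on both macOS and other Unix systems. The sed that ships with macOS needs a literal embedded newline in the replacement text, whereas GNU sed accepts \n. Letting tr insert the newlines avoids the difference, so this version works everywhere. Using x as the marker is safe because the first substitution has already deleted every letter.

This shorter version works with GNU sed (for example on Linux), but not with the sed that ships with macOS: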

sed -E 's/[^[:punct:]]//g;s/(.)/\1\n/g' myfile.txt | sort | uniq -c
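
To see what the portable pipeline produces, here is a minimal sketch on a made-up sample. The function name count_punct is hypothetical, and two things are additions to the original command: LC_ALL=C, which pins the classes and the sort order to plain ASCII so the result is reproducible, and grep ., which drops the empty lines that would otherwise be counted as an entry of their own.

```shell
# count_punct is a hypothetical helper name, not part of the original answer.
count_punct() {
  # 1. sed deletes every non-punctuation character, then appends an "x"
  #    marker after each remaining character.
  # 2. tr turns each marker into a newline, leaving one mark per line.
  # 3. grep . drops empty lines (an added step, see the note above).
  # 4. sort groups identical marks and uniq -c counts each group.
  LC_ALL=C sed -E 's/[^[:punct:]]//g;s/(.)/\1x/g' "$@" |
    tr 'x' '\n' | grep . | LC_ALL=C sort | uniq -c
}

# Made-up sample: the second line contains no punctuation at all.
printf 'Hi, there! Is this (really) it?\nno marks here\nYes, it is.\n' | count_punct
```

On this sample the comma is reported twice and every other mark once. The width of the padding in front of each count depends on the uniq implementation.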

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow