Pregunta

I want find all of the punctuation marks used my .txt file and give a count of the number of occurrences of each one. How would I go about doing this?? I am new at this but I am trying to learn! This is not homework! I have been doing research on grep and sed right now.

¿Fue útil?

Solución

$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }'  sometextfile.utf8

As in

$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }' programming_perl_4th_edition.pod | sort -k2rn
, 21761
. 19578
; 10986
( 8856
) 8853
- 7606
: 7420
" 7300
_ 5305
’ 4906
/ 4528
{ 2966
} 2947
\ 2258
@ 2121
# 2070
* 1991
' 1715
“ 1406
” 1404
[ 1007
] 1003
% 881
! 838
? 824
& 555
— 330
‑ 72
– 41
‹ 16
› 16
‐ 10
⁂ 10
… 8
· 3
「 2
」 2
« 1
» 1
‒ 1
― 1
‘ 1
• 1
‥ 1
⁃ 1
・ 1

If you want not just punctuation but punctuation and symbols, use [\pP\pS] in your pattern. Don’t use old-style POSIX classes whatever you do, though.

Otros consejos

Use sed, tr, sort and uniq (and no perl):

sed -E 's/[^[:punct:]]//g;s/(.)/\1x/g' myfile.txt | tr 'x' '\n' | sort | uniq -c

I did it this way (sed + tr) so it will work on both unix and mac. Mac needs an imbedded linefeed in the sed command, but unix can use \n. This way it works everywhere.

This will work on non-mac unix:

sed -E 's/[^[:punct:]]//g;s/(.)/\1\n/g' myfile.txt | sort | uniq -c
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top