Deleting a set of lines from a text file

https://stackoverflow.com/questions/1617568

06-07-2019

Question

I have been trying to implement a bash script that reads from the WordNet online database, and I have been wondering whether there is a way to remove a set of lines from a text file with a single command.

Example file dump:

**** Noun ****
(n)hello, hullo, hi, howdy, how-do-you-do (an expression of greeting) "every morning they exchanged polite hellos"
**** Verb ****
(v)run (move fast by using one's feet, with one foot off the ground at any given time) "Don't run--you'll be out of breath"; "The children ran to the store"
**** Adjective ****
(adj)running ((of fluids) moving or issuing in a stream) "as mountain stream with freely running water"; "hovels without running water"

I only need to remove the lines that name the part of speech, e.g.

**** Noun ****
**** Verb ****
**** Adjective ****

So that I end up with a clean file containing only the word definitions:

(n)hello, hullo, hi, howdy, how-do-you-do (an expression of greeting) "every morning they exchanged polite hellos"
(v)run (move fast by using one's feet, with one foot off the ground at any given time) "Don't run--you'll be out of breath"; "The children ran to the store"
(adj)running ((of fluids) moving or issuing in a stream) "as mountain stream with freely running water"; "hovels without running water"

The * symbols around the grammatical terms are tripping me up in sed.

Solution

If you want to select whole lines from a file based solely on the content of those lines, grep is probably the most suitable tool available. However, some characters, such as your stars, have special meanings to grep, so they need to be "escaped" with a backslash. This prints only the lines that begin with four stars and a space:

grep "^\*\*\*\* " textfile

However, you want to keep the lines that do not match, so you need grep's -v option, which does just that: it prints the lines that do not match the pattern.

grep -v "^\*\*\*\* " textfile

That should give you what you want.
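
As a quick check, here is a minimal sketch that runs the grep -v approach against a throwaway file. The temporary file and the abbreviated definitions are made up for illustration, and the pattern is anchored with ^ so that only lines starting with four stars and a space are dropped.

```shell
# Hypothetical sample file; contents are abbreviated from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
**** Noun ****
(n)hello, hullo, hi (an expression of greeting)
**** Verb ****
(v)run (move fast by using one's feet)
EOF

# Keep every line that does NOT start with four stars and a space.
cleaned=$(grep -v '^\*\*\*\* ' "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

Single quotes are used here so the shell passes the backslashes to grep untouched; the double-quoted form shown above behaves the same for this pattern.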

Other tips

sed '/^\*\{4\} .* \*\{4\}$/d'

or, a bit looser:

sed '/^*\{4\}/d'
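
To see how the two sed patterns differ, here is a small sketch run on a made-up file that contains one well-formed header, one definition, and one malformed header (the sample data is an assumption, not part of the question). The loose pattern is written with an escaped star, which matches the same lines as the unescaped form above.

```shell
# Hypothetical sample: well-formed header, definition, malformed header.
sample=$(mktemp)
cat > "$sample" <<'EOF'
**** Noun ****
(n)hello, hullo, hi (an expression of greeting)
****malformed header
EOF

# Strict: four stars, a space, any text, a space, four stars, end of line.
strict=$(sed '/^\*\{4\} .* \*\{4\}$/d' "$sample")

# Loose: any line that starts with four stars.
loose=$(sed '/^\*\{4\}/d' "$sample")

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"

rm -f "$sample"
```

The strict pattern leaves the malformed header in place, while the loose one removes it as well. Which behaviour is preferable depends on how regular the dump is.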

sed 's/^*.*//g' test | grep .
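
One side effect of this pipeline is worth knowing: grep . drops every empty line, including blank lines that were already in the input. A minimal sketch on a hypothetical sample file (the contents are made up for illustration):

```shell
# Hypothetical sample with a blank line between two definitions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
**** Noun ****
(n)hello, hullo, hi (an expression of greeting)

(v)run (move fast by using one's feet)
EOF

# Blank out lines that start with a star, then keep only non-empty lines.
squeezed=$(sed 's/^\*.*//' "$sample" | grep .)
printf '%s\n' "$squeezed"

rm -f "$sample"
```

If the blank lines in the input matter, deleting the header lines directly with sed's d command or grep -v avoids this.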

# awk '!/^\*\*+/' file
(n)hello, hullo, hi, howdy, how-do-you-do (an expression of greeting) "every morning they exchanged polite hellos"
(v)run (move fast by using one's feet, with one foot off the ground at any given time) "Don't run--you'll be out of breath"; "The children ran to the store"
(adj)running ((of fluids) moving or issuing in a stream) "as mountain stream with freely running water"; "hovels without running water"
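
None of the commands above modify the file itself; they only print the filtered lines. A common way to overwrite the original is to write to a temporary file and move it into place afterwards. The sketch below assumes a made-up sample file and hypothetical variable names, and uses the awk filter shown above.

```shell
# Hypothetical file to clean; contents are abbreviated from the question.
file=$(mktemp)
cat > "$file" <<'EOF'
**** Noun ****
(n)hello, hullo, hi (an expression of greeting)
**** Adjective ****
(adj)running ((of fluids) moving or issuing in a stream)
EOF

# Write the filtered lines to a temporary file, then replace the original
# only if awk succeeded.
tmp=$(mktemp)
awk '!/^\*\*+/' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"

rm -f "$file"
```

Redirecting straight back to the input file (awk ... file > file) would truncate it before awk reads it, which is why the temporary file is used.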

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow