Question

I have a text file made up of numbered entries, each with an entry number, a timecode and a transcript. I am trying to remove the line breaks inside the transcript text and leave all other line breaks alone. I'm trying to use grep or awk.

The file looks like this:

1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on
these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single
line no matter how long that line.

The output should look like this:

1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.

Thanks to all who have provided help.


Solution

Don't rely on lines starting (or not) with any specific characters - just attach the 4th and subsequent lines in each record to the end of the 3rd line of that record:

$ awk '
BEGIN { RS=ORS=""; FS=OFS="\n" }
{
    print $1,$2,$3
    for (i=4;i<=NF;i++)
        printf " %s", $i
    print "\n\n"
}
' file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
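
To see how the record splitting above works, it can help to print how many lines awk finds in each record. The following is only a sketch: the function name count_record_lines is made up for illustration, and it assumes records are separated by truly empty lines, as in the sample.

```shell
# Hypothetical helper (not part of the original answer): report how many
# lines each blank-line-separated record contains.
count_record_lines() {
    awk '
    BEGIN { RS = ""; FS = "\n" }   # RS="" is paragraph mode; FS="\n" makes each line one field
    { printf "record %d: %d lines\n", NR, NF }
    ' "$@"                         # file name(s) as arguments, or stdin when none are given
}
```

Running count_record_lines file on the sample from the question should report 3 lines for the first record and 4 for each of the others. Any record with more than 3 lines is one where the answer's loop over fields 4..NF has something to append.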

Other tips

I think you need something like

awk '/[0-9]+/,/^$/{ if(NR<3) print $0; else {while($0!=""){ printf $0;next; }}}' file

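It doesn't work as written (NR counts lines across the whole file rather than within each entry, and printf $0 treats the subtitle text as a format string), but you may get the idea.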

You can try something like this with awk:

awk '!NF{print}/[a-z]/{printf "%s ", $0;next}1' file

$ cat file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on
these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single
line no matter how long that line.

$ awk '!NF{print}/[a-z]/{printf "%s ", $0;next}1' file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,  and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
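
A variation on the same line-by-line idea is to decide by position inside each entry instead of by whether the line contains lowercase letters. This is a sketch under assumptions, not a drop-in replacement: the function name join_by_position is invented here, entries are assumed to be separated by completely empty lines, and the first two lines of every entry are assumed to be the number and the timecode.

```shell
# Hypothetical helper: join text lines by their position within each entry.
join_by_position() {
    awk '
    /^$/   { if (n >= 3) printf "\n"; n = 0; print ""; next }  # empty line: end any pending text line, keep the separator
           { n++ }                                             # n = line index within the current entry
    n <= 2 { print; next }                                     # entry number and timecode pass through unchanged
    n == 3 { printf "%s", $0; next }                           # first text line, newline held back
           { printf " %s", $0 }                                # later text lines follow after a single space
    END    { if (n >= 3) printf "\n" }                         # input may end without a trailing empty line
    ' "$@"
}
```

Used as join_by_position file, this avoids the trailing space that printf "%s " leaves on each joined line, and it also joins text lines that contain no lowercase letters.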

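Delete all newlines that are preceded by a letter, a space or a tab:

perl -pe 's/([a-zA-Z \t])\n$/$1/' file

Note that this joins lines without inserting a space, leaves the newline in place after a line ending in punctuation such as a comma, and also removes the newline after the last text line of an entry when that line ends in a letter, which swallows the empty separator line that follows.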

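I had the same problem and wrote this small C program, which solved it for me. It takes the file name and the number of subtitles as arguments, writes the result to out.srt, and assumes CRLF line endings (an empty line has length 2):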

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
FILE *quelle,*ziel;
int i;
long maxsub,count,tmp,sub;  
char puffer[10][200], *ptr,line[400];

if(argc != 3)
    {
    printf("Usage: srtlinejoin Filename CountOfSubtitles\n");
    return EXIT_FAILURE;
    }   

maxsub = strtol( argv[2], &ptr, 10);

if( (quelle=fopen(argv[1],"r")) == NULL) {
        fprintf(stderr, "Can't open %s\n", argv[1]);
        return EXIT_FAILURE;
    }
if( (ziel=fopen("out.srt","w")) == NULL) {
        fprintf(stderr, "Can't open out.srt\n");
    fclose(quelle);
        return EXIT_FAILURE;
    }

//read and write first line
fgets(puffer[0], 200, quelle);
fputs(puffer[0], ziel);

for(count=1; count < maxsub;count++)
//for(count=1; count <= 3;count++)
    {
    //printf("Processing subtitle %d\n",count);

    tmp=0;
    //Read and write time
    fgets(puffer[0], 200, quelle);
    fputs(puffer[0], ziel);

    do  {
        fgets(puffer[tmp], 200, quelle);
        //Scan for next Subtitle
        sub = strtol( puffer[tmp], &ptr, 10);
        tmp++;
        }
    while(sub != (count+1));

    //The subtitle has only one line
    if (strlen(puffer[1]) == 2)
        {
        fputs(puffer[0], ziel);     //New Subtitle
        fputs(puffer[1], ziel);     //Next empty line
        fputs(puffer[2], ziel);     //Next number
        }

    //The subtitle has two lines
    if ((strlen(puffer[1]) > 2) && (strlen(puffer[2]) == 2))
        {
        for(i=0;i<400;i++)
            line[i]=0;

        strncpy(line,puffer[0],(strlen(puffer[0])-2));
        strcat(line," ");
        strcat(line,puffer[1]);
        fputs(line, ziel);      //New Subtitle
        fputs(puffer[2], ziel); //Next empty line
        fputs(puffer[3], ziel); //Next number
        }

    //The subtitle has more than two lines (second and third line are both non-empty)
    if ((strlen(puffer[1]) > 2) && (strlen(puffer[2]) > 2))
        {
        printf("Attention: A subtitle has more than two lines\n");
        }
    }

printf("Check last subtitle!\n");

fclose(quelle);
fclose(ziel);
return EXIT_SUCCESS;
}
Licensed under: CC-BY-SA with attribution