Question

I have extracted data from one source into a .txt file. The source is some sort of address book, and I used a macro recorder for the extraction. Now I have several files, all formatted exactly like this (example with 4 contacts):

Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf  
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Jensen & Sohn, Karl
Schallenberg 6A
25587 Münsterdorf
Tel. 04821 82538
Fax 04821 83381
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau
Josef Grabmeier GmbH
Reitgesing 1
85560 Ebersberg
Tel. 08092 24701-0
Fax 08092 24701-24

The 1st row is always the field of business (Branche), the 2nd row is always the name of the company/firm, the 3rd row is always the street address, and the 4th row is always the ZIP code and place. The 5th row and the following rows (sometimes two, sometimes more) are either Tel. or Fax.
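The layout rule above (four fixed lines, then any number of Tel./Fax lines) is enough to split the stream into records. Below is a minimal Python sketch, not a definitive implementation. It assumes every contact line starts with the whole word "Tel" or "Fax" and that blank lines carry no data; parse_records is a hypothetical helper name. Because it is a generator, even 500,000 contacts are handled one record at a time instead of loading everything into memory.

```python
import re
from typing import Iterable, Iterator

# Assumption: a contact line starts with the whole word "Tel" or "Fax",
# as in the sample ("Tel. 0175 7441784", "Fax 07423 6280"). The \b stops a
# Branche such as "Telekommunikation" from being mistaken for a contact line.
CONTACT_RE = re.compile(r"(?:Tel|Fax)\b")

FIXED_FIELDS = 4  # Branche, Name, Address, Place


def parse_records(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield one list per contact: 4 fixed fields followed by Tel/Fax lines."""
    record: list[str] = []
    for raw in lines:
        line = raw.rstrip()  # drop the newline and any trailing spaces
        if not line:
            continue  # assumption: blank lines carry no data
        # Once the fixed part is complete, a non-Tel/Fax line starts a new record.
        if len(record) >= FIXED_FIELDS and not CONTACT_RE.match(line):
            yield record
            record = []
        record.append(line)
    if record:
        yield record  # the last record is not followed by a new Branche line


if __name__ == "__main__":
    sample = [
        "Abbrucharbeiten", "ATR Armbruster", "Werkstr. 28", "78727 Oberndorf  ",
        "Tel. 0175 7441784", "Fax 07423 6280",
        "Abbrucharbeiten", "Kiwitt, R.", "Auf der Heide 54", "48282 Emsdetten",
        "Tel. 02572 88559", "Tel. 0172 7624359",
    ]
    for rec in parse_records(sample):
        print("|".join(rec))
```

To process a real file, pass the open file object directly, e.g. parse_records(open("infile", encoding="utf-8")); the file name and encoding here are assumptions.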

I want to reformat it into something like an Excel sheet:

Branche:    Name:     Address:   Place:    contact1:   contact2:
1st row     2nd row   3rd row    4th row   5th row     6th row.....

The main problem is that I have over 500,000 contacts, and the last fields (the contacts) are not always the same number. I don't want to do it manually; please help.


Solution

It's neither Python nor Visual Basic, but it shouldn't be very difficult to translate to those languages. This is Perl.

perl -lne '
        ## Print the header. Both the header and the data are separated with pipes.
        ## Contacts (contact1, contact2, etc.) are not included because at this
        ## point I cannot know how many there will be. It could be done, but the
        ## script would be far more complex.
        BEGIN {
                push @header, q|Branche:|, q|Name:|, q|Address:|, q|Place:|;
                printf qq|%s\n|, join q{|}, @header;
        }

        ## The first four lines of a contact are always kept. After that, a line
        ## belongs to the same contact only if it begins with the whole word "Tel"
        ## or "Fax" (so a Branche such as "Telekommunikation" is not swallowed).
        ## Any other line starts a new contact, so print the current one first.
        if ( $line >= 4 && !m/\A(?:Tel|Fax)\b/ ) {
                printf qq|%s\n|, join q{|}, @contact_info;
                @contact_info = ();
                $line = 0;
        }

        push @contact_info, $_;
        ++$line;

        ## Do not skip the last contact of each input file, and reset the state
        ## so that several input files can be given on the command line.
        if ( eof ) {
                printf qq|%s\n|, join q{|}, @contact_info;
                @contact_info = ();
                $line = 0;
        }
' infile

It's a one-liner (I know it doesn't look like one, but you can remove the comments and newlines to get it), so run it directly from your shell. With the sample data it yields:

Branche:|Name:|Address:|Place:
Abbrucharbeiten|ATR Armbruster|Werkstr. 28|78727 Oberndorf  |Tel. 0175 7441784|Fax 07423 6280
Abbrucharbeiten|Jensen & Sohn, Karl|Schallenberg 6A|25587 Münsterdorf|Tel. 04821 82538|Fax 04821 83381
Abbrucharbeiten|Kiwitt, R.|Auf der Heide 54|48282 Emsdetten|Tel. 02572 88559|Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau|Josef Grabmeier GmbH|Reitgesing 1|85560 Ebersberg|Tel. 08092 24701-0|Fax 08092 24701-24

Note that I didn't print the full header (the contact columns are missing) and that fields are separated with pipes. That should not be a problem when importing into Excel, whose text import lets you choose "|" as a custom delimiter.
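If you want the complete header after all, a second pass over the pipe-separated output can add it: find the widest row, generate contact1..contactN, and pad shorter rows with empty cells. This is a minimal Python sketch under stated assumptions: the first input line is the four-column header printed by the one-liner, no field contains "|", and the input is UTF-8. pipes_to_csv and the file names are hypothetical. Writing real CSV with the csv module also quotes fields that contain commas, such as "Jensen & Sohn, Karl".

```python
import csv
import sys
from typing import TextIO

FIXED_HEADER = ["Branche", "Name", "Address", "Place"]


def pipes_to_csv(src: TextIO, dst: TextIO) -> int:
    """Rewrite pipe-separated rows as CSV with a complete header.

    Assumes the first line of src is the incomplete header printed by the
    one-liner and that no field contains "|". Returns the number of contact
    columns (N in contact1..contactN).
    """
    next(src, None)  # skip the four-column header line
    # Loads all rows into memory: fine for ~500,000 short lines, and needed
    # because the widest row must be known before the header is written.
    rows = [line.rstrip("\n").split("|") for line in src if line.strip()]
    widest = max((len(row) for row in rows), default=0)
    n_contacts = max(0, widest - len(FIXED_HEADER))
    header = FIXED_HEADER + [f"contact{i}" for i in range(1, n_contacts + 1)]

    writer = csv.writer(dst)  # quotes fields containing commas
    writer.writerow(header)
    for row in rows:
        # Pad shorter rows with empty cells so every row has len(header) columns.
        writer.writerow(row + [""] * (len(header) - len(row)))
    return n_contacts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python pipes_to_csv.py piped.txt contacts.csv
    # The input encoding is an assumption; "utf-8-sig" writes a BOM so Excel
    # recognizes the output as UTF-8 (umlauts such as in "Münsterdorf").
    with open(sys.argv[1], encoding="utf-8") as src, open(
        sys.argv[2], "w", encoding="utf-8-sig", newline=""
    ) as dst:
        pipes_to_csv(src, dst)
```

Reading everything into memory is a deliberate trade-off: the header has to be written first, but its width is only known after every row has been seen.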

Licensed under: CC-BY-SA with attribution