Linux AWK: compare two CSV files and create a new file with flags

https://stackoverflow.com/questions/9528202

15-11-2019

Question

I have 2 CSV files that I need to compare, writing the differences to a new, formatted file. Samples are given below.

Old file

DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq

New file

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

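Output file

I want to compare the old and new CSV files, find the changes made in the new file, and add a flag to each affected record to denote the kind of change:

U - the record has been updated in the new file
D - a record that exists in the old file has been deleted from the new file
N - a record that exists in the new file is not present in the old file

The sample output file is this: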

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

I used the diff command, but it repeats the updated records as well:

 1,3c1,2
 < DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
 < DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
 < DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
 ---
 > DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
 > DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
 5a5
 > DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

I also used an AWK one-liner to filter out my records:

 awk 'NR==FNR{A[$1];next}!($1 in A)' FS=: old.csv new.csv

The problem with this is that I can't get the records that exist only in the old file, which is:

DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd

I started writing a bash script to do this, but could not find much help in the way of good examples.

 myscript.awk

BEGIN {
    FS = ","    # input field separator
    OFS = ","   # output field separator
}

NR > 1 {
    # flag: N - new record, D - deleted, U - updated
    id = $1
    name = $2
    flag = "N"  # awk string literals take double quotes, not single quotes

    # Print the columns in the new order; the commas tell awk to join them with OFS
    print id, name, flag
}

 >> awk -f  myscript.awk  old.csv new.csv > formatted.csv

Answer

This might work for you (it relies on GNU diff and GNU sed):

diff  -W999 --side-by-side OLD NEW |
sed '/^[^\t]*\t\s*|\t\(.*\)/{s//\1 U/;b};/^\([^\t]*\)\t*\s*<$/{s//\1 D/;b};/^.*>\t\(.*\)/{s//\1 N/;b};d'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

An awk solution along the same lines:

diff -W999 --side-by-side OLD NEW |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" U"};/[\t] *<$/{split($0,a,"[\t]* *<$");print a[1]" D"};/>[\t]/{split($0,a,">[\t]");print a[2]" N"}'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
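
The same flags can also be produced with awk alone, without parsing diff's side-by-side layout. The following is only a minimal sketch under one assumption that the question does not state: the second field is a unique record key. The temporary directory and the array names (oldrec, newrec, oldkeys, newkeys) are illustrative. The sample files are recreated inline so the script runs on its own.

```shell
#!/bin/sh
# Sketch: flag differences between two CSV files using awk only.
# ASSUMPTION: field 2 is a unique record key (not stated in the question).

workdir=$(mktemp -d)

cat > "$workdir/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF

cat > "$workdir/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

result=$(awk -F, '
    # First file (old): store each record by key and remember the key order.
    NR == FNR { oldrec[$2] = $0; oldkeys[++n] = $2; next }

    # Second file (new): store each record; note keys that the old file lacks.
    { newrec[$2] = $0; if (!($2 in oldrec)) newkeys[++m] = $2 }

    END {
        # Walk the old keys in file order: missing means D, changed means U.
        for (i = 1; i <= n; i++) {
            k = oldkeys[i]
            if (!(k in newrec))              print oldrec[k] " D"
            else if (newrec[k] != oldrec[k]) print newrec[k] " U"
        }
        # Keys seen only in the new file are N.
        for (i = 1; i <= m; i++) print newrec[newkeys[i]] " N"
    }
' "$workdir/old.csv" "$workdir/new.csv")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Unchanged records are skipped, so on the sample data this prints the four flagged lines shown in the question. Note that the NR == FNR idiom misbehaves when the first file is empty, so an empty old file would need separate handling.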

Other tips

A good starting point would probably be:

 diff -e OLD NEW

This outputs:

 5a
 DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
 .
 1,3c
 DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
 DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
 .

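This means that a record was appended after line 5 (5a) and that lines 1 through 3 were replaced (1,3c) by the two records that follow. Note that the deleted record 33333333 is folded into that change rather than reported on its own.

If you can't use this format as-is (sticking to a standard format would be preferable), you would need to write a script that converts it to the format you describe.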
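
One possible shape for such a conversion script is sketched below. It reads the default diff output rather than diff -e, because the "< " and "> " prefixes keep old and new lines apart, which the ed-style script does not. It also assumes that the second field is a unique record key, which the question does not state. The temporary directory and the array names are illustrative, and the sample files are recreated inline so the script runs on its own.

```shell
#!/bin/sh
# Sketch: convert default diff output into U/D/N flagged records.
# ASSUMPTION: field 2 is a unique record key (not stated in the question).
# NOTE: this parses plain "diff OLD NEW" output, not "diff -e" output.

workdir=$(mktemp -d)

cat > "$workdir/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF

cat > "$workdir/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# diff exits with status 1 when the files differ; that is not an error here.
result=$(diff "$workdir/old.csv" "$workdir/new.csv" | awk '
    # "< " lines come from the old file: strip the prefix, index by key.
    /^< / {
        line = substr($0, 3); split(line, f, ",")
        oldrec[f[2]] = line; oldkeys[++n] = f[2]; next
    }
    # "> " lines come from the new file.
    /^> / {
        line = substr($0, 3); split(line, f, ",")
        newrec[f[2]] = line; newkeys[++m] = f[2]; next
    }
    # Hunk headers such as 1,3c1,2 and the --- separator match neither rule.
    END {
        for (i = 1; i <= n; i++) {
            k = oldkeys[i]
            if (!(k in newrec))              print oldrec[k] " D"
            else if (newrec[k] != oldrec[k]) print newrec[k] " U"
        }
        for (i = 1; i <= m; i++) {
            k = newkeys[i]
            if (!(k in oldrec)) print newrec[k] " N"
        }
    }
')

printf '%s\n' "$result"
rm -rf "$workdir"
```

Classification is deferred to the END block so that a key counts as updated whenever it appears on both sides of the diff, even if the two lines fall in different hunks.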

License: CC-BY-SA with attribution

Not affiliated with StackOverflow