Linux write into same file with awk and tee: very odd behaviour

https://stackoverflow.com/questions/21149972

28-09-2022

Question

I was trying to do something unusual: overwrite the same file I was reading from. This came up, just out of curiosity, while looking at the question "Unexpected new line when writing out in Unix Shell Script".

I found that on some attempts I could pipe into tee > to_same_file and it worked, as you can see in the very first attempt below, but subsequent attempts produced an empty file. My assumption is that this is related to timing: on the first attempt it took longer to get to tee, so awk had time to read the file, whereas on the other attempts the file was emptied before it had been read. I am just interested in understanding why this odd behaviour occurred.

me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0,1388998800
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0,1388998800
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee -a > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$

Answer

A small, self-contained test case with the same problem is this:

cat file | tee > file

This pipeline consists of two parts that run in parallel.

cat file tries to open and read from the file.

tee > file tries to truncate the file (the shell does this when it sets up the > redirection, before tee itself starts running).

Depending on whether the file is (partially) read or truncated first, you'll get either parts or all of your data, or just an empty file.
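The race can be observed by repeating the test case many times and counting the outcomes. The following is only a rough sketch: the file contents, the scratch directory, and the run count of 50 are arbitrary choices, and the split between outcomes will differ from machine to machine and from run to run.

```shell
# Work in a throwaway directory so no real file is at risk.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

nonempty=0
empty=0
for i in $(seq 1 50); do
    # Recreate the input before every attempt.
    printf 'line one\nline two\n' > file
    # cat and the redirection for tee race against each other.
    cat file | tee > file
    if [ -s file ]; then
        nonempty=$((nonempty + 1))
    else
        empty=$((empty + 1))
    fi
done
echo "non-empty: $nonempty, empty: $empty"
```

On a lightly loaded machine the empty outcome usually dominates, which fits the behaviour described in the question, but neither result is guaranteed.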

Other suggestions

What you've done is create a race condition between awk and tee. The awk process is opening 1.csv for reading while tee is being redirected to 1.csv in another process.

As is the nature of race conditions, the results are random and depend on who gets there first.
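The truncation comes from the shell's > redirection and not from tee itself, which is also why the tee -a > 1.csv attempt in the question still produced an empty file: -a only affects files passed to tee as arguments. A minimal sketch of both points, using made-up file names in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The shell truncates the target of > before the command even runs,
# so a command that writes nothing still empties the file.
printf 'some data\n' > file
true > file
if [ -s file ]; then echo "file still has data"; else echo "file is empty"; fi

# tee -a appends only to files named as arguments.
printf 'old\n' > log
printf 'new\n' | tee -a log > /dev/null
cat log
```

This prints "file is empty", followed by the two lines "old" and "new".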

To do this safely, you'll need to save the output to a new file or use a tool like sponge.
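The "new file" approach might look like the sketch below. It is an illustration under assumptions, not the only way to do it: the sample row is copied from the question, but the awk program is a simplified stand-in that appends the field count (the original uses mktime, which is specific to gawk), and the temporary file name is arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input in the same layout as the question's 1.csv.
printf '%s\n' 'ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0' > 1.csv

# Write to a temporary file in the same directory, then replace the
# original only if awk succeeded. 1.csv is never read and truncated
# at the same time, so there is no race.
tmp=$(mktemp ./1.csv.XXXXXX)
awk -F',' '{ print $0 "," NF }' 1.csv > "$tmp" && mv "$tmp" 1.csv

cat 1.csv
```

With sponge (from the moreutils package, which may need to be installed separately) the same pipeline can be written as awk ... 1.csv | sponge 1.csv, because sponge reads all of its input before it opens the output file.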

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow