Question

I am trying to output the begin-timestamps of periods of silence in a given audio file. Since there is background noise, by "silence" I mean audio below a given threshold. Eventually, I want to split the audio file into smaller audio files at these timestamps. It is important that no part of the original file be discarded.

I tried

sox in.wav out.wav silence 1 0.5 1% 1 2.0 1% : newfile : restart

(courtesy http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/)

Although it somewhat did the job, it also trimmed and discarded the periods of silence, which I do not want.

Is 'silence' the right option, or is there a simpler way to accomplish what I need to do?

Thanks.


Solution 2

There is (currently, at least) no way to make the silence effect output the position where it has detected silence, or to retain all of the silent audio.

If you are able to recompile SoX yourself, you could add an output statement yourself to find out about the cut positions, then use trim in a separate invocation to split the file. With the stock version, you are out of luck.
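Once the cut positions are known from any source, the separate trim step could be sketched roughly as below. This is only an illustration: `build_trim_commands`, the `prefix` parameter and the output file naming are hypothetical, and it assumes the usual `trim START [LENGTH]` form with positions given in seconds. Consecutive segments share their boundaries, so no part of the original file is dropped.

```python
def build_trim_commands(source, cut_points, prefix="part"):
    """Return one sox argument list per segment between consecutive cut points."""
    # Segment boundaries: the start of the file plus every positive cut point.
    bounds = [0.0] + sorted(p for p in cut_points if p > 0)
    commands = []
    for i, start in enumerate(bounds):
        out = f"{prefix}{i:03d}.wav"  # hypothetical naming scheme
        cmd = ["sox", source, out, "trim", f"{start:.6f}"]
        if i + 1 < len(bounds):
            # The second trim argument is the length of audio to keep.
            cmd.append(f"{bounds[i + 1] - start:.6f}")
        # The last segment has no length, so it runs to the end of the file.
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_trim_commands("in.wav", [10.0, 25.5]):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to actually write the segment.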

OTHER TIPS

Unfortunately this is not SoX, but ffmpeg has a silencedetect filter that does exactly what you're looking for:

ffmpeg -i in.wav -af silencedetect=noise=-50dB:d=1 -f null -

(This detects silence below a threshold of -50 dB lasting at least 1 second; cribbed from the ffmpeg documentation.)

It prints output like this:

Press [q] to stop, [?] for help
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
size=N/A time=00:04:29.53 bitrate=N/A
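To turn that log into usable timestamps, something like the following sketch could work. It assumes the log text (ffmpeg writes it to stderr) has already been captured into a string, and that the lines look like the sample above; `parse_silences` is a made-up helper name, not part of ffmpeg.

```python
import re

# Patterns for the two kinds of lines that silencedetect logs.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text):
    """Return (start, end) pairs in seconds; end is None if the silence never ended."""
    silences = []
    pending_start = None
    for line in log_text.splitlines():
        match = _START.search(line)
        if match:
            pending_start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and pending_start is not None:
            silences.append((pending_start, float(match.group(1))))
            pending_start = None
    if pending_start is not None:
        # The file ended while still silent.
        silences.append((pending_start, None))
    return silences


sample = """\
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
"""
print(parse_silences(sample))  # [(264.718, 265.744)]
```

The first element of each pair is the begin-timestamp the question asks for, and it can serve directly as a split point.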

SoX can easily dump the timestamp and value of every sample to a text file. It does not report periods of silence directly, but you can calculate those with a simple script. The .dat format is documented as follows:

   .dat   Text  Data  files.   These  files  contain a textual representation of the sample data.  There is one line at the beginning that contains the sample
          rate, and one line that contains the number of channels.  Subsequent lines contain two or more numeric data items: the time since the beginning  of
          the first sample and the sample value for each channel.

          Values are normalized so that the maximum and minimum are 1 and -1.  This file format can be used to create data files for external programs such as
          FFT analysers or graph routines.  SoX can also convert a file in this format back into one of the other file formats.

          Example containing only 2 stereo samples of silence:

              ; Sample Rate 8012
              ; Channels 2
                          0   0    0
              0.00012481278   0    0

So you can run sox in.wav out.dat, then parse the text file and treat any sequence of rows whose values stay close to 0 (within your threshold) as a silence.
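A rough sketch of such a parser is below. The function name `find_silences`, the default `threshold` and the `min_duration` parameter are all assumptions for illustration; the minimum duration matters because ordinary audio also passes through zero for a sample or two. Note that a .dat file has one row per sample, so it gets large for long recordings.

```python
def find_silences(dat_lines, threshold=0.01, min_duration=0.5):
    """Return (start, end) times of runs where every channel stays within threshold."""
    runs = []
    run_start = None
    last_time = None
    for line in dat_lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and the "; Sample Rate" / "; Channels" header
        fields = line.split()
        time = float(fields[0])
        quiet = all(abs(float(value)) <= threshold for value in fields[1:])
        if quiet and run_start is None:
            run_start = time  # a quiet run begins here
        elif not quiet and run_start is not None:
            if time - run_start >= min_duration:
                runs.append((run_start, time))
            run_start = None
        last_time = time
    # Close a run that is still open at the end of the file.
    if run_start is not None and last_time - run_start >= min_duration:
        runs.append((run_start, last_time))
    return runs
```

Usage would be along the lines of `with open("out.dat") as f: print(find_silences(f))`, tuning `threshold` and `min_duration` to the background noise in the recording.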

Necroposting: you can run a separate script that iterates over all of the SoX output files (for f in *.wav) and uses soxi -D "$f" to obtain the duration of each sound clip in seconds. Then get the system time in seconds with date "+%s" and subtract the duration to find the time the recording started.
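The same idea could be sketched in Python like this. It assumes soxi is on the PATH and that the clip finished recording at the moment you pass in as `end_epoch`; both helper names are invented for this example.

```python
import subprocess


def clip_duration(path):
    """Ask soxi for the clip length in seconds (requires SoX to be installed)."""
    result = subprocess.run(
        ["soxi", "-D", path], capture_output=True, text=True, check=True
    )
    return float(result.stdout.strip())


def recording_start(end_epoch, duration):
    """Epoch time at which a clip of the given duration began, if it ended at end_epoch."""
    return end_epoch - duration
```

For a clip that has just finished recording, `recording_start(time.time(), clip_duration("clip.wav"))` (after `import time`) gives the start time as seconds since the epoch.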

Licensed under: CC-BY-SA with attribution