How much data is getting written? I bet it's not writing because you haven't collected enough events to trigger a flush to HDFS with the default configuration parameters. There are a number of ways to configure the HDFS sink so that it flushes in a predictable way: you can set it to roll on a number of events (hdfs.rollCount), on an interval (hdfs.rollInterval), or on a size (hdfs.rollSize). What is happening is that when you kill the agent, it cleans up what it is currently doing and flushes... so basically you are forcing the flush by killing it.

You can also try lowering hdfs.batchSize.
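
For example, something like this in your agent config would make the rolling behavior explicit (a minimal sketch; the agent/sink names a1/k1 and the HDFS path are placeholders, so substitute the names from your own flume.conf):

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
    # Roll to a new file after this many events (0 = never roll on count)
    a1.sinks.k1.hdfs.rollCount = 1000
    # Roll to a new file after this many seconds (0 = never roll on time)
    a1.sinks.k1.hdfs.rollInterval = 60
    # Roll to a new file after this many bytes (0 = never roll on size)
    a1.sinks.k1.hdfs.rollSize = 0
    # Flush to HDFS after this many events instead of the default 100,
    # so small amounts of data become visible sooner
    a1.sinks.k1.hdfs.batchSize = 10

Note that setting a roll property to 0 disables that particular trigger, so you can pick exactly one rolling criterion if you want fully predictable files.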
Remember that Hadoop likes larger files. In general you should avoid producing lots of small files, since each file costs NameNode memory and typically its own map task. So be careful about rolling too often.
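
For instance, once you're past testing, you could disable count- and time-based rolling and roll only on size, somewhere near your HDFS block size (the 128 MB value below is just an illustration, using the same placeholder names as above):

    # Roll only when the file approaches the HDFS block size
    a1.sinks.k1.hdfs.rollCount = 0
    a1.sinks.k1.hdfs.rollInterval = 0
    # 134217728 bytes = 128 MB
    a1.sinks.k1.hdfs.rollSize = 134217728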
Since you are running it in the foreground, Ctrl+C or kill are the only real ways to stop it. In production you should probably be using the init scripts, which have start/stop/restart.
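
For example, if Flume was installed from a distribution package, you would manage it something like this (the service name flume-ng-agent is what Cloudera's packages use; yours may differ):

    # Start the agent as a background daemon
    sudo service flume-ng-agent start
    # Shut it down cleanly (flushing the sink, as you observed)
    sudo service flume-ng-agent stop
    # Restart to pick up configuration changes
    sudo service flume-ng-agent restart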