Question

This is a line from my sample log file:

[Sun Mar 7 16:05:49 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed

Most lines have the same fixed length. In SAS I used to split them by position: for example, characters 2-24 would be the TimeStamp column, 29-31 would be MessageType, and so on.

Is there a way to do the same kind of position-based splitting in Pig or MapReduce?


Solution

Yes, it is. In your MapReduce mapper, convert each input line to a String and slice it with substring, something like

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String fields = value.toString();
        // use the start and end index based on your needs; substring is
        // 0-based with an exclusive end index, so 1-based positions 2-24
        // become substring(1, 24)
        String date = fields.substring(1, 24);

     .........
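
To make the index conversion concrete, here is a minimal standalone sketch in plain Java (no Hadoop classes) that could be called from inside a mapper. The class name FixedWidthLogParser and the helper slice are hypothetical names chosen for illustration, and the offsets assume the sample line exactly as shown in the question, where "info" falls at 1-based positions 28-31. Adjust them to your real layout.

```java
public class FixedWidthLogParser {

    /**
     * Returns the characters at 1-based, inclusive positions [start, end].
     * The range is clamped to the line length, so a line that is shorter
     * than expected yields a shorter (or empty) field instead of throwing.
     */
    public static String slice(String line, int start, int end) {
        int from = Math.max(start - 1, 0);      // 1-based inclusive -> 0-based inclusive
        int to = Math.min(end, line.length());  // 1-based inclusive -> 0-based exclusive
        return from < to ? line.substring(from, to) : "";
    }

    public static void main(String[] args) {
        String line = "[Sun Mar 7 16:05:49 2004] [info] [client 64.242.88.10] "
                + "(104)Connection reset by peer: "
                + "client stopped connection before send body completed";

        String timestamp = slice(line, 2, 24);     // assumed TimeStamp positions
        String messageType = slice(line, 28, 31);  // assumed MessageType positions

        System.out.println(timestamp);    // prints "Sun Mar 7 16:05:49 2004"
        System.out.println(messageType);  // prints "info"
    }
}
```

Clamping matters because the question says only "most" lines share the fixed length; a bare substring call would throw StringIndexOutOfBoundsException on a short line and fail the map task.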
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow