Question

We have two types of logs:

1) SESSION LOG: SESSION_ID, USER_ID, START_DATE_TIME, END_DATE_TIME

2) EVENT LOG: SESSION_ID, DATE_TIME, X, Y, Z

We only need to store the event log, but we would like to replace each SESSION_ID with its corresponding USER_ID. Which technologies (e.g. Flume?) should we use to store the data in HDFS?

Thanks!


Solution

Yes, Flume can be used to move the log files into HDFS.

To replace SESSION_ID with USER_ID, you could preprocess the logs with a shell script that looks up each event's session and writes a 'Modified Event Log File'. That modified file is what Flume then picks up. This would be the simplest approach.
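A minimal sketch of that preprocessing step follows, written in Python rather than shell for readability. It assumes both logs are comma-separated text with no header row and with fields in the order listed in the question; the function names and file paths are hypothetical. It also assumes the session log is small enough to hold the whole SESSION_ID to USER_ID map in memory, and it drops (and counts) events whose SESSION_ID has no matching session.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def load_session_map(session_file: TextIO) -> Dict[str, str]:
    """Build a SESSION_ID -> USER_ID lookup from the session log.

    Assumes comma-separated rows without a header:
    SESSION_ID, USER_ID, START_DATE_TIME, END_DATE_TIME
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(session_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def rewrite_event_log(event_file: TextIO, out_file: TextIO,
                      session_map: Dict[str, str]) -> int:
    """Copy the event log to out_file with SESSION_ID replaced by USER_ID.

    Assumes comma-separated rows without a header:
    SESSION_ID, DATE_TIME, X, Y, Z
    Events whose SESSION_ID is not in session_map are dropped; the number
    of dropped events is returned so the caller can log or alert on it.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    dropped = 0
    for row in csv.reader(event_file):
        if not row:
            continue  # skip blank lines
        user_id = session_map.get(row[0].strip())
        if user_id is None:
            dropped += 1
            continue
        writer.writerow([user_id] + row[1:])
    return dropped


def rewrite_files(session_path: str, event_path: str, out_path: str) -> int:
    """File-based wrapper: writes the 'modified event log' to out_path."""
    with open(session_path, newline="") as f:
        session_map = load_session_map(f)
    with open(event_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        return rewrite_event_log(src, dst, session_map)


def rewrite_text(session_text: str, event_text: str) -> Tuple[str, int]:
    """In-memory variant, handy for trying the logic on small samples."""
    session_map = load_session_map(io.StringIO(session_text))
    out = io.StringIO()
    dropped = rewrite_event_log(io.StringIO(event_text), out, session_map)
    return out.getvalue(), dropped
```

For example, `rewrite_files("session.log", "event.log", "event_modified.log")` (hypothetical paths) would produce the modified file for Flume to pick up. If the Flume agent uses a spooling directory source, write the output outside the spool directory and move it in only once it is complete, since that source expects files not to change after they appear there.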

Licensed under: CC-BY-SA with attribution