It seems like you need to run two Flume agents:
Agent1: HtmlPagesSource -> channel -> PageParser (a custom sink that extends AvroSink and overrides the process() method so that it parses each input page and writes out many slim messages)
Agent2: AvroSource -> channel -> WhateverSinkGoesNext
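
A rough sketch of how the two agents could be wired together in a Flume properties file. The `com.example` package, the component names (`pages`, `parser`, `avroIn`, `next`), the port 4545, and the `logger` sink standing in for WhateverSinkGoesNext are all placeholders I made up; it also assumes PageParser keeps the `hostname`/`port` settings it inherits from AvroSink:

```properties
# ---- Agent1: HtmlPagesSource -> channel -> PageParser ----
agent1.sources = pages
agent1.channels = mem
agent1.sinks = parser

# Custom components are referenced by fully qualified class name (hypothetical package)
agent1.sources.pages.type = com.example.HtmlPagesSource
agent1.sources.pages.channels = mem

agent1.channels.mem.type = memory

# PageParser extends AvroSink, so it is assumed to take the same hostname/port settings
agent1.sinks.parser.type = com.example.PageParser
agent1.sinks.parser.channel = mem
agent1.sinks.parser.hostname = localhost
agent1.sinks.parser.port = 4545

# ---- Agent2: AvroSource -> channel -> WhateverSinkGoesNext ----
agent2.sources = avroIn
agent2.channels = mem
agent2.sinks = next

# The Avro source listens on the port that Agent1's sink sends to
agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 4545
agent2.sources.avroIn.channels = mem

agent2.channels.mem.type = memory

# Placeholder for whatever sink goes next
agent2.sinks.next.type = logger
agent2.sinks.next.channel = mem
```

The only thing that links the two agents is the matching host and port: the sink in Agent1 sends Avro events to the address the Avro source in Agent2 is bound to.
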
For some examples of chaining Flume data flows, see: http://www.ibm.com/developerworks/library/bd-flumews/#N10081