However, when I walk through the examples, they all recognise a very small vocabulary. Is there a good tutorial that shows how to configure it to recognise something more challenging, e.g. a dialog between two people?
You do not need to configure sphinx4. You can just check out the latest version from Subversion and use the demo as is. For more information, see the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
I believe sphinx4 already includes the right acoustic models and dictionaries, but the LM file is specific to each application, so I need an LM file. Am I correct?
The default LM file provided is good enough for generic speech. However, if you have a specific domain, it makes sense to create your own domain-specific language model.
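To make "domain-specific language model" a bit more concrete, here is a toy sketch in Java (the language sphinx4 is written in). It is only an illustration under stated assumptions, not how sphinx4 or the CMU tools build models: it counts word bigrams in a few hypothetical in-domain sentences and turns them into maximum-likelihood probabilities P(word | previous word). A real model for sphinx4 would be built with a dedicated toolkit such as CMUCLMTK, include smoothing for unseen word pairs, and be saved as an ARPA or binary LM file. The class name `DomainBigramModel` and the example sentences are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy bigram language model trained on domain sentences.
 * Illustration only: no smoothing, and it does not write an ARPA or
 * binary LM file that sphinx4 could load.
 */
public class DomainBigramModel {
    // Sentence boundary markers, following the usual <s> ... </s> convention.
    static final String START = "<s>";
    static final String END = "</s>";

    // How often each word appears as the first element of a bigram.
    private final Map<String, Integer> historyCounts = new HashMap<>();
    // Nested counts: previous word -> (next word -> count).
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();

    /** Counts bigrams in whitespace-tokenised, lower-cased sentences. */
    public static DomainBigramModel train(List<String> sentences) {
        DomainBigramModel model = new DomainBigramModel();
        for (String sentence : sentences) {
            String trimmed = sentence.trim().toLowerCase();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String prev = START;
            for (String word : trimmed.split("\\s+")) {
                model.add(prev, word);
                prev = word;
            }
            model.add(prev, END); // close the sentence
        }
        return model;
    }

    private void add(String prev, String word) {
        historyCounts.merge(prev, 1, Integer::sum);
        bigramCounts.computeIfAbsent(prev, k -> new HashMap<>()).merge(word, 1, Integer::sum);
    }

    /** P(word | prev) = count(prev, word) / count(prev); 0.0 for unseen histories. */
    public double probability(String prev, String word) {
        Integer history = historyCounts.get(prev);
        if (history == null) {
            return 0.0;
        }
        int pair = bigramCounts.get(prev).getOrDefault(word, 0);
        return (double) pair / history;
    }

    public static void main(String[] args) {
        DomainBigramModel model = train(List.of(
                "turn on the kitchen light",
                "turn off the kitchen light",
                "turn on the radio"));
        System.out.println(model.probability("turn", "on"));      // "on" follows 2 of 3 "turn"s
        System.out.println(model.probability("the", "kitchen"));  // "kitchen" follows 2 of 3 "the"s
        System.out.println(model.probability("the", "weather"));  // never seen in this domain
    }
}
```

Because the counts come only from in-domain text, likely phrases such as "turn on the ..." get high probability while pairs never seen in the domain get zero. That zero is why real toolkits add smoothing or backoff on top of these raw counts before the model is used for recognition.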