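Remember that SIP's offer/answer model lets either the caller or the callee send the offer. If the INVITE carries no SDP, the callee puts the offer in its reliable response (typically the 200 OK) and the caller answers in the ACK.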
If you cannot describe the desired media before you've seen the sender's media streams, why not have the sender supply the offer, and base your answer on that?
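
To make "base your answer on that" concrete, here is a rough Python sketch of deriving the media lines of an answer from a received offer. The function name, the `supported` table, and the starting port are my own illustrative assumptions, not part of any SIP stack, and it only looks at media-level direction attributes. It follows the RFC 3264 rules that the answer keeps one m= line per offered stream, in the same order, and rejects a stream by setting its port to 0.

```python
# Hypothetical helper: the names and the "supported" table are illustrative assumptions.
FLIPPED = {"sendonly": "recvonly", "recvonly": "sendonly",
           "sendrecv": "sendrecv", "inactive": "inactive"}

def answer_media_lines(offer_sdp, supported, local_port=40000):
    """Return the m=/a= lines of an answer that mirrors the offer's streams."""
    sections = []  # one [m_line, direction] pair per offered stream
    for line in offer_sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # sendrecv is the default when no direction attribute is present
            sections.append([line, "sendrecv"])
        elif sections and line.startswith("a=") and line[2:] in FLIPPED:
            sections[-1][1] = line[2:]

    answer = []
    for m_line, direction in sections:
        media, _port, proto, *formats = m_line[2:].split()
        accepted = [f for f in formats if f in supported.get(media, ())]
        if accepted:
            answer.append(f"m={media} {local_port} {proto} {' '.join(accepted)}")
            # Answer with the direction as seen from our side
            answer.append(f"a={FLIPPED[direction]}")
            local_port += 2
        else:
            # Port 0 rejects the stream; the m= line must still be present
            answer.append(f"m={media} 0 {proto} {' '.join(formats)}")
    return answer
```

For example, given an offer with a sendonly audio stream and a video stream, and a `supported` table of `{"audio": {"0"}}`, the sketch accepts the audio stream as recvonly with payload type 0 and rejects the video stream with port 0. A real answer also needs the session-level lines (v=, o=, s=, c=, t=), which are left out here.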