If you are targeting iOS 7 or later, see this Xamarin blog post:
http://blog.xamarin.com/snapshotting-views-in-ios-7/
It covers these methods:
DrawViewHierarchy – Draws a snapshot of the view hierarchy into the current graphics context.
SnapshotView – Renders a snapshot of a view to a new view.
ResizableSnapshotView – Renders a snapshot of a region of a view to a new view that stretches using the given cap insets.
Add the caption label as a subview of the image view, then call one of these methods on the image view so the caption is captured along with the image.
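A minimal sketch of the iOS 7+ approach, assuming the Xamarin.iOS Unified API (`UIKit`, `CoreGraphics` namespaces). `CaptionSnapshot` and `RenderWithCaption` are hypothetical names. It adds the label as a subview and uses `DrawViewHierarchy` to draw the image view and its subviews into a bitmap context. It assumes the image view is in a window; drawing a view hierarchy that is not on screen may produce a blank image.

```csharp
using CoreGraphics;
using UIKit;

// Hypothetical helper, not part of Xamarin.iOS.
public static class CaptionSnapshot
{
    // Adds captionLabel to imageView (if needed) and returns a UIImage of both.
    public static UIImage RenderWithCaption(UIImageView imageView, UILabel captionLabel)
    {
        // Once it is a subview, the label's Frame is in the image view's coordinates.
        if (captionLabel.Superview != imageView)
            imageView.AddSubview(captionLabel);

        // opaque: false keeps transparency; scale 0 means "use the screen scale".
        UIGraphics.BeginImageContextWithOptions(imageView.Bounds.Size, false, 0);
        try
        {
            // Draws the image view and all its subviews (including the label).
            // afterScreenUpdates: true so the just-added label is included.
            imageView.DrawViewHierarchy(imageView.Bounds, true);
            return UIGraphics.GetImageFromCurrentImageContext();
        }
        finally
        {
            // Always pop the context, even if drawing throws.
            UIGraphics.EndImageContext();
        }
    }
}
```

The label's `Frame` controls where the caption appears in the final image, so position it (for example, along the bottom edge) before calling the helper.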
OR
If you want to support any iOS version, you have to draw the contents of the views into a new image context, similar to this:
How to create an image from a UIView / UIScrollView
The difference is that you render both the image view's layer and the label's layer into the same context.
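A sketch of the layer-rendering approach, assuming the Xamarin.iOS Unified API and assuming the label is a sibling of the image view (same superview), not a subview. If the label were a subview, rendering the image view's layer would already include it and the second render would draw it twice. `CaptionLayerRenderer` and `RenderWithCaption` are hypothetical names. View transforms are ignored.

```csharp
using CoreGraphics;
using UIKit;

// Hypothetical helper, not part of Xamarin.iOS.
public static class CaptionLayerRenderer
{
    // Renders imageView's layer, then captionLabel's layer, into one bitmap context.
    // Assumes both views share a superview, so their Frames are comparable.
    public static UIImage RenderWithCaption(UIImageView imageView, UILabel captionLabel)
    {
        UIGraphics.BeginImageContextWithOptions(imageView.Bounds.Size, false, 0);
        try
        {
            CGContext context = UIGraphics.GetCurrentContext();

            // 1. Draw the image view at the context origin.
            imageView.Layer.RenderInContext(context);

            // 2. Shift the origin to the label's position relative to the image view,
            //    draw the label, then restore the original transform.
            context.SaveState();
            context.TranslateCTM(captionLabel.Frame.X - imageView.Frame.X,
                                 captionLabel.Frame.Y - imageView.Frame.Y);
            captionLabel.Layer.RenderInContext(context);
            context.RestoreState();

            return UIGraphics.GetImageFromCurrentImageContext();
        }
        finally
        {
            UIGraphics.EndImageContext();
        }
    }
}
```

The context is sized to the image view, so any part of the label that lies outside the image view's bounds is clipped from the result.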
The server part is a different question.