but that's not optimal
What makes you think that? OpenGL and modern GPUs are designed on the premise that, in the worst case, you have to redraw the whole thing anyway, so things should perform well in that situation, too.
Or is there a better way to do this?
Yes: redraw the whole scene. (Or do what I suggest below.)
To a modern low-end GPU, which is easily capable of throwing tens of millions of triangles at the screen per second, the few hundred to a thousand triangles of a 2D GUI are negligible.
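To put rough numbers on that, here is a back-of-the-envelope calculation. The figures are assumptions picked for illustration (10 million triangles per second as the low end of "tens of millions", a 60 Hz refresh rate, and 1000 triangles for the GUI), not measurements of any particular GPU.

```python
# All three figures are assumed, for illustration only.
triangles_per_second = 10_000_000  # low end of "tens of millions"
frames_per_second = 60             # assumed refresh rate
gui_triangles = 1_000              # upper end of the GUI estimate

# How many triangles the GPU could draw within a single frame.
budget_per_frame = triangles_per_second // frames_per_second

# Fraction of that per-frame budget a full GUI redraw would use.
share = gui_triangles / budget_per_frame

print(f"per-frame budget: {budget_per_frame} triangles")
print(f"GUI share of budget: {share:.1%}")
```

Under these assumptions a full redraw uses well under one percent of the per-frame triangle budget.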
In fact, your copying stuff around will probably be a worse performance hit than redrawing everything, because copying from the front to the back buffer is not a very performant operation and causes serious synchronization issues.
If you want to cache things, you might split your GUI into separate widgets, each of which you draw individually into its own texture using an FBO. Design it in a way that widgets may overlap. You redraw a widget only if its contents change. To draw the whole window, you just redraw the full window contents from the textures into the main framebuffer.
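A minimal sketch of that caching logic, with the OpenGL parts replaced by stand-ins so it runs without a GL context. The names `Widget`, `invalidate`, `update_cache` and `compose_window` are hypothetical, and the strings stand in for texture contents; in real code the marked spots would bind the widget's FBO (e.g. with `glBindFramebuffer`) and draw textured quads.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Widget:
    name: str
    draw_contents: Callable[[], str]  # stand-in for the widget's GL draw calls
    texture: str = ""                 # stand-in for the FBO's color texture
    dirty: bool = True                # True while the cached texture is stale
    redraw_count: int = 0

    def invalidate(self) -> None:
        """Mark the cached texture as stale after the contents changed."""
        self.dirty = True

    def update_cache(self) -> None:
        """Re-render into the widget's texture, but only if it is stale."""
        if not self.dirty:
            return
        # Real code: bind this widget's FBO, set the viewport, issue draw calls.
        self.texture = self.draw_contents()
        self.redraw_count += 1
        self.dirty = False


def compose_window(widgets: List[Widget]) -> List[str]:
    """Redraw the full window from the cached textures, back to front."""
    frame = []
    for widget in widgets:
        widget.update_cache()
        # Real code: bind the main framebuffer and draw one textured quad.
        frame.append(widget.texture)
    return frame


# Usage: only the label changes between the two frames.
label_text = {"value": "hello"}
button = Widget("button", lambda: "button-pixels")
label = Widget("label", lambda: f"label:{label_text['value']}")
widgets = [button, label]

compose_window(widgets)        # first frame renders both widgets
label_text["value"] = "world"
label.invalidate()             # contents changed, so the cache is stale
frame = compose_window(widgets)
```

The whole window is still composed every frame; only the per-widget rendering is skipped while a widget's texture is up to date. Because the list is drawn back to front, overlapping widgets work naturally.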