The process is actually the opposite. You don't convert texture coordinates to clip space; you convert coordinates from the light's clip space to texture coordinates. For this to work you need to pass the light's camera (view) and projection matrices to the fragment shader, and also pass the position from the vertex shader to the fragment shader (at least in OpenGL ES 2.0; I don't know about OpenGL 3.3).
- Multiply the position by the light's camera and projection matrices and you'll get the position in the light's clip space.
- Divide xyz by w and you'll get normalized device coordinates, each in the range -1 to 1.
- Multiply x and y by 0.5 and add 0.5 and you'll have the uv coordinates in the shadow map.
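Now you can read the depth from the shadow map at uv and compare it to your pixel's depth as seen from the light (the z you got after the divide, remapped with the same 0.5 scale and offset if the shadow map stores depth in the 0 to 1 range).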
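
The steps above can be sketched on the CPU to check the math before porting it to a shader. This is only an illustration in plain Python, not shader code: the function names `mat_vec`, `shadow_lookup_coords` and `in_shadow` are made up for this example, matrices are assumed to be 4x4 row-major lists applied to column vectors, the depth remap assumes the default OpenGL depth range, and the small `bias` value is an assumption added to avoid self-shadowing artifacts.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def shadow_lookup_coords(position, light_view, light_proj):
    """Return (u, v, depth) for a position, as seen from the light.

    position   : (x, y, z) in the same space the light view matrix expects
    light_view : the light's camera (view) matrix
    light_proj : the light's projection matrix
    """
    # Step 1: camera then projection -> light clip space.
    p = [position[0], position[1], position[2], 1.0]
    clip = mat_vec(light_proj, mat_vec(light_view, p))

    # Step 2: divide xyz by w -> normalized device coordinates in [-1, 1].
    w = clip[3]
    ndc = [clip[0] / w, clip[1] / w, clip[2] / w]

    # Step 3: scale by 0.5 and offset by 0.5 -> [0, 1].
    u = ndc[0] * 0.5 + 0.5
    v = ndc[1] * 0.5 + 0.5
    # Assumption: the shadow map stores depth in [0, 1], so z gets the same remap.
    depth = ndc[2] * 0.5 + 0.5
    return u, v, depth


def in_shadow(shadow_map_depth, fragment_depth, bias=0.005):
    """True if something closer to the light was recorded in the shadow map.

    bias is a hypothetical tolerance; tune it for your scene.
    """
    return fragment_depth - bias > shadow_map_depth
```

In a fragment shader the same three steps apply to the interpolated position, with the texture read at uv supplying `shadow_map_depth`.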