All texture transactions also flow through the L2 cache, so generally speaking it's rare for texture to ever be slower than L2. You can think of the texture cache on Kepler like an alternate L1 cache for read-only data.
The specifics of the texture cache are complex and aren't really well documented, but it is a very high performance cache, especially for streaming access patterns with some amount of locality and reuse between different threads in the same warp or thread block.
One important point about the texture cache is that, unlike a CPU L1 cache, it's not really designed to decrease latency. Rather, it is designed to be a "bandwidth aggregator" which aggregates simultaneous loads from many threads and tries to stream the results back to the processing units as efficiently as possible. This means that the memory system can fetch the same amount of data in fewer total transactions.
Without more info, it's difficult to say whether use of the texture cache (for instance via the __ldg() intrinsic on Kepler) will improve performance for any particular access pattern, but if your kernel is bandwidth bound it is usually worth a try.
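To make "worth a try" concrete, here is a minimal sketch of what the change looks like in a kernel. The 3-point stencil, the kernel names and the count_mismatches helper are all made up for illustration and are not from your code. The only difference between the two kernels is that the second one marks its pointers const __restrict__ and reads the input through __ldg(), which needs compute capability 3.5 or higher (so build with something like nvcc -arch=sm_35 or newer). The host code only checks that both kernels produce the same output. Whether the __ldg() version is actually faster is something you would have to measure on your own kernel and hardware.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t e = (call);                                           \
        if (e != cudaSuccess) {                                           \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));   \
            exit(1);                                                      \
        }                                                                 \
    } while (0)

// Hypothetical example kernel: 3-point sum using ordinary global loads.
// Neighbouring threads read overlapping elements of `in`.
__global__ void stencil_plain(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int l = max(i - 1, 0);        // clamp at the left edge
    int r = min(i + 1, n - 1);    // clamp at the right edge
    out[i] = in[l] + in[i] + in[r];
}

// Same computation, but the input is read through the read-only
// (texture) data path via __ldg(). The data must not be written
// by any thread while the kernel is running.
__global__ void stencil_ldg(const float* __restrict__ in,
                            float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int l = max(i - 1, 0);
    int r = min(i + 1, n - 1);
    out[i] = __ldg(&in[l]) + __ldg(&in[i]) + __ldg(&in[r]);
}

// Runs both kernels on the same input and returns how many output
// elements differ between them.
int count_mismatches(int n)
{
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 97);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    stencil_plain<<<blocks, threads>>>(d_in, d_a, n);
    stencil_ldg<<<blocks, threads>>>(d_in, d_b, n);
    CUDA_CHECK(cudaGetLastError());

    // The blocking copies below also wait for both kernels to finish.
    CUDA_CHECK(cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost));

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != h_b[i]) ++bad;

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    return bad;
}

int main()
{
    int bad = count_mismatches(1 << 20);
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

To compare the two versions, time them with cudaEvent timers or run them under the profiler. Given const __restrict__ pointers the compiler may sometimes choose the read-only path on its own, but calling __ldg() makes the request explicit.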
With regard to your specific point, yes, data which hits in the texture cache will not have to go out to the L2. However, again due to specifics of the texture cache, this is usually a smaller effect than the impact of bandwidth aggregation, which results in fewer total memory transactions sent to the L2.