These are commonly known as soft 404s. The only way to detect them is by examining the content, as the response headers do not indicate any error.
If you want to build something generic, you could fetch a page that you know for sure does not exist and use it as your reference, then compare every other page you crawl against it to decide whether it is an error page. You will probably need a fuzzy, similarity-based comparison rather than an exact match, since the content of the error page may change slightly from one nonexistent URL to another. Even so, this will be error-prone if you are crawling arbitrary websites.
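As a rough illustration of the reference-page idea, here is a minimal sketch in Python using only the standard library. The function names (`normalize`, `similarity`, `looks_like_soft_404`, `fetch_reference`), the random-path trick for producing a URL that should not exist, and the 0.8 threshold are all my own assumptions for the example, not an established recipe; you would need to tune the threshold per site.

```python
import difflib
import re
import uuid
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen

TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def normalize(html: str) -> str:
    """Crudely strip tags, collapse whitespace, and lowercase the text."""
    return WS_RE.sub(" ", TAG_RE.sub(" ", html)).strip().lower()


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def looks_like_soft_404(page_html: str, reference_html: str,
                        threshold: float = 0.8) -> bool:
    """Guess whether a page is an error page by comparing it to the reference.

    The threshold is an arbitrary starting point (assumption), not a tuned value.
    """
    return similarity(page_html, reference_html) >= threshold


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body as text, even for HTTP error statuses."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # A real 404 still carries a body that can serve as the reference.
        return err.read().decode("utf-8", errors="replace")


def fetch_reference(base_url: str) -> str:
    """Request a random path that almost certainly does not exist on the site."""
    bogus_url = urljoin(base_url, "/" + uuid.uuid4().hex)
    return fetch(bogus_url)
```

You would call `fetch_reference` once per site, then pass each crawled page to `looks_like_soft_404` together with that reference. The tag stripping here is deliberately naive; sites that redirect unknown URLs to their home page, or whose error pages include a lot of varying content, will still produce false positives and false negatives.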