I would start by doing some ad-hoc queries. Assuming that you have all the documents in a directory and that you have an XSLT or XQuery processor such as Saxon that can read all the documents in a directory using the collection() function, you could start with:
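<xsl:for-each-group select="collection('dir?select=*.xml')" group-by="node-name(*)">
  <!-- Curly braces make these attribute value templates, so the expressions are evaluated rather than copied as literal text -->
  <e name="{name(*)}" count="{count(current-group())}"/>
</xsl:for-each-group>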
to see whether it's useful to group them by top-level element name.
You could then perhaps select one representative document for each top-level element name and use a tool to generate a schema for that document, then run a similar query to validate all the documents in that group against that schema (for this you will need a schema-aware XSLT or XQuery processor).
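As a rough sketch of what that validation query might look like in XSLT 3.0: the stylesheet below assumes a schema-aware processor (for example Saxon-EE), a generated schema saved as invoice.xsd, and a group whose top-level element is named invoice. Both of those names are hypothetical placeholders, and the `dir?select=*.xml` collection URI is the Saxon-specific form used above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    exclude-result-prefixes="err">

  <!-- Hypothetical schema generated from one representative document -->
  <xsl:import-schema schema-location="invoice.xsd"/>

  <xsl:template name="xsl:initial-template">
    <results>
      <!-- 'invoice' is a placeholder for one of the top-level names found earlier -->
      <xsl:for-each select="collection('dir?select=*.xml')[name(*) = 'invoice']">
        <xsl:try>
          <!-- Copying with validation="strict" raises a dynamic error if the document is invalid -->
          <xsl:variable name="validated" as="document-node()">
            <xsl:copy-of select="." validation="strict"/>
          </xsl:variable>
          <!-- The variable is referenced here so that a lazily evaluating processor performs the validation inside xsl:try -->
          <doc uri="{base-uri(.)}" valid="true" root="{name($validated/*)}"/>
          <xsl:catch errors="*">
            <doc uri="{base-uri(.)}" valid="false" reason="{$err:description}"/>
          </xsl:catch>
        </xsl:try>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

The output is one doc element per file, so the documents that fail against the representative schema are easy to pick out and examine separately.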
(Most of the IDEs, such as oXygen, include a tool to generate a schema from an instance. But I'm not aware of a tool that can be invoked programmatically.)
After this it depends a little on what you discover...