IOException using Spring Data Hadoop Classpath Resources
06-12-2019
Question
Whenever I specify a resource using the Spring Data Hadoop namespace, my application throws an IOException
when loading the specified file. The file definitely exists and is in a valid format.
Spring Data Hadoop XML config:
Stack trace on startup:
Caused by: java.lang.RuntimeException: java.io.IOException: Stream closed
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1231)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1103)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1037)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:415)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:860)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1380)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at com.mendeley.swets.config.HdfsConfig.fileSystem(HdfsConfig.java:28)
at com.mendeley.swets.config.HdfsConfig$$EnhancerByCGLIB$$38b1feb7.CGLIB$fileSystem$0(<generated>)
at com.mendeley.swets.config.HdfsConfig$$EnhancerByCGLIB$$38b1feb7$$FastClassByCGLIB$$3c3c119d.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:280)
at com.mendeley.swets.config.HdfsConfig$$EnhancerByCGLIB$$38b1feb7.fileSystem(<generated>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:149)
... 41 more
Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2932)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:704)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:186)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:772)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
... 61 more
Solution
This has been fixed in trunk and will be available in the next milestone. See the Spring forum post [1] for more information.
OTHER TIPS
Chris is actually right. I ran into a similar problem (IOException: Stream closed), and it is caused by reading from a stale stream. I am guessing, Deejay, that you are using something along these lines to read a custom resource from your classpath:
<hdp:configuration resources="classpath:/custom-site.xml"/>
and then obtaining a FileSystem with FileSystem.get(conf).
After spending some time with a debugger, it looks like the problem is caused by the interaction between Spring's ConfigurationFactoryBean and Apache Hadoop's Configuration objects. If you look at the source code for Spring Hadoop on GitHub (yes, it is available there), Spring Hadoop is a combination of Spring settings with the Apache Hadoop API underneath.
An input stream is opened in Spring to parse the custom resource and is closed after it has been read. FileSystem.get then causes Hadoop to reload its resources from the same stream, which is already closed, so the second read throws the IOException: Stream closed error.
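The failure mode can be reproduced without Spring or Hadoop. The following is only a minimal sketch using the JDK's XML parser: the class name StaleStreamDemo and the inline XML are made up for illustration, and the code does not reproduce what ConfigurationFactoryBean or Configuration do internally. It parses a stream once, closes it, and then tries to parse the same stream object again.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class StaleStreamDemo {

    // Hypothetical stand-in for the contents of custom-site.xml.
    static final String XML =
        "<configuration><property><name>k</name><value>v</value></property></configuration>";

    /** Returns true if parsing the same stream a second time fails with an IOException. */
    static boolean secondParseFails() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream in = new BufferedInputStream(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        // First load: the stream is read to the end and then closed.
        builder.parse(in);
        in.close();

        // Second load: the same (now closed) stream object is handed to the parser again.
        try {
            builder.parse(in);
            return false;
        } catch (IOException e) {
            // BufferedInputStream refuses to read once it has been closed.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second parse failed: " + secondParseFails());
    }
}
```

The point of the sketch is that a one-shot InputStream cannot serve as a resource that has to be read more than once; whatever holds on to it needs a way to reopen the underlying data.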
A workaround, similar to the examples on GitHub, is to use Spring properties and SpEL (Spring Expression Language) to substitute the configuration values you need into the necessary fields. The other option is probably to write your own ConfigurationFactoryBean that creates a new Configuration instance using the existing one as its parent and adds resources as URLs.
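Why URLs help can be sketched with the JDK alone: a URL can be opened afresh on every load, so there is no stale stream to trip over. The class UrlResourceDemo and the temporary file below are hypothetical and only illustrate the principle. In a real custom factory bean the corresponding Hadoop calls would presumably be the Configuration(Configuration other) copy constructor and Configuration.addResource(URL), but that wiring is an assumption and is not shown here.

```java
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class UrlResourceDemo {

    /** Parses the resource behind the URL, opening a fresh stream on every call. */
    static String rootElement(URL resource) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = resource.openStream()) {
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getNodeName();
        }
    }

    /** Loads the same URL twice and returns the number of successful loads. */
    static int loadTwice() throws Exception {
        // Hypothetical stand-in for custom-site.xml, written to a temporary file.
        Path file = Files.createTempFile("custom-site", ".xml");
        try {
            Files.write(file, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            URL url = file.toUri().toURL();
            int loads = 0;
            for (int i = 0; i < 2; i++) {
                // Each iteration reopens the URL, so a previously closed stream does not matter.
                if ("configuration".equals(rootElement(url))) {
                    loads++;
                }
            }
            return loads;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("successful loads: " + loadTwice());
    }
}
```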
Hope this somewhat helps.