I’m passing a comma-separated list of file names to FileInputFormat in a MapReduce job. The total input is about 30 GB of Snappy-compressed ORC files.
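For reference, here is a minimal sketch of how the input paths are wired up (the driver class name and argument handling are simplified placeholders, not my exact job code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class OrcJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "orc-job");
        job.setJarByClass(OrcJobDriver.class);

        // args[0] is the comma-separated file list, e.g.
        // "/data/f1.orc,/data/f2.orc,..." -- the whole string gets
        // stored in the job configuration (job.xml).
        FileInputFormat.setInputPaths(job, args[0]);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```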
The job fails about 30 seconds after startup with an OutOfMemoryError:
2024-07-31 00:59:02,572 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuffer.append(StringBuffer.java:270)
at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
at org.apache.xerces.dom.DeferredDocumentImpl.getNodeValueString(Unknown Source)
at org.apache.xerces.dom.DeferredTextImpl.synchronizeData(Unknown Source)
at org.apache.xerces.dom.CharacterDataImpl.getData(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2775)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2663)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2559)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1340)
at org.apache.hadoop.mapreduce.v2.util.MRWebAppUtil.initialize(MRWebAppUtil.java:51)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1498)
From the stack trace, the OOM occurs while the MRAppMaster is parsing the job configuration XML (Configuration.loadResource), before any input data is actually read. Does a MapReduce job try to load the complete input data into memory up front, or does it process the input file by file?