I saw that new versions of the Hadoop and orc-tools libraries were available, so I decided to update my project.
These are my new dependencies:
implementation(group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '3.4.0')
implementation(group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.4.0')
implementation(group: 'org.apache.hive', name: 'hive-streaming', version: '4.0.0')
implementation(group: 'org.apache.hadoop', name: 'hadoop-mapreduce-client-core', version: '3.4.0')
and orc-tools:
implementation 'org.apache.orc:orc-tools:2.0.1'
I also upgraded from Java 11 to Java 17.
I want to use the method OrcFile.mergeFiles(…), but each time I get this exception:
Caused by: java.lang.IllegalArgumentException: Incompatible merging of collection column statistics
at org.apache.orc.impl.ColumnStatisticsImpl$CollectionColumnStatisticsImpl.merge(ColumnStatisticsImpl.java:233)
I also tried to merge three identical files (I copied one file three times), but I got the same exception.
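For reference, a call to OrcFile.mergeFiles generally looks like the minimal sketch below. This is an illustration only, not the code from the project described here: the class name OrcMergeSketch, the helper method merge, and the file paths are placeholders, and the default Hadoop Configuration and default writer options are assumptions. It needs orc-core and hadoop-common on the classpath.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;

final class OrcMergeSketch {

    /**
     * Merges the given ORC files into outputPath and returns the inputs
     * that OrcFile.mergeFiles reports as successfully merged.
     */
    static List<Path> merge(Path outputPath, List<Path> inputs, Configuration conf)
            throws IOException {
        if (inputs.isEmpty()) {
            return List.of(); // nothing to merge, so skip the ORC call entirely
        }
        // Assumption: default writer options are good enough for the merged file.
        OrcFile.WriterOptions options = OrcFile.writerOptions(conf);
        return OrcFile.mergeFiles(outputPath, options, inputs);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input and output paths, used only for illustration.
        List<Path> inputs = List.of(
                new Path("q/1.orc"), new Path("q/2.orc"), new Path("q/3.orc"));
        Path output = new Path("test.orc");

        List<Path> merged = merge(output, inputs, new Configuration());

        // Inputs rejected as incompatible are missing from the returned list.
        System.out.println("Merged " + merged.size() + " of " + inputs.size() + " files");
    }
}
```

Comparing the size of the returned list with the number of inputs shows whether any file was rejected by the merge.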
My next step was to merge these files using the example from https://orc.apache.org/docs/java-tools.html:
java -jar orc-tools-2.0.1-uber.jar merge --output test.orc q/
And I got the following exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: 'org.apache.hadoop.io.nativeio.NativeIO$POSIX$Stat org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(java.lang.String)'
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:619)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:1070)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:984)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:952)
at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2300)
at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2280)
at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2396)
at org.apache.orc.tools.MergeFiles.main(MergeFiles.java:66)
at org.apache.orc.tools.Driver.main(Driver.java:129)
I also tried to run:
java -jar orc-tools-2.0.1-uber.jar count q/
And I got the same exception.
Then I downloaded orc-tools 1.9.4 and ran:
java -jar orc-tools-1.9.4-uber.jar count q/
And I got a result like this:
file:/C:/work/1.orc 73338
file:/C:/work/2.orc 73338
file:/C:/work/3.orc 73338
file:/C:/work/4 73338
There were 4 identical files, so the row counts are correct.
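The count output above can also serve as a sanity check for a later merge: if every input is accepted, the merged file should contain the sum of the individual row counts. Below is a small sketch of that check, assuming each output line has the form "path rowCount" as shown above. The class name CountOutputCheck and its methods are hypothetical helpers, not part of orc-tools.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CountOutputCheck {

    /** Parses lines of the form "<path> <rowCount>" into a path-to-count map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // The row count is the last space-separated token on the line.
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            counts.put(trimmed.substring(0, split),
                    Long.parseLong(trimmed.substring(split + 1)));
        }
        return counts;
    }

    /** Row count a merged file should have if no input is rejected. */
    static long expectedMergedRows(Map<String, Long> counts) {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> output = List.of(
                "file:/C:/work/1.orc 73338",
                "file:/C:/work/2.orc 73338",
                "file:/C:/work/3.orc 73338",
                "file:/C:/work/4 73338");
        Map<String, Long> counts = parse(output);

        boolean identical = counts.values().stream().distinct().count() == 1;
        System.out.println("all counts identical: " + identical);
        System.out.println("expected rows after merge: " + expectedMergedRows(counts));
    }
}
```

Running count on the merged file and comparing it with this expected total would show whether any rows were lost.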
Can somebody help me? What am I doing wrong?
I want to merge several ORC files into one in Java. And why doesn’t the orc-tools 2.0.1 command-line tool work either?