We distribute the ORC tools in our native-toolchain. They are useful for debugging ORC issues, e.g. dumping the metadata, contents, or statistics of an ORC file. Note that they are the same version as the liborc.a that Impala uses, so we can use them to simplify our debugging.

Step-by-step guide

  1. Check for missing dynamic libraries and add their folders to the LD_LIBRARY_PATH environment variable
  2. Run the tools with LD_LIBRARY_PATH set

First, check whether the required dynamic libraries can be found. If they cannot, running a tool fails with an error like this:

$ /home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/orc-1.7.0-p4/bin/orc-contents
/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/orc-1.7.0-p4/bin/orc-contents: error while loading shared libraries: libprotobuf.so.15: cannot open shared object file: No such file or directory

Use ldd to check which libraries are missing:

$ ldd $IMPALA_TOOLCHAIN_PACKAGES_HOME/orc-1.7.0-p4/bin/orc-contents 
	linux-vdso.so.1 =>  (0x00007fff7aa7f000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd249f20000)
	libprotobuf.so.15 => not found
	libsnappy.so.1 => not found
	liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007fd249d08000)
	libstdc++.so.6 => /home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/orc-1.7.0-p4/bin/../lib/libstdc++.so.6 (0x00007fd249985000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd24967c000)
	libgcc_s.so.1 => /home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/orc-1.7.0-p4/bin/../lib/libgcc_s.so.1 (0x00007fd249466000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd24909c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd24a13d000)
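
On a long ldd listing it can help to print only the unresolved entries. The following is a minimal sketch, assuming bash and awk are available; filter_missing and missing_libs are hypothetical helper names, not part of the toolchain.

```shell
# Hypothetical helpers (not part of the toolchain).

# filter_missing: read ldd output on stdin and print only the names of
# libraries that ldd reports as "not found".
filter_missing() {
  awk '/=> not found/ { print $1 }'
}

# missing_libs BINARY: run ldd on BINARY and list its unresolved libraries.
missing_libs() {
  ldd "$1" | filter_missing
}
```

For example, `missing_libs $IMPALA_TOOLCHAIN_PACKAGES_HOME/orc-1.7.0-p4/bin/orc-contents` would list only the libraries reported as missing in the output above.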

The orc-contents tool is missing protobuf and snappy. Both exist in our native-toolchain. Add their lib folders to LD_LIBRARY_PATH so that orc-contents can find them.
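
If it is not obvious which toolchain folders hold the missing libraries, they can be searched for by file name. This is only a sketch, assuming bash and a POSIX find; find_lib_dirs is a hypothetical helper name. If several versions of a package are unpacked, it prints every matching folder and you still need to pick the right one.

```shell
# Hypothetical helper (not part of the toolchain).
# find_lib_dirs ROOT LIB...: print each directory under ROOT that contains
# a file or symlink named exactly like one of the given libraries.
find_lib_dirs() {
  local root=$1
  shift
  local lib
  for lib in "$@"; do
    find "$root" -name "$lib" -exec dirname {} \;
  done | sort -u
}
```

For example: `find_lib_dirs $IMPALA_TOOLCHAIN_PACKAGES_HOME libprotobuf.so.15 libsnappy.so.1`.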

$ LD_LIBRARY_PATH=$IMPALA_TOOLCHAIN_PACKAGES_HOME/snappy-1.1.8/lib:$IMPALA_TOOLCHAIN_PACKAGES_HOME/protobuf-3.5.1/lib $IMPALA_TOOLCHAIN_PACKAGES_HOME/orc-1.7.0-p4/bin/orc-contents 
Usage: orc-contents <filename> [--columns=1,2,...]
Print contents of <filename>.
If columns are specified, only these top-level (logical) columns are printed.

I prefer setting LD_LIBRARY_PATH for just this one command rather than exporting it, to avoid polluting my dev environment.
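
To avoid retyping the long prefix while still keeping LD_LIBRARY_PATH out of the shell environment, the command can be wrapped in a small function. This is a sketch, assuming bash; orc_tool is a hypothetical name, and the package versions are the ones used on this page, so adjust them to match your toolchain.

```shell
# Hypothetical wrapper (not part of the toolchain).
# orc_tool NAME [ARGS...]: run one ORC tool with LD_LIBRARY_PATH set for
# that single process only. The calling shell's environment is not changed.
orc_tool() {
  local tool=$1
  shift
  local libs="$IMPALA_TOOLCHAIN_PACKAGES_HOME/snappy-1.1.8/lib"
  libs="$libs:$IMPALA_TOOLCHAIN_PACKAGES_HOME/protobuf-3.5.1/lib"
  # Keep any LD_LIBRARY_PATH that is already set by appending it.
  LD_LIBRARY_PATH="$libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    "$IMPALA_TOOLCHAIN_PACKAGES_HOME/orc-1.7.0-p4/bin/$tool" "$@"
}
```

For example, `orc_tool orc-contents /path/to/file.orc --columns=1,2`, where the file path is a placeholder.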