Comment: Update Tika to 2.6.0 and current state of logging in it

...

Apache Tika includes many Apache and third-party libraries that take different approaches to logging.

tika-core

tika-core 1.x has no external dependencies so that it stays as lightweight as possible, and therefore uses
java.util.logging. tika-core 2.x uses slf4j-api.

tika-parsers

Tika uses slf4j-api as its logging API and Apache Log4j 2.x as the implementation for modules that require one.

Important note

Since 2.5.0 (released 2022-10-03), Tika depends on slf4j-api 2.0.x, which requires downstream library users to update their logging backend to a compatible version. Tika 2.0.0 through 2.4.1 depend on slf4j-api 1.7.x.

Otherwise, you will see messages like the following:

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at file:/home/gross/.gradle/caches/modules-2/files-2.1/org.jboss.slf4j/slf4j-jboss-logmanager/1.2.0.Final/baff8ae78011e6859e127a5cb6f16332a056fd93/slf4j-jboss-logmanager-1.2.0.Final.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console..

Updates for popular logging backends:

  • Apache Log4j 2.x: org.apache.logging.log4j:log4j-slf4j-impl → org.apache.logging.log4j:log4j-slf4j2-impl
  • Logback: ch.qos.logback:logback-classic 1.2.x → 1.3.x (uses older javax.* APIs) or 1.4.x (uses jakarta.* APIs)
  • Apache Log4j 1.2.x: org.slf4j:slf4j-log4j12 → org.slf4j:slf4j-reload4j (slf4j-log4j12 has carried a relocation directive to slf4j-reload4j since 1.7.34), or migrate to Log4j 2.x, since Log4j 1.2.x has been End of Life since 2015 and has known vulnerabilities
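As a rough sketch of the Log4j 2.x update from the list above, assuming the backend is declared directly in your pom.xml and its version comes from the log4j2.version property used in the example configuration on this page, the binding swap looks roughly like this:

```xml
<dependencies>
  <!-- Before (slf4j-api 1.7.x, Tika 2.0.0 through 2.4.1):
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
  -->
  <!-- After (slf4j-api 2.0.x, Tika 2.5.0+): slf4j 2.x provider for Log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j2-impl</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
  <!-- the actual Log4j 2.x backend -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
</dependencies>
```

Only the provider artifact changes; the log4j-core backend and its configuration stay as they were.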

As of 2022-11-09, the JBoss Logging bindings (slf4j-jboss-logging/slf4j-jboss-logmanager) still target slf4j-api 1.7.x; see https://issues.redhat.com/browse/JBLOGGING-165. If you have to use Tika with JBoss Logging (e.g. Quarkus or WildFly native logging), you can currently try downgrading org.slf4j:slf4j-api to 1.7.36.
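A minimal sketch of that downgrade, assuming Maven and that nothing else in your build needs slf4j-api 2.0.x: a version declared directly in dependencyManagement takes precedence over the 2.0.x version Tika pulls in transitively. Whether Tika 2.5.0+ runs correctly on slf4j-api 1.7.36 is untested here; if you keep Tika's jul-to-slf4j and jcl-over-slf4j bridges, you will likely want to align them to the same version.

```xml
<dependencyManagement>
  <dependencies>
    <!-- force slf4j-api 1.7.x so that the JBoss Logging binding is picked up -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.36</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```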

Tika parser modules

tika-parser-*-module artifacts depend on many Apache and third-party libraries. Tika itself uses slf4j-api, but the underlying libraries use different logging APIs (commons-logging, java.util.logging, log4j 1.2.x, log4j 2.x, slf4j).

By default, Tika brings in slf4j-api via tika-core, and the parser modules add the bridges org.slf4j:jcl-over-slf4j and org.slf4j:jul-to-slf4j as an opinionated default. Depending on your logging backend and preferred configuration, you'll need different dependency exclusions and bridges/implementations.

If you have no preference about the logging backend, it's enough to add org.apache.logging.log4j:log4j-core, org.apache.logging.log4j:log4j-slf4j2-impl and org.apache.logging.log4j:log4j-1.2-api (or org.slf4j:log4j-over-slf4j), and to exclude log4j:log4j, commons-logging:commons-logging, ch.qos.logback:logback-classic, ch.qos.logback:logback-core, ch.qos.reload4j:reload4j and org.slf4j:slf4j-reload4j.
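A sketch of the dependencies section for that setup, assuming tika-parsers-standard-package is the Tika artifact you use and that the tika.version and log4j2.version properties from the example configuration on this page are defined. It picks log4j-1.2-api rather than log4j-over-slf4j; use one of them, not both, because both provide the org.apache.log4j classes.

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>${tika.version}</version>
    <exclusions>
      <exclusion><groupId>log4j</groupId><artifactId>log4j</artifactId></exclusion>
      <exclusion><groupId>commons-logging</groupId><artifactId>commons-logging</artifactId></exclusion>
      <exclusion><groupId>ch.qos.logback</groupId><artifactId>logback-classic</artifactId></exclusion>
      <exclusion><groupId>ch.qos.logback</groupId><artifactId>logback-core</artifactId></exclusion>
      <exclusion><groupId>ch.qos.reload4j</groupId><artifactId>reload4j</artifactId></exclusion>
      <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-reload4j</artifactId></exclusion>
    </exclusions>
  </dependency>
  <!-- logging backend: Log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
  <!-- slf4j 2.0.x provider that forwards to Log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j2-impl</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
  <!-- routes calls made through the Log4j 1.2 API to Log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-1.2-api</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
</dependencies>
```

Commons Logging calls are still covered by the jcl-over-slf4j bridge that Tika brings in, which is why commons-logging itself can be excluded.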

As of the main branch (and Tika 2.6.0), all Tika source code uses slf4j-api as its logging API, with org.apache.logging.log4j:log4j-core 2.x as the backend for applications such as tika-app, tika-eval-app and tika-server.

To use a different backend for slf4j, include the backend implementation and exclude conflicting bridges (for example, when jcl/commons-logging or java.util.logging is the backend). The following sections show how to set up dependencies for different logging backends to avoid conflicts. Logger configuration itself is out of scope for this document; see the documentation of the relevant library.
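For example, with java.util.logging as the backend, a plausible sketch (assuming Maven, the tika.version and slf4j.version properties from the example configuration on this page, and that jul-to-slf4j reaches your build transitively through tika-parsers-standard-package) swaps in the org.slf4j:slf4j-jdk14 provider and excludes the jul-to-slf4j bridge. Keeping both would send JUL records into slf4j and straight back into JUL. The other backend exclusions from the common section still apply.

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>${tika.version}</version>
    <exclusions>
      <!-- jul-to-slf4j together with slf4j-jdk14 would form a JUL -> slf4j -> JUL loop -->
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>jul-to-slf4j</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- slf4j provider that forwards to java.util.logging -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
</dependencies>
```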

Currently tika-parsers depends on these logging solutions:

...


...

...

Example configuration for Apache Tika

...

2.5.0+

If you use Apache Maven, the dependency section of your pom.xml will contain something like this:

Common sections

<!-- Merge with your properties section -->
<properties>
  <!-- component versions; feel free to keep only the ones required for your case -->
  <tika.version>2.6.0</tika.version>
  <slf4j.version>2.0.3</slf4j.version>
  <log4j.version>1.2.17</log4j.version>
  <log4j2.version>2.19.0</log4j2.version>
  <!-- 1.4.4 for Jakarta EE 9+ or 1.3.4 if you use Java EE or Jakarta EE 8 -->
  <logback.version>1.4.4</logback.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-bom</artifactId>
      <version>${tika.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-bom</artifactId>
      <version>${log4j2.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<!-- Merge with your dependencies section -->
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <exclusions>
      <!--
      These exclusions will become obsolete at some point, but it's better to keep them for now.
      tika-parser-*-module artifacts should already exclude commons-logging explicitly, but upstream
      libraries may add it to their direct or transitive dependencies.
      -->
      <exclusion><groupId>commons-logging</groupId><artifactId>commons-logging</artifactId></exclusion>
      <exclusion><groupId>log4j</groupId><artifactId>log4j</artifactId></exclusion>
      <exclusion><groupId>ch.qos.logback</groupId><artifactId>logback-core</artifactId></exclusion>
      <exclusion><groupId>ch.qos.logback</groupId><artifactId>logback-classic</artifactId></exclusion>
      <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-reload4j</artifactId></exclusion>
      <exclusion><groupId>ch.qos.reload4j</groupId><artifactId>reload4j</artifactId></exclusion>
    </exclusions>
  </dependency>
  <!--
  You may want to add these dependencies to dependencyManagement to force consistent versions.
  tika-parsers already depends on slf4j-api, jul-to-slf4j and jcl-over-slf4j explicitly,
  so they are here primarily as an example.
  -->
  <dependency><groupId>org.slf4j</groupId><artifactId>slf4j-api</artifactId><version>${slf4j.version}</version></dependency>
  <dependency><groupId>org.slf4j</groupId><artifactId>jul-to-slf4j</artifactId><version>${slf4j.version}</version></dependency>
  <dependency><groupId>org.slf4j</groupId><artifactId>jcl-over-slf4j</artifactId><version>${slf4j.version}</version></dependency>
  <dependency><groupId>org.slf4j</groupId><artifactId>log4j-over-slf4j</artifactId><version>${slf4j.version}</version></dependency>
</dependencies>

Apache Log4j

...

2.x with slf4j bridges

<!-- Merge with your dependencies section -->
<dependencies>
  <!-- logging backend: log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <!-- version is omitted because org.apache.logging.log4j:log4j-bom is imported in the dependencyManagement section -->
  </dependency>
  <!-- slf4j implementation that forwards to log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j2-impl</artifactId>
    <!-- for slf4j 1.7.x use log4j-slf4j-impl instead -->
    <!-- version is omitted because org.apache.logging.log4j:log4j-bom is imported in the dependencyManagement section -->
  </dependency>
</dependencies>

Logback

...

<!-- Merge with your dependencies section -->
<dependencies>
<!-- bridges for jul and jcl (commons-logging) are already present, so just add the log4j 1.2.x one -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>${slf4j.version}</version>
</dependency>
<!-- TODO: add log4j2 -> slf4j bridge -->

<!-- slf4j implementation -->
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>${logback.version}</version>
</dependency>
</dependencies>
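For the log4j2 -> slf4j bridge mentioned in the TODO above, one option (a sketch, not a configuration tested against Tika) is the Log4j project's org.apache.logging.log4j:log4j-to-slf4j adapter, which routes calls made through the Log4j 2.x API to slf4j and therefore to Logback. It assumes the log4j2.version property from the example configuration on this page; check its compatibility with your slf4j-api version. It must not be on the classpath together with log4j-slf4j2-impl, since events would then loop between the two APIs.

```xml
<!-- Merge with your dependencies section -->
<dependencies>
  <!-- routes the Log4j 2.x API to slf4j (and thus Logback); do not combine with log4j-slf4j2-impl -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-to-slf4j</artifactId>
    <version>${log4j2.version}</version>
  </dependency>
</dependencies>
```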

TO BE REWRITTEN:

Apache Log4j 2.x with slf4j bridges

<dependencies>
<!-- bridges for jul and jcl (commons-logging) are already present, so just add the log4j 1.2.x one -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>${slf4j.version}</version>
</dependency>

<!-- slf4j implementation to forward logs to log4j 2.x (log4j-slf4j-impl targets slf4j-api 1.7.x) -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>${log4j2.version}</version>
</dependency>

<!-- logging backend: log4j 2.x -->
<!-- these dependency declarations are optional since org.apache.logging.log4j:log4j-slf4j-impl depends on them transitively -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j2.version}</version>
</dependency>
</dependencies>

...