...
Apache Tika includes many Apache and third-party libraries that take different approaches to logging.
tika-core
tika-core 1.x should have no external dependencies, to stay as lightweight as possible, so it uses java.util.logging, whereas tika-core 2.x uses slf4j-api.
tika-parsers
Tika uses slf4j-api as its logging API and Apache Log4j 2.x as the implementation for modules that require one.
Important note
Tika 2.5.0 (released 2022-10-03) and later depend on slf4j-api 2.0.x, which requires downstream library users to update their logging backend to a compatible version. Tika 2.0.0 – 2.4.1 depend on slf4j-api 1.7.x.
Otherwise you will see messages like the following:
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at file:/home/gross/.gradle/caches/modules-2/files-2.1/org.jboss.slf4j/slf4j-jboss-logmanager/1.2.0.Final/baff8ae78011e6859e127a5cb6f16332a056fd93/slf4j-jboss-logmanager-1.2.0.Final.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console..
Updates for popular logging backends:
- Apache Log4j 2.x: org.apache.logging.log4j:log4j-slf4j-impl → org.apache.logging.log4j:log4j-slf4j2-impl
- Logback: ch.qos.logback:logback-classic 1.2.x → 1.3.x (uses older javax.* APIs) or 1.4.x (uses jakarta.* APIs)
- Apache Log4j 1.2.x: org.slf4j:slf4j-log4j12 → org.slf4j:slf4j-reload4j (though slf4j-log4j12 has had a relocation directive to slf4j-reload4j since 1.7.34), or migrate to Log4j 2.x, since Log4j 1.2.x has been End of Life since 2015 and has known vulnerabilities
JBoss Logging (slf4j-jboss-logging / slf4j-jboss-logmanager): as of 2022-11-09 these are still on slf4j-api 1.7.x, see https://issues.redhat.com/browse/JBLOGGING-165. For now you can try downgrading org.slf4j:slf4j-api to 1.7.36 if you have to use Tika with JBoss Logging (e.g. if you use Quarkus or WildFly native logging).
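If you take the downgrade route with Maven, one way to pin the version is through dependencyManagement. This is a minimal sketch, not a tested recipe. It assumes that the slf4j bridges Tika brings in (jul-to-slf4j, jcl-over-slf4j) should be pinned to the same 1.7.x release as slf4j-api, and the property name slf4j.version is only illustrative.

```xml
<!-- Merge with your properties section -->
<properties>
  <!-- assumption: 1.7.36, the slf4j-api version suggested above for JBoss Logging -->
  <slf4j.version>1.7.36</slf4j.version>
</properties>

<!-- Merge with your dependencyManagement section -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <!-- assumption: keep the bridges on the same release line as slf4j-api -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jul-to-slf4j</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree afterwards should show which slf4j-api version was actually resolved.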
Tika parser modules
tika-parser-*-module artifacts depend on many Apache and third-party libraries. Tika itself uses slf4j-api, but the underlying libraries use different logging APIs (commons-logging, java.util.logging, log4j 1.2.x, log4j 2.x, slf4j).
By default Tika brings in slf4j-api via tika-core, plus some bridges such as org.slf4j:jcl-over-slf4j and org.slf4j:jul-to-slf4j, as an opinionated default. Depending on your logging backend and preferred configuration, you will need different dependency exclusions and bridges/implementations.
If you have no preference about the logging backend, it is enough to add org.apache.logging.log4j:log4j-core, org.apache.logging.log4j:log4j-slf4j2-impl and org.apache.logging.log4j:log4j-1.2-api (or org.slf4j:log4j-over-slf4j), and to exclude log4j:log4j, commons-logging:commons-logging, ch.qos.logback:logback-classic, ch.qos.logback:logback-core, ch.qos.reload4j:reload4j and org.slf4j:slf4j-reload4j.
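The two bridges named for legacy Log4j 1.2 API calls are alternatives: log4j-1.2-api and log4j-over-slf4j both provide the org.apache.log4j classes, so only one of them should be on the classpath. Below is a minimal sketch of the log4j-1.2-api variant. It assumes org.apache.logging.log4j:log4j-bom is imported in dependencyManagement so that versions can be omitted; without the BOM, add a version element to each dependency.

```xml
<!-- Merge with your dependencies section -->
<dependencies>
  <!-- logging backend: log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
  </dependency>
  <!-- slf4j 2.0.x provider that forwards to log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j2-impl</artifactId>
  </dependency>
  <!-- routes legacy log4j 1.2.x API calls directly to log4j 2.x;
       use it instead of org.slf4j:log4j-over-slf4j, not together with it -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-1.2-api</artifactId>
  </dependency>
</dependencies>
```

If org.slf4j:log4j-over-slf4j is already declared in your pom.xml, drop it when using this variant.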
As of the main branch (and Tika 2.6.0), all Tika sources use slf4j-api as the logging API, with org.apache.logging.log4j:log4j-core:2.x as the backend for applications like tika-app / tika-eval-app / tika-server.
If you want to use a different logging backend for slf4j, include the backend implementation and exclude conflicting bridges (when using jcl/commons-logging or java.util.logging as backends). The following sections show how to configure the dependencies for different logging solutions/backends to avoid conflicts. Logger configuration is out of scope for this document; see the relevant library documentation.
Currently tika-parsers depends on these logging solutions:
...
...
...
Example configuration for Apache Tika 2.5.0+
If you use Apache Maven, the dependency section in pom.xml will contain something like this:
Common sections
<!-- Merge with your properties section -->
<properties>
  <!-- component versions; feel free to keep only those required for your case -->
  <tika.version>2.6.0</tika.version>
  <slf4j.version>2.0.3</slf4j.version>
  <log4j.version>1.2.17</log4j.version>
  <log4j2.version>2.19.0</log4j2.version>
  <!-- 1.4.4 for Jakarta EE 9+ or 1.3.4 if you use Java EE or Jakarta EE 8 -->
  <logback.version>1.4.4</logback.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-bom</artifactId>
      <version>${tika.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-bom</artifactId>
      <version>${log4j2.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Merge with your dependencies section -->
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <exclusions>
      <!--
        These exclusions will become obsolete at some point, but it is better to keep them for now.
        tika-parser-*-module artifacts should exclude commons-logging explicitly, but upstream libraries
        may add it to their direct or transitive dependencies
      -->
      <exclusion>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
      </exclusion>
      <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
      </exclusion>
      <exclusion>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-reload4j</artifactId>
      </exclusion>
      <exclusion>
        <groupId>ch.qos.reload4j</groupId>
        <artifactId>reload4j</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!--
    You may want to add these dependencies to dependencyManagement to force a consistent version.
    Tika already declares slf4j-api, jul-to-slf4j and jcl-over-slf4j as dependencies explicitly,
    so they are listed here primarily as an example.
  -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jul-to-slf4j</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jcl-over-slf4j</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
</dependencies>
Apache Log4j 2.x with slf4j bridges
<!-- Merge with your dependencies section -->
<dependencies>
  <!-- logging backend: log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <!-- version is omitted since there's org.apache.logging.log4j:log4j-bom in dependencyManagement section -->
  </dependency>
  <!-- slf4j implementation that forwards to log4j 2.x -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <!-- for slf4j 1.7.x use log4j-slf4j-impl instead -->
    <artifactId>log4j-slf4j2-impl</artifactId>
    <!-- version is omitted since there's org.apache.logging.log4j:log4j-bom in dependencyManagement section -->
  </dependency>
</dependencies>
Logback
...
<!-- Merge with your dependencies section -->
<dependencies>
<!-- bridges to route jul and jcl (commons-logging) are already present, so just add log4j 1.2.x one -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>${slf4j.version}</version>
</dependency>
<!-- TODO: add log4j2 -> slf4j bridge -->
<!-- slf4j implementation -->
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>${logback.version}</version>
</dependency>
</dependencies>
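The TODO in the Logback snippet concerns libraries that log through the Log4j 2.x API. One candidate for that bridge is org.apache.logging.log4j:log4j-to-slf4j, which forwards Log4j 2.x API calls to slf4j. The sketch below assumes org.apache.logging.log4j:log4j-bom is imported in dependencyManagement so that the version can be omitted. It has not been verified against every Tika parser module.

```xml
<!-- Merge with your dependencies section -->
<dependencies>
  <!-- routes Log4j 2.x API calls to slf4j, and therefore to Logback -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-to-slf4j</artifactId>
    <!-- version comes from org.apache.logging.log4j:log4j-bom, assumed to be imported -->
  </dependency>
</dependencies>
```

Do not combine log4j-to-slf4j with log4j-slf4j2-impl (or log4j-slf4j-impl): each forwards to the other, which would create a loop.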
TO BE REWRITTEN:
Apache Log4j 2.x with slf4j bridges
<dependencies>
<!-- bridges to route jul and jcl (commons-logging) are already present, so just add log4j 1.2.x one -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>${slf4j.version}</version>
</dependency>
<!-- slf4j implementation to forward logs to log4j 2.x (for slf4j 1.7.x use log4j-slf4j-impl instead) -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>${log4j2.version}</version>
</dependency>
<!-- logging backend: log4j 2.x -->
<!-- these dependency declarations are optional since org.apache.logging.log4j:log4j-slf4j2-impl depends on them transitively -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j2.version}</version>
</dependency>
</dependencies>
...