...
- Download the test documents and extract them on the local server.
- Use the docx.zip file from here: https://corp.digitalcorpora.org/corpora/files/govdocs1/by_type/
- Start a local web server that serves the files from that directory.
- Enable JWT authentication on this server, so that a bearer token is required to access the web resources.
- The purpose of this step is to prove that we can authenticate safely in this process.
- Start the Apache Tika gRPC Server.
- Configure it with a tika-config XML file tailored to our needs.
- Provide both Java and Go clients that can establish a gRPC connection to the Apache Tika gRPC services, stream the list of HTTP links for the documents into the service, and obtain the parsed output.
- Show various configurations for the number of parallel worker threads in play.
- The gRPC server will use mutual TLS authentication.
Java Bi-Directional Streaming Example
...
A Java Tika gRPC Server with an HTTP fetcher is started, and a Tika gRPC client opens a bidirectional stream and processes a batch of files that need parsing.
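The shape of this bidirectional flow (the client sends one request per document URL while concurrently reading parsed replies) can be sketched without any gRPC dependency using Go channels. `Request`, `Reply`, `parse` and the worker count are hypothetical stand-ins, not the Tika gRPC API. In a generated grpc-go client, the sending goroutine would call `stream.Send` for each URL followed by `stream.CloseSend`, and the receiving loop would call `stream.Recv` until it returns `io.EOF`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Request and Reply are hypothetical stand-ins for the generated protobuf messages.
type Request struct{ URL string }

type Reply struct {
	URL     string
	Content string
}

// parse is a hypothetical stand-in for the server's fetch-and-parse step.
func parse(r Request) Reply {
	return Reply{URL: r.URL, Content: "parsed:" + r.URL}
}

// serve plays the server half: `workers` goroutines consume requests
// concurrently and emit replies, which may therefore arrive out of order.
func serve(in <-chan Request, out chan<- Reply, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range in {
				out <- parse(req)
			}
		}()
	}
	wg.Wait()
	close(out) // like the server ending the stream after the client stops sending
}

// streamAll plays the client half: one goroutine sends every URL while the
// caller receives replies, so neither direction blocks the other.
func streamAll(urls []string, workers int) []Reply {
	if workers < 1 {
		workers = 1
	}
	in := make(chan Request)
	out := make(chan Reply)
	go serve(in, out, workers)
	go func() {
		for _, u := range urls {
			in <- Request{URL: u}
		}
		close(in) // analogous to stream.CloseSend() in a gRPC client
	}()
	var replies []Reply
	for r := range out {
		replies = append(replies, r)
	}
	return replies
}

func main() {
	// Hypothetical document links served by the local web server.
	urls := []string{
		"http://localhost:8080/a.docx",
		"http://localhost:8080/b.docx",
		"http://localhost:8080/c.docx",
	}
	for _, n := range []int{1, 4} {
		replies := streamAll(urls, n)
		got := make([]string, len(replies))
		for i, r := range replies {
			got[i] = r.URL
		}
		sort.Strings(got) // arrival order depends on scheduling
		fmt.Println(n, "workers:", strings.Join(got, ", "))
	}
}
```

Because replies can arrive out of order once more than one worker is active, it is safer to correlate them by URL (or another request ID carried in each message) than by position in the stream.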
Go Bi-Directional Streaming Example
TODO
Build Tika Grpc on Docker
...