With TIKA-605, you can now use Tika to parse geospatial file formats! To figure out how, read on.
If you're lucky this will work:
$ brew install gdal --complete
Note: if you encounter errors here after upgrading to Mavericks, first run the following, then retry the install:
$ brew rm $(join <(brew leaves) <(brew deps gdal --complete ))
Note that the above instructions are definitely Mac-centric. We recommend checking GDAL's website for specific instructions on installing GDAL on your operating system.
Once GDAL is installed, the following command should be available on your path.
gdalinfo
Running gdalinfo with no arguments should produce something like:
Usage: gdalinfo [--help-general] [-mm] [-stats] [-hist] [-nogcp] [-nomd] [-norat] [-noct] [-nofl] [-checksum] [-proj4] [-listmdd] [-mdd domain|`all`]* [-sd subdataset] datasetname
FAILURE: No datasource specified.
If that works you are in business!
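The check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name, not part of GDAL or Tika.

```shell
# Report whether a command is on the PATH; print its location if found.
# have_cmd is a hypothetical helper, not part of GDAL or Tika.
have_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check for gdalinfo and print a hint if it is not installed.
have_cmd gdalinfo || echo "install GDAL before continuing"
```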
To use Tika with GDAL, grab the latest 1.7-SNAPSHOT build of Tika and a geospatial file. This example uses a Flexible Image Transport System (FITS) file. Then run:
java -jar tika-app-1.7-SNAPSHOT.jar -m WFPC2u5780205r_c0fx.fits
This should produce, e.g.,
ALLG-MAX: 3.777701E3
ALLG-MIN: -7.319537E1
ATODCORR: COMPLETE
..
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
If you see X-Parsed-By: ..GDALParser and a bunch of geospatial metadata, you are in business!
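If you would rather not scan the output by eye, the check for the GDAL parser line can be automated. This is a small sketch, assuming the metadata is printed one `key: value` pair per line as in the sample output; `parsed_by_gdal` is a hypothetical helper name, and the demo feeds it sample lines rather than calling Tika.

```shell
# Succeed if the metadata text on stdin lists the GDAL parser.
# parsed_by_gdal is a hypothetical helper, not part of Tika.
parsed_by_gdal() {
  grep -q 'X-Parsed-By: org\.apache\.tika\.parser\.gdal\.GDALParser'
}

# Demo on sample metadata lines taken from the output shown above.
printf '%s\n' \
  'ATODCORR: COMPLETE' \
  'X-Parsed-By: org.apache.tika.parser.DefaultParser' \
  'X-Parsed-By: org.apache.tika.parser.gdal.GDALParser' |
  parsed_by_gdal && echo "GDAL parser was used"

# With a real file, pipe the Tika output into the helper instead:
# java -jar tika-app-1.7-SNAPSHOT.jar -m WFPC2u5780205r_c0fx.fits | parsed_by_gdal
```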
Once you have GDAL and a fresh build of Tika 1.7-SNAPSHOT (including Tika server), you can easily use Tika server with GDAL. For example, to post a FITS file to the server and get back its metadata, run the following commands:
java -jar /path/to/tika-server-1.7-SNAPSHOT.jar
curl -T /path/to/fits/image.fits http://localhost:9998/tika --header "Content-type: application/fits"
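When the two commands above are run from one script, the server may need a moment to start before curl can reach it. The sketch below polls until the server answers. It rests on the assumption that any HTTP response means the server is up; `wait_for_url` is a hypothetical helper, and the commented usage keeps the placeholder paths from the commands above.

```shell
# Poll a URL once per second until it answers or the attempts run out.
# wait_for_url is a hypothetical helper; any HTTP response counts as "up".
wait_for_url() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    curl -s -o /dev/null --max-time 2 "$url" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage sketch (paths are placeholders, as in the commands above):
# java -jar /path/to/tika-server-1.7-SNAPSHOT.jar &
# wait_for_url http://localhost:9998/tika &&
#   curl -T /path/to/fits/image.fits http://localhost:9998/tika --header "Content-type: application/fits"
```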
On TIKA-2684, Susan Borda reports some important steps for getting a full FITS parse with GDAL. See Susan's comment and her pointer to properly loading fitsio.