Differences between revisions 1 and 2
Revision 1 as of 2015-05-24 06:07:58
Size: 521
Comment:
Revision 2 as of 2015-05-24 15:11:36
Size: 2355
Comment:

GeoTopicParser

The GeoTopicParser combines a Gazetteer (a lookup dictionary that maps names/places to latitude/longitude pairs) with a Named Entity Recognition (NER) modeling technique that identifies names and places in text. Together they provide a way to geotag documents and text, i.e., to identify places in the text and then look up the latitude/longitude pairs for those places.

GeoTopicParser uses Apache Lucene and Apache OpenNLP to provide its capabilities.
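
To make the Gazetteer half of this concrete, here is a minimal sketch of a name-to-coordinates lookup over Geonames-style tab-separated records. It is not how the Lucene Geo Gazetteer works internally (that relies on a Lucene index), and it does not cover the NER step. The function name `lookup_place` and the sample records are invented for illustration; the column order (name in column 2, latitude in column 5, longitude in column 6) follows the Geonames dump layout.

```shell
#!/bin/sh
# Toy gazetteer: illustrative only, not the Lucene-based implementation.
# The records below are invented; columns mimic the Geonames dump layout
# (1: id, 2: name, 3: ASCII name, 4: alternate names, 5: latitude, 6: longitude).
SAMPLE_GAZETTEER=$(mktemp)
trap 'rm -f "$SAMPLE_GAZETTEER"' EXIT
printf '1\tExampleville\tExampleville\t\t10.5\t-20.25\n' >  "$SAMPLE_GAZETTEER"
printf '2\tSampleton\tSampleton\t\t-33.0\t151.0\n'       >> "$SAMPLE_GAZETTEER"

# lookup_place NAME FILE: print "name<TAB>lat<TAB>lon" for exact name matches.
lookup_place() {
    awk -F '\t' -v name="$1" '$2 == name { print $2 "\t" $5 "\t" $6 }' "$2"
}

lookup_place "Exampleville" "$SAMPLE_GAZETTEER"
```

A name that is not in the file simply produces no output, which is the behavior a geotagger would treat as "place not found".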

Installing the Lucene Gazetteer

First, download and install the Lucene Geo Gazetteer project:

$ cd $HOME/src
$ git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
$ cd lucene-geo-gazetteer
$ mvn install
$ export PATH="$PATH:$HOME/src/lucene-geo-gazetteer/src/main/bin"
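
The last installation step puts the project's bin directory on your PATH. If you do this from a shell startup file, the same directory can end up appended once per nested shell. The following is a small sketch of an idempotent variant; the function name `add_to_path` is hypothetical, and the directory assumes the clone location used above.

```shell
# add_to_path DIR: append DIR to PATH unless it is already present.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;      # not present yet, append it
    esac
}

add_to_path "$HOME/src/lucene-geo-gazetteer/src/main/bin"
export PATH
```

Wrapping PATH in colons before matching means the check compares whole entries rather than substrings.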

Once done, you can verify that the installation worked by running the following command:

$ lucene-geo-gazetteer --help
usage: lucene-geo-gazetteer
 -b,--build <gazetteer file>           The Path to the Geonames
                                       allCountries.txt
 -h,--help                             Print this message.
 -i,--index <directoryPath>            The path to the Lucene index
                                       directory to either create or read
 -s,--search <set of location names>   Location names to search the
                                       Gazetteer for

Next, build a Gazetteer index from the Geonames.org dataset:

$ cd $HOME/src/lucene-geo-gazetteer
$ curl -O http://download.geonames.org/export/dump/allCountries.zip
$ unzip allCountries.zip
$ java -cp target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -b allCountries.txt
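
The `<version>` placeholder in the last command depends on the version Maven built. One way to avoid typing it, sketched here under the assumption that exactly one matching jar exists in the build directory, is to let the shell resolve the file name with a glob. The function name `find_gazetteer_jar` is hypothetical.

```shell
# find_gazetteer_jar DIR: print the path of the jar-with-dependencies in DIR.
# Assumes the Maven build left a single matching jar; fails otherwise.
find_gazetteer_jar() {
    # Reuse the function's positional parameters to hold the glob matches.
    set -- "$1"/lucene-geo-gazetteer-*-jar-with-dependencies.jar
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one gazetteer jar" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Possible usage from the project root, after `mvn install`:
#   JAR=$(find_gazetteer_jar target) &&
#     java -cp "$JAR" edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -b allCountries.txt
```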

You can verify that the Gazetteer build worked by searching for, e.g., Pasadena and Texas:

$ lucene-geo-gazetteer -s Pasadena Texas

Note that we used the convenience script lucene-geo-gazetteer, which assumes that the index is named geoIndex and lives at $HOME/src/lucene-geo-gazetteer/geoIndex. We could also have used the pure Java command line to search.
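
For the pure Java route, a plausible invocation combines the `-i` and `-s` options from the help output with the same class used to build the index. This is an assumption based on that help text, not a command taken from the project documentation. The sketch below only assembles and prints the command so it can be inspected before running; the function name `gazetteer_search_cmd` is hypothetical, and the jar path keeps the `<version>` placeholder, which you must replace with the version you built.

```shell
# gazetteer_search_cmd JAR INDEX_DIR NAME...: print a java command line that
# searches INDEX_DIR for the given location names (assumed usage of -i and -s,
# inferred from the tool's --help output).
gazetteer_search_cmd() {
    jar=$1; index=$2; shift 2
    printf 'java -cp %s edu.usc.ir.geo.gazetteer.GeoNameResolver -i %s -s' "$jar" "$index"
    for name in "$@"; do
        printf ' %s' "$name"
    done
    printf '\n'
}

gazetteer_search_cmd 'target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar' geoIndex Pasadena Texas
```

The printed line is not quoted for re-parsing, so location names containing spaces would need quoting by hand before the command is run.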

GeoTopicParser (last edited 2016-03-04 19:34:16 by MadhavSharan)