This is written by a relative outsider to Apache Solr development. It will appear painfully rudimentary to Solr devs, and it assumes less-than-black-belt familiarity with git... so be it.
Many thanks to Uwe Schindler for walking me through these steps several times.
Run ant ivy-bootstrap and ant idea before opening the project in IntelliJ.
git remote add upstream https://github.com/apache/lucene-solr.git
git fetch upstream
git pull upstream master
git push origin master
jira/solr-11701
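The sync commands above, plus creating the working branch, can be collected into one small helper. This is only a sketch: sync_and_branch is a hypothetical name, and it assumes the upstream remote has already been added and that master is the branch being tracked.

```shell
# Hypothetical helper, not part of any Solr tooling.
# Assumes a remote named "upstream" already exists and the main branch is "master".
sync_and_branch() (
  branch=$1
  git fetch upstream &&
  git checkout master &&
  git pull upstream master &&   # bring local master up to date
  git checkout -b "$branch"     # start the work branch from the refreshed master
)
```

For example, sync_and_branch jira/solr-11701 would leave you on a fresh jira/solr-11701 branch cut from the updated master.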
Run mvn dependency:tree on the newly released Apache Tika and MEMORIZE it.
lucene/ivy-versions.properties – make sure that they are in alphabetical order
solr/licenses – must include a -LICENSE-XYZ.txt and -NOTICE.txt file for every jar
solr/contrib/extraction/ivy.xml
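The ordering and license-file requirements above can be spot-checked before sitting through a full precommit run. The two helpers below are a rough sketch, not what the build itself does: their names are made up, the sort check uses plain byte order (which may not be exactly the ordering precommit enforces), and the file-naming patterns are assumptions spelled out in the comments.

```shell
# Both helpers are hypothetical sketches, not part of the Lucene/Solr build.

# Succeeds if the dependency keys of a properties file are in sorted order.
# Assumption: dependency lines start with "/" and look like "/group/artifact = version".
versions_sorted() (
  grep '^/' "$1" | cut -d= -f1 | sed 's/[[:space:]]*$//' | LC_ALL=C sort -c 2>/dev/null
)

# Prints each jar (without the .jar.sha1 suffix) that has no LICENSE or NOTICE file,
# and fails if any are missing.
# Assumption: files are named <name>-<version>.jar.sha1, <name>-LICENSE-<type>.txt
# and <name>-NOTICE.txt; <name> is found by dropping trailing "-..." segments.
missing_license_files() (
  dir=$1
  status=0
  for sha in "$dir"/*.jar.sha1; do
    [ -e "$sha" ] || continue
    prefix=$(basename "$sha" .jar.sha1)
    jar=$prefix
    license=no
    notice=no
    while :; do
      if ls "$dir/$prefix"-LICENSE-*.txt >/dev/null 2>&1; then license=yes; fi
      if [ -e "$dir/$prefix-NOTICE.txt" ]; then notice=yes; fi
      case $prefix in
        *-*) prefix=${prefix%-*} ;;   # drop the last "-..." segment and retry
        *) break ;;
      esac
    done
    if [ "$license" = no ] || [ "$notice" = no ]; then
      echo "$jar"
      status=1
    fi
  done
  return "$status"
)
```

From the checkout root, versions_sorted lucene/ivy-versions.properties and missing_license_files solr/licenses would be the typical calls; an empty result from the second means every jar checksum found a matching pair of files under these naming assumptions.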
ant clean (out of nervous habit) and then run the unit tests in contrib/dataimporthandler-extras and contrib/extraction
XLSXResponseWriter which relies on Apache POI.
ant clean-jars jar-checksums
git add new .sha1 files in solr/licenses and lucene/licenses and git rm old .sha1 files
ant precommit
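Staging the regenerated checksum files can be done in one pass instead of separate add and remove calls. The helper below is a sketch with a made-up name; it assumes you are at the root of the checkout and that the checksums have already been regenerated.

```shell
# Hypothetical helper; run from the root of the lucene-solr checkout
# after the checksums have been regenerated.
# "git add -A" on a path stages new files and removals under it in one step,
# which stands in for the separate "git add" / "git rm" calls.
# Note that it also stages any other changes in those two directories.
stage_checksums() (
  git add -A -- solr/licenses lucene/licenses &&
  git diff --cached --name-status -- solr/licenses lucene/licenses
)
```

The printed name-status list is worth a glance before committing: each upgraded jar should normally show one removed and one added .sha1 file.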
ant precommit as needed, waiting 15-20 minutes each time ... if you didn't break something obvious.
ant precommit eventually ends in errors about broken links in HTML. This means you are successful!!!
ant test for kicks. Something will likely break. Try to figure out if it is caused by anything you did or just a flaky build. Bonus points if the test failure is reproducible and you report it/fix it.
To test that you've gotten most of the dependencies right, why not run DIH on Tika's test documents?
cd solr/ and ant package
bin\solr start
Dataimport.
Execute
Query and check how many documents are actually indexed
java -jar tika-app.jar -i <input_dir> -o <output_dir>
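One way to use the indexed-document count and the tika-app batch run together is to compare numbers: how many files went in, how many output files tika-app produced, and how many documents Solr reports. The helpers below are a sketch under assumptions that are mine rather than the original workflow's: that tika-app writes roughly one output file per input file it manages to parse, that Solr is on its default localhost:8983, and that curl is available. All function names are hypothetical.

```shell
# Hypothetical helpers for comparing Tika's batch output with what Solr indexed.

# Counts regular files under a directory, recursively.
count_files() (
  find "$1" -type f | wc -l | tr -d ' '
)

# Reads a Solr JSON select response on stdin and prints its numFound value.
extract_num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Asks a local Solr for the document count of a collection.
# Assumptions: Solr listens on the default localhost:8983 and curl is installed.
solr_num_found() (
  curl -s "http://localhost:8983/solr/$1/select?q=*:*&rows=0&wt=json" | extract_num_found
)
```

Comparing count_files on the input and output directories with solr_num_found for your collection gives a quick sense of whether a dependency problem is silently dropping documents; a large gap is the cue to look at the logs.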
In addition to DIH, the above configs are also set up to work with the ExtractingHandler.
You can run either the SolrJ client (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/SolrJIndexer.java) or the
Make sure to set the source directory and the Solr collection name to match your test files and Solr collection. Note that these indexers do not process files recursively.