Installation

Conal Tuohy edited this page Dec 3, 2020 · 7 revisions

The newton_chymistry web application is an XProc pipeline hosted by a Java servlet called XProc-Z, which in turn runs in Apache Tomcat. The web application also uses an instance of Apache Solr as its search engine.

Install and configure Solr

Solr should be installed as a single server (i.e. not the cloud configuration) and configured to have a single database ("core") for each distinct instance of the chymistry web application. In this document the core is called "chymistry", though you can use a different name if you wish.

Download Solr 7 from https://lucene.apache.org/solr/downloads.html

Extract the install script from the tarball, and run it. This installs Solr as a service.

tar -xvf solr-7.7.2.tgz
sudo solr-7.7.2/bin/install_solr_service.sh solr-7.7.2.tgz 

Then create the Solr "core" (database), here called "chymistry":

cd /opt/solr/
sudo -u solr bin/solr create -c chymistry
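
To confirm that the core was created, you can ask Solr's CoreAdmin API for its status. The following is a minimal sketch that assumes Solr is listening on localhost at its default port, 8983; both values are assumptions, so substitute the host and port of your own service. The function names are illustrative only.

```shell
# Assumptions: Solr runs on localhost at its default port (8983).
SOLR_HOST="${SOLR_HOST:-localhost}"
SOLR_PORT="${SOLR_PORT:-8983}"

# Build the CoreAdmin STATUS URL for the core named in $1.
solr_core_status_url() {
  printf 'http://%s:%s/solr/admin/cores?action=STATUS&core=%s\n' \
    "$SOLR_HOST" "$SOLR_PORT" "$1"
}

# Fetch the status report for the core; needs curl and a running Solr service.
solr_core_status() {
  curl -fsS "$(solr_core_status_url "$1")"
}
```

Running `solr_core_status chymistry` should print a status report for the core. If the report contains no details for "chymistry", the core was probably not created.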

Install Apache Tomcat

Install Tomcat using the OS package repository.

Install the newton_chymistry XProc pipeline

Use git clone to install the application directly from the GitHub repository, into a file system location owned by the Tomcat user.

git clone https://github.com/IUBLibTech/newton_chymistry.git

This will create a folder called newton_chymistry containing the XProc application. The main pipeline is defined in the file xproc-z.xpl, within the xproc subfolder.
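
As a quick sanity check after cloning, you can verify that the main pipeline file is where XProc-Z will expect it. This is only a sketch: the function names are hypothetical, and the directory you pass in is whichever location you ran git clone from.

```shell
# Print the expected path of the main pipeline, given the directory ($1)
# in which "git clone" was run.
main_pipeline_path() {
  printf '%s/newton_chymistry/xproc/xproc-z.xpl\n' "$1"
}

# Report whether the main pipeline file exists under the given directory.
check_pipeline() {
  pipeline="$(main_pipeline_path "$1")"
  if [ -f "$pipeline" ]; then
    echo "found: $pipeline"
  else
    echo "missing: $pipeline" >&2
    return 1
  fi
}
```

For example, `check_pipeline "$PWD"` run from the directory containing the clone reports the full path, which is the value needed later for the xproc-z.main parameter.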

Install and configure XProc-Z

Download XProc-Z from https://github.com/Conal-Tuohy/XProc-Z/releases/download/1.4.1/xproc-z.war and save it in a location owned by the Tomcat user.

Install the XProc-Z servlet by creating a Tomcat "context" file, as described in Tomcat's documentation. To make newton_chymistry the "default" web application (i.e. so that it can be accessed with a base URL of /), name the context file ROOT.xml.
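
Tomcat reads per-application context files from the conf/[enginename]/[hostname]/ directory under its base directory, which is conf/Catalina/localhost/ in a default configuration. The sketch below computes where ROOT.xml would go. The default value of CATALINA_BASE shown here is a hypothetical example, not a path taken from this installation, and a customised Tomcat may use a different engine or host name.

```shell
# Assumption: CATALINA_BASE is Tomcat's base directory. The fallback below is
# a hypothetical example; set CATALINA_BASE to match your own system.
CATALINA_BASE="${CATALINA_BASE:-/var/lib/tomcat9}"

# Print the path of the context file for the context named in $1,
# assuming the default engine name (Catalina) and host name (localhost).
context_file_path() {
  printf '%s/conf/Catalina/localhost/%s.xml\n' "$CATALINA_BASE" "$1"
}

context_file_path ROOT
```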

This context file specifies the location of the XProc-Z web archive file, and three parameters:

  • xproc-z.main tells XProc-Z where to find the file containing the XProc pipeline which it should use to handle HTTP requests.
  • solr-base-uri tells that pipeline the address of the Solr instance to use to index the TEI corpus. Note that the final component of the Solr base URI is the name of the Solr core, as defined above.
  • dc-coverage-regex provides the pipeline with a regular expression which it uses to filter the set of documents it will ingest from Xubmit. Only documents whose dc:coverage field matches the supplied regex will be ingested. To ingest all documents, use the regex .* (which matches any text). To ingest only "production" documents, use the regex Production.

An example context file:

<Context path=""
    docBase="/srv/services/chymistry-devel-tomcat-8220/xproc-z.war"
    preemptiveAuthentication="true"
    antiResourceLocking="false">
  <Valve className="org.apache.catalina.authenticator.BasicAuthenticator" />
  <Parameter name="xproc-z.main" override="false"
             value="/srv/services/chymistry-devel-tomcat-8220/newton_chymistry/xproc/xproc-z.xpl"/>
  <!-- the Solr base URL includes the core name (here "chymistry") -->
  <Parameter name="solr-base-uri" value="http://localhost:8605/solr/chymistry/"/>
  <!-- This parameter specifies a Regular Expression which must match the dc:coverage field of a Xubmit document -->
  <!--<Parameter name="dc-coverage-regex" value="Production"/>-->
  <Parameter name="dc-coverage-regex" value=".*"/>
</Context>
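
To get a feel for how the two dc-coverage-regex values in the example behave, the sketch below tests them against sample dc:coverage values using grep -E. This only approximates the pipeline's behaviour: grep looks for a match anywhere in the value, and the pipeline's own regex dialect and matching rules may differ. The coverage value "Development" and the function name are hypothetical, chosen for illustration.

```shell
# Succeed if the dc:coverage value ($2) matches the regex ($1).
# grep -E is used as a stand-in for the pipeline's regex matching (assumption).
coverage_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Try both regexes from the context file against two sample coverage values.
for coverage in "Production" "Development"; do
  for regex in '.*' 'Production'; do
    if coverage_matches "$regex" "$coverage"; then
      echo "regex '$regex' would ingest dc:coverage '$coverage'"
    else
      echo "regex '$regex' would skip dc:coverage '$coverage'"
    fi
  done
done
```

Under these assumptions, .* accepts both sample values, while Production accepts only the document whose coverage is "Production".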

Loading the TEI XML

To complete the installation, follow the instructions for updating data.