
Components of EFES

Polina Yordanova edited this page Mar 16, 2018 · 6 revisions

EFES works by binding together several free-standing software components that provide the functionality needed for publishing an epigraphic project.

The major components of EFES are:

  • Jetty;
  • Solr;
  • Sesame;
  • Cocoon;
  • Templates;
  • XSL files;
  • User content.

Jetty

Jetty is a lightweight server that comes pre-configured to run all of the EFES components. It is suitable for initial setup and development, but needs to be replaced by a more robust web server for the public deployment of your EFES project; the Kiln documentation about running the webapp explains how to do this. The user's main interaction with Jetty consists of starting the EFES build process (described in steps 4 and 5 of Installation) and restarting EFES by stopping the server with Ctrl+C and running the build command again.

Solr

The Apache Solr search platform is the search engine in EFES. It stores and indexes our documents and responds to queries about the information it holds. Solr has its own interface for more technical configuration, which the user should not normally need to touch. Instead, the user interacts with Solr through the sitemaps and the stylesheets associated with its indexing and querying functions, and through the Index functionality in the Admin panel.

Solr's instructions for indexing and storing are kept in the schema.xml file in webapps/solr/conf/. They are sent to Solr every time we start Jetty, so whenever we change them (for example, when we create a new index with additional fields to store information in) we need to restart the server.
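As a rough illustration of what schema.xml contains, the following sketch uses Python's standard xml.etree.ElementTree to list the fields declared in a Solr schema and whether each one is indexed and stored. The schema fragment and the field names (file_path, lemma, language) are hypothetical; the real EFES schema in webapps/solr/conf/ is considerably larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a Solr schema.xml; field names are illustrative only.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="file_path" type="string" indexed="true" stored="true" required="true"/>
    <field name="lemma" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <field name="language" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def list_fields(schema_text):
    """Return (name, type, indexed, stored) for every <field> in the schema."""
    root = ET.fromstring(schema_text.strip())
    fields = []
    for field in root.iter("field"):
        fields.append((
            field.get("name"),
            field.get("type"),
            field.get("indexed") == "true",  # searchable
            field.get("stored") == "true",   # returned in results
        ))
    return fields


if __name__ == "__main__":
    for name, ftype, indexed, stored in list_fields(SCHEMA_XML):
        print(f"{name}: type={ftype} indexed={indexed} stored={stored}")
```

Adding a field to this file is exactly the kind of change that only takes effect after Jetty is restarted.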

Solr is a very powerful tool that can be used for tasks such as automatic word stemming and tokenization, which require more advanced technical knowledge (see Apache's documentation). Out of the box, EFES uses it for features such as lemma search (provided your inscriptions are lemmatized), grouped search, and boolean queries.
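A lemma search combined with a boolean query and result grouping can be expressed as a request to Solr's standard select handler. The Python sketch below only builds such a URL, using the standard Solr parameters q, fq, rows, wt, group and group.field. The endpoint URL and the field names lemma, language and file_path are assumptions, so check them against your own schema.xml and Jetty setup; in EFES these requests are normally made by Cocoon rather than by hand.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core path to your own setup.
SOLR_SELECT_URL = "http://localhost:9999/solr/select"


def build_query_url(q, filters=(), group_field=None, rows=10):
    """Build a Solr select URL from standard Solr request parameters."""
    params = [("q", q), ("rows", str(rows)), ("wt", "json")]
    for fq in filters:
        # Each fq (filter query) narrows the result set without affecting scoring.
        params.append(("fq", fq))
    if group_field:
        # Result grouping: collapse hits that share the same value of group_field.
        params.append(("group", "true"))
        params.append(("group.field", group_field))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Boolean lemma search (hypothetical field names), one group per inscription file.
example = build_query_url(
    "lemma:theos AND lemma:soter",
    filters=["language:grc"],
    group_field="file_path",
)
```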

Sesame

Sesame is the triple store used for storing the RDF extracted from our data. Storing information as RDF allows us to export and share data between projects and to create shared ontologies, vocabularies, authorities, etc. To make use of Sesame's functionality, you need to set up your own RDF repository at the start of your EFES project. After that, the user only needs to harvest through the Admin panel whenever the relevant markup or the authority files change, or new files are added.
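Once RDF has been harvested, it can be read back with SPARQL. The sketch below prepares, but does not send, a SPARQL query over Sesame's HTTP interface (a GET request to a repository URL with a query parameter). The server URL and the repository ID are assumptions: substitute the repository you set up for your own project.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical server URL and repository ID: use the ones you set up.
SESAME_SERVER = "http://localhost:9999/openrdf-sesame"
REPOSITORY_ID = "my-efes-project"

# A simple SPARQL query listing up to 10 resources that have an rdfs:label.
SPARQL = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 10
"""


def build_sparql_request(query, server=SESAME_SERVER, repository=REPOSITORY_ID):
    """Prepare (but do not send) a SPARQL GET request to a Sesame repository."""
    url = f"{server}/repositories/{repository}?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

With the server running, passing the request to urllib.request.urlopen would send it.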

Cocoon

Apache Cocoon is the beating heart of EFES. It comes with a default setup for features that all epigraphic projects need, such as pre-made indices, facets, and transformations for displaying EpiDoc XML files.

Cocoon operates on sitemaps: XML files that list the URLs of the site. They consist of map:match elements that define which processes are needed to generate a web page. These instructions take the form of three steps: generate (get the data from the source), transform (apply the specified stylesheet), and serialize (output the result as XML, HTML or PDF).

When the browser requests a URL, the request is matched against the map:match that contains the instructions for assembling the page's building blocks. The source of each block often depends on parts of the URL, which is why so many map:match patterns contain wildcards (*): the value matched by each wildcard is then available as a numbered variable in braces, such as {1} or {2}. For example, since EFES is built to support multilingual sites, the language is a variable that needs to be supplied in the URL. All pages that display a transformed EpiDoc XML file share the same general layout and differ only in which inscription is displayed, so the filename of that inscription is supplied as another variable.
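The matching and substitution described above can be modelled in a few lines of Python. This is a simplified illustration, not Cocoon's implementation: it follows the Cocoon wildcard convention in which * matches within a single path segment and ** matches across segments, and it represents one map:match as a dict. The URL pattern, source path and stylesheet name are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a Cocoon-style wildcard pattern into a compiled regular expression.

    '**' matches any characters (including '/'); '*' matches within one path segment.
    """
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += "(.*)"
            i += 2
        elif pattern[i] == "*":
            regex += "([^/]*)"
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.compile("^" + regex + "$")


def substitute(template, groups):
    """Replace {1}, {2}, ... with the values captured by the wildcards."""
    return re.sub(r"\{(\d+)\}", lambda m: groups[int(m.group(1)) - 1], template)


# A hypothetical map:match, written as a Python dict rather than sitemap XML.
MATCH = {
    "pattern": "*/inscriptions/*.html",
    "generate": "content/xml/epidoc/{2}.xml",             # {2} = inscription filename
    "transform": "stylesheets/epidoc/start-edition.xsl",
    "serialize": "html",
    "params": {"language": "{1}"},                        # {1} = site language
}


def resolve(url, match=MATCH):
    """Return the generate/transform/serialize steps for a URL, or None if it does not match."""
    m = wildcard_to_regex(match["pattern"]).match(url)
    if m is None:
        return None
    groups = m.groups()
    return {
        "generate": substitute(match["generate"], groups),
        "transform": substitute(match["transform"], groups),
        "serialize": match["serialize"],
        "params": {k: substitute(v, groups) for k, v in match["params"].items()},
    }
```

For instance, a request for en/inscriptions/IOSPE1.html would generate from content/xml/epidoc/IOSPE1.xml with the language parameter set to en, while every other inscription page reuses the same match.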

The sitemaps are chained together by mounting: Cocoon looks in sitemaps.xmap to find the rest of the sitemaps, so the user should never modify this file.
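To picture the chain of mounting, here is a simplified Python model rather than Cocoon code: a top-level mount table hands each URL to a sub-sitemap based on a URI prefix, stripping that prefix before passing the rest of the URL on. The prefixes and the sitemaps/admin.xmap file name are hypothetical; the real mounts are declared in sitemaps.xmap.

```python
# Hypothetical mount table of (uri_prefix, sub-sitemap), checked in order.
MOUNTS = [
    ("admin/", "sitemaps/admin.xmap"),
    ("", "sitemaps/main.xmap"),  # empty prefix acts as a catch-all
]


def route(url, mounts=MOUNTS):
    """Return (sub-sitemap, remaining URL) for the first mount whose prefix matches."""
    for prefix, sitemap in mounts:
        if url.startswith(prefix):
            # The sub-sitemap only sees the part of the URL after the prefix.
            return sitemap, url[len(prefix):]
    return None
```

Because every request passes through this top-level table, a mistake in it would break routing for the whole site, which is why the file is best left alone.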

The following sitemaps can be modified in order for us to customize our project:

  • config.xmap — this is where we do most of the configuration for our project: we can specify the default site language, add fields for RDF lookup when creating new facets and indices from authority lists, and change the variables that produce the edition structure for our EpiDoc files. The latter is done by changing the content of the EpiDoc-related variables around line 80.

  • internal.xmap — this sitemap contains internal pipelines (not exposed by URL), such as those that preprocess documents before they are used in other pipelines, normalizing and annotating them as required. For example, EpiDoc XML files that contain multilingual metadata are preprocessed so that only the elements whose @xml:lang matches the language parameter given in the URL get transformed.

  • main.xmap — contains the main pipelines that assemble and display the content of the site.
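The language filtering done in the internal.xmap pipelines can be approximated in Python with xml.etree.ElementTree, as a rough analogue only. The sketch drops every element whose xml:lang is set to a language other than the requested one. The sample fragment is hypothetical, and for simplicity the sketch ignores xml:lang inheritance and never removes the root element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical multilingual fragment of EpiDoc-like metadata.
SAMPLE = """<msContents xmlns="http://www.tei-c.org/ns/1.0">
  <summary xml:lang="en">Funerary inscription</summary>
  <summary xml:lang="de">Grabinschrift</summary>
  <summary xml:lang="it">Iscrizione funeraria</summary>
</msContents>"""


def keep_language(root, lang):
    """Remove (in place) every element whose xml:lang is set and differs from `lang`."""
    # Collect (parent, child) pairs first so the tree is not mutated while iterating.
    to_remove = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.get(XML_LANG) not in (None, lang)
    ]
    for parent, child in to_remove:
        parent.remove(child)
    return root
```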

Note that XML editors sometimes do not immediately recognize sitemaps as XML files. You may need to specify that the file should be treated as XML, or simply drag the sitemap file into an open editor window.


Cocoon is what binds together all other components and allows us to communicate with Solr and Sesame.

Templates

Templates provide the general HTML framework for public URLs, with slots into which we can put the content of our project. They can be found in webapps/ROOT/assets/templates/.

The main benefit of the templating system is that it lets us organize the structure of the website on different levels through inheritance. Features shared by all pages on the site can be defined in a template higher up the hierarchy, so if we want to change these features, we only need to edit the file that contains them, and the changes will be inherited by its descendants. In EFES, the base.xml template defines the outermost structure that all other templates inherit from and specifies which CSS files should be used with the HTML.
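The inheritance idea can be sketched in Python in general terms: a child template overrides some named blocks and looks up everything else in its ancestors. This does not reproduce the XML syntax of the actual templates; the class, the block names and the CSS path below are hypothetical.

```python
class Template:
    """A named template with an optional parent and a set of named blocks (hypothetical model)."""

    def __init__(self, name, blocks, parent=None):
        self.name = name
        self.blocks = dict(blocks)
        self.parent = parent

    def resolve(self, block):
        """Return the content of `block`, walking up the parent chain if it is not overridden."""
        template = self
        while template is not None:
            if block in template.blocks:
                return template.blocks[block]
            template = template.parent
        raise KeyError(f"block {block!r} not defined in {self.name} or its ancestors")


# A base.xml-like template: the outer structure and CSS shared by all pages.
base = Template("base", {
    "css": '<link rel="stylesheet" href="/assets/styles/base.css"/>',
    "header": "<header>Project title</header>",
    "content": "",
})

# A child template overrides only the content block and inherits everything else.
inscription = Template("inscription", {"content": "<div>Edition goes here</div>"}, parent=base)
```

Editing the header in base changes it on every descendant that does not override it, which is the benefit described above.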

Transformation stylesheets

...

User content

  • Valid XML files; the transformation scenarios assume that the files have been encoded following the EpiDoc standards;
  • Authority files - link to page with detailed info
  • Images, which can be in any format, although only JPEGs can be made into thumbnails.
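Since only JPEGs can be made into thumbnails, it may help to check which of your images qualify before adding them. The sketch below identifies JPEGs by their leading bytes (FF D8 FF) rather than by file extension. The directory passed to thumbnail_candidates is up to you, as this page does not say where EFES keeps images.

```python
from pathlib import Path

# Every JPEG file starts with the SOI marker (FF D8) followed by another marker byte (FF).
JPEG_MAGIC = b"\xff\xd8\xff"


def is_jpeg(data):
    """True if the bytes look like the start of a JPEG file, whatever its extension."""
    return data[:3] == JPEG_MAGIC


def thumbnail_candidates(image_dir):
    """List the names of files in `image_dir` that are JPEGs and can therefore get thumbnails."""
    candidates = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file():
            with path.open("rb") as f:
                if is_jpeg(f.read(3)):
                    candidates.append(path.name)
    return candidates
```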