Overview

The tgpublish koLibRI module ships with two main configuration files that need attention: config.xml and policies.xml. The latter defines the TG-publish workflow and leads the publishing process through the different ActionModules, which are normally processed one by one. These modules can share information using a custom data object. For more detailed information, please see the work-in-progress version 2.0 referred to above.

TG-publish is pre-configured to read its configuration files from /etc/dhrep/tgpublish/ and to log to /var/log/dhrep/tgpublish/.

So please create the appropriate folders and then copy the config files from

into the config folder. Do not forget to set the permissions and owner settings so that Tomcat can write to it!
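
The folder setup can be scripted. The following is only a minimal sketch: the function name prepare_dirs, the root parameter (which allows a dry run outside the real file system), and the optional owner argument are assumptions made for illustration. The actual Tomcat user name depends on your installation.

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Locations named in this documentation, relative to the file system root.
CONFIG_DIR = Path("etc/dhrep/tgpublish")
LOG_DIR = Path("var/log/dhrep/tgpublish")


def prepare_dirs(root: Path = Path("/"), owner: Optional[str] = None) -> List[Path]:
    """Create the config and log folders below root.

    owner is the (installation-specific) Tomcat user; if given, it becomes
    user and group owner of both folders. Changing ownership usually
    requires root privileges.
    """
    created = []
    for rel in (CONFIG_DIR, LOG_DIR):
        target = root / rel
        target.mkdir(parents=True, exist_ok=True)
        if owner is not None:
            shutil.chown(target, user=owner, group=owner)
        created.append(target)
    return created
```

Calling prepare_dirs(Path("/tmp/tgpublish-test")) creates both folders below a scratch directory, which is a safe way to try the script before running it against the real root.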

More config files may be needed from

Please copy at least the following files, and add more if file-not-found errors occur:

dias_formatregistry.xml
jhove.conf
...

policies.xml

There are four policies to be used with the TextGridLab at the moment:

  1. TGPublish
  2. TGPublishWorldReadable
  3. TGPublishSandboxData, and
  4. TGCopy

All these workflows (or policies) are described in the policies.xml file, which defines the order in which the koLibRI ActionModules are processed. Each of the four workflows is started as a ProcessStarter with the current configuration (see below). TGPublish is used from within the TG-lab via the Publish Perspective. TGPublishWorldReadable is also used from within the TG-lab, but only applies to single technical files such as XML Schema documents, XSLT stylesheets, TextGrid workflow documents, etc. The file types that can be published as worldReadable can be checked by requesting the worldReadable list. TGPublishSandboxData is used to finally publish objects that were imported into the TextGrid Sandbox, and is called from e.g. the Import Tool External (koLibRI). Last but not least, TGCopy is used from within the TG-copy workflow to copy TextGrid objects from either the public or the non-public repository into your own projects for further processing. URI rewriting and similar adjustments are included here.

These four policies are explained in detail below.

TGPublish

<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublish">
  <step class="textgrid.PublishStart">
    <step class="textgrid.PublishCheckEdition">
      <step class="textgrid.CheckIsPublic">
        <step class="textgrid.CheckReferences">
          <step class="textgrid.GetPids">
            <step class="textgrid.ModifyAndUpdate">
              <step class="textgrid.CopyElasticSearchIndex">
                <step class="textgrid.CopyRelationData">
                  <step class="textgrid.MoveToStaticGridStorage">
                    <step class="textgrid.UpdateTgauth">
                      <step class="textgrid.PublishComplete" />
                    </step>
                  </step>
                </step>
              </step>
            </step>
          </step>
        </step>
      </step>
    </step>
  </step>
</policy>
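
To make the nesting easier to read, the step chain of a policy can be flattened into a list. This is a minimal sketch and not part of koLibRI: the helper name step_order is hypothetical, and it assumes that a nested step runs after its enclosing step. To stay compact it uses the shorter TGPublishSandboxData policy listed further below.

```python
import xml.etree.ElementTree as ET

# The TGPublishSandboxData policy from this page (XML declaration omitted).
POLICY_XML = """
<policy name="TGPublishSandboxData">
  <step class="textgrid.PublishStart">
    <step class="textgrid.UpdateTgauth">
      <step class="textgrid.ReleaseNearlyPublishedRelation">
        <step class="textgrid.PublishComplete" />
      </step>
    </step>
  </step>
</policy>
"""


def step_order(policy_xml):
    """Return the step class names in depth-first order (outer step first)."""
    root = ET.fromstring(policy_xml.strip())
    order = []

    def walk(node):
        for step in node.findall("step"):
            order.append(step.get("class"))
            walk(step)

    walk(root)
    return order


print(step_order(POLICY_XML))
# ['textgrid.PublishStart', 'textgrid.UpdateTgauth',
#  'textgrid.ReleaseNearlyPublishedRelation', 'textgrid.PublishComplete']
```
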
PublishStart
Just marks the publish process as started.
PublishCheckEdition

Checks for correct Edition/Collection metadata. At the moment the following mandatory elements are checked for existence. These requirements can be configured in the TG-publish config file.

  • Edition required fields are /tg:object/tg:edition/tg:isEditionOf and /tg:object/tg:edition/tg:license (existing node and text value)
  • Item required field is /tg:object/tg:item/tg:rightsHolder (existing node and text value)
  • Collection required field is /tg:object/tg:collection/tg:collector (existing node and text value)
  • Work required fields are /tg:object/tg:work/tg:agent, /tg:object/tg:work/tg:dateOfCreation (both attributes must exist and have a value, OR the tag must have a value), /tg:object/tg:work/tg:genre (existing nodes and text values)
CheckIsPublic
Checks for already published objects.
CheckReferences
Checks whether any referenced objects are NOT contained in the Edition/Collection that is being published.
GetPids
Fetches PIDs for every object’s TextGrid URI using the GWDG Handle Service.
ModifyAndUpdate
Rewrites several URIs to PIDs, modifies all necessary object metadata and/or data, and finally updates everything by calling TG-crud#UPDATEMETADATA or TG-crud#UPDATE.
CopyElasticSearchIndex
Copies the search index to the public index database.
CopyRelationData
Copies the RDF relation data to the public RDF database.
MoveToStaticGridStorage
Moves all metadata and data to the public storage location.
UpdateTgauth
Just updates TG-auth by calling the method TG-auth#PUBLISH.
PublishComplete
PublishComplete is called just to ensure the operation has finished successfully, and to report to logfiles, etc.
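
The "existing node and text value" checks listed under PublishCheckEdition can be illustrated with a small script. This is only a sketch under assumptions, not the module's actual code: the namespace URI bound to the tg prefix, the helper name missing_fields, and the sample metadata (including the URI value) are hypothetical. The dateOfCreation rule is left out because the attribute names are not given here.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the "tg" prefix in TextGrid metadata.
NS = {"tg": "http://textgrid.info/namespaces/metadata/core/2010"}

# Required paths from the list above, relative to the /tg:object root.
REQUIRED = {
    "edition": ["tg:edition/tg:isEditionOf", "tg:edition/tg:license"],
    "item": ["tg:item/tg:rightsHolder"],
    "collection": ["tg:collection/tg:collector"],
    "work": ["tg:work/tg:agent", "tg:work/tg:genre"],
}


def missing_fields(metadata_xml, kind):
    """Return the required paths that are absent or have no text value."""
    root = ET.fromstring(metadata_xml)
    missing = []
    for path in REQUIRED[kind]:
        nodes = root.findall(path, NS)
        # A field counts as present only if every match carries text.
        if not nodes or not all((n.text or "").strip() for n in nodes):
            missing.append("/tg:object/" + path)
    return missing


# Hypothetical sample: the license element exists but is empty.
SAMPLE = """<tg:object xmlns:tg="http://textgrid.info/namespaces/metadata/core/2010">
  <tg:edition>
    <tg:isEditionOf>textgrid:1234.0</tg:isEditionOf>
    <tg:license></tg:license>
  </tg:edition>
</tg:object>"""

print(missing_fields(SAMPLE, "edition"))
# ['/tg:object/tg:edition/tg:license']
```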

TGPublishWorldReadable

<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublishWorldReadable">
  <step class="textgrid.PublishStart">
    <step class="textgrid.PublishCheckWorldReadable">
      <step class="textgrid.ModifyAndUpdate">
        <step class="textgrid.MoveToStaticGridStorage">
          <step class="textgrid.UpdateTgauth">
            <step class="textgrid.PublishComplete" />
          </step>
        </step>
      </step>
    </step>
  </step>
</policy>
PublishStart
See above.
PublishCheckWorldReadable
Checks for correct WorldReadable Metadata.
ModifyAndUpdate
Rewrites several URIs to PIDs, modifies all necessary object metadata and/or data, and finally updates everything by calling TG-crud#UPDATEMETADATA or TG-crud#UPDATE.
MoveToStaticGridStorage
Moves all metadata and data to the public storage location.
UpdateTgauth
See above.
PublishComplete
See above.

TGPublishSandboxData

<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublishSandboxData">
  <step class="textgrid.PublishStart">
    <step class="textgrid.UpdateTgauth">
      <step class="textgrid.ReleaseNearlyPublishedRelation">
        <step class="textgrid.PublishComplete" />
      </step>
    </step>
  </step>
</policy>
PublishStart
See above.
UpdateTgauth
See above.
ReleaseNearlyPublishedRelation
Releases the nearlyPublished relation in the TG-rep’s Sesame triple store and the ElasticSearch database – so that the object is viewable and searchable in the public TextGrid repository browser and TextGridLab search GUI.
PublishComplete
See above.

TGCopy

<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGCopy">
  <step class="textgrid.StartCopy">
    <step class="textgrid.GatherObjectUris">
      <step class="textgrid.ModifyAndCreate">
        <step class="textgrid.CopyComplete" />
      </step>
    </step>
  </step>
</policy>
CopyStart
Just marks the copy process as started.
GatherObjectUris
Gets all URIs referenced by the objects in the given URI list (recursively, through all aggregations/editions/collections), and adds every URI to the PublishResponse object list.
ModifyAndCreate
Retrieves the object for every URI in the PublishResponse object list from TG-crud, rewrites aggregation lists and other included URIs, and creates a new TextGrid object in the given project.
CopyComplete
CopyComplete is called just to ensure the operation has finished successfully, and to report to logfiles, etc.
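
The recursive collection done by GatherObjectUris can be pictured as a simple graph traversal. The sketch below is illustrative only: the function name gather_object_uris, the members_of dictionary (a stand-in for looking up aggregation contents via TG-crud), and the sample URIs are all hypothetical.

```python
def gather_object_uris(start_uris, members_of):
    """Collect the start URIs and everything they aggregate, recursively.

    members_of maps an aggregation/edition/collection URI to the URIs it
    contains; URIs without an entry are treated as plain objects.
    Each URI is listed once, in the order it is first reached.
    """
    gathered = []
    seen = set()

    def visit(uri):
        if uri in seen:
            return  # already collected; also guards against cycles
        seen.add(uri)
        gathered.append(uri)
        for child in members_of.get(uri, []):
            visit(child)

    for uri in start_uris:
        visit(uri)
    return gathered


# Hypothetical data: an edition containing one file and one aggregation.
MEMBERS = {
    "textgrid:edition.0": ["textgrid:file1.0", "textgrid:agg.0"],
    "textgrid:agg.0": ["textgrid:file2.0", "textgrid:file1.0"],
}

print(gather_object_uris(["textgrid:edition.0"], MEMBERS))
# ['textgrid:edition.0', 'textgrid:file1.0', 'textgrid:agg.0', 'textgrid:file2.0']
```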

config.xml

config.xml, or in this case config__tgpublish.xml, is the main koLibRI configuration file. The ProcessStarters and all ActionModules are configured here, along with some global settings. The TextGrid-specific ProcessStarters and ActionModules are all described inside the config file (see the description tags), so we simply refer to the file itself:

XML config file tags that are not documented are neither used nor needed by TG-publish; you can look up their meaning in the koLibRI documentation or the main koLibRI configuration file:

Logging

At the moment, koLibRI logs to stdout (see e.g. Tomcat’s catalina.out log) and also to a logfile at the configured logfile location (see config.xml). The logfile’s name contains a timestamp, and a new file is created every time the koLibRI Workflow Tool is started.