Overview
Two configuration files shipped with the tgpublish koLibRI module need attention: config.xml and policies.xml. The latter defines the TG-publish workflows and leads the publishing process through the different ActionModules, which are normally processed one after another. These modules can share information using a custom data object. For more detailed information, please see the work-in-progress version 2.0 referred to above.
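The idea of ActionModules processed one by one while sharing a single data object can be sketched in a few lines of Python. This is only an illustration, not koLibRI's Java API: `SharedData`, `run_policy` and the two module functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SharedData:
    """Hypothetical stand-in for the custom data object shared by all modules."""
    uris: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

# An "action module" is modelled here as a plain function that updates SharedData.
ActionModule = Callable[[SharedData], None]

def publish_start(data: SharedData) -> None:
    data.log.append("publish started")

def publish_complete(data: SharedData) -> None:
    data.log.append(f"publish complete ({len(data.uris)} objects)")

def run_policy(modules: List[ActionModule], data: SharedData) -> SharedData:
    """Run the modules one after another, handing each the same data object."""
    for module in modules:
        module(data)
    return data

if __name__ == "__main__":
    result = run_policy([publish_start, publish_complete],
                        SharedData(uris=["textgrid:example"]))
    print(result.log)  # ['publish started', 'publish complete (1 objects)']
```

Because every module receives the same object, a later module (such as the completion step here) can rely on what earlier ones stored in it.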
TG-publish is pre-configured to expect its configuration files in /etc/dhrep/tgpublish/ and to log to /var/log/dhrep/tgpublish/.
So please create the appropriate folders and then copy the config files from
into the config folder. Do not forget to set the permissions and owner settings so that Tomcat can write to it!
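The setup steps above can be scripted. The following Python sketch is only an illustration under stated assumptions: the function name `prepare_tgpublish_dirs` is hypothetical, you must pass in the folder that holds the shipped config files, and the Tomcat user name depends on your installation. The `root` parameter lets you try it in a scratch directory before running it against `/`.

```python
import shutil
from pathlib import Path
from typing import Optional, Tuple

def prepare_tgpublish_dirs(root: str, config_src: str,
                           owner: Optional[str] = None) -> Tuple[Path, Path]:
    """Create the TG-publish config and log folders below `root` and copy the config files.

    root:       "/" on a real server; a scratch directory when trying it out.
    config_src: folder containing the config files shipped with tgpublish.
    owner:      the user Tomcat runs as (site-specific; an assumption here).
    """
    conf_dir = Path(root, "etc", "dhrep", "tgpublish")
    log_dir = Path(root, "var", "log", "dhrep", "tgpublish")
    for d in (conf_dir, log_dir):
        d.mkdir(parents=True, exist_ok=True)
    # Copy every regular file from the shipped config folder (subfolders are skipped).
    for src in Path(config_src).iterdir():
        if src.is_file():
            shutil.copy2(src, conf_dir / src.name)
    if owner is not None:
        # Give Tomcat ownership so it can write; this usually requires root privileges.
        for d in (conf_dir, log_dir):
            shutil.chown(d, user=owner)
            for child in d.iterdir():
                shutil.chown(child, user=owner)
    return conf_dir, log_dir

# Example (as root, with placeholder source path and user name):
# prepare_tgpublish_dirs("/", "/path/to/shipped/config", owner="tomcat")
```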
More config files may be needed from
Copy at least the following files, and add more if file-not-found errors occur:
- dias_formatregistry.xml
- jhove.conf
- ...
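To see which of these files are still missing from the config folder, a small check like the following may help. It is a hypothetical helper, not part of TG-publish, and it only knows the files named above, so extend `REQUIRED_FILES` when further file-not-found errors show up.

```python
from pathlib import Path
from typing import List, Sequence

# Files named in the list above; extend when file-not-found errors occur.
REQUIRED_FILES = ["dias_formatregistry.xml", "jhove.conf"]

def missing_config_files(conf_dir: str = "/etc/dhrep/tgpublish",
                         required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the names from `required` that are not regular files in conf_dir."""
    return [name for name in required if not Path(conf_dir, name).is_file()]
```

An empty result means every listed file is present.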
policies.xml
There are currently four policies for use with the TextGridLab:
- TGPublish
- TGPublishWorldReadable
- TGPublishSandboxData, and
- TGCopy
All these workflows (or policies) are described in the policies.xml file and define the order in which the koLibRI ActionModules are processed. Each of the four workflows is started by a ProcessStarter using the current configuration (see below). TGPublish is used from within the TG-lab via the Publish Perspective. TGPublishWorldReadable is also used from within the TG-lab, but applies only to single technical files such as XML Schema documents, XSLT stylesheets, TextGrid workflow documents, etc.; which file types can be published worldReadable can be checked by requesting the worldReadable list. TGPublishSandboxData is used to finally publish objects that were imported into the TextGrid Sandbox, e.g. by the Import Tool External (koLibRI). Last but not least, TGCopy is used from within the TG-copy workflow to copy TextGrid objects from either the public or the non-public repository into one's own projects for further processing; this includes rewriting URIs, among other things.
These four policies are explained in detail below.
TGPublish
<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublish">
  <step class="textgrid.PublishStart">
    <step class="textgrid.PublishCheckEdition">
      <step class="textgrid.CheckIsPublic">
        <step class="textgrid.CheckReferences">
          <step class="textgrid.GetPids">
            <step class="textgrid.ModifyAndUpdate">
              <step class="textgrid.CopyElasticSearchIndex">
                <step class="textgrid.CopyRelationData">
                  <step class="textgrid.MoveToStaticGridStorage">
                    <step class="textgrid.UpdateTgauth">
                      <step class="textgrid.PublishComplete" />
                    </step>
                  </step>
                </step>
              </step>
            </step>
          </step>
        </step>
      </step>
    </step>
  </step>
</policy>
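Because each step is nested inside its predecessor, the nesting defines the processing order. As a rough illustration (not how koLibRI itself reads the file), this Python sketch uses the standard `xml.etree.ElementTree` module to walk the TGPublish policy and list its steps in order. The function name `step_order` is hypothetical, and it assumes each step contains at most one nested step, which holds for all policies in this document.

```python
import xml.etree.ElementTree as ET
from typing import List

TGPUBLISH = """<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublish">
  <step class="textgrid.PublishStart">
    <step class="textgrid.PublishCheckEdition">
      <step class="textgrid.CheckIsPublic">
        <step class="textgrid.CheckReferences">
          <step class="textgrid.GetPids">
            <step class="textgrid.ModifyAndUpdate">
              <step class="textgrid.CopyElasticSearchIndex">
                <step class="textgrid.CopyRelationData">
                  <step class="textgrid.MoveToStaticGridStorage">
                    <step class="textgrid.UpdateTgauth">
                      <step class="textgrid.PublishComplete" />
                    </step>
                  </step>
                </step>
              </step>
            </step>
          </step>
        </step>
      </step>
    </step>
  </step>
</policy>"""

def step_order(policy_xml: str) -> List[str]:
    """Follow the chain of nested <step> elements and return their class names in order."""
    # Encode to bytes so the XML declaration's encoding is honoured by the parser.
    node = ET.fromstring(policy_xml.encode("utf-8"))
    order = []
    # Descend one level at a time; stop when a step has no nested step.
    while (node := node.find("step")) is not None:
        order.append(node.get("class"))
    return order

print(" -> ".join(step_order(TGPUBLISH)))
```

Running it prints the eleven TGPublish steps, from `textgrid.PublishStart` to `textgrid.PublishComplete`, in the order they are processed. The same function works for the other three policies below.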
- PublishStart
- Marks the publish process as started.
- PublishCheckEdition
- Checks for correct Edition/Collection metadata. At the moment, the following mandatory elements are checked for existence. These requirements can be configured in the TG-publish config file.
- Edition required fields are /tg:object/tg:edition/tg:isEditionOf and /tg:object/tg:edition/tg:license (existing node and text value)
- Item required field is /tg:object/tg:item/tg:rightsHolder (existing node and text value)
- Collection required field is /tg:object/tg:collection/tg:collector (existing node and text value)
- Work required fields are /tg:object/tg:work/tg:agent and /tg:object/tg:work/tg:dateOfCreation (either both attributes must exist and have a value, OR the element must have a text value), and /tg:object/tg:work/tg:genre (existing node and text value)
- CheckIsPublic
- Checks for already published objects.
- CheckReferences
- Checks whether any referenced objects are NOT contained in the Edition/Collection being published.
- GetPids
- Fetches PIDs for every object’s TextGrid URI using the GWDG Handle Service.
- ModifyAndUpdate
- Rewrites several URIs to PIDs, modifies all necessary object metadata and/or data, and finally updates everything by calling TG-crud#UPDATEMETADATA or TG-crud#UPDATE.
- CopyElasticSearchIndex
- Copies the search index to the public index database.
- CopyRelationData
- Copies the RDF relation data to the public RDF database.
- MoveToStaticGridStorage
- Moves all metadata and data to the public storage location.
- UpdateTgauth
- Updates TG-auth by calling the method TG-auth#PUBLISH.
- PublishComplete
- PublishComplete is called to ensure the operation has finished successfully and to report to the logfiles, etc.
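The PublishCheckEdition rules above amount to "node exists and has a value" checks. The following Python sketch shows the idea with `xml.etree.ElementTree`; it is not TG-publish's implementation, which reads its requirements from the TG-publish config file. Two assumptions: the `tg` prefix is bound to `http://textgrid.info/namespaces/metadata/core/2010` (the usual TextGrid metadata namespace; check your data), and because the attributes of `tg:agent` and `tg:dateOfCreation` are not named above, the sketch simply accepts an element whose attributes are all non-empty as an alternative to a text value.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the usual TextGrid metadata namespace; adjust if your data differs.
NS = {"tg": "http://textgrid.info/namespaces/metadata/core/2010"}

# Required paths per object type, relative to /tg:object (from the list above).
REQUIRED = {
    "edition": ["tg:edition/tg:isEditionOf", "tg:edition/tg:license"],
    "item": ["tg:item/tg:rightsHolder"],
    "collection": ["tg:collection/tg:collector"],
    "work": ["tg:work/tg:agent", "tg:work/tg:dateOfCreation", "tg:work/tg:genre"],
}

# Fields that may carry their value in attributes instead of text (simplified rule).
ATTRIBUTE_OK = {"tg:work/tg:agent", "tg:work/tg:dateOfCreation"}

def has_value(elem: ET.Element, allow_attributes: bool) -> bool:
    """True if elem has non-empty text, or (if allowed) only non-empty attributes."""
    if elem.text and elem.text.strip():
        return True
    return (allow_attributes and bool(elem.attrib)
            and all(v.strip() for v in elem.attrib.values()))

def missing_fields(metadata_xml: str) -> List[str]:
    """Return the required paths that are absent or empty in a tg:object document."""
    root = ET.fromstring(metadata_xml.encode("utf-8"))
    missing = []
    for obj_type, paths in REQUIRED.items():
        if root.find(f"tg:{obj_type}", NS) is None:
            continue  # the object is not of this type
        for path in paths:
            elem = root.find(path, NS)
            if elem is None or not has_value(elem, path in ATTRIBUTE_OK):
                missing.append(path)
    return missing
```

For an edition whose `tg:isEditionOf` is filled but whose `tg:license` element is empty, `missing_fields` returns `['tg:edition/tg:license']`.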
TGPublishWorldReadable
<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublishWorldReadable">
  <step class="textgrid.PublishStart">
    <step class="textgrid.PublishCheckWorldReadable">
      <step class="textgrid.ModifyAndUpdate">
        <step class="textgrid.MoveToStaticGridStorage">
          <step class="textgrid.UpdateTgauth">
            <step class="textgrid.PublishComplete" />
          </step>
        </step>
      </step>
    </step>
  </step>
</policy>
- PublishStart
- See above.
- PublishCheckWorldReadable
- Checks for correct WorldReadable metadata.
- ModifyAndUpdate
- Rewrites several URIs to PIDs, modifies all necessary object metadata and/or data, and finally updates everything by calling TG-crud#UPDATEMETADATA or TG-crud#UPDATE.
- MoveToStaticGridStorage
- Moves all metadata and data to the public storage location.
- UpdateTgauth
- See above.
- PublishComplete
- See above.
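ModifyAndUpdate rewrites URIs to PIDs (in the TGPublish policy these PIDs are fetched by the GetPids step). A minimal Python sketch of such a rewrite with a regular expression, assuming a lookup table from URI to PID is already available, might look like this. The URI pattern, the function name `rewrite_uris` and the example PID are hypothetical, and the module's other duties (modifying metadata, updating objects via TG-crud) are not shown.

```python
import re
from typing import Dict

# Hypothetical pattern for TextGrid URIs such as "textgrid:abc1.0" (optional revision).
TG_URI = re.compile(r"textgrid:[A-Za-z0-9]+(?:\.\d+)?")

def rewrite_uris(text: str, pid_for: Dict[str, str]) -> str:
    """Replace every TextGrid URI that has an entry in pid_for; leave all others untouched."""
    return TG_URI.sub(lambda m: pid_for.get(m.group(0), m.group(0)), text)
```

For example, `rewrite_uris('<ref target="textgrid:abc1.0"/>', {"textgrid:abc1.0": "hdl:EXAMPLE-PID"})` returns `'<ref target="hdl:EXAMPLE-PID"/>'`. Looking up each match in a table, instead of doing one string replacement per URI, keeps a URI from being matched inside a longer one.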
TGPublishSandboxData
<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGPublishSandboxData">
  <step class="textgrid.PublishStart">
    <step class="textgrid.UpdateTgauth">
      <step class="textgrid.ReleaseNearlyPublishedRelation">
        <step class="textgrid.PublishComplete" />
      </step>
    </step>
  </step>
</policy>
- PublishStart
- See above.
- UpdateTgauth
- See above.
- ReleaseNearlyPublishedRelation
- Releases the nearlyPublished relation in the TG-rep’s Sesame triple store and in the ElasticSearch database, so that the object becomes viewable and searchable in the public TextGrid repository browser and in the TextGridLab search GUI.
- PublishComplete
- See above.
TGCopy
<?xml version="1.0" encoding="UTF-8"?>
<policy name="TGCopy">
  <step class="textgrid.StartCopy">
    <step class="textgrid.GatherObjectUris">
      <step class="textgrid.ModifyAndCreate">
        <step class="textgrid.CopyComplete" />
      </step>
    </step>
  </step>
</policy>
- StartCopy
- Marks the copy process as started.
- GatherObjectUris
- Collects all URIs referenced by the objects in the given URI list (recursively, through all aggregations, editions, and collections) and adds every URI to the PublishResponse object list.
- ModifyAndCreate
- Retrieves the object for every URI in the PublishResponse object list from TG-crud, rewrites aggregation lists and other included URIs, and creates a new TextGrid object in the given project.
- CopyComplete
- CopyComplete is called to ensure the operation has finished successfully and to report to the logfiles, etc.
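GatherObjectUris collects URIs recursively through aggregations, editions, and collections. The sketch below shows one way to do such a traversal in Python. It is iterative (breadth-first) rather than recursive, and it keeps a visited set so that shared members are listed only once and cyclic aggregations cannot loop forever. `gather_object_uris` and the `members_of` callback are hypothetical; in TG-publish the member lists would come from TG-crud.

```python
from collections import deque
from typing import Callable, Iterable, List

def gather_object_uris(start_uris: Iterable[str],
                       members_of: Callable[[str], Iterable[str]]) -> List[str]:
    """Return the start URIs plus every URI reachable through aggregations.

    members_of(uri) yields the member URIs of an aggregation, edition or
    collection, and nothing for plain objects (hypothetical callback).
    """
    result: List[str] = []
    visited = set()
    queue = deque(start_uris)
    while queue:
        uri = queue.popleft()
        if uri in visited:
            continue  # shared members or cycles: handle each URI only once
        visited.add(uri)
        result.append(uri)
        queue.extend(members_of(uri))
    return result

if __name__ == "__main__":
    members = {
        "textgrid:edition1": ["textgrid:a", "textgrid:agg1"],
        "textgrid:agg1": ["textgrid:b", "textgrid:a"],
    }
    print(gather_object_uris(["textgrid:edition1"], lambda u: members.get(u, [])))
    # ['textgrid:edition1', 'textgrid:a', 'textgrid:agg1', 'textgrid:b']
```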
config.xml
config.xml (in this case config__tgpublish.xml) is the main koLibRI configuration file. It configures the ProcessStarters, all the ActionModules, and some global settings. The TextGrid-specific ProcessStarters and ActionModules are all described inside the config file itself (see the description tags), so we simply refer to the file:
Config file tags that are not documented there are not used (or needed) by TG-publish; you can look up their meaning in the koLibRI documentation or in the main koLibRI configuration file:
Logging
Currently, koLibRI logs to stdout (see, e.g., Tomcat’s catalina.out log) and also to a logfile in the logfile location configured in config.xml. The logfile’s name contains a timestamp, and a new file is created every time the koLibRI Workflow Tool is started.
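Because every start of the Workflow Tool creates a new timestamped logfile, the current log is usually the newest file in the log folder. This hypothetical Python helper picks it by modification time rather than by parsing the timestamp out of the file name, whose exact format may differ between setups. The default folder is the pre-configured /var/log/dhrep/tgpublish/; pass a different path if config.xml points elsewhere.

```python
from pathlib import Path
from typing import Optional

def newest_logfile(log_dir: str = "/var/log/dhrep/tgpublish") -> Optional[Path]:
    """Return the most recently modified file in log_dir, or None if it holds no files."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```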