Import Using the koLibRI JAR File

Java Version

We recommend using a Java 8 JDK right from the start!

Downloading the Software

First, please download (i) the ZIP file asset kolibri-addon-textgrid-import.zip, which contains the configuration files and folders, and (ii) the koLibRI command line Java application kolibri-cli.jar, prepared for TextGrid import usage. You can find both files on the latest koLibRI Release Page under Assets > Other.

Please extract the file kolibri-addon-textgrid-import.zip into your preferred working folder; it contains all the config files and templates you need.

The file kolibri-cli-[version].jar is the application that manages all import processes. Please put this file directly into the main kolibri-addon-textgrid-import folder.

You should now have a folder structure like the following (only files concerning the import are shown):

📁 kolibri-addon-textgrid-import
  📁 config
    📁 koLibRI
    📁 rewrite
    📁 transformation
    📄 textgrid_metadata_template.xml
    📄 tglab_config_test.xml
    📄 tglab_config.xml
    📄 tgrep_config_test.xml
    📄 tgrep_config.xml
  📁 folders
    📁 hotfolder
      📁 data
    📁 log
    📁 metadata-responses
    📁 temp
  📄 kolibri-cli-[version].jar
  📄 koLiGO.bat
  📄 koLiGO.sh
  📄 koLiPRO.sh

Testing the Installation

Step-by-step – Testing the Test Server

Open the TG-lab, set the confserv to https://test.textgridlab.org/1.0/confserv (see Preferences > TextGridLab Server / Proxy > Configuration Service URI), and let the TG-lab restart.
Log in to the restarted TG-lab, get a session ID for the test server (see Help > Authentication > The current session ID), and copy it. It should look something like: JTSFa5Gw4s2SAasi6Efo69V3r3j9oRH6zC9SxtaZatbMKQhxHm2RVDRym34AQNM3HG79.
Create a new project in the TG-lab (File > New Project) and copy its project ID as well. It should look something like: TGPR-8514a3cf-c880-4338-8f3c-5e32b3ed678e.
Paste the two values into your TG-lab import configuration file tglab_config_test.xml as the values of projectId and rbacSessionId (replacing PLEASE INSERT PROJECT ID HERE and PLEASE INSERT RBAC SESSION ID HERE).
Copy a file or a folder as test data into the folders/hotfolder/data folder. Your files will only be available to you and to everyone you have added to your project via the TG-lab's User Management.
Start the import as described in Starting the koLibRI Workflow Tool.
The console messages should look like this:

[INFO]     <DONE>     data  -->  All files have been successfully submitted in 41 seconds
[INFO]     The process queue has been processed. Total time elapsed: 837 milliseconds
[INFO]     Everything has been done! This process was logged to file ./kolibri-addon-textgrid-import/folders/log/kolibri.log

If you get no ERRORs in the console output, you should see all the imported files and folders in the TG-lab’s Navigator as objects of your project.

Configuration

Choosing a Configuration File from the Templates

There are four template configuration files in the config/ folder. Please choose one of them according to your import plans:

tglab_config.xml

is used to import data into the TG-lab, so that you can work with it inside the chosen TextGrid project. The data will not be visible to anyone except you and the users you decide to share it with via the TG-lab project management. All non-public services are preconfigured in this file.

tgrep_config.xml

is used to import directly into the TG-rep. Your data becomes publicly visible immediately: at first only in the TextGrid Repository Sandbox, and after final publication for everyone. All public services are preconfigured in this file.

Use the two _test files (tglab_config_test.xml and tgrep_config_test.xml) to import to the test server https://test.textgridlab.org. They are preconfigured with the test server's non-public and public services, respectively.

Editing the Config File

Commonly Used Settings

The commonly used settings are in the <common><property> section. Please set the config values described below accordingly.

<field>defaultPolicyName</field>

Setting the import policy: The parameter defaultPolicyName accepts the policies listed below. Edit the config file of your choice (tglab_config.xml or tgrep_config.xml) and choose a value. Depending on your import policy, further configuration values must be set; please see below. aggregation_import is the default value and is already set.

aggregation_import

This policy automatically creates TextGrid metadata for each file from its file name and detected file format. For every folder, a TextGrid aggregation is created and imported, so the folder structure in TextGrid mirrors that of the import folder. All root folders, that is, every folder contained directly in the data folder, are imported as collections; all child folders become aggregations.

prepare_aggregation_import

Does everything aggregation_import does, except the last step: the files are NOT submitted to TextGrid! Use this policy to test or refine the generated metadata files before the actual import. It gets URIs from TG-crud, creates aggregations for every folder, and generates PIDs (if so configured; please use this only for the public TG-rep). The aggregation and object metadata files can then be taken from the corresponding temp folder, edited (please DO NOT change the file names of the data and metadata files!), put into a new hotfolder, and imported into the TG-rep or TG-lab using the policy continue_import. You can also change the format of the aggregations to edition or collection. Please refer to the TextGrid Metadata Schema!

complete_import

With this policy, all given files are simply imported and no additional metadata is created, so you need a complete set of TextGrid objects including their TextGrid metadata. TextGrid URIs are taken from TG-crud whenever needed, so your files must reference each other (e.g. aggregation references) by local file paths. File extensions for existing TextGrid editions, collections, works, aggregations, XML and metadata files can be configured if needed, but it is recommended to keep the default ones.

prepare_complete_import

Does everything complete_import does, except the last step: the files are NOT submitted to TextGrid! Use this policy to test or refine your metadata after running all pre-flight tests and rewriting, before the actual import. It gets URIs from TG-crud, takes the data and metadata files from the hotfolder, and generates PIDs (if so configured; please use this only for the public TG-rep). The aggregation and object metadata files can then be taken from the corresponding temp folder, edited (please DO NOT change the file names of the data and metadata files!), put into a new hotfolder, and imported into the TG-rep or TG-lab using the policy continue_import. Please refer to the TextGrid Metadata Schema!

continue_import

Use this policy to continue a broken or stopped import (e.g. after an error). Simply configure the temp folder in which the files were processed as the hotfolder.

delete_import

Deletes an already imported set of objects from the sandbox again, using the TG-crud service directly. The objects can be given as a URI list (in a file) or by a root URI. Please see the configuration of the class DeleteFiles.

publish_import

Finally publishes an already imported set of objects, using the TG-publish service. The objects can be given as a URI list (in a file) or by a root URI. Please see the configuration of the class PublishFiles.

dfgviewermets_import

Takes one or more DFG Viewer METS files (according to the DFG Viewer METS Specification) as input, creates a folder structure from the physical and logical StructMap, and imports it into TextGrid. MODS and/or TEI metadata is mapped to TextGrid metadata via the existing MODS/TEI XSL transformation files, or via custom XSL files.
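As a minimal sketch, assuming the template files use the koLibRI property layout implied above (a `<property>` element with `<field>` and `<value>` children inside `<common>`; please compare with your own copy of the config file), switching to prepare_aggregation_import for a test run might look like this:

```xml
<common>
  <!-- other common properties omitted in this sketch -->
  <property>
    <!-- Import policy: one of the policy names listed above -->
    <field>defaultPolicyName</field>
    <value>prepare_aggregation_import</value>
  </property>
</common>
```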

<field>rbacSessionId</field> and <field>projectId</field>

Authentication and project settings: Please enter your TextGrid project ID as projectId and your session ID as rbacSessionId.

To get your TextGrid session ID, log in to the TextGridLab and choose Help and then Authentication from the TextGridLab's menu bar. Then copy The current session ID.

To get a TextGrid project ID, create a project within the TextGridLab: choose File and New Project from the menu bar and add the project's name and description. You can then copy the project ID (something like TGPR-aafea537-4daa-358a-c446-5901ee71d8d2) from the newly opened User Management view.
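A sketch of the two authentication properties, again assuming the `<property>`/`<field>`/`<value>` layout; the values shown are only the example IDs quoted in this document and must be replaced with your own project ID and a current session ID:

```xml
<property>
  <!-- Project ID copied from the TextGridLab's User Management view -->
  <field>projectId</field>
  <value>TGPR-aafea537-4daa-358a-c446-5901ee71d8d2</value>
</property>
<property>
  <!-- Session ID from Help > Authentication > The current session ID -->
  <field>rbacSessionId</field>
  <value>JTSFa5Gw4s2SAasi6Efo69V3r3j9oRH6zC9SxtaZatbMKQhxHm2RVDRym34AQNM3HG79</value>
</property>
```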

Aggregation Import Configuration

If you are using aggregation_import, just set up the configuration and data as described above and run koLibRI.

<field>hotfolderDir</field>

Choosing a hotfolder: ./folders/hotfolder/ is preconfigured as the hotfolder. Just copy the data you want to publish into its data/ folder. The data is copied before processing starts, so the original data will not be touched. If you have chosen aggregation_import as the policy, please put only ONE folder containing the files and folders to import into the hotfolder (this would be the already existing data/ folder). All those files will be imported into ONE TextGrid project as files and aggregations.
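For reference, the preconfigured hotfolder setting would look roughly like this, assuming the same `<property>` layout; the path is the default named above, and the hotfolder is expected to contain the single data/ folder:

```xml
<property>
  <!-- Assumed to resolve relative to the directory koLiGO is started from -->
  <field>hotfolderDir</field>
  <value>./folders/hotfolder/</value>
</property>
```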

If you want to import directly into the TextGridRep, please do not forget to add a rightsHolder to the textgrid_metadata_template file, and a collector and an id to the collections_metadata_template file.

Complete Import Configuration

If you are using complete_import, just set up the configuration and data as described above and run koLibRI.

<field>hotfolderDir</field>

Choosing a hotfolder: ./folders/hotfolder/ is preconfigured as the hotfolder. Just copy the data you want to publish into its data/ folder. The data is copied before processing starts, so the original data will not be touched. All data is imported exactly as you prepared it. Please note that you also need metadata files according to the TextGrid Metadata Schema. Everything else works as described for hotfolderDir in the aggregation import configuration.

<field>createNewRevisions</field>

Set this flag to true if you want to import new revisions of all your (existing) files to TG-lab or TG-rep.

Please note:

For a revision import, all objects to be revised must carry TextGrid URIs instead of local file paths in their TextGrid metadata. These must be revision URIs!

Having only the TextGrid URIs in the objects' metadata is not sufficient, due to TextGrid metadata validation! You need a complete <generated> tag in all metadata files. You can simply copy the <generated> tag you get from TG-crud for each object, or export your data from the TextGridRep or TextGridLab.

Only existing files can be revised; new files still have to be dealt with separately.

New PIDs are created for every new revision.
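A sketch of enabling revision imports, assuming createNewRevisions uses the same `<property>` layout as the other settings (check your config file for where this property actually sits):

```xml
<property>
  <!-- true: import new revisions of already existing TextGrid objects -->
  <field>createNewRevisions</field>
  <value>true</value>
</property>
```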

DFG Viewer METS Import Configuration

If you are using dfgviewermets_import, just set up the configuration and data as described above and run koLibRI.

<field>hotfolderDir</field>

Choosing a hotfolder: ./folders/hotfolder/ is preconfigured as the hotfolder. Put all your METS files directly into the hotfolder/ folder. For each METS file, a root Aggregation/Edition/Collection is created (see below). You can put more than one METS file into the hotfolder; koLibRI then processes the import concurrently with a configurable number of threads (please see the general configuration options in the koLibRI configuration file).

<field>rootAggregationMimetype</field>

DFG Viewer aggregations: For a DFG Viewer import you can choose the format of your root aggregation (there is one root aggregation per METS file). It can be imported as a TextGrid Aggregation (text/tg.aggregation+xml), Edition (text/tg.edition+tg.aggregation+xml), or Collection (text/tg.collection+tg.aggregation+xml).
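A sketch of importing every METS file's root aggregation as an Edition, assuming the `<property>` layout used elsewhere in the config file; the value must be one of the three formats listed above:

```xml
<property>
  <!-- text/tg.aggregation+xml, text/tg.edition+tg.aggregation+xml,
       or text/tg.collection+tg.aggregation+xml -->
  <field>rootAggregationMimetype</field>
  <value>text/tg.edition+tg.aggregation+xml</value>
</property>
```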

Please note: Custom XSLT stylesheets for metadata creation can be specified in the properties of <class name="actionmodule.textgrid.DfgViewerMetadataProcessor">.

Publish Configuration

In <modules><class name="actionmodule.textgrid.PublishFiles"> you can find the configuration for the final publication of sandboxed objects. Every koLibRI import is published to the sandbox first; to finally publish your objects afterwards, you must use the policy publish_import.

<field>objectUri</field>

To select the TextGrid objects to be published, please use one of the following: an import mapping file location, the (root) TextGrid URI of an object, or a project ID:

<value>file:./folders/temp/1470065621459_data_URI.imex</value> (URI mapping file)
<value>file:./folders/temp/1470065621459_data_PID.imex</value> (PID mapping file)
<value>textgrid:12345.0</value> (TextGrid URI)
<value>project:TGPR-f1867520-4a53-9ced-9da5-503762ba0f61</value> (project ID)

If you use a TextGrid URI as objectUri and it points to an edition or collection, all of its objects are published, including the edition or collection itself. If it references a single TextGrid item (no aggregation), only this item is published.

<field>dryrun</field>

Use this flag to check what will happen before anything is published (recommended!). Nothing will be published unless dryrun is set to false.
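A sketch of a first publishing dry run, assuming each setting is a `<property>` with `<field>` and `<value>` inside the PublishFiles class named above; the mapping file name is the example from this section, and defaultPolicyName must be set to publish_import:

```xml
<modules>
  <class name="actionmodule.textgrid.PublishFiles">
    <!-- Publish all objects listed in the URI mapping file of an earlier import -->
    <property>
      <field>objectUri</field>
      <value>file:./folders/temp/1470065621459_data_URI.imex</value>
    </property>
    <!-- Keep true to only check; set to false to actually publish -->
    <property>
      <field>dryrun</field>
      <value>true</value>
    </property>
  </class>
</modules>
```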

Delete Configuration

In <modules><class name="actionmodule.textgrid.DeleteFiles"> you can find the configuration for the deletion of objects. Data that has already been published can still be deleted if it was imported into the TextGrid Sandbox and has not yet been finally published using the publish_import policy. To delete objects (in general or from the sandbox), change the policy to delete_import.

<field>objectUri</field>

To select the TextGrid objects to be deleted, please use one of the following: an import mapping file location, the (root) TextGrid URI of an object, or a project ID:

<value>file:./folders/temp/1470065621459_data_URI.imex</value> (URI mapping file)
<value>file:./folders/temp/1470065621459_data_PID.imex</value> (PID mapping file)
<value>textgrid:12345.0</value> (TextGrid URI)
<value>project:TGPR-f1867520-4a53-9ced-9da5-503762ba0f61</value> (project ID)

If you use a TextGrid URI as objectUri and it points to an edition or collection, all of its objects are deleted, including the edition or collection itself. If it references a single TextGrid item (no aggregation), only this item is deleted.

<field>dryrun</field>

Use this flag to check what will happen before anything is deleted (recommended!). Nothing will be deleted unless dryrun is set to false.
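A corresponding sketch for deleting an edition or collection by its TextGrid URI, under the same layout assumption; textgrid:12345.0 is the placeholder URI from this section, and defaultPolicyName must be set to delete_import:

```xml
<modules>
  <class name="actionmodule.textgrid.DeleteFiles">
    <!-- Root URI: the aggregation and all of its objects are deleted -->
    <property>
      <field>objectUri</field>
      <value>textgrid:12345.0</value>
    </property>
    <!-- Keep true to only check; set to false to actually delete -->
    <property>
      <field>dryrun</field>
      <value>true</value>
    </property>
  </class>
</modules>
```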

Project Specific Landing Page and Project Metadata

First Special File: README.md

You can create a landing page for your project on textgridrep.org with a file called README.md (the title in its metadata must be README.md!) whose format is set to text/markdown. Publish it in the project's root, either as technical metadata from within the TextGridLab or with koLibRI. The file is then shown embedded at the bottom of the project overview (see example).

The README.md file must be written in Markdown. If you use TG-import, its metadata file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<object xmlns="http://textgrid.info/namespaces/metadata/core/2010"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://textgrid.info/namespaces/metadata/core/2010
    https://www.textgridlab.org/schema/textgrid-metadata_2010.xsd">

    <generic>
        <provided>
            <title>README.md</title>
            <format>text/markdown</format>
        </provided>
    </generic>

    <item>
        <rightsHolder id="">***mandatory for publishing only**</rightsHolder>
    </item>
</object>

Second Special File: portalconfig.xml

You can also change the project-specific presentation page and the project description shown in the project overview listing on textgridrep.org.

To do so, put a file named portalconfig.xml (the title in its metadata must be portalconfig.xml!) with its format set to text/tg.portalconfig+xml into the root of your project, and publish it as technical metadata from within the TextGridLab (or with TG-import):

<?xml version="1.0" encoding="UTF-8"?>
<portalconfig
  xmlns="http://textgrid.info/namespaces/metadata/portalconfig/2020-06-16"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://textgrid.info/namespaces/metadata/portalconfig/2020-06-16
    https://textgridlab.org/schema/textgrid-portalconfig_2020-06-16.xsd">

  <!-- Name and description of this project: How should it be listed in textgridrep.org/projects -->
  <name>Owl Archive</name>
  <description>A collection of owls</description>
  <!-- Avatar is a TextGrid URI pointing to a published image, it will be shown in 
    250x250px -->
  <avatar>textgrid:3t9n4.0</avatar>
  <!-- this will overwrite the default xml stylesheet -->
  <xslt>
    <!-- for html rendering -->
    <html>textgrid:3vb1m.4</html>
  </xslt>
</portalconfig>

If you use TG-import, the metadata file should then look like this:

<?xml version="1.0" encoding="UTF-8"?>
<object xmlns="http://textgrid.info/namespaces/metadata/core/2010"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://textgrid.info/namespaces/metadata/core/2010
    https://www.textgridlab.org/schema/textgrid-metadata_2010.xsd">

    <generic>
        <provided>
            <title>portalconfig.xml</title>
            <format>text/tg.portalconfig+xml</format>
        </provided>
    </generic>

    <item>
        <rightsHolder id="">***mandatory for publishing only**</rightsHolder>
    </item>

</object>

This creates an entry in the project overview listing based on the information in the portalconfig file, and also modifies the project's landing page.

Configuring the Rewriting of Local File Paths and TextGrid URIs

Two modules rewrite references in TEI files: RenameAndRewrite rewrites local file paths to TextGrid URIs, and GetPidsAndRewrite rewrites TextGrid URIs to Handle PIDs. This TEI rewriting can be configured via custom rewrite files (such as config/rewrite/grenzboten-tei.xml): specify an XML configuration in your rewrite file and reference it in the import configuration field teiRewriteSpec, e.g. with the value file:./config/rewrite/grenzboten-tei.xml#tei. The <rw:xmlConfiguration xml:id="tei" ...> is then used for rewriting local file paths or TextGrid URIs to TextGrid URIs or Handle PIDs, respectively.
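A sketch of the teiRewriteSpec entry, assuming it is written in the same `<property>` layout as the other settings; which section or module class it belongs to is not stated here, so check the description tags in your config file. The fragment after `#` selects the `rw:xmlConfiguration` with the matching `xml:id`:

```xml
<property>
  <!-- Rewrite file plus the xml:id of the rw:xmlConfiguration to use -->
  <field>teiRewriteSpec</field>
  <value>file:./config/rewrite/grenzboten-tei.xml#tei</value>
</property>
```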

Editing the Metadata Template File

The metadata template file textgrid_metadata_template.xml is used by the module textgrid.TextgridMetadataProcessor, which some policies such as aggregation_import and dfgviewermets_import use to create metadata for every file to be imported! You can edit the metadata in this file according to the TextGrid Metadata Schema; metadata that does not conform to the schema will not be accepted.

Logging and Keeping

All imports are logged to the file folders/log/kolibri.log. Please keep all folders inside folders/temp/, and especially all files with the suffix _URI.imex, for later use with the publication or deletion policies. If PIDs are created, the PID mapping is stored in _PID.imex files. This file format is also used by the TG-lab import and export module.

Change More Parameters?

DON'T!

More information on every config value can be found in the description tags of each value in the config file's module class definitions. Please do not change anything else unless you are really sure about it!

Hints and Tricks

If the hotfolder contains only files, the import does nothing, because koLibRI only imports the directory WITHIN the hotfolder. If you also want to import files lying directly in the hotfolder, set the readDirectoriesOnly flag of processstarter.MonitorHotfolder to FALSE! Beware: all rewriting is then restricted to single files (so effectively no rewriting happens at all!), because every file is handled one after another.
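A sketch of this setting, assuming processstarter.MonitorHotfolder is configured as a `<class>` element with `<property>` children like the action modules above (its exact position in the config file is an assumption, so locate the existing class entry rather than adding a new one):

```xml
<class name="processstarter.MonitorHotfolder">
  <property>
    <!-- false: also import files lying directly in the hotfolder;
         references between files are then no longer rewritten -->
    <field>readDirectoriesOnly</field>
    <value>false</value>
  </property>
</class>
```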

Starting the koLibRI Workflow Tool

Once everything is configured correctly and all data is copied, koLibRI can be started. Change into your working directory kolibri-addon-textgrid-import, which contains the JAR file, the config and folders directories, and the koLiGO scripts, and type

./koLiGO.sh config/tglab_config.xml

in a Linux console or macOS terminal, or

koLiGO.bat config\tglab_config.xml

in a Windows/DOS command shell.

For the aggregation_import policy, you can use a special Linux Bash import script that shows progress bars in the console. This is very useful for monitoring the import of large amounts of data or huge numbers of files. The process output is then logged to a file nohup.out, which the script reads to display the progress bars. Please use it as follows instead of the koLiGO.sh script:

nohup ./koLiGO.sh config/tgrep_config.xml & sleep 2; ./koLiPro.sh

Please do not forget to delete old nohup.out files before starting a new import process!

If your koLibRI import needs more memory, you may get an error like java.lang.OutOfMemoryError: GC overhead limit exceeded. In that case, please increase the -Xmx value in your koLiGO script. The right value depends on your computer's memory; try 4096M or even more.

You can check the status of your imports either in the TextGridLab project you imported into or in the TextGridRep Sandbox, depending on your configuration. To ensure the correct charset is used (depending on your local charset configuration, special characters such as ö, ä, ü may otherwise not be processed correctly), the -D options in the koLiGO scripts already set the encoding to UTF-8. Furthermore, the scripts raise the default limit of 50,000 XML child objects to 500,000, as needed for handling projects with many objects.