ETL jobs

Required Permission: Jobs management (Read more about Permissions)

Groovy scripts can be used in ETL scripts so it's possible to write scripts to delete files on local disk (and other potentially dangerous tasks). Please be careful when writing scripts and with granting the Jobs management permission to users.

Table of contents

The Scriptella engine with several custom drivers is used as the ETL module in Graphlytic. Please see the reference documentation of Scriptella for details : http://scriptella.org/reference/

ETL engine contains several drivers for database connection and data manipulation (JDBC): Postgresql, MySQL, MSSQL, Oracle, DB2, Derby, H2, HSQLDB, Sybase, XSL. Not all JDBC drivers are included in Graphlytic and can be included in a custom build of Graphlytic. Please contact us for more information.

ETL engine contains also other drivers like CSV, Neo4j, Mail, Groovy, Log, Text, XPath and more. Using the combination of these drivers it's possible to create many automatization scenarios for data processing, pattern searching, logging or notifications.

ETL jobs are XML files that can be executed on demand, planned to execute at a defined time or executed periodically (with CRON like expressions).

See the examples here for more information.

Create a new ETL job

  1. Main menu (top right) - Page Jobs

  2. Click on create new job

  3. Fill the form:

    • name of job

    • if there is need to run job periodically enter "cron expression" into field "New trigger" and click on button "Add". If there is need to remove runner, select "X" icon. Example of cron expression:

      • 0 45 13 * * ? - starts every day at 13:45

      • 0 0/15 * * * ? - starts every 15 minutes

      • 0/30 * * * * ? - starts every 30 seconds

    • enter script text into area "script" (see example of ETL job in next paragraph)

    • confirm by pressing "Create job".

Example of an ETL job

Write information to the log file

This ETL job writes some information into the log file.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Test log</description>
<connection id="logInfo" driver="log">
level=INFO
</connection>
<connection id="groovy" driver="script">language=groovy</connection>
<script connection-id="groovy">
etl.globals['sysStart'] = new Date().format("yyyy-MM-dd'T'HH:mm:ss")
</script>
<script connection-id="logInfo">
start: ${etl.globals['sysStart']}
</script>
</etl>

Using predefined properties in ETL jobs

This ETL job shows how to use internally or externally defined properties in an ETL job.

Let's say we have a property file properties.conf with these values:

properties.conf
neo4j.connector.username=neo4j
neo4j.connector.password=admin

The external property file can be then used in a job:

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Use internal and external properties in a job</description>
<properties>
<include href="/path/to/external/property/file/properties.conf"/>
neo4j.connector.bolt=bolt://localhost:7687
neo4j.script.prop=some_prop_name
</properties>
<connection id="neo4j" driver="neo4j" url="$neo4j.connector.bolt" user="$neo4j.connector.username" password="$neo4j.connector.password"/>
<script connection-id="neo4j">
MATCH (n) SET n.$neo4j.script.prop="some value"
</script>
</etl>

Cron expressions

A cron expression is a string of 6 or 7 fields separated by spaces. Fields can contain any of the allowed values, along with various combinations of the allowed special characters for that field. The fields are as follows:

Field Name

Mandatory

Allowed Values

Allowed Special Characters

Seconds

YES

0-59

, - * /

Minutes

YES

0-59

, - * /

Hours

YES

0-23

, - * /

Day of month

YES

1-31

, - * ? / L W

Month

YES

1-12 or JAN-DEC

, - * /

Day of week

YES

1-7 or SUN-SAT

, - * ? / L #

Year

NO

empty, 1970-2099

, - * /

More examples can be found in the Quartz's library tutorial here: http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

Manage Jobs

  1. Main menu (top right) - Page Jobs

  2. List of existing jobs is shown

  3. There are several icons next to the each job. You can execute several operations with job:

    • run - job is started immediately

    • view history of executions - view when was job started, when was finished and if there was an error during execution

    • update - you can change job details

    • delete - confirmation dialog is shown. Job is deleted after confirmation