About

ITS 2.0 content analysis and terminology annotation are performed by a dedicated Terminology Annotation Web Service (TAWS) API. It is a Web-based interface for two kinds of term candidate annotation: statistical annotation, and term bank based annotation that uses terminology resources from EuroTermBank.


The Terminology Annotation Web Service API can be integrated into various natural language processing workflows, such as machine translation, localization, and terminology management, as well as other tasks that may benefit from automatic terminology annotation.

The showcase and the underlying Terminology Annotation Web Service API are developed by Tilde.
The development is funded by the MultilingualWeb-LT project.

Tilde TAWS API

TAWS exposes a RESTful API over HTTP.

HTML5

Request

POST /api/html5 HTTP/1.1
Host: taws.tilde.com
Content-Length: 62

<!DOCTYPE html><html lang="en"><body>hello world</body></html>

Response

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html lang="en">
<body its-annotators-ref="terminology|http://tilde.com/term-annotation-service">
<span its-term="yes" its-term-confidence="1">hello world</span>
</body></html>
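
The request above can be prepared from a script. The following is a minimal sketch using only the Python standard library. It assumes the service is reachable over plain HTTP at the host shown in the example; the Content-Type header and the helper name build_html5_request are illustrative assumptions, not part of the documented API.

```python
import urllib.request

# Assumption: plain HTTP at the host used in the example request.
TAWS_BASE = "http://taws.tilde.com"


def build_html5_request(html: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the /api/html5 endpoint."""
    body = html.encode("utf-8")
    return urllib.request.Request(
        TAWS_BASE + "/api/html5",
        data=body,
        method="POST",
        # Assumption: the example request does not show a Content-Type header.
        headers={"Content-Type": "text/html; charset=utf-8"},
    )


request = build_html5_request(
    '<!DOCTYPE html><html lang="en"><body>hello world</body></html>'
)

# To actually call the service, uncomment:
# with urllib.request.urlopen(request) as response:
#     annotated_html = response.read().decode("utf-8")
```

urllib computes the Content-Length header from the request body when the request is sent, so it does not need to be set by hand.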

XLIFF

Request

POST /api/xliff HTTP/1.1
Host: taws.tilde.com
Content-Length: 307

<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="hello.txt" source-language="en-us" target-language="lv-lv" datatype="plaintext">
<body>
<trans-unit id='1'>
<source>hello world</source>
</trans-unit>
</body>
</file>
</xliff>

Response

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsx="http://www.w3.org/ns/its-xliff/"
       its:annotatorsRef="terminology|http://tilde.com/term-annotation-service">
 <file original="hello.txt" source-language="en-us" datatype="plaintext">
  <body>
   <trans-unit id="1">
    <source><mrk mtype="term" itsx:termConfidence="1">hello world</mrk></source>
   </trans-unit>
  </body>
 </file>
</xliff>
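
A client will usually want to read the annotated terms back out of the XLIFF response. The sketch below does this with the standard library XML parser, using the namespaces and attribute names that appear in the example response. The function name extract_terms and the shortened sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespaces as they appear in the example response.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ITSX_NS = "http://www.w3.org/ns/its-xliff/"

SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:itsx="http://www.w3.org/ns/its-xliff/">
 <file original="hello.txt" source-language="en-us" datatype="plaintext">
  <body>
   <trans-unit id="1">
    <source><mrk mtype="term" itsx:termConfidence="1">hello world</mrk></source>
   </trans-unit>
  </body>
 </file>
</xliff>"""


def extract_terms(xliff_text: str) -> List[Tuple[str, Optional[float]]]:
    """Return (term text, confidence) pairs for every <mrk mtype="term">."""
    root = ET.fromstring(xliff_text.encode("utf-8"))
    terms = []
    for mrk in root.iter("{%s}mrk" % XLIFF_NS):
        if mrk.get("mtype") != "term":
            continue
        confidence = mrk.get("{%s}termConfidence" % ITSX_NS)
        text = "".join(mrk.itertext())
        terms.append((text, float(confidence) if confidence is not None else None))
    return terms


terms = extract_terms(SAMPLE_RESPONSE)
print(terms)
```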

Plaintext

Request

POST /api/plaintext?lang=en HTTP/1.1
Host: taws.tilde.com
Content-Length: 11

hello world

Response

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
</head>
<body its-annotators-ref="terminology|http://tilde.com/term-annotation-service">
  <span its-term="yes" its-term-confidence="1">hello world</span>
</body>
</html>
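
Both the HTML5 and the plaintext endpoints return HTML5 in which terms are wrapped in span elements carrying its-term="yes". The following sketch collects those terms with the standard library HTML parser. The class name TermCollector is hypothetical, and the sample is a shortened copy of the response above.

```python
from html.parser import HTMLParser

SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body its-annotators-ref="terminology|http://tilde.com/term-annotation-service">
  <span its-term="yes" its-term-confidence="1">hello world</span>
</body>
</html>"""


class TermCollector(HTMLParser):
    """Collect the text of every outermost <span its-term="yes"> element."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._open_spans = []  # one flag per open <span>: True for term spans
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            is_term = dict(attrs).get("its-term") == "yes"
            if is_term and not any(self._open_spans):
                self._buffer = []  # start collecting a new term
            self._open_spans.append(is_term)

    def handle_endtag(self, tag):
        if tag == "span" and self._open_spans:
            was_term = self._open_spans.pop()
            if was_term and not any(self._open_spans):
                self.terms.append("".join(self._buffer))

    def handle_data(self, data):
        if any(self._open_spans):
            self._buffer.append(data)


collector = TermCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.terms)
```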

Parameters

Every piece of text in a document must have a language identifier. Text whose language is unknown is not annotated. To set the default language of the content without modifying the markup, add a lang parameter to the query string.

/api/html5?lang=en

The Domain data category identifies the topic of the document content. If the document contains no domain information, terminology from all domains is annotated. You can optionally add one or more domain parameters to the query string to set the default domain(s) of the content without modifying the markup. Each domain must be a TaaS domain code ("TaaS-" followed by four digits, e.g., TaaS-1500). A parent domain includes all child domains.

/api/html5?domain=TaaS-1500
/api/html5?domain=TaaS-2000&domain=TaaS-1501
/api/html5?lang=en&domain=TaaS-1501&domain=TaaS-2200
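
Query strings like the ones above can be assembled programmatically. This sketch builds the path from an optional language and any number of domains, and checks each domain against the "TaaS-" plus four digits format described above. The function name html5_path is an illustrative assumption, and the client-side check is a convenience rather than something the service requires.

```python
import re
from urllib.parse import urlencode

# "TaaS-" followed by exactly four digits, as described in the text.
TAAS_DOMAIN = re.compile(r"TaaS-\d{4}")


def html5_path(lang=None, domains=()):
    """Build the /api/html5 path with optional lang and repeated domain parameters."""
    params = []
    if lang:
        params.append(("lang", lang))
    for domain in domains:
        if not TAAS_DOMAIN.fullmatch(domain):
            raise ValueError("not a TaaS domain code: %r" % domain)
        params.append(("domain", domain))
    query = urlencode(params)
    return "/api/html5" + ("?" + query if query else "")


path = html5_path("en", ["TaaS-1501", "TaaS-2200"])
print(path)
```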

If your document contains references to external rules with relative paths (e.g., <link rel="its-rules" href="rules.xml">), TAWS cannot load the rules unless it knows where to find them. Add a baseUri parameter that specifies an accessible base path (e.g., http://example.org/its/).

/api/html5?baseUri=http://example.org/its/
/api/html5?lang=en&baseUri=http://example.org/its/
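
The effect of the base path can be illustrated with standard URL resolution. The sketch below assumes that TAWS resolves relative references in the usual RFC 3986 way, which the documentation does not state explicitly. Under that assumption the trailing slash in baseUri matters.

```python
from urllib.parse import urljoin

base_with_slash = "http://example.org/its/"
base_without_slash = "http://example.org/its"

# With the trailing slash, the relative href is resolved inside /its/.
resolved = urljoin(base_with_slash, "rules.xml")

# Without it, the last path segment ("its") is replaced by the href.
misresolved = urljoin(base_without_slash, "rules.xml")

print(resolved)     # http://example.org/its/rules.xml
print(misresolved)  # http://example.org/rules.xml
```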

By default, terminology is annotated using both methods: statistical terminology annotation (statistical) and term bank based terminology annotation (termbank). To use only one method, specify it with a method parameter in the query string:

/api/html5?method=statistical
/api/html5?method=termbank
/api/html5?lang=en&domain=TaaS-1501&method=termbank

Note that the Domain data category is ignored if the Term bank based terminology annotation (termbank) method is not used.
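
The interaction between method and domain can be captured in a small helper. In this sketch, which is a client-side convenience and not part of the service, the domain parameters are left out when only the statistical method is requested, since the service would ignore them anyway. The function name annotation_query is hypothetical.

```python
from urllib.parse import urlencode

# The two method values named in the text.
METHODS = {"statistical", "termbank"}


def annotation_query(method=None, domains=()):
    """Build a query string; omit domains when they would be ignored."""
    if method is not None and method not in METHODS:
        raise ValueError("unknown method: %r" % method)
    params = []
    if method is not None:
        params.append(("method", method))
    # Domains only affect the termbank method, which is active by default
    # (no method given) or when requested explicitly.
    if method != "statistical":
        params.extend(("domain", domain) for domain in domains)
    return urlencode(params)


print(annotation_query("termbank", ["TaaS-1501"]))
print(annotation_query("statistical", ["TaaS-1501"]))
```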

Plaintext

For convenience, it is possible to annotate terminology in plaintext documents as well. The text will be converted and returned as an HTML5 document since there is no standard way to annotate terminology in plaintext using ITS 2.0 metadata.

Because there is no ITS 2.0 markup present in plaintext content, the lang parameter is mandatory for plaintext documents. You can optionally add one or more domain parameters to specify the domain of the text.
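
A plaintext request can be prepared in the same way as an HTML5 one, with the difference that the language must always be supplied. The sketch below enforces that on the client side. It assumes plain HTTP at the host shown in the examples, and the function name build_plaintext_request is illustrative.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: plain HTTP at the host used in the example requests.
TAWS_BASE = "http://taws.tilde.com"


def build_plaintext_request(text, lang, domains=()):
    """Prepare (but do not send) a POST request to /api/plaintext."""
    if not lang:
        raise ValueError("lang is mandatory for plaintext documents")
    params = [("lang", lang)] + [("domain", domain) for domain in domains]
    return urllib.request.Request(
        TAWS_BASE + "/api/plaintext?" + urlencode(params),
        data=text.encode("utf-8"),
        method="POST",
    )


plaintext_request = build_plaintext_request("hello world", "en")
print(plaintext_request.full_url)
```

The response body is an HTML5 document, so it can be processed like any other HTML5 response from the service.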

Response HTTP Status Codes

TAWS will respond with one of the following status codes:

The content of the response will be the annotated document or an error message in case of an error.

Limitations