About
ITS 2.0 content analysis and terminology annotation are performed by a dedicated Terminology Annotation Web Service (TAWS) API, a Web-based interface that offers two kinds of term candidate annotation: statistical annotation, and term bank based annotation using terminology resources from EuroTermBank.

The Terminology Annotation Web Service API can be integrated into natural language processing workflows such as machine translation, localization, and terminology management, as well as other tasks that may benefit from automatic terminology annotation.
The showcase and the underlying Terminology Annotation Web Service API are developed by Tilde.
The development is funded by the MultilingualWeb-LT project.
Tilde TAWS API
TAWS exposes a RESTful API over HTTP.
HTML5
Request
POST /api/html5 HTTP/1.1
Host: taws.tilde.com
Content-Length: 62

<!DOCTYPE html><html lang="en"><body>hello world</body></html>
Response
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html lang="en">
<body its-annotators-ref="terminology|http://tilde.com/term-annotation-service">
<span its-term="yes" its-term-confidence="1">hello world</span>
</body></html>
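A client usually wants the annotated terms rather than the whole document. The following is a minimal sketch, using only the Python standard library, of how the its-term spans in a response like the one above could be collected. The names TermCollector and extract_terms are illustrative and not part of TAWS.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TermCollector(HTMLParser):
    """Collects (text, confidence) for every element with its-term="yes"."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._depth = 0          # 0 means "not inside a term element"
        self._buffer = []
        self._confidence = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self._depth:
            self._depth += 1     # nested markup inside a term
        elif dict(attrs).get("its-term") == "yes":
            self._depth = 1
            self._buffer = []
            self._confidence = dict(attrs).get("its-term-confidence")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:     # the term element itself was closed
            self.terms.append(("".join(self._buffer), self._confidence))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_terms(html_text):
    """Return the annotated terms found in an HTML5 response body."""
    collector = TermCollector()
    collector.feed(html_text)
    collector.close()
    return collector.terms
```

The confidence value is returned as the raw attribute string, or None when the attribute is absent.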
XLIFF
Request
POST /api/xliff HTTP/1.1
Host: taws.tilde.com
Content-Length: 307

<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="hello.txt" source-language="en-us" target-language="lv-lv" datatype="plaintext">
<body>
<trans-unit id='1'>
<source>hello world</source>
</trans-unit>
</body>
</file>
</xliff>
Response
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsx="http://www.w3.org/ns/its-xliff/"
its:annotatorsRef="terminology|http://tilde.com/term-annotation-service">
<file original="hello.txt" source-language="en-us" datatype="plaintext">
<body>
<trans-unit id="1">
<source><mrk mtype="term" itsx:termConfidence="1">hello world</mrk></source>
</trans-unit>
</body>
</file>
</xliff>
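In an XLIFF response the terms are wrapped in mrk elements with mtype="term". As a hedged sketch, they could be read back with the standard library's ElementTree as shown here. The function name extract_xliff_terms is illustrative, and the namespace URIs are the ones that appear in the sample response above.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ITSX_NS = "http://www.w3.org/ns/its-xliff/"


def extract_xliff_terms(xliff_text):
    """Return (text, confidence) for every <mrk mtype="term"> in the document."""
    # Parse from bytes so the encoding declaration in the XML prolog is honoured.
    root = ET.fromstring(xliff_text.encode("utf-8"))
    terms = []
    for mrk in root.iter(f"{{{XLIFF_NS}}}mrk"):
        if mrk.get("mtype") == "term":
            text = "".join(mrk.itertext())
            confidence = mrk.get(f"{{{ITSX_NS}}}termConfidence")
            terms.append((text, confidence))
    return terms
```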
Plaintext
Request
POST /api/plaintext?lang=en HTTP/1.1
Host: taws.tilde.com
Content-Length: 11

hello world
Response
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
</head>
<body its-annotators-ref="terminology|http://tilde.com/term-annotation-service">
<span its-term="yes" its-term-confidence="1">hello world</span>
</body>
</html>
Parameters
Every piece of text in a document must have a language identifier; text whose language is unknown is not annotated. You can add a lang parameter to the query string to set the default language of the content without modifying the markup.
/api/html5?lang=en
The Domain data category identifies the topic of the document content. If the document contains no domain information, terminology from all domains is annotated. You can optionally add one or more domain parameters to the query string to set the default domain(s) of the content without modifying the markup. Each domain must be a TaaS domain code ("TaaS-" followed by four digits, e.g., TaaS-1500). A parent domain includes all child domains.
/api/html5?domain=TaaS-1500
/api/html5?domain=TaaS-2000&domain=TaaS-1501
/api/html5?lang=en&domain=TaaS-1501&domain=TaaS-2200
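Query strings with a repeated domain parameter, like those above, can be produced with urllib.parse.urlencode by passing a list of pairs. The sketch below also checks each code against the "TaaS-" plus four digits pattern before sending anything. The helper name build_query is illustrative.

```python
import re
from urllib.parse import urlencode

# "TaaS-" followed by exactly four digits, as described in the text.
TAAS_DOMAIN = re.compile(r"TaaS-\d{4}")


def build_query(lang=None, domains=()):
    """Build a query string with an optional lang and any number of domains."""
    for code in domains:
        if not TAAS_DOMAIN.fullmatch(code):
            raise ValueError(f"not a TaaS domain code: {code!r}")
    params = []
    if lang:
        params.append(("lang", lang))
    # One ("domain", code) pair per domain yields domain=...&domain=...
    params.extend(("domain", code) for code in domains)
    return urlencode(params)
```

For example, build_query("en", ["TaaS-1501", "TaaS-2200"]) yields lang=en&domain=TaaS-1501&domain=TaaS-2200.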
If your document contains references to external rules with relative paths (e.g., <link rel="its-rules" href="rules.xml">), you can add a baseUri parameter specifying an accessible base path (e.g., http://example.org/its/). Otherwise, the rules cannot be loaded by TAWS.
/api/html5?baseUri=http://example.org/its/
/api/html5?lang=en&baseUri=http://example.org/its/
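To see what a given baseUri implies for a relative rules reference, the resolution can be tried locally with urllib.parse.urljoin. This sketch assumes TAWS resolves relative references by the standard URL rules, which the text above does not state explicitly. It also shows the percent-encoded form that urlencode produces for the parameter, which decodes back to the same base path.

```python
from urllib.parse import urlencode, urljoin

base_uri = "http://example.org/its/"

# Where a relative href="rules.xml" would point under standard URL resolution.
rules_url = urljoin(base_uri, "rules.xml")

# Without the trailing slash the last path segment is replaced, not extended.
rules_url_no_slash = urljoin("http://example.org/its", "rules.xml")

# urlencode percent-encodes ":" and "/" in the parameter value.
query = urlencode({"lang": "en", "baseUri": base_uri})

print(rules_url)           # http://example.org/its/rules.xml
print(rules_url_no_slash)  # http://example.org/rules.xml
print(query)               # lang=en&baseUri=http%3A%2F%2Fexample.org%2Fits%2F
```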
By default, terminology is annotated using both the statistical terminology annotation (statistical) and the term bank based terminology annotation (termbank) methods. To use only one method, specify it with a method parameter in the query string:
/api/html5?method=statistical
/api/html5?method=termbank
/api/html5?lang=en&domain=TaaS-1501&method=termbank
Note that the Domain data category is ignored if the term bank based terminology annotation (termbank) method is not used.
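The interaction between method and domain can be captured on the client side. The sketch below builds a request path and leaves out domain parameters when only the statistical method is requested, since they would be ignored anyway. The function annotation_path and its signature are illustrative, not part of the service.

```python
from urllib.parse import urlencode

METHODS = {"statistical", "termbank"}


def annotation_path(endpoint, lang=None, domains=(), method=None):
    """Return a path such as /api/html5?lang=en&method=termbank."""
    if method is not None and method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    params = []
    if lang:
        params.append(("lang", lang))
    # Domains only matter when term bank based annotation takes part.
    if method != "statistical":
        params.extend(("domain", code) for code in domains)
    if method:
        params.append(("method", method))
    query = urlencode(params)
    return f"/api/{endpoint}" + (f"?{query}" if query else "")
```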
Plaintext
For convenience, it is possible to annotate terminology in plaintext documents as well. The text will be converted and returned as an HTML5 document since there is no standard way to annotate terminology in plaintext using ITS 2.0 metadata.
Because there is no ITS 2.0 markup present in plaintext content, the lang parameter is mandatory for plaintext documents. You can optionally add one or more domain parameters to specify the domain of the text.
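A plaintext request could be prepared with the standard library as sketched below. Several details are assumptions: the http scheme in BASE_URL (the examples only show the Host header), the explicit Content-Type header (set here mainly so that urllib does not add its form-encoded default), and the helper name plaintext_request.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://taws.tilde.com"  # assumption: scheme is not given in the text


def plaintext_request(text, lang, domains=()):
    """Prepare (but do not send) a POST to the plaintext endpoint."""
    if not lang:
        raise ValueError("lang is mandatory for plaintext documents")
    query = urlencode([("lang", lang)] + [("domain", d) for d in domains])
    return Request(
        f"{BASE_URL}/api/plaintext?{query}",
        data=text.encode("utf-8"),  # only UTF-8 input is supported
        headers={"Content-Type": "text/plain; charset=utf-8"},  # assumption
        method="POST",
    )


request = plaintext_request("hello world", "en")
print(request.full_url)   # http://taws.tilde.com/api/plaintext?lang=en
print(len(request.data))  # 11
```

Passing the prepared object to urllib.request.urlopen would send it. The response body is then an HTML5 document, as described above.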
Response HTTP Status Codes
TAWS will respond with one of the following status codes:
- 200 OK – document was annotated successfully;
- 400 Bad Request – invalid document or parameters passed to TAWS;
- 500 Internal Server Error – there is a problem with the service.
The response body contains the annotated document or, if an error occurred, an error message.
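A client can map these three status codes onto its own error handling. The following is one possible sketch. The choice of exception types and the name interpret_response are illustrative.

```python
def interpret_response(status, body):
    """Return the annotated document, or raise with the service's error message."""
    if status == 200:
        return body
    if status == 400:
        # The document or the query parameters were rejected; retrying
        # the same request unchanged will not help.
        raise ValueError(f"TAWS rejected the request: {body}")
    if status == 500:
        # Problem on the service side; the request itself may be fine.
        raise RuntimeError(f"TAWS service error: {body}")
    raise RuntimeError(f"unexpected HTTP status {status}: {body}")
```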
Limitations
- Only input in UTF-8 is supported.
- Domain values are limited to TaaS codes.
- Only the first 50 000 characters of the submitted document will be annotated.
The remainder of the document is returned without terminology annotation.
This limit applies to the showcase in order to prevent misuse of the service.
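For long plaintext input, one way to stay within the limit is to split the text into pieces of at most 50 000 characters and submit each piece separately. The sketch below cuts at whitespace where possible. It assumes the limit counts characters the way Python's len() does, which the text does not specify, and it is only suitable for plaintext, because HTML5 or XLIFF input cannot be cut at arbitrary positions. A term that spans a cut point would not be recognized.

```python
LIMIT = 50_000


def split_for_annotation(text, limit=LIMIT):
    """Split text into chunks of at most `limit` characters each."""
    chunks = []
    start = 0
    while len(text) - start > limit:
        window_end = start + limit
        # Last newline or space inside the window, so no word is cut in half.
        cut = max(text.rfind("\n", start, window_end),
                  text.rfind(" ", start, window_end))
        # Keep the whitespace with the left-hand chunk; fall back to a hard
        # cut when the window contains no usable whitespace.
        end = cut + 1 if cut > start else window_end
        chunks.append(text[start:end])
        start = end
    chunks.append(text[start:])
    return chunks
```

Joining the returned chunks reproduces the original text exactly.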