Architecture Overview

The primary components of the Twi-XL API architecture are depicted below:

_images/architecture_overview.png

Figure - The Twi-XL Architecture

twi-xl-python

The twi-xl-python package is a Python library for interfacing with the Twi-XL API and downloading the query results.

Twi-XL API

The Twi-XL API is the interface to the Twi-XL functionality. It translates incoming requests from the twi-xl-python library into search tasks and returns the results.
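As an illustration of what "translating a request into a search task" could look like, the sketch below turns a keyword and date-range request into an Athena SQL string. It is not the Twi-XL implementation: the `SearchRequest` shape, the `tweets` table, and the `id`, `created_at`, `text` and `dt` columns are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchRequest:
    """Hypothetical search request as it might arrive from twi-xl-python."""
    keywords: list[str]
    start: date
    end: date


def _quote(value: str) -> str:
    """Render a SQL string literal, escaping single quotes by doubling them."""
    return "'" + value.replace("'", "''") + "'"


def to_athena_sql(req: SearchRequest, table: str = "tweets") -> str:
    """Translate a search request into an Athena SQL query.

    The table name and the id/created_at/text/dt columns are assumptions;
    `dt` stands for a hypothetical date partition column ('YYYY-MM-DD').
    """
    if not req.keywords:
        raise ValueError("at least one keyword is required")
    if req.start > req.end:
        raise ValueError("start must not be after end")
    # Case-insensitive substring match on any of the keywords.
    keyword_filter = " OR ".join(
        f"lower(text) LIKE {_quote('%' + k.lower() + '%')}" for k in req.keywords
    )
    # Filtering on the partition column lets Athena skip unrelated partitions.
    return (
        f"SELECT id, created_at, text FROM {table} "
        f"WHERE dt BETWEEN {_quote(req.start.isoformat())} "
        f"AND {_quote(req.end.isoformat())} "
        f"AND ({keyword_filter})"
    )
```

Escaping literals in one helper keeps user-supplied keywords from breaking out of the SQL string.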

Athena

Amazon Athena is the interactive query service used to analyze the TwiNL Twitter archive.
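A minimal sketch of submitting a query to Athena and waiting for it to finish, assuming a client that behaves like `boto3.client("athena")`. `start_query_execution` and `get_query_execution` are real boto3 Athena calls; the function name `run_query` and its parameters are hypothetical.

```python
import time

# Athena query states after which the query will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> tuple[str, str]:
    """Start an Athena query and block until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns (query execution ID, final state).
    """
    execution_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return execution_id, state
        time.sleep(poll_seconds)  # QUEUED or RUNNING: wait and poll again
```

Passing the client in, rather than creating it inside the function, keeps the polling logic testable without AWS credentials.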

TwiNL archive

The TwiNL Twitter archive is stored in an Amazon S3 bucket. The tweets are aggregated, partitioned, and compressed to reduce the total storage size and improve search performance.
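The actual partitioning scheme of the archive is not specified here. A common choice for Athena is a Hive-style date partition in the S3 key, which lets Athena skip partitions that a query's date filter excludes. The sketch below assumes a hypothetical `archive/dt=YYYY-MM-DD/part-NNNN.json.gz` layout.

```python
from datetime import datetime, timezone


def partition_key(created_at: datetime, part: int, prefix: str = "archive") -> str:
    """Return a Hive-style S3 key, e.g. 'archive/dt=2021-03-14/part-0000.json.gz'.

    The 'dt=YYYY-MM-DD' layout and the '.json.gz' suffix are assumptions.
    """
    if created_at.tzinfo is None:
        # Treat naive timestamps as UTC rather than as local time.
        created_at = created_at.replace(tzinfo=timezone.utc)
    day = created_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/dt={day}/part-{part:04d}.json.gz"
```

Keying partitions by UTC day keeps a tweet's partition independent of the machine's local time zone.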

Athena results

The results of the Athena queries are stored in an Amazon S3 bucket, from which the twi-xl-python library downloads them automatically.
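Athena writes the results of a SELECT query to `<OutputLocation>/<QueryExecutionId>.csv`. The sketch below shows how a client could locate and download that file. It assumes an object that behaves like `boto3.client("s3")`, whose `download_file(bucket, key, filename)` method is real. The helper names are hypothetical and not the twi-xl-python API.

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def result_uri(output_location: str, query_execution_id: str) -> str:
    """Athena names a SELECT query's CSV result after its execution ID."""
    return f"{output_location.rstrip('/')}/{query_execution_id}.csv"


def download_result(s3_client, output_location: str, query_execution_id: str,
                    filename: str) -> str:
    """Download a finished query's CSV; `s3_client` behaves like boto3.client("s3")."""
    bucket, key = parse_s3_uri(result_uri(output_location, query_execution_id))
    s3_client.download_file(bucket, key, filename)
    return filename
```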

Twitter scraper

The Twi-XL scraper collects new Dutch tweets and stores them in the raw tweets bucket.
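How the scraper selects Dutch tweets is not described here. The sketch below assumes each tweet carries Twitter's `lang` field. It builds one JSON Lines object per scrape run under a hypothetical `raw/YYYY/MM/DD/HHMMSS.jsonl` key.

```python
import json
from datetime import datetime, timezone


def dutch_only(tweets: list[dict]) -> list[dict]:
    """Keep tweets whose language tag is Dutch ('nl')."""
    return [t for t in tweets if t.get("lang") == "nl"]


def raw_object(tweets: list[dict], scraped_at: datetime,
               prefix: str = "raw") -> tuple[str, bytes]:
    """Build a (key, body) pair for one scrape run, serialized as JSON Lines.

    The key layout '<prefix>/YYYY/MM/DD/HHMMSS.jsonl' is an assumption.
    """
    if scraped_at.tzinfo is None:
        raise ValueError("scraped_at must be timezone-aware")
    stamp = scraped_at.astimezone(timezone.utc)
    key = f"{prefix}/{stamp:%Y/%m/%d/%H%M%S}.jsonl"
    # One JSON document per line; ensure_ascii=False keeps accented text readable.
    body = "".join(json.dumps(t, ensure_ascii=False) + "\n" for t in dutch_only(tweets))
    return key, body.encode("utf-8")
```

The resulting pair could then be uploaded with `boto3.client("s3").put_object(Bucket=..., Key=key, Body=body)`, where the bucket name is left open.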

Raw tweets

The raw tweets are stored in an Amazon S3 bucket.

Step Function workflow

Every night, the Step Function workflow collects the latest scraped tweets, compresses them, and adds them to the TwiNL archive.
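The workflow definition itself is not shown here. The sketch below covers only the aggregate-and-compress step that one of its tasks might perform. It assumes each raw tweet has an ISO 8601 `created_at` value such as `2021-03-14T21:05:00.000Z` and writes to a hypothetical `archive/dt=YYYY-MM-DD/part-<run_id>.json.gz` layout.

```python
import gzip
import json
from collections import defaultdict


def compact_by_day(raw_lines: list[str], run_id: str,
                   prefix: str = "archive") -> dict[str, bytes]:
    """Group raw JSON Lines tweets by creation day and gzip each group.

    Assumes every tweet has an ISO 8601 'created_at' such as
    '2021-03-14T21:05:00.000Z'. Returns {partition key: compressed bytes};
    the key layout is hypothetical.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        day = json.loads(line)["created_at"][:10]  # 'YYYY-MM-DD'
        groups[day].append(line)
    # One compressed object per day; the run ID keeps successive nightly runs
    # from overwriting each other's files in the same partition.
    return {
        f"{prefix}/dt={day}/part-{run_id}.json.gz":
            gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
        for day, lines in sorted(groups.items())
    }
```

Grouping by the tweet's own date rather than the scrape date means a tweet scraped after midnight still lands in the partition of the day it was posted.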