Architecture Overview
The primary components of the Twi-XL API architecture are depicted below:
Figure - The Twi-XL Architecture
twi-xl-python
The twi-xl-python package is a Python library for interfacing with the Twi-XL API and downloading the query results.
Twi-XL API
The Twi-XL API is the interface to the Twi-XL functionality. It is responsible for translating incoming requests from the twi-xl-python library into search tasks and returning the results.
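The translation step described above can be illustrated with a minimal sketch. The request fields (keyword, start, end) and the SearchTask type are hypothetical and chosen only for illustration; the actual request format of the Twi-XL API is not specified here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchTask:
    """Hypothetical internal representation of one search request."""
    keyword: str
    start: date
    end: date


def to_search_task(request: dict) -> SearchTask:
    """Validate an incoming request and turn it into a search task."""
    # The field names below are assumptions, not the real API contract.
    keyword = str(request.get("keyword", "")).strip()
    if not keyword:
        raise ValueError("request must contain a non-empty 'keyword'")
    start = date.fromisoformat(request["start"])
    end = date.fromisoformat(request["end"])
    if end < start:
        raise ValueError("'end' must not be earlier than 'start'")
    return SearchTask(keyword=keyword, start=start, end=end)


task = to_search_task(
    {"keyword": "fiets", "start": "2021-03-01", "end": "2021-03-07"}
)
```

Validating the request before a search task is created means that malformed input can be rejected early, before any query is run.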
Athena
Amazon Athena is the interactive query service used to analyze the TwiNL Twitter archive.
TwiNL archive
The TwiNL Twitter archive is stored in an Amazon S3 bucket. The Twitter messages are aggregated, partitioned, and compressed to reduce the total size and improve search performance.
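To make the aggregate, partition, and compress steps concrete, the sketch below groups tweets per day under Hive-style partition paths and gzips each group. The created_at field, the year=/month=/day= layout, and the tweets.jsonl.gz file name are assumptions for illustration; the actual layout of the TwiNL archive may differ. Partitioning helps because a query engine such as Athena can skip partitions that fall outside the queried period, and compression reduces the amount of data that has to be read.

```python
import gzip
import json
from collections import defaultdict


def partition_key(tweet: dict) -> str:
    """Build a Hive-style partition path from the tweet's ISO timestamp."""
    # Assumes 'created_at' starts with YYYY-MM-DD (hypothetical field name).
    year, month, day = tweet["created_at"][:10].split("-")
    return f"year={year}/month={month}/day={day}"


def aggregate_and_compress(tweets: list) -> dict:
    """Group tweets per day and gzip each group as JSON Lines."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[partition_key(tweet)].append(tweet)

    objects = {}
    for key, group in groups.items():
        lines = "\n".join(json.dumps(t, ensure_ascii=False) for t in group)
        # One compressed object per partition; the file name is illustrative.
        objects[f"{key}/tweets.jsonl.gz"] = gzip.compress(lines.encode("utf-8"))
    return objects


tweets = [
    {"id": 1, "created_at": "2021-03-14T09:00:00Z", "text": "goedemorgen"},
    {"id": 2, "created_at": "2021-03-14T21:30:00Z", "text": "welterusten"},
    {"id": 3, "created_at": "2021-03-15T08:15:00Z", "text": "koffie"},
]
objects = aggregate_and_compress(tweets)
```

Each key in objects could serve as an object key in the archive bucket, with the compressed bytes as the object body.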
Athena results
The results of the Athena queries are stored in an Amazon S3 bucket. These results are automatically downloaded by the twi-xl-python library.
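Athena reports the location of a query result as an S3 URI, so a client has to split that URI into a bucket name and an object key before it can download the file. The sketch below shows one way to do this with the standard library; the function name and the example URI are hypothetical, and this is not necessarily how twi-xl-python implements the download.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/path/to/object' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The URI path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Example URI is made up; real result locations depend on the deployment.
bucket, key = split_s3_uri("s3://example-athena-results/queries/1234abcd.csv")
```

The resulting bucket and key could then be passed to an S3 client, for example the download_file method of a boto3 S3 client, to fetch the result file.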
Twitter scraper
The Twi-XL scraper is responsible for collecting new Dutch tweets and storing them in the Raw tweets bucket.
Raw tweets
The raw tweets are stored in an Amazon S3 bucket.
Step Function workflow
The Step Function workflow collects and compresses the latest scraped tweets and adds them to the TwiNL archive. The workflow is started every night.