Terabase

Intelligent data collection and analysis that tells you what's going on in the world right now. Currently under development.

Terabase uses a number of connected systems to collect, analyse and visualise data that it gathers from around the web. In this implementation, data is gathered from news websites to display an aggregated view of what's going on in the world.

However, Terabase has been designed to be adapted to other use cases. I've also tested it by gathering and analysing cryptocurrency data and connecting it to a mock trading bot to see how it performs when trading cryptocurrencies.

How it works

Data is ingested using the NetWatch crawler. This data is sent to a MySQL database and to a RabbitMQ server, where it's split into two queues: one for further analysis and one for streaming to the UI.
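To illustrate the two-queue split, here is a minimal in-memory sketch. It does not talk to RabbitMQ at all: `FanoutExchange`, the queue names `analysis` and `stream`, and the message fields are all hypothetical, and I'm assuming the split behaves like a fanout (every message is copied to both queues), which is only one way it could be set up.

```python
import json
import queue


class FanoutExchange:
    """In-memory stand-in for a broker that copies each message to every bound queue."""

    def __init__(self):
        self._queues = {}

    def bind(self, name):
        # Create a named queue and attach it to this exchange.
        self._queues[name] = queue.Queue()
        return self._queues[name]

    def publish(self, message):
        # Serialise once, then deliver the same body to every bound queue.
        body = json.dumps(message)
        for q in self._queues.values():
            q.put(body)


exchange = FanoutExchange()
analysis_queue = exchange.bind("analysis")  # hypothetical name: consumed by the analysis system
stream_queue = exchange.bind("stream")      # hypothetical name: streamed on to the UI

# Hypothetical article payload, standing in for what a crawler might emit.
exchange.publish({"title": "Example headline", "url": "https://example.com/article"})
```

Each consumer then reads from its own queue independently, so a slow analysis step would not hold up the live stream.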

The analysis system picks up messages from the queue as NetWatch sends them. Natural Language Processing is then performed on each message to determine the subject of a news article, the locations it relates to, and its sentiment. This is all done using the spaCy library, which provides a number of pre-trained models for NLP.
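As a rough sketch of the entity side of this step, the snippet below sorts a spaCy document's named entities into locations and candidate topics. The function names, the choice of the `en_core_web_sm` model and the label groupings are my assumptions for illustration, not necessarily what Terabase does. Sentiment is left out, since how it is computed isn't covered here.

```python
from functools import lru_cache

# Entity labels used by spaCy's English pipelines (OntoNotes scheme).
LOCATION_LABELS = {"GPE", "LOC"}
TOPIC_LABELS = {"PERSON", "ORG", "NORP", "EVENT"}


def summarise_entities(doc):
    """Split a parsed document's named entities into locations and topics."""
    locations, topics = [], []
    for ent in doc.ents:
        if ent.label_ in LOCATION_LABELS:
            target = locations
        elif ent.label_ in TOPIC_LABELS:
            target = topics
        else:
            continue  # ignore dates, quantities and other labels
        if ent.text not in target:  # keep first-seen order, drop repeats
            target.append(ent.text)
    return {"locations": locations, "topics": topics}


@lru_cache(maxsize=1)
def load_pipeline(model="en_core_web_sm"):
    """Load a pre-trained spaCy pipeline once (assumes the model is installed)."""
    import spacy  # imported lazily so summarise_entities works without spaCy
    return spacy.load(model)


def analyse(text):
    """Run the pipeline over raw article text and summarise its entities."""
    return summarise_entities(load_pipeline()(text))
```

With spaCy and the model installed, `analyse(article_text)` returns a dictionary with `locations` and `topics` lists for one article.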

The analysis system also groups related topics and serves a leaderboard of trending topics over HTTP for the UI to consume.
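A trending leaderboard of this kind can be sketched with a simple counter. This is only an illustration under my own assumptions: `trending` is a hypothetical helper, "grouping" is reduced to case-insensitive matching, and each article counts a topic at most once. The real grouping logic may well be more involved.

```python
from collections import Counter


def trending(topic_lists, top_n=3):
    """Rank topics by how many articles mention them.

    topic_lists holds one list of topic strings per article.
    """
    counts = Counter()
    for topics in topic_lists:
        # Lower-case to merge variants; the set ensures one vote per article.
        counts.update({t.lower() for t in topics})
    # Sort by count (descending), then name, so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


articles = [
    ["Election", "Paris"],
    ["election", "Budget"],
    ["Election"],
]
leaderboard = trending(articles, top_n=2)
```

The resulting list of `(topic, count)` pairs is the sort of structure that could be serialised to JSON and returned from an HTTP endpoint.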

The web UI, built with React, then consumes the WebSocket stream of live data provided by NetWatch and the HTTP API provided by the analysis system to display the information to users.

Architecture

Architecturally, Terabase follows the microservice pattern. There are a number of interconnected systems which communicate via RabbitMQ, HTTP and WebSockets where appropriate. Each system resides in its own Docker container - which I've written about in more detail.