RI Infrastructure Requirements Clause Samples
RI Infrastructure Requirements. The requirements for the infrastructure are driven by two main constraints: high scalability and data flexibility. The RI Engine is to be designed with special attention to scalability, as a large amount of data will be collected from different stakeholder sources (e.g., users). Scalability is considered along two dimensions: the x-axis deals with running multiple copies of the application across servers, whereas the y-axis deals with scaling the functionalities that are available. The former is addressed using load-balancing mechanisms, which are nowadays a requirement for IaaS providers (e.g., Amazon Web Services, Microsoft Azure). This scalability dimension is usually addressed at the hardware level (i.e., through specialized servers and networking equipment) and is, therefore, not discussed in this document. The second dimension of scalability drove the decision to select a microservice architecture. Following this approach, the RI Engine functionalities are broken down into components and services so that each loosely coupled service is responsible for one or a few closely related functionalities.
At the same time, the RI Engine leverages a diversified set of data sources, most of which are unstructured (i.e., natural language). The need for two diverse kinds of data representation is met with a hybrid approach to data and metadata management. Therefore, on top of a relational storage mechanism, which imposes a rigid data representation (i.e., a schema), the RI Engine uses a database model that does not separate the data from its schema. The relational storage is used for operations on data that has a strictly defined structure that is unlikely to change (e.g., a bug report entry); the non-relational solution is used for data that cannot be constrained in shape because it changes from one instance to another (e.g., the representation of a machine learning model).
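The hybrid storage approach described above can be illustrated with a minimal sketch. It is not the RI Engine implementation: the source does not name any storage product, so the standard-library sqlite3 module stands in for the relational store and an in-memory dictionary of JSON strings stands in for the schema-less store. All names (bug_report, DocumentStore, the model fields) are hypothetical.

```python
import json
import sqlite3


def open_relational_store() -> sqlite3.Connection:
    """Relational side: the schema is fixed up front and enforced on every insert."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory database for illustration
    conn.execute(
        """
        CREATE TABLE bug_report (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            severity TEXT NOT NULL CHECK (severity IN ('low', 'medium', 'high'))
        )
        """
    )
    return conn


class DocumentStore:
    """Schema-less side: each document carries its own structure."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def put(self, key: str, document: dict) -> None:
        # The document is serialized as-is; no shape is imposed on it.
        self._docs[key] = json.dumps(document)

    def get(self, key: str) -> dict:
        return json.loads(self._docs[key])


# Rigidly structured data (e.g., a bug report entry) goes to the relational store.
relational = open_relational_store()
relational.execute(
    "INSERT INTO bug_report (title, severity) VALUES (?, ?)",
    ("Crash on login", "high"),
)

# Data whose shape differs per instance (e.g., model representations)
# goes to the document store.
documents = DocumentStore()
documents.put("model-a", {"type": "linear", "weights": [0.2, 0.8]})
documents.put("model-b", {"type": "tree", "depth": 3, "nodes": [{"feature": "age"}]})
```

In this sketch the relational table rejects any row that violates its schema, while the two model documents have different fields and are stored without any shared structure being declared.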
The two solutions presented above (i.e., microservice architecture and hybrid storage) address the volume (i.e., large amount of data in terms of size), velocity (i.e., frequency of incoming data to be processed), and variety (i.e., unstructured data) requirements typical of modern analytics platforms. The requirement for high scalability impacts the overall architecture of the RI Engine (see Section 3.2), whereas the requirement regarding data representation impacts the DSL of the proposed architecture (see Section 3.3.12).
