Amazon S3 is the primary data source for the Rimac Data Lake. It stores the flat files generated by the Data Extractors.
Amazon Redshift is the repository for the data transformed in the Data Lake. It receives data from the ETL transformation scripts that run in EMR, and directly from Amazon S3.
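As an illustration of the S3-to-Redshift path, the sketch below builds a Redshift COPY statement that loads CSV flat files from an S3 prefix. The table name, bucket, prefix, and IAM role are hypothetical placeholders, not values from the Rimac environment, and the files are assumed to be CSV with a header row.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3.

    The arguments are interpolated directly into the SQL text, so they are
    assumed to come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="staging.policies",
    s3_uri="s3://example-datalake/transformed/policies/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

The resulting statement would be executed against the Redshift cluster by whichever SQL client or ETL step performs the load.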
The Data Lake also supports the use of Amazon EMR clusters to run predictive analysis on the data stored in Amazon S3.
Amazon Athena enables interactive SQL queries against the data in the Data Lake.
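A minimal sketch of submitting an interactive query to Athena is shown here. It assumes a boto3 Athena client is passed in by the caller. The database name, SQL text, and results location used with it are hypothetical.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(athena_client: Any, sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID.

    athena_client is assumed to behave like boto3.client("athena").
    """
    request = build_athena_request(sql, database, output_location)
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

With a real client, a call might look like start_query(boto3.client("athena"), "SELECT COUNT(*) FROM policies", "example_datalake_db", "s3://example-athena-results/"). The query runs asynchronously, so the returned ID is then used to poll for completion and fetch the results.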
With Amazon SageMaker, Rimac can create, train and run its Machine Learning models.
To make loading data into the Data Lake secure and easy, AWS Storage Gateway provides a single point of access to the Data Lake structure in S3. Files are copied into the Data Lake through this access point.
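The copy step can be sketched as an ordinary file copy. This assumes the gateway is set up as a file share (NFS or SMB) mounted on the machine that produces the files, which the description above does not confirm. The mount point and folder names are hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_gateway_share(local_file: Path, share_root: Path, target_folder: str) -> Path:
    """Copy a flat file into a folder under the mounted gateway share.

    share_root is the local mount point of the share. Under the assumption
    above, the gateway uploads files written there to the backing S3 bucket.
    """
    destination_dir = share_root / target_folder
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / local_file.name
    shutil.copy2(local_file, destination)  # copies contents and file metadata
    return destination
```

For example, copy_to_gateway_share(Path("extract.csv"), Path("/mnt/datalake"), "raw/policies") would place the file under the raw/policies folder of the share, where /mnt/datalake is a placeholder mount point.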