The concept of Big Data is becoming increasingly important at the enterprise and commercial levels in Chile. More companies want to take full advantage of the potential that lies in their data, which is why AWS Chile and Morris & Opazo (an APN Advanced Partner specializing in Big Data) organized the workshop “Building a Data Lake in AWS” on the morning of November 9th, held at the Amazon Web Services offices in Chile.
On the AWS side, the workshop was led by Jesús Federico (https://www.linkedin.com/in/jesusf/), Solutions Architect at AWS Chile, who in the first part presented clearly and concisely what AWS is and the fundamentals of the AWS Cloud: its main services, cloud architecture best practices, different approaches to implementing architectures on AWS, the Shared Responsibility Model, and AWS Regions and Availability Zones.
This first part was crucial: it allowed the attendees (including Chief Technology Officers from companies such as Forum, SmartHomy, DIN S.A., Maisasa, Entel, Grupo CGE, WOM, Facenote and abcdin) to begin exploring how AWS cloud computing, serverless capabilities and the various AWS services could benefit different areas of their companies.
During the break, attendees were able to network and, at the same time, raise questions and doubts with the workshop presenters.
This introduction also served as a prelude to the second part of the workshop, in which Marcelo Rybertt, our Country Manager, explained in detail what Big Data is, what a Data Lake is, how its data flows, and which AWS services and tools cover the different components of that data flow.
To close the workshop, William Guzmán (Chief Operations Officer, Morris & Opazo) demonstrated, in real time, three of the many use cases in which Big Data can support a company's operations:
Real-Time Sentiment Analysis in Social Networks
Twitter offers an excellent way to demonstrate the concept of data streaming. During the workshop, attendees were asked to publish tweets containing certain keywords that would then be picked up by a Twitter application.
Beforehand, a Node.js application had been started on an EC2 instance; it ‘listened’ for the generated tweets and selected those that matched the configured keywords. Each detected tweet was fed into an Amazon Kinesis data stream, which delivered the tweets, still unanalyzed at that point, to an S3 bucket.
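The workshop's listener was written in Node.js; as a minimal sketch of the same stage, the following Python/boto3 equivalent shows the two essential steps, filtering tweets by keyword and pushing matches into a Kinesis stream. The stream name `workshop-tweets` and the tweet fields are illustrative assumptions, not the workshop's actual configuration.

```python
import json


def matches_keywords(text, keywords):
    """Return True if the tweet text contains any configured keyword
    (case-insensitive), mirroring the listener's selection step."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


def forward_tweet(stream_name, tweet):
    """Push one raw, unanalyzed tweet into the Kinesis data stream,
    which delivers it to the S3 landing bucket."""
    import boto3  # requires AWS credentials; not exercised in the demo below
    kinesis = boto3.client("kinesis")
    return kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(tweet).encode("utf-8"),
        PartitionKey=tweet["id_str"],  # spreads tweets across shards
    )


# Demo of the filtering step only (no AWS call is made here).
sample = {"id_str": "1", "text": "Building a #DataLake on AWS!"}
print(matches_keywords(sample["text"], ["#datalake", "#awschile"]))  # → True
```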
Once an object was stored in the Data Lake, an AWS Lambda function was ‘triggered’ that, with the support of Amazon Comprehend, identified the entities in each tweet and performed sentiment analysis. If the tweet's original language was not English, the Lambda function used Amazon Translate to generate the corresponding English translation. The results of these three analyses (entities, sentiment and translation) were stored back in the Data Lake, in separate buckets.
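The S3-triggered Lambda stage can be sketched as follows. This is a hypothetical reconstruction, not the workshop's actual code: the result prefixes (`entities/`, `sentiment/`, `translated/`) are assumptions, and for simplicity the sketch writes all results under one bucket with prefixes, whereas the workshop used separate buckets.

```python
import json
import os


def result_key(kind, source_key):
    """Map a raw-tweet object key to the key for one analysis result,
    e.g. raw/123.json -> entities/123.json (prefixes are illustrative)."""
    return f"{kind}/{os.path.basename(source_key)}"


def lambda_handler(event, context):
    """Triggered by S3 object-created events; analyzes each tweet with
    Comprehend (and Translate when the tweet is not in English)."""
    import boto3  # available by default in the Lambda runtime
    s3 = boto3.client("s3")
    comprehend = boto3.client("comprehend")
    translate = boto3.client("translate")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        tweet = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        text = tweet["text"]
        # Detect the tweet's language; translate to English if needed.
        lang = comprehend.detect_dominant_language(Text=text)["Languages"][0]["LanguageCode"]
        if lang != "en":
            text = translate.translate_text(
                Text=text, SourceLanguageCode=lang, TargetLanguageCode="en"
            )["TranslatedText"]
            s3.put_object(Bucket=bucket, Key=result_key("translated", key),
                          Body=json.dumps({"text": text}))
        entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
        sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
        s3.put_object(Bucket=bucket, Key=result_key("entities", key),
                      Body=json.dumps(entities))
        s3.put_object(Bucket=bucket, Key=result_key("sentiment", key),
                      Body=json.dumps({"Sentiment": sentiment["Sentiment"]}))


# Demo of the key-mapping helper only (no AWS call is made here).
print(result_key("entities", "raw/123.json"))  # → entities/123.json
```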
Amazon Athena was in charge of querying the unstructured data generated by the Lambda function and made this information available to any consumer. The consumer chosen for the workshop was Amazon QuickSight, where attendees could watch the tweets they had created during the session appear in the different sections of the dashboard:
- Most frequently mentioned entity types
- Sentiment per minute
- Original text, translation and sentiment for each tweet
- Geolocation of each tweet
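A dashboard panel such as “sentiment per minute” maps naturally to a SQL aggregation run through Athena. The sketch below is hypothetical: the table name `tweets_sentiment`, its columns, and the query-results location are assumptions, since the workshop's actual schema was not published.

```python
def sentiment_per_minute_query(table="tweets_sentiment"):
    """Build an Athena (Presto-style) query counting tweets per
    sentiment per minute; table and column names are illustrative."""
    return (
        "SELECT date_trunc('minute', from_iso8601_timestamp(created_at)) AS minute, "
        "sentiment, COUNT(*) AS tweets "
        f"FROM {table} GROUP BY 1, 2 ORDER BY 1"
    )


def run_query(database, output_location, query):
    """Submit the query to Athena; results land in the given S3 location,
    from which a consumer such as QuickSight can read them."""
    import boto3  # requires AWS credentials; not exercised in the demo below
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]


# Demo of the query builder only (no AWS call is made here).
print(sentiment_per_minute_query())
```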
Analysis of Call Center Calls with Machine Learning
Through a dashboard implemented in Kibana, the companies attending the workshop saw how different audio files were processed and analyzed to identify the keywords, entities, text, data and metadata each one contained. This exercise illustrated one of the best practices of cloud architecture, loose coupling between the components of the data flow: in this case Kibana served as the consumer of the Data Lake, a different system from the previous case, but backed by the same S3-based Data Lake.
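The article does not name the speech-to-text service in this pipeline; Amazon Transcribe is the natural fit on AWS, so the sketch below assumes it. The job name, bucket and language code are all illustrative. The resulting transcript could then be fed through Comprehend and indexed for the Kibana dashboard.

```python
def transcription_job_request(job_name, media_uri, language="es-ES"):
    """Build the parameters for a Transcribe StartTranscriptionJob call,
    inferring the media format from the file extension."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": media_uri.rsplit(".", 1)[-1],
        "LanguageCode": language,
    }


def start_job(request):
    """Submit the transcription job; Transcribe writes the transcript
    JSON to S3 when the job completes."""
    import boto3  # requires AWS credentials; not exercised in the demo below
    return boto3.client("transcribe").start_transcription_job(**request)


# Demo of the request builder only (no AWS call is made here).
req = transcription_job_request("call-001", "s3://example-calls/call-001.mp3")
print(req["MediaFormat"])  # → mp3
```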
Image and Video Recognition using Machine Learning
Those attending the event saw how a system backed by Amazon Rekognition was trained by uploading a photo of a person, and could then determine whether or not that person's face appeared in a different photo or video.
Finding these matches was not the only capability: Rekognition also delivered, in real time, detailed information about the labels, features and celebrities found in each photo or video.
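The face-matching step can be sketched with Rekognition's `CompareFaces` API, which takes a reference image and a target image and returns candidate matches with similarity scores. The similarity thresholds below are illustrative choices, not the workshop's settings.

```python
def strong_matches(face_matches, threshold=90.0):
    """Keep only the face matches whose similarity score meets the
    threshold, as a simple post-filter on Rekognition's response."""
    return [m for m in face_matches if m["Similarity"] >= threshold]


def compare(source_bytes, target_bytes):
    """Ask Rekognition whether the face in the source image appears
    in the target image."""
    import boto3  # requires AWS credentials; not exercised in the demo below
    rek = boto3.client("rekognition")
    resp = rek.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=80,  # let Rekognition pre-filter weak candidates
    )
    return strong_matches(resp["FaceMatches"])


# Demo of the post-filter only (no AWS call is made here).
matches = [{"Similarity": 97.5}, {"Similarity": 55.0}]
print(strong_matches(matches))  # → [{'Similarity': 97.5}]
```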
It is precisely these kinds of activities that allow AWS and Morris & Opazo to bring companies the full potential of cloud computing, and in this particular case, the added value of Big Data solutions. Above all, they allow us to hear first-hand the questions that these technical topics raise, addressed in a face-to-face, practical and collaborative way. To all the attendees: thank you very much for your active participation in our workshop!
Text: Morris & Opazo / WG