Big Data Workshop: “Building a Data Lake in AWS”

The concept of Big Data is becoming increasingly important at the enterprise and commercial level in Chile. More and more companies want to take full advantage of the potential that lies in their data. That is why, on the morning of November 9th, AWS Chile and Morris & Opazo (an Advanced Partner specializing in Big Data) organized the Workshop “Building a Data Lake in AWS”, held at the Amazon Web Services offices in Chile and showcasing the expertise recognized by the AWS Partner Network (APN).

On the AWS side, the workshop was led by Jesús Federico (https://www.linkedin.com/in/jesusf/), Solutions Architect at AWS Chile, who during the first part presented, clearly and concisely, what AWS is and the fundamentals of the AWS Cloud: its main services, cloud architecture best practices, different approaches to implementing architectures on AWS, the Shared Responsibility Model, and AWS Regions and Availability Zones.

Categories of Available Services in AWS Cloud


This first part was crucial: it allowed the event attendees (including Chief Technology Officers from companies such as Forum, SmartHomy, DIN S.A., Maisasa, Entel, Grupo CGE, WOM, Facenote and abcdin) to start exploring how AWS cloud computing, serverless capabilities and the various AWS services could benefit different areas of their companies.

“I’m very happy with this invitation to the event. Honestly, it has been very constructive to learn not only about AWS, but also about its partner. So I’m very thankful!” (Luis Contreras, Compañía General de Electricidad S.A.)


During the break, attendees were able to network and, at the same time, raise their doubts and questions with the Workshop presenters.

Eliana Correa, AWS Chile Greenfield Sales, sharing experiences with the attendees of the Workshop “Building a Data Lake in AWS”

 

During the Workshop break, attendees could draw on the knowledge and lessons learned of our Big Data Specialist, Marcelo Rybertt

 

One of the strengths of the AWS and Morris & Opazo workshops: resolving doubts and questions in a personalized way (right: Jesús Federico, AWS Chile Solutions Architect; left: Luis Contreras, Chief Technology Officer, CGE).


This introduction also served as a prelude to the second part of the Workshop, in which Marcelo Rybertt, our Country Manager, explained in detail what Big Data is, what a Data Lake is, how its data flows, and which services and tools AWS provides for the different components of that data flow.

To close the Workshop, William Guzmán (Chief Operations Officer, Morris & Opazo) presented, in real time, three of the many use cases in which Big Data can support a company's operations. These included:

Big Data was not the only protagonist of the Workshop. Marcelo Rybertt also explained what Dark Data is and why it is important to keep it in mind


Real-Time Sentiment Analysis in Social Networks:

Twitter offers an outstanding opportunity to demonstrate the concept of data streaming. During the Workshop, attendees were asked to publish tweets with certain keywords that would later be recognized by a Twitter application.
Beforehand, a Node.js application running on an EC2 instance ‘listened’ to the generated tweets and selected those matching the configured keywords. Once tweets were detected, the Node.js application fed a data stream in Amazon Kinesis, which delivered the still-unanalyzed tweets to an S3 bucket.
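The listener in the demo was a Node.js application; a minimal Python sketch of the same filter-and-forward logic might look like the following (the keyword list, stream name and tweet shape are assumptions, not details from the event):

```python
import json

# Keywords configured for the demo (assumed values)
KEYWORDS = {"aws", "datalake", "bigdata"}

def matches_keywords(tweet_text, keywords=KEYWORDS):
    """Return True if the tweet mentions any of the configured keywords."""
    words = {w.strip("#@.,!?").lower() for w in tweet_text.split()}
    return not words.isdisjoint(keywords)

def forward_to_kinesis(tweet, stream_name="workshop-tweets"):
    """Send a matching tweet, unmodified, to the Kinesis data stream."""
    import boto3  # imported here so the filter above stays testable offline
    kinesis = boto3.client("kinesis")
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(tweet).encode("utf-8"),
        PartitionKey=tweet["id_str"],
    )
```

The Kinesis stream, rather than the listener itself, is what hands the raw tweets off to S3, so the application stays a thin filter.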

Once an object was stored in the Data Lake, an AWS Lambda function was triggered that, supported by Amazon Comprehend, identified the entities in each tweet and performed a sentiment analysis. If the original language of the tweet was not English, the Lambda function used Amazon Translate to generate the corresponding English translation. The results of these three analyses (entities, sentiments and translations) were stored back in the Data Lake (in separate buckets).
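A rough sketch of such a Lambda function, using the Comprehend and Translate APIs described above (bucket names and object layout are assumptions made for illustration):

```python
import json
import urllib.parse

def analyze_tweet(text, comprehend, translate):
    """Run entity, sentiment and (if needed) translation analysis on one tweet."""
    lang = comprehend.detect_dominant_language(Text=text)["Languages"][0]["LanguageCode"]
    entities = comprehend.detect_entities(Text=text, LanguageCode=lang)["Entities"]
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=lang)["Sentiment"]
    translation = text
    if lang != "en":
        translation = translate.translate_text(
            Text=text, SourceLanguageCode=lang, TargetLanguageCode="en"
        )["TranslatedText"]
    return {"entities": entities, "sentiment": sentiment, "translation": translation}

def handler(event, context):
    """Triggered by S3 ObjectCreated events on the raw-tweets bucket."""
    import boto3  # deferred so analyze_tweet stays testable offline
    s3 = boto3.client("s3")
    comprehend = boto3.client("comprehend")
    translate = boto3.client("translate")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        tweet = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        result = analyze_tweet(tweet["text"], comprehend, translate)
        # Store each analysis in its own bucket, as in the demo
        for kind in ("entities", "sentiment", "translation"):
            s3.put_object(
                Bucket=f"workshop-{kind}",  # assumed bucket naming
                Key=key,
                Body=json.dumps(result[kind]).encode("utf-8"),
            )
```

Passing the clients into `analyze_tweet` keeps the analysis logic separate from the S3 plumbing, which mirrors the loose coupling the Workshop emphasized.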

Amazon Athena was in charge of querying the unstructured data generated by the Lambda function and made this information available to any consumer. The consumer chosen for the Workshop was Amazon QuickSight, where attendees could watch the tweets they had created during the session appear in the different sections of the dashboard:

  • Most frequently cited entity types
  • Sentiments per minute
  • Original text, translation and sentiment for each tweet
  • Geolocation of the tweet
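A dashboard panel such as “sentiments per minute” maps naturally to an Athena query over the sentiment results. As a sketch (the table name, column names and result location are assumptions; the actual schema used in the demo was not published):

```python
def sentiments_per_minute_query(table="sentiment_results"):
    """Build the Athena SQL behind a 'sentiments per minute' panel (schema assumed)."""
    return f"""
        SELECT date_trunc('minute', from_iso8601_timestamp(created_at)) AS minute,
               sentiment,
               count(*) AS tweets
        FROM {table}
        GROUP BY 1, 2
        ORDER BY 1
    """

def run_query(query, database="workshop", output="s3://workshop-athena-results/"):
    """Submit the query to Athena; results land in the given S3 location."""
    import boto3  # deferred so the query builder stays testable offline
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
```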

Jesús Federico, AWS Chile Solutions Architect, explaining the basic concepts of the AWS Cloud to the audience.


Analysis of Call Center Calls with Machine Learning

Through a dashboard implemented in Kibana, the companies attending the Workshop saw how different audio files were processed and analyzed to identify the keywords, entities, text, data and metadata each one contained. This exercise demonstrated one of the best practices for cloud architecture: there was no rigid coupling between the components of the data flow (in this case Kibana was used as the Data Lake consumer, a different system from the previous case, but still with S3 as the Data Lake).
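The article does not name the service that turned the call audio into analyzable text; Amazon Transcribe is the natural fit, so the following sketch assumes it (bucket, media format and language code are also assumptions):

```python
import re

def job_name_for(key):
    """Derive a Transcribe-safe job name from the S3 object key
    (Transcribe job names only allow letters, digits, '.', '_' and '-')."""
    return re.sub(r"[^0-9a-zA-Z._-]", "-", key)

def start_call_transcription(bucket, key):
    """Kick off a transcription job for one call recording (service assumed)."""
    import boto3  # deferred so job_name_for stays testable offline
    transcribe = boto3.client("transcribe")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name_for(key),
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="mp3",          # assumed format of the recordings
        LanguageCode="es-ES",       # assumed language of the calls
        OutputBucketName=bucket,    # transcript lands back in the Data Lake
    )
```

From there, the transcript in S3 could be run through Comprehend, as with the tweets, and the results indexed for the Kibana dashboard.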


Image and Video Recognition using Machine Learning

Those attending the event saw how a system supported by Amazon Rekognition was trained by uploading a photo of a person, and could then identify, in a different video or photo, whether the face of that person appeared.

It was not only possible to find this kind of match. Rekognition also delivered, in real time, detailed information about the labels, features and celebrities found in the video or photo.
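With Rekognition, this “train on one photo, search in another” flow typically uses a face collection. A minimal sketch (collection name, S3 locations and the 90% threshold are assumptions):

```python
def train_and_search(collection_id, reference_s3, query_s3):
    """Index one reference face, then look for that face in another image.

    reference_s3 and query_s3 are (bucket, key) tuples; names are assumptions.
    """
    import boto3  # deferred so confident_matches stays testable offline
    rek = boto3.client("rekognition")
    rek.create_collection(CollectionId=collection_id)  # fails if it already exists
    rek.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": reference_s3[0], "Name": reference_s3[1]}},
    )
    return rek.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": query_s3[0], "Name": query_s3[1]}},
        FaceMatchThreshold=90,
    )["FaceMatches"]

def confident_matches(face_matches, threshold=90.0):
    """Keep only matches at or above the similarity threshold."""
    return [m for m in face_matches if m["Similarity"] >= threshold]
```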

It is precisely this kind of activity that allows AWS and Morris & Opazo to bring companies the full potential of cloud computing, and in this particular case the added value of Big Data solutions. Above all, it allows us to hear first-hand the concerns raised by technical issues, addressed in a face-to-face, practical and collaborative manner. To all the attendees: thank you for your active participation in our Workshop!

 

The active participation of the attendees was vital throughout the entire Workshop “Building a Data Lake in AWS”


Text:
Morris & Opazo / WG
