Streamlining Data Processing with Apache Airflow, MongoDB, SparkSQL, and Google Cloud Composer


Initial Project Proposal

"Develop a data pipeline in Airflow to build an application that functions as a social media platform for small groups of friends. The application will allow users to connect their Spotify accounts and share their listening history. This data can then be used to create a dashboard that visualizes aggregated data for user listening habits and preferences."

The Finished Product

A Dynamic Dashboard for Sharing Music Taste with Friends
Dashboard implementation with MongoDB Charts

A fully automated data pipeline feeds a dashboard that visualizes listening history. Users connect their Spotify accounts via OAuth2 authentication, and the pipeline collects and processes their data using Apache Airflow on Google Cloud Composer. The data is stored in MongoDB Atlas, and the dashboard is built with MongoDB Charts, so users can share their music preferences and listening habits with friends and family. The dashboard refreshes in real time, updating automatically from MongoDB Atlas, which makes exploring and comparing music tastes a shared experience.
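
To make the "collect and process" step more concrete, here is a minimal sketch of how one response from Spotify's recently-played endpoint might be flattened into documents ready for MongoDB. It is an illustration, not the project's actual code: the function name `flatten_recently_played`, the output field names, and the `user_id:played_at` key scheme are all assumptions. Only the input shape (`items`, `track`, `artists`, `album`, `played_at`) follows Spotify's Web API response.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def flatten_recently_played(user_id: str, payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one 'recently played' response into flat documents, one per play.

    Hypothetical helper: the output schema is an assumption for illustration.
    """
    documents = []
    for item in payload.get("items", []):
        track = item["track"]
        played_at = item["played_at"]  # ISO 8601 string ending in "Z"
        documents.append(
            {
                # Deterministic key (assumed scheme): re-processing the same
                # play produces the same _id, so an upsert will not duplicate it.
                "_id": f"{user_id}:{played_at}",
                "user_id": user_id,
                "track_id": track["id"],
                "track_name": track["name"],
                "artists": [artist["name"] for artist in track["artists"]],
                "album": track["album"]["name"],
                # Store a timezone-aware datetime so date-based charts can bin on it.
                "played_at": datetime.fromisoformat(played_at.replace("Z", "+00:00")),
            }
        )
    return documents


# Made-up sample payload, trimmed to the fields the function reads.
sample = {
    "items": [
        {
            "track": {
                "id": "track-1",
                "name": "Example Song",
                "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
                "album": {"name": "Example Album"},
            },
            "played_at": "2023-03-01T12:34:56.789Z",
        }
    ]
}

docs = flatten_recently_played("alice", sample)
```

In an Airflow DAG, a function like this could be the body of a task that runs between the Spotify fetch and the MongoDB write. With pymongo, each document could then be written using `collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)`, so that a retried task run overwrites existing plays rather than inserting them twice.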

For a full walkthrough of the project, please visit the Medium article.
To view the code and the project on GitHub, please visit the GitHub repository.