This project was my final submission for a database course instructed by Dr. Gheibi. The project was divided into three main phases:

  1. Database Design: The first phase required designing a database for a library, ensuring it adhered to the fifth normal form (5NF).
  2. ETL Pipeline: The second phase involved creating an ETL (Extract, Transform, Load) pipeline to synchronize a data warehouse with the operational database.
  3. Time Machine: The final phase was to develop a “time machine” feature that could restore the database to a specific point in the past.

For the ETL pipeline, a common approach is to replicate each database operation individually, but this can create significant overhead. Instead, I modeled the database as a Directed Acyclic Graph (DAG), where tables and their relationships were represented as vertices and edges. I then used a topological sort of this DAG to determine the optimal order for applying bulk insert, delete, and update operations.

Database as DAG

To implement the versioning functionality for the “time machine,” I created a table and two views for each table in the original database. Each table in the data warehouse included a column with the tstzrange datatype to store the valid time ranges for each row.

Versioning 2
Versioning 1

For more details and the complete source code, please visit the project repository.