Understanding the Maven Movies Dataset

SQL Certification

Understanding the Maven Movies Dataset

The Maven Movies dataset is a fictional movie rental database designed to simulate the day-to-day operations of a video rental business. It serves as an excellent practical environment for learning and applying SQL skills to solve real-world business problems. The dataset is structured relationally, meaning its various pieces of information are organized into interconnected tables, allowing for complex queries and deep insights.

Your Maven Movies dataset is represented by the following 12 CSV files, each corresponding to a table in the database:

ACTORS.csv (Actor): Contains unique actor_id values, along with first_name and last_name of actors.
actor_award.csv (Actor Award): Provides actor_award_id and details on awards received by actors, linking to the actor_id.
address.csv (Address): Stores physical location details including address_id, address (street), district, city_id, postal_code, and phone. This is a central table for connecting people (customers, staff) and places (stores) to their geographical points.
advisor.csv (Advisor): Lists advisor_id, first_name, last_name, and is_chairmain status for various business advisors.
category.csv (Category): Defines the genres of films, with category_id and name (e.g., ‘Action’, ‘Comedy’, ‘Horror’).
city.csv (City): Contains city_id, city name, and country_id, linking addresses to specific urban areas.
country.csv (Country): Lists country_id and country names, providing the highest level of geographical context.
customer.csv (Customer): Comprehensive customer details including customer_id, store_id (their preferred store), first_name, last_name, email, address_id, and active status.
film.csv (Film): Detailed movie information such as film_id, title, description, release_year, rental_duration, rental_rate, length, replacement_cost, rating, and special_features.
film_actor.csv (Film Actor): A bridge table (film_id, actor_id) to establish a many-to-many relationship between films and actors (a film can have many actors, an actor can be in many films).
staff.csv (Staff): Records information about employees, including staff_id, first_name, last_name, email, store_id (where they work), and address_id. This table is essential for analyzing staff performance and identifying store managers.
store.csv (Store): Defines the individual store locations with store_id, manager_staff_id (linking to the staff_id of the store’s manager), and address_id (linking to the store’s physical location).

Relational Design: The dataset’s strength lies in its relational model. Tables are connected through primary keys (unique identifiers for rows within a table, e.g., film_id in film) and foreign keys (columns in one table that refer to a primary key in another, e.g., store_id in customer references store_id in store). These explicit relationships enable you to JOIN tables and retrieve combined information, which is fundamental to performing holistic business analysis.