Understanding the Maven Movies Dataset
The Maven Movies dataset is a fictional movie rental database designed to simulate the day-to-day operations of a video rental business. It serves as an excellent practical environment for learning and applying SQL skills to solve real-world business problems. The dataset is structured relationally, meaning its various pieces of information are organized into interconnected tables, allowing for complex queries and deep insights.
Your Maven Movies dataset is represented by the following 12 CSV files, each corresponding to a table in the database:
ACTORS.csv
(Actor): Contains uniqueactor_id
values, along withfirst_name
andlast_name
of actors.actor_award.csv
(Actor Award): Providesactor_award_id
and details onawards
received by actors, linking to theactor_id
.address.csv
(Address): Stores physical location details includingaddress_id
,address
(street),district
,city_id
,postal_code
, andphone
. This is a central table for connecting people (customers, staff) and places (stores) to their geographical points.advisor.csv
(Advisor): Listsadvisor_id
,first_name
,last_name
, andis_chairmain
status for various business advisors.category.csv
(Category): Defines the genres of films, withcategory_id
andname
(e.g., ‘Action’, ‘Comedy’, ‘Horror’).city.csv
(City): Containscity_id
,city
name, andcountry_id
, linking addresses to specific urban areas.country.csv
(Country): Listscountry_id
andcountry
names, providing the highest level of geographical context.customer.csv
(Customer): Comprehensive customer details includingcustomer_id
,store_id
(their preferred store),first_name
,last_name
,email
,address_id
, andactive
status.film.csv
(Film): Detailed movie information such asfilm_id
,title
,description
,release_year
,rental_duration
,rental_rate
,length
,replacement_cost
,rating
, andspecial_features
.film_actor.csv
(Film Actor): A bridge table (film_id
,actor_id
) to establish a many-to-many relationship between films and actors (a film can have many actors, an actor can be in many films).staff.csv
(Staff): Records information about employees, includingstaff_id
,first_name
,last_name
,email
,store_id
(where they work), andaddress_id
. This table is essential for analyzing staff performance and identifying store managers.store.csv
(Store): Defines the individual store locations withstore_id
,manager_staff_id
(linking to thestaff_id
of the store’s manager), andaddress_id
(linking to the store’s physical location).
Relational Design: The dataset’s strength lies in its relational model. Tables are connected through primary keys (unique identifiers for rows within a table, e.g., film_id
in film
) and foreign keys (columns in one table that refer to a primary key in another, e.g., store_id
in customer
references store_id
in store
). These explicit relationships enable you to JOIN
tables and retrieve combined information, which is fundamental to performing holistic business analysis.