Storing sales data properly and later combining it can lead to creating a predictive model with a high rate of accuracy. In this and the next few articles we’ll analyze a database design for recording sales.
Everyone lives by selling something.
Robert Louis Stevenson
In today’s world, selling products is ubiquitous. And salespeople who have access to robust tools that leverage historical data to analyze trends and enable an enterprise to adjust business strategies accordingly have an advantage over their competitors. There are lots of parameters that can affect company results: the current global economic situation, clients’ location, age, material and marital status, and history of previous contacts or sales to clients.
We’ll start with a very simple example: a database model for sales in a coffee shop. In subsequent articles, we'll extend the model to selling products in other branches.
Sales Model
In this article we’ll analyze just a part of model that contains sales data with other parts missing.
We still have connections to missing tables and we’ll look at the model as a black-box assuming that the following is correct for table sale
:
user_has_role_id
refer to id inuser_has_role
(as presented in my previous article in the section “Time component added”) and stores information about the user that created sale record
This model enables us to create sales records with multiple items. Each item is related to a product from our catalog. The moment when we generate a sale can be different from the moment when the sale is paid for. For example, for a cup of coffee these moments will differ in a matter of minutes or hours. If our shop sold telecommunication devices, the difference can be a few days, maybe even months.
Tables
Let’s take a look at table definition and explain the purpose and usage of attributes.
The most important table in the model is product
. It’s used to store details about products we’ll offer to our clients. Products are usually delivered to a client and paid for once, usually at delivery time. Also, products are usually physical objects like cars, phones, packages of sugar, or cups of coffee.
We’ll talk about selling non-physical things (services) in the next articles.
Attributes in the product
table are:
name
– the name of the product in systemprice_per_unit
– cost of product per unit (e.g., 1 cup of coffee costs 1.8 Euro, 1 car costs 17,500 Euro, 1 kg of rice costs 2 Euros)basic_unit
– base unit when we’re selling a product (e.g., piece, kg, liter)tax_percentage
– percent of the price_per_unit to be charged as tax. We must assume that tax percentage wouldn’t be the same for all productslimited
– this field is set to True if we have a limited quantity on stock and False otherwise (e.g., we can order any quantity we need for our store from a distributer)in_stock
– if limited=True this attribute shows how many we have available to sellactive_for_sale
– if this attribute is False than we’re currently not offering that product for sale, otherwise we can offer it to clients
We can get a list of products we can offer to clients with the following query:
SELECT product.id, product.price_per_unit, product.basic_unit, product.limited, product.in_stock FROM product WHERE product.active_for_sale = True AND (product.limited = False OR (product.limited = True and product.in_stock > 0))
The table sale_status
is just a simple dictionary that contains all statuses that a sale can have in the system (e.g., “receipt issued”, “receipt paid”).
The table
sale
is the second most important table in this model. So far, this table has no connection with clients to whom we sold products because, in our coffee shop example, we don’t need to know such information. In part 2, the model will be extended to cover such cases.
Attributes in the table and their meanings are:
time_created
– time when a sale record was generated in the system (e.g., automatic time that the record was created when we generated a sale for coffee in our coffee shop or a manually added time if we want it so)time_paid
– generally we can expect that some sales will be paid within a few days or even a month after creation (e.g., if we deliver software and create a receipt we can wait up to 90 days to get paid in some countries, if everything goes by the law)sale_amount
– original amount intended to be charged to the clientsale_amount_paid
– amount that the client actually paid. It can be null because at the moment we create a receipt we don’t always have this informationtax_amount
– sum of all tax amounts for items on that receiptsale_status_id
– reference tosale_status
tableuser_has_role_id
– reference to user and his role at the moment he or she entered the receipt into system
We can get the issued and paid amount (according to time_created) within a period of time with query like this:
SELECT SUM(sale.sale_amount) AS amount_issued, SUM(sale.sale_amount_paid) AS amount_paid FROM sale WHERE sale.time_created >= @start_time AND sale.time_created <= @end_time;
To get the exact amount paid within a period of time we must use a query like this:
SELECT SUM(sale.sale_amount_paid) AS amount_paid FROM sale WHERE sale.time_paid >= @start_time AND sale.time_paid <= @end_time;
The query below will calculate the issued and paid amount within a period of time with the issue date and payment date checked separately:
SELECT SUM(CASE WHEN sale.time_created >= @start_time AND sale.time_created <= @end_time THEN sale.sale_amount END) AS amount_issued, SUM(CASE WHEN sale.time_paid >= @start_time AND sale.time_paid <= @end_time THEN sale.sale_amount_paid END) AS amount_paid FROM sale
In all examples @start_time
and @end_time
are variables containing the start time and end time of period for which we want to check issued and paid SUM.
The table sale_item
connects products and sales. Of course, we must assume that we’ll have multiple items on one receipt so we need this table to have a many-to-many relationship.
Attributes and theirs meanings are:
quantity_sold
– quantity of product that was sold and is charged on that sale/receipt (e.g., 3 coffees)price_per_unit
– same value asproduct.price_per_unit
at the moment when the sale was created. We have to save it becauseprice_per_unit
in theproduct
table can change over timeprice
– product ofquantity_sold
andprice_per_unit
; a small redundancy that helps us to avoid this calculation in queries. Generally, the sum of all item prices belonging to the same sale should be equal to thesale.sale_amount
tax_amount
– tax amount for that item on receiptsale_id
– id of sale that this item belongs toproduct_id
– product id related to this item
We could now easily make a simple report, how many products/services we sold in period and at what price.
SELECT product.name, SUM(sale_item.quantity_sold) AS quantity, SUM(sale_item.price) AS price FROM sale, sale_item, product WHERE sale.id = sale_item.sale_id AND sale_item.product_id = product.id AND sale.time_created >= @start_time AND sale.time_created <= @end_time GROUP BY product.id