Learning Objectives

Discuss the purpose of a database management system
Dicide whether it is better to store data using a database management system or another alternative, such as a spreadsheet
Explain the basic sturcture and components of relational databses
Describe the purpose of foreign keys in a relational database
Discuss the purpose of a relational databse schema and explain it notation
List and describe a number of online databases
Understand what Big Data is and how businesses can use it to make more informed decisions

Much of the information used by businesses is stored in database, and much of the data you will deal with is stored in relational databases. In this chapter, we will help you learn about how relational databases store data. (Appendices C & D you will learn how to construct, populate, and retrieve data from Microsoft Access).

Overview of Relational Databases

Databases and Database Management Systems

Database is an organized collection of data.
Relational database (RD) is the dominant type of database for business applications.
RD can store different types of information, include text, numbers, images, and videos.
A database is managed by a database management system (DBMS).
A DBMS provides the means for creating, maintaining, and using databases. –> Using is most important for business students.
Smaller database-oriented tasks can be handled by a personal DBMS, such as Microsoft Access or OpenOffice Base.
Larger, more complex databases require an enterprise-level DBMS, such as Oracle, MySQL, Microsoft SQL Server, or IBM’s DB2.
Each of these databases has its adv. and disadv., but all are quite capable.
Databases are very important elements of information system.
Most IS today use multitiered architectures that divide processing into different elements.
multitiered architectures refers to the database server, the Web server, and applications, all existing on different computers. –> more chapter 7.
Application requests data from DBMS –> APP manipulate the data -> send updated data to the DBMS –> DBMS updates the actual databases.

Databases versus Spreadsheets: Where to Use a DBMS

Both spreadsheet (e.g. MS Excel) and DBMS store information in tables of rows and columns. However, they have very different purposes, strengths, and weaknesses.
Spreadsheet – very simple data storage tasks, such as personal contact list, home inventory, or shopping list.
Spreadsheet limitations: (Example in Google Spreadsheet)
- Unnecessary data duplication
- Difficulty in data retrieval and search
- Poor data integrity – inaccurate, inconsistent, and out-of-date data
- Difficulty in relating different elements (such as customers and orders)
Most database management systems have methods for keeping data up to date and consistent, and provide a higher quality data than data stored in spreadsheet.
Although spreadsheets are relatively easy to use, in the long run, it is better to take time to plan & design a proper database, especially as the amount of information being stored increases.
Spreadsheets are good for analyzing and displaying information visually.
Databases are good at storing and organizing information.
Fortunately, it is easy to extract data from database and import them to a spreadsheet for analysis.
So you don’t have to make an “either/or” choice.

Relational Databases

A relational database stores data in the form of connected tables.

Tables are made up of records (rows) and fields (columns).
A record is a set of fields that all belongs to the same thing.
The fields represent some characteristic of the thing.

More complicated Relational Database:

Each table stores data about a separate thing. –> allows for flexibility when retrieving data while minimizing redundancy.
By storing each piece of information only once, we ensure that the data are consistent.

The way to uniquely identify each record & connect with different pieces of information:

A natural choice for unique identifier called the primary key. – Each table in a database has a primary key (PK).
The connections are implemented by foreign keys. – The fields that reference a PK in a related table.
These cross-referencing foreign keys make it easy to combine data contained in multiple tables.

Rules govern relational database design:

Each row/column intersection can store at most one item of data.
Normal forms – are the rules to ensure data consistency by eliminating unnecessary redundancy.
The separate structure prevents inconsistencies by storing each item of data only once.

Relationships:

Relational databases store one-to-many relationships – e.g. one customer is related to many orders
Many-to-many relationships are also existed – e.g. student can take different courses, and one course can have more than one student.
Many-to-many relationships require creating a new table that links the two related tables – linking or intersection tables.
One-to-one are more unusual, but it also existed – e.g. customers & accounts.

Database Diagrams

Using actual data to show the structure of a database only works for very small databases. For larger databases, we use database schema diagrams.

Entity-relationship diagrams (ERD)
Database schemas
Both show the data elements and the relationships among them
Table name - top
Primary key (PK) - left
Foreign keys (FK) - beside PK to the left (numbered, could be more than 1)

Online Databases

There are other types of databases that are quite useful; many of them are accessible through the Internet.

These databases serve a different purpose than databases we discussed earlier
Very useful for performing research
Contain pointers to sources of information
Most of these are available through libraries

Examples:

Career Research
- United States Bureau of Labor Statistics (http://www.bls.gov/data/), www.monster.com, www.careerlink.com, www.careerbuilder.com
Market Research
- United States Census Bureau (http://www.census.gov).
Economic Forecasting
- Federal Reserve Economic Data (FRED) (http://research.stlouisfed.org/fred2/)

Big Data

Big Data is something of a buzzword that refers to the vast amount of data (often measured in terabytes or petabytes) that are created and stored that have grown beyond the capabilities of traditional data processing tools and applications.

Big Data Challenges

How should the data be stored?
- Data is unstructured, unlike relational data – sacrifice some levels of data integrity to keep performance at an acceptable level.
- Network-attached storage (NAS)
  - Uses a series of file servers that can easily expand to grow capacity and high-speed connections between them
- Direct-attached storage (DAS)
  - Keeps data more centralized for faster access time for processing; limits scalability of data size
- Hybrid approaches: store data with NAS; access temporary data with DAS
- Data lakes: huge volumes of data in original format with unique identifiers and tags describing what data is about.
  - In most data repositories data structure and requirements are defined up front.
  - In data lake, data structures and requirements are not defined until the data are needed.
How do data from various sources integrate with one another?
- Big data uses E/T/L process (extract, transform, load) – more in Chapter 14
  - Extract or retrieve data involves identifying the sources of the various data that needed
  - Transform is the process where the data are modified and combined. e.g. second-by-second data combined with day-by-day date
  - Loading is a matter of writing the data to a disk in a fashion that can be readily retrieved as necessary
How are data retrieved and disseminated?
- If structured data: SQL
- If unstructured: NoSQL (“Not only SQL”)
  - Greater scalability and efficiency in storing and retrieving data.
  - Dynamic creation of a table per query
  - Allows ad hoc queries – Ad hoc is a word that originally comes from Latin and means “for this” or “for this situation.” In current American English it is used to describe something that has been formed or used for a special and immediate purpose, without previous planning.
- Data Analytics: Interpretation of the data regardless of how it is stored

修心问道

Information System 5- Storing and Organizing Information