DBMS Basic
Database: A database is a collection of related data which represents some aspect of the real world. A database system is designed to be built and populated with data for a certain task.
Database Management System (DBMS) is software for storing and retrieving users' data while considering appropriate security measures. It consists of a group of programs that manipulate the database. The DBMS accepts the request for data from an application and instructs the operating system to provide the specific data. In large systems, a DBMS helps users and other third-party software to store and retrieve data.
Database management systems were developed to handle the following difficulties of typical File-processing systems supported by conventional operating systems.
- Data redundancy and inconsistency
- Difficulty in accessing data
- Data isolation — multiple files and formats
- Integrity problems
- Atomicity of updates
- Concurrent access by multiple users
- Security problems
ER diagram
- ER diagram or Entity-Relationship diagram is a conceptual model that gives the graphical representation of the logical structure of the database.
- It shows all the constraints and relationships that exist among the different components.
- An ER diagram is mainly composed of the following three components- Entity Sets, Attributes, and Relationship Set.
- Roll_no is a primary key that can identify each entity uniquely.
- Thus, by using a student's roll number, a student can be identified uniquely.
Entity Set:
Strong Entity Set:
- A strong entity set is an entity set that contains sufficient attributes to uniquely identify all its entities.
- In other words, a primary key exists for a strong entity set.
- The primary key of a strong entity set is represented by underlining it.
Weak Entity Set:
- A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities.
- In other words, a primary key does not exist for a weak entity set.
- However, it contains a partial key called a discriminator. o Discriminator can identify a group of entities from the entity set.
- A discriminator is represented by underlining with a dashed line.
Relationship:
- Unary Relationship Set - Unary relationship set is a relationship set where only one entity set participates in a relationship set.
- Binary Relationship Set - Binary relationship set is a relationship set where two entity sets participate in a relationship set.
- Ternary Relationship Set - Ternary relationship set is a relationship set where three entity sets participate in a relationship set.
- N-ary Relationship Set - N-ary relationship set is a relationship set where ‘n’ entity sets participate in a relationship set.
- One-to-One Cardinality - An entity in set A can be associated with at most one entity in set B. An entity in set B can be associated with at most one entity in set A.
- One-to-Many Cardinality - An entity in set A can be associated with any number (zero or more) of entities in set B. An entity in set B can be associated with at most one entity in set A.
- Many-to-One Cardinality - An entity in set A can be associated with at most one entity in set B. An entity in set B can be associated with any number of entities in set A.
- Many-to-Many Cardinality - An entity in set A can be associated with any number (zero or more) of entities in set B. An entity in set B can be associated with any number (zero or more) of entities in set A.
Attributes:
- Simple Attributes - Simple attributes are those attributes that cannot be divided further. Ex. Age
- Composite Attributes - Composite attributes are those attributes that are composed of many other simple attributes. Ex. Name, Address
- Multi-Valued Attributes - Multi-valued attributes are those attributes that can take more than one value for a given entity from an entity set. Ex. Mobile No, Email ID
- Derived Attributes - Derived attributes are those attributes that can be derived from other attributes (s). Ex. Age can be derived from DOB.
- Key Attributes - Key attributes are those attributes that can identify an entity uniquely in an entity set. Ex. Roll No.
Constraints:
- Domain Constraint - Domain constraint defines the domain or set of values for an attribute. It specifies that the value taken by the attribute must be the atomic value from its domain.
- Tuple Uniqueness Constraint - Tuple Uniqueness constraint specifies that all the tuples must be necessarily unique in any relation.
- Key Constraint - All the values of the primary key must be unique. The value of the primary key must not be null.
- Entity Integrity Constraint - Entity integrity constraint specifies that no attribute of the primary key must contain a null value in any relation.
- Referential Integrity Constraint - It specifies that all the values taken by the foreign key must either be available to the primary key or be null.
Closure of an Attribute Set:
Keys:
Types of Keys:
- Super Key - A superkey is a set of attributes that can identify each tuple uniquely in the given relation. A super key may consist of any number of attributes.
- Candidate Key - A set of the minimal attribute(s) that can identify each tuple uniquely in the given relation is called a candidate key.
- Primary Key - A primary key is a candidate key that the database designer selects while designing the database. Primary Keys are unique and NOT NULL.
- Alternate Key - Candidate keys that are left unimplemented or unused after implementing the primary key are called alternate keys.
- Foreign Key - An attribute ‘X’ is called a foreign key to some other attribute ‘Y’ when its values are dependent on the values of attribute “Y’. The relation in which attribute ‘Y’ is present is called the referenced relation. The relation in which attribute ‘xX’ is present is called the referencing relation.
- Composite Key - A primary key composed of multiple attributes and not just a single attribute is called a composite key.
- Unique Key - It is unique for all the records of the table. Once assigned, its value cannot be changed i.e. it is non-updatable. It may have a NULL value.
Functional Dependency:
Types of Functional Dependency:
Trivial Functional Dependencies
- A functional dependency X →Y is said to be trivial if and only if Y € X.
- Thus, if the RHS of a functional dependency is a subset of LHS, then it is called a trivial functional dependency.
Non-Trivial Functional Dependencies
- A functional dependency X → Y is said to be non-trivial if and only if Y € X.
- Thus, if there exists at least one attribute in the RHS of a functional dependency that is not a part of LHS, then it is called a non-trivial functional dependency.
Decomposition of a Relation:
Properties of Decomposition:
- Lossless decomposition ensures
- No information is lost from the original relation during decomposition.
- When the sub relations are joined back, the same relation is obtained that was decomposed.
- Dependency preservation ensures
- None of the functional dependencies that hold on the original relation are lost.
- The sub relations still hold or satisfy the functional dependencies of the original relation.
Types of Decomposition:
- Consider there is a relation R which is decomposed into sub relations R1, R2, ...., Rn.
- This decomposition is called lossless join decomposition when the join of the sub relations results in the same relation R that was decomposed.
- For lossless join decomposition, we always have- R1 ™ R2 ™ R3....... where ™ is a natural join operator
- Consider there is a relation R which is decomposed into sub relations R1, R2, ...., Rn.
- This decomposition is called lossy join decomposition when the join of the sub relations does not result in the same relation R that was decomposed.
- For lossy join decomposition, we always have- R1 ⋈ R2 ⋈ R3....... ⋈ Rn = R where ⋈ is a natural join operator
Normalization:
- Reducing the redundancies
- Ensuring the integrity of data through lossless decomposition
Normal Forms:
- First Normal Form (1NF) - A given relation is called in First Normal Form (1NF) if each cell of the table contains only an atomic value i.e. if the attribute of every tuple is either single-valued or a null value.
- Second Normal Form (2NF) - A given relation is called in Second Normal Form (2NF) if and only if 1) Relation already exists in 1NF. 2) No partial dependency exists in the relation. A → B is called a partial dependency if and only if- Ais a subset of some candidate key and B is a non-prime attribute.
- Third Normal Form (3NF) - A given relation is called in Third Normal Form (3NF) if and only if 1) Relation already exists in 2NF. 2) No transitive dependency exists for non-prime attributes. A → Bis is called a transitive dependency if and only if- Ais is not a super key and B is an anon-prime attribute.
- Boyce-Codd Normal Form - A given relation is called in BCNF if and only if 1) Relation already exists in 3NF. 2) For each non-trivial functional dependency ‘A — B’, A is a super key of the relation.
Transaction:
Operations in Transaction:
- Read Operation - Read(A) instruction will read the value of ‘A’ from the database and will store it in the buffer in the main memory.
- Write Operation — Write(A) will write the updated value of ‘A’ from the buffer to the database.
Transaction States:
- This is the first state in the life cycle of a transaction.
- A transaction is called in an active state as long as its instructions are getting executed.
- All the changes made by the transaction now are stored in the buffer in the main memory.
- After the last instruction of the transaction has been executed, it enters into a partially committed state.
- After entering this state, the transaction is considered to be partially committed.
- It is not considered fully committed because all the changes made by the transaction are still stored in the buffer in the main memory.
- After all the changes made by the transaction have been successfully stored in the database, it enters into a committed state.
- Now, the transaction is considered to be fully committed.
- When a transaction is getting executed in the active state or partially committed state and some failure occurs due to which it becomes impossible to continue the execution, it enters into a failed state.
- After the transaction has failed and entered into a failed state, all the changes made by it have to be undone.
- To undo the changes made by the transaction, it becomes necessary to roll back the transaction.
- After the transaction has rolled back completely, it enters into an aborted state.
- This is the last state in the life cycle of a transaction.
- After entering the committed state or aborted state, the transaction finally enters into a terminated state where its life cycle finally comes to an end.
ACID Properties:
- This property ensures that either the transaction occurs completely or it does not occur at all.
- In other words, it ensures that no transaction occurs partially.
- This property ensures that integrity constraints are maintained.
- In other words, it ensures that the database remains consistent before and after the transaction.
- This property ensures that multiple transactions can occur simultaneously without causing any inconsistency.
- The resultant state of the system after executing all the transactions is the same as the state that would be achieved if the transactions were executed serially one after the other.
- This property ensures that all the changes made by a transaction after its successful execution are written successfully to the disk.
- It also ensures that these changes exist permanently and are never lost even if there occurs a failure of any kind.
Schedules:
Serial Schedules —
- All the transactions execute serially one after the other.
- When one transaction executes, no other transaction is allowed to execute.
- Serial schedules are always- Consistent, Recoverable, Cascadeless, and Strict.
Non-Serial Schedules -
- Multiple transactions execute concurrently.
- Operations of all the transactions are interleaved or mixed with each other.
- Non-serial schedules are not always- Consistent, Recoverable, Cascadeless, and Strict.
Serializability —
- Some non-serial schedules may lead to inconsistency of the database.
- Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain the consistency of the database.
Serializable Schedules —
- If a given non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’ transactions, then it is called a serializable schedule.
- Serializable schedules are always- Consistent, Recoverable, Cascadeless, and Strict
Types of Serializability
- Conflict Serializability - If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting operations, then it is called a conflict serializable schedule.
- View Serializability - If a given schedule is found to be viewed as equivalent to some serial schedule, then it is called a view serializable schedule.
Non-Serializable Schedules —
- A non-serial schedule that is not serializable is called a non-serializable schedule.
- A non-serializable schedule is not guaranteed to produce the same effect as produced by some serial schedules on any consistent database.
- Non-serializable schedules- may or may not be consistent, may or may not be recoverable.
- A transaction performs a dirty read operation from an uncommitted transaction
- And commits before the transaction from which it has read the value then such a schedule is known as an Irrecoverable Schedule.
- A transaction performs a dirty read operation from an uncommitted transaction
- And its commit operation is delayed till the uncommitted transaction either commits or rollbacks then such a schedule is known as a Recoverable Schedule.
Types of Recoverable Schedules —
- Cascading Schedule - If in a schedule, failure of one transaction causes several other dependent transactions to rollback or abort, then such a schedule is called a Cascading Schedule or Cascading Rollback or Cascading Abort.
- Cascadeless Schedule - If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or aborted, then such a schedule is called a Cascadeless Schedule.
- Strict Schedule - If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has been written it is committed or aborted, then such a schedule is called a Strict Schedule
Relational Algebra:
File Structures:
- Primary Index: A primary index is an ordered file, records of fixed length with two fields. The first field is the same as the primary key as a data file and the second field is a pointer to the data block, where the key is available. The average number of block accesses using index `x = log_2, Bi + 1`, where `Bi` = number of index blocks.
- Clustering Index: A clustering index is created on a data file whose records are physically ordered on a non-key field (called the Clustering field).
- Secondary Index: Secondary index provides secondary means of accessing a file for which primary access already exists
B Trees
Properties of B Trees:
B+ Trees
DDL
- CREATE - to create a database and its objects like (table, index, views, store procedure, function, and triggers)
- ALTER - alters the structure of the existing database
- DROP - delete objects from the database
- TRUNCATE - remove all records from a table, including all spaces allocated for the records are removed
- RENAME - rename an object
DML:
- SELECT - retrieve data from a database
- INSERT - insert data into a table
- UPDATE - updates existing data within a table
- DELETE - Delete all records from a database table
- MERGE - UPSERT operation (insert or update)
DCL:
- GRANT - allow users access privileges to the database
- REVOKE - withdraw users access privileges given by using the GRANT command
TCL:
- COMMIT - commits a Transaction
- ROLLBACK - rollback a transaction in case of any error occurs
- SAVEPOINT - to roll back the transaction making points within groups
SQL:
SELECT:
SELECT DISTINCT:
WHERE:
AND, OR and NOT:
- The AND operator displays a record if all the conditions separated by AND are TRUE.
- The OR operator displays a record if any of the conditions separated by OR is TRUE.
ORDER BY:
INSERT INTO:
NULL Value:
UPDATE:
DELETE:
SELECT TOP:
Aggregate Functions:
MIN():
MAX():
COUNT():
AVG():
SUM():
LIKE Operator:
- The percent sign (%) represents zero, one, or multiple characters
- The underscore sign (_) represents one, single character
IN:
BETWEEN:
Joins:
INNER JOIN:
LEFT (OUTER) JOIN:
RIGHT (OUTER) JOIN:
FULL (OUTER) JOIN:
UNION:
- Every SELECT statement within UNION must have the same number of columns
- The columns must also have similar data types
- The columns in every SELECT statement must also be in the same order