Database Principles and Applications Exam Related-2023
---------------------------------------- Exam question types 1. Multiple-choice questions (10 points, 5 questions, 2 points each) 2. Fill-in-the-blank questions (15 points, 5 questions, 3 points each) 3. Short-answer questions (15 points, 3 questions, 5 points each, please be as detailed as possible, with clear logic and clear layers) 4. Design questions (60 points. The first question is 10 points; the second question is 50 points, a total of 10 programming questions, each question is 5 points)*** Question design ideas: Choose corresponding questions from the following key points, then randomly select the above question types and numbers, and then make fine adjustments. ** ---------------------------------------- Exam focus:
1. Introduction
数据库的4个基本概念 P10
- data, database, database management system, database system
数据库的定义 P14
- A database is an organized, large-scale, shared collection of data stored in a computer for a long time
数据库(系统)的基本特征 P14 / P37 - P41
- Shareable by users, with low redundancy, easy expansion, and strong data independence
数据库管理系统的主要功能 P17 - P19
- Data definition function (DDL), data organization, storage and management, data manipulation function (DML), database transaction management and remote management, database establishment and maintenance function, communication function of database and other software systems in the network
数据库系统的构成 P20 - P21
- Databases, database management systems (and their development tools), applications and database administrators
数据库系统的特点 P37-P41
- The data is structured, with high data shareability, low redundancy, easy expansion, and high data independence. The data is managed and controlled uniformly by DBMS.
数据独立性高如何理解 P40
- Refers to an application’s database schema and data storage method that does not depend on other applications or operating system platform-specific implementations.
数据库管理系统如何控制数据共享 P41
- Database access permission control, database metadata control, data encryption, database replication, database partitioning,
概念模型、逻辑模型和物理模型的概念 P47
- The conceptual model is an abstraction and simplification of the real world. It is used to describe the data in the database and the relationship between the data. It is the core of database design and the basis of database modeling;
- A logical model is a logical representation of a conceptual model, which is used to store data in a database and the relationships between data;
- The physical model is the representation of the logical model on a physical storage device, which is used to store and manage data in the database;
数据模型的组成要素 P57
- Entities, attributes, relationships, constraints between entities, data types and precision
数据模型中的数据结构 P58
- The data structure in the data model refers to the way data is stored and represented in the database;
- Tables, views, indexes, primary and foreign keys,
数据模型中的数据操作 P59
- Add, delete and modify, insert operations, query operations, data backup and recovery operations, transaction operations
数据模型中的完整性约束条件 P60
- Refers to a set of rules used to ensure the integrity of data in a database;
- Entity integrity, attribute integrity, referential integrity, user-defined integrity
关系模型谁提出的? P84
- E.F. Codd, a researcher at the San Jose Research Laboratory of IBM Corporation in the United States
类型和值的概念 P95 - P96
- Type: Type is an identifier used to describe data structure and data type. They usually appear in the form of keywords;
- Common types: integer, character and date;
- Value: Value refers to the specific content and value of data, usually data input by the user or generated by the program;
数据库系统的三级模式结构 P99 - P105
- Physical layer: The physical layer refers to the actual physical device where data is stored, ensuring the reliability and security of the data;
- Conceptual layer: The conceptual layer refers to the data model and data structure in the database, which is the core of database design and implementation;
- Logical layer: The logical layer refers to the application program in the database management system and is the direct object of database management and maintenance.
数据库系统的两级映射功能 P107 - P114
2. Relational database
关系数据库的提出者和时间 P4
- E.F.Codd
何为关系 P7
- It is a basic concept of database. It is used to describe the relationship between data in the database. It is the relationship between entities. It describes the connection and interaction between entities. It can be one-to-one, one-to-many, and many-to-many.
域 P9
- A domain is a collection of values of the same data type
笛卡尔积 P10 - P14
- Cartesian product is a set operation on the field
关系和笛卡尔积的关系 P17,P19
- Relations are finite subsets of Cartesian products
码 P18
- Code refers to a coding method used to represent specific data or entities, usually consisting of a set of numbers, letters or other symbols.
三类关系 P20
- One-to-one: One-to-one relationship between two tables;
- One-to-many: One-to-many relationship between two tables;
- Many-to-many: refers to the many-to-many relationship between three tables or above;
基本关系的性质 P22
-
- The columns are homogeneous, that is, the components in each column are the same type of data and come from the same domain.
-
- Different columns can come from the same domain. Each column is called an attribute. Different attributes should be given different attribute names.
-
- The order of the columns does not matter, that is, the order of the columns can be exchanged arbitrarily.
-
- The candidate codes of any two tuples cannot have the same value.
-
- The order of rows does not matter, that is, the order of columns can be exchanged arbitrarily.
-
- The components must take on atomic values, that is, each component must be an indivisible data item.
关系模式的定义 P26
- The description of a relationship is called a relationship description
基本的关系操作 P36
- Query operations, two parts: insertion, deletion and modification operations
- Query operations can be divided into selection, projection, connection, division, union, difference, intersection, and Cartesian product
SQL的4大功能 P39
- Data definition, data query, data manipulation, data control
三类完整性约束 P41
- Entity integrity, referential integrity, user-defined integrity
实体完整性规则 P43 - P44
- If attribute (referring to one or a group of attributes) A is the main attribute of basic relationship R, then A cannot take a null value
外码 P50
- Suppose F is an attribute or a set of attributes of the basic relation R, but not the key of the relation R, and KS is the primary key of S. If F corresponds to KS, then F is said to be the foreign key of R, and the basic relationship R is called the reference relationship, and the basic relationship S is the referenced relationship or target relationship.
参照完整性 P55 - P56
- Also known as referential integrity, it is a constraint between two related tables.
用户定义的完整性 P60
- User-defined integrity is a constraint for a specific relational database, which reflects the semantic requirements that the data involved in a specific application must meet.
关系代数运算符 P64
- Union, difference, intersection, Cartesian product and selection, projection, connection, division operation
传统的集合运算 P66 - P74
- Union, difference, intersection, Cartesian product
专门的关系运算(除运算除外) P85 - P103
- Selection, projection, concatenation, division operations
关系运算符的运用 P110
- Learn by yourself
3. SQL
SQL的中文和英文 1P51
- Structured Query Language SQL
SQL的功能和特点 1P55 - 1P61
- Comprehensive and unified, highly non-procedural, set-oriented operation mode, providing multiple usage methods with the same syntax structure,
- Simple language, easy to learn and use
- Data query, data manipulation, data definition, data control
内模式/模式/外模式 与 存储文件/基本表/视图 之间概念关联 1P64 - 1P67
- External schema: also called sub-schema or user schema, is a description of the logical structure and characteristics of local data that users can see and use;
- Schema: Also known as logical schema, it is a description of the logical structure and characteristics of all data in the database, and is a common data view of all data.
- Internal schema: also called storage schema, is a description of the physical structure and storage method of the database, and is the way data is represented inside the database.
- The data is provided to the user according to the description of the external mode and stored on the hard disk according to the description of the internal mode. The mode is between the external and internal modes and involves neither external access nor internal storage, thereby playing an isolation role and conducive to the independence of the data. The internal mode relies on the global logical structure, but can be independent of specific storage devices.
- Basic table: an independent table;
- View: a virtual table, which is a table derived from one or several basic tables;
- The database only stores the definition of the view and not the data corresponding to the view. These data are still stored in the basic table of the derived view. When the data in the basic table changes, the data queried from the view also changes.
数据定义的所有代码 1P74 - 1P99
- Create: create schema; create table; create view; create index;
- Delete: drop schema; drop table; drop view; drop index;
- Modify: alter table; alter index
数据字典是什么 1P101
- The data dictionary is a set of system tables within the relational database management system. It records all definition information in the database, including relational schema definitions, view definitions, index definitions, integrity constraint definitions, operating permissions of various users on the database, statistical information, etc.
单表查询 1P106 - 1P137
- Single table query refers to a query involving only one table
连接查询 2P8 - 2P24
- If a query involves more than two tables at the same time, it is called a join query.
嵌套查询(EXISTS除外) 2P26 - 2P48
- A SELECT-FROM-WHERE statement is called a query block, and a query that nests a query block in the conditions of the WHERE clause or HAVING phrase of another query block is called a nested query.
- If the query conditions of a subquery depend on the parent query, it is called a related subquery, and the entire query statement is called a related nested query.
- Otherwise, it is called an irrelevant subquery.
集合查询 2P58 - 2P63
- Union operation: UNION, intersection operation: INTERSECT, difference operation: EXCEPT
基于派生表的查询 2P65 - 2P66
- Subqueries can not only appear in the WHERE clause, but also in the FROM clause. At this time, the temporary derived table generated by the subquery becomes the query object of the main query.
插入数据 3P6 - 3P13
- INSERT INTO (table name) VALUES (values)
- INSERT INTO (table name) subquery
修改数据 3P15 - 3P20
- UPDATE (table name) SET (column name) WHERE (condition)
删除数据 3P22 - 3P27
- DELETE FROM (table name) WHERE (condition)
Drop和DELETE的区别
- Drop is generally used to delete overall data, such as tables, schemas, indexes, views and integrity restrictions.
- DELETE is used to delete local data, such as a tuple in a table
空值相关 3P29 - 3P37
- The null value is a very special value that contains uncertainty.
- Attribute definitions (domain definitions) with NOT NULL constraints cannot take null values, attributes with UNIQUE restrictions cannot take null values, and code attributes cannot take null values.
视图的特点 3P39
- View: a virtual table, which is a table derived from one or several basic tables;
- The database only stores the definition of the view and not the data corresponding to the view. These data are still stored in the basic table of the derived view. When the data in the basic table changes, the data queried from the view also changes.
-
建立视图 3P42 - 3P52
- CREATE VIEW (view name) AS (subquery) [WITH CHECK OPTION]
删除视图 3P53
- DROP VIEW (view name) [CASCADE]
- CASCADE can delete other views exported by this view.
视图更新的限制 3P66 - 3P68
-
- If the view is exported from more than two basic tables, this view is not allowed to be updated;
-
- If the fields of the view come from field expressions or constants, INSERT and UPDATE operations are not allowed on this view, but DELETE operations are allowed;
-
- If the fields of the view come from aggregate functions, this view is not allowed to be updated;
-
- If the view definition contains a GROUP BY statement, this view is not allowed to be updated;
-
- If the view definition contains the DISTINCT phrase, this view is not allowed to be updated;
-
- If there is a nested query in the view definition, and the table involved in the FROM clause of the inner query is also the basic table for exporting the view, this view is not allowed to be updated;
-
- A view defined on a view that is not allowed to be updated is not allowed to be updated.
视图的作用 P72
-
- Simplify user operations;
-
- Enable users to view the same data from multiple perspectives;
-
- Provides a certain degree of logical independence for reconstructing the database;
-
- Able to provide security protection for confidential data;
-
- Appropriate use of views can express queries more clearly.
4. Database security
数据库安全性问题的提出 P4
- Database security refers to protecting the database from data leakage, alteration or destruction caused by illegal use
数据库的不安全因素 P8 - P11
-
- Malicious access and damage to the database by unauthorized users;
-
- Important or sensitive data in the database is leaked;
-
- The vulnerability of the security environment;
数据库安全性控制的流程 P35
-
- User identification and authentication (static password authentication, dynamic password authentication, biometric authentication, smart card authentication);
-
- Access control (autonomous access control, mandatory access control);
-
- Authorization: grant and revoke (grant, revoke, permission to create database schema);
-
- Database roles (creation of roles, authorization of roles, granting a role to other roles or users, withdrawal of role permissions);
用户身份鉴定的概念和方法 P38 - P43
- Static password authentication:
- Dynamic password authentication:
- Biometric authentication:
- Smart card authentication:
存取控制 P46
- The access control mechanism mainly includes two parts: defining user permissions and checking legal permissions;
常用的两种存取控制方法 P47 - P48
- Autonomous access control: Users have different access permissions for different database objects, and different users also have different permissions for the same object. Users can also delegate their access permissions to other users, so autonomous access control is very flexible;
- Mandatory access control: Each database object is marked with a certain confidentiality level, and each is subsequently granted a certain level of license. For any object, only users with legal licenses can access it, so mandatory access control is relatively strict.
-
权限授予的SQL代码 P54 - P68
权限回收的SQL代码 P69 - P73
数据库角色的概念 P76
- A database role is a named set of permissions related to database operations. A role is a collection of permissions.
数据库角色的SQL代码 P77 - P83
视图机制在安全性控制中的作用
- The view mechanism indirectly implements the user permission definition that supports access predicates. Through the view mechanism, the data to be kept confidential is hidden from users who do not have access rights, thereby automatically providing a certain degree of security protection for the data.
5. Database integrity
数据库完整性的概念 P4
- Database integrity refers to the correctness and compatibility of data;
- The correctness of the data means that the data conforms to the semantics of the real world and reflects the current actual situation;
- Data compatibility means that the data of the same object in the database in different relational tables is logical
数据库完整性和安全性的异同 P5
- Data integrity is to prevent the existence of data that does not conform to semantics in the database, that is, to prevent the existence of incorrect data in the database;
- Data security is to protect the database from malicious damage and illegal access.
数据库管理系统如何维护数据库的完整性 P6 - P10
-
- Provide a mechanism for defining integrity constraints;
-
- Provide methods for integrity checking;
-
- Handle breach of contract;
实体完整性定义的代码 P13 - P16
实体完整性的检查和违约处理 P18
- Check whether the main code value is unique. If not, reject insertion or modification;
- Check whether each attribute of the main code is empty. As long as one of the attributes is empty, the insertion or modification will be refused.
参照完整性定义的代码 P24 - P25
- Referential integrity uses the FOREIGN KEY phrase in CREATE TABLE to define which columns are foreign keys, and the REFERENCES phrase is used to indicate which tables’ primary keys these foreign keys refer to.
参照完整性的检查和违约处理 P27 - P35
- Reject (NO ACTION) execution: The change operation is not allowed to be executed. This policy is generally set as the default policy;
- Cascade (CASCADE) operation: When deleting or modifying a tuple in the referenced table causes inconsistency with the reference table, delete or modify all tuples in the reference table that cause inconsistency.
- Set to null: When deleting or modifying a tuple in the referenced table causes inconsistency, set the corresponding attributes of all tuples in the reference table that cause inconsistency to null.
用户定义完整性 P40 - P50
- User-defined integrity is the semantic requirement that data for a specific application must meet.
- Constraints on attributes p163
- Definition of constraints on attributes
- Checking and violation handling of constraints on attributes
- Constraints on tuples p163
- Definition of constraints on tuples
- Checking and violating constraints on tuples
- Integrity Constraint Named SQL P53 - P57
触发器的定义、激活与删除 P68 - P78
- A trigger is a special event-driven process defined by the user on a relational table.
- Definition of trigger P168
- The execution of a trigger is activated by a triggering event and is automatically executed by the database server.
-
- Execute the BEFORE trigger on the table
- SQL statement to activate trigger
- Execute the AFTER trigger on the table
6. Relational Data Theory
- Data dependency is a constraint relationship between the internal attributes of a relationship. This constraint relationship is a correlation between data reflected by the equality of values between attributes. It is the abstraction of the interconnections between real-world attributes, the inherent nature of data, and the embodiment of semantics.
现实世界已知语义得到的函数依赖 P13 - P14
- Functional dependency is a ubiquitous type of data dependency
- Improper functional dependencies can cause many problems:
-
- Data redundancy
-
- Update exception
-
- Insertion exception
-
- Delete exceptions
函数依赖 P22 - P26
- Functional dependency, like other data dependencies, is a semantic category concept, and a functional dependency can only be determined based on semantics.
-
平凡函数依赖与非平凡函数依赖 P27 - P28 -
完全函数依赖与部分函数依赖 P29 - P30 -
传递函数依赖 P31
1NF、2NF、3NF、BCNF的定义以及判断 P41 - P56,P81
(1NF, Normal Form) If the domain value of each attribute A in a relational schema R is atomic, that is, the attribute value cannot be subdivided, then the relational schema R belongs to the first normal form, abbreviated as R ∈ 1NF. If every relational schema in database schema R is 1NF, database schema R∈1NF.This is very easy to understand. Basically, as long as the question is given, it is a relationship model that satisfies the first paradigm. Second Normal Form (2NF) definition
2NF means that all non-primary attributes in our relational schema are completely dependent on each key. What does it mean? The most important thing here is to understand what 非主属性 is, what is 主属性, what is 键
It is recommended to read the summary of another blog to understand the meaning of the key: https://blog.csdn.net/fjxcsdn/article/details/76549751
Read the blog I wrote before to understand the meaning of complete dependence and partial dependence: Database Basics (3) Functional Dependency - Trivial Dependency, Complete Dependence, Partial Dependence, Transitive Dependence
How should you understand non-primary attributes and primary attributes? For example, suppose the relation schema is R={A,B,C,D} and the candidate key of R is AD. Then A and D are primary attributes, while B and C are non-primary attributes. 包含在候选键里的属性就是主属性!
Now that we understand the meaning of keys and dependencies, we can take a closer look at what 2NF is.
举例子1:
So let’s first look at who the keys are in this R relational model. UseGiven
R = {A, B, C}and the functional dependency setF = {B → C, AC → B}, determine whether the relation schema is in 2NF
数据库基础(4)中我们学习的属性闭包算法 to find the candidate key of R: AC
So the second step is to see if there are any non-primary attributes that partially depend on the main attribute AC (our non-primary attribute here is B)
Obviously no part of B depends on A or C, so R is 2NF
举例子2
Like the first example, we first see who the keys in R are Using the attribute closure algorithm, the candidate key of R is found to be: AD (AD+ = ABCD), so C and B are both non-primary attributes, and A and D are primary attributes. Therefore, it is obvious that we find that A in AD can independently determine C (A->C), so C partially depends on AC, there are non-primary attributes that partially depend on primary attributes, and R is not 2NF Third Normal Form (3NF) definitionGiven
R = {A, B, C, D}and the functional dependency setF = {A → C, AD → B}, determine whether the relation schema is in 2NF
The meaning of the third normal form is that the key of 没有非主属性传递依赖 in R, R is 3NF
There is also a condition implicit here, that is, if 主属性传递依赖 in R is the key of R, then R also satisfies 3NF.
Be careful to distinguish between primary attribute transitive dependencies and non-primary attribute transitive dependencies
举例子1
Let’s make a judgment. The first step is to find the key! We found that AC is a candidate key (AC+=ABC, a candidate key obtained through the attribute closure algorithm) At the same time, since AC->B, B->C, C transitively depends on AC. So is R 3NF? R is of course 3NF, because the candidate key of R is AC, so C is the primary attribute, so here isGiven
R(A, B, C)and the functional dependency setF = {B → C, AC → B}, determine whether the relation schema is in 3NF
主属性C transitively dependent on the key AC, R is 3NF
Notice! 只有当非主属性传递依赖于R的时候,R才不是3NF
举例子2
The first step is to find the key! It is found that AB is a candidate key, but since AB -> C, C -> D, D transitively depends on AB So is R 3NF? Of course R is not 3NF, because D is a non-primary attribute, so here the non-primary attribute D transitively depends on the primary attribute AB, so it does not meet the definition of 3NF.Given
R(A, B, C, D)and the functional dependency setF = {AB → C, C → D}, determine whether the relation schema is in 3NF
Boyce-Codd范式(BCNF)
definition
BCNF is the most advanced, it refers to 没有任何属性传递依赖于R中的任何一个键 in R. So think of Example 1 in our 3NF above. Although it satisfies 3NF, it is not BCNF because the main attribute transitively depends on the key.
Look at the examples and determine which normal form it is.
Obviously, both are BCNF, because there is no transitive dependency on any attribute.
Finally, a relationship model that satisfies BCNF must also satisfy 3NF; similarly, a relationship model that satisfies 3NF must also satisfy 2NF.
7. Database design
数据库设计的概念 1P6
数据库高效率的运行环境 1P7
数据库设计的特点 1P10 - 1P12
数据库设计的基本步骤 1P18
- Requirements design
- conceptual structural design
- Logical structure design
- physical structure design
- Database implementation
- Database operation and maintenance
数据库设计各个阶段的数据设计描述 1P25
- P 209
需求分析的任务 1P36 - 1P38
- information request
- processing request
- Security and integrity requirements
需求分析的方法 1P40
- Investigate the status of the organization
- Investigate the business activities of various departments
- Based on familiarity with business activities, assist users to clarify various requirements for the new system, including information requirements, processing requirements, security and integrity requirements
- Determine the boundaries of the new system
- Follow-up homework
- Hold an investigation meeting
- Please introduce someone
- ask
- Please fill in the design questionnaire
- View records
E-R模型的提出者 1P58
- P.P.S.Chen
实体之间联系的类型 1P59 - 1P67
- Relationship between two entity types: one-to-one, one-to-many, many-to-many
- The relationship between two or more entity types
- Relationships within a single entity type
E-R图 1P68 - 1P74
实体与属性的划分原则 1P77
E-R图的集成 1P87 - 1P94
- Merge and resolve conflicts between E-R diagrams
- Modify and refactor to eliminate unnecessary redundancy
- Conflict types: 1. Attribute conflict, 2. Naming conflict, 3. Structure conflict
E-R图向关系模型的转换 2P7 - 2P13
- An entity type is converted into a relationship schema. The attributes of the relationship are the attributes of the entity, and the code of the relationship is the code of the entity.
优化数据模型的方法 2P19
- Determine data dependencies
- Minimize data dependencies between various relational schemas
- Analyze the relationship patterns one by one according to the theory of data dependence
- Based on the processing requirements obtained in the requirements analysis stage, analyze whether these patterns are suitable for such an application environment and determine whether certain patterns need to be merged or decomposed.
- Perform necessary decomposition of the relational model to improve data operation efficiency and storage space utilization. Two commonly used decomposition methods are horizontal decomposition and vertical decomposition.
数据库物理设计的步骤 2P32
- The process of selecting a physical structure that best suits application requirements for a given logical data model is the physical design of the database.
-
- Determine the physical structure of the database, which in relational databases mainly refers to the access method and storage structure.
- Evaluate physical structures with emphasis on time and space efficiency
- Common access methods of database management systems 2P39
- Index method and clustering method, B+ tree index and hash index P236
影响数据存放位置和存储结构的因素 2P53
- Access time, storage space utilization, maintenance cost, hardware environment, application requirements
8. Database Programming
python关联MySQL的包(Package)是什么 P9
- PyMySQL
游标的好处 P22
- A cursor is a data buffer opened by the system for users to store the execution results of SQL statements. Each cursor area has a name.
- Users can obtain records one by one through the cursor and assign them to the main variable, which is then handed over to the main language for further processing.
-
After using the cursor function, we can save the results first, and then we can do our own programming at will to get the result set we ultimately want.
tornado web服务器的四大组件 P62-P63
- ioloop instance, app instance, handler class, routing table
GET和POST的区别 P65
- The methods of passing parameters are different: Get includes the parameters in the URL, and Post passes the parameters through the request body;
- URL visibility is different: the parameters of the get request can be seen directly, but the parameter URL of the post request is not visible;
- The size of the data passed is different: the length of the data passed in the get request is limited by the URL size, while the post request has no length limit
- The impact of going back on the page is different: going back on get will not have any impact, but going back on post will resubmit;
- Different encoding methods: get requests only support URL encoding, while post requests support multiple encoding methods;
SQL 变量的定义 P73 - P76
流程控制 P80 - P97
- Flow control statements can be used in stored procedures and custom functions to control the flow of the program.
匿名块与命名块 P100
- Anonymous block: It must be compiled every time it is executed. It cannot be stored in the database and cannot be called in other procedural SQL blocks;
- Named block: After compilation, it is saved in the database and is called a persistent storage module. It can be called repeatedly and runs faster. Procedures and functions are named blocks.
存储过程的定义 P102
- A stored procedure is a database object that stores complex programs in the database so that they can be called by external programs.
存储过程的优点 P103
- High efficiency when traveling far away
- Reduced communication volume between client and server
- Facilitate implementation of corporate rules
存储过程的缺点 P104
- Stored procedures are often customized to a specific database because of the different supported programming languages. When switching to a database system from another manufacturer, the original stored procedures need to be rewritten.
- Performance tuning and writing of stored procedures, limited by various database systems
存储过程的SQL代码 P107 - P117
函数与存储过程的异同 P119
- Same: both are persistent storage modules
- Exception: The function must specify the return type
函数的SQL代码 P120 - P122
9. MySQL optimization principles
MySQL各组件之间协同工作的逻辑架构 P6 - P9
- 9-MySQL optimization principles.pdf
MySQL的查询过程 P12,P38
-
- Client/server communication protocol
-
- Query cache
-
- Grammar parsing and preprocessing
-
- Query optimization
-
- Query execution engine
-
- Return the results to the client
10. Database recovery technology
事务的概念 P7
- A transaction is a user-defined database operating system sequence. These operations are either done or not done at all. It is an indivisible unit of work;
- Transaction is the basic unit of database recovery and concurrency control
事务的显示控制 P8 - P9
- 10-Database recovery technology.pdf
事务的ACID特性 P13 - P19
- Atomicity: A transaction is a logical unit of work in the database, either all are done or none are done;
- Consistency: The result of transaction execution must be to change the database from one consistency state to another consistency state;
- Isolation: The execution of a transaction cannot be interfered with by other transactions;
- Continuity: Continuity is also called permanence. Once a transaction is committed, its changes to the data in the database should be permanent, and subsequent other operations or failures should not have any impact on its execution results.
数据库恢复的背景和原因 P21
- Failures are inevitable:
-
- Computer hardware failure;
- software errors;
- operator error
- malicious destruction
- Impact of failure:
-
- Abnormal interruption of running transactions affects the correctness of data in the database
- Corruption of the database, complete or partial loss of data
数据库恢复的概念 P22
- Database recovery: Database management must have the function of restoring the database from an error state to a known correct state (also known as a consistent state or a complete state). This is the database recovery management system’s response to failures; the recovery subsystem is an important part of the database management system; recovery technology is an important indicator to measure the quality of the system.
故障的种类 P24 - P38
- Failure within the accident
- System failure
- media failure
- computer virus
恢复机制的两个关键问题 P40
- How to create redundant data (1. Data dump 2. Register log file 3. Both are often used together)
- How to use these redundant data to implement database recovery
数据转储的概念 P43 - P44
- Data dump is the basic technique used in database recovery
- Dumping is a process in which the database administrator periodically copies the entire database to tape, disk, or other storage media.
- These spare data texts are called backup copies or fallback copies
数据转储的方法 P47 - P52
- Static dump and dynamic dump
- Mass dump and incremental dump
- Summary of dump methods (dynamic mass dump, static mass dump, dynamic incremental dump, static incremental dump)
日志文件的概念 P55
- Log files are files used to record transactions and database update operations.
以记录为单位的日志文件内容 P56 - P57
- Start tag of each transaction
- End mark of each transaction
- All update operations for each transaction
- Here the start mark of each transaction, the end mark of each transaction and each update operation are treated as a log record in the log file
- Transaction ID
- Operation type
- Operation object
- The old value of the data before updating
- The new value of the updated data
-
以数据块为单位的日志文件内容 P58
- transaction mark
- updated data block
日志文件的作用 P59 - P61
- Perform transaction failure recovery
- Perform system failure recovery
- Assists standby copies with media failure recovery
-
登记日志文件的原则 P63 - The order of registration is strictly based on the time order of concurrent transaction execution.
- The log file must be written first and then the database.
- (Write log file operation: write the log record representing this modification to the log file;
- Write database operation: write the modifications to the data into the database;
先写日志文件的原因 P64
- Writing to the database and writing to the log are two different operations
- A failure may occur between these two operations
- If the database modification is written first and the modification is not registered in the log file, the modification cannot be restored later.
- If the log is written first, but the database is not modified, restoring based on the log file will only perform one more unnecessary UNDO operation, which will not affect the correctness of the database.
- Therefore, for safety, you must write the log file first, that is, first write the log records to the log file, and then write the modifications to the database.
事务故障的恢复方法 P67
- Modifications that have been made to the database by this transaction are undoed (UNDO) by the recovery subsystem using the log file.
事务故障恢复的步骤 P68 - P69
- Scan the file log in reverse direction (that is, scan the log file from the end forward) to find the update operation of the transaction
- Performs the reverse operation of the transaction’s update operation. That is, writing the “pre-update value” in the log record to the database
- (Insertion operation: If the “value before update” is empty, it is equivalent to a delete operation;
- Delete operation: If the “updated value” is empty, it is equivalent to an insertion operation;
- If it is a modification operation, it is equivalent to replacing the modified value with the pre-modified value)
-
系统故障的恢复方法 P71- Undo (undo) the unfinished transaction when the failure occurred;
- Redo (redo) completed transactions
-
系统故障的恢复步骤 P72 - P73- Scan the log file forward (that is, scan the log file from the beginning) to find transactions that were committed before the failure occurred
- Undo (UNDO) processing of undo (UNDO) queue transactions
- Redo (REDO) processing of redo (REDO) queue transactions
介质故障的恢复方法 P75
- Reinstall database
- Redo completed transactions
-
介质故障的恢复步骤 P76 - P77 - Load the latest standby database copy (the dump copy closest to the time of the failure) to restore the database to the consistency state of the latest dump
- Load a copy of the relevant log file (a copy of the log file at the end of the dump) and redo the completed transaction
-
为什么要具有检查点的恢复技术 P81 - Searching the entire log will take a lot of time
- Many transactions that need to be redone have actually written the results of their update operations to the database, but the recovery subsystem has to re-perform these operations, which wastes a lot of time.
