TDDB48 Lecture 1: Introduction

[Figure: a database system. Users send queries and updates to the database system and receive answers. The database system consists of the database management system (DBMS), which processes queries and updates and provides access to stored data, and the physical database. The database is a model of the world and is updated as the world changes.]

[Figure: query processing in a DBMS]
  SQL query:
    SELECT ORDER_ID, ENTRY_DATE FROM ORDER WHERE ENTRY_DATE > 200-08-30
  1. Parsing and validating, using application schema naming and structure information from the database catalogue/data dictionary (metadata).
  2. Intermediate form of the query: π ORDER_ID,ENTRY_DATE (σ ENTRY_DATE>200-08-30 (ORDER))
  3. The query optimizer produces an execution plan (access plan).
  4. The query code generator produces code to execute the query.
  5. The runtime DB processor runs that code against the stored database with application data and returns the query result.

Basic Definitions
  Database: A collection of related data.
  Data: Known facts that can be recorded and have an implicit meaning.
  Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university.
  Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database.
  Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.

Typical DBMS Functionality
  Define a database in terms of data types, structures and constraints
  Construct or load the database on a secondary storage medium
  Manipulate the database: querying, generating reports, insertions, deletions and modifications to its content
  Concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent

Typical DBMS Functionality: other features
  Protection or security measures to prevent unauthorized access
  Active processing to take internal actions on data
  Presentation and visualization of data
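The intermediate form in the query processing figure (a selection σ followed by a projection π over ORDER) can be tried out in a small sketch. It uses Python's built-in sqlite3 module; the sample rows, the CUSTOMER column and the cutoff date are hypothetical and chosen only for illustration. The same query is evaluated once declaratively by the DBMS and once by hand as select-then-project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ORDER is a reserved word in SQL, so the table name must be quoted.
conn.execute('CREATE TABLE "ORDER" (ORDER_ID INTEGER PRIMARY KEY, '
             "ENTRY_DATE TEXT, CUSTOMER TEXT)")
# Hypothetical sample data (CUSTOMER is an assumed extra column).
conn.executemany(
    'INSERT INTO "ORDER" VALUES (?, ?, ?)',
    [(1, "2001-07-15", "A"), (2, "2001-09-01", "B"), (3, "2001-10-12", "A")],
)

CUTOFF = "2001-08-30"  # hypothetical cutoff; ISO dates compare correctly as text

# Declarative version: the DBMS decides how to evaluate the query.
sql_rows = conn.execute(
    'SELECT ORDER_ID, ENTRY_DATE FROM "ORDER" '
    "WHERE ENTRY_DATE > ? ORDER BY ORDER_ID",
    (CUTOFF,),
).fetchall()


def select(rows, predicate):
    """Relational selection (sigma): keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Relational projection (pi): keep only the listed column positions."""
    return [tuple(r[c] for c in columns) for r in rows]


# Hand-evaluated version of pi(sigma(ORDER)).
order_rows = conn.execute('SELECT * FROM "ORDER" ORDER BY ORDER_ID').fetchall()
by_hand = project(select(order_rows, lambda r: r[1] > CUTOFF), [0, 1])

print(sql_rows)
print(by_hand)
```

Both lists contain the same rows. In a real DBMS the optimizer is free to choose a different but equivalent plan, for example using an index on ENTRY_DATE.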
Information retrieval (IR) on the Internet
  1. Locate document collections
  2. Formulate query
  3. Judge relevance
  Traditional IR research and development has concentrated on steps 2 and 3. The Internet (the web) requires step 1 too.

IRS, DBMS, and AI

                  IRS                         DBMS                        AI
  Data object     document                    table                       logic expressions
  Basic function  retrieval (probabilistic)   retrieval (deterministic)   inference
  Database size   small to very large         small to very large         usually small

DBMS: deterministic
    SQL> select * from kund where nummer = 7;
  The query meets the exact information need of the user.
  Compare: searching for "memory stick" in discussion forums.

LiU: Disciplinary actions
  Any kind of academic dishonesty, such as cheating, plagiarism, use of unauthorized assistance, fraud and failure to comply with University examination rules, may result in the filing of a complaint to the University Disciplinary Committee. The potential penalties include expulsion, suspension, and revocation of a previously earned grade or degree.
  LiU Rules and regulations

Lab policy
  You are expected to do the lab assignments by yourself. Merely copying others' solutions will not be tolerated, even if you make cosmetic changes to the code/solution. If we suspect that this, or any other form of cheating, has happened, we will report it to the disciplinary board of the university.
  Be prepared for your lab assistant to ask detailed questions about specific code, and about why you chose a specific solution. This applies to all lab group members.
  If you have problems meeting a deadline, it is much better to talk to the instructor about it than to cheat.
  (It is a shame that we have to say these things. They should be obvious.)
Historical Development of Database Technology
  Early database applications: The hierarchical and network models were introduced in the mid 1960s and dominated during the seventies. A large share of the worldwide database processing still occurs using these models.
  Relational model based systems: The model, originally introduced in 1970, was heavily researched and experimented with at IBM and the universities. Relational DBMS products emerged in the 1980s.

Historical Development of Database Technology
  Object-oriented applications: OODBMSs were introduced in the late 1980s and early 1990s to cater to the need for complex data processing in CAD and other applications. Their use has not taken off much.
  Data on the Web and e-commerce applications: The Web contains data in HTML (Hypertext Markup Language) with links among pages. This has given rise to a new set of applications, and e-commerce is using new standards like XML (eXtensible Markup Language).

Why a DBMS?
  Example, a customer register in C:
    struct kund {
        int nummer;
        char namn[50 + 1];      /* 50 characters plus the terminating '\0' */
        char adress[50 + 1];
        struct kund* nextp;     /* next customer in the linked list */
    };

Why a DBMS: Simple
    create table kund (nummer integer, namn char(50), adress char(50));
    select namn, adress from kund where nummer = 7;
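To make the contrast between the hand-written register and the DBMS concrete, here is a minimal sketch using Python's built-in sqlite3 module. The create table and select statements are the ones from the slide; the sample customers and the find_kund helper are hypothetical and only stand in for the linked-list code one would have to write in C.

```python
import sqlite3

# Hand-written register, mirroring the C linked list: every kind of lookup
# needs its own code. The sample customers are made up.
register = [
    {"nummer": 3, "namn": "Anders", "adress": "Gatan 1"},
    {"nummer": 7, "namn": "Sara", "adress": "Vägen 8"},
]


def find_kund(register, nummer):
    """Linear search, as one would follow the nextp pointers in C."""
    for kund in register:
        if kund["nummer"] == nummer:
            return (kund["namn"], kund["adress"])
    return None


# The same register in a DBMS: the table is declared and the lookup is a query.
conn = sqlite3.connect(":memory:")
conn.execute("create table kund (nummer integer, namn char(50), adress char(50))")
conn.executemany("insert into kund values (:nummer, :namn, :adress)", register)

from_list = find_kund(register, 7)
from_dbms = conn.execute(
    "select namn, adress from kund where nummer = 7"
).fetchone()

print(from_list)
print(from_dbms)
```

A new kind of question (for example, all customers at a given address) needs a new function in the hand-written version, but only a new query in the DBMS version.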
Why a DBMS: Powerful
    select * from kund where namn like 'S%' order by adress;
    select adress, count(*) from kund where namn = 'Anders' group by adress;

Why a DBMS: Flexible
    select namn from kund where adress = 'Vägen 8' and namn like 'S%';
    alter table kund add telefon char(10);
    create index foo on kund(namn);

More: Why a DBMS?
  Multiple simultaneous users
  Data independence
  Persistence in case of failure
  Data modeling

Multiple simultaneous users
  Kalle updates the salaries of 1000 employees while Pelle sums up the salary cost, both against the same database.

Persistence in case of failure
  Kalle is updating the salaries of 1000 employees when a power failure occurs.

DBMS: Summary of advantages
  Control of redundant information
  Data access
  Persistent data storage
  Allows queries and analysis
  Allows multiple users
  Represents multiple users
  Efficient storage of data
  Integrity constraints
  Backup and recovery
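The failure scenario (an update of 1000 salaries interrupted halfway) can be sketched with a transaction. This is a minimal illustration using Python's built-in sqlite3 module, with several assumptions: the table name anstalld, the salary figures and the raise are made up, and the power failure is only simulated by raising an exception. An in-memory database would of course not survive a real power failure; the point here is that a half-finished update is rolled back rather than left in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anstalld (id INTEGER PRIMARY KEY, lon INTEGER)")
# Hypothetical data: 1000 employees, each with a salary of 30000.
conn.executemany(
    "INSERT INTO anstalld VALUES (?, ?)", [(i, 30000) for i in range(1, 1001)]
)
conn.commit()


def raise_salaries(conn, fail_after=None):
    """Raise every salary by 1000 inside a single transaction.

    If fail_after is given, a failure is simulated after that many updates.
    """
    with conn:  # commits if the block succeeds, rolls back if it raises
        for i in range(1, 1001):
            if fail_after is not None and i > fail_after:
                raise RuntimeError("simulated power failure")
            conn.execute("UPDATE anstalld SET lon = lon + 1000 WHERE id = ?", (i,))


def total_salary(conn):
    return conn.execute("SELECT SUM(lon) FROM anstalld").fetchone()[0]


try:
    raise_salaries(conn, fail_after=500)  # interrupted after 500 of 1000 updates
except RuntimeError:
    pass
total_after_failure = total_salary(conn)  # the 500 partial updates are undone

raise_salaries(conn)  # uninterrupted run
total_after_success = total_salary(conn)

print(total_after_failure, total_after_success)
```

Someone summing the salary cost after the failed run sees the old total, not a mixture of old and new salaries.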
Categories of data models
  Conceptual (high-level, semantic)
  Implementation (representational)
  Physical (low-level, internal)
  The data model determines the schema, which in turn determines what kinds of data can be stored.

History of Data Models
  Network Model: the first one to be implemented, by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (CODASYL-DBTG report of 1971). Later implemented in a large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX-DBMS (Digital Equipment Corp.).
  Hierarchical Data Model: implemented in a joint effort by IBM and North American [...]

History of Data Models
  Object-oriented Data Model(s): several models have been proposed for implementation in a database system since the 1980s. One set comprises models of persistent O-O programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE). Additionally, systems like O2, ORION (at MCC, then ITASCA) and IRIS (at H.P., used in Open OODB).
  Object-Relational Models: started with Informix Universal Server in the 1990s. Exemplified in the latest versions of Oracle 10i, DB2, SQL Server and other systems.
  XML-based Models in the 2000s

Network Model
  ADVANTAGES:
    Able to model complex relationships and represents the semantics of add/delete on the relationships.
    Can handle most modeling situations using record types and relationship types.
    The language is navigational; it uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc. Programmers can do optimal navigation through the database.
  DISADVANTAGES:
    Navigational and procedural nature of processing
    The database contains a complex array of pointers that thread through a set of records.
    Little scope for automated query optimization

Hierarchical Model
  ADVANTAGES:
    Simple to construct and operate on
    Corresponds to a number of naturally hierarchically organized domains, e.g., assemblies in manufacturing, personnel organization in companies
    The language is simple; it uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
  DISADVANTAGES:
    Navigational and procedural nature of processing
    The database is visualized as a linear arrangement of records
    Little scope for query optimization

The Relational Model
  Data is stored as tables
  Theoretical model
  Standardized query language
  Initially, however, these databases were slow; the hierarchical databases were faster.
Database users and roles
  Database administrator
  Database designer
  End user
  Application programmer
  DBMS designer
  Tool developer
  Operator, service

The three-schema architecture
  Different schemas at different levels
  Data independence between the levels
  [Figure: several views on top of the conceptual level, which sits on top of the physical level.]

Database languages
  Data Definition Language (DDL): specifies the conceptual schema
  Data Manipulation Language (DML): stores and retrieves data
  Data Control Language (DCL): controls database execution
  Host language: extension to a programming language

Data models today
  Relational databases are the most common
  Hierarchical databases still exist (mainly in the aviation industry)
  Object-oriented and object-relational databases make up a small share
  XML databases are new

[Figure: the database system overview again. Users send queries and updates and receive answers; the DBMS processes queries and updates and provides access to the stored data in the physical database; the database is a model of the world.]

ER modeling
  [Figure: example attributes: Personnummer, Adress, Namn, E-post, Telefon, Ålder]
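The three levels and the data independence between them can be sketched with a table, a view and an index. This is a minimal illustration using Python's built-in sqlite3 module; the view name kund_namn, the index name and the sample rows are hypothetical. The table plays the role of the conceptual schema, the view is an external schema, and the index is a physical-level structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: conceptual level (the table) and an external view on top of it.
conn.execute("CREATE TABLE kund (nummer INTEGER PRIMARY KEY, namn TEXT, adress TEXT)")
conn.execute("CREATE VIEW kund_namn AS SELECT nummer, namn FROM kund")

# DML: store data (hypothetical sample rows).
conn.executemany(
    "INSERT INTO kund VALUES (?, ?, ?)",
    [(7, "Sara", "Vägen 8"), (8, "Anders", "Gatan 1")],
)

VIEW_QUERY = "SELECT namn FROM kund_namn ORDER BY nummer"
before = conn.execute(VIEW_QUERY).fetchall()

# Physical-level change: add an index. Queries are written exactly as before.
conn.execute("CREATE INDEX kund_namn_idx ON kund(namn)")
# Conceptual-level change: add a column. The view does not mention it.
conn.execute("ALTER TABLE kund ADD COLUMN telefon TEXT")

after = conn.execute(VIEW_QUERY).fetchall()
print(before, after)
```

The application that reads through the view gets the same answer before and after both changes. SQLite has no GRANT or REVOKE statements, so the DCL part is not shown here.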
ER diagrams
  A structured way to model data
  Independent of database type
  Documentation of your data structure

Symbols in ER diagrams
  Entity
  Attribute
  Composite attribute
  Candidate key
  Derived attribute
  Multivalued attribute
  [Figure labels: FNamn, ENamn, AnstÅr, Namn, E-post, PNummer, Age, Free]

Relationships
  Employees work at departments.

Total participation
  Every department must have at least one employee.

Total participation, cont.
  Every employee must work at a department.

Cardinality: restrictions on number
  Every department has exactly one employee and every employee works at exactly one department.
Restrictions on number, cont. (1:N)
  Every department can have many employees, but every employee can work at only one department.

Restrictions on number, cont. (M:N)
  Every department can have many employees and every employee can work at several departments.

Restrictions on number, cont. ((1,1) and (1,100))
  Every department can have up to 100 employees, but every employee can work at only one department.

Weak entities
  Employees are identified through their department, e.g. Kalle in sales.

Example
  Students study degree programmes and take a number of courses. Each course is identified by a course code and gives the student a number of earned credits.

SUMMARY OF NOTATION FOR ER SCHEMAS
  [The graphical symbols are not reproduced here. Their meanings, in order:]
  Entity type
  Weak entity type
  Relationship type
  Identifying relationship type
  Attribute
  Key attribute
  Multivalued attribute
  Composite attribute
  Derived attribute
  Total participation of E2 in R
  Cardinality ratio 1:N for E1:E2 in R
  Structural constraint (min, max) on participation of E in R
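One common way to carry these cardinality constraints over to tables is sketched below with Python's built-in sqlite3 module. It is an illustration under assumptions, not a mapping given on the slides: the table and column names, the made-up identifiers p1 to p3 and the department names are all hypothetical. A 1:N relationship becomes a foreign key on the N side, NOT NULL on that key expresses that every employee must work at a department, and an M:N relationship becomes a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE avdelning (namn TEXT PRIMARY KEY)")
# 1:N: each employee references exactly one department (NOT NULL = total participation).
conn.execute("""CREATE TABLE anstalld (
    pnummer   TEXT PRIMARY KEY,
    namn      TEXT NOT NULL,
    avdelning TEXT NOT NULL REFERENCES avdelning(namn))""")
# M:N: a separate table with one row per (employee, department) pair.
conn.execute("""CREATE TABLE arbetar_pa (
    pnummer   TEXT NOT NULL REFERENCES anstalld(pnummer),
    avdelning TEXT NOT NULL REFERENCES avdelning(namn),
    PRIMARY KEY (pnummer, avdelning))""")

conn.executemany("INSERT INTO avdelning VALUES (?)", [("sälj",), ("inköp",)])


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_ANSTALLD = "INSERT INTO anstalld VALUES (?, ?, ?)"
ok_insert = try_insert(INSERT_ANSTALLD, ("p1", "Kalle", "sälj"))
no_department = try_insert(INSERT_ANSTALLD, ("p2", "Pelle", None))            # NOT NULL violated
unknown_department = try_insert(INSERT_ANSTALLD, ("p3", "Lisa", "ekonomi"))   # no such department

# M:N: the same employee can be linked to several departments.
conn.executemany("INSERT INTO arbetar_pa VALUES (?, ?)", [("p1", "sälj"), ("p1", "inköp")])
kalle_departments = conn.execute(
    "SELECT COUNT(*) FROM arbetar_pa WHERE pnummer = 'p1'"
).fetchone()[0]

print(ok_insert, no_department, unknown_department, kalle_departments)
```

The opposite direction (every department must have at least one employee) and an upper bound such as 100 employees per department cannot be expressed with a plain foreign key; they typically need triggers or application-level checks.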
PROBLEM with ER notation
  The entity-relationship model in its original form did not support the specialization/generalization abstractions.

Extended Entity-Relationship (EER) Model
  Incorporates set-subset relationships
  Incorporates specialization/generalization hierarchies
  How the ER model can be extended with set-subset relationships and specialization/generalization hierarchies, and how to display them in EER diagrams

Example: two types of employees
  [Figures: entity types with the attributes ANummer, Telefon and Lön; the specialization is marked d.]
  Employees can be technicians or (XOR) administrators.
  Employees must be either technicians or (XOR) administrators.
  [Figures: the same entity types with the attributes ANummer, Telefon and Lön; the specialization is marked o. Additional labels: AdmTekn, Procent.]
  There can be employees who are both technicians and administrators.

Example
  At the university there are two types of students, doctoral students and undergraduate students, and one cannot belong to both categories. Depending on which category one belongs to, different courses are allowed: some courses are only for doctoral students, some are for undergraduate students, and some are for all types of students.

UML Example for Displaying Specialization / Generalization

Alternative Diagrammatic Notations
  Symbols for entity type / class, attribute and relationship
  Displaying attributes
  Notations for displaying specialization / generalization
  Various (min, max) notations
  Displaying cardinality ratios
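The university example (two disjoint student categories, with courses open to one category or to all) can be sketched as tables. This is one possible mapping under assumptions, using Python's built-in sqlite3 module: the table and column names, the category values and the sample rows are hypothetical. A single NOT NULL category column with a CHECK constraint makes the specialization both total and disjoint, since every student has exactly one category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Disjoint, total specialization: exactly one category per student.
conn.execute("""CREATE TABLE student (
    pnummer  TEXT PRIMARY KEY,
    kategori TEXT NOT NULL
             CHECK (kategori IN ('doktorand', 'grundutbildning')))""")
# Each course is open to one category or to all students.
conn.execute("""CREATE TABLE kurs (
    kurskod  TEXT PRIMARY KEY,
    malgrupp TEXT NOT NULL
             CHECK (malgrupp IN ('doktorand', 'grundutbildning', 'alla')))""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("s1", "doktorand"), ("s2", "grundutbildning")],
)
conn.executemany(
    "INSERT INTO kurs VALUES (?, ?)",
    [("K1", "doktorand"), ("K2", "grundutbildning"), ("K3", "alla")],
)


def allowed_courses(pnummer):
    """Course codes the given student may take, based on the student's category."""
    rows = conn.execute(
        """SELECT k.kurskod
           FROM kurs k JOIN student s ON k.malgrupp IN (s.kategori, 'alla')
           WHERE s.pnummer = ?
           ORDER BY k.kurskod""",
        (pnummer,),
    ).fetchall()
    return [r[0] for r in rows]


def category_accepted(kategori):
    """Check whether the CHECK constraint accepts a category value."""
    try:
        conn.execute("INSERT INTO student VALUES ('tmp', ?)", (kategori,))
        conn.execute("DELETE FROM student WHERE pnummer = 'tmp'")
        return True
    except sqlite3.IntegrityError:
        return False


print(allowed_courses("s1"), allowed_courses("s2"))
print(category_accepted("doktorand"), category_accepted("both"))
```

An overlapping specialization (the o case, where an employee can be both technician and administrator) would instead need, for example, one boolean flag or one subtype table per category, so that a row can belong to several at once.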