Some notes from Chapter 3 ------------------------- Attributes: represented by fixed or variable-length sequences of bytes called fields Fields: put together in fixed or variable-length collections called records Records: stored in physical blocks; File: a collection of blocks; could be a bunch of records; could be the extent of a class --> (the file) may include index structures so that can quickly get certain records within the file (or certain attrs or ...) =========================== I. Data Elements and Fields =========================== How do we represent the values of attributes? - by fields How do we combine fields? - with records How do we combine records? - with blocks How do we combine blocks? - in files -------------------------------------- A. Representing Relational DB elements -------------------------------------- CREATE TABLE MovieStar( name CHAR(30) PRIMARY KEY, address VARCHAR(255), gender CHAR(1), birthdate DATE ); So will store each tuple in this relation as a record; within the record, there'll be one field for each of these attributes. This section: how do we represent SQL datatypes as fields Plus we may have weird data in the DBMS such as OIDs (a pointer to another record) or blobs (binary, large objects -- a 2 GB MPEG video) ----------------------- B. Representing Objects ----------------------- An object is a tuple; its fields are attributes. Except that objects can have methods (the code for which is part of the schema for the class of which these objects are instances) Objects can have an OID (an address in some global addy space that uniquely refers to the object). Objects can have r/ps to other objects, too. Relational data doesn't have addresses as values. Example 1.1: an Object Definition Language definition of class Star interface Star { attribute string name; attribute Struct Addr { string street, string city } address; relationship Set starredIn inverse Movie::stars; }; Star star = new Star(); // "star" can be represented by a record - that record would have fields for name and address - maybe we'd replace the structure containing address by two fields containing street and city - how do we capture the relationship? -- the relationship is a set of references to Movie objects -- so we'd need to provide a block address + offset for each of those objects - we also need to be able to store arbitrarily long lists of movies (that grow and shrink, too); --> may have a super mega star like Tom Cruise who has a huge set of Movies --> but may also have a 3rd rater like ... who only had a single movie hit (I would provide an example but that's sort of the point -- can never recall these folks' names) ----------------------------- C. Representing Data Elements ----------------------------- An attribute of type INTEGER is stored as 2 or 4 bytes An attribute of type FLOAT is stored as 4 or 8 bytes - ints and real nums represented in usual way and can be acted on by machine's hw in the usual way (1) Fixed-length character strings - simplest type, CHAR(n) - fixed length string of length n - represent via an array of chars of length n - if actual value populated for an attribute of this type is less than n, just fill out array with PAD char - quotation marks not stored as part of the value of the string (2) Variable-length character strings : VARCHAR(n) - length of actual string may vary widely - intended implementation: reserve n + 1 bytes (a) length plus content: allocate array of n + 1 bytes - first byte holds the # of bytes in the string (that's why the limit is 255) -- as a bit string (not a char of course) - 2nd through (n+1)'th byte hold chars of string - unused bytes are ignored (b) null-terminated string: allocate array of n + 1 bytes - fill array with chars of string then do '\0' Take-home point: for either of these schemes, will need x + 1 bytes where x <= n is the actual # of chars we have in our string (cost of doing business as a VARCHAR, I guess). (3) Dates and Times Dates usually represented as fixed-length character strings following some given format; so is really just a CHAR(n) where n == # of bytes in a given format's DATE representation. E.g. YYYY-MM-DD Ditto for times; e.g. HH:MM:SS ; use leading zero and military time (e.g. 7pm is 19 for the HH). May also include fractions of a second; e.g. HH:MM:SS.FF where FF is the fraction of one second; - may want varying precision on the fraction (1 sig dig, 2 sig digs, ...) - so time could be a VARCHAR(n) in such a case (4) Bits: data described by SQL2 as BIN(n) - can pack 8 bits to a byte - if ( n % 8 ) != 0, may as well just use ceiling(n/8) bytes - may store BOOLEANs in a single bit, e.g. 1000000 or 00000000 - but more commonly may just use 11111111 and 00000000 (5) Enumerated types - a type whose values may take on one of a small fixed set of values e.g. {SUN,MON,TUE,WED,THU,FRI,SAT} - can represent the various values as int codes; - so would just need 3 bits for the above (since that gives us 8 values) - so may pad in the front with 0s; i.e. WED = 011 --> 00000011 =========== II. Records =========== In general each type of record must have a schema (which is stored in the DB) - schema includes names and data types of fields in the record - when need to access a particular component of the record, look at the schema to confirm it exists and also to figure out where in the record that attr is -------------------------------- A. Building fixed-length records -------------------------------- MovieStar: name (30-byte string); addy (VARCHAR(255)) -- use 256 bytes; gender (single byte); birthday (of type DATE) So a single record is: 30 + 256 + 1 + 10 == 297 bytes - name begins at offset 0; addy begins at offset 30; gender at offset 286... - but may be able to more efficiently read/write data that begins at an offset of 4*x or 8*x - plus ints may be required to begin at an offset which is a multiple of 4 - so when we put the tuples on disk, we know that when we read them back, the first byte of the tuple will be put at a mem addy which is a multiple of 4; anyway, so may need to round record lengths up to satisfy this req So with MovieStar we might have: - name offset 0, addy offset 32; gender offset 288; bdate offset 292 ----------------- B. Record headers ----------------- Likely to need to keep info for the record that's not the value of any field - may want to include a pointer to where the schema is stored - may want to store the length of the record (so know offset to the next record without having to go to disk, get the record's related schema, calculate the offset, ...; main prob is di$k I/O) - may want timestamps for last modified, ... The DB maintains schema info for a relation: the attrs and their types, the order in which the attrs appear, constraints on the relation (the primary key, constraints, triggers, ...) May have schema pointer via 4 bytes which are an offset into an area where the schemas are stored. ------------------------------------------- C. Packing fixed-length records into blocks ------------------------------------------- Block may hold: header || record_1 || record_2 || ... || record_n || PAD block header: - links to other blocks (that contain other parts of this same relation, e.g.) - info a/b this block's location/placement within the group of blocks comprising the relation, e.g. - info a/b the relation these tuples/records belong to - directory: gives offset of each record in the block - block ID - timestamp; last access or modification ============================================ III. Representing block and record addresses ============================================ The address of a block when it is loaded into main mem may be the virtual mem addy of its first byte. So the addy of a record within that block is the virtual addy of the first byte of that record. Can get a record from a block and offset within that block (of the first byte of the record) ------------------------ A. Client-Server systems ------------------------ Server process provides data from 2ndary storage to client processes The client process uses a virtual address space of size 2^32 The OS or DBMS decides which parts of that virtual addy space are in physical mem; TLB and page tables map those parts that are in physical mem to their actual addresses there. Server's data lives in DB addy space; the addresses here refer to blocks and offsets within blocks. - represent this address space via physical addys; phys address may specify: -- name of the host to which this storage dev is connected -- disk ID for the desired block -- # of cylinders in that disk -- # of tracks within the cylinder -- # of blocks w/in the track -- maybe the offset to the start of records (end of header) - represent the address space via logical addys -- block block or record has a logical address (arbitrary bit string) -- hash table maps from logical addy to a physical addy Physical addresses are long: min 8 bytes, upto 16 bytes ----------------------------------- B. Logical and structured addresses ----------------------------------- Why bother with logical addresses? Because can then move records around without updating all the places that think the record has its old address X. - w/logical addys, move data, change one entry Combo: use structured addresses; - use physical address to specify/locate the block - but use a logical addy for offset w/in that block - header of the block would contain a list of record #s and their corresponding offsets - may instead have records be fixed size with each of their fields being fixed sized; so can access any particular field of some record type using a known offset (quick!) - or use an offset table as above - with structured deal may start writing records at the end of the block, and only allocate positions in the offset table as needed offset table grows this way --> <-- records grow this way - addy of a record is then: phys block addy + offset of the entry in the block's offset table - can move, compact, etc., records w/in the block - can move the record to another block (if the offset table entries are long enough or could be long enough to hold a pointer - when delete a record, reclaim its space but leave the offset table entry with a skull and crossbones indicating this record is dead -------------------- C. Pointer Swizzling -------------------- Pointers or addresses may be part of records (common for records that represent objects). Index structures are composed of blocks that usually have pointers within them (?). Every referenceable data item (block, record, object, ...) has two forms of address: - its address in the server's DB addy space -- database address ( ~ 8 B ) - its addy in virtual mem -- memory address (4 B) When in 2ndary storage, must use DB addy of the item; when in main mem, can use either addy; quicker to use memory address in main mem (b/c otherwise we'd need to do a translation); - translating from db addy to main mem addy -- use a table that translates all those db addys that are currently in virtual mem TO their current main mem address -- memory addresses provided as output of this table are for copies of the object in memory - since the object will also exist in the db -- only items which are currently in memory are in this translation table How to avoid cost of repeatedly translating from a DB addy to a mem addy? - pointer swizzling: -- when move a block from secondary to main mem, pointers within that block may be translated from the DB addy space to the virtual addy space -- so a pointer consists of: - a bit indicating whether the pointer is currently a DB or memory addy - the actual pointer (DB or mem) - entry has some number of bytes that is the same regardless of the type of addy the pionter is at this time For example if we have a record R1 which exists in some block B1 and B1 is brought into memory and R1 contains a pointer to R2 (which exists in the same block) and a pointer to R100 which exists in a different block, B2, we can *swizzle* the pointer in R1 for R2 (since R2 will also be in main mem); however if B2 is not in main mem, we can't swizzle the pointer from R1 to R100. If B2 is later brought into mem, we could swizzle that pointer from R1 to R100 at that time. ----------------------- (1) Automatic Swizzling ----------------------- As soon as a block is brought into mem, we locate all of its pointers and addresses and enter them into the translation table (if they're not already there). - includes pointers from records in the block to elsewhere - and the addresses of the block itself and/or its records (if they're addressable) Need some way to locate pointers within the block: - if block holds records with a known schema, schema will tell us where in the records the pointers are found - if block is used for an index structure, then block will hold pointers at known locations - may keep in block header a list of where the pointers are Easy to create translation table entries for blocks just moved into memory and/or for their records (if addressable). - so if when we insert one such addy into the table, we find that it is already in the table, we change the address (to the main mem one) and set the swizzle bit - if we try to follow a pointer from the block and find that this pointer is still a db pointer, then we look in the translation pointer and see if the db addy is in it; if not, we copy the block containing that pointer into mem; then we swizzle that pointer ----------------------- (2) Swizzling on Demand ----------------------- Leave all pointers unswizzled when bring block into mem Enter the block's addys into the translation table as well as their main mem addys; If and when we follow some pointer P in that block, we swizzle it. One option: make it so that db addys look to be invalid mem addys so that when one is attempted to be accessed, the mem ref will cause a hardware trap; then DBMS could provide a fxn which is invoked by the trap and which brings the desired block into mem (if it's not there already). ---------------- (3) No swizzling ---------------- Still need translation table; as a result, records can't be pinned in mem ----------------------------------- (4) Programmer control of swizzling ----------------------------------- Programmer may know whether it's likely that the pionters in a block will be followed; so programmer may specify that a block loaded into mem have its pointers swizzled or may indicate swizzling should be on-demand --------------------------- D. Returning Blocks to Disk --------------------------- When a block is moved from memory back to disk, any pointers within that block must be unswizzled (mem addys replaced by db addys); SELECT memAddr FROM TranslationTable WHERE dbAddr = x; --> maybe a hash table using dbAddr as the key (and which has an index on the dbAddr attr) SELECT dbAddr FROM TranslationTable WHERE memAddr = y; --> then would want an index on the attribute memAddr too So able to search translation table in both directions. ---------------------------- E. Pinned records and blocks ---------------------------- A block in mem is pinned if it can't be safely written back to disk - locate w/in header of a block the bit specifying whether the block's pinned or not - for example if B_1 has a swizzled pointer to some item in B_2 then we don't want to replace B_2 in main mem with some other block ... certainly not without letting B_1 know to invalidate its pointers (cuz we would be replacing B_2 in main mem with some other block .. and don't want to do this transparently to B_1) - a block that is referred to by a swizzled pointer is pinned So when we write a block back to disk, must: - unswizzle any pointers from that block to data in that block - change the translation table addresses for this block to DB addys (from mem addys) - any pointers in that block to other blocks should be unswizzled too - the translation table must record for each db addy whose data items in mem, -- locations where swizzled pointers to this item exist; could do: (1) keep list of refes to a mem addy as a linked list, attach that LL to the entry in the translation table for that addy (2) if ( 2 * len(mem_addy) ) <= (len(db_addy)) then we can replace the pointer to the db addy by - the swizzled pointer (the addy in main mem for this item) - and another pointer that points to a LL of all of the references to this swizzled pointer - address translation table entry might be: db_addy || mem_addy || pointer to first occurrence of mem_addy - then at that occurrence of a pointer to the mem_addy, after that occurrence, the next 4 bytes point to the *next* occurrence of a pointer to that mem_addy, ... -- so in place of the DB addy which we used to have for some location, we put the mem_addy (e.g. 4 bytes) then a pointer to the next occurrence of a swizzled pointer for this particular mem_addy In short: - update address translation table so that mem addy is no longer stored - then for every place that pointed to this block (which is being moved out of mem), make those pointers invalid (ideally make the pointers to mem point instead to the correct db addy for this block) - hackily could just use the same trick as before where make those mem addys invalid causing a hw trap on access, ... ==================================== IV. Variable-length data and records ==================================== May have: - data items of varying sizes; perhaps the address field is 50 bytes most of the time and only > 50 bytes rarely - repeating fields: some star might have tons of movies as part of his tuple (e.g. Tom Cruise) whereas another might have only a couple; so the amount of space needed to store a star obj will vary --> in one case StarredIn may occur just once --> in another case StarredIn may occur many times ("repeated fields") - variable-format records: for some stars might make sense to have a field such as Directed or Produced whereas for other stars such a field would go unused --> so the formatting of these different tuples may vary --> may want to have some vanilla format that all stars would use then add fields for certain records/tuples but not others - creating variable formatting - enormous fields: may have an attribute that is a gif image - so the record will no longer fit into a block -------------------------------------- A. Records with variable-length fields -------------------------------------- So if one or more fields has variable length, then the record must contain info so that we can find any field of that record. - put in the header the length of the record - then put offsets to the beginnings of all variable-length fields - maybe we'd put all the fixed-length fields first - then the first variable length field starts where the fixed length fields end - then have a pointer (offset) for the second variable-length field and the third and ... -------------------------------- B. Records with repeating fields -------------------------------- What if a record contains a variable # of occurrences of a field F - e.g. where F == "StarredIn" - mega major stars might have 50 occurrences of F - littler stars might have 1 or 2 occurrences of F So group all occurrences of F together and put in teh header a pointer to the first occurrence of F (assuming F is fixed length). Could instead keep records fixed length and put the variable-length portion on a separate block; - so in record keep a pointer to the start of each F where F is repeating - for every field F that repeats - then keep # of repetitions of each repeating field F So record header might be: rec_hdr_info || pointer_to_name || length_of_name || pointer_to_address || length_of_address || pointer_to_first_movie_ref || #_of_movie_refs || pointer_to_first_child_ref || #_of_child_refs So the record length is fixed (as long as we know in advance how many repeating fields we might have -- i.e. 2 in the above example since there may be many movies for any star and there may be many children for any star). - if records are fixed length, we can search them more efficiently - plus we can move them easily (since we don't have to also move all of their related data simultaneously) - but may need to make more disk I/Os when the record exists on a different block than its related data Compromise: keep in fixed length portion of the record enough space for - some avg # of repeating fields (say we expect that a star will have 6 movies; so would keep 6 occurrences of the movie field) - then keep a pointer to where addititional occurrences of the movie field can be found (e.g. for Tom Cruise) - and also keep a count of how many additional fields (beyond the 6 stored in the record) exist at the pointer above -------------------------- C. Variable-format records: What if records don't have a fixed schema? -------------------------- - the fields in a record or - the order of fields is not fixed Then will want to include control info within the record ... along with the data; - attribute or field name - the type of the field - the length of the field (if it's not obvious from the type) e.g. code_for_an_attr || code_for_that_attr's_type || length_of_the_attr_for_this_record || the_attr_value code_for_name || code_for_string_type || 14 || Clint Eastwood || code_for_restaurant || code_for_string_type || 16 || Hog's Breath Inn ... So we get records with flexible schema; we can deal with 'sparse records' (i.e. records where only a handful of thousands of fields are actually in use for any given record); ------------------------------------- D. Records that do not fit in a block: e.g. video or audio clips ------------------------------------- Large values; have variable length; Spanned records: let the record cover >1 block - also useful if we have records that are just a bit larger than a block, then we want to be able to use the remaining portion of the block that is unused ( block_size - (the_overflow_of_the_rec_from_the_first_block)) The portion of a record that appears in one block is called the record fragment A record with two or more fragments is called span Records that do not cross bloock boundaries are unspanned If records can be spanned, then each record and record frag must contain: - each record or fragment header needs a bit saying whether is a frag or not - if is a frag, needs a bit telling whether it's the first or last frag for its record - if there is a next or previous frag for the same record, need pointers to these frags -------- E. BLOBs: Binary Large OBjects; e.g. *.gif, *.jpeg, ... -------- So if a field has a BLOB as a value, - how to store the blob? - how to retrieve the blob? (1) Storage of BLOBs - a BLOB must be stored on a sequence of blocks (best if they are allocated consectutively on a cylinder or cylinders) but may also store the BLOB on a linked list of blocks - may need to stripe the BLOB across several disks so that several blocks of it can be retrieved simultaneously (2) Retrieval of BLOBs - may want to return to the requesting client the small fields of a requested record - then allow client to request blocks of the BLOB one at a time - may also want to allow client to request interior portions of the BLOB without having to get the whole thing -- for DBMS to support this, would need an index by seconds or minutes of an audio clip, e.g. (so user can get 45th second or minute) ======================= V. Record modifications ======================= Insertion, deletion, update may cause records to change length ------------ A. Insertion ------------ If records of relation not kept in order, can just find a block with empty space and put the record there; then maybe add that block to some list maintained for the relation of "all tuples/records of this relation" But what if records have to be kept in some order? - then insertion means finding this new tuple's correct place - and if have multiple records on a single block, then move those around so that this new record can go in its rightful place - if we need to slide records around, then using an offset table as before would probably be useful - if there's room in the correct block, slide the records around to create space for the new guy in the correct location, then update the offset table - but if there's no room for this new record, (a) find space ona nearby block - move the highest records from B_1 to B_2 - put the new record in its correct location - but if move a record R_N from B_1 need to leave a forwarding entry in B_1 so that R_N can be located in its new home in B_2 - perhaps that forwarding addy would occupy the space in the offset table where R_N's offset in B_1 used to be (b) create an overflow block - each block has in its header a place for a pointer to an overflow block - the overflow block for B can point to a second overflow block, ... ----------- B. Deletion ----------- May want to reclaim a deleted record's space; if use an offset table, can do so (via compacting existing records and updating the offset table). If can't slide records, then maintain an available-space list and put the deleted record's location on that list when deleting it. So the list includes the region location plus the space available @ that loc. May be able to stop using an overflow block when we delete a record. Consider total amount of space used by remainig records, and if those records can be kept on fewer blocks, move them But must deal with possibility that there may be pointers to the deleted record; so put a cross and bones in the space that record used to occupy then any subsequent references to it (or to fields within it) will come up as showing the record's gone. If we use an offset table, then can just put the cross and bones in the offset table entry for the deleted record (since user's won't be able to directly reference the record without going through that table). If we use a hash table to translate a record addy (logical) to a physical addy, then put the cross and bones in the entry for that record addy. May instead just have the first bit of a record indicate whether the record is live or not ... so user's would have to check this bit before accessing the record. In this case we could use the old bytes of the record for new stuff. --------- C. Update --------- So if we update a fixed-length record, no problem. But if we update a variable-length record with a value that increases the record's size, may have problems. May involve sliding other records around (as with insertion) and/or creating an overflow block. If the variable portions of a record are kept on a separate block, may need to move elements around that block or create an overflow block for the variable-length stuff. If record gets shorter have same opportunities as with deletion to reclaim space.