
8.5 VxFS Internal Structures

VxFS first made its appearance in HP-UX version 10.01. Since then, it has grown in use and has become the default filesystem type in HP-UX version 11.X. In its current incarnation, it supports all the features of HFS, including ACLs, which were missing until layout version 4 (JFS version 3.3). Version 3.3 is available for HP-UX version 11.X as well as HP-UX version 10.20. As such, VxFS is the way forward as far as filesystems are concerned for HP-UX. The product is known by two main names: JFS (Journaled File System), which is the name associated with the software product itself, and VxFS (Veritas extended File System), which is the filesystem type. The easy way to remember this is that JFS is the product used to access VxFS filesystems.

The key benefits to using VxFS can be summarized as follows:

  • Fast File System Recovery: Using journaling techniques, the filesystem can track pending changes to the filesystem by using an intent log. After a system crash, only the pending changes need to be checked and replayed in the filesystem. All other changes are said to be complete and need no further checking. In some instances, the filesystem in its entirety is marked as dirty. Such a filesystem will require the fsck -o full,nolog command to be run against it in order to perform a full integrity check.

  • Online Administration: Most tasks associated with managing a filesystem can be performed while the filesystem is mounted. These include resizing, defragmenting, setting allocation policies for individual files, as well as online backups via filesystem snapshots.

  • Extent-based allocation: This allows a run of contiguous filesystem blocks (called an extent) to be referenced by a single inode entry. The newfs command sizes a filesystem block as follows:

    - 1KB for filesystems less than 8GB in size

    - 2KB for filesystems less than 16GB in size

    - 4KB for filesystems less than 32GB in size

    - 8KB for filesystems greater than 32GB in size

    Allowing inodes to reference an entire extent does away with the notion of a fixed block size and the problems of having to use multiple inode pointers to reference large chunks of data. This can be used to dramatically improve IO performance for large files.
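The block-size table above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and because the table covers "less than 32GB" and "greater than 32GB" but not exactly 32GB, the sketch assumes that a filesystem of exactly 32GB falls into the 8KB band.

```python
GB = 1024 ** 3


def default_block_size_kb(fs_size_bytes):
    """Return the block size (in KB) that the table above lists for a filesystem.

    Assumption: a filesystem of exactly 32GB is treated as belonging to the
    8KB band, since the table does not say which band it falls into.
    """
    if fs_size_bytes < 8 * GB:
        return 1
    if fs_size_bytes < 16 * GB:
        return 2
    if fs_size_bytes < 32 * GB:
        return 4
    return 8


for size_gb in (4, 8, 20, 64):
    block_kb = default_block_size_kb(size_gb * GB)
    print(f"{size_gb:>3}GB filesystem -> {block_kb}KB blocks")
```

For the four sample sizes, the sketch reports 1KB, 2KB, 4KB, and 8KB blocks respectively.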

When we look at the basic building blocks of VxFS, they appear to be similar to the building blocks of HFS (Figure 8-4).

Figure 8-4. Basic VxFS layout.



In comparison to HFS, there are a number of conceptual similarities; we have a Superblock, which is a road map to the rest of the filesystem. An Allocation Unit is similar to the concept of a Cylinder Group in HFS in that it is a localized collection of tracks and cylinders. In VxFS, an Allocation Unit is 32MB in size (possibly with the exception of the last AU). There are some fundamental differences that are not immediately apparent.

The OLT is the Object Location Table. This structure references a number of structural elements that I haven't shown in Figure 8-4. The OLT records where to find the initial inodes describing the filesystem, the device configuration (HP-UX currently allows only one device per filesystem, even if it is a logical device), and where to find redundant superblocks. It also reserves space for information not maintained in the version 4 layout, such as the Current Usage Table. One of the main elements in the OLT is a reference to a list of fileset headers. A fileset is essentially a collection of files/inodes stored within the data blocks of the filesystem. When I was first told this, I immediately equated a fileset to an inode list. This is a fair comparison, if a little naive. In VxFS, we (currently) have two filesets: Fileset 1 (known as the Structural Fileset) and Fileset 999 (known as the Unnamed or Primary Fileset). You and I, as users, will interface with the Primary Fileset because it references inodes that are user visible, i.e., regular files, directories, links, device files, etc., and it is the fileset that is mounted by default (in the future, it may be possible to support and mount more filesets; cloning a fileset may be possible). The Structural Fileset contains structural information relating to the filesystem, and there are no standard user-accessible commands to view files/inodes within Fileset 1. Fileset 1 is there to be used by the filesystem as it sees fit; for example, an inode in the Primary Fileset may reference an inode in the Structural Fileset for BSD-style quota information.

In VxFS, we have inodes that work in a similar way to inodes in an HFS filesystem; i.e., they hold the file type and mode, ownership, size, timestamps, and references to data blocks. This is where things start to change. VxFS inodes are 256 bytes in size. One of the reasons an inode is bigger is that VxFS inodes can have attributes associated with them (more on attributes later). Another reason is that we need to store information relating to allocation flags set by the setext command, e.g., contiguous allocation of extents for this file. One other fundamental difference is the way an inode references the data blocks (=extents) associated with the user file. This is known as the Inode Organization Type (i_orgtype). There are four organization types:

  • i_orgtype = 0: Used for character and block device files, where there is no data area. The inode will contain an rdev reference to the device file. Also known as IORG_NONE.

    root@hpeos003[] ll -i /dev/kmem
        68 crw-r-----   1 bin        sys          3 0x000001 Aug 12 07:52 /dev/kmem
    root@hpeos003[] echo "68i" | fsdb -F vxfs /dev/vg00/lvol3
    inode structure at 0x00001089.0000
    type IFCHR mode 20640  nlink 1  uid 2  gid 3  size 0
    atime 1068715134 460056  (Thu Nov 13 09:18:54 2003 BST)
    mtime 1060671154 0  (Tue Aug 12 07:52:34 2003 BST)
    ctime 1068715134 460017  (Thu Nov 13 09:18:54 2003 BST)
    aflags 0 orgtype 0 eopflags 0 eopdata 0
    fixextsize/fsindex 0  rdev/reserve/dotdot/matchino 50331649
    blocks 0  gen 0  version 0 318  iattrino 0
    root@hpeos003[]
    
    

  • i_orgtype = 1: The most common organization type where the inode contains 10 direct block pointers similar to the direct block pointers in an HFS inode. The difference here is that the inode will store the starting block number (de = direct extent) and the number of subsequent, adjacent blocks to reference (des = direct extent size); in other words, we can reference multiple 1KB blocks known as an extent. Indirect pointers are available but not used (see i_orgtype = 3). Also known as IORG_EXT4.

    root@hpeos003[] ll -i /etc/hosts
         5 -r--r--r--   1 bin        bin           2089 Oct 23 15:31 /etc/hosts
    root@hpeos003[] echo "5i" | fsdb -F vxfs /dev/vg00/lvol3
    inode structure at 0x00000579.0100
    type IFREG mode 100444  nlink 1  uid 2  gid 2  size 2089
    atime 1068727100 180003  (Thu Nov 13 12:38:20 2003 BST)
    mtime 1066919463 900011  (Thu Oct 23 15:31:03 2003 BST)
    ctime 1066919463 900011  (Thu Oct 23 15:31:03 2003 BST)
    aflags 0 orgtype 1 eopflags 0 eopdata 0
    fixextsize/fsindex 0  rdev/reserve/dotdot/matchino 0
    blocks 3  gen 1  version 0 4576  iattrino 0
    de:  2965    0    0    0    0    0    0    0    0    0
    des:    3    0    0    0    0    0    0    0    0    0
    ie:     0    0
    ies:    0
    root@hpeos003[]
    
    

  • i_orgtype = 2: Referred to as Immediate Inode Data. Where a directory or symbolic link is less than 96 characters in length, the filesystem will not allocate a data block but it will store the directory/symbolic link information directly in the inode itself. Similar to the create_fastlinks concept in HFS. Also known as IORG_IMMED.

    root@hpeos003[] mkdir -p /stuff/more
    root@hpeos003[] ll -id /stuff
      7470 drwxrwxr-x   3 root       sys             96 Nov 13 12:44 /stuff
    root@hpeos003[] echo "7470i" | fsdb -F vxfs /dev/vg00/lvol3
    inode structure at 0x00103dc3.0200
    type IFDIR mode 40775  nlink 3  uid 0  gid 3  size 96
    atime 1068727492 180001  (Thu Nov 13 12:44:52 2003 BST)
    mtime 1068727492 180002  (Thu Nov 13 12:44:52 2003 BST)
    ctime 1068727492 180002  (Thu Nov 13 12:44:52 2003 BST)
    aflags 0 orgtype 2 eopflags 0 eopdata 0
    fixextsize/fsindex 0  rdev/reserve/dotdot/matchino 2
    blocks 0  gen 45  version 0 374  iattrino 0
    root@hpeos003[] ll -i /stuff
    total 0
      7546 drwxrwxr-x   2 root       sys             96 Nov 13 12:44 more
    root@hpeos003[]
    root@hpeos003[] echo "7470i.im.p db" | fsdb -F vxfs /dev/vg00/lvol3
    immediate directory block at 00103dc3.0250 - total free (d_tfree) 76
    00103dc3.0254:  d 0    d_ino 7546  d_reclen 92  d_namlen 4
                   m  o  r  e
    root@hpeos003[]
    
    

  • i_orgtype = 3: Used for files beyond the capabilities of IORG_EXT4. The inode contains six entries that can reference a block and size entry in the same way as an IORG_EXT4 entry or can reference an Indirect block. The Indirect block can reference data blocks or further indirection. The levels of Indirection are only limited by the size of the filesystem. This is similar in concept to the single, double, and triple indirect pointers that we see in HFS, except that the Indirection is unlimited. Also known as IORG_TYPED.

    root@hpeos003[] ll -i /logdata/db.log
    4 -rw-rw-r--   1 root       sys        212726016 Nov 13 13:04 /logdata/db.log
    root@hpeos003[] echo "4i" | fsdb -F vxfs /dev/vx/dsk/ora1/logvol
    inode structure at 0x000003f8.0400
    type IFREG mode 100664  nlink 1  uid 0  gid 3  size 212726016
    atime 1068728572 410003  (Thu Nov 13 13:02:52 2003 BST)
    mtime 1068728665 710006  (Thu Nov 13 13:04:25 2003 BST)
    ctime 1068728665 710006  (Thu Nov 13 13:04:25 2003 BST)
    aflags 0 orgtype 3 eopflags 0 eopdata 0
    fixextsize/fsindex 0  rdev/reserve/dotdot/matchino 0
    blocks 51938  gen 2  version 0 92  iattrino 0
    ext0:  INDIR  boff: 0x00000000 bno:    67584 len:        2
    ext1:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext2:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext3:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext4:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext5:  NULL   boff: 0x00000000 bno:        0 len:        0
    root@hpeos003[]
    root@hpeos003[] echo "67584b; p 128 T | more" | fsdb -F vxfs /dev/vx/dsk/ora1/logvol
    0x00010800.0000:  DATA   boff: 0x00000000 bno:     1290 len:     6492
    0x00010800.0010:  DATA   boff: 0x0000195c bno:    14336 len:     6528
    0x00010800.0020:  DATA   boff: 0x000032dc bno:    28672 len:     4096
    0x00010800.0030:  DATA   boff: 0x000042dc bno:  7831552 len:     4096
    0x00010800.0040:  DATA   boff: 0x000052dc bno:  7841792 len:     6144
    0x00010800.0050:  DATA   boff: 0x00006adc bno:  7854080 len:     5104
    0x00010800.0060:  DATA   boff: 0x00007ecc bno:  7800832 len:     8192
    0x00010800.0070:  DATA   boff: 0x00009ecc bno:  7815168 len:     5120
    0x00010800.0080:  DATA   boff: 0x0000b2cc bno:  7827456 len:     4096
    0x00010800.0090:  DATA   boff: 0x0000c2cc bno:    65536 len:     2048
    0x00010800.00a0:  DATA   boff: 0x0000cacc bno:    69632 len:       20
    0x00010800.00b0:  NULL   boff: 0x00000000 bno:        0 len:        0
    0x00010800.00c0:  NULL   boff: 0x00000000 bno:        0 len:        0
    ...
    root@hpeos003[]
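The numbers in the fsdb listings for the organization types above can be cross-checked with a little arithmetic. The following is a minimal sketch rather than anything VxFS itself provides: the helper names are hypothetical, the 8-bit major/24-bit minor split of rdev is an assumption inferred from the ll output for /dev/kmem, and the reading of boff as a block offset within the file is inferred from the way each boff equals the running total of the preceding len values.

```python
import stat


# i_orgtype 0: /dev/kmem. Cross-check fsdb's "mode" and "rdev" against ll.
def split_rdev(rdev):
    """Split rdev into (major, minor).

    Assumption: an 8-bit major number sits above a 24-bit minor number,
    which is what the ll output "3 0x000001" suggests for rdev 50331649.
    """
    return rdev >> 24, rdev & 0xFFFFFF


kmem_major, kmem_minor = split_rdev(50331649)
print(stat.filemode(0o20640), kmem_major, f"0x{kmem_minor:06x}")


# i_orgtype 1: /etc/hosts. Expand the (de, des) pairs into block numbers.
def direct_blocks(de, des):
    """List every block covered by the direct extents; empty slots add nothing."""
    blocks = []
    for start, length in zip(de, des):
        blocks.extend(range(start, start + length))
    return blocks


hosts_blocks = direct_blocks([2965] + [0] * 9, [3] + [0] * 9)
print(hosts_blocks)

# i_orgtype 3: /logdata/db.log. (boff, bno, len) rows copied from the
# indirect block at block 67584.
DB_LOG_EXTENTS = [
    (0x00000000, 1290, 6492),
    (0x0000195C, 14336, 6528),
    (0x000032DC, 28672, 4096),
    (0x000042DC, 7831552, 4096),
    (0x000052DC, 7841792, 6144),
    (0x00006ADC, 7854080, 5104),
    (0x00007ECC, 7800832, 8192),
    (0x00009ECC, 7815168, 5120),
    (0x0000B2CC, 7827456, 4096),
    (0x0000C2CC, 65536, 2048),
    (0x0000CACC, 69632, 20),
]


def physical_block(extents, logical_block):
    """Map a block offset within the file to a filesystem block number."""
    for boff, bno, length in extents:
        if boff <= logical_block < boff + length:
            return bno + (logical_block - boff)
    return None  # the offset lies beyond the last extent


data_blocks = sum(length for _, _, length in DB_LOG_EXTENTS)
print(data_blocks, physical_block(DB_LOG_EXTENTS, 6492))
```

Run as written, the sketch prints crw-r----- 3 0x000001, then [2965, 2966, 2967], then 51936 14336. The first line agrees with the ll output for /dev/kmem, and the second with the blocks 3 field of the /etc/hosts inode. Adding the 2-block indirect extent (ext0) to the 51936 data blocks gives 51938, which matches the blocks field of the db.log inode.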
    
    

Inodes are referenced via entries in the Inode List Table. This table can reference clumps of inodes known as Inode Extents, much in the same way that normal inodes reference clumps (extents) of data. The Inode List Table is a Structural File and has its own Structural inode, which follows the same organization type definitions as normal inodes. In VxFS, we do not create inodes until we need them (dynamic inode allocation). Consequently, Inode Extents may not necessarily be contiguous. Inside the Inode Extent will be an Inode Allocation Unit Table, which stores information used to allocate inodes for that Inode List Table. All this shows that files/inodes in the Structural Fileset are used much in the same way as files/inodes in the Primary Fileset; it's just that we don't normally deal with them directly.

The last piece of VxFS theory we discuss is inode attributes. There are 72 bytes reserved at the end of the inode dedicated to attributes. In addition, an inode can have an attribute inode (similar in concept to a continuation inode in HFS). If used, an attribute inode is referenced via the iattrino structure in the general inode (there's a list of corresponding attribute inodes for every general inode). Who uses these attributes? Applications that are VxFS-aware can use them if they so desire. Take a backup application, or a Hierarchical Storage Management product, such as DataProtector. If coded properly, there's nothing to say that such an application couldn't store a "tape archived" attribute with the inode every time the file is backed up. This has nothing to do with standard filesystem commands (which don't know anything about this attribute), but might mean lots to the backup application. The only standard filesystem feature that currently uses attributes is VxFS ACLs. We have discussed ACLs previously, but just let me do a quick recap. With ACLs, we can give users their own access permissions for files and directories. This allows a filesystem to meet a specific part of the C2 (U.S. Department of Defense: Orange Book) level of security. VxFS ACLs are managed by the getacl and setacl commands. Here, fred and barney are given their own permissions to the /logdata/db.log file:

    root@hpeos003[] pwget -n fred
    fred:rK23oXbRNKgAo:109:20::/home/fred:/sbin/sh
    root@hpeos003[] pwget -n barney
    barney:acGNA0B.QxKYI:110:20::/home/barney:/sbin/sh
    root@hpeos003[]
    root@hpeos003[] getacl /logdata/db.log
    # file: /logdata/db.log
    # owner: root
    # group: sys
    user::rw-
    group::rw-
    class:rw-
    other:r--
    root@hpeos003[] setacl -m "user:fred:rwx" /logdata/db.log
    root@hpeos003[] setacl -m "user:barney:---" /logdata/db.log
    root@hpeos003[] getacl /logdata/db.log
    # file: /logdata/db.log
    # owner: root
    # group: sys
    user::rw-
    user:fred:rwx
    user:barney:---
    group::rw-
    class:rwx
    other:r--
    root@hpeos003[]


Let's see if we can find the attributes just applied:

    root@hpeos003[] ll -i /logdata/db.log
    4 -rw-rwxr--  1 root       sys        212726016 Nov 13 13:04 /logdata/db.log
    root@hpeos003[]
    root@hpeos003[] echo "4i" | fsdb -F vxfs /dev/vx/dsk/ora1/logvol
    inode structure at 0x000003f8.0400
    type IFREG mode 100674  nlink 1  uid 0  gid 3  size 212726016
    atime 1068728572 410003  (Thu Nov 13 13:02:52 2003 BST)
    mtime 1068728665 710006  (Thu Nov 13 13:04:25 2003 BST)
    ctime 1068736084 820007  (Thu Nov 13 15:08:04 2003 BST)
    aflags 0 orgtype 3 eopflags 0 eopdata 0
    fixextsize/fsindex 0  rdev/reserve/dotdot/matchino 0
    blocks 51938  gen 2  version 0 96  iattrino 0
    ext0:  INDIR  boff: 0x00000000 bno:    67584 len:        2
    ext1:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext2:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext3:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext4:  NULL   boff: 0x00000000 bno:        0 len:        0
    ext5:  NULL   boff: 0x00000000 bno:        0 len:        0
    root@hpeos003[]


As we can see, iattrino has not been set. This means there is no attribute inode to look for; the attributes are stored in the last 72 bytes of this inode. We can dump the attributes with the fsdb attr command:

    root@hpeos003[] echo "4i.attr.p 18 x" | fsdb -F vxfs /dev/vx/dsk/ora1/logvol
    000003f8.04b8: 00000001 0000003c 00000001 00000001
    000003f8.04c8: 00000003 00000000 00000002 0000006d
    000003f8.04d8: 00070000 00000002 0000006e 00000000
    000003f8.04e8: 00000004 00000000 00060000 00000000
    000003f8.04f8: 00000000 00000000 00008180 00000001
    root@hpeos003[]


This takes a little deciphering, but we can pick out the following elements of the dump in turn:

  • Format = 0x00000001 = Attribute Immediate

  • Length = 0x0000003c = 60 bytes

  • Class = 0x00000001 = ACL class

  • Subclass = 0x00000001 = SVr4 ACL (see /usr/include/sys/aclv.h)

  • ACL type = 0x00000002 = User record

  • User ID = 0x0000006d = 109 = fred

  • Permissions = 0x00070000 = rwx

  • ACL type = 0x00000002 = User record

  • User ID = 0x0000006e = 110 = barney

  • Permissions = 0x00000000 = ---
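The decoding above can be reproduced in a few lines. This is a hedged sketch, not a description of the on-disk format: the word positions are simply those implied by the list above, the assumption that the rwx bits occupy the upper 16 bits of the permissions word is inferred from 0x00070000 = rwx, and the helper name is hypothetical. Words that the list does not explain are left uninterpreted.

```python
# Words printed by "4i.attr.p 18 x", in order, copied from the dump above.
WORDS = [
    0x00000001, 0x0000003C, 0x00000001, 0x00000001,
    0x00000003, 0x00000000, 0x00000002, 0x0000006D,
    0x00070000, 0x00000002, 0x0000006E, 0x00000000,
    0x00000004, 0x00000000, 0x00060000, 0x00000000,
    0x00000000, 0x00000000, 0x00008180, 0x00000001,
]


def perm_string(word):
    """Render rwx permissions, assumed to sit in the upper 16 bits of the word."""
    bits = word >> 16
    flags = (("r", 4), ("w", 2), ("x", 1))
    return "".join(ch if bits & mask else "-" for ch, mask in flags)


# The first four words are the header fields named in the list above.
header = dict(zip(("format", "length", "class", "subclass"), WORDS[0:4]))
print(header)

# User records as (ACL type, user ID, permissions) triples. The starting
# positions (word 6 and word 9) are taken from the order of the list above.
USERS = {109: "fred", 110: "barney"}  # from the pwget output
for start in (6, 9):
    acl_type, uid, perms = WORDS[start:start + 3]
    print(acl_type, uid, USERS.get(uid), perm_string(perms))

# The dump starts at offset 0x04b8 and the inode at 0x0400, so the attribute
# area begins 184 bytes in, leaving the last 72 bytes of a 256-byte inode.
attr_offset = 0x04B8 - 0x0400
print(attr_offset, 256 - attr_offset)
```

The two user records come out as type 2, UID 109 (fred) with rwx, and type 2, UID 110 (barney) with ---, which agrees with the getacl output. The final line confirms that the dump begins exactly 72 bytes before the end of the 256-byte inode.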

This introduction to the background behind VxFS will allow us to understand how VxFS works and may help us to answer some questions when we run fsck. What we need to discuss now are the additional administrative features that Online JFS brings to our system.
