Core On-disk Format

Overview

The EROFS core on-disk format is designed to be as simple as possible, since one of the basic use cases of EROFS is as a drop-in replacement for tar or cpio:

[Figure: EROFS core on-disk format]

The format design principles are as follows:

  • Data (except for inline data) is always block-based; metadata is not strictly block-based.

  • There are no centralized inode or directory tables, since such tables are ill-suited to incremental image updates, metadata flexibility, and extensibility. It is up to users to decide whether inodes and directories are laid out one after another.

  • I/O amplification from extra metadata access should be as small as possible.

There are only three on-disk components to form a full filesystem tree: superblock, inodes, and directory entries.

Note that only the superblock needs to be kept at a fixed offset, as mentioned below.

Conformance to Core Format

An EROFS image conforms to the core on-disk format if and only if all of the following conditions are met:

  1. The is_compressed field (offset 0x54, 2 bytes) in the superblock is 0.

  2. All bits in feature_compat and feature_incompat, except those listed in the Feature Flags section below, are 0.

An image that does not meet these conditions uses one or more optional features described in separate feature-specific documents. Note that the core on-disk format has been supported since Linux 5.4. The 48-bit layout, for example, is therefore not part of the core on-disk format, and not all users need 48-bit block addressing.
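The two conformance conditions can be sketched as a small check. This is an illustrative sketch, not a reference implementation: the function name `is_core_format` and the mask constants are hypothetical, the field offsets come from the Field Definitions table, and all multi-byte fields are assumed to be little-endian like the magic number.

```python
import struct

# Flag bits defined by the core format (see the Feature Flags section).
CORE_COMPAT_BITS = 0x00000001 | 0x00000002   # SB_CHKSUM | MTIME
CORE_INCOMPAT_BITS = 0x00000000              # the core format defines none


def is_core_format(sb: bytes) -> bool:
    """Check both conformance conditions on a raw 128-byte superblock.

    `sb` starts at the superblock itself (image offset 1024), so the
    offsets below are the ones from the Field Definitions table.
    """
    (feature_compat,) = struct.unpack_from("<I", sb, 0x08)
    (feature_incompat,) = struct.unpack_from("<I", sb, 0x50)
    (is_compressed,) = struct.unpack_from("<H", sb, 0x54)

    if is_compressed != 0:                      # condition 1
        return False
    if feature_compat & ~CORE_COMPAT_BITS:      # condition 2, compat half
        return False
    return (feature_incompat & ~CORE_INCOMPAT_BITS) == 0
```

An image that fails this check is not necessarily invalid; it simply relies on features outside the core format.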

Superblock

The EROFS superblock is located at a fixed absolute offset of 1024 bytes. Its base size is 128 bytes. When sb_extslots is non-zero, the total superblock size is 128 + sb_extslots * 16 bytes. The first 1024 bytes are unused, which allows for support of other advanced formats based on EROFS, as well as the installation of x86 boot sectors and other oddities.

Field Definitions

| Offset | Size | Type | Name | Description |
|--------|------|------|------|-------------|
| 0x00 | 4 | u32 | magic | Magic signature: 0xE0F5E1E2 |
| 0x04 | 4 | u32 | checksum | CRC32-C checksum of the superblock block; see Superblock Checksum |
| 0x08 | 4 | u32 | feature_compat | Compatible feature flags; see Feature Flags |
| 0x0C | 1 | u8 | blkszbits | Block size = 2^blkszbits; minimum 9 |
| 0x0D | 1 | u8 | sb_extslots | Number of 16-byte superblock extension slots |
| 0x0E | 2 | u16 | rootnid_2b | Root directory NID |
| 0x10 | 8 | u64 | inos | Total inode count; see blocks and inos Fields |
| 0x18 | 8 | u64 | epoch | Filesystem creation time, seconds since UNIX epoch |
| 0x20 | 4 | u32 | fixed_nsec | Nanoseconds component shared by all compact inodes; see Modification Time in Compact Inodes |
| 0x24 | 4 | u32 | blocks | Total block count; see blocks and inos Fields |
| 0x28 | 4 | u32 | meta_blkaddr | Start block address to specify the inode-metadata zone |
| 0x2C | 4 | u32 | xattr_blkaddr | Start block address to specify the extended attribute zone |
| 0x30 | 16 | u8[] | uuid | 128-bit UUID for the volume |
| 0x40 | 16 | u8[] | volume_name | Filesystem label (not null-terminated if 16 bytes) |
| 0x50 | 4 | u32 | feature_incompat | Incompatible feature flags; see Feature Flags |
| 0x54 | 2 | u16 | is_compressed | 0 for non-compressed images, any non-zero value for compressed images |
| 0x56 | 4 | u32 | dontcare | External device support specific; ignored in core format |
| 0x5A | 1 | u8 | dirblkbits | Directory block size = 2^(blkszbits + dirblkbits); strictly 0 in the core format |
| 0x5B | 5 | u8[] | dontcare | Xattr specific; ignored in core format |
| 0x60 | 8 | u8[] | dontcare | Compression specific; ignored in core format |
| 0x68 | 1 | u8 | reserved | Reserved; must be 0 |
| 0x69 | 1 | u8 | dontcare | Xattr specific; ignored in core format |
| 0x6A | 2 | u16 | reserved | Reserved; must be 0 |
| 0x6C | 12 | u8[] | dontcare | 48-bit layout specific; ignored in core format |
| 0x78 | 8 | u64 | reserved | Reserved; must be 0 |

Note the difference between reserved and dontcare fields:

  • reserved: Users must not use these fields. They must be filled with 0, either to comply with the supported features or because they are reserved for future use.

  • dontcare: Users can safely use these for other purposes as long as the corresponding incompatible feature flag is not set.
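As a rough illustration, the field table maps directly onto a packed little-endian layout. The sketch below uses Python's `struct` module with one format code per table row. The names `SB_FORMAT`, `SB_FIELDS`, `parse_superblock`, and the offset-suffixed keys for the `dontcare` and `reserved` rows are hypothetical; little-endian encoding of every field is assumed from the magic number description.

```python
import struct

EROFS_SUPER_OFFSET = 1024

# Little-endian, no padding; one struct code per row of the field table.
SB_FORMAT = "<IIIBBHQQIIII16s16sIH4sB5s8sBBH12sQ"
SB_FIELDS = (
    "magic", "checksum", "feature_compat", "blkszbits", "sb_extslots",
    "rootnid_2b", "inos", "epoch", "fixed_nsec", "blocks", "meta_blkaddr",
    "xattr_blkaddr", "uuid", "volume_name", "feature_incompat",
    "is_compressed", "dontcare_0x56", "dirblkbits", "dontcare_0x5b",
    "dontcare_0x60", "reserved_0x68", "dontcare_0x69", "reserved_0x6a",
    "dontcare_0x6c", "reserved_0x78",
)
# The base superblock is 128 bytes, so the format must add up to that.
assert struct.calcsize(SB_FORMAT) == 128


def parse_superblock(image: bytes) -> dict:
    """Decode the base superblock found at absolute offset 1024."""
    values = struct.unpack_from(SB_FORMAT, image, EROFS_SUPER_OFFSET)
    sb = dict(zip(SB_FIELDS, values))
    # Derived values described in the Superblock section.
    sb["block_size"] = 1 << sb["blkszbits"]
    sb["total_size"] = 128 + sb["sb_extslots"] * 16
    return sb
```

Keeping the `dontcare` ranges as opaque byte strings reflects that a core-format reader ignores them rather than validating them.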

Magic Number

The magic number at superblock offset 0x00 must be 0xE0F5E1E2, stored in little-endian byte order (bytes E2 E1 F5 E0 on disk). A reader must reject any image whose four bytes at absolute offset 1024 do not match this value.
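A minimal sketch of the magic check follows. The helper name `has_erofs_magic` is hypothetical; the only facts used are the absolute offset 1024 and the little-endian value 0xE0F5E1E2.

```python
import struct

EROFS_SUPER_OFFSET = 1024
EROFS_MAGIC = 0xE0F5E1E2


def has_erofs_magic(image: bytes) -> bool:
    """Return True if the 4 bytes at absolute offset 1024 hold the magic."""
    if len(image) < EROFS_SUPER_OFFSET + 4:
        return False  # too short to contain a superblock at all
    (magic,) = struct.unpack_from("<I", image, EROFS_SUPER_OFFSET)
    return magic == EROFS_MAGIC
```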

Superblock Checksum

When EROFS_FEATURE_COMPAT_SB_CHKSUM is set, the checksum field contains a CRC32-C digest. The digest is computed from absolute offset 1024 to the end of the filesystem block that contains the superblock, i.e. over the byte range [1024, block_size) when block_size is greater than 1024, with the four bytes of the checksum field itself treated as zero during computation.

For example, when blkszbits is 12 (block size is 4 KiB):

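| Offset | Size | Description | Checksum covered |
|--------|------|-------------|------------------|
| 0 | 1024 | Padding | No |
| 1024 | 4 | Magic number | Yes |
| 1028 | 4 | Checksum field in superblock, filled with zero | Yes |
| 1032 | 3064 | Remaining bytes in the filesystem block | Yes |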

Tip: Some implementations (e.g., java.util.zip.CRC32C) apply a final bit-wise inversion. If the superblock checksum does not match, try inverting it.
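The following sketch computes the digest for the 4 KiB example in the table above, where the covered bytes run from offset 1024 to the end of the block. Two points are assumptions rather than statements from this document: the CRC is seeded with 0xFFFFFFFF, and, in line with the tip, no final bit-wise inversion is applied. The function names are hypothetical, and the bitwise loop favors clarity over speed.

```python
def crc32c_update(crc: int, data: bytes) -> int:
    """Bitwise CRC32-C (reflected polynomial 0x82F63B78), no final inversion."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc


def superblock_checksum(image: bytes, block_size: int = 4096) -> int:
    """Digest of the bytes from offset 1024 up to the end of the first block.

    Assumes block_size > 1024, as in the 4 KiB example.
    """
    covered = bytearray(image[1024:block_size])
    covered[4:8] = b"\x00\x00\x00\x00"  # checksum field is treated as zero
    # Assumption: seed is all ones; the result is not inverted afterwards.
    return crc32c_update(0xFFFFFFFF, bytes(covered))
```

If a library routine that inverts its result is used instead, the value it returns would need to be XORed with 0xFFFFFFFF before comparing it with the stored field.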

Feature Flags

feature_compat: Compatible Feature Flags

A mount implementation that does not recognise a bit in feature_compat may still mount the filesystem without loss of correctness.

| Bit mask | Name | Description |
|----------|------|-------------|
| 0x00000001 | EROFS_FEATURE_COMPAT_SB_CHKSUM | Superblock CRC32-C checksum is present; see Superblock Checksum |
| 0x00000002 | EROFS_FEATURE_COMPAT_MTIME | Per-inode mtime is stored in extended inodes |

Note

For new filesystem builders, it is recommended to always set EROFS_FEATURE_COMPAT_MTIME, since it indicates that all inode timestamps record modification time (mtime) rather than change time (ctime).

feature_incompat: Incompatible Feature Flags

A runtime implementation that does not implement a feature indicated by a bit in feature_incompat must refuse to mount the entire filesystem.

The core on-disk format defines no incompatible feature flags. A non-zero feature_incompat value indicates one or more non-core feature extensions.
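The different treatment of the two flag words can be sketched as a mount-time decision. The names `SUPPORTED_INCOMPAT` and `may_mount` are hypothetical, and the reader modeled here implements only the core format.

```python
# A core-only reader implements no incompatible features.
SUPPORTED_INCOMPAT = 0x00000000


def may_mount(feature_compat: int, feature_incompat: int) -> bool:
    """Decide whether a core-only reader may mount an image.

    Unknown bits in feature_compat are deliberately ignored: they never
    affect correctness. Any unsupported bit in feature_incompat is fatal.
    """
    unsupported = feature_incompat & ~SUPPORTED_INCOMPAT
    return unsupported == 0
```

A reader that implements some extensions would widen `SUPPORTED_INCOMPAT` to cover the corresponding bits.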

blocks and inos Fields

The blocks and inos fields are primarily intended for statvfs(3) reporting. For dynamically generated EROFS filesystems, these fields can be set to 0.

Implementations should not use the blocks field to validate whether a block address or NID is valid. Such checks are unnecessary: malicious block addresses or NIDs simply result in -EIO or in reading corrupted (meta)data, without causing any real harm. Furthermore, a maliciously crafted image can easily bypass bounds checking by modifying the blocks field accordingly, making such validation meaningless.

Inodes

Each on-disk inode must be aligned to a 32-byte inode slot boundary, which matches the compact inode size. Given a NID nid, its inode can be located in O(1) time by computing the absolute byte offset as follows:

inode_offset = meta_blkaddr * block_size + 32 * nid

The NIDs for the root directory and special-purpose inodes are stored in the superblock. Valid inode sizes are either 32 bytes (compact) or 64 bytes (extended), distinguished by bit 0 of the i_format field.
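The lookup formula and the size rule translate into two one-line helpers. The function names are hypothetical; `block_size` is expressed through `blkszbits` as defined in the superblock.

```python
INODE_SLOT_SIZE = 32


def inode_offset(meta_blkaddr: int, blkszbits: int, nid: int) -> int:
    """Absolute byte offset of the inode identified by `nid`."""
    block_size = 1 << blkszbits
    return meta_blkaddr * block_size + INODE_SLOT_SIZE * nid


def inode_size(i_format: int) -> int:
    """32 bytes for compact inodes, 64 for extended (bit 0 of i_format)."""
    return 64 if i_format & 0x1 else 32
```

For instance, with 4 KiB blocks and `meta_blkaddr` equal to 1, NID 2 lands at byte 4096 + 64 = 4160. An extended inode occupies two consecutive 32-byte slots.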

Compact Inode (32 bytes)

Defined as struct erofs_inode_compact:

| Offset | Size | Type | Name | Description |
|--------|------|------|------|-------------|
| 0x00 | 2 | u16 | i_format | Inode format hints; see i_format Field |
| 0x02 | 2 | u16 | reserved | Xattr specific; must be 0 if no xattrs |
| 0x04 | 2 | u16 | i_mode | File type and permission bits |
| 0x06 | 2 | u16 | i_nlink | Hard link count |
| 0x08 | 4 | u32 | i_size | File size in bytes (32-bit) |
| 0x0C | 4 | u32 | reserved | 48-bit layout specific; ignored in core format |
| 0x10 | 4 | u32 | i_u | Union; see i_u Union |
| 0x14 | 4 | u32 | i_ino | Inode serial number for 32-bit stat(2) compatibility |
| 0x18 | 2 | u16 | i_uid | Owner UID (16-bit) |
| 0x1A | 2 | u16 | i_gid | Owner GID (16-bit) |
| 0x1C | 4 | u32 | reserved | Reserved; must be 0 |

Modification Time in Compact Inodes

Due to space constraints, compact inodes cannot store a full 64-bit per-inode timestamp, let alone an additional nanosecond field. Consequently, when the 48-bit layout extension is unused, the effective timestamp for all compact inodes is (epoch, fixed_nsec), which has been the case since Linux 5.4.

Extended Inode (64 bytes)

Defined as struct erofs_inode_extended:

| Offset | Size | Type | Name | Description |
|--------|------|------|------|-------------|
| 0x00 | 2 | u16 | i_format | Inode format hints; see i_format Field |
| 0x02 | 2 | u16 | reserved | Xattr specific; must be 0 if no xattrs |
| 0x04 | 2 | u16 | i_mode | File type and permission bits |
| 0x06 | 2 | u16 | reserved | Reserved; must be 0 |
| 0x08 | 8 | u64 | i_size | File size in bytes (64-bit) |
| 0x10 | 4 | u32 | i_u | Union; see i_u Union |
| 0x14 | 4 | u32 | i_ino | Inode serial number for 32-bit stat(2) compatibility |
| 0x18 | 4 | u32 | i_uid | Owner UID (32-bit) |
| 0x1C | 4 | u32 | i_gid | Owner GID (32-bit) |
| 0x20 | 8 | u64 | i_mtime | Modification time, seconds since UNIX epoch |
| 0x28 | 4 | u32 | i_mtime_nsec | Nanoseconds component of i_mtime |
| 0x2C | 4 | u32 | i_nlink | Hard link count (32-bit) |
| 0x30 | 16 | u8[] | reserved | Reserved; must be 0 |
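To illustrate how the two inode variants relate, the sketch below decodes either one into a common dictionary. For compact inodes the timestamp falls back to the superblock's (epoch, fixed_nsec) pair, as described in Modification Time in Compact Inodes. The names `COMPACT`, `EXTENDED`, `parse_inode`, and the dictionary keys are hypothetical, and little-endian encoding is assumed.

```python
import struct

# One struct code per table row; "<" means little-endian without padding.
COMPACT = struct.Struct("<HHHHIIIIHHI")        # 32 bytes
EXTENDED = struct.Struct("<HHHHQIIIIQII16s")   # 64 bytes


def parse_inode(buf: bytes, offset: int, epoch: int, fixed_nsec: int) -> dict:
    """Decode a compact or extended inode starting at `offset` in `buf`."""
    (i_format,) = struct.unpack_from("<H", buf, offset)
    if i_format & 0x1:  # bit 0 set: extended inode
        (_, _, mode, _, size, i_u, ino, uid, gid,
         mtime, mtime_nsec, nlink, _) = EXTENDED.unpack_from(buf, offset)
    else:               # bit 0 clear: compact inode
        (_, _, mode, nlink, size, _, i_u, ino,
         uid, gid, _) = COMPACT.unpack_from(buf, offset)
        # Compact inodes carry no timestamp; use the superblock values.
        mtime, mtime_nsec = epoch, fixed_nsec
    return {
        "i_format": i_format, "mode": mode, "nlink": nlink, "size": size,
        "i_u": i_u, "ino": ino, "uid": uid, "gid": gid,
        "mtime": (mtime, mtime_nsec),
    }
```

The caller passes `epoch` and `fixed_nsec` from the parsed superblock; they are used only on the compact path.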

i_format Field

The i_format field is present at offset 0x00 in both inode variants and encodes layout metadata:

| Bits | Width | Description |
|------|-------|-------------|
| 0 | 1 | Inode version: 0 = compact (32-byte), 1 = extended (64-byte) |
| 1–3 | 3 | Data layout: values 0–4 are defined; 5–7 are reserved. See Inode Data Layouts |
| 4 | 1 | 48-bit layout specific; ignored in core format |
| 5–15 | 11 | Reserved; must be 0 |

Note

When bits 1–3 contain reserved values (5–7), the inode uses an unsupported data layout. Implementations must reject such inodes and return an appropriate error (e.g., "not supported"). This typically indicates a maliciously crafted or corrupted image.
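The bit fields of i_format can be extracted with shifts and masks, as in this sketch. The function name `decode_i_format` and the choice of `ValueError` to signal "not supported" are illustrative assumptions.

```python
def decode_i_format(i_format: int) -> tuple:
    """Split i_format into (version, data_layout).

    version: 0 = compact, 1 = extended (bit 0)
    data_layout: bits 1-3; values 5-7 are reserved and rejected here.
    """
    version = i_format & 0x1
    data_layout = (i_format >> 1) & 0x7
    if data_layout > 4:
        raise ValueError(f"unsupported data layout {data_layout}")
    return version, data_layout
```

For example, an i_format of 0b0101 decodes to an extended inode using layout 2 (EROFS_INODE_FLAT_INLINE).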

i_u Union

The i_u field (4 bytes at offset 0x10) is interpreted based on the data layout:

| Name | Applicable when | Description |
|------|-----------------|-------------|
| i_u.startblk | Flat inodes | Starting block number |
| i_u.rdev | Character/block device inodes | Device ID |

Inode Data Layouts

The data layout of an inode is encoded in bits 1–3 of i_format. The core format defines two flat layouts.

EROFS_INODE_FLAT_PLAIN (0)

i_u is interpreted as startblk (the 32-bit starting block address).

The inode's data lies in consecutive blocks starting from that address, occupying ceil(i_size / block_size) consecutive blocks.

EROFS_INODE_FLAT_INLINE (2)

i_u is interpreted as startblk (the 32-bit starting block address).

The inode's data lies in consecutive blocks starting from that address, except for the tail part (i_size % block_size) that is inlined in the block immediately following the inode metadata. If i_size is small enough that the entire content fits in the inline tail, there are no preceding blocks and i_u is a don't-care field.

Note

This layout is not allowed if the tail inode data block cannot be inlined (e.g., if inlining the tail data would cause the inode to cross a physical block boundary).
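The two flat layouts can be compared by listing where a file's bytes live. The sketch below returns (absolute byte offset, length) runs. It assumes an inode without xattrs, so that the inline tail starts right after the inode itself at `inode_off + inode_len`; the function and parameter names are hypothetical.

```python
FLAT_PLAIN = 0
FLAT_INLINE = 2


def plain_block_count(i_size: int, block_size: int) -> int:
    """ceil(i_size / block_size) using integer arithmetic."""
    return -(-i_size // block_size)


def flat_data_runs(layout, i_size, startblk, block_size, inode_off, inode_len):
    """Return [(absolute_offset, length), ...] covering the file data."""
    if layout == FLAT_PLAIN:
        return [(startblk * block_size, i_size)] if i_size else []
    if layout == FLAT_INLINE:
        tail = i_size % block_size   # inlined after the inode metadata
        head = i_size - tail         # whole blocks stored at startblk
        runs = []
        if head:
            runs.append((startblk * block_size, head))
        if tail:
            # Assumption: no xattrs, so the tail follows the inode directly.
            runs.append((inode_off + inode_len, tail))
        return runs
    raise ValueError("not a flat layout defined by the core format")
```

With 4 KiB blocks, a 5000-byte FLAT_INLINE file keeps its first 4096 bytes at startblk and the remaining 904 bytes next to its inode, while a 100-byte file is stored entirely inline and startblk is ignored.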

Directories

All on-disk directories are organized in the form of directory blocks of size 2^(blkszbits + dirblkbits). dirblkbits is strictly 0 for now.

Directory Block Structure

Each directory block is divided into two contiguous regions:

  1. An array of fixed-size directory entry records starting from the beginning of the block.

  2. Variable-length filename strings following the directory entry array.

The nameoff field of the first entry in a block indicates the total number of directory entries in that block:

entry_count = nameoff[0] / sizeof(erofs_dirent)

All entries within a directory block, including . and .., are stored in strict lexicographic (byte-value ascending) order to enable an improved prefix binary search algorithm.

Directory Entry Record

Defined as struct erofs_dirent:

| Offset | Size | Type | Name | Description |
|--------|------|------|------|-------------|
| 0x00 | 8 | u64 | nid | Node number of the target inode |
| 0x08 | 2 | u16 | nameoff | Byte offset of the filename within this directory block |
| 0x0A | 1 | u8 | file_type | File type code (see below) |
| 0x0B | 1 | u8 | reserved | Reserved; must be 0 |
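Since each record is 12 bytes, the entry count follows from the first record's nameoff, and the whole array can be read in one pass. This is a sketch under the assumption of little-endian fields; `DIRENT` and `read_dirents` are hypothetical names.

```python
import struct

# nid (u64), nameoff (u16), file_type (u8), reserved (u8): 12 bytes total.
DIRENT = struct.Struct("<QHBB")


def read_dirents(block: bytes) -> list:
    """Return [(nid, nameoff, file_type), ...] for one directory block."""
    _, first_nameoff, _, _ = DIRENT.unpack_from(block, 0)
    # entry_count = nameoff[0] / sizeof(erofs_dirent)
    count = first_nameoff // DIRENT.size
    return [DIRENT.unpack_from(block, i * DIRENT.size)[:3]
            for i in range(count)]
```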

file_type Values

| Value | Constant | POSIX type |
|-------|----------|------------|
| 0 | EROFS_FT_UNKNOWN | Unknown |
| 1 | EROFS_FT_REG_FILE | Regular file |
| 2 | EROFS_FT_DIR | Directory |
| 3 | EROFS_FT_CHRDEV | Character device |
| 4 | EROFS_FT_BLKDEV | Block device |
| 5 | EROFS_FT_FIFO | FIFO |
| 6 | EROFS_FT_SOCK | Socket |
| 7 | EROFS_FT_SYMLINK | Symbolic link |

Filename Encoding

Filenames are not null-terminated (\0) except for the last one in each directory block. For each directory block, if the last filename doesn't reach the end of the block, the remaining bytes must start with 0x00.

So the length of entry i is derived as:

  • For all entries except the last: nameoff[i+1] - nameoff[i].

  • For the last entry in the block: strnlen(filename, block_end - nameoff[last]).

No character encoding is mandated; UTF-8 is recommended.
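The two length rules above can be sketched as a routine that extracts every filename from one directory block. Names are returned as raw bytes because no encoding is mandated. `dirblock_names` is a hypothetical helper, and the 12-byte record size with nameoff at record offset 8 comes from the Directory Entry Record table.

```python
import struct

DIRENT_SIZE = 12      # sizeof(erofs_dirent)
NAMEOFF_OFFSET = 8    # position of nameoff inside each record


def dirblock_names(block: bytes) -> list:
    """Return the filenames stored in one directory block, in on-disk order."""
    (first_nameoff,) = struct.unpack_from("<H", block, NAMEOFF_OFFSET)
    count = first_nameoff // DIRENT_SIZE
    nameoffs = [
        struct.unpack_from("<H", block, i * DIRENT_SIZE + NAMEOFF_OFFSET)[0]
        for i in range(count)
    ]
    names = []
    for i, start in enumerate(nameoffs):
        if i + 1 < count:
            end = nameoffs[i + 1]              # next name starts here
        else:
            # strnlen(): stop at the first NUL or at the end of the block.
            nul = block.find(b"\x00", start)
            end = len(block) if nul == -1 else nul
        names.append(block[start:end])
    return names
```

A caller that prefers text can decode each name as UTF-8 and fall back to the raw bytes if decoding fails.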

Note

Other alternative forms (e.g., Eytzinger order) were also considered (that is why there was once .*_classic naming). Here are some reasons those forms were not supported:

  • Filenames are variable-sized strings, which makes Eytzinger order harder to utilize unless namehash is also introduced, but that complicates the overall implementation and expands directory sizes.

  • It is harder to keep filenames and directory entries in the same directory block (especially large directories) to minimize I/O amplification.

  • readdir(3) would be impacted too if strict alphabetical order were required.

If there are better ideas to resolve these, the on-disk definition could be updated in the future.