Core On-disk Format#

Overview#

The EROFS core on-disk format is designed to be as simple as possible, since one of the basic use cases of EROFS is as a drop-in replacement for tar or cpio:

Figure: EROFS core on-disk format

The format design principles are as follows:

Data (except for inline data) is always block-based; metadata is not strictly block-based.
There are no centralized inode or directory tables, because such tables are ill-suited to incremental image updates, metadata flexibility, and extensibility. It is up to users to decide whether inodes and directories are laid out consecutively.
I/O amplification from extra metadata access should be as small as possible.

There are only three on-disk components to form a full filesystem tree: superblock, inodes, and directory entries.

Note that only the superblock needs to be kept at a fixed offset, as mentioned below.

Conformance to Core Format#

An EROFS image conforms to the core on-disk format if and only if all of the following conditions are met:

The is_compressed field (offset 0x54, 2 bytes) in the superblock is 0.
All bits in feature_compat and feature_incompat, except those listed in the Feature Flags section below, are 0.

An image that does not meet these conditions uses one or more optional features described in separate feature-specific documents. Note that the core on-disk format is what has been supported ever since Linux 5.4. Later additions such as the 48-bit layout are therefore not part of the core on-disk format, and not all users need 48-bit block addressing.
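The two conditions above can be checked directly from the superblock bytes. The following is a minimal sketch, not a reference implementation: the function name `conforms_to_core` is hypothetical, the field offsets are taken from the Field Definitions table, all multi-byte fields are assumed to be little-endian like the magic number, and the magic number itself is not checked here.

```python
import struct

EROFS_SUPER_OFFSET = 1024
# The two compatible flags listed in the Feature Flags section.
KNOWN_COMPAT = 0x00000001 | 0x00000002


def conforms_to_core(image: bytes) -> bool:
    """Return True if the superblock satisfies both core-format conditions."""
    sb = image[EROFS_SUPER_OFFSET:EROFS_SUPER_OFFSET + 128]
    if len(sb) < 128:
        return False  # too short to hold a base superblock
    (feature_compat,) = struct.unpack_from("<I", sb, 0x08)
    (feature_incompat,) = struct.unpack_from("<I", sb, 0x50)
    (is_compressed,) = struct.unpack_from("<H", sb, 0x54)
    return (
        is_compressed == 0
        and (feature_compat & ~KNOWN_COMPAT) == 0
        and feature_incompat == 0  # the core format defines no incompat flags
    )
```

A caller would typically run this after the magic number check and fall back to feature-specific handling when it returns False.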

Superblock#

The EROFS superblock is located at a fixed absolute offset of 1024 bytes. Its base size is 128 bytes. When sb_extslots is non-zero, the total superblock size is 128 + sb_extslots * 16 bytes. The first 1024 bytes are unused, which allows for support of other advanced formats based on EROFS, as well as the installation of x86 boot sectors and other oddities.

Field Definitions#

Offset	Size	Type	Name	Description
0x00	4	`u32`	`magic`	Magic signature: `0xE0F5E1E2`
0x04	4	`u32`	`checksum`	CRC32-C checksum of the filesystem block containing the superblock; see Superblock Checksum
0x08	4	`u32`	`feature_compat`	Compatible feature flags; see Feature Flags
0x0C	1	`u8`	`blkszbits`	Block size = `2^blkszbits`; minimum 9
0x0D	1	`u8`	`sb_extslots`	Number of 16-byte superblock extension slots
0x0E	2	`u16`	`rootnid_2b`	Root directory NID
0x10	8	`u64`	`inos`	Total inode count; see blocks and inos Fields
0x18	8	`u64`	`epoch`	Filesystem creation time, seconds since UNIX epoch
0x20	4	`u32`	`fixed_nsec`	Nanoseconds component shared by all compact inodes; see Modification Time in Compact Inodes
0x24	4	`u32`	`blocks`	Total block count; see blocks and inos Fields
0x28	4	`u32`	`meta_blkaddr`	Start block address to specify the inode-metadata zone
0x2C	4	`u32`	`xattr_blkaddr`	Start block address to specify the extended attribute zone
0x30	16	`u8[]`	`uuid`	128-bit UUID for the volume
0x40	16	`u8[]`	`volume_name`	Filesystem label (not null-terminated if 16 bytes)
0x50	4	`u32`	`feature_incompat`	Incompatible feature flags; see Feature Flags
0x54	2	`u16`	`is_compressed`	0 for non-compressed images, any non-zero value for compressed images
0x56	4	`u32`	dontcare	External device support specific; ignored in core format
0x5A	1	`u8`	`dirblkbits`	Directory block size = `2^(blkszbits + dirblkbits)`; strictly 0 in the core format
0x5B	5	`u8[]`	dontcare	Xattr specific; ignored in core format
0x60	8	`u8[]`	dontcare	Compression specific; ignored in core format
0x68	1	`u8`	reserved	Reserved; must be 0
0x69	1	`u8`	dontcare	Xattr specific; ignored in core format
0x6A	2	`u16`	reserved	Reserved; must be 0
0x6C	12	`u8[]`	dontcare	48-bit layout specific; ignored in core format
0x78	8	`u64`	reserved	Reserved; must be 0

Note the difference between reserved and dontcare fields:

reserved: Users must not use these fields. They must be filled with 0, either to comply with the supported features or because they are reserved for future use.
dontcare: Users can safely use these fields for other purposes as long as the corresponding incompatible feature flag is not set.
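As an illustration of the field table, the base superblock can be decoded with a single `struct` format string. This is a sketch under assumptions: the names `SB_FORMAT`, `Superblock`, `parse_superblock`, `superblock_size`, and `volume_label` are hypothetical, all fields are assumed to be little-endian, and every reserved or dontcare field after `dirblkbits` is skipped as padding rather than validated.

```python
import struct
from collections import namedtuple

EROFS_SUPER_OFFSET = 1024

# Named fields up to is_compressed, 4 dontcare bytes, dirblkbits, then the
# remaining 37 bytes (0x5B..0x7F) of reserved/dontcare fields as padding.
SB_FORMAT = "<IIIBBHQQIIII16s16sIH4xB37x"
assert struct.calcsize(SB_FORMAT) == 128

Superblock = namedtuple(
    "Superblock",
    "magic checksum feature_compat blkszbits sb_extslots rootnid_2b "
    "inos epoch fixed_nsec blocks meta_blkaddr xattr_blkaddr "
    "uuid volume_name feature_incompat is_compressed dirblkbits",
)


def parse_superblock(image: bytes) -> Superblock:
    """Decode the 128-byte base superblock at absolute offset 1024."""
    return Superblock._make(struct.unpack_from(SB_FORMAT, image, EROFS_SUPER_OFFSET))


def superblock_size(sb: Superblock) -> int:
    """Total size: 128 bytes plus 16 bytes per extension slot."""
    return 128 + sb.sb_extslots * 16


def volume_label(sb: Superblock) -> bytes:
    """Label bytes up to the first NUL; all 16 bytes if there is none."""
    return sb.volume_name.split(b"\0", 1)[0]
```

Returning the label as raw bytes avoids assuming a character encoding, which the table does not specify.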

Magic Number#

The magic number at offset 0x00 must be 0xE0F5E1E2 (little-endian). A reader must reject any image whose first four bytes at offset 1024 do not match this value.
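Because the value is stored little-endian, the four bytes on disk at offset 1024 are E2 E1 F5 E0. A minimal sketch of the check follows; the function name `has_erofs_magic` is hypothetical.

```python
import struct

EROFS_SUPER_OFFSET = 1024
EROFS_SUPER_MAGIC = 0xE0F5E1E2


def has_erofs_magic(image: bytes) -> bool:
    """Return True if the little-endian u32 at offset 1024 is the EROFS magic."""
    raw = image[EROFS_SUPER_OFFSET:EROFS_SUPER_OFFSET + 4]
    if len(raw) != 4:
        return False  # image too short to contain a superblock
    return struct.unpack("<I", raw)[0] == EROFS_SUPER_MAGIC
```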

Superblock Checksum#

When EROFS_FEATURE_COMPAT_SB_CHKSUM is set, the checksum field contains a CRC32-C digest. The digest is computed from offset 1024 to the end of the filesystem block that contains the superblock, which is the byte range [1024, block_size) when block_size is larger than 1024, with the four bytes of the checksum field itself treated as zero during computation.

For example, when blkszbits is 12 (block size is 4 KiB):

Offset	Size	Description	Checksum covered
0	1024	Padding	No
1024	4	Magic number	Yes
1028	4	Checksum field in superblock, filled with zero	Yes
1032	3064	Remaining bytes in the filesystem block	Yes

Tip: Some implementations (e.g., java.util.zip.CRC32C) apply a final bit-wise inversion. If the superblock checksum does not match, try inverting it.
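The computation can be sketched in pure Python as follows. Several points are assumptions rather than statements from this document: the helper names are hypothetical, the CRC is seeded with 0xFFFFFFFF, the stored value is taken to omit the final inversion (which is what the tip above suggests), and only block sizes larger than 1024 bytes are handled, matching the 4 KiB example.

```python
def _make_crc32c_table():
    """Build the byte-wise lookup table for the reflected Castagnoli polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = ((crc >> 1) ^ 0x82F63B78) if (crc & 1) else (crc >> 1)
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c_raw(data: bytes, seed: int = 0xFFFFFFFF) -> int:
    """CRC32-C without the final bit-wise inversion (assumed seed: all ones)."""
    crc = seed
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def superblock_checksum(image: bytes, blkszbits: int) -> int:
    """Digest of [1024, block_size) with the checksum field zeroed."""
    block_size = 1 << blkszbits
    if block_size <= 1024:
        raise ValueError("this sketch only covers block sizes above 1024 bytes")
    region = bytearray(image[1024:block_size])
    region[4:8] = b"\0\0\0\0"  # checksum field at superblock offset 0x04
    return crc32c_raw(bytes(region))
```

A library that applies the final inversion yields `crc32c_raw(data) ^ 0xFFFFFFFF`, so the two conventions can be converted into each other with a single XOR.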

Feature Flags#

`feature_compat` — Compatible Feature Flags#

A mount implementation that does not recognise a bit in feature_compat may still mount the filesystem without loss of correctness.

Bit mask	Name	Description
`0x00000001`	`EROFS_FEATURE_COMPAT_SB_CHKSUM`	Superblock CRC32-C checksum is present; see Superblock Checksum
`0x00000002`	`EROFS_FEATURE_COMPAT_MTIME`	Per-inode mtime is stored in extended inodes

Note

For new filesystem builders, it is recommended to always set EROFS_FEATURE_COMPAT_MTIME, since it indicates that all inode timestamps record modification time (mtime) rather than change time (ctime).

`feature_incompat` — Incompatible Feature Flags#

A runtime implementation must refuse to mount the entire filesystem if any bit set in feature_incompat indicates a feature that it does not implement.

The core on-disk format defines no incompatible feature flags. A non-zero feature_incompat value indicates one or more non-core feature extensions.
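The asymmetry between the two flag words can be expressed as a small mount-time check. This is only a sketch: `may_mount` and `supported_incompat` are hypothetical names, and an implementation limited to the core format is modeled by a supported mask of 0.

```python
# An implementation limited to the core format supports no incompatible flags.
SUPPORTED_INCOMPAT = 0


def may_mount(feature_compat: int, feature_incompat: int,
              supported_incompat: int = SUPPORTED_INCOMPAT) -> bool:
    """Decide whether an image may be mounted based on its feature words.

    Unrecognised feature_compat bits are deliberately ignored; any
    feature_incompat bit outside the supported mask forces a refusal.
    """
    return (feature_incompat & ~supported_incompat) == 0
```

An implementation that supports some extensions would pass a non-zero `supported_incompat` mask built from the flags it actually implements.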

`blocks` and `inos` Fields#

The blocks and inos fields are primarily intended for statvfs(3) reporting. For dynamically generated EROFS filesystems, these fields can be set to 0.

Implementations should not use the blocks field to validate whether a block address or NID is valid. Such checks are unnecessary: a malicious block address or NID simply results in -EIO or in reading corrupted (meta)data, without causing any real harm. Furthermore, a maliciously crafted image can trivially bypass such bounds checking by adjusting the blocks field accordingly, which makes the validation meaningless.

Inodes#

Each on-disk inode must be aligned to a 32-byte inode slot boundary; the slot size matches the compact inode size. Given a NID nid, its inode can be located in O(1) time by computing the absolute byte offset as follows:

inode_offset = meta_blkaddr * block_size + 32 * nid

The NIDs for the root directory and special-purpose inodes are stored in the superblock. Valid inode sizes are either 32 bytes (compact) or 64 bytes (extended), distinguished by bit 0 of the i_format field.
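The lookup formula and the size rule translate directly into code. In this sketch the function names are hypothetical, and `block_size` is derived from `blkszbits` as defined in the superblock.

```python
EROFS_INODE_SLOT_SIZE = 32


def inode_offset(nid: int, meta_blkaddr: int, blkszbits: int) -> int:
    """Absolute byte offset of the inode with the given NID."""
    block_size = 1 << blkszbits
    return meta_blkaddr * block_size + EROFS_INODE_SLOT_SIZE * nid


def inode_size(i_format: int) -> int:
    """64 bytes if bit 0 of i_format is set (extended), otherwise 32 (compact)."""
    return 64 if i_format & 1 else 32
```

For example, with 4 KiB blocks, meta_blkaddr 1, and NID 36, the inode starts at byte 4096 + 32 * 36 = 5248.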

Compact Inode (32 bytes)#

Defined as struct erofs_inode_compact:

Offset	Size	Type	Name	Description
0x00	2	`u16`	`i_format`	Inode format hints; see i_format Field
0x02	2	`u16`	reserved	Xattr specific; must be 0 if no xattrs
0x04	2	`u16`	`i_mode`	File type and permission bits
0x06	2	`u16`	`i_nlink`	Hard link count
0x08	4	`u32`	`i_size`	File size in bytes (32-bit)
0x0C	4	`u32`	dontcare	48-bit layout specific; ignored in core format
0x10	4	`u32`	`i_u`	Union; see i_u Union
0x14	4	`u32`	`i_ino`	Inode serial number for 32-bit `stat(2)` compatibility
0x18	2	`u16`	`i_uid`	Owner UID (16-bit)
0x1A	2	`u16`	`i_gid`	Owner GID (16-bit)
0x1C	4	`u32`	reserved	Reserved; must be 0

Modification Time in Compact Inodes#

Due to space constraints, compact inodes cannot store a full 64-bit per-inode timestamp, let alone an additional nanosecond field. Consequently, when the 48-bit layout extension is unused, the effective timestamp for all compact inodes is (epoch, fixed_nsec), which has been the case since Linux 5.4.
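A compact inode can be decoded as sketched below, with the shared superblock timestamp attached. The names `COMPACT_INODE` and `read_compact_inode` are hypothetical, fields are assumed to be little-endian, and the xattr-specific, 48-bit-specific, and reserved fields are skipped rather than validated.

```python
import struct

# i_format, 2 skipped bytes, i_mode, i_nlink, i_size, 4 skipped bytes,
# i_u, i_ino, i_uid, i_gid, 4 reserved bytes.
COMPACT_INODE = struct.Struct("<H2xHHI4xIIHH4x")
assert COMPACT_INODE.size == 32


def read_compact_inode(buf: bytes, offset: int, epoch: int, fixed_nsec: int) -> dict:
    """Decode a compact inode; epoch and fixed_nsec come from the superblock."""
    (i_format, i_mode, i_nlink, i_size,
     i_u, i_ino, i_uid, i_gid) = COMPACT_INODE.unpack_from(buf, offset)
    if i_format & 1:
        raise ValueError("bit 0 of i_format is set: not a compact inode")
    return {
        "i_format": i_format, "i_mode": i_mode, "i_nlink": i_nlink,
        "i_size": i_size, "i_u": i_u, "i_ino": i_ino,
        "i_uid": i_uid, "i_gid": i_gid,
        # Every compact inode shares the superblock timestamp.
        "mtime": (epoch, fixed_nsec),
    }
```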

Extended Inode (64 bytes)#

Defined as struct erofs_inode_extended:

Offset	Size	Type	Name	Description
0x00	2	`u16`	`i_format`	Inode format hints; see i_format Field
0x02	2	`u16`	reserved	Xattr specific; must be 0 if no xattrs
0x04	2	`u16`	`i_mode`	File type and permission bits
0x06	2	`u16`	reserved	Reserved; must be 0
0x08	8	`u64`	`i_size`	File size in bytes (64-bit)
0x10	4	`u32`	`i_u`	Union; see i_u Union
0x14	4	`u32`	`i_ino`	Inode serial number for 32-bit `stat(2)` compatibility
0x18	4	`u32`	`i_uid`	Owner UID (32-bit)
0x1C	4	`u32`	`i_gid`	Owner GID (32-bit)
0x20	8	`u64`	`i_mtime`	Modification time, seconds since UNIX epoch
0x28	4	`u32`	`i_mtime_nsec`	Nanoseconds component of `i_mtime`
0x2C	4	`u32`	`i_nlink`	Hard link count (32-bit)
0x30	16	`u8[]`	reserved	Reserved; must be 0
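The extended layout can be decoded the same way. As before, this is a sketch with hypothetical names (`EXTENDED_INODE`, `read_extended_inode`), little-endian fields are assumed, and reserved or xattr-specific fields are skipped rather than validated.

```python
import struct

# i_format, 2 skipped bytes, i_mode, 2 reserved bytes, i_size, i_u, i_ino,
# i_uid, i_gid, i_mtime, i_mtime_nsec, i_nlink, 16 reserved bytes.
EXTENDED_INODE = struct.Struct("<H2xH2xQIIIIQII16x")
assert EXTENDED_INODE.size == 64


def read_extended_inode(buf: bytes, offset: int) -> dict:
    """Decode an extended inode, including its own mtime."""
    (i_format, i_mode, i_size, i_u, i_ino, i_uid, i_gid,
     i_mtime, i_mtime_nsec, i_nlink) = EXTENDED_INODE.unpack_from(buf, offset)
    if not i_format & 1:
        raise ValueError("bit 0 of i_format is clear: not an extended inode")
    return {
        "i_format": i_format, "i_mode": i_mode, "i_size": i_size,
        "i_u": i_u, "i_ino": i_ino, "i_uid": i_uid, "i_gid": i_gid,
        "mtime": (i_mtime, i_mtime_nsec), "i_nlink": i_nlink,
    }
```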

`i_format` Field#

The i_format field is present at offset 0x00 in both inode variants and encodes layout metadata:

Bits	Width	Description
0	1	Inode version: 0 = compact (32-byte), 1 = extended (64-byte)
1–3	3	Data layout: values 0–4 are defined; 5–7 are reserved. See Inode Data Layouts
4	1	48-bit layout specific; ignored in core format
5–15	11	Reserved; must be 0

Note

When bits 1–3 contain reserved values (5–7), the inode uses an unsupported data layout. Implementations must reject such inodes and return an appropriate error (e.g., “not supported”). This typically indicates a maliciously crafted or corrupted image.
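The bit fields can be extracted with shifts and masks. In this sketch, `UnsupportedLayout` and `decode_i_format` are hypothetical names, and rejecting reserved layout values is the only validation performed.

```python
class UnsupportedLayout(Exception):
    """Raised for the reserved data layout values 5 to 7."""


def decode_i_format(i_format: int) -> tuple:
    """Split i_format into (is_extended, data_layout)."""
    is_extended = bool(i_format & 0x1)   # bit 0: inode version
    data_layout = (i_format >> 1) & 0x7  # bits 1-3: data layout
    if data_layout > 4:
        raise UnsupportedLayout(f"reserved data layout {data_layout}")
    return is_extended, data_layout
```

For instance, an i_format of 0b00101 decodes to an extended inode with data layout 2.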

`i_u` Union#

The i_u field (4 bytes at offset 0x10) is interpreted based on the data layout:

Name	Applicable when	Description
`i_u.startblk`	Flat inodes	Starting block number
`i_u.rdev`	Character/block device inodes	Device ID
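One way to select the interpretation is sketched below. Checking the file type before the data layout is an assumption of this example, as is the function name `interpret_i_u`; the device ID is returned raw because its encoding is not described here.

```python
import stat

FLAT_LAYOUTS = (0, 2)  # EROFS_INODE_FLAT_PLAIN, EROFS_INODE_FLAT_INLINE


def interpret_i_u(i_mode: int, data_layout: int, i_u: int):
    """Return ("rdev", value), ("startblk", value), or None if neither applies."""
    if stat.S_ISCHR(i_mode) or stat.S_ISBLK(i_mode):
        return ("rdev", i_u)      # raw device ID, left undecoded
    if data_layout in FLAT_LAYOUTS:
        return ("startblk", i_u)  # starting block number
    return None                   # non-core layouts are out of scope
```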

Inode Data Layouts#

The data layout of an inode is encoded in bits 1–3 of i_format. The core format defines two flat layouts.

`EROFS_INODE_FLAT_PLAIN` (0)#

i_u is interpreted as startblk (the 32-bit starting block address).

The inode’s data lies in consecutive blocks starting from that address, occupying ceil(i_size / block_size) consecutive blocks.
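The location and extent of a FLAT_PLAIN inode's data follow from startblk and i_size. A minimal sketch, with the hypothetical name `flat_plain_extent`:

```python
def flat_plain_extent(startblk: int, i_size: int, blkszbits: int) -> tuple:
    """Return (absolute byte offset of the data, number of blocks occupied)."""
    block_size = 1 << blkszbits
    nblocks = -(-i_size // block_size)  # ceil(i_size / block_size)
    return startblk * block_size, nblocks
```

With 4 KiB blocks, a 5000-byte file starting at block 10 begins at byte 40960 and occupies 2 blocks; only the first i_size bytes of that range are file data.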

`EROFS_INODE_FLAT_INLINE` (2)#

i_u is interpreted as startblk (the 32-bit starting block address).

The inode’s data lies in consecutive blocks starting from that address, except for the tail part (i_size % block_size), which is inlined immediately after the inode metadata in the same block. If i_size is small enough that the entire content fits in the inline tail, there are no preceding blocks and i_u is a don’t-care field.

Note

This layout is not allowed if the tail inode data block cannot be inlined (e.g., if inlining the tail data would cause the inode to cross a physical block boundary).
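The split between full blocks and the inline tail can be sketched as follows. The function name `flat_inline_extents` is hypothetical, and the sketch assumes the inode carries no xattrs, so that the tail starts right after the 32-byte or 64-byte inode; images with xattrs would need the xattr area size added.

```python
def flat_inline_extents(inode_off: int, inode_sz: int, startblk: int,
                        i_size: int, blkszbits: int) -> list:
    """Return a list of (absolute byte offset, length) pieces of file data."""
    block_size = 1 << blkszbits
    tail_len = i_size % block_size
    full_len = i_size - tail_len
    tail_off = inode_off + inode_sz  # assumption: no xattrs after the inode
    if tail_len:
        inode_blk = inode_off // block_size
        last_blk = (tail_off + tail_len - 1) // block_size
        if last_blk != inode_blk:
            raise ValueError("inline tail would cross a block boundary")
    extents = []
    if full_len:
        extents.append((startblk * block_size, full_len))
    if tail_len:
        extents.append((tail_off, tail_len))
    return extents
```

When the whole file fits in the tail, the result is a single extent and `startblk` is never consulted, which mirrors the don't-care rule above.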

Directories#

All on-disk directories are organized in the form of directory blocks of size 2^(blkszbits + dirblkbits). dirblkbits is strictly 0 for now.

Directory Block Structure#

Each directory block is divided into two contiguous regions:

An array of fixed-size directory entry records starting from the beginning of the block.
Variable-length filename strings following the directory entry array.

The nameoff field of the first entry in a block indicates the total number of directory entries in that block:

entry_count = nameoff[0] / sizeof(erofs_dirent)

All entries within a directory block, including . and .., are stored in strict lexicographic (byte-value ascending) order to enable an improved prefix binary search algorithm.

Directory Entry Record#

Defined as struct erofs_dirent:

Offset	Size	Type	Name	Description
0x00	8	`u64`	`nid`	Node number of the target inode
0x08	2	`u16`	`nameoff`	Byte offset of the filename within this directory block
0x0A	1	`u8`	`file_type`	File type code (see below)
0x0B	1	`u8`	reserved	Reserved; must be 0
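Combining the entry-count rule with the 12-byte record layout gives a short decoder for the entry array. The names `DIRENT` and `read_dirents` are hypothetical, fields are assumed to be little-endian, and the sanity check on nameoff is an addition of this sketch rather than a requirement stated here.

```python
import struct

# nid (u64), nameoff (u16), file_type (u8), one reserved byte.
DIRENT = struct.Struct("<QHBx")
assert DIRENT.size == 12


def read_dirents(block: bytes) -> list:
    """Return (nid, nameoff, file_type) for every entry in a directory block."""
    nameoff0 = struct.unpack_from("<H", block, 8)[0]
    # Sketch-level sanity check: the first name must start right after a
    # whole number of records and inside the block.
    if nameoff0 == 0 or nameoff0 % DIRENT.size or nameoff0 > len(block):
        raise ValueError("implausible nameoff in first directory entry")
    count = nameoff0 // DIRENT.size
    return [DIRENT.unpack_from(block, i * DIRENT.size) for i in range(count)]
```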

`file_type` Values#

Value	Constant	POSIX type
0	`EROFS_FT_UNKNOWN`	Unknown
1	`EROFS_FT_REG_FILE`	Regular file
2	`EROFS_FT_DIR`	Directory
3	`EROFS_FT_CHRDEV`	Character device
4	`EROFS_FT_BLKDEV`	Block device
5	`EROFS_FT_FIFO`	FIFO
6	`EROFS_FT_SOCK`	Socket
7	`EROFS_FT_SYMLINK`	Symbolic link

Filename Encoding#

Filenames are not null-terminated (\0). The only exception is the last filename in each directory block: if it doesn’t reach the end of the block, the remaining bytes must start with 0x00.

So the length of entry i is derived as:

For all entries except the last: nameoff[i+1] − nameoff[i].
For the last entry in the block: strnlen(filename, block_end − nameoff[last]).

No character encoding is mandated; UTF-8 is recommended.
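The two length rules can be applied to a whole directory block as sketched below. The function name `dirent_names` is hypothetical, the block is assumed to be well-formed, and names are returned as raw bytes since no encoding is mandated.

```python
import struct

DIRENT_SIZE = 12
NAMEOFF_OFFSET = 8  # offset of nameoff within each record


def dirent_names(block: bytes) -> list:
    """Return the filenames stored in one directory block, in on-disk order."""
    count = struct.unpack_from("<H", block, NAMEOFF_OFFSET)[0] // DIRENT_SIZE
    offs = [
        struct.unpack_from("<H", block, i * DIRENT_SIZE + NAMEOFF_OFFSET)[0]
        for i in range(count)
    ]
    names = []
    for i, off in enumerate(offs):
        if i + 1 < count:
            end = offs[i + 1]            # next entry's nameoff bounds this name
        else:
            nul = block.find(b"\0", off)  # strnlen up to the end of the block
            end = len(block) if nul < 0 else nul
        names.append(block[off:end])
    return names
```

Passing exactly one directory block matters, because the last name is bounded by the end of the buffer when no 0x00 byte follows it.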

Note

Other alternative forms (e.g., Eytzinger order) were also considered (that is why there was once .*_classic naming). Here are some reasons those forms were not supported:

Filenames are variable-sized strings, which makes Eytzinger order harder to utilize unless namehash is also introduced, but that complicates the overall implementation and expands directory sizes.
It is harder to keep filenames and directory entries in the same directory block (especially for large directories), which is needed to minimize I/O amplification.
readdir(3) would be impacted too if strict alphabetical order were required.

If there are better ideas to resolve these, the on-disk definition could be updated in the future.
