ESS has several modes of operation. The mode is controlled by how the backing device is formatted with 'fbd-maint format ...'. Some modes also require additional setup in the startup script.

Linear Mode

In linear mode, ESS remaps blocks to optimize write performance. Because writes are linear to the media regardless of the write IO pattern, linear mode is an excellent solution for Flash media. Linear mode reduces flash wear and improves write performance.

Compressed Mode

Compressed mode is similar to linear mode except that 4K blocks are compressed using a lossless compression function. Compression can be used to improve performance, lower flash wear, or increase the amount of data that can be stored on the device.

Deduplication Mode

Deduplication mode is a more complicated storage model where each 4K block is stored based on the contents of the block. If two or more identical 4K blocks are stored to different logical block addresses, then only a single copy of the 4K block is actually stored on the media. Deduplication is an ideal solution for applications that store multiple copies of identical blocks. System backups and copies of virtual server images (such as VDI) are ideal candidates for deduplication.

Deduplication and Compression Mode

This mode combines the compression and deduplication functions on a single volume. Each unique 4K block is compressed and only stored once. This mode yields the highest degree of space savings.

Generic Startup / Shutdown Process

This example will involve a single RAID array of SSDs managed with Linux software RAID. At a higher level, the resulting storage is configured with Linux Volume Manager (LVM2) which creates logical volumes that are then exported as block devices over iSCSI, formatted with a file system, or otherwise used by local and remote applications.

1) Setting up the Hardware SSDs

ESS uses standard "disk drives". With Linux, these usually appear as SCSI block devices such as '/dev/sda', '/dev/sdb', etc.

"Linux Standard Practices" suggest that SSDs should always be partitioned and that these partitions should be combined into a single RAID device. So you should create a partition on each SSD as type "linux raid". The partition should start at a "natural boundary" of the SSD flash. This usually involves starting the partition at sector 4096.

2) Setting up the Linux Software RAID device

Assuming you are using RAID-5, you do this with:

mdadm --create /dev/md0 --level=5 --raid-devices=23 --spare-devices=1 --chunk=64 --metadata=1.0 /dev/sd[b-y]1

This will create /dev/md0
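
Before formatting the array for ESS, you can verify that it assembled correctly:

cat /proc/mdstat
mdadm --detail /dev/md0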

3) Format the array for use with ESS

This is done once to set up the device geometry for ESS operation.

Linear mode:

/usr/local/ess/fbd-maint format /dev/md0 --wtblksz=$((22*2048)) -q

Compressed mode:

/usr/local/ess/fbd-maint format /dev/md0 --wtblksz=$((22*2048)) --comp=50 -q

Deduplication mode:
Deduplication and Compressed mode:

Use the /usr/local/ess/dedupe-comp-fmt.sh script

4) Mount the ESS block device

Linear and Compressed mode:

Use the script

/usr/local/ess/init.d/fbd1 start

Deduplication mode:
Deduplication and Compressed mode:

Use the script

/usr/local/ess/init.d/fbd1-comp-dd start
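
In either case, the start script should leave an ESS block device under /dev/mapper. For the linear and compressed modes this is /dev/mapper/fbd1; the name used by the deduplication start script depends on your configuration. A quick check:

ls -l /dev/mapper/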

5) Setup LVM2

First create the physical volume and volume group

pvcreate /dev/mapper/fbd1
vgcreate ess /dev/mapper/fbd1

Then create logical volumes

lvcreate -L 50G -n vol1 ess
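
The volume then appears as /dev/ess/vol1 (also reachable as /dev/mapper/ess-vol1), and you can confirm the LVM state with:

pvs
vgs
lvs ess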

6) Format File Systems etc.

Once the volume manager is running, you have standard logical volumes that you can format with a file system, export over iSCSI, or otherwise use like any other block device.
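
For example, to put a file system on the volume created above and mount it (the ext4 choice and the mount point are just illustrative):

mkfs.ext4 /dev/ess/vol1
mkdir -p /mnt/vol1
mount /dev/ess/vol1 /mnt/vol1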

Setup on Reboot

When the system is rebooted, you need to:

a) Start the /dev/md0 RAID array
b) Mount the ess device with the /usr/local/ess/init.d/fbd1... start script
c) Start LVM with 'vgchange -a y ess'
d) Mount and/or export logical volumes
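
A minimal sketch of this sequence as a shell script (the device names, the choice of fbd1 vs. fbd1-comp-dd, and the mount point are all specific to your configuration):

# Sketch of a startup sequence; adjust names for your configuration
mdadm --assemble /dev/md0 /dev/sd[b-y]1    # a) start the RAID array
/usr/local/ess/init.d/fbd1 start           # b) mount the ESS device (fbd1-comp-dd for de-dupe modes)
vgchange -a y ess                          # c) activate the LVM volume group
mount /dev/ess/vol1 /mnt/vol1              # d) mount and/or export the logical volumes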

Setup for Shutdown

On shutdown, you need to do these steps backwards:

a) Unmount and stop exporting logical volumes
b) Stop LVM with 'vgchange -a n ess'
c) Unmount the ess device with the /usr/local/ess/init.d/fbd1... stop script
d) Stop the /dev/md0 RAID array
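
The matching shutdown sequence, again as a sketch:

# Sketch of a shutdown sequence; adjust names for your configuration
umount /mnt/vol1                           # a) unmount / stop exporting the logical volumes
vgchange -a n ess                          # b) deactivate the LVM volume group
/usr/local/ess/init.d/fbd1 stop            # c) unmount the ESS device (fbd1-comp-dd for de-dupe modes)
mdadm --stop /dev/md0                      # d) stop the RAID array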

Memory Usage

Linear Mode Memory Usage

In linear mode, there is a single, RAM resident, lookup table. This table is either 4 bytes/entry or 5 bytes/entry depending on the device size.

For a device up to 4TB in size, you need 1 GB of RAM per TB of usable space.
For a device up to 1PB in size, you need 1.25 GB of RAM per TB of usable space.

Compressed Mode Memory Usage

In compressed mode, there is a single, RAM resident, lookup table. This table is 4, 5, or 6 bytes/entry depending on the device size.

For a device up to 64GB in size, you need 1 GB of RAM per TB of usable space.
For a device up to 16TB in size, you need 1.25 GB of RAM per TB of usable space.
For a device up to 4PB in size, you need 1.5 GB of RAM per TB of usable space.
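
These requirements follow from using one table entry per 4K block of usable space. A rough sketch of the arithmetic for either mode (binary units; the values are illustrative):

# Lookup table RAM estimate for linear or compressed mode (sketch only)
TIB=4                                   # usable space in TiB (illustrative)
BYTES_PER_ENTRY=4                       # entry size from the tables above
ENTRIES=$(( TIB * 1024**4 / 4096 ))     # one table entry per 4K block
echo "$(( ENTRIES * BYTES_PER_ENTRY / 1024**3 )) GiB of table RAM"   # 4 GiB here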

Deduplication Mode Memory Usage

De-dupe places significant demands on system memory. Looking up and verifying that a block is a duplicate requires both complicated processing and system resources. In terms of memory, the RAM required to directly map de-dupe is several times larger than in linear mode. In addition, data reduction makes the device appear much larger than its physical size, further increasing the memory footprint. To accommodate this, de-dupe mode is only practical in combination with "virtual memory". Virtual memory uses flash storage and RAM cooperatively so that the RAM appears much larger than the actual system memory size. Unfortunately, for best performance, some of the highest speed designs still require significant real system RAM.

Deduplication and Compressed Mode Memory Usage

In DeDupe/Comp mode, each 4K block is not only checked against all other 4K blocks stored on the device, but is also compressed. This compression further increases the data reduction of your storage device.

In this mode, there are two "run-time" styles available.

a) Large memory. Enough memory for a full-sized lookup hash table for physical storage locations.
b) Sparse memory. For very low memory systems (mostly HDD-based systems), using a different lookup function with lower performance.

The large memory model requires a linear "hash table" resident in RAM. This table is sized as:

Take the size of the device as a count of 4K blocks.
Multiply this size by 4 to handle compression
Round this number down to the nearest power of 2
Optionally divide this number by 2, 4, or 8 if memory is really tight

If the target size is less than 2B blocks (2TB), then each table entry is 32 bits (4 bytes)
If the target size is less than 8B blocks (8TB), then each table entry is 34 bits (4.25 bytes)
If the target size is less than 32B blocks (32TB), then each table entry is 36 bits (4.5 bytes)
If the target size is less than 128B blocks (128TB), then each table entry is 38 bits (4.75 bytes)

For example, if you have a 12 TB device, this is 2.9B 4K blocks, or 11.7B effective blocks after compression.
11.7B target blocks require 36-bit table entries.
Rounded down to a power of 2, this is 8B binary entries, or 8B * 36 / 8 = 36 GB of RAM.
If necessary, this table can be intentionally undersized to 18 GB or 9 GB.
Undersized tables will lower performance, especially on full arrays.
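
The same calculation as a small shell sketch (the variable names are illustrative and simply reproduce the arithmetic above):

# Large-memory hash table sizing (sketch of the steps above)
DEV_BYTES=$(( 12 * 1000**4 ))            # example: a 12 TB device
BLOCKS=$(( DEV_BYTES / 4096 ))           # ~2.9B 4K blocks
TARGET=$(( BLOCKS * 4 ))                 # x4 to handle compression, ~11.7B

# Pick the entry width from the target size
if   [ "$TARGET" -lt $(( 2 * 1000**3 )) ];  then BITS=32
elif [ "$TARGET" -lt $(( 8 * 1000**3 )) ];  then BITS=34
elif [ "$TARGET" -lt $(( 32 * 1000**3 )) ]; then BITS=36
else BITS=38; fi

# Round the entry count down to the nearest power of 2
ENTRIES=1
while [ $(( ENTRIES * 2 )) -le "$TARGET" ]; do ENTRIES=$(( ENTRIES * 2 )); done

echo "$(( ENTRIES * BITS / 8 / 1024**3 )) GiB of table RAM"   # 36 GiB for this example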

The large memory model also requires a "cache table" to cache entries in the logical block and physical block lookup tables. It is recommended that you allocate from 2GB to 16GB for these tables. The more diverse your application workload, the larger this table should be.

The de-dupe logic can cache some read requests. This can improve the write performance of heavily duplicated writes. Ideally, you want this cache to be as large as the set of common files in the user workload. For example, hosting 1000 VDI instances might want 2GB to store all of the common Windows files shared across the virtual instances.

Finally, you can optionally allocate RAM as a general read cache. On arrays with large SSD counts, this is usually not needed. If you have a small number of slower SSDs, this can help read performance.