Deployment Overview

The HPE GreenLake for File Storage Data Reduction Estimation Probe provides an estimate of the data reduction rate achievable, based on a sample data set. Make sure to review the prerequisites to understand the hardware and software requirements for running the probe successfully. This article guides you through deploying and running the probe.

Download

Use sftp to download the probe bundle to the Linux client on which you wish to run the probe.

sftp gl4f_probe@halo.storagelr5.ext.hpe.com:/935553.probe.bundle.tar.gz .

Type in the password when prompted: HPE@cc3$$4SFTP

If the sftp port has been changed from the default value of 22, specify the port with the -P flag (replace 22 with the actual port):

sftp -P 22 gl4f_probe@halo.storagelr5.ext.hpe.com:/935553.probe.bundle.tar.gz .

Expand & Verify Download

Now that you've downloaded the probe bundle, untar it and verify that the download is correct.

export PROBE_BUILD=935553
tar -xzf ${PROBE_BUILD}.probe.bundle.tar.gz
ls -l

Note: the example below may not show current build numbers.

[root@iris-centos-workloadclient-22 probe]# ls -l
total 1840344
-rw-r--r--. 1 root root 937920831 Jul 12 12:44 935553.probe.bundle.tar.gz
-rw-r--r--. 1 root root 946565338 Jul 12 12:44 935553.probe.image.gz
-rwxr-xr-x. 1 root root     19579 Jul 12 12:44 probe_launcher.py

Mount Filesystems Selected to Be Probed

Validated Filesystems Include, But Are Not Limited To:

  • NFS
  • Lustre
  • GPFS
  • S3 with goofys
  • CIFS/SMB

For the most accurate results, do not use root-squash.

It's recommended to set read-only access on the mounted filesystem.
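As a sketch, an NFS filesystem can be mounted read-only with the ro option; the server name filer.example.com, export path /export, and mount point /mnt/filer below are placeholders for your environment (the mount command is echoed rather than executed here, since mounting requires root):

```shell
# Placeholders: replace filer.example.com:/export and /mnt/filer with real values.
NFS_SOURCE="filer.example.com:/export"
MOUNT_POINT="/mnt/filer"

# A read-only (ro) mount keeps the probe from ever writing to the scanned data.
# Run the printed command as root on the probe client.
echo sudo mount -t nfs -o ro "$NFS_SOURCE" "$MOUNT_POINT"
```

On the server side, avoiding root-squash (as noted above) lets the probe, which runs as root, read every file in the export.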

Create Probe Directories

Change /mnt/ to the SSD-backed local disk that the probe will use for its hash database and logging directories:

sudo mkdir -p /mnt/probe/db
sudo mkdir -p /mnt/probe/out
sudo chmod -Rf 777 /mnt/probe

Size of the Data Set

  • The input to the probe is a defined directory (--input-dir)
  • The probe will automatically query the input filesystem about space consumed and file count (inodes) and use that in its calculations
  • Depending on the method of mounting and underlying storage, this can often provide an inaccurate query response
  • It's highly recommended that manual estimated entries be defined for space consumed (--data-size-gb) and file count (--number-of-files)
  • These estimates do not have to be accurate; round up reasonably
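The estimates above can be gathered quickly with standard tools. A minimal sketch, assuming the filesystem is mounted at /mnt/filesystem_to_be_probed (a placeholder path, with the current directory used as a fallback here):

```shell
# Placeholder path; falls back to the current directory if it does not exist.
INPUT_DIR=/mnt/filesystem_to_be_probed
[ -d "$INPUT_DIR" ] || INPUT_DIR=.

# Space consumed, reported in whole gigabytes (for --data-size-gb).
df -BG "$INPUT_DIR"

# Approximate file count (for --number-of-files); on very large trees,
# counting a representative subdirectory and extrapolating is usually enough.
FILE_COUNT=$(find "$INPUT_DIR" -xdev -type f | wc -l)
echo "$FILE_COUNT files"
```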

Running The Probe

The probe runs as a foreground application, which means that if your session is closed for any reason, the probe will stop. It's recommended to run the probe in a screen session.
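For example, the probe can be launched inside a detached screen session; the session name probe is arbitrary, and the sleep below stands in for the actual probe command line:

```shell
# Launch a detached screen session named "probe"; replace the sleep
# with the probe_launcher.py command line shown in the examples below.
if command -v screen >/dev/null 2>&1; then
    screen -dmS probe sleep 5
    STATUS=started
    # Reattach later with: screen -r probe
else
    STATUS="screen not installed"
fi
echo "$STATUS"
```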

Here is an example command line. Edit the variable values for your environment:

NOTE: Use underscores instead of spaces in COMPANY_NAME and WORKLOAD

export DB_DIR=/mnt/probe/db
export OUTPUT_DIR=/mnt/probe/out
export INPUT_DIR=/mnt/filesystem_to_be_probed/sub_directory
export INPUT_SIZE_GB=10000
export QTY_FILES=1000000
export COMPANY_NAME=Your_Amazing_Company
export WORKLOAD=Describe_Your_Workload

Start the probe: (This may take up to five minutes to start displaying output)

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir $INPUT_DIR \
--metadata-dir $DB_DIR \
--output-dir $OUTPUT_DIR \
--data-size-gb $INPUT_SIZE_GB \
--number-of-files $QTY_FILES \
--customer-name ${COMPANY_NAME}---${WORKLOAD}

Example One: Small Data Sets

To probe the directory interesting_data, with 15 TB in use and 5,000,000 files, at the company ACME, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/acme_filer/interesting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 15000 \
--number-of-files 5000000 \
--customer-name ACME---Interesting_Data

Example Two: Larger Data Sets

To probe the directory fascinating_data, with 60 TB in use and 750,000,000 files, at the company FOO, using defined parameters for RAM and SSD-backed local disk, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/foo_filer/fascinating_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 60000 \
--number-of-files 750000000 \
--customer-name FOO---Fascinating_Data

Example Three: Performance Throttling

To probe the directory riviting_data, with 250 TB in use and 1,250,000,000 files, at the company Initech, using defined parameters for RAM and SSD-backed local disk, but with a lower performance impact on the filesystem, the command would be:

sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/initech_filer/riviting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 250000 \
--number-of-files 1250000000 \
--number-of-threads 4 \
--customer-name Initech---Riviting_Data

Note the --number-of-threads flag. By default, the probe uses all CPU cores in the system; this flag can be used to throttle performance and reduce the potential impact on the scanned filesystem.

Other Probe Flags

While the probe is running and after completion, telemetry logs are automatically uploaded to HPE. To prevent this, add the following flag:

--dont-send-logs \

If you wish to send file names with the default telemetry logs, add the following flag:

--send-logs-with-file-names \

Probing filesystems that contain snapshots can often cause recursion issues and inaccurate results. For this reason, the probe automatically ignores directories named .snapshot. If your filesystem uses another naming convention, use the --regexp-filter option. If for some reason you want the probe to read the .snapshot directories, specify false rather than true for --filter-snapshots.

--filter-snapshots \    (this is the default)

Under most circumstances the probe should be run with adaptive chunking. However you can disable that feature by specifying this flag:

--disable-adaptive-chunking \

Understanding the Results

Once started, the probe displays the current projection of the potential data reduction. Once complete, the probe displays its final output, which is further described in Understanding Output.

Re-Running The Probe

The hash database must be empty before running the probe again:

sudo rm -r /mnt/probe/db/*

Troubleshooting

Refer to the Troubleshooting document and contact HPE Support.