
Dieharder Test Protocol

How Eormen tests 1 GiB entropy blocks using the Dieharder statistical test suite, including the full methodology, test selection rationale, and technical implementation details.

This documentation is published by Eormen and reproduced here in full. Eormen generates and certifies entropy blocks independently. ScirDom publishes this documentation so that anyone can understand the testing that was done before an entropy block was activated.

Overview

This document describes how the Dieharder randomness test suite is applied to 1 GiB blocks of entropy data generated by the Eormen entropy system, and how the results should be read. The tests provide rigorous statistical validation of the randomness quality of the entropy blocks using one of the most respected statistical test suites in the field.

What is Dieharder Testing?

The Dieharder test suite, developed by Robert G. Brown at Duke University, represents a comprehensive collection of statistical tests for evaluating random number generators and entropy sources. Building upon the original Diehard tests created by George Marsaglia, Dieharder provides a modernised and extended test battery that examines multiple aspects of randomness through sophisticated statistical analysis.

For cryptographic and security applications, rigorous randomness validation is essential. Any patterns, correlations, or biases in entropy data can create vulnerabilities that compromise system security. The Dieharder test suite provides statistical evidence that the entropy data exhibits the properties expected from truly random sequences.

The suite encompasses 31 different statistical tests (as enumerated in the Complete Test Suite Inventory), each designed to detect specific types of non-random patterns that might exist in data. These tests examine everything from basic frequency distributions to complex mathematical relationships, spatial correlations, and information-theoretic properties.

Software Provenance

The Dieharder software used in this validation was obtained from the official distribution maintained by Robert G. Brown at Duke University (webhome.phy.duke.edu/~rgb/General/dieharder.php). This ensures Eormen utilised the authoritative version of the test suite, maintaining the highest standards of software integrity and academic credibility.

Detail	Value
Source	Duke University Physics Department
Maintainer	Robert G. Brown
Package File	dieharder-2.24.7-1.i386.rpm
Installed Version	3.31.1 (as reported by the software)
Official URL	webhome.phy.duke.edu/~rgb/General/dieharder.php

The package filename reflects historical versioning whilst the installed software reports as version 3.31.1. This is the version used for all testing documented here. Using the official distribution guarantees that the test implementation follows the exact mathematical specifications and methodologies established by the original developers.

EORM Block Structure

EORM entropy blocks consist of two components:

Entropy Data: 1,073,741,824 bytes (exactly 1 GiB) of random data
Metadata: 64 bytes containing generation information
Total File Size: 1,073,741,888 bytes

The Dieharder tests are applied only to the entropy data portion, excluding the metadata. This ensures that the randomness validation focuses purely on the generated entropy without influence from structured metadata.

Metadata field	Byte offset	Size	Description
Nonce	0–15	16 bytes	Unique identifier for this generation session
Timestamp	16–23	8 bytes (little-endian)	Generation time
Filename	24–55	32 bytes (UTF-8)	Original filename
File size	56–63	8 bytes (little-endian)	Total file size
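The metadata layout above can be read with a fixed-size binary format. The following is a minimal sketch, not Eormen's implementation: it assumes the two 8-byte fields are unsigned integers and that the filename is padded with NUL bytes, neither of which is stated in the table. The names `EormMetadata` and `parse_metadata` are hypothetical, and the position of the 64 bytes within the file is left to the caller.

```python
import struct
from dataclasses import dataclass

# 16-byte nonce, 8-byte little-endian integer, 32-byte filename, 8-byte little-endian integer.
# "<" selects little-endian with no padding, so the total is exactly 64 bytes.
METADATA_FORMAT = "<16sQ32sQ"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)


@dataclass
class EormMetadata:
    nonce: bytes
    timestamp: int
    filename: str
    file_size: int


def parse_metadata(raw):
    """Decode a 64-byte metadata record laid out as in the table above."""
    if len(raw) != METADATA_SIZE:
        raise ValueError("expected %d bytes, got %d" % (METADATA_SIZE, len(raw)))
    nonce, timestamp, name, file_size = struct.unpack(METADATA_FORMAT, raw)
    # Assumption: unused filename bytes are NUL padding.
    filename = name.rstrip(b"\x00").decode("utf-8")
    return EormMetadata(nonce, timestamp, filename, file_size)
```

A parsed record can then be cross-checked against the block itself, for example by comparing `file_size` with the expected total of 1,073,741,888 bytes.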

The 1 GiB Testing Challenge

Standard Dieharder implementations are typically designed for smaller data files or continuous streams from random number generators. Testing 1 GiB blocks (containing over 8.5 billion bits) presents unique challenges that required careful consideration and systematic solutions.

The Core Problem

Many Dieharder tests were designed assuming either unlimited data streams or much smaller finite files. When applied to a 1 GiB file, some tests require more data than the file contains. This leads to three problems: excessive file rewinding; data contamination, because the same data is reused for multiple statistical measurements; and compromised test independence, because tests that share data become statistically related.

File Rewinding Impact

When a test requires more data than is available in the file, Dieharder automatically rewinds to the beginning and continues reading. Whilst this allows tests to complete, it introduces problems: tests assume independent data sources; patterns from earlier in the file may influence later measurements; and results may not accurately reflect randomness properties.
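The rewinding behaviour described above comes down to simple arithmetic on the block size. The sketch below is illustrative only: `rewinds_needed` is a hypothetical helper, and the amount of data a given test requires is an input you would have to measure, not a figure taken from Dieharder.

```python
BLOCK_BYTES = 1 << 30          # 1 GiB of entropy data
BLOCK_BITS = BLOCK_BYTES * 8   # 8,589,934,592 bits


def rewinds_needed(bytes_required, bytes_available=BLOCK_BYTES):
    """Number of times the file must be rewound to supply bytes_required."""
    if bytes_required <= 0:
        return 0
    # The first pass over the file is free; each further pass is one rewind.
    return (bytes_required - 1) // bytes_available
```

A test that needs exactly 1 GiB completes without rewinding, whilst one that needs a single byte more already reuses data from the start of the file.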

The Scale Challenge

A 1 GiB entropy block represents a substantial amount of data, but some tests in the complete Dieharder suite were designed for continuous streams or much larger datasets. These data-hungry tests can easily consume multiple gigabytes of data when run with default parameters.

Test Selection Methodology

To address the 1 GiB testing challenge whilst maintaining statistical rigour, Eormen developed a systematic approach based on clear, defensible criteria. The methodology prioritises official test reliability ratings whilst ensuring practical feasibility for finite data files.

Primary Criterion: Official Reliability Ratings

The Dieharder developers have assigned reliability ratings to each test based on extensive research and validation:

“Good”: Tests that are statistically sound and provide reliable randomness assessment.
“Suspect”: Tests with known issues or questionable statistical validity.
“Do Not Use”: Tests that are fundamentally flawed or produce unreliable results.

Eormen's methodology gives absolute priority to these official ratings, as they represent the accumulated expertise of the academic randomness testing community.

Secondary Criterion: Data Efficiency

Among the “Good”-rated tests, each test's data consumption patterns were evaluated to identify those suitable for 1 GiB files, considering data requirements, rewinding behaviour, and whether tests provide unique statistical coverage or duplicate existing analysis.

Optimisation Philosophy

Include all suitable “Good” tests: No arbitrary exclusions based on convenience.
Exclude only when necessary: Clear justification for any exclusion.
Maintain statistical independence: Preserve the validity of individual tests.
Transparent decision-making: Document all rationale for reproducibility.

Test Selection Rationale

Included: Core Diehard Tests (Tests 0–4, 8–13, 15–17)

All of these tests are rated “Good” by the Dieharder developers, are designed for finite data sources, and complete within a 1 GiB block with at most minimal rewinding. They form the backbone of the validation suite, examining collision patterns, matrix rank properties, bit-level pattern analysis, spatial distribution, information-theoretic measures, sequential pattern analysis, and number-theoretic relationships.

Excluded: Suspect Tests (Tests 5–7)

Tests 5–7: Diehard OPSO, OQSO, and DNA tests carry an official rating of “Suspect”. The Dieharder developers have identified statistical issues with these tests that make their results unreliable. Including tests known to be problematic would compromise the integrity of the validation suite.

Excluded: Problematic Test (Test 14)

Test 14: Diehard Sums Test carries an official rating of “Do Not Use”. This test is fundamentally flawed and should never be used for randomness assessment, according to the official documentation.

Excluded: Data-Intensive “Good” Tests (Tests 100–102, 200–209)

Despite being rated “Good”, thirteen tests were excluded, most because of excessive data requirements and two (Tests 100–101) because they largely duplicate included analysis:

STS Tests (100–102): Tests 100–101 largely duplicate analysis provided by included tests. Test 102 requires excessive data, causing significant file rewinding.
RGB Tests (200–205): Created for continuous generator streams, not finite files. Extremely data-hungry, causing extensive file rewinding.
DAB Tests (206–209): Designed for much larger datasets. Would require significant file rewinding for completion.
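The selection rules in this section can be expressed as a filter over the test IDs. This is a sketch of the criteria as described, not Eormen's tooling; the variable names are hypothetical, and the ratings are those listed in this document.

```python
# Every test ID covered by this document.
ALL_TESTS = list(range(18)) + [100, 101, 102] + list(range(200, 210))

# Official ratings as listed in this document.
RATING = {t: "Good" for t in ALL_TESTS}
RATING.update({5: "Suspect", 6: "Suspect", 7: "Suspect", 14: "Do Not Use"})

# "Good" tests set aside for data consumption or duplicated analysis.
EXCLUDED_GOOD = {100, 101, 102} | set(range(200, 210))

# Primary criterion: rated "Good". Secondary criterion: suitable for a 1 GiB file.
selected = [t for t in ALL_TESTS if RATING[t] == "Good" and t not in EXCLUDED_GOOD]
```

Running the filter leaves Tests 0–4, 8–13, and 15–17, which matches the included set above.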

Parameter Optimisation

Reduced p-value samples: Changed from the default of 100 to 20 samples per test. This maintains adequate statistical power whilst reducing data consumption by 80%, significantly reducing file rewinding across all tests. The trade-off is slightly reduced statistical precision, but the results remain statistically valid.
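The 80% figure follows if a test's data consumption scales linearly with the number of p-value samples, which is the assumption made in this sketch. The per-sample cost used in the usage note is a made-up number for illustration, not a measured Dieharder value.

```python
DEFAULT_PSAMPLES = 100
REDUCED_PSAMPLES = 20


def reduction_percent(default, reduced):
    """Percentage reduction in data consumed, assuming linear scaling."""
    return 100 * (default - reduced) // default


def bytes_consumed(psamples, bytes_per_sample):
    """Total data read by one test at a given sample count (linear model)."""
    return psamples * bytes_per_sample
```

For a hypothetical test that reads 40,000,000 bytes per sample, 100 samples would need 4,000,000,000 bytes (more than the block holds), whereas 20 samples need 800,000,000 bytes and fit in a single pass.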

The 14 Selected Tests

The validation suite includes 14 carefully selected tests providing comprehensive coverage of randomness properties whilst respecting 1 GiB file constraints.

Test 0: Diehard Birthdays Test
What it measures: Collision patterns in random sequences, based on the birthday paradox.
How it works: Examines whether birthdays (represented as random integers) in groups show the expected collision frequency. The test divides data into groups and counts collisions, comparing results to theoretical expectations for random data.
Technical parameters: ntup=0
Why it matters: Collision analysis is fundamental to cryptographic applications. Non-random data often exhibits unexpected collision patterns that this test can detect.
What PASSED means: The entropy exhibits appropriate collision patterns consistent with random data, with no unexpected clustering or avoidance of collisions.

Test 1: Diehard OPERM5 Test
What it measures: Permutation patterns in sequences of 5 elements.
How it works: Analyses whether the 120 possible permutations of 5 elements occur with equal frequency in the data. Each permutation should appear with probability 1/120 in truly random sequences.
Technical parameters: ntup=0
Why it matters: Ordering bias can indicate subtle patterns in entropy generation that might not be detected by simpler frequency tests.
What PASSED means: No bias towards particular ordering patterns in the entropy data, confirming that sequential relationships appear random.
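The idea of classifying each group of 5 values into one of 120 orderings can be sketched as follows. This illustrates the description above only; it is not Dieharder's implementation, which must also account for the statistical dependence between overlapping windows. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Assign an index 0..119 to each possible ordering of 5 positions.
PERM_INDEX = {p: i for i, p in enumerate(permutations(range(5)))}


def perm_index(values):
    """Index of the ordering pattern of 5 values (positions sorted by value)."""
    if len(values) != 5:
        raise ValueError("expected exactly 5 values")
    order = tuple(sorted(range(5), key=values.__getitem__))
    return PERM_INDEX[order]


def permutation_counts(words):
    """Tally ordering patterns over every overlapping window of 5 values."""
    return Counter(perm_index(words[i:i + 5]) for i in range(len(words) - 4))
```

For random input, each of the 120 indices should appear in roughly 1/120 of the windows.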

Test 2: Diehard 32×32 Binary Rank Test
What it measures: Mathematical rank properties of 32×32 binary matrices formed from the data.
How it works: Creates matrices from consecutive bits and calculates their rank over GF(2) (Galois Field). The distribution of ranks is compared to theoretical expectations for random binary matrices.
Technical parameters: ntup=0
Why it matters: Matrix rank analysis can detect linear dependencies and structural patterns that might not be apparent in other tests.
What PASSED means: The data exhibits the expected linear algebra properties of random binary matrices, with no detectable linear dependencies.
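Rank over GF(2) can be computed with XOR-based elimination when each matrix row is held as an integer bitmask. The sketch below shows the technique under that representation; the way 128 bytes are mapped to rows (`rows_from_bytes`, big-endian, 4 bytes per row) is an assumption for illustration and may differ from how Dieharder fills its matrices.

```python
def gf2_rank(rows):
    """Rank of a binary matrix over GF(2); each row is an int bitmask."""
    basis = []  # reduced rows, each with a distinct leading bit
    for row in rows:
        for b in basis:
            # XOR with b only when that clears b's leading bit in row.
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)


def rows_from_bytes(block):
    """Split 128 bytes into 32 rows of 32 bits (assumed big-endian layout)."""
    if len(block) != 128:
        raise ValueError("a 32x32 binary matrix needs exactly 128 bytes")
    return [int.from_bytes(block[i:i + 4], "big") for i in range(0, 128, 4)]
```

A test built on this would compute the rank of many such matrices and compare the observed distribution of ranks with the theoretical one for random binary matrices.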

Test 3: Diehard 6×8 Binary Rank Test
What it measures: Rank properties of smaller 6×8 binary matrices, complementing Test 2.
How it works: Analyses rank distribution of smaller matrices for different scale validation. This provides analysis at a different resolution than the 32×32 test.
Technical parameters: ntup=0
Why it matters: Testing multiple matrix sizes ensures that linear dependencies are not missed due to scale effects.
What PASSED means: Confirms randomness properties at a different matrix scale, providing additional confidence in linear independence.

Test 4: Diehard Bitstream Test
What it measures: Overlapping bit patterns within the data stream.
How it works: Examines specific overlapping bit sequences for unexpected patterns, focusing on the frequency of particular bit combinations as they overlap.
Technical parameters: ntup=0
Why it matters: Overlapping pattern analysis can detect subtler correlations than non-overlapping pattern tests.
What PASSED means: No problematic bit-level patterns detected in the entropy, confirming appropriate bit-level randomness.

Test 8: Diehard Count the 1s (stream) Test
What it measures: Distribution of 1-bits in consecutive bit streams.
How it works: Counts 1-bits in overlapping windows of specific sizes and tests whether the distribution matches theoretical expectations for random data.
Technical parameters: ntup=0
Why it matters: Bit frequency distribution is fundamental to randomness assessment and can reveal bias in bit generation.
What PASSED means: Appropriate distribution of 1-bits throughout the data streams, confirming balanced bit generation.

Test 9: Diehard Count the 1s (byte) Test
What it measures: Distribution of 1-bits within individual bytes.
How it works: Analyses how many 1-bits appear in each byte value (0–8 ones per byte) and compares the distribution to theoretical expectations.
Technical parameters: ntup=0
Why it matters: Byte-level analysis can detect patterns that might be masked when analysing larger data blocks.
What PASSED means: Byte-level bit distribution matches random expectations, confirming appropriate entropy at the byte scale.
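For unbiased bits, the number of 1-bits in a byte follows a Binomial(8, 1/2) distribution, so the expected share of bytes with k ones is C(8, k)/256. The sketch below tallies the observed counts and gives those expected fractions; it illustrates the description above and is not the statistic Dieharder itself computes. The function names are hypothetical.

```python
from math import comb


def ones_per_byte_histogram(data):
    """hist[k] = number of bytes in data that contain exactly k one-bits."""
    hist = [0] * 9
    for byte in data:
        hist[bin(byte).count("1")] += 1
    return hist


def expected_fractions():
    """Expected share of bytes with k one-bits (k = 0..8) for unbiased bits."""
    return [comb(8, k) / 256 for k in range(9)]
```

Feeding in each of the 256 possible byte values once reproduces the binomial coefficients 1, 8, 28, 56, 70, 56, 28, 8, 1.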

Test 10: Diehard Parking Lot Test
What it measures: 2D spatial distribution patterns using a parking lot analogy.
How it works: Places “cars” (data points) randomly in a 2D space and measures parking success rates. The test simulates trying to park cars of specific sizes and counts successful placements.
Technical parameters: ntup=0
Why it matters: Spatial distribution tests can detect clustering patterns that other tests might miss.
What PASSED means: The entropy exhibits appropriate 2D spatial randomness properties, with no unexpected clustering or avoidance patterns.

Test 11: Diehard 2D Sphere Test
What it measures: Distribution of points within a 2D circular space.
How it works: Maps data to 2D coordinates within a unit circle and analyses distribution uniformity, examining whether points are appropriately distributed throughout the 2D circular space.
Technical parameters: ntup=2
Why it matters: Circular spatial analysis can detect patterns that rectangular spatial tests might miss.
What PASSED means: No unexpected clustering in 2D circular representation of the data, confirming appropriate spatial distribution.

Test 12: Diehard 3D Sphere Test
What it measures: 3D spatial distribution within a unit sphere.
How it works: Maps data points to 3D coordinates within a sphere and tests distribution uniformity, examining whether points are appropriately distributed throughout the 3D space.
Technical parameters: ntup=3
Why it matters: Higher-dimensional spatial analysis can detect patterns that might not be apparent in 2D tests.
What PASSED means: Appropriate 3D spatial distribution properties confirmed, demonstrating randomness in higher-dimensional space.

Test 13: Diehard Squeeze Test
What it measures: Compressibility and information-theoretic properties.
How it works: Uses a mathematical “squeeze” operation to test for hidden patterns that might make data compressible. The test examines how much data can be “compressed” using specific mathematical operations.
Technical parameters: ntup=0
Why it matters: Truly random data should be incompressible. Any compressibility suggests the presence of patterns.
What PASSED means: The entropy data resists compression as expected for random sequences, confirming high information content.

Test 15: Diehard Runs Test
What it measures: Consecutive identical bit patterns (runs of 0s and 1s).
How it works: Counts run lengths of consecutive identical bits and compares the distribution to theoretical expectations for random bit sequences.
Technical parameters: ntup=0
Why it matters: Run-length analysis can detect bias towards longer or shorter sequences of identical bits.
What PASSED means: Run-length patterns match those expected from random bit sequences, confirming appropriate bit transition behaviour.
Note: This test produces 2 results: one for runs of 0s and one for runs of 1s.

Test 16: Diehard Craps Test
What it measures: Complex sequence analysis using dice game simulation.
How it works: Simulates craps games using the entropy data and analyses win/loss patterns and the number of throws required to reach decisions.
Technical parameters: ntup=0
Why it matters: Game simulation tests complex sequential relationships that individual statistical tests might not detect.
What PASSED means: Complex sequential patterns behave as expected for random data, confirming sophisticated randomness properties.
Note: This test produces 2 results (wins and throws to decision). During execution, this test typically causes one file rewind.

Test 17: Marsaglia and Tsang GCD Test
What it measures: Greatest common divisor patterns in integer sequences.
How it works: Applies number theory analysis to detect mathematical patterns by examining the GCD relationships between pairs of integers derived from the data.
Technical parameters: ntup=0
Why it matters: Number-theoretic analysis can detect mathematical relationships that other statistical tests might miss.
What PASSED means: No unexpected mathematical relationships in the entropy data, confirming randomness from a number theory perspective.
Note: This test produces 2 results covering different GCD analyses.

Complete Test Suite Inventory

All 31 Dieharder tests with Eormen's inclusion and exclusion decisions: 14 included (✓ Yes) and 17 excluded (✗ No), each with its rationale.

Test ID	Test Name	Official Rating	Included	Rationale
Original Diehard Tests
0	Diehard Birthdays Test	Good	✓ Yes	Core randomness test, efficient for 1 GiB
1	Diehard OPERM5 Test	Good	✓ Yes	Permutation analysis, suitable data usage
2	Diehard 32×32 Binary Rank Test	Good	✓ Yes	Matrix rank analysis, efficient implementation
3	Diehard 6×8 Binary Rank Test	Good	✓ Yes	Complementary matrix analysis
4	Diehard Bitstream Test	Good	✓ Yes	Bit pattern analysis, reasonable data usage
5	Diehard OPSO Test	Suspect	✗ No	Official rating “Suspect”
6	Diehard OQSO Test	Suspect	✗ No	Official rating “Suspect”
7	Diehard DNA Test	Suspect	✗ No	Official rating “Suspect”
8	Diehard Count the 1s (stream) Test	Good	✓ Yes	Fundamental bit analysis, efficient
9	Diehard Count the 1s (byte) Test	Good	✓ Yes	Byte-level analysis, minimal data usage
10	Diehard Parking Lot Test	Good	✓ Yes	Spatial analysis, suitable for 1 GiB
11	Diehard 2D Sphere Test	Good	✓ Yes	2D circular spatial analysis, efficient
12	Diehard 3D Sphere Test	Good	✓ Yes	3D spatial analysis, manageable data usage
13	Diehard Squeeze Test	Good	✓ Yes	Information theory, efficient implementation
14	Diehard Sums Test	Do Not Use	✗ No	Official rating “Do Not Use”
15	Diehard Runs Test	Good	✓ Yes	Run analysis, fundamental test
16	Diehard Craps Test	Good	✓ Yes	Sequential analysis, reasonable data usage
17	Marsaglia and Tsang GCD Test	Good	✓ Yes	Number theory, efficient for 1 GiB
STS (Statistical Test Suite) Tests
100	STS Monobit Test	Good	✗ No	Duplicates analysis in included tests
101	STS Runs Test	Good	✗ No	Duplicates Test 15 analysis
102	STS Serial Test (Generalised)	Good	✗ No	Excessive data consumption for 1 GiB
RGB (Robert G. Brown) Tests
200	RGB Bit Distribution Test	Good	✗ No	Extremely data-intensive, causes rewinding
201	RGB Generalised Minimum Distance Test	Good	✗ No	Data-hungry, designed for continuous streams
202	RGB Permutations Test	Good	✗ No	Excessive data requirements
203	RGB Lagged Sum Test	Good	✗ No	Data-intensive, rewinding risk
204	RGB Kolmogorov-Smirnov Test	Good	✗ No	High data consumption
205	Byte Distribution	Good	✗ No	Data-hungry implementation
DAB (Data Analysis Battery) Tests
206	DAB DCT	Good	✗ No	Designed for larger datasets
207	DAB Fill Tree Test	Good	✗ No	Excessive data requirements
208	DAB Fill Tree 2 Test	Good	✗ No	High data consumption
209	DAB Monobit 2 Test	Good	✗ No	Data-intensive for 1 GiB files

Total available tests: 31
Tests included: 14 (52% of all “Good”-rated tests)
Individual results produced: 17 (some tests produce multiple p-values)

Technical Implementation

Parameters Used

Each test execution uses the following carefully optimised parameters:

-d [test_number]   # Specific test identifier
-g 201             # file_input_raw generator (optimal for binary files)
-f [filename]      # Path to the entropy block file
-p 20              # 20 p-value samples (balanced for statistical power and data conservation)

Generator 201 (file_input_raw)

Generator 201 is designed for binary files: it reads the entropy file as raw binary, providing direct access to the data without unnecessary transformations.

20 P-value Samples

Reduced from the default 100 to conserve data whilst maintaining adequate statistical power for reliable results. This prevents excessive file rewinding that could compromise test independence.

Individual Test Execution

Each test runs separately to provide detailed individual results and prevent interference between tests.
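Running each test separately with the parameters above amounts to building one command line per test ID. The sketch below only constructs the argument lists and does not execute anything. The path `entropy_data.bin` is a hypothetical file holding just the 1 GiB entropy data portion; how that portion is separated from the metadata is not specified here and is assumed to have been done beforehand.

```python
# Test IDs of the 14 selected tests.
SELECTED_TESTS = [0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 15, 16, 17]


def dieharder_command(test_id, data_path, psamples=20):
    """Argument list for one Dieharder run using the documented flags."""
    return [
        "dieharder",
        "-d", str(test_id),   # specific test identifier
        "-g", "201",          # file_input_raw generator
        "-f", data_path,      # path to the entropy data
        "-p", str(psamples),  # number of p-value samples
    ]


# One independent invocation per selected test.
commands = [dieharder_command(t, "entropy_data.bin") for t in SELECTED_TESTS]
```

Each list could then be passed to `subprocess.run` in turn, capturing the output of every test on its own.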

Understanding the Test Results

Result Categories

PASSED: The test found no evidence of non-random patterns; the result is consistent with excellent randomness properties.
WEAK: Borderline result, not necessarily problematic but worth noting. Occasional weak results are not necessarily cause for concern.
FAILED: Test suggests potential non-random patterns requiring investigation.

Expected Performance

Runtime: Approximately 15–30 minutes for the complete 14-test suite.
Total results: 17 individual test results (some tests produce multiple p-values).
File rewinds: Typically 1 rewind (during the diehard_craps test).
Data usage: Efficient usage of the 1 GiB block with minimal repetition.

Overall Assessment Guidelines

All PASSED: Excellent randomness properties confirmed.
Mostly PASSED with occasional WEAK: Good randomness with minor variations.
Multiple FAILED: Concerning results requiring investigation of entropy source.
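These guidelines can be written as a small decision function over the individual results. This is a sketch with one added assumption: the guidelines mention only multiple FAILED results, so treating even a single FAILED as requiring investigation is a conservative choice made here, in line with the FAILED category description. The function name is hypothetical.

```python
from collections import Counter

VALID_RESULTS = {"PASSED", "WEAK", "FAILED"}


def overall_assessment(results):
    """Summarise a list of per-test results according to the guidelines."""
    if not results:
        raise ValueError("no results to assess")
    counts = Counter(results)
    unknown = set(counts) - VALID_RESULTS
    if unknown:
        raise ValueError("unrecognised results: %s" % sorted(unknown))
    if counts["FAILED"] > 0:
        # Assumption: any FAILED result triggers investigation.
        return "Concerning results requiring investigation of entropy source"
    if counts["WEAK"] > 0:
        return "Good randomness with minor variations"
    return "Excellent randomness properties confirmed"
```

For a full run, `results` would hold the 17 individual outcomes produced by the 14 tests.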

Strengths and Limitations

✓ Systematic test selection: All practical “Good”-rated tests from the complete 31-test suite.

✓ Official compliance: Uses only tests rated “Good” by the Dieharder developers.

✓ Comprehensive coverage: 14 tests examine a broad range of fundamental randomness properties.

✓ Statistical independence: Minimal file rewinding preserves test validity.

✓ Transparent methodology: Complete disclosure of test selection rationale.

✓ Academic credibility: Official Duke University Dieharder distribution.

✗ 13 “Good” tests excluded: STS, RGB, and DAB tests omitted due to excessive data requirements or duplicated analysis.

✗ Reduced samples: 20 p-values per test (vs. 100 default) to conserve data.

✗ Finite block testing: Results are specific to the tested block, not predictive of future generations.

File Authenticity

The results file includes cryptographic verification using a three-tier hashing system (three SHA-256 hashes), recorded alongside the generation nonce and timestamp:

Nonce: Links results to the specific EORM generation session.
Data-only SHA-256: Verifies the entropy data independently. Remains constant for reproductions.
Complete file SHA-256: Verifies the entire file has not been modified.
Metadata-only SHA-256: Verifies the metadata section.
Timestamp: Unix timestamp confirming when the entropy was generated.

This three-tier approach ensures both data integrity and proper attribution to the original generation session.
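The three SHA-256 values can be recomputed in a single streaming pass over a block file. The sketch below is not Eormen's verification tool: the function name `eorm_hashes` is hypothetical, and because the position of the 64-byte metadata within the file is not stated here, it is exposed as the `metadata_first` parameter instead of being assumed.

```python
import hashlib

DATA_BYTES = 1 << 30   # entropy data portion
METADATA_BYTES = 64    # metadata portion


def eorm_hashes(path, data_bytes=DATA_BYTES, metadata_bytes=METADATA_BYTES,
                metadata_first=False, chunk_size=1 << 20):
    """Return SHA-256 hex digests for the data, the metadata, and the whole file."""
    data_h, meta_h, file_h = hashlib.sha256(), hashlib.sha256(), hashlib.sha256()
    sections = [(data_h, data_bytes), (meta_h, metadata_bytes)]
    if metadata_first:
        sections.reverse()
    with open(path, "rb") as f:
        for section_hash, remaining in sections:
            while remaining:
                block = f.read(min(chunk_size, remaining))
                if not block:
                    raise ValueError("file is shorter than expected")
                section_hash.update(block)
                file_h.update(block)  # the complete-file hash sees every byte
                remaining -= len(block)
        if f.read(1):
            raise ValueError("file is longer than expected")
    return {
        "data": data_h.hexdigest(),
        "metadata": meta_h.hexdigest(),
        "file": file_h.hexdigest(),
    }
```

Comparing the returned digests with those recorded in the results file checks the entropy data, the metadata, and the complete file independently.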