Dieharder Test Protocol
How Eormen tests 1 GiB entropy blocks using the Dieharder statistical test suite, including the full methodology, test selection rationale, and technical implementation details.
Overview
This document describes how the Dieharder randomness test suite is applied to 1 GiB blocks of entropy data generated by the Eormen entropy system. The tests provide rigorous statistical evidence about the randomness quality of each entropy block, using one of the most respected statistical test suites in the field.
What is Dieharder Testing?
The Dieharder test suite, developed by Robert G. Brown at Duke University, represents a comprehensive collection of statistical tests for evaluating random number generators and entropy sources. Building upon the original Diehard tests created by George Marsaglia, Dieharder provides a modernised and extended test battery that examines multiple aspects of randomness through sophisticated statistical analysis.
For cryptographic and security applications, rigorous randomness validation is essential. Any patterns, correlations, or biases in entropy data can create vulnerabilities that compromise system security. The Dieharder test suite provides statistical evidence that the entropy data is consistent with the properties expected from truly random sequences.
The suite encompasses 31 different statistical tests (IDs 0–17, 100–102, and 200–209 in the Complete Test Suite Inventory), each designed to detect specific types of non-random patterns that might exist in data. These tests examine everything from basic frequency distributions to complex mathematical relationships, spatial correlations, and information-theoretic properties.
Software Provenance
The Dieharder software used in this validation was obtained from the official distribution maintained by Robert G. Brown at Duke University (webhome.phy.duke.edu/~rgb/General/dieharder.php). This ensures Eormen used the authoritative version of the test suite rather than a third-party repackaging.
| Detail | Value |
|---|---|
| Source | Duke University Physics Department |
| Maintainer | Robert G. Brown |
| Package File | dieharder-2.24.7-1.i386.rpm |
| Installed Version | 3.31.1 (as reported by the software) |
| Official URL | webhome.phy.duke.edu/~rgb/General/dieharder.php |
The package filename carries an older version string (2.24.7), whilst the installed software reports itself as version 3.31.1. This is the version used for all testing documented here. Using the official distribution means the test implementation is the one published by the original developers, without third-party modifications.
EORM Block Structure
EORM entropy blocks consist of two components:
- Entropy Data: 1,073,741,824 bytes (exactly 1 GiB) of random data
- Metadata: 64 bytes containing generation information
Together these give a total file size of 1,073,741,888 bytes.
The Dieharder tests are applied only to the entropy data portion, excluding the metadata. This ensures that the randomness validation focuses purely on the generated entropy without influence from structured metadata.
| Metadata field | Byte offset | Size | Description |
|---|---|---|---|
| Nonce | 0–15 | 16 bytes | Unique identifier for this generation session |
| Timestamp | 16–23 | 8 bytes (little-endian) | Generation time |
| Filename | 24–55 | 32 bytes (UTF-8) | Original filename |
| File size | 56–63 | 8 bytes (little-endian) | Total file size |
The 1 GiB Testing Challenge
Standard Dieharder implementations are typically designed for smaller data files or continuous streams from random number generators. Testing 1 GiB blocks (containing over 8.5 billion bits) presents unique challenges that required careful consideration and systematic solutions.
The Core Problem
Many Dieharder tests were designed assuming either unlimited data streams or much smaller finite files. When applied to a 1 GiB file, some tests require more data than the file contains, which leads to three problems: excessive file rewinding; data contamination, because the same data is reused for multiple statistical measurements; and compromised test independence, because tests that share data become statistically related.
File Rewinding Impact
When a test requires more data than is available in the file, Dieharder automatically rewinds to the beginning and continues reading. Whilst this allows tests to complete, it introduces problems: tests assume independent data sources; patterns from earlier in the file may influence later measurements; and results may not accurately reflect randomness properties.
The Scale Challenge
A 1 GiB entropy block represents a substantial amount of data, but some tests in the complete Dieharder suite were designed for continuous streams or much larger datasets. These data-hungry tests can easily consume multiple gigabytes of data when run with default parameters.
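The scale involved can be made concrete with a little arithmetic. The sketch below computes how many bits and 32-bit words a 1 GiB block holds, and how many rewinds a given data demand would trigger. The helper `rewinds_needed` is a hypothetical illustration that assumes a rewind happens each time the end of the file is reached; it does not use measured Dieharder figures.

```python
BLOCK_BYTES = 2**30               # 1 GiB of entropy data
BLOCK_BITS = BLOCK_BYTES * 8      # 8,589,934,592 bits
BLOCK_WORDS = BLOCK_BYTES // 4    # 268,435,456 32-bit words


def rewinds_needed(bytes_required: int, file_bytes: int = BLOCK_BYTES) -> int:
    """Number of times a reader must wrap back to the start of the file."""
    if bytes_required <= 0:
        return 0
    # Ceiling division gives the number of passes over the file;
    # the first pass is not a rewind.
    passes = -(-bytes_required // file_bytes)
    return passes - 1
```

Under this model, a test that needs exactly 1 GiB causes no rewinds, whilst a hypothetical test needing 3 GiB would rewind twice and reuse every byte three times.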
Test Selection Methodology
To address the 1 GiB testing challenge whilst maintaining statistical rigour, Eormen developed a systematic approach based on clear, defensible criteria. The methodology prioritises official test reliability ratings whilst ensuring practical feasibility for finite data files.
Primary Criterion: Official Reliability Ratings
The Dieharder developers have assigned reliability ratings to each test based on extensive research and validation:
- “Good”: Tests that are statistically sound and provide reliable randomness assessment.
- “Suspect”: Tests with known issues or questionable statistical validity.
- “Do Not Use”: Tests that are fundamentally flawed or produce unreliable results.
Eormen's methodology gives absolute priority to these official ratings, as they represent the accumulated expertise of the academic randomness testing community.
Secondary Criterion: Data Efficiency
Among the “Good”-rated tests, each test's data consumption patterns were evaluated to identify those suitable for 1 GiB files, considering data requirements, rewinding behaviour, and whether tests provide unique statistical coverage or duplicate existing analysis.
Optimisation Philosophy
- Include all suitable “Good” tests: No arbitrary exclusions based on convenience.
- Exclude only when necessary: Clear justification for any exclusion.
- Maintain statistical independence: Preserve the validity of individual tests.
- Transparent decision-making: Document all rationale for reproducibility.
Test Selection Rationale
Included: Core Diehard Tests (Tests 0–4, 8–13, 15–17)
All of these tests are rated “Good” by the Dieharder developers, are designed for finite data sources, and complete within the constraints of a 1 GiB block. They form the backbone of the validation suite, examining collision patterns, matrix rank properties, bit-level pattern analysis, spatial distribution, information-theoretic measures, sequential pattern analysis, and number-theoretic relationships.
Excluded: Suspect Tests (Tests 5–7)
Tests 5–7: Diehard OPSO, OQSO, and DNA tests carry an official rating of “Suspect”. The Dieharder developers have identified statistical issues with these tests that make their results unreliable. Including tests known to be problematic would compromise the integrity of the validation suite.
Excluded: Problematic Test (Test 14)
Test 14: Diehard Sums Test carries an official rating of “Do Not Use”. This test is fundamentally flawed and should never be used for randomness assessment, according to the official documentation.
Excluded: Data-Intensive “Good” Tests (Tests 100–102, 200–209)
Despite being rated “Good”, thirteen tests were excluded because of excessive data requirements or duplicated coverage:
- STS Tests (100–102): Tests 100–101 largely duplicate analysis provided by included tests. Test 102 requires excessive data, causing significant file rewinding.
- RGB Tests (200–205): Created for continuous generator streams, not finite files. Extremely data-hungry, causing extensive file rewinding.
- DAB Tests (206–209): Designed for much larger datasets. Would require significant file rewinding for completion.
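The selection criteria can be expressed as a simple filter. The sketch below encodes the ratings and exclusions stated in this document and derives the included set from them. The data structures and the name `select_tests` are illustrative assumptions, not Eormen's actual tooling.

```python
# Test IDs as listed in this document.
ALL_TESTS = list(range(0, 18)) + list(range(100, 103)) + list(range(200, 210))

SUSPECT = {5, 6, 7}        # OPSO, OQSO, DNA
DO_NOT_USE = {14}          # Diehard Sums
# "Good" tests excluded for data consumption or duplicated coverage.
UNSUITABLE_FOR_1GIB = set(range(100, 103)) | set(range(200, 210))


def rating(test_id: int) -> str:
    """Official rating as reported in this document."""
    if test_id in SUSPECT:
        return "Suspect"
    if test_id in DO_NOT_USE:
        return "Do Not Use"
    return "Good"


def select_tests(test_ids=ALL_TESTS) -> list:
    """Primary criterion: rating is Good. Secondary: suitable for 1 GiB."""
    return [
        t for t in test_ids
        if rating(t) == "Good" and t not in UNSUITABLE_FOR_1GIB
    ]
```

Calling `select_tests()` yields the fourteen IDs 0–4, 8–13, and 15–17.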
Parameter Optimisation
Reduced p-value samples: The number of p-value samples per test was changed from the default of 100 to 20. This reduces data consumption by 80% whilst maintaining adequate statistical power, and significantly reduces file rewinding across all tests. The trade-off is slightly reduced statistical precision, but the results remain statistically valid.
The 14 Selected Tests
The validation suite includes 14 carefully selected tests providing comprehensive coverage of randomness properties whilst respecting 1 GiB file constraints.
Test 0: Diehard Birthdays Test
- What it measures: Collision patterns in random sequences, based on the birthday paradox.
- How it works: Examines whether birthdays (represented as random integers) in groups show the expected collision frequency. The test divides data into groups and counts collisions, comparing results to theoretical expectations for random data.
- Technical parameters: ntup=0
- Why it matters: Collision analysis is fundamental to cryptographic applications. Non-random data often exhibits unexpected collision patterns that this test can detect.
- What PASSED means: The entropy exhibits appropriate collision patterns consistent with random data, with no unexpected clustering or avoidance of collisions.
Test 1: Diehard OPERM5 Test
- What it measures: Permutation patterns in sequences of 5 elements.
- How it works: Analyses whether the 120 possible permutations of 5 elements occur with equal frequency in the data. Each permutation should appear with probability 1/120 in truly random sequences.
- Technical parameters: ntup=0
- Why it matters: Ordering bias can indicate subtle patterns in entropy generation that might not be detected by simpler frequency tests.
- What PASSED means: No bias towards particular ordering patterns in the entropy data, confirming that sequential relationships appear random.
Test 2: Diehard 32×32 Binary Rank Test
- What it measures: Mathematical rank properties of 32×32 binary matrices formed from the data.
- How it works: Creates matrices from consecutive bits and calculates their rank over GF(2) (Galois Field). The distribution of ranks is compared to theoretical expectations for random binary matrices.
- Technical parameters: ntup=0
- Why it matters: Matrix rank analysis can detect linear dependencies and structural patterns that might not be apparent in other tests.
- What PASSED means: The data exhibits the expected linear algebra properties of random binary matrices, with no detectable linear dependencies.
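To illustrate what "rank over GF(2)" means, the sketch below computes the rank of a binary matrix by Gaussian elimination using XOR, with each row stored as an integer bitmask. It illustrates the concept only and is not Dieharder's implementation; the function names and the little-endian packing of 128 bytes into 32 rows are assumptions made for the example.

```python
import struct


def gf2_rank(rows) -> int:
    """Rank of a binary matrix over GF(2); each row is an int bitmask."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit acts as the pivot column
            # Addition in GF(2) is XOR: clear the pivot column elsewhere.
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def rank_32x32(block: bytes) -> int:
    """Treat 128 bytes as a 32x32 bit matrix (32 little-endian words)."""
    if len(block) != 128:
        raise ValueError("a 32x32 bit matrix needs exactly 128 bytes")
    return gf2_rank(struct.unpack("<32I", block))
```

A rank test would compute this for many consecutive 128-byte chunks and compare the observed distribution of ranks with the distribution expected for random binary matrices.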
Test 3: Diehard 6×8 Binary Rank Test
- What it measures: Rank properties of smaller 6×8 binary matrices, complementing Test 2.
- How it works: Analyses rank distribution of smaller matrices for different scale validation. This provides analysis at a different resolution than the 32×32 test.
- Technical parameters: ntup=0
- Why it matters: Testing multiple matrix sizes ensures that linear dependencies are not missed due to scale effects.
- What PASSED means: Confirms randomness properties at a different matrix scale, providing additional confidence in linear independence.
Test 4: Diehard Bitstream Test
- What it measures: Overlapping bit patterns within the data stream.
- How it works: Examines specific overlapping bit sequences for unexpected patterns, focusing on the frequency of particular bit combinations as they overlap.
- Technical parameters: ntup=0
- Why it matters: Overlapping pattern analysis can detect subtler correlations than non-overlapping pattern tests.
- What PASSED means: No problematic bit-level patterns detected in the entropy, confirming appropriate bit-level randomness.
Test 8: Diehard Count the 1s (stream) Test
- What it measures: Distribution of 1-bits in consecutive bit streams.
- How it works: Counts 1-bits in overlapping windows of specific sizes and tests whether the distribution matches theoretical expectations for random data.
- Technical parameters: ntup=0
- Why it matters: Bit frequency distribution is fundamental to randomness assessment and can reveal bias in bit generation.
- What PASSED means: Appropriate distribution of 1-bits throughout the data streams, confirming balanced bit generation.
Test 9: Diehard Count the 1s Test (byte)
- What it measures: Distribution of 1-bits within individual bytes.
- How it works: Analyses how many 1-bits appear in each byte value (0–8 ones per byte) and compares the distribution to theoretical expectations.
- Technical parameters: ntup=0
- Why it matters: Byte-level analysis can detect patterns that might be masked when analysing larger data blocks.
- What PASSED means: Byte-level bit distribution matches random expectations, confirming appropriate entropy at the byte scale.
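For unbiased, independent bits, the number of 1-bits in a byte follows a binomial distribution, so a byte contains exactly k ones with probability C(8, k)/256. The sketch below tallies the observed counts and computes a chi-square statistic against that expectation. It is a simplified illustration of the idea described above, not Dieharder's algorithm, and the function names are hypothetical.

```python
from math import comb

# Probability that a uniformly random byte contains exactly k one-bits.
EXPECTED = [comb(8, k) / 256 for k in range(9)]


def popcount_histogram(data: bytes) -> list:
    """counts[k] = number of bytes in data with exactly k one-bits."""
    counts = [0] * 9
    for byte in data:
        counts[bin(byte).count("1")] += 1
    return counts


def chi_square(data: bytes) -> float:
    """Chi-square statistic of observed popcounts vs. the binomial model."""
    counts = popcount_histogram(data)
    n = len(data)
    return sum(
        (observed - n * p) ** 2 / (n * p)
        for observed, p in zip(counts, EXPECTED)
    )
```

A large statistic relative to a chi-square distribution with 8 degrees of freedom would indicate that bytes carry too many or too few 1-bits.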
Test 10: Diehard Parking Lot Test
- What it measures: 2D spatial distribution patterns using a parking lot analogy.
- How it works: Places “cars” (data points) randomly in a 2D space and measures parking success rates. The test simulates trying to park cars of specific sizes and counts successful placements.
- Technical parameters: ntup=0
- Why it matters: Spatial distribution tests can detect clustering patterns that other tests might miss.
- What PASSED means: The entropy exhibits appropriate 2D spatial randomness properties, with no unexpected clustering or avoidance patterns.
Test 11: Diehard 2D Sphere Test
- What it measures: Distribution of points within a 2D circular space.
- How it works: Maps data to 2D coordinates within a unit circle and analyses distribution uniformity, examining whether points are appropriately distributed throughout the 2D circular space.
- Technical parameters: ntup=2
- Why it matters: Circular spatial analysis can detect patterns that rectangular spatial tests might miss.
- What PASSED means: No unexpected clustering in 2D circular representation of the data, confirming appropriate spatial distribution.
Test 12: Diehard 3D Sphere Test
- What it measures: 3D spatial distribution within a unit sphere.
- How it works: Maps data points to 3D coordinates within a sphere and tests distribution uniformity, examining whether points are appropriately distributed throughout the 3D space.
- Technical parameters: ntup=3
- Why it matters: Higher-dimensional spatial analysis can detect patterns that might not be apparent in 2D tests.
- What PASSED means: Appropriate 3D spatial distribution properties confirmed, demonstrating randomness in higher-dimensional space.
Test 13: Diehard Squeeze Test
- What it measures: Compressibility and information-theoretic properties.
- How it works: Uses a mathematical “squeeze” operation to test for hidden patterns that might make data compressible. The test examines how much data can be “compressed” using specific mathematical operations.
- Technical parameters: ntup=0
- Why it matters: Truly random data should be incompressible. Any compressibility suggests the presence of patterns.
- What PASSED means: The entropy data resists compression as expected for random sequences, confirming high information content.
Test 15: Diehard Runs Test
- What it measures: Consecutive identical bit patterns (runs of 0s and 1s).
- How it works: Counts run lengths of consecutive identical bits and compares the distribution to theoretical expectations for random bit sequences.
- Technical parameters: ntup=0
- Why it matters: Run-length analysis can detect bias towards longer or shorter sequences of identical bits.
- What PASSED means: Run-length patterns match those expected from random bit sequences, confirming appropriate bit transition behaviour.
- Note: This test produces 2 results: one for runs of 0s and one for runs of 1s.
Test 16: Diehard Craps Test
- What it measures: Complex sequence analysis using dice game simulation.
- How it works: Simulates craps games using the entropy data and analyses win/loss patterns and the number of throws required to reach decisions.
- Technical parameters: ntup=0
- Why it matters: Game simulation tests complex sequential relationships that individual statistical tests might not detect.
- What PASSED means: Complex sequential patterns behave as expected for random data, confirming sophisticated randomness properties.
- Note: This test produces 2 results (wins and throws to decision). During execution, this test typically causes one file rewind.
Test 17: Marsaglia and Tsang GCD Test
- What it measures: Greatest common divisor patterns in integer sequences.
- How it works: Applies number theory analysis to detect mathematical patterns by examining the GCD relationships between pairs of integers derived from the data.
- Technical parameters: ntup=0
- Why it matters: Number-theoretic analysis can detect mathematical relationships that other statistical tests might miss.
- What PASSED means: No unexpected mathematical relationships in the entropy data, confirming randomness from a number theory perspective.
- Note: This test produces 2 results covering different GCD analyses.
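As a rough illustration of the quantities a GCD-based test can examine, the sketch below runs Euclid's algorithm on a pair of integers and returns both the GCD and the number of division steps taken. It also gives the classical probability 6/(π²k²) that two random integers have GCD equal to k. This document does not say which two analyses the test reports, so treating them as "GCD value" and "step count" is an assumption, and the function names are hypothetical.

```python
from math import pi


def gcd_with_steps(u: int, v: int):
    """Euclid's algorithm, returning (gcd, number of division steps)."""
    steps = 0
    while v:
        u, v = v, u % v
        steps += 1
    return u, steps


def expected_gcd_probability(k: int) -> float:
    """Limiting probability that two random integers have GCD exactly k."""
    return 6.0 / (pi ** 2 * k ** 2)
```

For example, about 60.8% of random integer pairs are expected to be coprime (k = 1); a marked departure from such frequencies would point to arithmetic structure in the data.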
Complete Test Suite Inventory
All 31 Dieharder tests listed below, with Eormen's inclusion and exclusion decisions: 14 included (marked ✓) and 17 excluded (marked ✗), each with its rationale.
| Test ID | Test Name | Official Rating | Included | Rationale |
|---|---|---|---|---|
| Original Diehard Tests | ||||
| 0 | Diehard Birthdays Test | Good | ✓ Yes | Core randomness test, efficient for 1 GiB |
| 1 | Diehard OPERM5 Test | Good | ✓ Yes | Permutation analysis, suitable data usage |
| 2 | Diehard 32×32 Binary Rank Test | Good | ✓ Yes | Matrix rank analysis, efficient implementation |
| 3 | Diehard 6×8 Binary Rank Test | Good | ✓ Yes | Complementary matrix analysis |
| 4 | Diehard Bitstream Test | Good | ✓ Yes | Bit pattern analysis, reasonable data usage |
| 5 | Diehard OPSO Test | Suspect | ✗ No | Official rating “Suspect” |
| 6 | Diehard OQSO Test | Suspect | ✗ No | Official rating “Suspect” |
| 7 | Diehard DNA Test | Suspect | ✗ No | Official rating “Suspect” |
| 8 | Diehard Count the 1s (stream) Test | Good | ✓ Yes | Fundamental bit analysis, efficient |
| 9 | Diehard Count the 1s Test (byte) | Good | ✓ Yes | Byte-level analysis, minimal data usage |
| 10 | Diehard Parking Lot Test | Good | ✓ Yes | Spatial analysis, suitable for 1 GiB |
| 11 | Diehard 2D Sphere Test | Good | ✓ Yes | 2D circular spatial analysis, efficient |
| 12 | Diehard 3D Sphere Test | Good | ✓ Yes | 3D spatial analysis, manageable data usage |
| 13 | Diehard Squeeze Test | Good | ✓ Yes | Information theory, efficient implementation |
| 14 | Diehard Sums Test | Do Not Use | ✗ No | Official rating “Do Not Use” |
| 15 | Diehard Runs Test | Good | ✓ Yes | Run analysis, fundamental test |
| 16 | Diehard Craps Test | Good | ✓ Yes | Sequential analysis, reasonable data usage |
| 17 | Marsaglia and Tsang GCD Test | Good | ✓ Yes | Number theory, efficient for 1 GiB |
| STS (Statistical Test Suite) Tests | ||||
| 100 | STS Monobit Test | Good | ✗ No | Duplicates analysis in included tests |
| 101 | STS Runs Test | Good | ✗ No | Duplicates Test 15 analysis |
| 102 | STS Serial Test (Generalised) | Good | ✗ No | Excessive data consumption for 1 GiB |
| RGB (Robert G. Brown) Tests | ||||
| 200 | RGB Bit Distribution Test | Good | ✗ No | Extremely data-intensive, causes rewinding |
| 201 | RGB Generalised Minimum Distance Test | Good | ✗ No | Data-hungry, designed for continuous streams |
| 202 | RGB Permutations Test | Good | ✗ No | Excessive data requirements |
| 203 | RGB Lagged Sum Test | Good | ✗ No | Data-intensive, rewinding risk |
| 204 | RGB Kolmogorov-Smirnov Test | Good | ✗ No | High data consumption |
| 205 | Byte Distribution | Good | ✗ No | Data-hungry implementation |
| DAB (David A. Bauer) Tests | ||||
| 206 | DAB DCT | Good | ✗ No | Designed for larger datasets |
| 207 | DAB Fill Tree Test | Good | ✗ No | Excessive data requirements |
| 208 | DAB Fill Tree 2 Test | Good | ✗ No | High data consumption |
| 209 | DAB Monobit 2 Test | Good | ✗ No | Data-intensive for 1 GiB files |
- Total available tests: 31
- Tests included: 14
- Individual results produced: 17
Technical Implementation
Parameters Used
Each test execution uses the following carefully optimised parameters:
Generator 201 (file_input_raw)
Specifically designed for binary entropy files, providing direct access to raw data without unnecessary transformations.
20 P-value Samples
Reduced from the default 100 to conserve data whilst maintaining adequate statistical power for reliable results. This prevents excessive file rewinding that could compromise test independence.
Individual Test Execution
Each test runs separately to provide detailed individual results and prevent interference between tests.
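The parameters above map onto Dieharder's command-line options `-g` (generator), `-f` (input file), `-d` (test ID), and `-p` (p-value samples). The sketch below builds one command per selected test and runs each separately. It assumes the `dieharder` binary is on the PATH and that `data_path` (a hypothetical name) points to a file containing only the 1 GiB entropy data portion; the exact command lines used by Eormen are not given in this document.

```python
import subprocess

SELECTED_TESTS = [0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 15, 16, 17]
GENERATOR_FILE_INPUT_RAW = 201   # -g 201: read raw binary from a file
P_SAMPLES = 20                   # -p 20: reduced from the default of 100


def build_command(data_path: str, test_id: int) -> list:
    """Command line for a single Dieharder test on a raw binary file."""
    return [
        "dieharder",
        "-g", str(GENERATOR_FILE_INPUT_RAW),
        "-f", data_path,
        "-d", str(test_id),
        "-p", str(P_SAMPLES),
    ]


def run_suite(data_path: str) -> dict:
    """Run each selected test in its own process; return stdout per test ID."""
    outputs = {}
    for test_id in SELECTED_TESTS:
        completed = subprocess.run(
            build_command(data_path, test_id),
            capture_output=True, text=True, check=True,
        )
        outputs[test_id] = completed.stdout
    return outputs
```

Running one process per test keeps each report separate and starts every test from the beginning of the file.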
Understanding the Test Results
Result Categories
- PASSED: The test found no statistical evidence of non-random patterns.
- WEAK: Borderline result, worth noting but not necessarily problematic. Occasional WEAK results can occur by chance even with good entropy.
- FAILED: The test suggests potential non-random patterns requiring investigation.
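The sketch below shows how a p-value might be mapped to these categories, and why an occasional WEAK result is unsurprising across 17 results. The thresholds (WEAK when the p-value lies within 0.005 of either 0 or 1, FAILED when within 0.000001) are assumptions based on what are believed to be Dieharder's defaults; they are not stated in this document and should be checked against the installed version. The probability estimate also treats the 17 results as independent.

```python
WEAK_THRESHOLD = 0.005   # assumption: default WEAK cut-off
FAIL_THRESHOLD = 1e-6    # assumption: default FAILED cut-off


def classify(p: float, weak: float = WEAK_THRESHOLD,
             fail: float = FAIL_THRESHOLD) -> str:
    """Two-sided assessment: p-values too close to 0 or to 1 are suspicious."""
    tail = min(p, 1.0 - p)
    if tail < fail:
        return "FAILED"
    if tail < weak:
        return "WEAK"
    return "PASSED"


def prob_at_least_one_weak(n_results: int = 17,
                           weak: float = WEAK_THRESHOLD) -> float:
    """Chance that ideal random data yields at least one non-PASSED result."""
    per_result = 2 * weak  # both tails of a uniform p-value
    return 1.0 - (1.0 - per_result) ** n_results
```

Under these assumptions, a perfect entropy source would still show at least one WEAK result in roughly one run in six.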
Expected Performance
- Runtime: Approximately 15–30 minutes for the complete 14-test suite.
- Total results: 17 individual test results (some tests produce multiple p-values).
- File rewinds: Typically 1 rewind (during the diehard_craps test).
- Data usage: Efficient usage of the 1 GiB block with minimal repetition.
Overall Assessment Guidelines
- All PASSED: Excellent randomness properties confirmed.
- Mostly PASSED with occasional WEAK: Good randomness with minor variations.
- Multiple FAILED: Concerning results requiring investigation of entropy source.
Strengths and Limitations
File Authenticity
The results file includes cryptographic verification using a three-tier hashing system (three SHA-256 hashes), recorded alongside the nonce and timestamp of the generation session:
- Nonce: Links results to the specific EORM generation session.
- Data-only SHA-256: Verifies the entropy data independently. Remains constant for reproductions.
- Complete file SHA-256: Verifies the entire file has not been modified.
- Metadata-only SHA-256: Verifies the metadata section.
- Timestamp: Unix timestamp confirming when the entropy was generated.
This three-tier approach ensures both data integrity and proper attribution to the original generation session.
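A minimal sketch of how the three SHA-256 hashes could be computed in a single streaming pass is shown below. The function name `three_tier_hashes` is hypothetical, and the position of the metadata within the file is an assumption controlled by the `metadata_first` parameter, since this document does not state whether the metadata precedes or follows the entropy data.

```python
import hashlib

METADATA_SIZE = 64


def three_tier_hashes(path: str, data_size: int = 2**30,
                      metadata_first: bool = False,
                      chunk: int = 1 << 20) -> dict:
    """Return hex SHA-256 digests of the data, the metadata, and the file."""
    data_h = hashlib.sha256()
    meta_h = hashlib.sha256()
    full_h = hashlib.sha256()
    # Assumption: the file is exactly two contiguous sections.
    if metadata_first:
        sections = [(meta_h, METADATA_SIZE), (data_h, data_size)]
    else:
        sections = [(data_h, data_size), (meta_h, METADATA_SIZE)]
    with open(path, "rb") as f:
        for hasher, remaining in sections:
            while remaining > 0:
                block = f.read(min(chunk, remaining))
                if not block:
                    raise ValueError("file is shorter than expected")
                hasher.update(block)   # section-specific hash
                full_h.update(block)   # complete-file hash
                remaining -= len(block)
        if f.read(1):
            raise ValueError("file is longer than expected")
    return {
        "data": data_h.hexdigest(),
        "metadata": meta_h.hexdigest(),
        "complete": full_h.hexdigest(),
    }
```

Reading in 1 MiB chunks keeps memory use small for a 1 GiB block, and a reproduction would compare the `data` digest with the published data-only hash.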