Eormen Full-Block Validation Protocol
v6.0.0
How Eormen’s internal validation suite tests every single byte of a 1 GiB entropy block: 8 statistical categories, zero sampling, with full methodology and scoring criteria.
Overview
The Eormen Internal Validation Test Suite performs exhaustive statistical validation of the 1 GiB entropy blocks generated by the Eormen entropy generation system. It processes every byte of the block without sampling and reports statistical measurements that characterise the randomness properties of the generated data. The suite also includes chunk uniqueness verification, which checks that no portion of the entropy block repeats anywhere else within the same block.
Purpose
This validation suite accompanies each generated entropy block to provide objective statistical measurements of its randomness characteristics. The suite performs multiple independent analyses to ensure comprehensive coverage of different statistical properties that characterise high-quality random data, including verification that no significant portions of data repeat within the block.
File Requirements
The validation suite expects entropy block files with the following structure:
| Section | Size | Description |
|---|---|---|
| Data section | Exactly 1,073,741,824 bytes (1 GiB) | Entropy data to be tested |
| Metadata section | 64 bytes | Block identification and generation details, appended after entropy data |
| Total file size | 1,073,741,888 bytes | Data section plus metadata section |
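The layout in the table can be checked and read with a few lines of code. The following is a minimal sketch, not the suite's own reader: the names `iter_data_chunks` and `read_metadata` and the 1 MiB read buffer are assumptions made for illustration.

```python
from typing import BinaryIO, Iterator

DATA_SIZE = 1 << 30                      # 1,073,741,824 bytes of entropy data
METADATA_SIZE = 64                       # metadata appended after the data
TOTAL_SIZE = DATA_SIZE + METADATA_SIZE   # expected size of the whole file


def iter_data_chunks(stream: BinaryIO, data_size: int = DATA_SIZE,
                     buffer_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the data section in order, never reading into the metadata."""
    remaining = data_size
    while remaining > 0:
        buf = stream.read(min(buffer_size, remaining))
        if not buf:
            raise ValueError(f"data section truncated: {remaining} bytes missing")
        remaining -= len(buf)
        yield buf


def read_metadata(stream: BinaryIO, metadata_size: int = METADATA_SIZE) -> bytes:
    """Read what follows the data section and reject any other length."""
    # Ask for one extra byte so that trailing garbage is detected.
    metadata = stream.read(metadata_size + 1)
    if len(metadata) != metadata_size:
        raise ValueError("metadata section has an unexpected length")
    return metadata
```

Calling `iter_data_chunks` to exhaustion and then `read_metadata` on the same stream walks the file exactly once, which fits the streaming approach the suite describes.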
The 8 Statistical Categories
Every byte of the 1 GiB block passes through all 8 of the following analyses. A block must pass all 8 to be delivered.
Examines the distribution of byte values (0–255) throughout the entire block. For a block to pass, all 256 possible byte values must appear with near-perfect uniformity.
- Chi-square test: Measures deviation from expected uniform distribution.
- Uniformity index: Calculated using Gini coefficient.
- Maximum deviation: Largest deviation from expected frequency (1/256).
- Standard deviation: Variability in frequency distribution.
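The four measurements above can all be derived from a single table of 256 byte counts. The sketch below shows one way to do so. It is an illustration under stated assumptions rather than the suite's implementation: the function name is hypothetical, and the source does not say how the Gini coefficient is turned into a uniformity index, so only the coefficient itself is computed.

```python
import math
from collections import Counter
from typing import Iterable


def byte_frequency_stats(chunks: Iterable[bytes]) -> dict:
    """Accumulate byte counts over all chunks and derive summary statistics."""
    counts = [0] * 256
    for chunk in chunks:
        for value, n in Counter(chunk).items():
            counts[value] += n
    total = sum(counts)
    if total == 0:
        raise ValueError("no data supplied")

    expected = total / 256                      # expected count per byte value
    chi_square = sum((c - expected) ** 2 / expected for c in counts)

    freqs = [c / total for c in counts]
    max_deviation = max(abs(f - 1 / 256) for f in freqs)
    # Population standard deviation of the 256 relative frequencies.
    std_dev = math.sqrt(sum((f - 1 / 256) ** 2 for f in freqs) / 256)

    # Gini coefficient of the counts: 0 for a perfectly even distribution.
    ordered = sorted(counts)
    weighted = sum(i * c for i, c in enumerate(ordered, start=1))
    gini = 2 * weighted / (256 * total) - 257 / 256

    return {"chi_square": chi_square, "max_deviation": max_deviation,
            "std_dev": std_dev, "gini": gini}
```

With 256 categories the chi-square statistic has 255 degrees of freedom, so for uniform data it is expected to be close to 255.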
Calculates information-theoretic entropy at multiple scales throughout the full block. A perfect entropy source produces exactly 8.000000 bits per byte.
- Shannon entropy: Full-block entropy measurement (bits per byte).
- Multi-scale analysis: Entropy calculated for block sizes of 1 KiB, 4 KiB, 16 KiB, 64 KiB, 256 KiB, and 1 MiB.
- Bit-position entropies: Entropy for each bit position (0–7).
- Byte-pair entropy: Second-order entropy from consecutive byte pairs.
- Conditional entropy: Measures predictability based on previous bytes.
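The entropy measures listed above share one formula, H = -Σ p·log2(p), applied to different symbol sets. The following sketch works on a buffer held in memory and uses hypothetical function names. It takes conditional entropy to mean first-order conditioning on the single previous byte, which is an assumption, since the source does not state how many previous bytes are used.

```python
import math
from collections import Counter
from typing import Iterable, List


def entropy_bits(counts: Iterable[int]) -> float:
    """Shannon entropy in bits of the distribution given by raw counts."""
    counts = [c for c in counts if c]
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


def shannon_entropy(data: bytes) -> float:
    """Entropy per byte; at most 8 bits."""
    return entropy_bits(Counter(data).values())


def byte_pair_entropy(data: bytes) -> float:
    """Entropy of overlapping consecutive byte pairs; at most 16 bits."""
    return entropy_bits(Counter(zip(data, data[1:])).values())


def conditional_entropy(data: bytes) -> float:
    """H(next byte | previous byte) = H(pair) - H(previous byte)."""
    return byte_pair_entropy(data) - shannon_entropy(data[:-1])


def bit_position_entropies(data: bytes) -> List[float]:
    """Entropy of each bit position 0-7 taken across all bytes; at most 1 bit."""
    result = []
    for bit in range(8):
        ones = sum((b >> bit) & 1 for b in data)
        result.append(entropy_bits([ones, len(data) - ones]))
    return result
```

A counting sequence such as `bytes(range(256))` has the maximum 8 bits of byte entropy but zero conditional entropy, because each byte fully determines the next. This is why the higher-order measures are reported alongside the plain Shannon figure.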
Tests for dependencies between bytes at 17 different distances. In a truly random block, no relationship should exist between any byte and any other byte, at any distance.
- Lag correlations: Calculated for lags of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768, and 65,536 bytes.
- Maximum absolute correlation: Highest correlation value found across all lags.
- Correlation decay rate: How quickly correlations decrease with lag distance.
- Significant correlations: Correlations exceeding theoretical threshold.
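As an illustration of the lag correlations above, the sketch below computes the sample autocorrelation coefficient at a given lag. The 17 lags are the powers of two from 1 to 65,536 listed in the text. The estimator and the function names are assumptions, and the source does not give the exact threshold used for significance.

```python
import math
from typing import Iterable

LAGS = [1 << i for i in range(17)]   # 1, 2, 4, ..., 65,536


def lag_correlation(data: bytes, lag: int) -> float:
    """Sample autocorrelation of the byte values at the given lag."""
    n = len(data)
    if not 0 < lag < n:
        raise ValueError("lag must be positive and shorter than the data")
    mean = sum(data) / n
    variance_sum = sum((b - mean) ** 2 for b in data)
    if variance_sum == 0:
        return math.nan              # constant data: correlation is undefined
    covariance_sum = sum((data[i] - mean) * (data[i + lag] - mean)
                         for i in range(n - lag))
    return covariance_sum / variance_sum


def max_abs_correlation(data: bytes, lags: Iterable[int] = LAGS) -> float:
    """Largest absolute correlation over the lags that fit inside the data."""
    return max(abs(lag_correlation(data, lag)) for lag in lags if lag < len(data))
```

For independent bytes the coefficient at any lag is close to zero, with a spread of roughly 1/√N for N bytes. A bound of that form is a common choice for a significance threshold.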
Examines frequency domain properties using Fast Fourier Transform. Truly random data produces a flat “white noise” spectrum with no dominant frequencies.
- Total spectral power: Energy distribution across frequencies.
- Spectral flatness: Measure of “whiteness” of the spectrum.
- Peak-to-average ratio: Identifies any dominant frequencies.
- FFT window size: 1,048,576 bytes (1 MiB).
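Spectral flatness and peak-to-average ratio can both be read off the power spectrum of a window. The sketch below uses a small pure-Python radix-2 FFT so that it needs no third-party packages. The function names are hypothetical, and at the 1 MiB window size quoted above a production implementation would more plausibly rely on an optimised FFT library. Flatness is taken here as the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is the usual definition but is not confirmed by the source.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(window: bytes) -> List[float]:
    """Power in the positive-frequency bins, with the mean (DC) removed."""
    n = len(window)
    if n < 4 or n & (n - 1):
        raise ValueError("window length must be a power of two, at least 4")
    mean = sum(window) / n
    spectrum = fft([b - mean for b in window])
    return [abs(c) ** 2 for c in spectrum[1:n // 2]]


def spectral_flatness(window: bytes) -> float:
    """Geometric mean over arithmetic mean of the power spectrum (0 to 1)."""
    power = power_spectrum(window)
    # Clamp to avoid log(0) when a bin carries no power at all.
    log_mean = sum(math.log(max(p, 1e-300)) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def peak_to_average(window: bytes) -> float:
    """Largest bin power divided by the mean bin power."""
    power = power_spectrum(window)
    return max(power) / (sum(power) / len(power))
```

A single dominant frequency drives the flatness towards zero and the peak-to-average ratio towards the number of bins, whereas white noise keeps the power spread across all bins.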
Detects and quantifies patterns in the data at multiple levels. Covers both run-length behaviour and template matching across the full block.
- Run length distributions: For each bit position, counts consecutive 0s or 1s.
- Longest run: Maximum length of consecutive identical bits.
- Template matching: Counts occurrences of 9-bit patterns.
- Approximate entropy: Measures pattern complexity.
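The run-length measurements can be sketched as follows. The code assumes that a run at a given bit position means consecutive bytes whose bit at that position has the same value. The source does not spell this out, and the function names are hypothetical.

```python
from collections import Counter


def run_lengths(data: bytes, bit: int) -> Counter:
    """Map run length -> number of runs for one bit position (0-7)."""
    runs: Counter = Counter()
    previous = None
    length = 0
    for byte in data:
        value = (byte >> bit) & 1
        if value == previous:
            length += 1
        else:
            if previous is not None:
                runs[length] += 1        # close the run that just ended
            previous, length = value, 1
    if previous is not None:
        runs[length] += 1                # close the final run
    return runs


def longest_run(data: bytes, bit: int) -> int:
    """Length of the longest run of identical bits at one bit position."""
    return max(run_lengths(data, bit), default=0)
```

For unbiased independent bits, runs of length k are expected to be about half as frequent as runs of length k − 1, which gives a simple reference shape for the observed distribution.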
Evaluates compressibility using three widely used compression algorithms at maximum compression. Truly random data cannot be compressed: any achievable reduction indicates the presence of patterns.
- zlib: Deflate algorithm at maximum compression (level 9).
- bzip2: Burrows-Wheeler transform at maximum compression (level 9).
- lzma: Lempel-Ziv-Markov chain algorithm at maximum compression (level 9).
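Python's standard library ships all three algorithms, so the compressibility check can be sketched directly. The function name and the single `level` parameter are assumptions made for illustration. The example compresses a buffer held in memory, whereas a full 1 GiB block would more plausibly be fed through the incremental compressor objects.

```python
import bz2
import lzma
import zlib
from typing import Dict


def compression_ratios(data: bytes, level: int = 9) -> Dict[str, float]:
    """Return compressed size divided by original size for each algorithm."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = {
        "zlib": zlib.compress(data, level),
        "bzip2": bz2.compress(data, compresslevel=level),
        "lzma": lzma.compress(data, preset=level),
    }
    return {name: len(blob) / len(data) for name, blob in compressed.items()}
```

A ratio at or slightly above 1.0 means the compressor found nothing to exploit; the small excess is container and framing overhead. A ratio clearly below 1.0 indicates structure in the data.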
Tests the rank distribution of binary matrices formed from the data against theoretical predictions. Linear dependencies in the data would cause deviations from the expected rank distribution.
- Matrix size: 32×32 bits over Galois Field GF(2).
- Rank distribution: Counts of matrices with each possible rank.
- Chi-square test: Comparison against theoretical probabilities.
- Theoretical probabilities: Rank 32: 28.88% · Rank 31: 57.76% · Rank 30: 12.84% · Rank 29: 0.52% · Rank ≤28: ~0.0047%.
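The theoretical probabilities follow from counting the binary matrices of each rank, and the rank itself can be found by Gaussian elimination over GF(2). The sketch below shows both. The function names are hypothetical, and representing each matrix row as a 32-bit integer is an assumption about how bits are packed, not a detail taken from the suite.

```python
from typing import Sequence


def rank_probability(rank: int, rows: int = 32, cols: int = 32) -> float:
    """Probability that a uniformly random rows x cols GF(2) matrix has this rank."""
    exponent = rank * (rows + cols - rank) - rows * cols
    product = 1.0
    for i in range(rank):
        product *= ((1 - 2.0 ** (i - rows)) * (1 - 2.0 ** (i - cols))
                    / (1 - 2.0 ** (i - rank)))
    return 2.0 ** exponent * product


def gf2_rank(matrix_rows: Sequence[int], width: int = 32) -> int:
    """Rank over GF(2) of a matrix whose rows are given as integers."""
    rows = list(matrix_rows)
    rank = 0
    for bit in reversed(range(width)):
        # Find a row at or below the current rank with this bit set.
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # XOR is addition in GF(2): clear this bit from every other row.
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank
```

Multiplying each probability by the number of matrices formed from the block gives the expected counts against which the chi-square test compares the observed rank distribution.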
Verifies that no significant portions of the entropy block repeat anywhere within the same block. Every chunk is hashed with SHA-256; any collision indicates a duplicate. This test is unique to the Eormen suite and provides direct evidence that the block contains no repeated segments.
- Chunk sizes tested: 4,096 bytes (4 KiB), 16,384 bytes (16 KiB), and 65,536 bytes (64 KiB).
- Uniqueness verification: Every chunk hashed using SHA-256 to detect duplicates.
- Collision rate analysis: Calculates observed collision rate per million chunks.
- Theoretical probability: Compares observed duplicates against birthday paradox expectations.
- Spatial analysis: Measures separation distances between any duplicate chunks found.
- Clustering coefficient: Detects if duplicates cluster in specific regions.
- Hash distribution uniformity: Verifies SHA-256 hash prefixes distribute uniformly.
- Uniqueness scoring: 0–1,000 point system where 1,000 represents perfect uniqueness.
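The core of the uniqueness check, hashing every chunk and looking for repeated digests, can be sketched as below. The example assumes non-overlapping chunks aligned to the chunk size, and the function names are hypothetical. The suite's own chunking strategy and the formula behind the 0–1,000 score are not described in enough detail to reproduce, so neither is attempted here.

```python
import hashlib
import math
from typing import List, Tuple

CHUNK_SIZES = (4096, 16384, 65536)


def find_duplicate_chunks(data: bytes, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (first_offset, repeat_offset) for every chunk seen before."""
    first_seen = {}
    duplicates = []
    for offset in range(0, len(data) - chunk_size + 1, chunk_size):
        digest = hashlib.sha256(data[offset:offset + chunk_size]).digest()
        if digest in first_seen:
            duplicates.append((first_seen[digest], offset))
        else:
            first_seen[digest] = offset
    return duplicates


def collisions_per_million(duplicate_count: int, chunk_count: int) -> float:
    """Observed duplicates scaled to a rate per million chunks."""
    return duplicate_count * 1_000_000 / chunk_count


def log2_expected_duplicate_pairs(chunk_count: int, chunk_size: int) -> float:
    """log2 of the birthday-bound expectation C(n, 2) / 2^(8 * chunk_size)."""
    pairs = chunk_count * (chunk_count - 1) / 2
    return math.log2(pairs) - 8 * chunk_size
```

The difference between the two offsets in each returned pair is the separation distance used in the spatial analysis. For 4,096-byte chunks a 1 GiB block holds 262,144 chunks, and `log2_expected_duplicate_pairs(262144, 4096)` is about −32,733, so for ideal random data the expected number of duplicate pairs is effectively zero.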
Scoring thresholds:
| Score | Assessment | Meaning |
|---|---|---|
| 1000 | Excellent | No duplicates found at any chunk size |
| ≥ 950 | Good | Negligible duplicates |
| ≥ 900 | Acceptable | Minimal duplicates within expected range |
| ≥ 800 | Concerning | Duplicates approaching threshold |
| < 800 | Failed | Block not suitable for use |
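The thresholds in the table translate into a simple lookup. The function name below is hypothetical; the boundaries are taken directly from the table, with each band starting at its stated minimum score.

```python
def assess_uniqueness(score: float) -> str:
    """Map a 0-1,000 uniqueness score to its assessment label."""
    if not 0 <= score <= 1000:
        raise ValueError("score must lie between 0 and 1000")
    if score == 1000:
        return "Excellent"
    if score >= 950:
        return "Good"
    if score >= 900:
        return "Acceptable"
    if score >= 800:
        return "Concerning"
    return "Failed"
```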
Metadata Extraction and Verification
The suite reads the 64-byte metadata section appended to each block and reports the following identification and hash values:
- Nonce: 16-byte unique identifier (displayed as hexadecimal). Links results to this specific generation session.
- Data-only SHA-256: Hash of entropy data section for reproduction verification.
- Complete file SHA-256: Hash of entire file for integrity verification.
- Metadata-only SHA-256: Hash of metadata section.
- Generation timestamp: Unix timestamp and UTC datetime.
- Original filename: As recorded during generation.
The hash ordering follows the production standard: nonce, data-only hash, complete file hash, ensuring consistency across all Eormen documentation.
Output Format
Results are saved to a JSON file with the naming convention:
The JSON output contains:
- Block identification with three-tier hash verification.
- Validation metadata (timestamp, version, processing statistics).
- Complete statistical measurements from all 8 analyses.
- Computational parameters used during validation.
- Comprehensive chunk uniqueness analysis results including: overall uniqueness score and assessment; per-chunk-size detailed statistics; duplicate positions if any found; statistical significance assessments; and scoring system explanation.
Processing Characteristics
- Memory efficient: Uses streaming algorithms with constant memory usage throughout.
- State isolation: Complete state reset between validation runs.
- No sampling: Every byte of the 1 GiB block is analysed without exception.
- Numerical precision: High-precision calculations using Decimal arithmetic where appropriate.
- Chunk processing: Efficient sliding window approach for uniqueness verification.
- Hash-based detection: SHA-256 used for reliable duplicate detection in the uniqueness analysis.
Interpretation Note
This validation suite provides objective statistical measurements only. No judgements about randomness quality are made by the suite itself. The measurements should be interpreted by qualified analysts familiar with statistical testing of random number generators. The chunk uniqueness analysis provides definitive verification that no significant data portions repeat within the block.
File Authenticity
The results file includes cryptographic verification:
- Nonce: Links results to the specific EORM generation session.
- Data-only SHA-256: Verifies the entropy data independently.
- Complete file SHA-256: Verifies the entire file tested.
- Metadata-only SHA-256: Verifies the metadata section.
- Timestamp: Confirms when the entropy was generated.
- GPG signature: For authentication of the results file.
The three-tier hash system enables verification of data integrity separately from metadata, essential for confirming that reproduced blocks contain identical entropy whilst having different timestamps.
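The three hashes can be computed in a single pass over the file. The sketch below is an illustration under assumptions: the function name, the dictionary keys, and the 1 MiB read buffer are hypothetical and do not describe the suite's actual interface or the layout of its JSON output.

```python
import hashlib
from typing import BinaryIO, Dict


def three_tier_hashes(stream: BinaryIO, data_size: int = 1 << 30,
                      metadata_size: int = 64,
                      buffer_size: int = 1 << 20) -> Dict[str, str]:
    """Hash the data section, the metadata section, and the whole file."""
    data_hash = hashlib.sha256()
    metadata_hash = hashlib.sha256()
    file_hash = hashlib.sha256()

    remaining = data_size
    while remaining > 0:
        buf = stream.read(min(buffer_size, remaining))
        if not buf:
            raise ValueError("data section is truncated")
        data_hash.update(buf)
        file_hash.update(buf)
        remaining -= len(buf)

    # Read one extra byte so that a file with trailing bytes is rejected.
    metadata = stream.read(metadata_size + 1)
    if len(metadata) != metadata_size:
        raise ValueError("metadata section has an unexpected length")
    metadata_hash.update(metadata)
    file_hash.update(metadata)

    return {"data": data_hash.hexdigest(),
            "metadata": metadata_hash.hexdigest(),
            "file": file_hash.hexdigest()}
```

Two files that carry identical entropy data but different metadata produce the same `data` digest and different `metadata` and `file` digests, which is the property that lets a reproduced block be confirmed despite a different timestamp.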