How we test cold crypto wallets

Overview

We review cold crypto wallets using a weighted, category-based model designed for offline storage solutions.

Important distinction: In crypto, “hardware wallet” and “cold wallet” are often used interchangeably, but they’re not the same:

  • Hardware wallet = type of device (physical gadget like Ledger, Trezor)
  • Cold wallet = state of storage (private keys never touch the internet)

A hardware wallet is one type of cold wallet, but cold storage also includes:

  • Paper wallets (printed private keys or seed phrases)
  • Steel/metal wallets (seed phrases engraved on metal plates)
  • Air-gapped devices (old phones or computers with no network connection)
  • Seed phrase storage solutions (Cryptosteel, Billfodl, etc.)

Our methodology covers all cold storage methods, not just hardware devices.

What We Test

We focus on solutions that keep private keys permanently offline. The benchmark: users who want to secure long-term holdings (6+ months to years) and are willing to accept transaction friction in exchange for maximum security.

Cold wallet types we review:

  1. Hardware wallets (Ledger, Trezor, Coldcard, Tangem, etc.)
  2. Air-gapped solutions (dedicated offline devices, old phones with wallets)
  3. Paper wallets (generated offline, printed or handwritten keys)
  4. Metal/steel backup solutions (Cryptosteel, Billfodl, seed phrase plates)
  5. Hybrid solutions (multisig setups combining multiple cold storage methods)

Testing Process

Phase 1: Setup & Initial Security (Day 1-2)

For hardware wallets:

  • Document packaging integrity (tamper-evident seals, security labels)
  • Check for pre-generated seed phrases (red flag)
  • Complete initial setup (PIN, seed phrase generation, passphrase option)
  • Verify seed phrase never appears on connected computer/phone
  • Measure time to first transaction-ready state

For paper/metal wallets:

  • Test seed phrase generation process (offline computer, dedicated device)
  • Evaluate physical durability (water, fire, corrosion resistance for metal)
  • Test readability after “aging” simulation (light exposure, handling)
  • Check for engraving/stamping clarity on metal solutions

For air-gapped devices:

  • Verify network connectivity is disabled (WiFi, Bluetooth, cellular)
  • Test QR code transaction signing (if supported)
  • Check for potential data leakage vectors (USB, NFC, camera)
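
One part of this check can be automated. The sketch below (an illustration of our approach, not a tool we ship) lists network interfaces that are administratively up, using the Linux `/sys/class/net` tree; it assumes a Linux device and returns an empty list elsewhere. On a properly air-gapped device it should find nothing.

```python
from pathlib import Path

def active_interfaces() -> list[str]:
    """Return non-loopback network interfaces whose operstate is 'up' (Linux only)."""
    net = Path("/sys/class/net")
    if not net.exists():  # non-Linux platform: nothing to inspect this way
        return []
    return [dev.name for dev in net.iterdir()
            if dev.name != "lo"
            and (dev / "operstate").is_file()
            and (dev / "operstate").read_text().strip() == "up"]

# An air-gapped device should report no active interfaces:
# active_interfaces() → []
```

A passing result here is necessary, not sufficient: radios can be soft-disabled yet physically present, so we also check for Bluetooth, cellular, and NFC hardware directly.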

Phase 2: Offline Security Verification (Days 3-5)

Key question: Are the private keys TRULY offline?

  • Verify seed phrase/private key never transmitted to online device
  • Test transaction signing flow (does signing happen offline?)
  • Check for “online moments” (firmware updates, balance checks)
  • Evaluate physical security:
    • Hardware wallets: secure element chip, tamper resistance
    • Paper wallets: vulnerability to theft, fire, water damage
    • Metal wallets: durability testing (we don’t destroy, but review user reports)
    • Air-gapped devices: screen lock, encrypted storage

Phase 3: Transaction Flow & Usability (Days 6-10)

How cold storage interacts with hot wallets for spending:

  • Test “cold → hot” workflow (how to move funds when you need to spend)
  • Measure transaction signing complexity:
    • Hardware wallets: connect → verify → sign → disconnect
    • Air-gapped: QR codes, SD cards, manual entry
    • Paper wallets: import to hot wallet (risky) vs sweep funds
  • Evaluate “watch-only” wallet setup (check balance without exposing keys)
  • Test multisig setups (combining multiple cold wallets for added security)
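
The cold → hot workflow above can be sketched as a toy model. This is not any vendor's actual API: the function names are ours, and an HMAC stands in for the real asymmetric (ECDSA/Schnorr) signature, so unlike a real wallet the verifying side here needs the same key. The point is the separation of roles: the online side never holds key material, and the signing step happens entirely offline.

```python
import hashlib, hmac, json, secrets

# Toy stand-in for a signing key; on a real device this never leaves
# the secure element / air-gapped side.
SIGNING_KEY = secrets.token_bytes(32)

def build_unsigned_tx(to_addr: str, amount_sats: int) -> dict:
    """Watch-only (online) side: construct a transaction with no key material."""
    return {"to": to_addr, "amount": amount_sats, "sig": None}

def sign_offline(tx: dict, key: bytes) -> dict:
    """Air-gapped side: sign the payload. HMAC stands in for a real signature."""
    payload = json.dumps({"to": tx["to"], "amount": tx["amount"]}, sort_keys=True)
    return {**tx, "sig": hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()}

def verify_before_broadcast(tx: dict, key: bytes) -> bool:
    """Online side: confirm the signature covers exactly what was approved."""
    payload = json.dumps({"to": tx["to"], "amount": tx["amount"]}, sort_keys=True)
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return tx["sig"] is not None and hmac.compare_digest(tx["sig"], expected)
```

In real setups the unsigned transaction crosses the air gap as a PSBT via QR code or SD card, and verification uses the public key only; the structure of the flow is the same.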

Phase 4: Recovery & Disaster Scenarios (Days 11-14)

Critical question: What if your cold wallet is lost, stolen, or destroyed?

For hardware wallets:

  • Test recovery with seed phrase on same device (factory reset)
  • Test recovery on different device model (portability check)
  • Test recovery with passphrase (25th word)
  • Simulate lost/stolen device (can funds be moved quickly with backup?)

For paper/metal wallets:

  • Test seed phrase import to software wallet
  • Simulate “damaged backup” (partially illegible seed phrase)
  • Test redundancy strategies (multiple backups in different locations)
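
Why a partially illegible backup is often recoverable: BIP39 embeds a checksum, so a single missing word can be brute-forced. The sketch below works at the level of word indices (0–2047) to stay self-contained without the 2048-word list; it follows the BIP39 math directly (for 128-bit entropy: 4 checksum bits from SHA-256, 12 words of 11 bits each).

```python
import hashlib, secrets

def entropy_to_indices(entropy: bytes) -> list[int]:
    """Encode entropy as BIP39 word indices (0-2047) with checksum appended."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32                      # 4 checksum bits for 128-bit entropy
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    total = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11          # 12 words for 128-bit entropy
    return [(total >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]

def checksum_ok(indices: list[int]) -> bool:
    """Recompute and verify the embedded checksum for a list of word indices."""
    total_bits = len(indices) * 11
    cs_bits = total_bits // 33
    ent_bits = total_bits - cs_bits
    total = 0
    for idx in indices:
        total = (total << 11) | idx
    entropy = (total >> cs_bits).to_bytes(ent_bits // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    return (total & ((1 << cs_bits) - 1)) == expected

# Simulate a backup with one illegible word: try all 2048 candidates.
# On average only ~1 in 16 passes the 4-bit checksum (~128 of 2048),
# and the true word is always among them.
indices = entropy_to_indices(secrets.token_bytes(16))
missing_pos = 5
candidates = [w for w in range(2048)
              if checksum_ok(indices[:missing_pos] + [w] + indices[missing_pos + 1:])]
```

With two or more missing words the candidate space grows multiplicatively, which is why we score redundancy (multiple complete backups) much higher than reliance on partial recovery.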

For air-gapped devices:

  • Test recovery if device is lost/broken
  • Verify backup methods (encrypted exports, seed phrase backups)

Phase 5: Long-Term Storage Considerations (Throughout testing)

  • Physical durability: Can it survive 5-10 years?
    • Hardware wallets: battery life, button wear, screen degradation
    • Paper wallets: ink fading, paper deterioration
    • Metal wallets: corrosion resistance, engraving legibility
  • Technological obsolescence: Will recovery tools exist in 10 years?
    • BIP39 standard vs proprietary formats
    • Firmware update requirements
    • Dependency on specific software/apps
  • Inheritance planning: Can heirs access funds?
    • Clarity of backup instructions
    • Multisig setups with trusted parties
    • Dead man’s switch solutions

Scoring Criteria (7 Categories, 1.0–5.0 scale)

1. Offline Security & Key Isolation (35%)

What we test:

  • Are private keys TRULY offline at all times?
  • Secure element vs general-purpose chip (hardware wallets)
  • Physical tamper resistance
  • Attack resistance:
    • Clipboard malware (hardware wallets with screens)
    • Supply chain tampering
    • Physical theft (can a thief access funds?)
    • Coercion resistance (duress PIN, hidden wallets)

5/5 example: Hardware wallet with secure element (CC EAL6+), true air-gapped operation, strong physical security, duress PIN, open-source firmware

3/5 example: Hardware wallet with general-purpose chip, keys offline but firmware closed-source, basic physical security

1/5 example: “Cold wallet” that requires online connection for setup, weak physical security, proprietary recovery method

2. Recovery & Backup Reliability (25%)

What we test:

  • Seed phrase generation quality (BIP39 standard, entropy source)
  • Backup method durability:
    • Paper: ink quality, paper degradation
    • Metal: corrosion resistance, engraving clarity
    • Hardware: backup export options
  • Recovery testing across different devices/software
  • Passphrase (25th word) support
  • Redundancy strategies (multiple backups)
  • Inheritance planning (can heirs recover funds?)

5/5 example: BIP39 standard, metal backup compatible, recovery tested on 3+ different wallets, clear inheritance instructions, Shamir Backup option

3/5 example: Standard seed phrase, paper backup works, recovery requires specific software, basic passphrase support

1/5 example: Proprietary recovery, single point of failure, no backup durability testing, heirs cannot access

3. Long-Term Durability & Obsolescence Risk (15%)

What we test:

  • Physical durability (5-10 year horizon):
    • Hardware: battery, buttons, screen, USB port
    • Paper: ink fading, water/fire damage
    • Metal: corrosion, engraving wear
  • Technological obsolescence risk:
    • Dependency on specific software (still maintained in 10 years?)
    • Proprietary formats vs open standards (BIP39, SLIP39)
    • Firmware update requirements
  • Recovery tool availability (can you recover in 2035?)

5/5 example: Metal backup (fireproof, waterproof), BIP39 standard (universally supported), no battery, open-source recovery tools

3/5 example: Hardware wallet with replaceable battery, BIP39 standard, firmware updates required every 2-3 years

1/5 example: Paper wallet with cheap ink, proprietary format, recovery depends on discontinued software

4. Transaction Signing & Spending Workflow (10%)

What we test:

  • How easy is it to spend funds when needed?
  • Transaction signing flow:
    • Hardware: connect → verify → sign → disconnect
    • Air-gapped: QR codes, SD cards, manual PSBT
    • Paper: must import to hot wallet (risky)
  • Address verification (full address display)
  • Fee control and transparency
  • Multisig support

5/5 example: Hardware wallet with large screen, QR code signing for air-gapped use, multisig support, watch-only wallet setup

3/5 example: Hardware wallet requires USB connection, address verification on small screen, basic fee display

1/5 example: Paper wallet requires full key import to hot wallet (no cold signing), no address verification

5. Multi-Chain & Asset Support (8%)

What we test:

  • Supported blockchains (BTC, ETH, SOL, etc.)
  • Token standards (ERC-20, SPL, BEP-20, NFTs)
  • Compatibility with popular software wallets (Sparrow, Electrum, MetaMask, Rabby)
  • Missing features on “supported” chains

5/5 example: BTC, ETH, SOL, and 20+ chains, full token support, works with Sparrow, Electrum, MetaMask, Rabby

3/5 example: BTC and ETH only, ERC-20 tokens work, limited software wallet options

1/5 example: BTC-only, requires proprietary software for transactions

6. Usability & Learning Curve (5%)

What we test:

  • Setup complexity for non-technical users
  • Quality of instructions (video guides, written docs)
  • Time from unboxing to first backup complete
  • Recovery drill clarity (can a user test recovery without risk?)
  • Physical ergonomics (hardware wallets: buttons, screen; metal wallets: assembly)

5/5 example: Clear video guides, setup in <20 minutes, recovery drill built into setup, beginner-friendly

3/5 example: Written instructions adequate, setup takes 45-60 minutes, recovery testing requires external guides

1/5 example: Confusing instructions, setup requires technical knowledge, no recovery testing guidance

7. Cost vs Security Tradeoff (2%)

What we test:

  • Does the price match the security level?
  • Hidden costs (mandatory software subscriptions, replacement parts)
  • Alternative cheaper solutions with similar security

Note: We score functionality first, but acknowledge cost matters for accessibility.

5/5 example: Strong security at reasonable price, no hidden costs, good value

3/5 example: Good security but expensive, some additional costs

1/5 example: Overpriced for security level, expensive mandatory subscriptions

Rating Scale

We use a 5-point scoring system for each criterion:

5.0 / ★★★★★ Exceptional – Industry-leading security, recovery, and durability. Sets the standard for cold storage. Example: Secure element chip (CC EAL6+), BIP39 standard, metal backup compatible, open-source firmware, tested recovery on 5+ devices, fireproof/waterproof durability.

4.0 / ★★★★ Excellent – Above-average performance across all criteria. Minor limitations do not compromise core security. Example: Secure element chip, BIP39 standard, recovery tested successfully, good build quality, comprehensive documentation.

3.0 / ★★★ Good – Meets basic cold storage expectations. Suitable for most users but with noticeable tradeoffs. Example: General-purpose chip with firmware encryption, BIP39 standard, recovery works but requires specific software, adequate physical security.

2.0 / ★★ Fair – Below-average solution with significant limitations. May be acceptable for small amounts or temporary storage. Example: Weak physical security, proprietary recovery format, limited multi-chain support, poor durability.

1.0 / ★ Poor – Major deficiencies that compromise core cold storage promise. Not recommended. Example: Keys not truly offline, no secure element, proprietary recovery method, frequent security incidents, poor build quality.

Final scores are calculated as weighted averages across all seven criteria. A device scoring 4.5/5 overall may have 5/5 in security but 3/5 in usability – we show the breakdown so you can decide what matters most for your use case.

How We Calculate Final Scores

Step 1: Rate each criterion on the 1-5 scale
Step 2: Multiply each score by its weight
Step 3: Sum the weighted scores

Example: Device X

| Criterion | Score | Weight | Weighted Score |
|---|---|---|---|
| Offline Security & Key Isolation | 5/5 | 0.35 | 1.75 |
| Recovery & Backup Reliability | 4/5 | 0.25 | 1.00 |
| Long-Term Durability | 5/5 | 0.15 | 0.75 |
| Transaction Signing Workflow | 4/5 | 0.10 | 0.40 |
| Multi-Chain & Asset Support | 3/5 | 0.08 | 0.24 |
| Usability & Learning Curve | 4/5 | 0.05 | 0.20 |
| Cost vs Security Tradeoff | 4/5 | 0.02 | 0.08 |
| Total | 29/35 | 1.00 | 4.42/5.00 |

Final rating: 4.42/5 (88%)
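
The three-step calculation is a plain weighted average; here is the same arithmetic as a sketch, using our own shorthand category names:

```python
# Category weights from the methodology above; they must sum to 1.0.
WEIGHTS = {
    "offline_security": 0.35,
    "recovery":         0.25,
    "durability":       0.15,
    "signing":          0.10,
    "multichain":       0.08,
    "usability":        0.05,
    "cost":             0.02,
}

def final_score(scores: dict[str, float]) -> float:
    """Weighted average of per-category scores on the 1.0-5.0 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(scores[cat] * w for cat, w in WEIGHTS.items())

# Device X from the example table
device_x = {"offline_security": 5, "recovery": 4, "durability": 5,
            "signing": 4, "multichain": 3, "usability": 4, "cost": 4}
# round(final_score(device_x), 2) == 4.42
```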

What We Don’t Rate

  • Aesthetics – Design matters for usability, not beauty
  • Brand loyalty – We test the solution, not the company’s reputation
  • Philosophical debates (open-source vs proprietary) – We note it, but score practical security
  • Price-to-value alone – Security comes first, cost is a secondary factor
  • Extreme edge cases – We focus on realistic threats (theft, loss, fire) not lab-grade attacks

Why Trust Our Ratings?

  • We purchase devices/solutions at retail price (no free review units)
  • We test with real funds ($200-500) over 10-14 days
  • We execute real mainnet transactions (BTC, ETH, SOL)
  • We test recovery on multiple devices/software wallets
  • We simulate realistic disasters (lost backup, damaged device)
  • We don’t accept payment for ratings
  • We update ratings when manufacturers release major updates or security advisories

Testing Limitations

  • We don’t perform destructive testing (we don’t burn/drown devices to test durability)
  • We don’t audit firmware source code (we verify signatures and check community audits)
  • We don’t test every attack vector (we focus on common threats, not nation-state attacks)
  • We can’t predict 10-year durability (we rely on materials science and user reports)

Questions About Our Process?

We welcome feedback on our cold wallet testing methodology. If you see critical factors missing, disagree with our weighting system, or have suggestions to improve our evaluation framework, contact our editorial team at [email protected].

Cold storage security deserves complete clarity. Our process leaves no room for shortcuts or paid influence – only verified results from hands-on testing with real funds and real recovery scenarios.

Last updated: February 11, 2026  

Next methodology review: Expected Q2 2026