(Last Updated: April 27, 2020)

Understanding the flash memory endurance problem



  • I am seeing a lot of reports and complaints across various home automation forums about failed flash:
    SD cards (notoriously on Raspberry Pis and other SBCs, on every known platform)
    USB drives (Vera USB logging)
    and the internal onboard flash of the Vera.

    All of these cause system failures and data corruption. The Vera is a case of needless self-flagellation: its very highly rated SLC NAND flash is paired with a mind-boggling partitioning scheme that makes it use barely 10% of the storage space the hardware provides. The other failures are due to using hardware that is inappropriate for the purpose.
    What is important to know is that a flash memory cell has a limited lifetime.
    The smaller the cell is, the thinner the gate oxide holding the charge in, and the more fragile and less enduring the cell is. This is why the industry has gone vertical with V-NAND or 3D NAND instead of just shrinking the cells further to reduce cost and increase density.

    Alongside this move to the third dimension came the idea that each cell could hold more than one bit by storing one of several distinct charge levels. That is what MLC (Multi Level Cell, in practice only 2 bits), TLC (Triple Level Cell) and now QLC (Quad Level Cell) are. An MLC cell has 4 states (2^2), a TLC cell has 8 (2^3) and a QLC cell has 16 (2^4) states, or distinct cell voltage levels. The price of these multi-level cells is a higher error rate and lower reliability (the accuracy of the charge depends in part on how worn the gate is), which has to be compensated by ECC in the controller. It is indeed much harder to hit one of many voltage levels accurately than to implement a simple charge/no-charge logic. The difference in endurance can be a couple of orders of magnitude! The cheaper NAND flash found in eMMC, SD cards and most USB flash drives is slow and has less write endurance because it uses the cheaper QLC/TLC technologies. (1GB of QLC costs the same to make as 256MB of SLC, and possibly significantly less because of the smaller cell size.)
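
    To make the level counts concrete, here is a minimal Python sketch. It assumes the voltage levels are evenly spaced inside one fixed window, which is a simplification of mine and not how any particular chip is specified; the function names are made up for illustration.

```python
# Illustrative only: levels per cell type and how tightly they are packed.
# Assumption: all levels are evenly spaced inside one fixed voltage window.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}  # bits per cell


def levels(bits_per_cell: int) -> int:
    """Number of distinct charge states a cell must hold: 2^bits."""
    return 2 ** bits_per_cell


def relative_margin(bits_per_cell: int) -> float:
    """Gap between adjacent levels as a fraction of the full window."""
    return 1.0 / (levels(bits_per_cell) - 1)


def cells_needed(capacity_bits: int, bits_per_cell: int) -> int:
    """Cells required to store capacity_bits (ceiling division)."""
    return -(-capacity_bits // bits_per_cell)


if __name__ == "__main__":
    for name, bits in CELL_TYPES.items():
        print(f"{name}: {levels(bits):2d} levels, "
              f"gap between levels = {relative_margin(bits):.3f} of the window")
```

    Under that assumption, the gap a QLC cell has to resolve is 1/15 of what an SLC cell gets, so a small drift in charge is enough to read back the wrong value. The cells_needed helper shows the cost side: the same number of cells stores four times as many bits in QLC as in SLC.
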
    Another layer of reliability improvement in SSDs comes from TRIM, garbage collection and wear leveling. Together, these features distribute writes more evenly across the cells and erase unused cells in the background so that later writes are faster.
    Well, none of these exist on raw, controller-less NAND, and on the minimal controllers found in SD cards, eMMC and USB sticks they are often far more rudimentary than on an SSD. This is why a large flash drive can corrupt data: it may have been overwriting the exact same cells over and over while the rest of the cells have never been used.
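
    The effect of hammering the same cells can be sketched with a toy simulation in Python. This is an idealised model of my own (one logical block rewritten repeatedly, and a "least-worn block first" policy standing in for wear leveling), not the algorithm of any real controller.

```python
def simulate_writes(num_blocks: int, writes: int, wear_leveling: bool) -> list:
    """Return per-block erase counts after rewriting ONE logical block.

    Without wear leveling, the logical block always lands on physical block 0.
    With (idealised) wear leveling, each rewrite goes to the least-worn block.
    """
    erase_counts = [0] * num_blocks
    for _ in range(writes):
        if wear_leveling:
            # Pick the first block with the lowest erase count so far.
            target = erase_counts.index(min(erase_counts))
        else:
            target = 0
        erase_counts[target] += 1
    return erase_counts


if __name__ == "__main__":
    fixed = simulate_writes(num_blocks=8, writes=800, wear_leveling=False)
    leveled = simulate_writes(num_blocks=8, writes=800, wear_leveling=True)
    print("no wear leveling:", fixed)
    print("wear leveling:   ", leveled)
```

    Without leveling, one block absorbs every erase while the others stay untouched, so the device fails when that single block wears out. With leveling, the same workload is spread across all blocks and the worst-case wear drops by a factor equal to the number of blocks.
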

    For a home automation platform, which needs to constantly save variable states and logs to various files, an SSD is definitely a better way to go than SD cards, USB sticks or eMMC. This is even more true for embedded, non-replaceable storage.
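
    For a rough feel of what this means in years, here is a back-of-the-envelope estimate in Python. The formula (capacity times rated program/erase cycles, divided by the data physically written per day) is a common simplification, and every parameter name and number below is a placeholder of mine to be replaced with your own datasheet and measured values.

```python
def estimated_lifetime_years(capacity_gb: float,
                             pe_cycles: float,
                             daily_writes_gb: float,
                             write_amplification: float = 1.0,
                             leveled_fraction: float = 1.0) -> float:
    """Rough flash endurance estimate, in years.

    leveled_fraction models how much of the device actually shares the wear:
    1.0 means perfect wear leveling, a small value means the writes keep
    hitting the same small part of the flash.
    """
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("write volume and amplification must be positive")
    total_writable_gb = capacity_gb * leveled_fraction * pe_cycles
    physical_gb_per_day = daily_writes_gb * write_amplification
    return total_writable_gb / physical_gb_per_day / 365.0


if __name__ == "__main__":
    # Placeholder inputs, NOT measured or datasheet values.
    for fraction in (1.0, 0.1):
        years = estimated_lifetime_years(capacity_gb=16, pe_cycles=1000,
                                         daily_writes_gb=2,
                                         write_amplification=4,
                                         leveled_fraction=fraction)
        print(f"leveled_fraction={fraction}: about {years:.1f} years")
```

    Whatever numbers you plug in, the estimate scales linearly with the fraction of the flash that shares the wear, which is why a device that only ever writes to a small part of its storage gives away most of its potential lifetime.
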

