Over the last few days the CPU world has been all about the upcoming AMD architecture, but the spotlight is now on Intel. While there is a saying that any publicity is good publicity, I somehow doubt that is the case this time.
Intel publishes specification updates for many of the products it makes. If you thought that a product is complete once the blueprints come off the drawing board, you are mistaken. Problems are detected after the product has been released, and fixes to the hardware are built in during later stages of production.
Intel documents these problems as so-called errata (singular: erratum), from the Latin word for “error”. The term traditionally refers to a correction of a published text.
You can find the errata for the Atom C2000 here. Reading such a document for the first time is scary as *ell. You might wonder what is wrong with the product when reading the explanations in the document, but it’s not as bad as you might think.
The latest one …
AVR54. System May Experience Inability to Boot or May Cease Operation
is kinda bad.
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
Status: For the steppings affected, see Table 1.
Table 1 shows that there is only one stepping of the CPU, so all CPUs are affected.
What does this mean?
The LPC bus was introduced by Intel in 1998 as a software-compatible substitute for the Industry Standard Architecture (ISA) bus. It connects devices to the CPU, and one of these devices is the boot ROM.
No boot ROM during boot? > Blank screen and that’s about it.
Intel’s Atom C2000 family, better known by the codenames Avoton and Rangeley, was launched in Q3 of 2013, about three and a half years ago. These chips are meant for use in lower-power and reasonably highly threaded applications such as microservers, communication/networking gear, and storage. As a result the C2000 is an important part of Intel’s product lineup, especially as it directly competes with various ARM-based processors in many of its markets.
Intel is saying that the problem is “a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service.”
Circuit degradation is a normal part of the life cycle of a complex semiconductor like a processor. However, with modern processors it should take a decade or longer to have an effect, much longer than the expected service lifetime of a chip. So when something speeds up the degradation process severely enough, it can cut the lifetime of a chip to a fraction of what it was planned for, causing a chip (or line of chips) to fail while still in active use. And this is exactly what’s happening with the Atom C2000.
For Intel, this is the second time this decade that they’ve encountered a degradation issue like this. Back in 2011 the company had to undertake a much larger and more embarrassing repair & replacement program for motherboards using early Intel 6-series chipsets. On those boards an overbiased (overdriven) transistor controlling some of the SATA ports could fail early, disabling those SATA ports. While more than one stepping of that chipset was produced, the majority of the chips were affected.
There are 10 different CPUs launched in the C2000 series (link).
Cisco is a large company that uses these processors in some of their products. They have already sent out an advisory covering 8 series of products.
Synology is shipping 7 NAS devices with an Intel Atom C2538 (product overview).
They have stated the following:
Intel has recently notified Synology regarding the issue of the processor’s increased degradation chance of a specific component after heavy, prolonged usage.
Synology has not currently seen any indication that this issue has caused an increase in failure rates for DiskStation or RackStation models equipped with Intel Atom C2000 series processors compared to other models manufactured in the same time frame not equipped with the affected processors.
But as always, don’t expect Synology to be the first to come out with a statement that something is wrong with their products. On their forums there have been topics about devices whose behavior seems to fit the symptoms of a defective CPU.
Firewall vendor pfSense / Netgate uses the Atom C2758 and Atom C2558 in a range of their devices. Almost all of their higher-end devices are affected (products).
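If you run your own Linux box and are not sure what is inside it, a quick check of the CPU model name can tell you whether it belongs to the C2000 family. The snippet below is only a minimal sketch: the function names are made up for this post, and it assumes that `/proc/cpuinfo` reports these chips with a model name containing "Atom" and a "C2xxx" model number. It says nothing about whether your board already has the platform-level workaround.

```python
import re

# Assumption: Atom C2000 parts report a model name containing "Atom" and a
# "C2xxx" model number, e.g. "Intel(R) Atom(TM) CPU  C2758  @ 2.40GHz".
C2000_PATTERN = re.compile(r"\bC2\d{3}\b")


def is_atom_c2000(cpuinfo_text: str) -> bool:
    """Return True if any 'model name' line looks like an Atom C2000 part."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() != "model name":
            continue
        if "Atom" in value and C2000_PATTERN.search(value):
            return True
    return False


def check_this_machine(path: str = "/proc/cpuinfo") -> bool:
    """Read the Linux cpuinfo file (hypothetical helper) and apply the check."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return is_atom_c2000(handle.read())
```

Calling `check_this_machine()` on a Linux system returns `True` when the model name matches. On an appliance without shell access, the vendor advisories remain the better source.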