Cascading failure

 

  • The high latency and packet loss are caused by nodes that fail to operate due to congestion collapse: they are still present in the network but carry little
    or no useful communication.

  • Another common technique is to calculate a safety margin for the system by computer simulation of possible failures, to establish safe operating levels below which none of
    the calculated scenarios is predicted to cause cascading failure, and to identify the parts of the network which are most likely to cause cascading failures.

  • Cascading failure is a common effect seen in high voltage systems, where a single point of failure (SPF) on a fully loaded or slightly overloaded system results in a sudden
    spike across all nodes of the system.

  • This can cause one or more systems along the alternative route to go down, creating similar problems of their own.

  • A cascading failure is a failure in a system of interconnected parts in which the failure of one or a few parts leads to the failure of other parts, growing progressively as
    a result of positive feedback.

  • As an example, DNS resolution might fail, and because DNS is what normally lets systems connect to one another, its failure might break connections that are not even
    directly involved with the systems that actually went down.

  • This surge current can induce the already overloaded nodes into failure, setting off more overloads and thereby taking down the entire system in a very short time.

  • Cascading failures may occur when one part of the system fails.

  • Although undesired, this can help speed up the recovery from this failure as connections will time out, and other nodes will give up trying to establish connections to the
    section(s) that have become cut off, decreasing load on the involved nodes.

  • For example, under certain conditions a large power grid can collapse after the failure of a single transformer.

  • This can occur when a single part fails, increasing the probability that other portions of the system fail.

  • In power transmission: Cascading failure is common in power grids when one of the elements fails (completely or partially) and shifts its load to nearby elements in the system.

  • This failure process cascades through the elements of the system like a ripple on a pond and continues until substantially all of the elements in the system are compromised
    and/or the system becomes functionally disconnected from the source of its load.

  • This, in turn, may cause seemingly unrelated nodes to develop problems, which can cause another cascade failure of its own.

  • A common occurrence during a cascade failure is a walking failure, where sections go down, causing the next section to fail, after which the first section comes back up.

  • A cascade failure can affect large groups of people and systems.

  • [1][2] Such a failure may happen in many types of systems, including power transmission, computer networking, finance, transportation systems, organisms, the human body, and
    ecosystems.

  • Monitoring the operation of a system in real time and judiciously disconnecting parts can help stop a cascade.

  • Current research aims to find a way to block this cascade in stroke patients to minimize the damage.

  • History: Cascade failures are a relatively recent development, with the massive increase in traffic and the high interconnectivity between systems and networks.

  • Symptoms: The symptoms of a cascade failure include packet loss and high network latency, not just to single systems but to whole sections of a network or the internet.

  • If enough routes go down because of a cascade failure, a complete section of the network or internet can become unreachable.

  • Usually, the redundant systems of an ISP respond very quickly, choosing another path through a different backbone.

  • This in turn overloads these nodes, causing them to fail as well, prompting additional nodes to fail one after another.

  • Properly designed structures use an adequate factor of safety and/or alternate load paths to prevent this type of mechanical cascade failure.
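
The load-shifting mechanism described in the bullets above, and the idea of choosing a safety margin by simulating possible failures, can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a power-system or network simulator: the six-node ring, the rule that a failed node hands its load to surviving neighbors in equal shares, and the names simulate_cascade and smallest_safe_margin are all hypothetical choices made for illustration.

```python
from collections import deque


def simulate_cascade(neighbors, load, capacity, first_failure):
    """Return the set of failed nodes after one initial failure.

    Assumed toy rule: a failed node hands its load in equal shares to its
    surviving neighbors; a neighbor pushed above its capacity fails too.
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = {first_failure}
    queue = deque([first_failure])
    while queue:
        node = queue.popleft()
        alive = [n for n in neighbors[node] if n not in failed]
        if not alive:
            continue  # nothing left to absorb the load: this part is cut off
        share = load[node] / len(alive)
        load[node] = 0.0
        for n in alive:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                queue.append(n)
    return failed


def smallest_safe_margin(neighbors, load, margins):
    """Return the smallest candidate margin for which no single initial
    failure spreads beyond the node that failed first, or None."""
    for margin in sorted(margins):
        capacity = {n: l * (1 + margin) for n, l in load.items()}
        worst = max(
            len(simulate_cascade(neighbors, load, capacity, n))
            for n in neighbors
        )
        if worst == 1:  # only the initially failed node went down
            return margin
    return None


# Hypothetical example: six nodes in a ring, each carrying one unit of load.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
load = {i: 1.0 for i in ring}

tight = {n: 1.25 for n in ring}  # 25% headroom per node
print(len(simulate_cascade(ring, load, tight, 0)))  # → 6
print(smallest_safe_margin(ring, load, [0.1, 0.25, 0.5, 1.0]))  # → 0.5
```

In this toy ring, 25% headroom lets a single failure take down all six nodes, while 50% headroom contains it to the node that failed first. A realistic study would replace the ring and the equal-share rule with a model of the actual system.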

 

Works Cited

1. “Cascading Failure – an overview | ScienceDirect Topics”. www.sciencedirect.com.
2. Ulrich, Mike. “Chapter 22 – Addressing Cascading Failures”. Google – Site Reliability Engineering.
3. Zhai, Chao (2017). “Modeling and Identification of Worst-Case Cascading Failures in Power Systems”. arXiv:1703.05232 [cs.SY].
4. “Why Gmail went down: Google misconfigured load balancing servers (Updated)”. 11 December 2012.
5. Petroski, Henry (1992). To Engineer Is Human: The Role of Failure in Structural Design. Vintage. ISBN 978-0-679-73416-1.
6. Boast, P. Baveye, C. W. (1998). “Fractal Geometry, Fragmentation Processes and the Physics of Scale-Invariance: An Introduction”. Revival: Fractals in Soil Science (1998). CRC Press. doi:10.1201/9781315151052. ISBN 9781315151052.
7. Heisser, Ronald H.; Patil, Vishal P.; Stoop, Norbert; Villermaux, Emmanuel; Dunkel, Jörn (28 August 2018). “Controlling fracture cascades through twisting and quenching”. Proceedings of the National Academy of Sciences. 115 (35): 8665–8670. arXiv:1802.05402. Bibcode:2018PNAS..115.8665H. doi:10.1073/pnas.1802831115. ISSN 0027-8424. PMC 6126751. PMID 30104353.
8. Melton, L Joseph; Amin, Shreyasee (26 June 2013). “Is there a specific fracture ‘cascade’?”. BoneKEy Reports. 2: 367. doi:10.1038/bonekey.2013.101. PMC 3935254. PMID 24575296.
9. Acemoglu, Daron; Ozdaglar, Asuman; Tahbaz-Salehi, Alireza (2015). “Systemic Risk and Stability in Financial Networks”. American Economic Review. American Economic Association. 105 (2): 564–608. doi:10.1257/aer.20130456. hdl:1721.1/100979. ISSN 0002-8282. S2CID 7447939.
10. Gai, Prasanna; Kapadia, Sujit (2010-08-08). “Contagion in financial networks”. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 466 (2120): 2401–2423. Bibcode:2010RSPSA.466.2401G. doi:10.1098/rspa.2009.0410. ISSN 1364-5021. S2CID 9945658.
11. Elliott, Matthew; Golub, Benjamin; Jackson, Matthew O. (2014-10-01). “Financial Networks and Contagion”. American Economic Review. 104 (10): 3115–3153. doi:10.1257/aer.104.10.3115. ISSN 0002-8282.
12. “Report of the Commission to Assess the Threat to the United States from Electromagnetic Pulse (EMP) Attack” (PDF).
13. Rinaldi, S.M.; Peerenboom, J.P.; Kelly, T.K. (2001). “Identifying, understanding, and analyzing critical infrastructure interdependencies”. IEEE Control Systems Magazine. 21 (6): 11–25. doi:10.1109/37.969131.
14. V. Rosato, Issacharoff, L., Tiriticco, F., Meloni, S., Porcellinis, S.D., & Setola, R. (2008). “Modelling interdependent infrastructures using interacting dynamical models”. International Journal of Critical Infrastructures. 4: 63–79. doi:10.1504/IJCIS.2008.016092.
15. Motter, A. E.; Lai, Y. C. (2002). “Cascade-based attacks on complex networks”. Phys. Rev. E. 66 (6 Pt 2): 065102. arXiv:cond-mat/0301086. Bibcode:2002PhRvE..66f5102M. doi:10.1103/PhysRevE.66.065102. PMID 12513335. S2CID 17189308.
Photo credit: https://www.flickr.com/photos/avidlyabide/6970683464/