ETFA 2013 WiP: FTTRS

Hello everyone. My name is David Gessner, and I am going to present the paper titled "Towards a flexible time-triggered replicated star for Ethernet". The work presented in the paper is part of a project called FT4FTT. The context of this project is real-time distributed embedded systems. Such systems are basically a set of nodes interconnected by a communication subsystem, such that the nodes, together with the communication subsystem, satisfy certain real-time requirements. Previous projects have made such systems adaptable: if the environment imposes new real-time requirements, the system can adapt to them. The goal of the new project is to keep the system adaptable and functioning correctly even in the presence of faults.

To achieve this goal, the project builds upon the FTT communication paradigm, where FTT stands for Flexible Time-Triggered communication. FTT is based on master/multi-slave communication, in which one master controls multiple slaves by means of a single periodic control message called the trigger message. This message tells each slave when and what it is allowed to transmit. Because the master builds this message while taking changing real-time requirements into account, the trigger message is the main mechanism by which FTT provides flexibility.

There is previous work on making such systems fault tolerant; however, this work focused only on tolerating master failures and failures within the communication subsystem, and did not take the failure of slaves into account. FT4FTT, in contrast, focuses on the whole system. This is particularly important because it has been shown that node failures can have a significant impact on overall reliability.

The design of FT4FTT can be divided into two basic blocks: the slave node subsystem and the communication subsystem. The focus of the paper I am presenting is the latter. Despite previous efforts, we tackle the reliability of the communication subsystem again, for two main reasons.
First, this subsystem should now not only tolerate its own failures but also provide support for tolerating failures in the other subsystem. Second, all previous efforts for FTT were based on bus topologies, which have both reliability and performance limitations. The new approach is being designed to solve these issues.

Specifically, the architecture presented in the paper is based on a switched Ethernet implementation of FTT in which the master is embedded within the switch. Because this switch constitutes a single point of failure, we propose to replicate it. Now, for the slaves to be able to use the new replica, we need to add links from each slave to the new replica. The next step is to make it easy for a surviving replica to replace a faulty one. This is achieved by providing replica determinism, which means ensuring that both replicas provide the same results as long as they are non-faulty. To achieve replica determinism, some form of communication between the replicas is usually required. The paper therefore proposes an interlink between the two switches, which is itself replicated to provide further fault tolerance. This, then, is the basic architecture presented in the paper for the communication subsystem of FT4FTT.

Given this basic architecture, the paper discusses some ideas and alternatives for certain design decisions. For example, it explores how exactly to ensure replica determinism, how best to take advantage of the channel replication when transmitting trigger messages, and what services to provide to the slaves to make it easier for them to tolerate each other's failures, among several other design ideas. No final decisions have been made for any of these ideas. Deciding the exact details remains future work at this moment and is one of the next steps we will carry out.

This concludes my presentation. Thank you very much for your attention.
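The two core ideas of the talk, a master that periodically tells slaves what they may transmit and two master replicas that must produce the same results, can be illustrated with a small sketch. This is not the FT4FTT design, whose details the talk explicitly leaves open. It assumes a simplified FTT-like master that, at the start of each elementary cycle (EC, the FTT term for the basic scheduling round), announces which periodic messages are due, ordered rate-monotonically (shorter period first) and limited by a fixed per-cycle transmission window; the real FTT schedulers are more elaborate. All names (`SyncMessage`, `build_trigger_message`, `MasterReplica`) and parameters are hypothetical. The scheduling function depends only on its inputs and breaks ties by message ID, so two non-faulty replicas that applied the same requirement changes emit identical trigger messages even if they stored those changes in a different order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncMessage:
    """Hypothetical description of a periodic slave message."""
    msg_id: int      # message identifier
    period_ec: int   # period, in elementary cycles (ECs)
    offset_ec: int   # first EC in which the message is due
    tx_time_us: int  # transmission time on the link, in microseconds


def build_trigger_message(ec_index, messages, window_us):
    """Return the IDs a master would announce in the trigger message of EC `ec_index`.

    Assumptions: rate-monotonic priority (shorter period first) and a fixed
    per-EC window of `window_us` microseconds. The result depends only on the
    arguments, and ties are broken by msg_id, so it is replica-deterministic.
    """
    due = [m for m in messages
           if ec_index >= m.offset_ec and (ec_index - m.offset_ec) % m.period_ec == 0]
    due.sort(key=lambda m: (m.period_ec, m.msg_id))
    scheduled, used_us = [], 0
    for m in due:
        # Skip messages that do not fit in what remains of the window.
        if used_us + m.tx_time_us <= window_us:
            scheduled.append(m.msg_id)
            used_us += m.tx_time_us
    return scheduled


class MasterReplica:
    """Hypothetical master replica holding its own copy of the requirement table."""

    def __init__(self, name, window_us):
        self.name = name
        self.window_us = window_us
        self.messages = {}  # msg_id -> SyncMessage

    def apply_change(self, msg):
        # Assumed: both replicas receive the same changes (e.g. agreed over the interlink).
        self.messages[msg.msg_id] = msg

    def trigger_message(self, ec_index):
        return build_trigger_message(ec_index, self.messages.values(), self.window_us)


if __name__ == "__main__":
    table = [SyncMessage(1, 1, 0, 100), SyncMessage(2, 2, 0, 150), SyncMessage(3, 4, 1, 120)]
    a, b = MasterReplica("A", 300), MasterReplica("B", 300)
    for m in table:
        a.apply_change(m)
    for m in reversed(table):  # same changes, different storage order
        b.apply_change(m)
    for ec in range(4):
        assert a.trigger_message(ec) == b.trigger_message(ec)
        print(ec, a.trigger_message(ec))
    # 0 [1, 2]
    # 1 [1, 3]
    # 2 [1, 2]
    # 3 [1]
```

The sketch sidesteps the hard part the paper discusses: in a real replicated star, keeping the two requirement tables identical would itself require agreement between the replicas, which is where the interlink comes in.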