Hello everyone. My name is David Gessner and
I am going to present the paper titled "Towards a flexible
time-triggered replicated star for Ethernet".
The work presented in the paper is part of a project called
FT4FTT. The context of this project is
real-time distributed embedded systems.
Such systems are basically a set of
nodes interconnected by a communication
subsystem such that the nodes together with the
communication subsystem satisfy
certain real-time requirements. Previous
projects have made such systems adaptable by ensuring
that if the environment imposes
new real-time requirements, the system can
adapt to them. The goal of the new project is
to keep the system adaptable and functioning correctly even
in the presence of faults.
To achieve this goal, the project builds upon the FTT
communication paradigm, where FTT stands for
Flexible Time-Triggered communication.
FTT is based on master/multi-slave communication, where
one master controls multiple slaves
by means of a single periodic control message
called the trigger message. This
message tells each slave when and what it is allowed to
transmit. This is done while taking into account changing
real-time requirements and is therefore the main
mechanism by which FTT provides flexibility.
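To make the trigger message idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the FTT implementation: the names `Stream`, `Master`, `update_requirements`, `trigger_message` and `slave_response` are hypothetical, time is counted in abstract cycles, and the schedule is a plain period check rather than a real scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One message stream (hypothetical model, not taken from the paper)."""
    msg_id: int
    producer: str    # name of the slave that transmits this message
    period: int      # period, counted in cycles (assumed unit)


@dataclass
class Master:
    """Holds the current real-time requirements and builds trigger messages."""
    streams: dict = field(default_factory=dict)

    def update_requirements(self, stream: Stream) -> None:
        # Adding or replacing a stream models a change in requirements.
        self.streams[stream.msg_id] = stream

    def trigger_message(self, cycle: int) -> list:
        # The trigger message lists the messages allowed in this cycle.
        return sorted(
            s.msg_id for s in self.streams.values() if cycle % s.period == 0
        )


def slave_response(name: str, master: Master, trigger: list) -> list:
    """A slave transmits only its own messages that the trigger message names."""
    return [m for m in trigger if master.streams[m].producer == name]


if __name__ == "__main__":
    master = Master()
    master.update_requirements(Stream(1, "A", period=1))
    master.update_requirements(Stream(2, "B", period=2))
    print(master.trigger_message(0))  # [1, 2]
    print(master.trigger_message(1))  # [1]
    # The environment now requires message 2 in every cycle.
    master.update_requirements(Stream(2, "B", period=1))
    print(master.trigger_message(1))  # [1, 2]
```

The point of the sketch is that the slaves never need to be reconfigured: only the master's table changes, and the next trigger message already reflects the new requirements.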
Previous work has made such systems fault
tolerant, but it focused only on
tolerating master failures and
failures within the communication subsystem;
it did not take into account the failure of slaves.
FT4FTT, in contrast, focuses on the whole
system. This is particularly important because it has been
shown that node failures can have a significant impact on
the overall reliability. The design of FT4FTT can be
divided into two basic blocks:
the slave node subsystem and
the communication subsystem.
The focus of the paper I am presenting is the latter.
Despite previous efforts, we tackle the reliability of the
communication subsystem again for two main reasons.
First, this subsystem should now not only tolerate its own
failures, but also provide support for tolerating failures in the
other subsystem. Second, all previous efforts for FTT were
based on bus topologies, which have both reliability and
performance limitations.
The new approach is being designed to solve these issues.
Specifically, the architecture presented in the paper is based
on a switched Ethernet
implementation of FTT where the master is embedded within
the switch. Because this switch constitutes
a single point of failure, we propose to replicate it.
Now, for the slaves to be able to use the new replica, we
need to add links from each slave to the new replica.
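As a rough illustration of what a slave with one link to each switch replica might do, the following Python sketch accepts the first copy of each trigger message and discards the duplicate. This is an assumption made for the example, not a decision taken in the paper: the class `TriggerReceiver`, its `receive` method, and the idea that every trigger message carries a cycle counter are all hypothetical.

```python
class TriggerReceiver:
    """Slave-side filter for trigger messages arriving over two links.

    Hypothetical sketch: it assumes each trigger message carries a
    monotonically increasing cycle counter, which the source does not state.
    """

    def __init__(self):
        self.last_cycle = -1  # highest cycle number accepted so far

    def receive(self, port, cycle, payload):
        """Return the payload for the first copy of a cycle, None for duplicates.

        `port` identifies the link (0 or 1); it is not used for filtering,
        so either replica alone is enough to keep the slave running.
        """
        if cycle <= self.last_cycle:
            return None  # duplicate or stale copy from the other replica
        self.last_cycle = cycle
        return payload


if __name__ == "__main__":
    rx = TriggerReceiver()
    print(rx.receive(port=0, cycle=0, payload=[1, 2]))  # [1, 2]
    print(rx.receive(port=1, cycle=0, payload=[1, 2]))  # None
    # Suppose the switch on port 0 fails: port 1 alone still delivers cycle 1.
    print(rx.receive(port=1, cycle=1, payload=[1]))     # [1]
```

Accepting the first copy only makes sense if both replicas send identical trigger messages for the same cycle.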
The next step is to make it easy for a surviving replica to replace a faulty one. This is achieved
by providing replica determinism, which means ensuring that both replicas produce the same
results as long as they are non-faulty. To achieve replica determinism, some form
of communication between the replicas is usually
required. The paper therefore proposes an interlink between the two switches,
which is also replicated to provide further fault
tolerance. This is then the basic
architecture presented in the paper for the communication
subsystem of FT4FTT. Given this basic architecture,
the paper discusses some ideas and alternatives for
certain design decisions. For example, it explores
how exactly to ensure replica determinism,
how best to take advantage of the channel replication when
transmitting trigger messages, what services to provide to
the slaves to make it easier for them to tolerate each other's
failures, and several other design
ideas. No final decisions have been
made for any of these ideas. Deciding the exact details remains future
work and is one of the next steps we will carry out. This concludes my
presentation. Thank you very much for your attention.