RTLinux-Based Railway Interlocking Execution Layer Design
Introduction #
Railway interlocking systems are safety-critical infrastructure responsible for preventing conflicting train movements. As these systems have evolved from mechanical to computer-based implementations, the requirements for real-time performance, fault tolerance, and reliability have risen significantly.
This article presents the design of an execution layer for a computer interlocking system built on Linux with the RTLinux real-time extension. The design focuses on deterministic scheduling, robust device drivers, dual-CPU redundancy, and fault-tolerant mechanisms suitable for railway signaling environments.
System Requirements Analysis #
Basic Functional Requirements #
A complete computer interlocking system consists of:
- Upper-level control system
- Interlocking execution unit
- Indication/display subsystem
Core requirements include:
- Reliable hardware I/O drivers
- Deterministic real-time execution
- Strong fault tolerance and safety guarantees
Real-Time Constraints #
Standard Linux lacks deterministic interrupt handling and scheduling guarantees. RTLinux addresses this by inserting a real-time layer between hardware interrupts and the Linux kernel:
- Real-time tasks preempt Linux entirely
- Interrupts are first handled by RTLinux
- Linux runs as a low-priority task
This ensures that critical interlocking operations meet strict timing constraints.
Hardware Device Driver Design #
CPU Board Driver #
The CPU subsystem consists of dual modules (A/B) operating in a master-slave configuration.
Key Design Points #
- Master CPU handles synchronization and coordination
- Communication via PC/104, Ethernet, serial, and USB
- FPGA manages data exchange and interrupt signaling
Data Flow #
- Outgoing data: ISA bus → FPGA → target RAM
- Incoming data: stored in dual-port RAM (DPRAM)
Concurrency Control #
- Kernel APIs: copy_from_user, copy_to_user
- Read-write locks (rw_lock) ensure safe access
This design enables deterministic and synchronized data exchange between CPUs.
I/O Communication Board Driver #
Each I/O board includes:
- Dual FPGAs (XC6SLX9)
- SRAM extensions
- Independent power for redundancy
Write Operation #
- Check bus availability
- Select target board
- Write data
- Set completion flag and release bus
Read Operation #
- Access shared DPRAM
- Validate FPGA status
- Read via PC/104 interface
- Update flags and reset state
This ensures consistent data visibility across redundant CPUs.
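The write and read sequences above can be sketched against a simulated board. This is a minimal illustration in plain C, not the actual driver: the structures `io_bus_t` and `io_board_t`, the flag names, the board count, and the buffer size are all hypothetical, and real hardware access over PC/104 is replaced by ordinary memory copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_BOARDS  4u   /* hypothetical board count */
#define IO_BUF_LEN 16u  /* hypothetical DPRAM window size in bytes */

/* Simulated per-board state; field names are illustrative only. */
typedef struct {
    bool fpga_ready;            /* FPGA status bit */
    bool write_done;            /* completion flag set by the writer */
    bool data_pending;          /* unread data is waiting in DPRAM */
    uint8_t dpram[IO_BUF_LEN];  /* stands in for the shared DPRAM window */
} io_board_t;

typedef struct {
    bool busy;                  /* bus is owned by a transfer */
    io_board_t boards[IO_BOARDS];
} io_bus_t;

/* Returns 0 on success, -1 on bad arguments, -2 if the bus is busy. */
int io_write(io_bus_t *bus, size_t target, const uint8_t *src, size_t len)
{
    assert(bus != NULL && src != NULL);
    if (target >= IO_BOARDS || len > IO_BUF_LEN) return -1;
    if (bus->busy) return -2;                 /* 1. check bus availability */
    bus->busy = true;
    io_board_t *b = &bus->boards[target];     /* 2. select target board */
    memcpy(b->dpram, src, len);               /* 3. write data */
    b->data_pending = true;
    b->write_done = true;                     /* 4. set completion flag */
    bus->busy = false;                        /*    and release the bus */
    return 0;
}

/* Returns 0 on success, -1 on bad arguments, -3 if no valid data is ready. */
int io_read(io_bus_t *bus, size_t target, uint8_t *dst, size_t len)
{
    assert(bus != NULL && dst != NULL);
    if (target >= IO_BOARDS || len > IO_BUF_LEN) return -1;
    io_board_t *b = &bus->boards[target];     /* 1. access shared DPRAM */
    if (!b->fpga_ready || !b->data_pending)   /* 2. validate FPGA status */
        return -3;
    memcpy(dst, b->dpram, len);               /* 3. read (PC/104 on real hardware) */
    b->data_pending = false;                  /* 4. update flags and reset state */
    b->write_done = false;
    return 0;
}
```

Clearing `data_pending` after a read means a second read of the same data is rejected, which is one simple way to keep both CPUs from consuming a stale frame.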
Device-Level Safety #
Hardware reliability is critical in railway environments:
- Designed to withstand extreme conditions (weather, electrical noise)
- Failure rate analysis considers environmental and electrical stress factors
These measures provide a stable foundation for upper-layer logic.
Software Execution Layer Architecture #
Overall Framework #
The execution layer is composed of:
- Initialization Module
- Interlocking Scheduling Module
- Dual-CPU Control Module
- Fault Diagnosis Module
This layered structure separates responsibilities and improves maintainability.
Core Functional Modules #
Initialization Module #
- Performs hardware and software initialization
- Loads configuration and station data
- Prepares runtime environment
Interlocking Scheduling Module #
- Uses a fixed 10 ms cycle
- Timeout threshold: 250 ms
- Responsibilities:
- Task scheduling
- Clock synchronization
- Role determination (master/slave)
- Output execution or failover
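The timing arithmetic behind a fixed cycle can be sketched as follows. The 10 ms and 250 ms values come from the list above; the function names are hypothetical, and the reading of the 250 ms threshold as a limit on how long one cycle may run before failover is an assumption, since the text does not say what the timeout is measured against. Times are plain nanosecond counters so the logic can be checked without a real-time clock.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLE_NS    10000000LL   /* 10 ms cycle, from the text */
#define TIMEOUT_NS 250000000LL   /* 250 ms timeout, from the text */

typedef enum { CYCLE_OK, CYCLE_OVERRUN, CYCLE_TIMEOUT } cycle_status_t;

/* First cycle boundary strictly after now_ns. Boundaries are anchored to
 * start_ns, so a late wake-up does not accumulate drift. */
int64_t next_release_ns(int64_t start_ns, int64_t now_ns)
{
    assert(now_ns >= start_ns);
    int64_t elapsed = now_ns - start_ns;
    return start_ns + (elapsed / CYCLE_NS + 1) * CYCLE_NS;
}

/* Classify one cycle by how long it ran after its release time.
 * Assumption: exceeding TIMEOUT_NS is what triggers failover. */
cycle_status_t classify_cycle(int64_t release_ns, int64_t finish_ns)
{
    int64_t used = finish_ns - release_ns;
    if (used > TIMEOUT_NS) return CYCLE_TIMEOUT;
    if (used > CYCLE_NS)   return CYCLE_OVERRUN;
    return CYCLE_OK;
}
```

In a real-time thread the value returned by `next_release_ns` would be handed to an absolute-time sleep so that each cycle starts on a boundary regardless of how long the previous one took.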
Dual-CPU Control Module #
Implements “2-out-of-2” redundancy logic.
Synchronization #
- Multiple sync points per cycle
- Shared data via DPRAM
- Real-time clock alignment
Data Comparison #
- Input data validated via CRC
- Output requires full agreement between CPUs
- Sequence numbers and timestamps ensure consistency
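A minimal sketch of the comparison step is shown below. The text does not name the CRC variant, so CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is assumed here purely for illustration. The `frame_t` layout and the function names are hypothetical, and only the sequence number is compared; a timestamp check would follow the same pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 8u  /* hypothetical payload size in bytes */

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR.
 * The choice of this variant is an assumption, not taken from the design. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Illustrative frame exchanged between the two CPUs. */
typedef struct {
    uint32_t seq;                 /* sequence number of the cycle */
    uint8_t  payload[FRAME_LEN];  /* input or output image */
    uint16_t crc;                 /* CRC over payload */
} frame_t;

/* Input validation: the stored CRC must match the payload. */
bool input_valid(const frame_t *f)
{
    return crc16_ccitt(f->payload, FRAME_LEN) == f->crc;
}

/* Output agreement: same cycle and bit-identical payload on both CPUs. */
bool outputs_agree(const frame_t *a, const frame_t *b)
{
    return a->seq == b->seq &&
           memcmp(a->payload, b->payload, FRAME_LEN) == 0;
}
```

Requiring equal sequence numbers before comparing payloads prevents a result from one cycle being accepted against a result from a different cycle that happens to match.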
State Management #
- Heartbeat every 50 ms
- Timeout threshold: 200 ms
- Missing heartbeat triggers fault handling
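The heartbeat check can be sketched with a millisecond tick counter. The 50 ms period and 200 ms timeout are taken from the list above; the `hb_monitor_t` type and function names are hypothetical, and the tick source is assumed to be a free-running 32-bit counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEARTBEAT_PERIOD_MS   50u   /* from the text */
#define HEARTBEAT_TIMEOUT_MS 200u   /* from the text */

/* Tracks when the peer CPU was last heard from. */
typedef struct {
    uint32_t last_seen_ms;
} hb_monitor_t;

/* Call whenever a heartbeat arrives from the peer. */
void hb_record(hb_monitor_t *m, uint32_t now_ms)
{
    assert(m != NULL);
    m->last_seen_ms = now_ms;
}

/* Number of whole heartbeat periods that have passed since the last beat.
 * Unsigned subtraction stays correct when the tick counter wraps. */
uint32_t hb_missed(const hb_monitor_t *m, uint32_t now_ms)
{
    return (uint32_t)(now_ms - m->last_seen_ms) / HEARTBEAT_PERIOD_MS;
}

/* True once the silence exceeds the timeout; the caller starts fault handling. */
bool hb_expired(const hb_monitor_t *m, uint32_t now_ms)
{
    return (uint32_t)(now_ms - m->last_seen_ms) > HEARTBEAT_TIMEOUT_MS;
}
```

With these values the peer is declared failed only after four consecutive heartbeats have been missed, which tolerates an occasional late beat.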
Fault Diagnosis Module #
- Polling interval: 50 ms
- Detects:
- Initialization failures
- CPU halt conditions
- Communication loss
- Output inconsistencies
Fault Handling #
- Critical faults → system halt + standby switch
- Minor faults → warning indicators
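One way to express this policy is a table that maps each detected fault to an action, with the most severe action winning when several faults are present. The sketch below is illustrative only: the enum names are hypothetical, and the assignment of each fault type to "critical" or "minor" is an assumption, because the text does not state which of the four fault types falls into which class.

```c
#include <assert.h>
#include <stdint.h>

/* The four fault types listed in the text. */
typedef enum {
    FAULT_INIT_FAILURE,
    FAULT_CPU_HALT,
    FAULT_COMM_LOSS,
    FAULT_OUTPUT_MISMATCH,
    FAULT_COUNT
} fault_t;

/* Actions ordered by severity so they can be compared. */
typedef enum {
    ACTION_NONE = 0,
    ACTION_WARN = 1,             /* minor fault: warning indicator */
    ACTION_HALT_AND_SWITCH = 2   /* critical fault: halt and switch to standby */
} action_t;

/* ASSUMED classification, for illustration only. */
static const action_t fault_policy[FAULT_COUNT] = {
    [FAULT_INIT_FAILURE]    = ACTION_HALT_AND_SWITCH,
    [FAULT_CPU_HALT]        = ACTION_HALT_AND_SWITCH,
    [FAULT_COMM_LOSS]       = ACTION_WARN,
    [FAULT_OUTPUT_MISMATCH] = ACTION_HALT_AND_SWITCH,
};

/* fault_mask has bit f set when fault f was detected in this polling pass.
 * Returns the most severe action required by any detected fault. */
action_t diagnose(uint32_t fault_mask)
{
    assert((fault_mask >> FAULT_COUNT) == 0u);  /* no unknown fault bits */
    action_t worst = ACTION_NONE;
    for (int f = 0; f < FAULT_COUNT; f++) {
        if ((fault_mask & (1u << f)) && fault_policy[f] > worst) {
            worst = fault_policy[f];
        }
    }
    return worst;
}
```

Keeping the classification in a single table makes the policy easy to review and change without touching the polling loop.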
Fault Tolerance Mechanism #
A synchronous controller ensures real-time consistency:
- Continuous data monitoring
- Dual-machine synchronization via RTLinux clock
- Transient fault filtering via self-check routines
Processing flow:
- Data acquisition
- Dual-CPU computation
- 2×2 comparison
- Output or recomputation
Persistent mismatches trigger fault handling procedures.
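The "output or recomputation" decision, together with the filtering of transient faults, can be sketched as a small counter. The threshold of three consecutive mismatches is an assumed value, since the text does not say how many mismatches count as persistent, and the type and function names are hypothetical. Each CPU's result is reduced to a single 32-bit word for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONSECUTIVE_MISMATCHES 3u  /* assumed threshold for "persistent" */

typedef enum { CMP_OUTPUT, CMP_RECOMPUTE, CMP_FAULT } cmp_result_t;

/* Counts mismatches in a row; reset to zero on every agreement. */
typedef struct {
    unsigned mismatches;
} cmp_filter_t;

/* Compare the two CPUs' results for one cycle and decide what to do next. */
cmp_result_t cmp_step(cmp_filter_t *f, uint32_t out_a, uint32_t out_b)
{
    assert(f != NULL);
    if (out_a == out_b) {
        f->mismatches = 0;           /* agreement clears any transient history */
        return CMP_OUTPUT;
    }
    f->mismatches++;
    if (f->mismatches >= MAX_CONSECUTIVE_MISMATCHES) {
        return CMP_FAULT;            /* persistent mismatch: fault handling */
    }
    return CMP_RECOMPUTE;            /* possibly transient: try again */
}
```

A single disagreement therefore costs one recomputation, while only a run of disagreements escalates to fault handling.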
Dual-Machine Hot Standby Switching #
The system uses an active-standby architecture:
- Primary (A) handles active control
- Backup (B) monitors and mirrors state
Switching Logic #
- Backup continuously compares outputs
- On primary failure, backup takes over seamlessly
- Global variable (work_cpu) ensures exclusive control
This design is intended to keep control uninterrupted when the primary fails.
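A sketch of how a single ownership variable can enforce exclusive control is given below. Only the name `work_cpu` comes from the text; the use of a C11 atomic compare-and-exchange, the `CPU_A`/`CPU_B` constants, and the function names are assumptions made for illustration. In a real dual-machine system the variable would have to live in storage visible to both CPUs, such as the DPRAM, rather than in ordinary process memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { CPU_A = 0, CPU_B = 1 };

/* Identifies the CPU that currently owns the outputs. Primary (A) starts active. */
atomic_int work_cpu = CPU_A;

/* Only the owner may drive outputs. */
bool may_drive_outputs(int cpu)
{
    return atomic_load(&work_cpu) == cpu;
}

/* Hand control from failed_cpu to new_cpu. The exchange succeeds only if
 * failed_cpu is still the owner, so two takeover attempts cannot both win. */
bool take_over(int failed_cpu, int new_cpu)
{
    assert(new_cpu == CPU_A || new_cpu == CPU_B);
    int expected = failed_cpu;
    return atomic_compare_exchange_strong(&work_cpu, &expected, new_cpu);
}
```

Because ownership changes in a single atomic step, there is no instant at which both CPUs, or neither, are considered the active one.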
System Security Considerations #
Beyond redundancy, additional safeguards include:
- Network security mechanisms
- Data protection strategies
- Antivirus and firewall integration
- Regular backups and disaster recovery
These measures enhance system resilience against both faults and external threats.
System Testing and Validation #
Test Environment #
- Industrial PCs for control and display
- Interlocking host system
- Ethernet-based communication
- PCI monitoring tools
Validation Metrics #
- Cycle timing accuracy
- Data synchronization
- Heartbeat stability
- I/O consistency
Results #
- Stable operation under real-time constraints
- Successful synchronization across CPUs
- Reliable failover behavior
In these tests, the system met its functional and safety requirements.
Key Design Insights #
- RTLinux provides the deterministic execution that standard Linux lacks
- Dual-CPU architecture improves fault tolerance
- DPRAM enables efficient synchronization
- Layered software design simplifies system evolution
- Hot standby switching ensures high availability
Conclusion #
This RTLinux-based execution layer design demonstrates a robust approach to building safety-critical railway interlocking systems. By combining deterministic real-time scheduling, reliable device drivers, and comprehensive fault tolerance mechanisms, the system achieves high reliability and availability.
The architecture provides a practical reference for modern railway signaling systems and other safety-critical embedded applications. Future improvements may focus on hardware durability, communication protocol optimization, and long-term software maintainability.