Network traffic analysis has always been critical for maintaining security and efficiency in modern networks. However, with the rise of encryption protocols and high-speed infrastructures, traditional approaches often fail to provide the necessary insights. My research introduces a novel methodology for classifying network traffic based on Single Flow Time Series (SFTS) analysis, achieving unprecedented accuracy and universality.

The Challenge of Modern Traffic Analysis

With the increasing prevalence of encryption (e.g., TLS 1.3, DNS over HTTPS), traditional Deep Packet Inspection (DPI) methods are often ineffective. Flow-based monitoring emerged as a powerful alternative, aggregating packets into flows based on shared properties like IP addresses and transport ports. However, conventional flows provide limited information, often relying on simple statistics such as packet or byte counts. This simplification leads to significant information loss.

To overcome these limitations, my approach enhances flow data by analyzing time series of packets within a single flow, focusing on payload sizes and their transmission times. This method unlocks hidden patterns in the data, even for encrypted traffic.

Introducing Single Flow Time Series (SFTS)

An SFTS represents the sequence of packet payload sizes within a single flow, accompanied by timestamps. Unlike traditional evenly spaced time series, SFTS handles unevenly spaced data, capturing the natural irregularities of network traffic. This representation allows for more accurate and detailed analysis without the need for arbitrary aggregation windows.

Feature Extraction

We derived a comprehensive set of 69 features from the SFTS, grouped into the following categories:

  1. Statistical Features: Metrics like mean, variance, kurtosis, and skewness, describing the distribution of packet sizes.
  2. Time-based Features: Metrics such as time differences, duration, and temporal distribution, capturing packet timing behavior.
  3. Frequency-based Features: Derived from the Lomb-Scargle Periodogram, revealing periodicity and spectral properties.
  4. Distribution-based Features: Metrics assessing the spread and clustering of packet sizes.
  5. Behavioral Features: Attributes capturing unique flow dynamics, such as switching ratios and transient bursts.

These features provide a rich representation of each flow, enabling high-accuracy classification across a variety of tasks.

Applications in Network Traffic Classification

Our methodology was evaluated across 15 well-known datasets and 23 classification tasks, including:

  • Binary Classification: Detecting threats like botnets, cryptominers, and DDoS attacks.
  • Multiclass Classification: Differentiating between traffic types like IoT malware, VPNs, and Tor.

Key Results

  • Binary Classification: Achieved near-perfect accuracy (up to 99.9%) on tasks like botnet and DoH detection.
  • Multiclass Classification: Outperformed state-of-the-art methods in classifying IoT malware and VPN traffic.
  • Efficiency: The feature set was designed for online computation, enabling deployment in high-speed networks without excessive resource consumption.

Deployment in High-Speed Networks

The SFTS-based features were implemented as part of the NetTiSA (Network Time Series Analyzed) flow, optimized for real-time operation. The system processes flows at speeds up to 100 Gbps, making it suitable for large-scale deployments, such as ISP networks. By reducing the computational complexity of time-series analysis, we ensure scalability without compromising accuracy.

Conclusion and Future Work

Our SFTS-based approach represents a significant leap forward in network traffic classification. By leveraging advanced time-series analysis techniques, we provide a robust solution for classifying both encrypted and unencrypted traffic with minimal resource requirements.

Future Directions

  1. Enhanced Feature Optimization: Further reducing the feature set for bandwidth-constrained environments.
  2. Real-time Threat Detection: Applying SFTS features to real-time anomaly detection in high-speed networks.
  3. Extended Use Cases: Adapting the methodology for emerging traffic patterns and protocols.

Feel free to explore our datasets and implementation details, which are publicly available on Zenodo 1 and Zenodo 1.