Review of a Deep Learning Approach for Sleep Analysis

Posted by Mohamad Ivan Fanany


This writing summarizes and reviews a paper on deep learning for sleep analysis: Sleep Stage Classification Using Unsupervised Feature Learning

Source code: Matlab code used in the paper is available at

Motivations:

  • Multimodal sleep data is very complex.
  • Feature extraction of sleep data is difficult and time consuming.
  • The size of the feature space can grow, which ultimately needs feature selection.
  • Unsupervised feature learning and in particular deep learning [10, 11, 12, 13, 14, 15] propose ways for training the weight matrices in each layer in an unsupervised fashion as a pre-processing step before training the whole network.
  • Deep Learning has proven to give good results in other areas such as vision tasks [10], object recognition [16], motion capture data [17], speech recognition [18], and bacteria identification [19].

Challenges:

  • How to isolate features in multivariate time-series data that can be used to correctly identify sleep stages and automate the annotation process for generating sleep hypnograms.
  • The absence of universally applicable features for training a sleep stage classifier requires a two-stage process: feature extraction and feature selection [1, 2, 3, 4, 5, 6, 7, 8, 9].
  • Inconsistencies between sleep labs (equipment, electrode placement), experimental setups (number of signals and categories, subject variations), and interscorer variability (80% conformance for healthy patients and even less for patients with sleep disorder [9]) make it challenging to compare sleep stage classification accuracy to previous works.

Potential benefits:

  • The discovery of new useful feature representations that a human expert might not be aware of, which in turn could lead to a better understanding of the sleep process and present a way of exploiting massive amounts of unlabeled data.
  • Unsupervised feature learning not only removes the need for domain-specific expert knowledge, but also inherently provides tools for anomaly detection and noise redundancy.

Addressed problem:

  • Build an unsupervised feature learning architecture which can eliminate the use of handmade features in sleep analysis.

Previous works:

  • The proposed architecture of training the DBN follows previous work with unsupervised feature learning for electroencephalography (EEG) event detection [20].
  • Results in [2] report a best result accuracy of around 61% for classification of 5 stages from a single EEG channel using GOHMM and AR coefficients as features.
  • Works by [8] achieved 83.7% accuracy using conditional random fields with six power spectra density features for one EEG signal on four human subjects during a 24-hour recording session and considering six stages.
  • Works by [7] achieved 85.6% accuracy on artifact-free, two expert agreement sleep data from 47 mostly healthy subjects using 33 features with SFS feature selection and four separately trained neural networks as classifiers.

Key ideas:

  • An alternative to using hand-tailored features derived from expert knowledge is to apply unsupervised feature learning techniques for learning the feature representations from unlabeled data.
  • The main focus is to learn meaningful feature representations from unlabeled sleep data.
  • EEG, EOG, and EMG records are segmented and used to train a deep belief network (DBN), using no prior knowledge.
  • Integrating a hidden Markov model (HMM) and comparing classification accuracy with a feature-based approach that uses prior knowledge.
  • The inclusion of an HMM post-processing is to:
    • Improve the capture of more realistic sleep stage switching, for example by suppressing excessive or unlikely sleep stage transitions.
    • Infuse the human experts' knowledge into the system.
  • Even though the classifier is trained using labeled data, the feature representations are learned from unlabeled data.
  • The paper also presents a study of anomaly detection with the application to home environment data collection.
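The HMM post-processing idea can be sketched as standard Viterbi decoding over per-epoch class scores with a "sticky" transition prior that suppresses improbable stage switches. The numbers below are toy illustrative assumptions, not values from the paper:

```python
import numpy as np

def viterbi_smooth(emission_logprob, trans_logprob, init_logprob):
    """Most likely stage sequence given per-epoch class log-scores and a
    transition prior (standard Viterbi decoding)."""
    T, S = emission_logprob.shape
    delta = np.empty((T, S))            # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = init_logprob + emission_logprob[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans_logprob  # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + emission_logprob[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# toy example: 2 stages; a one-epoch blip toward stage 1 is smoothed away
em = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
tr = np.log(np.array([[0.95, 0.05], [0.05, 0.95]]))  # "sticky" transitions
init = np.log(np.array([0.5, 0.5]))
smoothed = viterbi_smooth(em, tr, init)
```

Per-epoch argmax alone would pick stage 1 at the middle epoch; the transition prior overrides the blip, which is exactly the smoothing effect the authors want from the HMM.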

Network architecture:

  • Deep belief networks (DBN).
  • A DBN is formed by stacking a user-defined number of RBMs on top of each other where the output from a lower-level RBM is the input to a higher-level RBM.
  • The main difference between a DBN and a multilayer perceptron is the inclusion of a bias vector for the visible units, which is used to reconstruct the input signal and plays an important role in the way DBNs are trained.
  • A reconstruction of the input can be obtained from the unsupervised pretrained DBN by encoding the input to the top RBM and then decoding the state of the top RBM back to the lowest level.
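The stack-and-reconstruct idea can be sketched with a minimal numpy RBM trained by one-step contrastive divergence (CD-1). The layer sizes, learning rate, and random training data here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)  # visible bias: enables reconstruction
        self.b_hid = np.zeros(n_hid)
        self.lr = lr

    def hid_prob(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def vis_prob(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1(self, v0):
        """One step of contrastive divergence on a batch v0."""
        h0 = self.hid_prob(v0)
        h0_s = (rng.random(h0.shape) < h0).astype(float)  # sample hidden
        v1 = self.vis_prob(h0_s)                          # reconstruct
        h1 = self.hid_prob(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += self.lr * (v0 - v1).mean(axis=0)
        self.b_hid += self.lr * (h0 - h1).mean(axis=0)

# greedy layer-wise pretraining of a 2-layer DBN on stand-in binary data
X = (rng.random((200, 16)) < 0.5).astype(float)
rbm1, rbm2 = RBM(16, 8), RBM(8, 4)
for _ in range(50):
    rbm1.cd1(X)
H1 = rbm1.hid_prob(X)          # lower RBM's output feeds the upper RBM
for _ in range(50):
    rbm2.cd1(H1)

# reconstruction: encode up to the top RBM, then decode back down
top = rbm2.hid_prob(H1)
recon = rbm1.vis_prob(rbm2.vis_prob(top))
```

The final two lines mirror the encode-then-decode path described above; the reconstruction error between `recon` and `X` is what the paper later reuses for anomaly detection.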

Dataset:

  • Two datasets of electroencephalography (EEG) records of brain activity, electrooculography (EOG) records of eye movements, and electromyography (EMG) records of skeletal muscle activity.
    • The first consists of 25 acquisitions and is used to train and test the automatic sleep stager.
    • The second consists of 5 acquisitions and is used to validate anomaly detection on sleep data collected at home.
  • Benchmark Dataset. Provided by St. Vincent’s University Hospital and University College Dublin, which can be downloaded from PhysioNet [29].
  • Home Sleep Dataset. PSG data of approximately 60 hours (5 nights) was collected at a healthy patient’s home using an Embla Titanium PSG. A total of 8 electrodes were used: EEG C3, EEG C4, EOG left, EOG right, 2 electrodes for the EMG channel, reference electrode, and ground electrode.

Preprocessing:

  • Notch filtering at 50 Hz to cancel out power line disturbances, then downsampling to 64 Hz after prefiltering with a band-pass filter of 0.3 to 32 Hz for EEG and EOG, and 10 to 32 Hz for EMG.
  • Each epoch before and after a sleep stage switch is removed from the training set to avoid possible subsections of mislabeled data within one epoch. This resulted in 20.7% of the total training samples being removed.
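Assuming a raw sampling rate of 256 Hz (the paper does not state one here), the notch-bandpass-downsample chain might look like this in scipy:

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt, decimate

fs_in, fs_out = 256, 64  # assumed raw rate; 64 Hz target as in the paper

def preprocess(x, band, fs=fs_in):
    # 50 Hz notch to cancel power line disturbances
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = filtfilt(b, a, x)
    # band-pass prefilter (0.3-32 Hz for EEG/EOG, 10-32 Hz for EMG)
    b, a = butter(4, band, btype="bandpass", fs=fs)
    x = filtfilt(b, a, x)
    # downsample to 64 Hz (decimate applies its own anti-alias filter)
    return decimate(x, fs // fs_out)

# synthetic EEG: a 10 Hz rhythm plus 50 Hz mains interference
t = np.arange(0, 10, 1 / fs_in)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = preprocess(eeg, band=(0.3, 32.0))
```

After the chain, the 10 Hz component survives while the mains component is removed, and the signal is at the 64 Hz rate the DBN input expects.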

Experiment setup:

  • The five sleep stages that are at focus are:
    • Awake,
    • Stage 1 (S1),
    • Stage 2 (S2),
    • Slow wave sleep (SWS),
    • Rapid eye-movement sleep (REM).
  • These stages come from a unified method for classifying an 8 h sleep recording introduced by Rechtschaffen and Kales (R&K) [22].
  • The goal of this work is not to replicate the R&K system or improve current state-of-the-art sleep stage classification but rather to explore the advantages of deep learning and the feasibility of using unsupervised feature learning applied to sleep data.
  • Therefore, the main method of evaluation is a comparison with a feature-based shallow model.
  • Even though the goal in this work is not to replicate the R&K system, its terminology is used for evaluation of the proposed architecture.
  • A graph that shows these five stages over an entire night is called a hypnogram, and each epoch according to the R&K system is either 20 s or 30 s.
  • While the R&K system brings consensus on terminology, among other advantages [23], it has been criticized for a number of issues [24].
  • Each channel of the data in the proposed study is divided into segments of 1 second with zero overlap, which is a much higher temporal resolution than the one practiced by the R&K system.
  • The paper uses and compares three setups for an automatic sleep stager:
    1. feat-GOHMM: a shallow method that uses prior knowledge.
    2. feat-DBN: a deep architecture that also uses prior knowledge.
    3. raw-DBN: a deep architecture that does not use any prior knowledge.
  • feat-GOHMM:
    • A Gaussian observation hidden Markov model (GOHMM) is used on 28 handmade features;
    • Feature selection is done by sequential backward selection (SBS), which starts with the full set of features and greedily removes a feature after each iteration step.
    • A principal component analysis (PCA) with five principal components is used after feature selection, followed by a Gaussian mixture model (GMM) with five components.
    • Initial mean and covariance values for each GMM component are set to the mean and covariance of annotated data for each sleep stage.
    • The output from the GMM is used as input to a hidden Markov model (HMM) [25].
  • feat-DBN:
    • A 2-layer DBN with 200 hidden units in both layers and a softmax classifier attached on top is used on 28 handmade features.
    • Both layers are pretrained for 300 epochs, and the top layer is fine-tuned for 50 epochs. Initial biases of the hidden units are set empirically to −4 to encourage sparsity [26], which prevents learning trivial or uninteresting feature representations.
    • Scaling to values between 0 and 1 is done by subtracting the mean, dividing by the standard deviation, and finally adding 0.5.
  • raw-DBN:
    • A DBN with the same parameters as feat-DBN is used on preprocessed raw data.
    • Scaling is done by saturating the signal at a saturation constant sat_channel, then dividing by 2 · sat_channel, and finally adding 0.5. The saturation constants were set to sat_EEG = sat_EOG = ±60 μV and sat_EMG = ±40 μV.
    • Input consisted of the concatenation of EEG, EOG1, EOG2, and EMG. With four signals, a 1 second window, and 64 samples per second, the input dimension is 4 × 64 = 256.
  • Anomaly detection for Home Sleep data:
    • Anomaly detection is evaluated by training a DBN and calculating the root mean square error (RMSE) from the reconstructed signal from the DBN and the original signal.
    • A faulty signal in one channel often affects other channels for sleep data, such as movement artifacts, blink artifacts, and loose reference or ground electrode. Therefore, a detected fault in one channel should label all channels at that time as faulty.
    • All signals, except EEG2, are nonfaulty prior to a movement artifact at t = 7 s. This movement affected the reference electrode or the ground electrode, resulting in disturbances in all signals for the rest of the night, thereby rendering the signals unusable by a clinician. A poorly attached electrode was the cause for the noise in signal EEG2.
    • Previous approaches to artifact rejection in EEG analysis range from simple thresholding on abnormal amplitude and/or frequency to more complex strategies for detecting individual artifacts [27, 28].
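The raw-DBN saturation scaling and the RMSE-based anomaly score can be sketched together as follows. The DBN reconstruction is simulated here with noisy copies of the signal, so the "good" and "bad" reconstructions are illustrative assumptions:

```python
import numpy as np

# saturation constants from the paper (±60 μV for EEG/EOG, ±40 μV for EMG)
SAT = {"eeg": 60.0, "eog": 60.0, "emg": 40.0}

def scale_raw(x, channel):
    """Clip at ±sat_channel, then map to [0, 1]: x/(2·sat) + 0.5."""
    sat = SAT[channel]
    x = np.clip(x, -sat, sat)
    return x / (2.0 * sat) + 0.5

def rmse(x, x_hat):
    """Root mean square error between a signal and its reconstruction."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

rng = np.random.default_rng(1)
signal = scale_raw(rng.normal(0, 20, 64), "eeg")   # one 1 s EEG window (64 Hz)
recon_good = signal + rng.normal(0, 0.01, 64)      # stand-in for a DBN reconstruction
recon_bad = signal + rng.normal(0, 0.3, 64)        # poor reconstruction -> anomaly
```

A window whose reconstruction RMSE exceeds some threshold would be flagged as faulty, and, following the paper's observation about shared references, that flag would be propagated to all channels at that time.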

Results:

  • The results using raw data with a deep architecture, such as the DBN, were comparable to a feature-based approach when validated on clinical datasets.
  • F1-scores of the three setups:
    • feat-GOHMM: 63.9 ± 10.8,
    • feat-DBN: 72.2 ± 9.7,
    • raw-DBN: 67.4 ± 12.9.
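Per-stage F1-scores of this kind can be computed with scikit-learn; the labels below are toy values for five stages, not the paper's data:

```python
import numpy as np
from sklearn.metrics import f1_score

# toy per-epoch labels for 5 sleep stages (0=Awake, 1=S1, 2=S2, 3=SWS, 4=REM)
y_true = np.array([0, 0, 1, 2, 2, 3, 4, 4, 1, 2])
y_pred = np.array([0, 0, 1, 2, 3, 3, 4, 1, 1, 2])

# macro averaging weights every stage equally, which matters because
# sleep stages are heavily imbalanced over a night
macro_f1 = f1_score(y_true, y_pred, average="macro")
```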

H/W, S/W and computation time:

  • On a Windows 7, 64-bit machine with a quad-core Intel i5 3.1 GHz CPU and an NVIDIA GeForce GTX 470 GPU using GPUmat, simulation times for feat-GOHMM, feat-DBN, and raw-DBN were approximately 10 minutes, 1 hour, and 3 hours per dataset, respectively.

Lessons learned:

  • Regarding the DBN parameter selection, it was noticed that setting the initial biases of the hidden units to −4 was important for achieving good accuracy.
  • A better way of encouraging sparsity is to include a sparsity penalty term in the cost function [31] instead of making a crude estimation of initial biases for the hidden units.
  • For the raw-DBN setup, it was also crucial to train each layer with a large number of epochs and in particular the fine tuning step.
  • Replacing HMM with conditional random fields (CRFs) could improve accuracy but is still a simplistic temporal model that does not exploit the power of DBNs [32].
  • While a clear advantage of using DBN is the natural way in which it deals with anomalous data, there are some limitations to the DBN:
    • The correlations between signals in the input data are not well captured. This gives a feature-based approach an advantage where, for example, the correlation between both EOG channels can easily be represented with a feature. This could be solved by either representing the correlation in the input or extending the DBN to handle such correlations, such as a cRBM [33].
    • It has been suggested for multimodal signals to train a separate DBN for each signal first and then train a top DBN with concatenated data [34]. This not only could improve classification accuracy, but also provide the ability to single out which signal contains the anomalous signal.
  • A lower performance was noticed if sleep stages were not set to equal sizes in the training set.
  • High variation in the accuracy between patients, even if they came from the same dataset.
  • An increase in the number of layers and hidden units did not significantly improve classification accuracy; rather, it often resulted in a significant increase in simulation time.
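One common form of the sparsity penalty mentioned above is a KL-divergence term between a target activation level and each hidden unit's mean activation; this sketch assumes that formulation (the target value 0.05 is an illustrative choice):

```python
import numpy as np

def sparsity_penalty(hidden_probs, target=0.05):
    """KL(target || mean activation) summed over hidden units -- a term
    added to the training objective to encourage sparse features."""
    p_hat = np.clip(hidden_probs.mean(axis=0), 1e-8, 1 - 1e-8)
    p = target
    return np.sum(p * np.log(p / p_hat)
                  + (1 - p) * np.log((1 - p) / (1 - p_hat)))

# a layer whose units fire ~50% of the time is penalized far more
# than one already matching the sparse target activation
dense = np.full((100, 20), 0.5)    # 100 samples x 20 hidden units
sparse = np.full((100, 20), 0.05)
```

Minimizing this term drives hidden units toward the target activation level, achieving directly what the −4 bias initialization only approximates.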

Future study:

  • The work has explored clinical datasets in close cooperation with physicians. Future work will concentrate on the application to at-home monitoring, where unsupervised feature learning is a highly promising method for sleep stage classification, since data is abundant and labels are costly to obtain.

My Review:

  • This is a very interesting paper that demonstrates deep learning gives better classification accuracy (even though the standard deviation is slightly higher) compared to shallow feature learning.
  • This paper also explains many interesting insights on how best to train a deep belief network for sleep stage analysis.
  • The paper also provides complete and valuable references, not only on deep learning but also on sleep stage analysis and scoring from clinical and machine learning perspectives.
  • Overall the report is very clear and comprehensive.
