{ "cells": [ { "cell_type": "markdown", "id": "5d037743", "metadata": { "id": "5d037743" }, "source": [ "# Exploring MIMIC-III" ] }, { "cell_type": "markdown", "id": "fbae8e9b", "metadata": { "id": "fbae8e9b" }, "source": [ "Let's begin by exploring data in the MIMIC-III Waveform Database.\n", "\n", "Our **objectives** are to:\n", "- Become familiar with the file structure.\n", "- Find out which signals are present in selected records and segments, and how long the signals last.\n", "- Search for records that contain signals of interest.\n", "- Load waveforms using the WFDB toolbox.\n", "- Plot one minute of signals from a segment of data.\n", "- Look more closely at the shape of the PPG pulse waves." ] }, { "cell_type": "markdown", "id": "0b240726", "metadata": { "id": "0b240726" }, "source": [ "
\n", "

Resource: You can find out more about the MIMIC-III Waveform Database here.

\n", "
" ] }, { "cell_type": "markdown", "id": "28b8e213", "metadata": { "id": "28b8e213" }, "source": [ "---\n", "## Setup" ] }, { "cell_type": "markdown", "id": "5dac032e", "metadata": { "id": "5dac032e" }, "source": [ "### Specify the required Python packages\n", "We'll import the following:\n", "- _sys_: an essential Python package\n", "- _pathlib_: specifically the _Path_ class, which represents file system paths" ] }, { "cell_type": "code", "execution_count": 188, "id": "ce3cdfde", "metadata": { "id": "ce3cdfde" }, "outputs": [], "source": [ "import sys\n", "from pathlib import Path" ] }, { "cell_type": "markdown", "id": "9976c5e4", "metadata": { "id": "9976c5e4" }, "source": [ "### Specify a particular version of the WFDB Toolbox" ] }, { "cell_type": "markdown", "id": "6533154b", "metadata": { "id": "6533154b" }, "source": [ "- _wfdb_: For this workshop we will be using version 4 of the WaveForm DataBase (WFDB) Toolbox package. The package contains tools for processing waveform data such as those found in MIMIC:" ] }, { "cell_type": "code", "execution_count": null, "id": "5fdfa989", "metadata": { "id": "5fdfa989" }, "outputs": [], "source": [ "!pip install wfdb==4.1.0\n", "import wfdb" ] }, { "cell_type": "markdown", "id": "e11ce5b6", "metadata": { "id": "e11ce5b6" }, "source": [ "
\n", "

Resource: You can find out more about the WFDB package here.

\n", "
" ] }, { "cell_type": "markdown", "id": "d492e49f", "metadata": { "id": "d492e49f" }, "source": [ "Now that we have imported these packages (_i.e._ toolboxes), we have a set of tools (functions) ready to use." ] }, { "cell_type": "markdown", "id": "e7d38297", "metadata": { "id": "e7d38297" }, "source": [ "### Specify the name of the MIMIC Waveform Database" ] }, { "cell_type": "markdown", "id": "68491718", "metadata": { "id": "68491718" }, "source": [ "- Specify the name of the MIMIC-III Matched Waveform Database on PhysioNet, which comes from the URL: https://physionet.org/content/mimic3wdb-matched/1.0/" ] }, { "cell_type": "code", "execution_count": 190, "id": "982b8154", "metadata": { "id": "982b8154" }, "outputs": [], "source": [ "database_name = 'mimic3wdb-matched/1.0'" ] }, { "cell_type": "markdown", "id": "e49196a6", "metadata": { "id": "e49196a6" }, "source": [ "---\n", "## Identify the records in the database" ] }, { "cell_type": "markdown", "id": "b476f9b7", "metadata": { "id": "b476f9b7" }, "source": [ "### Get a list of records\n", "\n", "- Use the [`get_record_list`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.get_record_list) function from the WFDB toolbox to get a list of records in the database." 
] }, { "cell_type": "code", "execution_count": 191, "id": "d91aa6a7", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d91aa6a7", "outputId": "db8e3169-76ac-4bdd-bbaa-91cf626c1a6b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done: Loaded list of 10282 ICU stays for 'mimic3wdb-matched/1.0' database\n" ] } ], "source": [ "icustay_records = wfdb.get_record_list(database_name)\n", "print(\"Done: Loaded list of {} ICU stays for '{}' database\".format(len(icustay_records), database_name))" ] }, { "cell_type": "markdown", "id": "143bd9e6", "metadata": {}, "source": [ "- Display the first few records" ] }, { "cell_type": "code", "execution_count": 192, "id": "66545c1d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First five ICU stays: ['p00/p000020/', 'p00/p000030/', 'p00/p000033/', 'p00/p000052/', 'p00/p000079/']\n" ] } ], "source": [ "print(\"First five ICU stays: {}\".format(icustay_records[0:5]))" ] }, { "cell_type": "markdown", "id": "4eecea37", "metadata": {}, "source": [ "Note the formatting of these records: each starts with an intermediate directory (\"p00\" in this case), followed by a record directory." ] }, { "cell_type": "markdown", "id": "7f9bb7cc", "metadata": {}, "source": [ "
Q: Can you print the names of the last five ICU stays?
Hint: in Python, the last five elements can be specified using '[-5:]'
" ] }, { "cell_type": "code", "execution_count": 193, "id": "0RzQmqjiQ9LD", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0RzQmqjiQ9LD", "outputId": "31eb6067-de92-4424-b32b-f292623215a5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reached maximum required number of records.\n", "Loaded 106 records from the 'mimic3wdb-matched/1.0' database.\n" ] } ], "source": [ "# iterate the subjects to get a list of records\n", "max_records_to_load = 100\n", "records = []\n", "for subject in icustay_records:\n", "    # stop if we've loaded enough records\n", "    if len(records) >= max_records_to_load:\n", "        print(\"Reached maximum required number of records.\")\n", "        break\n", "    studies = wfdb.get_record_list(f'{database_name}/{subject}')\n", "    for study in studies:\n", "        records.append(Path(f'{subject}{study}'))\n", "\n", "print(f\"Loaded {len(records)} records from the '{database_name}' database.\")" ] }, { "cell_type": "markdown", "id": "fc82d67e", "metadata": { "id": "fc82d67e" }, "source": [ "### Look at the records" ] }, { "cell_type": "markdown", "id": "29552f5a", "metadata": { "id": "29552f5a" }, "source": [ "- Display the first few records" ] }, { "cell_type": "code", "execution_count": 194, "id": "bb5745a7", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bb5745a7", "outputId": "8fe32e59-c542-4a40-bd06-0c04fdcfbbfe" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First five records: \n", " - p00/p000020/3544749_0001\n", " - p00/p000020/3544749_0002\n", " - p00/p000020/3544749_0003\n", " - p00/p000020/3544749_0004\n", " - p00/p000020/3544749_0005\n", "\n", "Note the formatting of these records:\n", " - intermediate directory ('p00' in this case)\n", " - icu stay identifier (e.g. 'p000020')\n", " - segment identifier (e.g. '3544749_0001'), consisting of the record (e.g. '3544749'), followed by the segment number (e.g. 
'0001').\n", " \n" ] } ], "source": [ "# format and print first five records\n", "first_five_records = [str(x) for x in records[2:7]] # ignored the first two which are numerics\n", "first_five_records = \"\\n - \".join(first_five_records)\n", "print(f\"First five records: \\n - {first_five_records}\")\n", "\n", "print(\"\"\"\n", "Note the formatting of these records:\n", " - intermediate directory ('p00' in this case)\n", " - icu stay identifier (e.g. 'p000020')\n", " - segment identifier (e.g. '3544749_0001'), consisting of the record (e.g. '3544749'), followed by the segment number (e.g. '0001').\n", " \"\"\")" ] }, { "cell_type": "markdown", "id": "b56c29d5", "metadata": { "id": "b56c29d5" }, "source": [ "
\n", "

Q: Can you print the names of the last five records?
Hint: in Python, the last five elements can be specified using '[-5:]'

\n", "
" ] }, { "cell_type": "markdown", "id": "d2a80895", "metadata": { "id": "d2a80895" }, "source": [ "## Identify records suitable for analysis" ] }, { "cell_type": "markdown", "id": "1a3218d3", "metadata": { "id": "1a3218d3" }, "source": [ "- The signals and their durations vary from one record (and segment) to the next. \n", "- Since most studies require specific types of signals (e.g. blood pressure and photoplethysmography signals), we need to be able to identify which records (or segments) contain the required signals and duration." ] }, { "cell_type": "markdown", "id": "b02c0b4e", "metadata": { "id": "b02c0b4e" }, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 195, "id": "5bb47556", "metadata": { "id": "5bb47556" }, "outputs": [], "source": [ "import pandas as pd\n", "from pprint import pprint" ] }, { "cell_type": "code", "execution_count": 196, "id": "95181681", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "95181681", "outputId": "544c69db-59d9-432c-ee6c-10e1b0f54318" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Earlier, we loaded 10282 records from the 'mimic3wdb-matched/1.0' database.\n" ] } ], "source": [ "print(f\"Earlier, we loaded {len(icustay_records)} records from the '{database_name}' database.\")" ] }, { "cell_type": "markdown", "id": "7f2b5955", "metadata": { "id": "7f2b5955" }, "source": [ "### Specify requirements" ] }, { "cell_type": "markdown", "id": "83f8611c", "metadata": { "id": "83f8611c" }, "source": [ "- Required signals" ] }, { "cell_type": "code", "execution_count": 197, "id": "3d1505ab", "metadata": { "id": "3d1505ab" }, "outputs": [], "source": [ "required_sigs = ['ABP', 'PLETH', 'II']" ] }, { "cell_type": "markdown", "id": "d9493d73", "metadata": {}, "source": [ "_Definitions:_\n", "- ABP - arterial blood pressure (i.e. 
blood pressure signal)\n", "- PLETH - photoplethysmogram (PPG) signal\n", "- II - lead II electrocardiogram (ECG) signal" ] }, { "cell_type": "markdown", "id": "03920810", "metadata": { "id": "03920810" }, "source": [ "- Required duration" ] }, { "cell_type": "code", "execution_count": 198, "id": "568a93c1", "metadata": { "id": "568a93c1" }, "outputs": [], "source": [ "# required segment duration: 10 minutes, expressed in seconds\n", "req_seg_duration = 10*60" ] }, { "cell_type": "markdown", "id": "d49187cd", "metadata": { "id": "d49187cd" }, "source": [ "### Find out how many records meet the requirements" ] }, { "cell_type": "markdown", "id": "65f2cdce", "metadata": { "id": "65f2cdce" }, "source": [ "_NB: This step may take a while. The results are copied below to save running it yourself._" ] }, { "cell_type": "code", "execution_count": null, "id": "015b47d3", "metadata": { "id": "015b47d3" }, "outputs": [], "source": [ "import urllib.request # to read online text file\n", "\n", "no_recs_req = 100\n", "matching_recs = {'dir':[], 'seg_name':[], 'length':[]}\n", "\n", "for icustay_record in icustay_records:\n", "\n", "    # stop once enough matching records have been found\n", "    if len(matching_recs['seg_name']) == no_recs_req:\n", "        break\n", "\n", "    print('Record: {}'.format(icustay_record)) #, end=\"\", flush=True)\n", "    icustay_record_dir = f'{database_name}/{icustay_record}'\n", "    temp = icustay_record.split(\"/\")\n", "    icustay_record_name = temp[1]\n", "\n", "    print(' Identifying segments: ', end=\"\", flush=True)\n", "    target_url = \"https://physionet.org/files/\" + icustay_record_dir + \"RECORDS\"\n", "    data = urllib.request.urlopen(target_url)\n", "\n", "    # keep only the lines of the RECORDS file which do not contain 'p'\n", "    segs = []\n", "    for line in data:\n", "        curr_line = line.decode(\"utf-8\")\n", "        if \"p\" not in curr_line:\n", "            temp = curr_line.split(\"\\n\")\n", "            segs.append(temp[0])\n", "\n", "    # check to see whether there are any waveform files (as opposed to just numerics)\n", "    if len(segs) == 0:\n", "        print('(no waveforms)')\n", "        continue\n", "\n", "    # check to see whether the required signals are present in this recording\n", "    rec_header = segs[0].split(\"_\")[0] + \"_layout\"\n", "    rec_metadata = wfdb.rdheader(rec_header,\n", "                                 pn_dir=icustay_record_dir,\n", "                                 rd_segments=True)\n", "    sigs_present = rec_metadata.sig_name\n", "    if not all(x in sigs_present for x in required_sigs):\n", "        print('{} (missing signals)'.format(rec_header))\n", "        continue\n", "\n", "    print(' Reading data: ', end=\"\", flush=True)\n", "    for seg in segs:\n", "        print(seg, end=\"\", flush=True)\n", "        seg_metadata = wfdb.rdheader(seg,\n", "                                     pn_dir=icustay_record_dir,\n", "                                     rd_segments=True)\n", "\n", "        # Check whether the required signals are present in this segment\n", "        sigs_present = seg_metadata.sig_name\n", "        if not all(x in sigs_present for x in required_sigs):\n", "            print(' (missing signals)')\n", "            continue\n", "\n", "        # Check whether the segment is of the required duration (in seconds)\n", "        seg_length = seg_metadata.sig_len/(seg_metadata.fs)\n", "        if seg_length < req_seg_duration:\n", "            print(f' (too short at {seg_length/60:.1f} mins)')\n", "            continue\n", "\n", "        matching_recs['dir'].append(icustay_record_dir)\n", "        matching_recs['seg_name'].append(seg)\n", "        matching_recs['length'].append(seg_length)\n", "        print(' (met requirements)')\n", "        # Since we only need one segment per record, break out of the loop\n", "        break\n", "\n", "print(f\"A total of {len(matching_recs['dir'])} records met the requirements.\")\n", "\n", "#df_matching_recs = pd.DataFrame(data=matching_recs)\n", "#df_matching_recs.to_csv('matching_records.csv', index=False)" ] }, { "cell_type": "code", "execution_count": 213, "id": "75ec15f4", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "75ec15f4", "outputId": "3ea832cd-4a4b-4265-bc2b-275d0f6c1802" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 100 records were identified that met the requirements (although the search was limited to 100 records).\n", "\n", 
"The relevant segment names are:\n", " - 3531764_0003\n", " - 3285727_0007\n", " - 3092245_0007\n", " - 3047369_0003\n", " - 3481389_0008\n", " - 3189254_0006\n", " - 3054941_0001\n", " - 3462211_0001\n", " - 3755731_0015\n", " - 3988865_0012\n", " - 3747397_0005\n", " - 3016830_0019\n", " - 3429844_0002\n", " - 3395556_0005\n", " - 3046879_0003\n", " - 3376064_0010\n", " - 3205611_0013\n", " - 3126291_0001\n", " - 3622493_0043\n", " - 3410222_0004\n", " - 3021418_0002\n", " - 3832863_0007\n", " - 3757068_0002\n", " - 3090785_0011\n", " - 3929725_0001\n", " - 3031688_0022\n", " - 3040472_0001\n", " - 3289928_0011\n", " - 3068462_0004\n", " - 3873015_0004\n", " - 3089709_0002\n", " - 3883443_0015\n", " - 3447276_0006\n", " - 3216751_0009\n", " - 3377083_0005\n", " - 3831833_0006\n", " - 3203448_0022\n", " - 3729471_0009\n", " - 3000714_0001\n", " - 3019875_0004\n", " - 3113048_0001\n", " - 3678945_0003\n", " - 3358389_0004\n", " - 3124300_0001\n", " - 3145678_0005\n", " - 3383991_0003\n", " - 3147138_0008\n", " - 3068585_0032\n", " - 3545143_0009\n", " - 3062663_0002\n", " - 3548704_0010\n", " - 3947427_0006\n", " - 3763138_0005\n", " - 3263629_0001\n", " - 3081334_0026\n", " - 3251540_0009\n", " - 3195697_0001\n", " - 3973217_0001\n", " - 3942161_0001\n", " - 3696733_0005\n", " - 3176657_0041\n", " - 3310335_0019\n", " - 3132214_0015\n", " - 3228071_0003\n", " - 3195753_0006\n", " - 3573743_0005\n", " - 3266729_0003\n", " - 3760919_0010\n", " - 3958352_0006\n", " - 3300295_0003\n", " - 3028261_0046\n", " - 3912316_0020\n", " - 3341288_0003\n", " - 3306238_0001\n", " - 3217364_0003\n", " - 3872162_0001\n", " - 3133443_0001\n", " - 3457114_0003\n", " - 3020774_0012\n", " - 3820075_0003\n", " - 3573275_0029\n", " - 3145453_0019\n", " - 3528521_0005\n", " - 3119460_0009\n", " - 3511654_0003\n", " - 3061470_0003\n", " - 3083615_0012\n", " - 3145814_0005\n", " - 3426164_0003\n", " - 3688103_0007\n", " - 3228082_0010\n", " - 3028253_0056\n", " - 3523904_0009\n", " - 
3576649_0004\n", " - 3794019_0005\n", " - 3013677_0009\n", " - 3025436_0007\n", " - 3879187_0012\n", " - 3195265_0024\n", " - 3163704_0001\n", "\n", "The corresponding directories are: \n", " - mimic3wdb-matched/1.0/p00/p000160/\n", " - mimic3wdb-matched/1.0/p00/p000188/\n", " - mimic3wdb-matched/1.0/p00/p000333/\n", " - mimic3wdb-matched/1.0/p00/p000543/\n", " - mimic3wdb-matched/1.0/p00/p000618/\n", " - mimic3wdb-matched/1.0/p00/p000735/\n", " - mimic3wdb-matched/1.0/p00/p000801/\n", " - mimic3wdb-matched/1.0/p00/p000946/\n", " - mimic3wdb-matched/1.0/p00/p001038/\n", " - mimic3wdb-matched/1.0/p00/p001049/\n", " - mimic3wdb-matched/1.0/p00/p001457/\n", " - mimic3wdb-matched/1.0/p00/p001501/\n", " - mimic3wdb-matched/1.0/p00/p001606/\n", " - mimic3wdb-matched/1.0/p00/p001791/\n", " - mimic3wdb-matched/1.0/p00/p001840/\n", " - mimic3wdb-matched/1.0/p00/p001855/\n", " - mimic3wdb-matched/1.0/p00/p001949/\n", " - mimic3wdb-matched/1.0/p00/p002063/\n", " - mimic3wdb-matched/1.0/p00/p002343/\n", " - mimic3wdb-matched/1.0/p00/p002369/\n", " - mimic3wdb-matched/1.0/p00/p002458/\n", " - mimic3wdb-matched/1.0/p00/p002577/\n", " - mimic3wdb-matched/1.0/p00/p002586/\n", " - mimic3wdb-matched/1.0/p00/p002639/\n", " - mimic3wdb-matched/1.0/p00/p002703/\n", " - mimic3wdb-matched/1.0/p00/p002858/\n", " - mimic3wdb-matched/1.0/p00/p002906/\n", " - mimic3wdb-matched/1.0/p00/p002974/\n", " - mimic3wdb-matched/1.0/p00/p002981/\n", " - mimic3wdb-matched/1.0/p00/p003039/\n", " - mimic3wdb-matched/1.0/p00/p003386/\n", " - mimic3wdb-matched/1.0/p00/p003617/\n", " - mimic3wdb-matched/1.0/p00/p003744/\n", " - mimic3wdb-matched/1.0/p00/p003866/\n", " - mimic3wdb-matched/1.0/p00/p003949/\n", " - mimic3wdb-matched/1.0/p00/p004053/\n", " - mimic3wdb-matched/1.0/p00/p004115/\n", " - mimic3wdb-matched/1.0/p00/p004313/\n", " - mimic3wdb-matched/1.0/p00/p004324/\n", " - mimic3wdb-matched/1.0/p00/p004331/\n", " - mimic3wdb-matched/1.0/p00/p004405/\n", " - mimic3wdb-matched/1.0/p00/p004588/\n", " - 
mimic3wdb-matched/1.0/p00/p004679/\n", " - mimic3wdb-matched/1.0/p00/p004802/\n", " - mimic3wdb-matched/1.0/p00/p004804/\n", " - mimic3wdb-matched/1.0/p00/p004833/\n", " - mimic3wdb-matched/1.0/p00/p004837/\n", " - mimic3wdb-matched/1.0/p00/p004904/\n", " - mimic3wdb-matched/1.0/p00/p004906/\n", " - mimic3wdb-matched/1.0/p00/p004966/\n", " - mimic3wdb-matched/1.0/p00/p005030/\n", " - mimic3wdb-matched/1.0/p00/p005071/\n", " - mimic3wdb-matched/1.0/p00/p005193/\n", " - mimic3wdb-matched/1.0/p00/p005345/\n", " - mimic3wdb-matched/1.0/p00/p005453/\n", " - mimic3wdb-matched/1.0/p00/p005742/\n", " - mimic3wdb-matched/1.0/p00/p005885/\n", " - mimic3wdb-matched/1.0/p00/p005913/\n", " - mimic3wdb-matched/1.0/p00/p005937/\n", " - mimic3wdb-matched/1.0/p00/p006116/\n", " - mimic3wdb-matched/1.0/p00/p006381/\n", " - mimic3wdb-matched/1.0/p00/p006533/\n", " - mimic3wdb-matched/1.0/p00/p006539/\n", " - mimic3wdb-matched/1.0/p00/p006621/\n", " - mimic3wdb-matched/1.0/p00/p006692/\n", " - mimic3wdb-matched/1.0/p00/p006702/\n", " - mimic3wdb-matched/1.0/p00/p006728/\n", " - mimic3wdb-matched/1.0/p00/p006875/\n", " - mimic3wdb-matched/1.0/p00/p007107/\n", " - mimic3wdb-matched/1.0/p00/p007184/\n", " - mimic3wdb-matched/1.0/p00/p007251/\n", " - mimic3wdb-matched/1.0/p00/p007445/\n", " - mimic3wdb-matched/1.0/p00/p007477/\n", " - mimic3wdb-matched/1.0/p00/p007529/\n", " - mimic3wdb-matched/1.0/p00/p007533/\n", " - mimic3wdb-matched/1.0/p00/p007614/\n", " - mimic3wdb-matched/1.0/p00/p007629/\n", " - mimic3wdb-matched/1.0/p00/p007644/\n", " - mimic3wdb-matched/1.0/p00/p007654/\n", " - mimic3wdb-matched/1.0/p00/p007866/\n", " - mimic3wdb-matched/1.0/p00/p007966/\n", " - mimic3wdb-matched/1.0/p00/p008061/\n", " - mimic3wdb-matched/1.0/p00/p008142/\n", " - mimic3wdb-matched/1.0/p00/p008167/\n", " - mimic3wdb-matched/1.0/p00/p008318/\n", " - mimic3wdb-matched/1.0/p00/p008396/\n", " - mimic3wdb-matched/1.0/p00/p008723/\n", " - mimic3wdb-matched/1.0/p00/p008735/\n", " - 
mimic3wdb-matched/1.0/p00/p008748/\n", " - mimic3wdb-matched/1.0/p00/p008780/\n", " - mimic3wdb-matched/1.0/p00/p008795/\n", " - mimic3wdb-matched/1.0/p00/p008799/\n", " - mimic3wdb-matched/1.0/p00/p008896/\n", " - mimic3wdb-matched/1.0/p00/p009124/\n", " - mimic3wdb-matched/1.0/p00/p009128/\n", " - mimic3wdb-matched/1.0/p00/p009258/\n", " - mimic3wdb-matched/1.0/p00/p009460/\n", " - mimic3wdb-matched/1.0/p00/p009473/\n", " - mimic3wdb-matched/1.0/p00/p009798/\n", " - mimic3wdb-matched/1.0/p00/p009993/\n" ] } ], "source": [ "print(f\"A total of {len(matching_recs['dir'])} records were identified that met the requirements (although the search was limited to {no_recs_req} records).\")\n", "\n", "relevant_segments_names = \"\\n - \".join(matching_recs['seg_name'])\n", "print(f\"\\nThe relevant segment names are:\\n - {relevant_segments_names}\")\n", "\n", "relevant_dirs = \"\\n - \".join(matching_recs['dir'])\n", "print(f\"\\nThe corresponding directories are: \\n - {relevant_dirs}\")" ] }, { "cell_type": "markdown", "id": "719f20f8", "metadata": { "id": "719f20f8" }, "source": [ "
\n", "

Question: Is this enough data for a study? Consider different types of studies, e.g. assessing the performance of a previously proposed algorithm to estimate BP from the PPG signal, vs. developing a deep learning approach to estimate BP from the PPG.

\n", "
" ] }, { "cell_type": "markdown", "id": "cfbd9837", "metadata": { "id": "6fccda20" }, "source": [ "## Extract data for a segment" ] }, { "cell_type": "markdown", "id": "23048b80", "metadata": {}, "source": [ "- Provide a list of segments which meet the requirements for the study" ] }, { "cell_type": "code", "execution_count": 214, "id": "8fed5590", "metadata": {}, "outputs": [], "source": [ "segment_names = ['3531764_0003', '3285727_0007', '3092245_0007', '3047369_0003', '3481389_0008', '3189254_0006', '3054941_0001', '3462211_0001', '3755731_0015', '3988865_0012', '3747397_0005', '3016830_0019', '3429844_0002', '3395556_0005', '3046879_0003', '3376064_0010', '3205611_0013', '3126291_0001', '3622493_0043', '3410222_0004', '3021418_0002', '3832863_0007', '3757068_0002', '3090785_0011', '3929725_0001', '3031688_0022', '3040472_0001', '3289928_0011', '3068462_0004', '3873015_0004', '3089709_0002', '3883443_0015', '3447276_0006', '3216751_0009', '3377083_0005', '3831833_0006', '3203448_0022', '3729471_0009', '3000714_0001', '3019875_0004', '3113048_0001', '3678945_0003', '3358389_0004', '3124300_0001', '3145678_0005', '3383991_0003', '3147138_0008', '3068585_0032', '3545143_0009', '3062663_0002', '3548704_0010', '3947427_0006', '3763138_0005', '3263629_0001', '3081334_0026', '3251540_0009', '3195697_0001', '3973217_0001', '3942161_0001', '3696733_0005', '3176657_0041', '3310335_0019', '3132214_0015', '3228071_0003', '3195753_0006', '3573743_0005', '3266729_0003', '3760919_0010', '3958352_0006', '3300295_0003', '3028261_0046', '3912316_0020', '3341288_0003', '3306238_0001', '3217364_0003', '3872162_0001', '3133443_0001', '3457114_0003', '3020774_0012', '3820075_0003', '3573275_0029', '3145453_0019', '3528521_0005', '3119460_0009', '3511654_0003', '3061470_0003', '3083615_0012', '3145814_0005', '3426164_0003', '3688103_0007', '3228082_0010', '3028253_0056', '3523904_0009', '3576649_0004', '3794019_0005', '3013677_0009', '3025436_0007', '3879187_0012', '3195265_0024', 
'3163704_0001']\n", "segment_dirs = ['mimic3wdb-matched/1.0/p00/p000160/', 'mimic3wdb-matched/1.0/p00/p000188/', 'mimic3wdb-matched/1.0/p00/p000333/', 'mimic3wdb-matched/1.0/p00/p000543/', 'mimic3wdb-matched/1.0/p00/p000618/', 'mimic3wdb-matched/1.0/p00/p000735/', 'mimic3wdb-matched/1.0/p00/p000801/', 'mimic3wdb-matched/1.0/p00/p000946/', 'mimic3wdb-matched/1.0/p00/p001038/', 'mimic3wdb-matched/1.0/p00/p001049/', 'mimic3wdb-matched/1.0/p00/p001457/', 'mimic3wdb-matched/1.0/p00/p001501/', 'mimic3wdb-matched/1.0/p00/p001606/', 'mimic3wdb-matched/1.0/p00/p001791/', 'mimic3wdb-matched/1.0/p00/p001840/', 'mimic3wdb-matched/1.0/p00/p001855/', 'mimic3wdb-matched/1.0/p00/p001949/', 'mimic3wdb-matched/1.0/p00/p002063/', 'mimic3wdb-matched/1.0/p00/p002343/', 'mimic3wdb-matched/1.0/p00/p002369/', 'mimic3wdb-matched/1.0/p00/p002458/', 'mimic3wdb-matched/1.0/p00/p002577/', 'mimic3wdb-matched/1.0/p00/p002586/', 'mimic3wdb-matched/1.0/p00/p002639/', 'mimic3wdb-matched/1.0/p00/p002703/', 'mimic3wdb-matched/1.0/p00/p002858/', 'mimic3wdb-matched/1.0/p00/p002906/', 'mimic3wdb-matched/1.0/p00/p002974/', 'mimic3wdb-matched/1.0/p00/p002981/', 'mimic3wdb-matched/1.0/p00/p003039/', 'mimic3wdb-matched/1.0/p00/p003386/', 'mimic3wdb-matched/1.0/p00/p003617/', 'mimic3wdb-matched/1.0/p00/p003744/', 'mimic3wdb-matched/1.0/p00/p003866/', 'mimic3wdb-matched/1.0/p00/p003949/', 'mimic3wdb-matched/1.0/p00/p004053/', 'mimic3wdb-matched/1.0/p00/p004115/', 'mimic3wdb-matched/1.0/p00/p004313/', 'mimic3wdb-matched/1.0/p00/p004324/', 'mimic3wdb-matched/1.0/p00/p004331/', 'mimic3wdb-matched/1.0/p00/p004405/', 'mimic3wdb-matched/1.0/p00/p004588/', 'mimic3wdb-matched/1.0/p00/p004679/', 'mimic3wdb-matched/1.0/p00/p004802/', 'mimic3wdb-matched/1.0/p00/p004804/', 'mimic3wdb-matched/1.0/p00/p004833/', 'mimic3wdb-matched/1.0/p00/p004837/', 'mimic3wdb-matched/1.0/p00/p004904/', 'mimic3wdb-matched/1.0/p00/p004906/', 'mimic3wdb-matched/1.0/p00/p004966/', 'mimic3wdb-matched/1.0/p00/p005030/', 
'mimic3wdb-matched/1.0/p00/p005071/', 'mimic3wdb-matched/1.0/p00/p005193/', 'mimic3wdb-matched/1.0/p00/p005345/', 'mimic3wdb-matched/1.0/p00/p005453/', 'mimic3wdb-matched/1.0/p00/p005742/', 'mimic3wdb-matched/1.0/p00/p005885/', 'mimic3wdb-matched/1.0/p00/p005913/', 'mimic3wdb-matched/1.0/p00/p005937/', 'mimic3wdb-matched/1.0/p00/p006116/', 'mimic3wdb-matched/1.0/p00/p006381/', 'mimic3wdb-matched/1.0/p00/p006533/', 'mimic3wdb-matched/1.0/p00/p006539/', 'mimic3wdb-matched/1.0/p00/p006621/', 'mimic3wdb-matched/1.0/p00/p006692/', 'mimic3wdb-matched/1.0/p00/p006702/', 'mimic3wdb-matched/1.0/p00/p006728/', 'mimic3wdb-matched/1.0/p00/p006875/', 'mimic3wdb-matched/1.0/p00/p007107/', 'mimic3wdb-matched/1.0/p00/p007184/', 'mimic3wdb-matched/1.0/p00/p007251/', 'mimic3wdb-matched/1.0/p00/p007445/', 'mimic3wdb-matched/1.0/p00/p007477/', 'mimic3wdb-matched/1.0/p00/p007529/', 'mimic3wdb-matched/1.0/p00/p007533/', 'mimic3wdb-matched/1.0/p00/p007614/', 'mimic3wdb-matched/1.0/p00/p007629/', 'mimic3wdb-matched/1.0/p00/p007644/', 'mimic3wdb-matched/1.0/p00/p007654/', 'mimic3wdb-matched/1.0/p00/p007866/', 'mimic3wdb-matched/1.0/p00/p007966/', 'mimic3wdb-matched/1.0/p00/p008061/', 'mimic3wdb-matched/1.0/p00/p008142/', 'mimic3wdb-matched/1.0/p00/p008167/', 'mimic3wdb-matched/1.0/p00/p008318/', 'mimic3wdb-matched/1.0/p00/p008396/', 'mimic3wdb-matched/1.0/p00/p008723/', 'mimic3wdb-matched/1.0/p00/p008735/', 'mimic3wdb-matched/1.0/p00/p008748/', 'mimic3wdb-matched/1.0/p00/p008780/', 'mimic3wdb-matched/1.0/p00/p008795/', 'mimic3wdb-matched/1.0/p00/p008799/', 'mimic3wdb-matched/1.0/p00/p008896/', 'mimic3wdb-matched/1.0/p00/p009124/', 'mimic3wdb-matched/1.0/p00/p009128/', 'mimic3wdb-matched/1.0/p00/p009258/', 'mimic3wdb-matched/1.0/p00/p009460/', 'mimic3wdb-matched/1.0/p00/p009473/', 'mimic3wdb-matched/1.0/p00/p009798/', 'mimic3wdb-matched/1.0/p00/p009993/']" ] }, { "cell_type": "markdown", "id": "fbc272be", "metadata": {}, "source": [ "- Specify a segment from which to extract data" ] }, { 
"cell_type": "code", "execution_count": 215, "id": "404505b2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Specified segment '3531764_0003' in directory: 'mimic3wdb-matched/1.0/p00/p000160/'\n" ] } ], "source": [ "rel_segment_no = 0\n", "rel_segment_name = segment_names[rel_segment_no]\n", "rel_segment_dir = segment_dirs[rel_segment_no]\n", "print(f\"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'\")" ] }, { "cell_type": "markdown", "id": "6f989fcf", "metadata": {}, "source": [ "
\n", "

Extension: Have a look at the files which make up this record here (NB: you will need to scroll to the bottom of the page).

\n", "
" ] }, { "cell_type": "markdown", "id": "f2d5beb9", "metadata": {}, "source": [ "- Use the [`rdrecord`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdrecord) function from the WFDB toolbox to read the data for this segment." ] }, { "cell_type": "code", "execution_count": 203, "id": "f9ec70d5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data loaded from segment: 3531764_0003\n" ] } ], "source": [ "segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir)\n", "print(f\"Data loaded from segment: {rel_segment_name}\")" ] }, { "cell_type": "markdown", "id": "51b04c34", "metadata": {}, "source": [ "- Look at the class of the object in which the data are stored:" ] }, { "cell_type": "code", "execution_count": 204, "id": "82bfd5f3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data stored in class of type: <class 'wfdb.io.record.Record'>\n" ] } ], "source": [ "print(f\"Data stored in class of type: {type(segment_data)}\")" ] }, { "cell_type": "markdown", "id": "8bf8327f", "metadata": {}, "source": [ "
\n", "

Resource: You can find out more about the class representing single segment WFDB records here.

\n", "
" ] }, { "cell_type": "markdown", "id": "cad8b9d6", "metadata": {}, "source": [ "- Find out about the signals which have been extracted" ] }, { "cell_type": "code", "execution_count": 205, "id": "02a17c50", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This segment contains waveform data for the following 6 signals: ['RESP', 'PLETH', 'II', 'V', 'AVR', 'ABP']\n", "The signals are sampled at 125 Hz\n", "They last for 179.8 minutes\n" ] } ], "source": [ "print(f\"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}\")\n", "print(f\"The signals are sampled at {segment_data.fs} Hz\")\n", "print(f\"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes\")" ] }, { "cell_type": "markdown", "id": "e0b448aa", "metadata": {}, "source": [ "## Visualise the signals" ] }, { "cell_type": "markdown", "id": "b74ed4a9", "metadata": {}, "source": [ "### Select a segment" ] }, { "cell_type": "markdown", "id": "278826ab", "metadata": {}, "source": [ "- Specify a segment from which to extract data" ] }, { "cell_type": "code", "execution_count": 206, "id": "5807515f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Specified segment '3285727_0007' in directory: 'mimic3wdb-matched/1.0/p00/p000188/'\n" ] } ], "source": [ "rel_segment_n = 1\n", "rel_segment_name = segment_names[rel_segment_n]\n", "rel_segment_dir = segment_dirs[rel_segment_n]\n", "print(f\"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'\")" ] }, { "cell_type": "markdown", "id": "3a7d23b6", "metadata": {}, "source": [ "### Extract one minute of data from this segment" ] }, { "cell_type": "markdown", "id": "c7b5622e", "metadata": {}, "source": [ "- Specify the timings of the data to be extracted" ] }, { "cell_type": "code", "execution_count": 207, "id": "928be8d7", "metadata": {}, "outputs": [], "source": [ "# time since the start of the segment at which to 
begin extracting data\n", "start_seconds = 90\n", "n_seconds_to_load = 60" ] }, { "cell_type": "markdown", "id": "7cf8a27b", "metadata": {}, "source": [ "- Find out the sampling frequency of the waveform data" ] }, { "cell_type": "code", "execution_count": 208, "id": "40177f31", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Metadata loaded from segment: 3285727_0007\n" ] } ], "source": [ "segment_metadata = wfdb.rdheader(record_name=rel_segment_name,\n", " pn_dir=rel_segment_dir)\n", "\n", "print(f\"Metadata loaded from segment: {rel_segment_name}\")\n", "fs = round(segment_metadata.fs)" ] }, { "cell_type": "markdown", "id": "9771170e", "metadata": {}, "source": [ "- Extract the specified data" ] }, { "cell_type": "code", "execution_count": 209, "id": "035a8598", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "60 seconds of data extracted from segment 3285727_0007\n" ] } ], "source": [ "sampfrom = fs * start_seconds\n", "sampto = fs * (start_seconds + n_seconds_to_load)\n", "\n", "segment_data = wfdb.rdrecord(record_name=rel_segment_name,\n", " sampfrom=sampfrom,\n", " sampto=sampto,\n", " pn_dir=rel_segment_dir)\n", "\n", "print(f\"{n_seconds_to_load} seconds of data extracted from segment {rel_segment_name}\")" ] }, { "cell_type": "markdown", "id": "25198a91", "metadata": {}, "source": [ "### Plot the extracted signals" ] }, { "cell_type": "markdown", "id": "91c248fa", "metadata": {}, "source": [ "- Plot the extracted signals using the [plot_wfdb](https://wfdb.readthedocs.io/en/latest/plot.html#wfdb.plot.plot_wfdb) function from the WFDB Toolbox." ] }, { "cell_type": "code", "execution_count": 210, "id": "86a6b071", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "title_text = f\"Segment {rel_segment_name}\"\n", "wfdb.plot_wfdb(record=segment_data,\n", " title=title_text,\n", " time_units='seconds') " ] }, { "cell_type": "markdown", "id": "e28e098b", "metadata": {}, "source": [ "- Extract the PPG signal to loook at it more closely" ] }, { "cell_type": "code", "execution_count": 211, "id": "24263c2a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracted the PPG signal from column 0 of the matrix of waveform data.\n" ] } ], "source": [ "for sig_no in range(0, len(segment_data.sig_name)):\n", " if \"PLETH\" in segment_data.sig_name[sig_no]:\n", " break\n", "\n", "ppg = segment_data.p_signal[:, sig_no]\n", "fs = segment_data.fs\n", "print(f\"Extracted the PPG signal from column {sig_no} of the matrix of waveform data.\")" ] }, { "cell_type": "markdown", "id": "f0e3f259", "metadata": {}, "source": [ "

Note: the name given to PPG signals in the database is 'PLETH'.

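The lookup by name can be sketched on its own, without downloading any data. This is a minimal sketch that assumes the signal names are available as a plain list of strings, like the `sig_name` attribute used above. The helper name `find_signal_column` is hypothetical and is not part of the WFDB Toolbox.

```python
def find_signal_column(sig_names, target='PLETH'):
    # Hypothetical helper (not part of wfdb): return the index of the first
    # signal whose name contains the target text, or raise if none matches.
    for col, name in enumerate(sig_names):
        if target in name:
            return col
    raise ValueError('no signal matching ' + repr(target) + ' in ' + repr(list(sig_names)))


# Signal names as printed earlier in this notebook for one segment
example_names = ['RESP', 'PLETH', 'II', 'V', 'AVR', 'ABP']
ppg_col = find_signal_column(example_names)
print(ppg_col)
```

Raising an error when no name matches avoids silently selecting the wrong column in segments that do not contain a PPG signal.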
" ] }, { "cell_type": "markdown", "id": "6981b724", "metadata": {}, "source": [ "- Plot to look at the shape of the PPG pulse wave more closely" ] }, { "cell_type": "code", "execution_count": 212, "id": "6fbd074a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50.0, 55.0)" ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "import numpy as np\n", "\n", "t = np.arange(0, (len(ppg) / fs), 1.0 / fs)\n", "plt.plot(t, ppg, color = 'black', label='PPG')\n", "plt.xlim([50, 55])" ] }, { "cell_type": "markdown", "id": "94dca660", "metadata": {}, "source": [ "### Compare this to pulse waves from the literature" ] }, { "cell_type": "markdown", "id": "2255ee94", "metadata": {}, "source": [ "- Compare the pulse waves above to the different shapes of pulse waves shown here:" ] }, { "cell_type": "markdown", "id": "ae3177b0", "metadata": {}, "source": [ "![PPG pulse waves](https://upload.wikimedia.org/wikipedia/commons/e/ed/Classes_of_photoplethysmogram_%28PPG%29_pulse_wave_shape.svg)" ] }, { "cell_type": "markdown", "id": "3f9a16af", "metadata": {}, "source": [ "Source: _Charlton PH et al., 'Assessing hemodynamics from the photoplethysmogram to gain insights into vascular age: a review from VascAgeNet', https://doi.org/10.1152/ajpheart.00392.2021 (CC BY 4.0)_" ] }, { "cell_type": "markdown", "id": "e5e21413", "metadata": {}, "source": [ "These pulse waves are the typical shapes for young (class 1) to old (class 4) subjects." ] }, { "cell_type": "markdown", "id": "1f724c93", "metadata": {}, "source": [ "

Question: How do these pulse waves compare to those extracted from the MIMIC Database? Which is most similar?

" ] }, { "cell_type": "code", "execution_count": null, "id": "6136867d", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "name": "data-exploration.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "306px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }