{"id":3373,"date":"2023-04-12T16:11:42","date_gmt":"2023-04-12T15:11:42","guid":{"rendered":"https:\/\/www.microdata.no\/?post_type=eksempel&#038;p=3373"},"modified":"2023-08-18T13:31:06","modified_gmt":"2023-08-18T12:31:06","slug":"how-to-prepare-data-for-survival-analysis","status":"publish","type":"eksempel","link":"https:\/\/www.microdata.no\/en\/eksempel\/how-to-prepare-data-for-survival-analysis\/","title":{"rendered":"How to prepare data for survival analysis"},"content":{"rendered":"\n<p>There are several ways to calculate &#8220;time&#8221; (the number of time units from the start of the measurement period until a specific event occurs) in survival analysis. We will demonstrate two methods here:<\/p>\n\n\n\n<p>a) Event-based import and use of start date variables<\/p>\n\n\n\n<p>b) Ready-to-use date variables with fixed values per unit\/individual<\/p>\n\n\n\n<p>The script below shows how to prepare the data for each method. The two approaches share some steps but differ in others:<\/p>\n\n\n<div id=\"rose-block_9d60f71ac690127b21a36502f4ab4d6d\" class=\"rose-code codeblock-wrapper\">\n<pre tabindex=\"0\" class=\"codeblock\"><code>require no.ssb.fdb:23 as ds\r\n\r\ntextblock\r\nA) Use of event-based variables and collapse(min)\r\n-------------------------------------------------\r\nendblock\r\n\r\n\/\/Create dataset with relevant event-based variable and define measurement period\r\ncreate-dataset unemployed\r\nimport-event ds\/ARBSOEK2001FDT_HOVED 2010-01-01 to 2019-12-15 as workseeker_status\r\n\r\n\/\/Keep all events where workseeker status = fully unemployed and start date is after 2010-01-01\r\nkeep if workseeker_status == '1' & START@workseeker_status > date(2010,01,01)\r\n\r\n\/\/Retrieve the first event and aggregate to individual-level data\r\ncollapse (min) START@workseeker_status, by(PERSONID_1)\r\n\r\n\/\/Run the analysis on a small random sample (optional)\r\nsample 10000 3245\r\n\r\n\/\/Calculate the number of days from the start of 
the measurement period to the first occurrence of the event\r\ngenerate days = START@workseeker_status - date(2010,01,01)\r\nreplace days = 0 if days < 0\r\nsummarize days\r\nhistogram days\r\n\r\ntextblock\r\nCreate the variable event, which takes the value 1 for everyone with a value for the number of days. Those who have no\r\nvalue for the number of days, or whose event date falls after the end of the measurement period,\r\nget the value 0 (cases with the value 0 are called censored cases).\r\nendblock\r\n\r\ngenerate event = 1 if sysmiss(days) == 0\r\nreplace event = 0 if sysmiss(days) | START@workseeker_status > date(2019,12,15)\r\n\r\ntextblock\r\nSet the number of days to the maximum value for people for whom the event has not occurred in\r\nthe measurement period. These are people who have gone through the entire measurement period without the event\r\nhappening. They also get event = 0 through the step above.\r\nendblock\r\n\r\nreplace days = date(2019,12,15) - date(2010,01,01) if sysmiss(days)\r\n\r\ntabulate event, summarize(days) mean freq\r\n\r\n\/\/Create a year variable to measure time in years instead of days\r\ngenerate year = int(days\/365.24)\r\ntabulate year, missing\r\nhistogram year, discrete\r\nsummarize year event\r\nhistogram days\r\n\r\ntextblock\r\nImport relevant variables in order to compare survival rates between groups of the population\r\nendblock\r\n\r\nimport ds\/BEFOLKNING_KJOENN as gender\r\nimport ds\/BEFOLKNING_INVKAT as imm_cat\r\nimport ds\/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month\r\n\r\ngenerate age2010 = 2010 - int(birth_year_month\/100)\r\ngenerate agegroup = 1\r\nreplace agegroup = 2 if age2010 > 30\r\nreplace agegroup = 3 if age2010 > 50\r\ndefine-labels agelabel 1 \"age 0-30\" 2 \"age 31-50\" 3 \"age 51 ->\"\r\nassign-labels agegroup agelabel\r\n\r\ngenerate norwegian = 0\r\nreplace norwegian = 1 if imm_cat == 'A'\r\n\r\nkaplan-meier event year\r\nkaplan-meier event 
days\r\n\r\nkaplan-meier event year, by(gender)\r\nkaplan-meier event days, by(gender)\r\n\r\nkaplan-meier event year, by(agegroup)\r\nkaplan-meier event days, by(agegroup)\r\n\r\nsummarize norwegian\r\ntabulate event norwegian\r\ntabulate year norwegian\r\n\r\ndefine-labels norwegianlabel 0 \"Foreign origin\" 1 \"Norwegian origin\"\r\nassign-labels norwegian norwegianlabel\r\n\r\nkaplan-meier event year, by(norwegian)\r\nkaplan-meier event days, by(norwegian)\r\n\r\nkaplan-meier event year, by(imm_cat)\r\nkaplan-meier event days, by(imm_cat)\r\ntabulate year imm_cat\r\n\r\ncox event year norwegian age2010 i.gender\r\ncox event year norwegian age2010 i.gender, hazard\r\ncox event days norwegian age2010 i.gender\r\ncox event days norwegian age2010 i.gender, hazard\r\n\r\n\r\ntextblock\r\nB) Cross-sectional dataset with dates collected from fixed variables\r\n--------------------------------------------------------------------\r\nendblock\r\n\r\n\/\/Create dataset of persons over 70 years who are residents in Norway per 2010-01-01\r\ncreate-dataset elder\r\nimport ds\/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month\r\nimport ds\/BEFOLKNING_STATUSKODE 2010-01-01 as regstat\r\ngenerate age = 2010 - int(birth_year_month\/100)\r\nkeep if age > 70 & regstat == '1'\r\n\r\ntextblock\r\nImport a ready-to-use date variable (fixed information): Date of death. 
Some operations are needed\r\nto convert it to the standard UnixTime date format through the date() function\r\nendblock\r\n\r\nimport ds\/BEFOLKNING_DOEDS_DATO as death_date\r\nsummarize death_date\r\nreplace death_date = string(death_date)\r\ngenerate yyyy = substr(death_date,1,4)\r\ngenerate mm = substr(death_date,5,2)\r\ngenerate dd = substr(death_date,7,2)\r\ndestring yyyy\r\ndestring mm\r\ndestring dd\r\ngenerate death_date2 = date(yyyy,mm,dd)\r\nsummarize death_date2\r\n\r\n\/\/Calculate number of days measured from 2010-01-01 until death date\r\ngenerate days = death_date2 - date(2010,01,01)\r\nreplace days = 0 if days < 0\r\n\r\ntextblock\r\nSet event = 1 if the death date is on or after 2010-01-01 and no later than 2023-01-01,\r\nwhich is used as the maximum measurement date. Others get event = 0.\r\nendblock\r\n\r\ngenerate event = 0\r\nreplace event = 1 if sysmiss(death_date) == 0 & death_date2 >= date(2010,01,01) & death_date2 <= date(2023,01,01)\r\n\r\ntextblock\r\nSet the number of days to the maximum value if there is no death date or the death date falls after the last measurement date\r\nendblock\r\n\r\nreplace days = date(2023,01,01) - date(2010,01,01) if sysmiss(days) | death_date2 > date(2023,01,01)\r\n\r\ntabulate event, summarize(days) mean freq\r\n\r\n\/\/Generate a year variable to measure time in years\r\ngenerate year = int(days\/365.24)\r\ntabulate year\r\n\r\nkaplan-meier event year\r\nkaplan-meier event days\r\n\r\n\/\/Import gender to compare survival rates between genders\r\nimport ds\/BEFOLKNING_KJOENN as gender\r\n\r\nkaplan-meier event year, by(gender)\r\nkaplan-meier event days, by(gender)\r\n\r\ncox event year age i.gender\r\ncox event year age i.gender, hazard\r\ncox event days age i.gender\r\ncox event days age i.gender, 
hazard<\/code><\/pre>\n<\/div>\n\n\n<p><\/p>\n","protected":false},"parent":0,"menu_order":131,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":""},"class_list":["post-3373","eksempel","type-eksempel","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to prepare data for survival analysis - microdata.no<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to prepare data for survival analysis - microdata.no\" \/>\n<meta property=\"og:description\" content=\"There are several ways to calculate &#8220;time&#8221; (the number of time units from the start of the measurement period until a specific event occurs) in survival analysis. 
We will demonstrate two methods here: a) Event based import and use of starting date variables b) Ready-to-use date variables with fixed values \u200b\u200bper unit\/individual The script below...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/\" \/>\n<meta property=\"og:site_name\" content=\"microdata.no\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-18T12:31:06+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.microdata.no\\\/eksempel\\\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\\\/\",\"url\":\"https:\\\/\\\/www.microdata.no\\\/eksempel\\\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\\\/\",\"name\":\"How to prepare data for survival analysis - microdata.no\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.microdata.no\\\/#website\"},\"datePublished\":\"2023-04-12T15:11:42+00:00\",\"dateModified\":\"2023-08-18T12:31:06+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.microdata.no\\\/eksempel\\\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.microdata.no\\\/eksempel\\\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.microdata.no\\\/eksempel\\\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Hjem\",\"item\":\"https:\\\/\\\/www.microdata.no\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to prepare data for survival 
analysis\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.microdata.no\\\/#website\",\"url\":\"https:\\\/\\\/www.microdata.no\\\/\",\"name\":\"microdata.no\",\"description\":\"Gj\u00f8r det enklere \u00e5 analysere registerdata\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.microdata.no\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to prepare data for survival analysis - microdata.no","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/","og_locale":"en_US","og_type":"article","og_title":"How to prepare data for survival analysis - microdata.no","og_description":"There are several ways to calculate &#8220;time&#8221; (the number of time units from the start of the measurement period until a specific event occurs) in survival analysis. We will demonstrate two methods here: a) Event based import and use of starting date variables b) Ready-to-use date variables with fixed values \u200b\u200bper unit\/individual The script below...","og_url":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/","og_site_name":"microdata.no","article_modified_time":"2023-08-18T12:31:06+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/","url":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/","name":"How to prepare data for survival analysis - microdata.no","isPartOf":{"@id":"https:\/\/www.microdata.no\/#website"},"datePublished":"2023-04-12T15:11:42+00:00","dateModified":"2023-08-18T12:31:06+00:00","breadcrumb":{"@id":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.microdata.no\/eksempel\/hvordan-tilrettelegge-data-for-overlevelsesanalyse\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Hjem","item":"https:\/\/www.microdata.no\/en\/"},{"@type":"ListItem","position":2,"name":"How to prepare data for survival analysis"}]},{"@type":"WebSite","@id":"https:\/\/www.microdata.no\/#website","url":"https:\/\/www.microdata.no\/","name":"microdata.no","description":"Gj\u00f8r det enklere \u00e5 analysere 
registerdata","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.microdata.no\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"taxonomy_info":[],"featured_image_src_large":[],"author_info":[],"comment_info":"","_links":{"self":[{"href":"https:\/\/www.microdata.no\/en\/wp-json\/wp\/v2\/eksempel\/3373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microdata.no\/en\/wp-json\/wp\/v2\/eksempel"}],"about":[{"href":"https:\/\/www.microdata.no\/en\/wp-json\/wp\/v2\/types\/eksempel"}],"wp:attachment":[{"href":"https:\/\/www.microdata.no\/en\/wp-json\/wp\/v2\/media?parent=3373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}