Milestone for microdata.no - from project to continuous development

When Sikt and Statistics Norway launched microdata.no in 2018, we wanted to make it faster and easier to access important registry data. Now, after a few years of project-based improvements, we are focusing on ensuring that the user communities get as much value as possible from the service. We mark the transition by showing what the service does, how it has been used, and what we plan for the future.

Administrative registry data covering the full population is a rich and important source of knowledge. Microdata.no is designed to give researchers, analysts, and master’s students fast, high-quality access to such data. The tool lets users immediately link, process, and analyse large amounts of registry data with long time series.

The beginning of 2024 marked the end of the infrastructure project Microdata 2.0, funded through the Research Council’s national initiative for research infrastructure. The project ran from 2020 to 2023. The Research Council also funded the first phase of development (RAIRD), which resulted in the launch of microdata.no in 2018.

Now, the microdata.no service is entering a new phase of continuous development, with a new three-part financing model in line with the recommendations made by the data infrastructure committee in May 2022. We mark the transition with a brief summary of what the service is, what it has been used for so far, and where the road goes next.

What is registry data?

Registry data is detailed administrative information obtained from public registers and is an important source of knowledge about society. At the same time, registry data is protected by legislation such as the GDPR and the Statistics Act, and its use for analytical purposes is therefore subject to restrictions.

What is the point of microdata.no?

The point is to make registry data as accessible as possible for use in research and analysis. Traditional application-based access to registry data is time- and resource-consuming, and in practice excludes registry data as a source of knowledge for large user groups.

Microdata.no has been developed with built-in privacy protection to provide immediate, broad, rich, and flexible access to registry data for analysis. Analyses, findings, and results can be shared easily with colleagues and others.

The service is designed from the ground up specifically for registry data and to make it fast and easy to exploit the opportunities such data provides for analyzing and comparing numbers, trends, and correlations over time and between regions/groups.

You, the user, control the selection of variables, time points and periods, linkage, derived variables, and recodes. You define your analysis populations using criteria that you formulate and adjust along the way.

The analysis and visualization possibilities are continuously developed and are well described in the user manual, which also contains a number of analysis examples you can use to get acquainted with the tool’s functionality.

Statistics and analysis results produced in microdata.no can easily be exported to spreadsheets and other tools for further processing and use in publications.

What can microdata.no be used for?

To illustrate what the service can be used for, it is useful to look at how it has been used so far and what data is available in it today.

Since its launch in the spring of 2018, microdata.no has been cited as a data source in well over 200 publications. The bibliography shows a wide range of themes, uses, and types of publications. It lists scientific publications (in level 2, level 1, and level 0 journals), reports, evaluations, Official Norwegian Reports (NOUs), master’s theses, and more.

Themes such as exclusion, social inequality, migration and settlement patterns, gender differences, employment, competence needs, business opportunities, and education trends are recurring among the publications in the bibliography.

Below you can see some examples of different types of publications that have used microdata.no as a source.

Examples of scientific articles:

Examples of PhD theses:

Examples of analyses carried out by or for actors in the public sector:

Examples of master’s theses:

What data is available?

There are currently approximately 500 registry variables available for linking and analysis in microdata.no. Some variables go as far back as 1960. The variables are well described in the service’s variable overview, which is available without logging in.

The variables currently cover, among others, the following topics:

Different ways of using microdata.no

The bibliography shows that the service has been used in many different ways and for many different scenarios. Some publications cite microdata.no as their primary data source, and often as their only one. Other publications combine microdata.no analyses with analyses of other data carried out in other tools.

We also see that some have used microdata.no for smaller sub-analyses to show context, background, selection bias, etc. for the analysis of entirely different data.

The service is also suitable for use in combination with application-based delivery. It can raise the precision and quality of data applications to Statistics Norway (SSB), help avoid misunderstandings, and reduce the time spent on such applications.

In state and regional administration, we know that microdata.no is used for published analyses, but also in planning and case preparation and in preparing background material for political decision-making. However, this material can be hard to access externally, and our bibliography contains few examples of this type of use.

Student use of microdata.no has so far mainly been linked to master’s theses, but the service has also been used for teaching purposes in master’s courses (e.g. in macroeconomics).

Sharing of analyses/FAIR registry data

Sikt and SSB, through their work with microdata.no, have wanted to improve all FAIR aspects related to access to registry data:

Findable – all variables are well documented and searchable in the service and search engines
Accessible – immediate access to work with the variables you find
Interoperable – the service is designed for linking variables on the user’s terms
Reusable – it’s easy to share microdata scripts for reuse, further development, and replicability

Because microdata.no supports data versioning and because all linkages, population definitions, and analyses are expressed as text-based scripts, it is easy to share your analyses with others.

Some user networks share, exchange, and collaborate on the development of microdata.no scripts with peers in or outside their own organization. The simplest form of sharing is to send scripts by email, so that the recipient can “replay” the script in microdata.no and work from the starting point the script represents.

We also see that microdata.no scripts are shared and cited via GitHub, Zenodo, and OSF, and as appendices to the publications themselves. Sikt and SSB also plan to offer user-friendly support for archiving microdata.no scripts in the National research archive (NVA) when that service is rolled out, so that researchers and students can easily archive their scripts as result objects in NVA.

Moving forward

The microdata.no concept of fast and flexible access opens up many opportunities. But the real utility of microdata.no depends primarily on four things:

The availability of data
Functionality
Ability to interact with the rest of the ecosystem for access to registry data at home and abroad
Awareness of existing and emerging possibilities in the service among existing and potential user communities

The road ahead is about continuous improvement in all areas, in collaboration and dialogue with relevant user communities and other actors.

An important part of the now-completed Microdata 2.0 project was the collaboration with pilot partner the Cancer Registry of Norway (CRN) on technological and legal solutions and clarifications needed to turn microdata.no into a platform where data owners other than SSB can also make their registry data available. The collaboration with the CRN (now part of the Norwegian Institute of Public Health, NIPH) continues, and Sikt and SSB also have ongoing collaborations along the same lines with several registry data owners in the health sector and the knowledge sector. SSB is also continuously working to make more of its own registry data available in the service.

Further development of functionality (more statistical methods, visualizations, better processing tools, export options, integrations) happens continuously and in collaboration with various user communities in research, analysis, and public planning. Microdata.no uses established and expandable technical components. In principle, it can support most of the desired functionality within the framework of built-in privacy, as long as the output statistics are automatically anonymized. Sikt and SSB are continuously working to improve their understanding of user needs and of how these can be met in the service or in interaction with other services.

Microdata.no covers, and can cover, a large share of the needs surrounding the use of registry data. But the service is also part of a larger ecosystem that collectively covers more needs. Sikt and SSB have initiated collaborations with, among others, Health Data Service and other relevant actors in the registry data landscape to make everyday work easier for users of registry data.

In November 2023, Sikt, SSB, NIPH, and CRN together submitted a new application for further development of microdata.no to the Research Council’s infrastructure initiative. The new project is about increased focus, collaboration, and speed in further development along the four dimensions mentioned in this section, and we are now awaiting a response from the Research Council.