France has made several changes to its laws and regulations and has just set out on a path toward better access to administrative data, while also committing to develop more opportunities for data matching for research purposes. It is interesting to see how similar situations are handled in other countries, and what solutions are being created or considered.
A report on these questions was issued in the USA, a country often cited as an example for access to data for research purposes, yet one where access to administrative data remains complex.
The complexity of using administrative sources, felt within the government as well as in the research community, led to the creation of a commission, which has just issued a report and recommendations to the President of the United States and to Congress. The report, The Promise of Evidence-Based Policymaking, underlines the importance of giving wider access to data in order to better design and evaluate public policies.
The Commission, chaired by a scholar, Katharine G. Abraham, began by thoroughly appraising the situation, hearing from a wide range of actors in the field over several months and examining other countries’ experiences with access to administrative data. In Europe, it looked at Statistics Denmark, the ADRN, recently set up to provide access to administrative data in the UK, the Bundesbank in Germany, and CASD in France.
Even though access to administrative data does not strictly depend on the structure of statistical systems, which differ from one country to the next, that structure can still facilitate the process. Countries where public statistics are based on registers offer a more favorable context, as is the case in Denmark, for example. In countries like the United States of America, public statistics are based on surveys, which makes administrative data harder to mobilize. This complexity is compounded by the decentralization of statistics at the federal level and by the fact that each State within the federal system has its own statistical procedures.
Even though the American institutional context is clearly a specific case, many of the recommendations issued by the US Commission are worth analyzing as France moves towards greater access to data while safeguarding civil liberties. The Commission’s report offers insight into the discussions on the difficulties still to be overcome and the means and solutions envisaged (see A. Bozio & P.-Y. Geoffard, L’accès des chercheurs aux données administratives, Groupe de travail du Conseil National de l’Information Statistique).
The American report addresses two questions together: the mobilization of administrative data by governmental statistics, and access to these data by researchers. The common goal is to make better use of data, so that public policies and their evaluation rest on factual and verifiable evidence. The Commission is therefore asking for wider access to administrative data for the statistical system as much as for research purposes.
A Complex Legal Context
The complexities highlighted are twofold. The first set of difficulties is very similar to those found in France: the laws and regulations passed over time are numerous and diverse, and they pile up, preventing the use of administrative data for any purpose other than the work of the administration that holds them. The report particularly highlights the impact of regulations that are specific to each source, citing the example of the national census, whose data, although accessible to researchers, are essentially restricted to the use for which they were originally collected.
As observed in France, where the legal texts do not explicitly prohibit use by researchers, different administrations may interpret the regulations in widely different ways. The second hurdle, more specific to the United States, acts as a compounding factor. It stems from the deliberate decentralization of statistical departments within a federal system, to which is added each State’s autonomy in statistical production and regulation. This is similar to the challenge the European Union faces in standardizing official statistical surveys, a challenge likely to worsen as it becomes increasingly necessary to mobilize administrative data that differ widely from one country to another.
The solution recommended to the United States by the Commission on Evidence-Based Policymaking starts with amending all the relevant laws and regulations so that they formally provide for the mobilization of administrative data for statistical or research purposes, while strengthening security and guaranteeing confidentiality and the protection of privacy.
The Need for Transparency in Public Policy
The Commission then puts a strong emphasis on transparency, which it considers critical to the successful mobilization of data. This means providing, systematically and publicly, all the information about every process. Beyond the list of tasks carried out, this includes their objectives and results, the exact data mobilized, the matchings performed, the security measures put in place at each stage to guarantee confidentiality and protect privacy, and the audits and certifications which, under the Commission’s recommendation, need to be systematically carried out. This applies to governmental statistics as well as to research. This reflects a concern similar to that seen in the United Kingdom about the importance of obtaining and maintaining citizens’ express consent regarding the usefulness of data mobilization and the respect of confidentiality.
The Creation of a Centralized Service
On the specific topic of organizing researchers’ access to data, the report recommends creating a national service, the National Secure Data Service (NSDS), and emphasizes two roles for it: centralization and service.
For the Commission, centralization has many benefits: a single access point for researchers, a standardized accreditation procedure for the data collected, the possibility of using data from different administrations simultaneously, and the opportunity to perform data matching. The Commission considers data matching to be the core mission of the NSDS and hopes to see it grow strongly.
It won’t be long until France starts pondering these questions, especially as the law on open access to data leaves it to the administrations to decide whether they will go through the Statistical Confidentiality Committee (Comité du secret statistique) for the accreditation of projects. The same applies to secure access channels, which could multiply in the future. For now, several administrations have decided to go through CASD, which makes it possible to use their data simultaneously when necessary.
A Data Center with a Goal to Serve
Furthermore, the Commission vigorously defends the service mission of the NSDS, as opposed to a data storage role. The purpose is not to create a central bank of confidential data, a solution which had been suggested many times in the past and recommended in the Act that created the Commission. The Commission considered such a data bank too high a security risk, and one likely to heighten public fears. Whether for access to data or for data matching, the service is provided for a strictly limited amount of time, so the various administrations remain in charge of their data. We note that the long-term storage of data, and of completed matchings, is not dealt with in the report, unlike the report of the CNIS working group, which recommends a collaboration with the French Archives Nationales. CASD is now in the process of signing an agreement with the Archives Nationales.
The Commission also recommends resorting to confidential data sparingly and limiting access to strictly necessary data. Possible alternatives will have to be taken into account, including current developments in synthetic data, as well as Secure Multiparty Computation (SMC) techniques, which would make it possible to work “on the fly” on data located in different places, without bringing them together.
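To make the idea of computing on data "located in different places, without connecting them" more concrete, here is a minimal sketch of one classic SMC building block, additive secret sharing. It is an illustration only and not a technique the report is known to prescribe: the modulus, the function names `split_into_shares` and `secure_sum`, and the three-administration scenario are all assumptions made for this example.

```python
import secrets

# Hypothetical modulus for this sketch: all arithmetic is done modulo this prime.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split `value` into n_parties random shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that the shares add up to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Compute the total of several private values from shares only.

    Each data holder splits its value into shares and sends one share to each
    party. A party only sees one random-looking share per holder, adds them up
    locally, and publishes that partial sum. Only the grand total is revealed.
    """
    n = len(private_values)
    # all_shares[i][j] is the share of holder i's value given to party j.
    all_shares = [split_into_shares(v, n) for v in private_values]
    # Party j adds the shares it received (column j).
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Three hypothetical administrations each hold a confidential count.
print(secure_sum([120, 45, 300]))  # 465
```

In this sketch no single party ever holds another administration's raw figure, only a share that is indistinguishable from a random number. A real deployment would also need secure channels between parties and protocols for operations richer than a sum.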
Security: a Technology and a Necessity
On secure access itself, the Commission is concerned by the lag in technological innovation. It notes that the access currently coordinated by the Census Bureau is strictly on-site. The Commission stresses the need to look at remote access, which led it to take a special interest in the CASD technology (see the box on CASD in the report). One of the responsibilities of the NSDS would be to close this technological gap.
Organizing the Financing
The report suggests creating the NSDS as part of the Census Bureau, while giving it an independent status within the OMB (Office of Management and Budget). The OMB is a key office within the Executive Office of the President, whose main mission is to assist the President in preparing the budget.
Relying on the Census Bureau would make it possible to build on existing foundations: the Census Bureau currently manages, on behalf of the federal statistical agencies, a network of 27 secure data centers in universities (Federal Statistical Research Data Centers, FSRDC), which gives researchers access, on-site only, to confidential data. Increased resources would then allow for steady growth. The Commission believes that self-funding alone, based on selling services to users, will not be a solution in the long run and suggests that the administrations will eventually have to contribute.
Documenting Data, an Essential Part of the Process
The Commission also highlights the importance of supporting the documentation of data, without which its recommendations risk going unheeded. The first essential step, on which the Commission strongly insists, is to draw up a comprehensive list of existing data sources and their availability status. According to the Commission, all administrations should therefore be required to produce and regularly update this list, indicating the level of confidentiality of each file (which goes beyond the vademecum on the modalities of access to different types of data suggested for France by the report of the CNIS working group). The Commission then insists on more detailed metadata, covering, at a minimum, the list of variables. Aware of the amount of work involved, the Commission recommends prioritizing the administrative data that need documenting. The work on metadata would also serve as a starting point for reflection, together with researchers, on the comparability of the data (especially for matching), as well as on how the data need to evolve to better support the design and assessment of public policies.
These are many of the same questions being asked in France. The report is worth reading and following, in the hope that its implementation will bring a centralized solution which, in the long run, will also facilitate access for researchers who are not based in the United States.
Appendix: Several of the most important recommendations by the Commission
RECOMMENDATION 2.1: The Congress and the President should enact legislation establishing the National Secure Data Service (NSDS) to facilitate access for evidence building while ensuring transparency and privacy …
RECOMMENDATION 4.3: To ensure exemplary transparency and accountability for the Federal government’s use of data for evidence building, the NSDS should maintain a searchable inventory of approved projects using data and undergo regular auditing of compliance with rules governing privacy, confidentiality, and access.
RECOMMENDATION 2.2: The NSDS should be a service, not a data clearinghouse or warehouse. The NSDS should facilitate data linkage in support of distinct authorized projects.
RECOMMENDATION 2.8: The Office of Management and Budget should promulgate a single, streamlined process for researchers external to government to apply, become qualified and gain approval to access government data that are not publicly available. Approval would remain subject to any restrictions appropriate to the data in question.
RECOMMENDATION 4.5: The Office of Management and Budget should increase efforts to make information available on existing Federal datasets including data inventories, metadata and data documentation in a searchable format.
RECOMMENDATION 5: The Congress and the President should consider repealing current bans and limiting future bans on collection and use of data for evidence building.