
Research data preprocessing

Once you have completed data collection, you need to decide which data to share. Research data are most commonly shared as datasets. According to the FAIR principles, datasets should be described using metadata, so that the data can be properly indexed, searched and reused.

Selection
It is not necessary to make all data accessible. When deciding what to share, the following factors are worth considering:

  • the requirements of the agencies funding the research,
  • the scientific value of the research data,
  • duplication of existing, similar datasets,
  • the costs of managing and storing the data, and whether they are justified.

Deleting sensitive data
If the data contains personally identifiable information, it should be anonymised or pseudonymised.

  • anonymisation – the process of removing personal identifiers, both direct and indirect, that may lead to an individual being identified,
  • pseudonymisation – a transformation of personal data in such a way that they cannot be attributed to a specific data subject without the use of additional information.
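
As an illustration of pseudonymisation, the sketch below replaces a direct identifier with a keyed hash. It is only one possible approach, not a prescribed method: the function names, the example records and the key are hypothetical. The secret key plays the role of the "additional information" and would need to be stored separately from the shared dataset. Indirect identifiers are left untouched, so on its own this does not amount to anonymisation.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a direct identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Shortened for readability; a longer prefix lowers the chance of collisions.
    return digest.hexdigest()[:16]


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with id_field replaced by its pseudonym."""
    return [
        {**record, id_field: pseudonymise(record[id_field], secret_key)}
        for record in records
    ]


# Hypothetical example data and key; a real key must be kept outside the dataset.
key = b"keep-this-key-outside-the-dataset"
participants = [
    {"name": "Anna Kowalska", "age_group": "30-39"},
    {"name": "Jan Nowak", "age_group": "40-49"},
]
shared = pseudonymise_records(participants, "name", key)
```

The same name always maps to the same pseudonym under one key, so records can still be linked across files without exposing the name itself.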

Choice of files
Data should be published in a widely available format that does not require commercial software and uses standardised encoding.

Name of files
Give folders and files descriptive names that reflect their contents.
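
A descriptive name can be assembled from a few fixed elements. The sketch below assumes a hypothetical convention (project, content, date, version); neither the convention nor the function name comes from this guide, and your institution or repository may recommend a different pattern.

```python
import re
from datetime import date


def build_file_name(project: str, content: str, created: date,
                    version: int, extension: str) -> str:
    """Build a name of the form project_content_YYYY-MM-DD_vNN.ext (assumed convention)."""

    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(content)}_{created.isoformat()}"
        f"_v{version:02d}.{extension.lstrip('.')}"
    )


name = build_file_name("Soil Survey", "pH measurements", date(2023, 5, 17), 2, ".csv")
print(name)  # soil-survey_ph-measurements_2023-05-17_v02.csv
```

Avoiding spaces and special characters keeps the names portable across operating systems, and the ISO date makes files sort chronologically.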

Versioning
Every change should be recorded, and every version of the data should be stored.
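
One simple way to keep every version is never to overwrite a file, but to save each change under the next version number. The sketch below assumes a hypothetical `_vNN` suffix in file names; the pattern and the function name are illustrative only, and a dedicated version control system may be a better fit for larger projects.

```python
import re

# Assumed naming pattern: <stem>_v<number>.<extension>, e.g. results_v02.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<number>\d+)\.(?P<ext>[^.]+)$")


def next_version_name(existing_names, stem: str, ext: str) -> str:
    """Return the file name for the next version, leaving earlier versions untouched."""
    highest = 0
    for name in existing_names:
        match = VERSION_PATTERN.match(name)
        if match and match["stem"] == stem and match["ext"] == ext:
            highest = max(highest, int(match["number"]))
    return f"{stem}_v{highest + 1:02d}.{ext}"


stored = ["results_v01.csv", "results_v02.csv", "notes_v07.csv"]
print(next_version_name(stored, "results", "csv"))  # results_v03.csv
```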

Metadata
Data should be described in a structured way with metadata so that it can be indexed, searched and reused.
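
A structured description can be as simple as a small machine-readable record stored next to the data. The sketch below uses field names loosely modelled on Dublin Core elements. The set of required fields and all example values are assumptions made for illustration; the repository you deposit in may require a different schema.

```python
import json

# Assumed minimal set of fields; adjust to the schema your repository requires.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "rights", "format")


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or empty in the metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical example record.
record = {
    "title": "Soil pH measurements, 2023 field season",
    "creator": "Example Research Group",
    "description": "Weekly soil pH readings from ten sampling sites.",
    "date": "2023-05-17",
    "rights": "CC BY 4.0",
    "format": "text/csv",
}

if not missing_fields(record):
    # Serialise the record so it can be stored alongside the dataset.
    metadata_json = json.dumps(record, indent=2, ensure_ascii=False)
    print(metadata_json)
```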

File format

Any format can be used. The most important thing is to choose one that provides universal access and openness, with a standard character encoding (ASCII, UTF-8). It is recommended that the data be readable with open software.
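
As one way of following this advice for tabular data, the sketch below writes rows to CSV text and encodes it as UTF-8 using only the Python standard library. The function name and the example row are hypothetical.

```python
import csv
import io


def rows_to_utf8_csv(rows, fieldnames) -> bytes:
    """Serialise a list of dicts as CSV and return the UTF-8 encoded bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical example row containing non-ASCII characters.
data = rows_to_utf8_csv([{"site": "Łódź", "ph": "6.5"}], ["site", "ph"])

# The bytes can be written to disk unchanged, e.g. open(path, "wb").write(data).
print(data.decode("utf-8"))
```

Stating the encoding explicitly avoids depending on the platform default, which differs between operating systems.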

Text documents and spreadsheets
preferred: .odt, .ods
acceptable: .doc, .docx, .pdf, .xml, .htm, .html, .rtf, .xlsx, .epub

Tabular data and databases
preferred: .csv, .tsv, .spss, .por
acceptable: .xls, .sav, .dta, .mdb/.accdb

Images
preferred: .tiff, .jpeg2000, .png, .svg
acceptable: .gif, .jpg, .ai, .cgm

Audio
preferred: .wav, .aif, .aiff, .flac
acceptable: .mp3, .m4p, .m4a, .mid, .midi, .ogg

Video
preferred: .avi
acceptable: .mov, .wmv, .mpg

Presentations
preferred: .pdf, .opg
acceptable: .pptx

Geospatial data
preferred: .shp, .shx, .dbf, .sbn, .sbx, .prj, .xml
acceptable: .PostGIS, .tif, .tfw, .fde, .adf, .dat, .nit
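
Before depositing a dataset, file extensions can be checked against lists like the ones above. The sketch below is a minimal illustration covering only a subset of the listed extensions: some, such as .pdf and .xml, are rated differently depending on the type of data and are left out for simplicity. The function name and the "not listed" label are assumptions.

```python
from pathlib import Path

# Subset of the extensions listed above (tabular, image and audio formats only).
PREFERRED = {".csv", ".tsv", ".tiff", ".png", ".svg", ".wav", ".flac"}
ACCEPTABLE = {".xls", ".sav", ".dta", ".gif", ".jpg", ".mp3", ".ogg"}


def format_status(file_name: str) -> str:
    """Classify a file by its extension as preferred, acceptable or not listed."""
    suffix = Path(file_name).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTABLE:
        return "acceptable"
    return "not listed"


for file_name in ["survey.CSV", "photo.jpg", "archive.zip"]:
    print(file_name, "->", format_status(file_name))
```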

When can or must data be closed?

Commercialization of research results (e.g., inventions, objects of industrial property rights).

Legal barriers (copyright, data protection, lack of consent to share data in cases where such consent is required, etc.).

The data affects national or global security, or the security of the facilities, entities, institutions or persons to whom the data relates.
