Open-source, well-documented data formats are key to the long-term availability of data. General principles are:
Open, non-proprietary file formats are preferable to closed, proprietary ones, and text-based formats are preferable to binary formats. Depending on the research area, this is not always possible.
Certain data formats can be converted/exported into formats that are suitable for long-term archiving:
- Export Microsoft DOCX files to PDF: In the “Save as” or “Export” dialog, open the additional “Options…” and activate the checkbox “ISO 19005-1 compatible (PDF/A)” (cf. p. 2).
If you use LaTeX, you can, for example, include the “pdfx” package to create a PDF/A-compliant document.
Check PDF/A compatibility in Adobe Acrobat Reader DC: A corresponding info icon appears in the left navigation bar. A good alternative is veraPDF, an EU-funded open-source validator.
- Convert existing PDF files into PDF/A: This can be done either with Adobe Acrobat Professional, which requires a license, or with free tools such as PDF24 Creator.
- Export Microsoft XLSX files as CSV (Comma-Separated Values): This is useful if you deal with simple numerical tables, e.g. measurement values. You can do this via the “Save as” or “Export” dialog.
- Text encoding: UTF-8 is a good default. It is an encoding of Unicode, a standard that aims to cover the characters and symbols of virtually all writing systems in a single character set.
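The CSV and UTF-8 recommendations above can also be applied when data is written by a script rather than exported from a spreadsheet. The following is a minimal sketch using only the Python standard library; the function names, the file name and the column names are illustrative assumptions, not part of any prescribed workflow.

```python
import csv
from pathlib import Path


def write_table_as_csv(path, header, rows):
    """Write a simple table to a comma-separated text file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_table_from_csv(path):
    """Read the file back as a list of rows (each row is a list of strings)."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


def is_valid_utf8(path):
    """Return True if the bytes of the file decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Hypothetical measurement table; file and column names are placeholders.
    write_table_as_csv(
        "measurements.csv",
        ["sample", "temperature_°C"],
        [["A", 21.5], ["B", 22.0]],
    )
    print(read_table_from_csv("measurements.csv"))
    print(is_valid_utf8("measurements.csv"))
```

Note that CSV stores every value as text, so numbers come back as strings and any formatting, formulas or multiple sheets from the original spreadsheet are not preserved.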
The ETH Library provides a general overview of suitable file formats.
Often, complex information can only be represented through the interplay of several parameters and data formats. BIDS (Brain Imaging Data Structure), for example, organizes and describes multidimensional imaging data in the field of Magnetic Resonance Imaging and is based on a predefined naming convention and folder structure.
- Well-documented community standards make research easier to understand, e.g. in neuroimaging.
- General information on naming conventions and data organisation can be found here.
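To illustrate what a predefined naming convention and folder structure looks like in practice, here is a simplified sketch of how a BIDS-style relative path is assembled from a subject label, an optional session, an optional task, a data type folder and a suffix. The helper `bids_path` is hypothetical and covers only a small part of the convention; it is not a validator and does not replace the BIDS specification.

```python
from pathlib import PurePosixPath


def bids_path(subject, datatype, suffix, extension, session=None, task=None):
    """Build a relative path following the basic BIDS layout (simplified)."""
    # Labels are restricted to ASCII letters and digits in this sketch.
    for label in (subject, session, task):
        if label is not None and not (label.isascii() and label.isalnum()):
            raise ValueError(f"labels must be alphanumeric: {label!r}")

    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    if task is not None:
        entities.append(f"task-{task}")

    # Key-value entities are joined with underscores, followed by the suffix.
    filename = "_".join(entities + [suffix]) + extension
    return str(folder / datatype / filename)


if __name__ == "__main__":
    print(bids_path("01", "anat", "T1w", ".nii.gz"))
    # sub-01/anat/sub-01_T1w.nii.gz
    print(bids_path("01", "func", "bold", ".nii.gz", session="01", task="rest"))
    # sub-01/ses-01/func/sub-01_ses-01_task-rest_bold.nii.gz
```

Because the subject, session and task are encoded in both the folder and the file name, each file remains identifiable even when it is copied out of its original directory.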
The Data Services team will be happy to answer any questions you may have about research data. You can reach us via mail and our website.