Upload of large files / a lot of files in DataverseNL

Modified on Thu, 16 May at 4:56 PM


Version 1.0

28/03/2024

Laura Huis in ‘t Veld


Introduction

Uploading large files (>4 GB) to DataverseNL does not always go smoothly, and uploading a dataset with many files can also be a challenge. This document provides some tips and possible alternatives.


Known limits

  • The maximum size for a single file upload is 9.3 GB. An upload through the user interface has to be finished within 1.5 hours; otherwise a time-out occurs.

  • The maximum number of files within a zip file is 2000. 

  • Session time-out: after 100 minutes of inactivity, the session ends automatically. If you have been logged in for a while before starting the upload, we recommend logging out and in again.

  • You can upload a maximum of 1000 files at once via the User Interface. Uploaded files are shown in a long list on the upload page, so even with 100 files you might encounter performance issues.
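Before starting an upload, you can check your files against these limits on your own computer. The Python sketch below is one possible way to do that; it is not a DataverseNL tool. It assumes decimal units (1 GB = 10^9 bytes) and assumes that the 2000-file limit counts file entries in a zip, not folder entries. The function name `check_upload` is hypothetical.

```python
import os
import zipfile

# Limits from the list above. Assumption: decimal units (1 GB = 10**9 bytes).
MAX_FILE_BYTES = 9_300_000_000     # largest single file (9.3 GB)
MAX_FILES_IN_ZIP = 2000            # most files inside one zip
MAX_FILES_PER_UI_UPLOAD = 1000     # most files in one upload via the user interface


def check_upload(paths):
    """Return a list of warnings for the files you plan to upload."""
    warnings = []
    if len(paths) > MAX_FILES_PER_UI_UPLOAD:
        warnings.append(
            f"{len(paths)} files selected; the user interface accepts at most "
            f"{MAX_FILES_PER_UI_UPLOAD} at once"
        )
    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            warnings.append(f"{path}: {size / 10**9:.1f} GB is above the 9.3 GB limit")
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                # Assumption: only file entries count towards the limit, not folder entries.
                n_files = sum(1 for info in zf.infolist() if not info.is_dir())
            if n_files > MAX_FILES_IN_ZIP:
                warnings.append(
                    f"{path}: zip contains {n_files} files, above the limit of {MAX_FILES_IN_ZIP}"
                )
    return warnings


# Example (hypothetical file names):
# for warning in check_upload(["measurements.zip", "video.mp4"]):
#     print(warning)
```

An empty list means none of these limits is exceeded; it does not guarantee that the upload will succeed, since connection speed and time-outs also play a role.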


Internet connection

A fast internet connection is very important when uploading large data files. You can check your upload speed on websites such as https://www.speedtest.net/ and https://www.ziggo.nl/speedtest. The upload speed is shown in megabits per second (Mbps). Examples:

With an upload speed of 20 Mbps, you can upload around 9 GB in one hour. 

With an upload speed of 15 Mbps, you can upload around 6.75 GB in one hour.

With an upload speed of 12 Mbps, you can upload around 5.4 GB in one hour.
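The examples above follow from a simple conversion: 1 Mbps is 1,000,000 bits per second, and 8 bits make a byte. The Python sketch below reproduces them and estimates whether a file fits within the 1.5-hour time-out. It assumes decimal units and a constant upload speed; real throughput is usually somewhat lower, so treat the result as a best case. The function names are hypothetical.

```python
def gb_per_hour(mbps):
    """Decimal gigabytes that can be uploaded in one hour at `mbps` megabits/s."""
    megabits_per_hour = mbps * 3600
    return megabits_per_hour / 8 / 1000  # megabits -> megabytes -> gigabytes


def upload_minutes(size_gb, mbps):
    """Best-case upload time in minutes for a file of `size_gb` decimal gigabytes."""
    megabits = size_gb * 1000 * 8
    return megabits / mbps / 60


def fits_in_timeout(size_gb, mbps, limit_minutes=90):
    """True if the best-case upload time stays within the 1.5-hour UI limit."""
    return upload_minutes(size_gb, mbps) <= limit_minutes


for speed in (20, 15, 12):
    print(f"{speed} Mbps: about {gb_per_hour(speed):.2f} GB per hour")
```

For example, a 9.3 GB file takes about 62 minutes at 20 Mbps, which is within the limit, but about 103 minutes at 12 Mbps, which exceeds the 1.5-hour time-out.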


Tips

  • Avoid using a VPN connection.

  • Make sure that the files you would like to upload are stored on your local computer. Do not upload directly from (for example) SURFdrive to DataverseNL. 


The upload through the DataverseNL User Interface

If you are uploading a zip file, bear in mind that after the upload, Dataverse also unpacks the zip. So if the blue progress bar is completely filled but nothing happens, the system is still busy unpacking your zip. This can take a while, especially if the zip contains a complex folder structure and many files.


  • Do not click 'Done' while uploading; this will abort the upload. Wait until the files you have uploaded are displayed in the User Interface.

  • Upload only one large data file at a time. Click 'Save changes' before selecting a new file for upload.


Image 1: The blue progress bar is completed, but the upload is not finished yet!



Image 2: The file is now ready to be saved. 



Suggestions when your upload is not successful


Splitting your zip-file

If you are uploading a zip file that contains multiple files (and perhaps also a folder structure), consider splitting it into smaller zips. To keep the folder structure, keep the folder names and the hierarchy exactly the same in each separate zip.


Here is one example of a folder structure:


Main folder
    -Sub folder 1
        ----file 1
        ----file 2
        ----sub sub folder 1
            ---- 60 files
    -Sub folder 2
        ----sub sub folder 2
            ---- some files

File 1 and file 2 are quite large files. Also, sub sub folder 1 contains a lot of files.

In this case, a solution would be to split this folder into three different zip files:


ZIP 1

Main folder
    -Sub folder 1
        ----file 1
        ----file 2


ZIP 2

Main folder
    -Sub folder 1
        ----sub sub folder 1
            ---- 60 files


ZIP 3

Main folder
    -Sub folder 2
        ----sub sub folder 2
            ---- some files
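If you prefer to create the partial zips with a script rather than by hand, the Python sketch below shows one way. It stores every entry under the main folder's name (for example `Main folder/Sub folder 1/file 1`), so each zip reproduces the same hierarchy. The function name `make_partial_zip` and the zip file names are hypothetical; the folder names follow the example above.

```python
import zipfile
from pathlib import Path


def make_partial_zip(root, members, zip_path):
    """Zip selected parts of `root` while keeping the full folder hierarchy.

    `members` are paths relative to `root`; a folder is added with all files
    below it. Every entry is stored as "<root name>/<relative path>", so all
    partial zips share the same main folder and sub folder names.
    """
    root = Path(root)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            src = root / member
            if src.is_dir():
                files = sorted(p for p in src.rglob("*") if p.is_file())
            else:
                files = [src]
            for f in files:
                arcname = (Path(root.name) / f.relative_to(root)).as_posix()
                zf.write(f, arcname=arcname)


# The three zips from the example above:
# make_partial_zip("Main folder", ["Sub folder 1/file 1", "Sub folder 1/file 2"], "zip1.zip")
# make_partial_zip("Main folder", ["Sub folder 1/sub sub folder 1"], "zip2.zip")
# make_partial_zip("Main folder", ["Sub folder 2"], "zip3.zip")
```

Write the zips outside the main folder, so that a zip is never added to itself.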

Using the DVUploader tool

You can download this tool and install it on your local computer. It is a command-line tool that uses the Dataverse API, which bypasses the problems you might have with the User Interface.
If an upload with the DVUploader fails and stops, you can restart it. The tool checks which files already exist in the dataset and starts uploading the first file it detects that is not in the dataset yet.
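The restart behaviour comes down to comparing the local files with the files already in the dataset. The Python sketch below illustrates that idea only; it is not DVUploader's code. It compares relative paths alone (a real tool may also compare checksums), and it assumes you already have the set of file paths present in the dataset, for example from a file listing. The function name is hypothetical.

```python
from pathlib import Path


def files_still_to_upload(local_dir, already_in_dataset):
    """Return local files (relative POSIX paths) that are not yet in the dataset.

    `already_in_dataset` is a set of relative paths already present in the
    dataset (assumed to be obtained separately, e.g. from a file listing).
    Only the paths are compared; file contents are not checked.
    """
    local_dir = Path(local_dir)
    local_files = sorted(
        p.relative_to(local_dir).as_posix() for p in local_dir.rglob("*") if p.is_file()
    )
    return [path for path in local_files if path not in already_in_dataset]
```

After an interrupted run, only the returned files still need to be uploaded; files that already made it into the dataset are skipped.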

You can run this tool yourself, or ask your local DataverseNL administrator for help. (See https://dans.knaw.nl/en/data-services/dataversenl/institutions/ for a list of contact persons.)

You can read more about this tool and download it here: https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader/wiki/DVUploader,-a-Command-line-Bulk-Uploader-for-Dataverse.


Upload your files double-zipped

If you have a huge number of files in your dataset, and there is no other way to get them uploaded, consider uploading the files double-zipped. This prevents Dataverse from automatically unzipping all files, a process that takes a lot of time and can cause a time-out.

For example, place your .zip file in a folder and then zip this folder. 
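The double-zipping step can also be scripted. The Python sketch below places an existing zip inside a folder in a new outer zip, mirroring the manual steps above. When Dataverse unpacks the outer zip on upload, the inner zip should remain a single file. The function name and the default folder name (the inner zip's name without `.zip`) are hypothetical choices.

```python
import zipfile
from pathlib import Path


def double_zip(inner_zip, outer_zip, folder_name=None):
    """Store `inner_zip` inside `folder_name/` in a new zip file `outer_zip`."""
    inner_zip = Path(inner_zip)
    folder_name = folder_name or inner_zip.stem
    with zipfile.ZipFile(outer_zip, "w") as zf:
        # ZIP_STORED: the inner zip is already compressed, so store it as-is.
        zf.write(
            inner_zip,
            arcname=f"{folder_name}/{inner_zip.name}",
            compress_type=zipfile.ZIP_STORED,
        )


# Example (hypothetical names):
# double_zip("my_dataset.zip", "my_dataset_double.zip")
```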


Keep the downloader in mind

When others reuse your data, bear in mind that downloading a large dataset can also be problematic. The zip download limit is set to 10 GB. If a user downloads multiple files at once, Dataverse packages them in a zip, and it can happen that not all files are included in that zip. See also the FAQ 'How to download large files from DataverseNL'.
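If your dataset is larger than 10 GB, it can help to check how its files could be grouped into downloads that each stay under the zip limit, so that they can be downloaded in batches. The Python sketch below uses first-fit decreasing, a simple packing heuristic that does not always find the fewest batches. It assumes decimal units and that a file downloaded on its own is not zipped; the function name is hypothetical.

```python
ZIP_LIMIT_BYTES = 10 * 10**9  # 10 GB zip download limit (decimal GB assumed)


def plan_download_batches(file_sizes, limit=ZIP_LIMIT_BYTES):
    """Group files into batches whose total size stays within the zip limit.

    `file_sizes` maps file name -> size in bytes. Files larger than the limit
    cannot fit in any zip and are returned separately for individual download.
    """
    batches, totals, individual = [], [], []
    # First-fit decreasing: place the largest files first.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            individual.append(name)
            continue
        for i, total in enumerate(totals):
            if total + size <= limit:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            # No existing batch has room: start a new one.
            batches.append([name])
            totals.append(size)
    return batches, individual
```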


