Last updated: 20.05.2022

Availability of data and information is crucial for making sound decisions at all levels in many sectors: from healthcare to transport and energy, including the bioeconomy, big data has become fundamental. New challenges arising from data complexity, together with demands for faster information management, processing and extraction, call for a paradigm shift in the way data is acquired, stored, processed and analyzed. The solution to these challenges lies in the rapidly advancing methods, technologies, platforms and software of big data. The geospatial part of big data solutions in particular is a much newer technology and is constantly evolving.

Status Concluded
Start - end date 01.01.2019 - 31.12.2021
Project manager Jonathan Rizzi
Project manager at Nibio Tor-Einar Skog
Division Division of Survey and Statistics
Department Geomatics

Norway is among the most advanced countries in the use of IT and in digitalization. Several recent reports and strategy documents of the Norwegian government highlight the potential societal benefits for Norway of using big data, including geospatial information. In particular, they show that benefits include more efficient decision-making processes, higher quality and more timely decisions, and the opportunity for actors to react more quickly to deviations from normal situations. The handling of environmental disasters, transport planning, societal security, and increased business development and innovation are mentioned as concrete examples.

The Big data (Stordata) project was started to increase competence within NIBIO and to reap the benefits of applying new methods, such as machine learning, in the institute's daily activities. The project is organized in work packages aimed at three main objectives.

1) Establish new alliances and partnerships related to the use of big data, cooperate in research projects and share information.

2) Test methods for the analysis of big data (machine learning and deep learning algorithms) and apply them in pilot cases for different sectors of interest for NIBIO, including automatic updating of maps such as AR5 (the area resource map of Norway).

3) Test and apply to test cases different technologies and platforms for storing and sharing big data, including the Norwegian Sigma2 infrastructure, open-source platforms for distributed storage and distributed computing, and online cloud platforms.
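The distributed-computing work in objective 3 rests on a simple pattern: split a large dataset into chunks and map the same task over a pool of workers. The project page does not name a specific framework, so the sketch below uses Python's standard library thread pool as a stand-in for the workers of a distributed cluster; the function names and tile sizes are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_mean(tile):
    """Worker task: reduce one tile of pixel values to a single statistic."""
    return sum(tile) / len(tile)


def process_tiles(pixels, n_tiles=4, workers=2):
    """Split a flat list of pixel values into tiles and map a task over a pool.

    A local thread pool stands in here for the distributed workers of a
    cluster framework; the map-over-tiles pattern is the same.
    """
    size = len(pixels) // n_tiles
    tiles = [pixels[i * size:(i + 1) * size] for i in range(n_tiles)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tile_mean, tiles))


# Stand-in for raster pixel values; real inputs would be satellite band tiles.
print(process_tiles(list(range(16))))  # → [1.5, 5.5, 9.5, 13.5]
```

In a real cluster the per-tile tasks would be shipped to remote machines instead of local threads, but the decomposition into independent tile-level tasks is what makes the workload distributable in the first place.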

The results will help boost competence on big data within NIBIO through the exchange of experiences and the sharing of information and knowledge on this topic.

Publications in the project

Abstract

The report documents selected examples of the use of big data (Norwegian: stordata) technology and methods at NIBIO. The first example concerns the updating of the area resource map AR5, examining whether a big data approach can be used to identify locations where the map needs updating. The following examples are taken from the field of plant health and address the possibilities of using big data methods to improve prediction models and the recognition of pests.

Abstract

The report explores and discusses the potential for increased use of big data (Norwegian: stordata) technology and methods within the institute's fields of work. Today, big data approaches are used to solve management support tasks as well as for research purposes, particularly in the centres for precision agriculture and precision farming. The potential for increased use of big data within the institute is large. Realizing this potential requires good coordination between the organizational units and the development of strategic competence in the field.

Abstract

There are neither volume nor velocity thresholds that define big data. Any data ranging from just beyond the capacity of a single personal computer to tera- and petabytes can be considered big data. Although it is common to use High Performance Computers (HPCs) and cloud facilities to process big data, migrating to such facilities is not always practical, especially for medium or small analyses. Personal computers at public institutions and businesses are often idle during parts of the day and throughout the night. Exploiting such computational resources can partly alleviate the need for HPC and cloud services for the analysis of big data where those facilities are not immediate options. This is particularly relevant during testing and pilot application before implementation on HPC or cloud computing. In this paper, we show a real case of processing remotely sensed big data on a local network of personal computers using open-source software packages configured for distributed processing. Sentinel-2 image time series are used to test the distributed system. The normalized difference vegetation index (NDVI) and the monthly median band values are the variables computed to test and evaluate the practicality and efficiency of the distributed cluster. Computational efficiency is tested and evaluated in relation to different cluster setups, data sources and data distributions. The results demonstrate that the proposed cluster of local computers is efficient and practical for processing remotely sensed data where a single personal computer cannot perform the computation. Careful configuration of the computers, the distributed framework and the data is important for optimizing the efficiency of such a system. If correctly implemented, the solution leads to efficient use of the computer facilities and allows the processing of big remote sensing data without the need to migrate it to larger facilities such as HPC and cloud computing systems, except for production and large-scale applications.
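The two variables named in the abstract are simple per-pixel operations, which is what makes them easy to distribute: NDVI is the band ratio (NIR − Red) / (NIR + Red), and the monthly median is a per-pixel reduction over the time axis. A minimal sketch in Python with NumPy, using tiny synthetic arrays rather than actual Sentinel-2 data:

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    # Where both bands are zero (e.g. nodata pixels), return NaN
    # instead of dividing by zero.
    return np.where(denom == 0, np.nan,
                    (nir - red) / np.where(denom == 0, 1, denom))


def monthly_median(stack):
    """Per-pixel median over a (time, rows, cols) stack, ignoring NaNs."""
    return np.nanmedian(stack, axis=0)


# Tiny synthetic reflectance values standing in for Sentinel-2 bands.
nir = np.array([[0.8, 0.6], [0.4, 0.0]])
red = np.array([[0.2, 0.2], [0.4, 0.0]])
print(ndvi(nir, red))  # → [[0.6, 0.5], [0.0, nan]]
```

Because each output pixel depends only on the corresponding input pixels, both computations can be run independently on image tiles distributed across the cluster and the results stitched back together.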