The quality of the data captured provides the foundation for any GIS.
These days, important decisions are often based on geodata: planning decisions, simulations (e.g. fault analyses in utility networks), and model calculations (e.g. flood models). Decision-making tools based on geodata can only ever be as good as the geodata itself.
Geodata represents enormous value. According to estimates, procuring and maintaining geodata accounts for around half of the cost of operating a GIS. Data quality must therefore be an objective of any GIS deployment. It is THE essential condition for deploying a GIS effectively.
Learn more about which tools can help you ensure data quality, and download the GEONIS Data Quality guidelines.
Contents of the GEONIS Data Quality guidelines
- Mandatory quality characteristics for GEONIS
- Tools for quality assurance
- Overview of data quality tools
- Added value of data quality
- How to enhance data quality
Masses of data are collected and stored continuously these days. People are often swamped by floods of data with no real added value, since producing data (e.g. digital photos) is generally much easier than filing and processing it properly.
This is no different in a geographic information system (GIS). Unlike pure master databases, which should among other things be redundancy-free and consistent, topological correctness also plays a very important role in a GIS. The fundamental purpose of topology is to guarantee the spatial relationships between objects and to support rule-based data acquisition.
Development of the network documentation (see image).
Unfortunately, not enough attention is paid to topology specifically when acquiring data in a GIS, which means that countless (new) functions and products cannot be used. Missing topology is not evident from charts and graphs: the map looks “nice” and accurate, just as the analog maps did. Yet significant added value of a GIS goes unexploited as a result.
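As a minimal sketch of what a topological check can catch, the snippet below flags line endpoints that no other line connects to. It uses plain Python tuples rather than a real GIS API, and all names are hypothetical; in a real network, known terminal points would be excluded from the result.

```python
from collections import Counter

def dangling_endpoints(lines):
    """Return endpoints that occur only once across all lines.

    Each line is a (start_point, end_point) tuple. In a connected
    utility network, interior endpoints should be shared by at
    least two lines; a lone endpoint hints at broken topology."""
    counts = Counter()
    for start, end in lines:
        counts[start] += 1
        counts[end] += 1
    return {p for p, n in counts.items() if n == 1}

# Two connected segments plus one isolated stub:
network = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((5, 5), (6, 5))]
print(sorted(dangling_endpoints(network)))
```

A map of this network could still look perfectly “nice”; only the rule-based check reveals that the stub is not connected to anything.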
Data quality in Geographic Information Systems means the quality of the relevant information. The following criteria are often applied (see image).
Timeliness describes the temporal validity of the geodata and can, for instance, be specified via an acquisition date. It can also be defined via an internal corporate guideline, e.g. that the data must be updated daily, weekly, or monthly.
- Ensuring update quality/reporting
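A freshness rule like the one above could be checked with a few lines of plain Python; the 30-day threshold here is an assumed example, not a GEONIS default.

```python
from datetime import date, timedelta

def is_current(acquisition_date, max_age_days=30):
    """Check whether a record's acquisition date satisfies the
    maximum-age rule from the acquisition guidelines."""
    return date.today() - acquisition_date <= timedelta(days=max_age_days)

print(is_current(date.today() - timedelta(days=10)))  # recent record
print(is_current(date.today() - timedelta(days=90)))  # stale record
```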
Accuracy can be interpreted in different ways. On the one hand it relates to the geographical position of the geodata (absolute positional accuracy relative to a superior coordinate system, or accuracy relative to other objects); on the other it can refer to thematic accuracy, i.e. the level of detail of the master data, e.g. the number of cable types in an electrical registry system.
- Record in the acquisition guidelines
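Absolute positional accuracy can be tested against reference coordinates as a simple distance comparison. The coordinates and the 0.10 m tolerance below are assumed example values for illustration only.

```python
import math

def position_error(measured, reference):
    """Euclidean deviation between a captured point and its
    reference position in the superior coordinate system."""
    return math.dist(measured, reference)

# Assumed tolerance from hypothetical acquisition guidelines: 0.10 m.
error = position_error((2600000.05, 1200000.00), (2600000.00, 1200000.00))
print(error <= 0.10)
```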
Consistency means the logical correctness of the geodata. Among other things this includes topology, as well as the same level of timeliness for different objects within one data record or across multiple data records (no contradictions). In connected systems in particular, data redundancy must be eliminated using suitable interfaces.
- Carry out topology and master data checks (acquisition checks)
Correctness must be understood as freedom from errors, i.e. the data must be consistent with reality.
- Carry out plausibility checks
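A plausibility check rejects values that cannot occur in reality even if they are formally valid. The diameter range below is an assumed example, not a GEONIS default.

```python
# Plausibility range for pipe diameters in mm (assumed example values).
PLAUSIBLE_DIAMETER_MM = (20, 1200)

def implausible(values):
    """Return values outside the physically plausible range."""
    lo, hi = PLAUSIBLE_DIAMETER_MM
    return [v for v in values if not lo <= v <= hi]

print(implausible([110, 160, 5, 20000]))
```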
A data record is complete in quality terms if all required data is available for an area and all master data has been filled in.
- Whether a data record is complete in quality terms depends on its subsequent usage purpose(s) = record in the acquisition guidelines
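Once the acquisition guidelines name the required attributes, completeness can be checked mechanically. The field names below are assumed examples.

```python
REQUIRED_FIELDS = ("owner", "material", "year_built")  # assumed example

def missing_fields(record):
    """List required master data attributes that are empty or absent
    (any falsy value counts as missing in this simplified sketch)."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

print(missing_fields({"owner": "City Works", "material": "", "year_built": 1998}))
```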
Ensure Data Quality
The more processes are automated, and the more the benefit of connected systems is thereby exploited, the more important it becomes to ensure that the data is “clean”. The seven steps below are aimed at helping users process their own data and increase data quality continuously.
Define responsibilities

Clear responsibilities for data entry and error rectification, both in the specialist departments and for the company as a whole, are an absolute must.
Define quality criteria
The data quality requirements must be defined jointly within the specialist department or within a company and set out in writing, e.g. in acquisition guidelines.
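Besides the written form, acquisition guidelines can also be kept machine-readable so that automated checks can consume them directly. This sketch uses a plain dictionary; all names and threshold values are assumed examples.

```python
# Machine-readable acquisition guidelines: one rule set per feature
# class (all names and values are hypothetical examples).
GUIDELINES = {
    "water_pipe": {
        "required_fields": ["material", "diameter_mm", "year_built"],
        "max_position_error_m": 0.10,
        "max_age_days": 365,
    },
}

print(GUIDELINES["water_pipe"]["max_position_error_m"])
```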
Analyze data condition
The current condition of the data must be checked and reconciled against the acquisition guidelines. Any data that does not comply must be corrected as a one-off exercise. The tools described in the guideline may help with this, although they may need to be adapted to the user’s own needs.
Archive obsolete data

Columns or tables in current databases are often obsolete or no longer required. As well as making the database and its tables unclear and confusing, these objects also impact performance. A company should therefore review periodically whether data that is no longer required for day-to-day business can be archived or even erased.
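One simple heuristic for spotting candidates for archiving is to find columns that carry no value in any row. This is only a first filter, assuming tabular data held as plain Python dictionaries; whether a column may actually be removed remains a business decision.

```python
def empty_columns(rows):
    """Columns with no value in any row are candidates for
    archiving or removal (a simplified heuristic)."""
    if not rows:
        return set()
    return {col for col in rows[0]
            if all(not row.get(col) for row in rows)}

rows = [
    {"id": 1, "status": "active", "legacy_code": ""},
    {"id": 2, "status": "planned", "legacy_code": None},
]
print(empty_columns(rows))
```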
Automate data flow
Clearly defined workflows (e.g. with flowcharts) enable data from different departments and from different providers of knowledge and expertise to be compiled and even processed automatically. The aim is to exploit the potential of the available data to maximum effect by merging it. In addition to recording new data, documenting erased data (historization) is also important for data flows.
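The historization step can be sketched as follows: instead of deleting a feature outright, the workflow moves it into a history store with an erasure timestamp. The data structures here are simplified stand-ins for real database tables.

```python
from datetime import datetime, timezone

def erase_with_history(features, history, feature_id):
    """Move a feature into a history table instead of deleting it,
    so that erased data stays documented (historization)."""
    feature = features.pop(feature_id)
    feature["erased_at"] = datetime.now(timezone.utc).isoformat()
    history.append(feature)

features = {7: {"id": 7, "type": "valve"}}
history = []
erase_with_history(features, history, 7)
print(len(features), len(history))  # the feature moved, not vanished
```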
Train employees

Data quality cannot be achieved using technical resources alone. It also includes, for example, specifying consistent spellings (Street, Str., St., etc.). To prevent these rules from getting lost in the bustle of day-to-day business, employees should be reminded of them or trained at fixed intervals. Training also raises employees’ awareness of the importance of data quality.
Establish regular data quality checks
Quality checks and data clean-ups must be repeated at regular, fixed intervals. The schedule for these reviews depends on various factors (e.g. how up-to-date the data needs to be). The tools listed in the guideline, among others, provide support here. The aim of this procedure is to quickly uncover incomplete, defective, or redundant data, and any contradictions in the databases.
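A periodic review like this can be reduced to a small report that runs every agreed check over all records and counts the findings. The check names and records below are invented examples; the pattern, not the rules, is the point.

```python
def quality_report(records, checks):
    """Run each named check over all records and count failures,
    the kind of summary a periodic quality review might produce."""
    return {name: sum(1 for r in records if not check(r))
            for name, check in checks.items()}

records = [
    {"id": 1, "material": "PE", "diameter_mm": 110},
    {"id": 2, "material": "", "diameter_mm": 5},
]
checks = {
    "material_filled": lambda r: bool(r["material"]),
    "diameter_plausible": lambda r: 20 <= r["diameter_mm"] <= 1200,
}
print(quality_report(records, checks))
```

Running such a report on a schedule makes incomplete, defective, or contradictory data visible long before it reaches downstream decisions.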