The current data environment presents a growing variety and volume of data. At the same time, a new mode of inquiry, problem solving, and decision-making is becoming pervasive in our society. This new mode calls for the application of mathematical, computational, and statistical models to infer actionable information from large quantities of data. This paradigm, often called Big Data Analytics, requires new forms of data management to deal with the volume, variety, and velocity of Big Data. This new form of data management is called Data Curation.
Data curation: what is it
Data curation encompasses all the processes needed for principled and controlled data creation, maintenance, and management, and it also adds value to data. Data curation thus covers:
- Acquiring and caring for data
- Deciding what data to collect
- Overseeing data care and maintenance (including metadata)
- Conducting research based on the collected data
- Packaging data properly for reuse
- Sharing data with the public
- Ensuring data maintains its value over time
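The packaging and metadata steps above can be sketched in code. This is a minimal, hypothetical example, not a prescribed schema: the field names loosely follow Dublin Core, and the `package_dataset` helper and its required-field list are assumptions for illustration.

```python
# Hypothetical metadata fields a curator might require before a
# dataset is packaged for reuse (loosely Dublin Core-inspired).
REQUIRED_FIELDS = {"title", "creator", "date", "license", "description"}

def package_dataset(data_rows, metadata):
    """Bundle raw data with its metadata, refusing incomplete records."""
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    return {"metadata": dict(metadata), "data": list(data_rows)}

package = package_dataset(
    data_rows=[{"city": "Oslo", "population": 709037}],
    metadata={
        "title": "City populations",
        "creator": "Example Statistics Office",
        "date": "2024-01-01",
        "license": "CC-BY-4.0",
        "description": "Population counts per city.",
    },
)
```

Rejecting a dataset whose metadata is incomplete is one concrete way curation "ensures proper packaging for reuse": a third party receiving the package knows its provenance and license without asking.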
Data curation: what key insights to draw
The demands of eScience for data interoperability and reuse, and of eGovernment for effective transparency, are driving data curation practices and technologies. These sectors play the roles of innovators and visionaries in the data curation technology adoption lifecycle. Organizations in the biomedical space, such as pharmaceutical companies, are among the early adopters.
The main idea behind data curation is to enable more complete, higher-quality data-driven models for knowledge organizations: more complete models can answer a wider range of questions. Data curation practices and technologies enable organizations and individuals to reuse third-party data in different contexts.
Emerging economic models, such as public-private partnerships, can support the creation of data curation infrastructures. Such investment in data curation infrastructure will, in turn, lead to better quantification of the economic impact of high-quality data.
Scaling up data curation requires reducing the cost per curation task and increasing the pool of data curators. To improve the automation of complex curation tasks, a hybrid human-algorithmic approach, together with the ability to compute the uncertainty of algorithmic results, is fundamental. Crowdsourcing also plays an important role in scaling up data curation, as it provides access to a large pool of potential data curators.
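The hybrid human-algorithmic approach can be sketched as a confidence-based routing loop: an automatic step labels each record, and records whose confidence falls below a threshold are escalated to human (possibly crowdsourced) curators. The toy classifier, its labels, and the 0.8 threshold are all illustrative assumptions, not a specific system's design.

```python
def auto_label(record):
    """Toy classifier standing in for a real algorithm.

    Returns a (label, confidence) pair; the confidence is the
    'uncertainty of the result' that drives the routing decision.
    """
    text = record.lower()
    if "invoice" in text:
        return "finance", 0.95
    if "patient" in text:
        return "medical", 0.90
    return "unknown", 0.40  # low confidence: needs a human

def curate(records, threshold=0.8):
    """Accept confident algorithmic labels; queue the rest for humans."""
    accepted, human_queue = [], []
    for r in records:
        label, conf = auto_label(r)
        if conf >= threshold:
            accepted.append((r, label))
        else:
            human_queue.append(r)  # escalate to crowd workers
    return accepted, human_queue

accepted, queue = curate(["Invoice #42", "Patient chart", "Misc note"])
```

The design point is that computing uncertainty is what makes the hybrid split possible: without a confidence score there is no principled way to decide which tasks to automate and which to send to the crowd.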
The way curators interact with data has a significant bearing on curation efficiency and lowers the barrier for domain experts and casual users to curate data. Key functionalities in human-data interaction include semantic search, natural language interfaces, data visualization and summarization, and intuitive data transformation interfaces.
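An "intuitive data transformation interface" typically lets a curator express a cleanup declaratively rather than as code. The sketch below assumes a hypothetical spec format of (operation, field) pairs; it is not the syntax of any real tool.

```python
# Hypothetical declarative spec a non-programmer curator might author:
# each entry names an operation and the field it applies to.
SPEC = [
    ("trim",      "name"),          # strip surrounding whitespace
    ("uppercase", "country_code"),  # normalize country codes
]

# Mapping from spec operation names to plain string functions.
OPS = {"trim": str.strip, "uppercase": str.upper}

def apply_spec(row, spec):
    """Apply a declarative transformation spec to one record."""
    out = dict(row)
    for op, field in spec:
        out[field] = OPS[op](out[field])
    return out

row = apply_spec({"name": " Ada ", "country_code": "gb"}, SPEC)
```

Separating the *what* (the spec) from the *how* (the `OPS` table) is what lowers the barrier: domain experts edit the spec while the underlying operations stay tested and reusable.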
A standards-based data representation improves interoperability by reducing syntactic and semantic heterogeneity. Such conceptual-model and data-model standards are available in many domains.
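Heterogeneity reduction can be illustrated as mapping each source's field names onto one shared vocabulary. The canonical field names, the two source systems, and their mappings below are illustrative assumptions, not an actual standard.

```python
# A hypothetical shared vocabulary playing the role of the standard.
CANONICAL = {"given_name", "family_name", "birth_date"}

# Per-source mappings from local field names to the shared vocabulary.
SOURCE_MAPPINGS = {
    "hr_system":  {"fname": "given_name", "lname": "family_name",
                   "dob": "birth_date"},
    "crm_export": {"firstName": "given_name", "surname": "family_name",
                   "born": "birth_date"},
}

def to_canonical(record, source):
    """Rewrite a source-specific record into the shared representation."""
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = to_canonical({"fname": "Ada", "lname": "Lovelace",
                  "dob": "1815-12-10"}, "hr_system")
b = to_canonical({"firstName": "Ada", "surname": "Lovelace",
                  "born": "1815-12-10"}, "crm_export")
```

After the mapping, records from both systems are interchangeable: downstream consumers write queries once, against the canonical names, instead of once per source.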
As the number of data sources grows and content generation becomes decentralized, ensuring data quality becomes a fundamental issue for data management environments. Data curation methods and tools address exactly this need.
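A data-quality gate at the point of ingestion is one simple form such a curation tool can take. The rules below (a required identifier, a type and range check) are illustrative assumptions standing in for whatever rules a real environment would enforce.

```python
def quality_issues(record):
    """Return a list of quality problems found in one record."""
    issues = []
    if not record.get("id"):
        issues.append("missing id")
    pop = record.get("population")
    if not isinstance(pop, int) or pop < 0:
        issues.append("population must be a non-negative integer")
    return issues

# Records arriving from decentralized, untrusted sources.
incoming = [
    {"id": "oslo", "population": 709037},
    {"id": "", "population": -5},
]

clean, rejected = [], []
for rec in incoming:
    (clean if not quality_issues(rec) else rejected).append(rec)
```

Only records that pass every check enter the curated store; the rejected ones can be logged, repaired, or routed back to their source, keeping quality problems from silently propagating.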

