Data do not exist in a vacuum. To be useful, data must be accompanied by context: how they were captured, processed, analyzed, and validated, along with any other information that enables interpretation and use.

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

In its dictionary senses, provenance is:

     -    The fact of coming from some particular source or quarter; origin, derivation.
     -    The history or pedigree of a work of art, manuscript, antique, etc.
     -    A record of the passage of an item through its various owners.

Why do we need provenance?
To assess the quality, reliability, and trustworthiness of data.
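
As an illustration of how such an assessment might be supported, the sketch below models the three ingredients named above (entities, activities, and the people or agents responsible) and walks the records backwards to find everything a result was derived from. The class names, the `lineage` helper, and the file names are hypothetical and chosen only for this example; they are not part of this software's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A piece of data or thing whose history we want to track."""
    name: str


@dataclass(frozen=True)
class Agent:
    """A person (or system) responsible for an activity."""
    name: str


@dataclass(frozen=True)
class Activity:
    """A process that used some entities and generated others."""
    name: str
    agent: Agent
    used: tuple = ()
    generated: tuple = ()


def lineage(entity, activities):
    """Return every upstream entity that contributed to `entity`."""
    ancestors = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for activity in activities:
            if current in activity.generated:
                for source in activity.used:
                    if source not in ancestors:
                        ancestors.append(source)
                        frontier.append(source)
    return ancestors


# Hypothetical example data: a raw file is cleaned, then summarised.
raw = Entity("raw_readings.csv")
clean = Entity("clean_readings.csv")
report = Entity("summary_report.pdf")
alice = Agent("alice")

activities = [
    Activity("clean", alice, used=(raw,), generated=(clean,)),
    Activity("summarise", alice, used=(clean,), generated=(report,)),
]

for ancestor in lineage(report, activities):
    print(ancestor.name)
```

Given the recorded activities, anyone inspecting `summary_report.pdf` can see which inputs it depends on and who ran each step, which is the information needed to judge whether the result can be trusted.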

Persistent identifiers are long-lasting references to a particular representation of a resource. The record behind an identifier can be updated as the resource itself changes, so the identifier keeps the resource representation accessible even when the underlying resource goes offline or changes location. Persistent identifiers also enable linked data.
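
The idea can be sketched as a small registry in which the identifier stays fixed while the location it resolves to is updated. This is a minimal in-memory illustration only: the `PidRegistry` class, the `pid:` prefix, and the example URLs are assumptions made for the sketch, not an existing identifier service.

```python
import uuid


class PidRegistry:
    """Hypothetical in-memory registry mapping stable identifiers to locations."""

    def __init__(self):
        self._locations = {}

    def mint(self, location):
        """Create a new identifier that points at `location`."""
        pid = f"pid:{uuid.uuid4()}"
        self._locations[pid] = location
        return pid

    def update(self, pid, new_location):
        """Point an existing identifier at a new location."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        """Return the current location for `pid`."""
        return self._locations[pid]


registry = PidRegistry()
pid = registry.mint("https://example.org/data/v1/readings.csv")

# The resource moves; the identifier that others cite does not change.
registry.update(pid, "https://archive.example.org/readings.csv")
print(pid, "->", registry.resolve(pid))
```

Anything that stored the identifier before the move still resolves correctly afterwards, because only the registry entry changed.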

This software aims to incorporate both provenance and persistent identifiers into big data analytics processes. A provenance system responsible for creating entities and recording processes will be developed. The entities will be made available via persistent identifiers, allowing the knowledge to be represented in both graphical and textual forms.
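
As a rough sketch of what "graphical and textual forms" could look like, the example below renders the same provenance records once as plain sentences and once as a Graphviz DOT graph. The record layout (source identifier, activity, generated identifier), the `to_text` and `to_dot` helpers, and the `pid:` values are all assumptions for illustration; the actual system may store and present its records differently.

```python
# Each record: (identifier of the entity used, activity, identifier of the entity generated).
# The identifiers are placeholders standing in for real persistent identifiers.
records = [
    ("pid:raw", "clean", "pid:clean"),
    ("pid:clean", "summarise", "pid:report"),
]


def to_text(records):
    """Render the records as one plain-text sentence per line."""
    return "\n".join(
        f"{out} was generated by '{act}' using {src}" for src, act, out in records
    )


def to_dot(records):
    """Render the records as a Graphviz DOT digraph."""
    lines = ["digraph provenance {"]
    for src, act, out in records:
        lines.append(f'  "{src}" -> "{out}" [label="{act}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_text(records))
print(to_dot(records))
```

The DOT output can be passed to a Graphviz tool to draw the graph, while the text form can be read or logged directly.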

