

Glossary

 

CONCEPTS AND DEFINITIONS

 

Datum: Symbolic representation (numerical, alphabetical, algorithmic, etc.), attribute or characteristic of an entity. A datum is a minimal expression of content on a topic.

Format (data): Set of technical and presentational features corresponding to the physical and logical structure used to store data in a file. It is usually identified by a suffix (extension) at the end of the file name. Examples: myfile.pdf, publications.xml.
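As a rough illustration of identifying a format by its suffix, the following Python sketch reads the extension from a file name. The helper name file_format is hypothetical, and a suffix is only a hint: it does not guarantee what the file actually contains.

```python
from pathlib import PurePath


def file_format(file_name: str) -> str:
    """Return the file-name suffix in lowercase, without the leading dot.

    Returns an empty string when the name has no suffix.
    """
    return PurePath(file_name).suffix.lstrip(".").lower()


print(file_format("publications.xml"))  # xml
print(file_format("report.PDF"))        # pdf
print(file_format("README"))            # (empty string: no suffix)
```

For names with several suffixes, such as archive.tar.gz, this sketch reports only the last one (gz).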

Dataset: Related data, conveniently organized, and structured with a physical and thematic unity, so that they can be appropriately treated (processed) to obtain information. It is not necessarily directed to a specific user.

Database: Dataset with inherent meanings built for a specific purpose and directed to a specific user group.

Catalogue: An organized entry point that gives users access to the datasets published by an organization. The catalog can be viewed as an organized list of terms and concepts that describe the metadata of each dataset, together with additional metadata and links to other related data relevant to users. It is generally used by users of the data repository to sort and locate information.

Repository (of data): Virtual storage location that holds the physical datasets published by an organization.

Personal data: Any numerical, alphabetical, graphic, photographic, sound or other information about an identified or identifiable person, that is, a person whose identity can be established directly or indirectly, by reference to one or more identification numbers or to one or more factors specific to his or her physical, physiological, mental, economic, cultural or social identity.

Copyright: Moral right, irrevocable and inalienable. It is an exclusive right of exploitation that a person holds over his or her literary, artistic and scientific creations.

Intellectual Property Rights: Copyright and related rights, including forms of sui generis protection.

License: Information about the copyright of the dataset. Using a license is essential to provide clarity and certainty to users about the possible uses of the information contained in the datasets. For the release of datasets and to facilitate their reuse, the following basic conditions must be respected: a) keep the original sense of the information, b) cite the source of the information, c) state when the information was last updated.

Open Data: Data without restrictions of any kind and with particular emphasis on the absence of administrative and technological restrictions. To be considered as open, there are several technical requirements data must meet (Example: The 8 Principles of Open Government Data), which implies that there may be different “intensities” of data opening (Example: The 5 stars of Open Data from the Web Foundation).

Open Government Data: Data that are: a) collected, produced and/or received by public institutions (particularly institutions of the State and the central government) in the course of their business or according to their assigned functions; b) maintained, organized and stored as an object of history or inquiry for the institution, the public administration and the general public; c) made available to citizens and to public or private institutions, so that the data can be disclosed and potentially used by society in general and particularly by entities that can add value to the data; d) covered by the "technical and administrative" definition of open data. A variety of data qualify for these purposes: geographic, weather, traffic, government management, and the use of fiscal resources, among others.

Data Opening: Action and result of making an organization's previously restricted or hidden data publicly accessible, presenting them in a form that any re-user can use.

Publication: Model of open data exchange that does not require a prior bilateral agreement between the publisher of the data and its consumer. It is the process of publicly and permanently exposing the data and metadata of an institution. The data are made available in known, standard formats and through standard process patterns. The goal is to allow open use by any counterpart, either through human interfaces or automated procedures. The publisher is the entity or individual responsible for the publication of the data.

Consumer (of data): Any person or organization that accesses the published data and gets a copy of all or part of them for their own purposes.

Reuse: Use something, either for the same function previously performed or for other purposes.

Public Reuse: Use of documents held by the public sector, by natural or legal persons, for commercial or noncommercial purposes. That use should not constitute a public administrative activity.

Infomediary: Company or business whose model is based on managing information for third parties: it collects data from various sources, studies and selects them, and then organizes and distributes them to its customers.

Infoactivist: A person who has access to appropriate technology and uses it to collect, combine, create, and distribute information in a democratic and participatory way, using the Internet as a global platform to try to bring about social, political, economic, and environmental change, among others.

Interoperability: Property or ability of two or more systems or components to exchange information and use the exchanged information.

Visualization: Graphic representation of data and models, which helps the user understand the structure and meaning of the information contained in the data.

 

OPEN DATA TECHNICAL GLOSSARY

 

Metadata: Data and/or other documents that describe the data in terms of context, content, or in any other way considered necessary to extend the conceptualization of the described data; that is, the characteristics associated with any dataset. Metadata is an essential tool to organize, classify, relate, and reason about data. Examples of dataset metadata: title, description, publisher, publication date, etc.
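A minimal sketch of what a dataset metadata record might look like in Python, using the example fields named in the definition. The field values, the REQUIRED_FIELDS list and the missing_fields helper are illustrative assumptions, not part of any standard.

```python
import json

# Hypothetical metadata record; the keys mirror the examples in the definition.
dataset_metadata = {
    "title": "Municipal budget 2013",
    "description": "Approved budget by department (illustrative record).",
    "publisher": "Example City Council",
    "publication_date": "2014-01-15",
}

# Assumption: a catalog would want at least these fields filled in.
REQUIRED_FIELDS = ("title", "description", "publisher", "publication_date")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


print(json.dumps(dataset_metadata, indent=2))
print(missing_fields(dataset_metadata))   # []
print(missing_fields({"title": "Only a title"}))
```

Keeping the record as plain key-value pairs makes it easy to serialize to JSON or to map later onto a vocabulary designed for catalogs.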

Raw data: An expression that refers to data in its "original" state, not derived from another set of data; that is, prior to any processing or aggregation. Raw data are also referred to as "primary" data.

URI (Uniform Resource Identifier): A short string that uniquely identifies a resource (service, page, document, email, encyclopedia, etc.). These resources are usually accessible on a network or system. A URI can be a Uniform Resource Locator (URL), a Uniform Resource Name (URN), or both (URL + URN).

URL (Uniform Resource Locator): Name/identifier. Compact sequence of characters that makes it possible to locate a resource by describing its primary access mode. URLs are a subset of URIs.
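To make the difference between a URL and a URN more concrete, the sketch below splits one of each with Python's standard urllib.parse module. The host data.example.org and the ISBN value are placeholders chosen for illustration.

```python
from urllib.parse import urlsplit

# A URL describes how to reach the resource: scheme, host, path, and so on.
url = urlsplit("http://data.example.org/catalog/dataset?id=42#summary")
print(url.scheme)    # http
print(url.netloc)    # data.example.org
print(url.path)      # /catalog/dataset
print(url.query)     # id=42
print(url.fragment)  # summary

# A URN names the resource without saying where it is located.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme)    # urn
print(urn.netloc)    # (empty string: a URN carries no network location)
print(urn.path)      # isbn:0451450523
```

Both strings are URIs; only the first one includes the access information that makes it a URL.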

RDF (Resource Description Framework): Framework for describing resources semantically. This means giving meaning to what is being represented so that machines can interpret it. RDF can be serialized in different formats: RDF/XML, N3, Turtle, etc.
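RDF describes resources as statements of the form subject, predicate, object (triples). The sketch below models two such statements in plain Python and prints them in N-Triples, a simple line-based RDF syntax. It is an illustration only: the Triple type, the to_ntriples helper and the data.example.org URIs are hypothetical, while the property URIs come from the Dublin Core terms namespace. A real project would normally use an RDF library instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str              # URI of the resource being described
    predicate: str            # URI of the property
    obj: str                  # URI or plain literal value
    obj_is_uri: bool = False  # True when obj is a URI rather than a literal


DCT = "http://purl.org/dc/terms/"                        # Dublin Core terms
DATASET = "http://data.example.org/dataset/budget-2013"  # hypothetical URI

triples = [
    Triple(DATASET, DCT + "title", "Municipal budget 2013"),
    Triple(DATASET, DCT + "publisher",
           "http://data.example.org/org/city-council", obj_is_uri=True),
]


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as an N-Triples line (plain literals only)."""
    if triple.obj_is_uri:
        obj = f"<{triple.obj}>"
    else:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


for triple in triples:
    print(to_ntriples(triple))
```

Because the subject and the publisher are themselves URIs, other datasets can refer to them, which is the basis for linking data across sources.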

RDFa (Resource Description Framework in attributes): A way of representing the structured data visible on websites by means of semantic annotations (included in the code and invisible to the user), which allow applications to interpret this information and use it effectively.

DCAT (Data Catalog Vocabulary): RDF Vocabulary for interoperability of data catalogs. Its main objective is the expression of government data catalogs in a standard format using RDF.

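REST (Representational State Transfer): Architectural style for web interfaces. In common usage, the term describes any simple web interface that exchanges data (for example, XML) directly over HTTP, without the additional abstractions of protocols based on message exchange patterns, such as the SOAP web services protocol.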
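As a sketch of what such a simple interface looks like from the client side, the code below prepares (but does not send) an HTTP request for a dataset resource using Python's standard urllib.request module. The base URL, the /datasets/ path layout and the dataset_request helper are assumptions made for illustration and do not describe any particular service.

```python
from urllib.request import Request

BASE_URL = "http://data.example.org/api"  # hypothetical service


def dataset_request(dataset_id: str, method: str = "GET") -> Request:
    """Build an HTTP request for one dataset resource.

    The resource is identified by its URL, the action by the HTTP method,
    and the desired representation by the Accept header.
    """
    return Request(
        f"{BASE_URL}/datasets/{dataset_id}",
        headers={"Accept": "application/xml"},
        method=method,
    )


req = dataset_request("budget-2013")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would send it; that step is left out here because the endpoint is fictitious.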

SKOS (Simple Knowledge Organization System): RDF vocabulary for representing semi-formal knowledge systems such as thesauri, taxonomies and classification schemes. SKOS was designed to facilitate the migration of existing organizational systems to the Semantic Web.

SPARQL (SPARQL Protocol and RDF Query Language): Technology for querying information from databases and other data sources in their primary state through the Web. It consists of a standardized query language and a protocol that provides a standard Web service (HTTP / SOAP), allowing queries against diverse data sources that either store their data natively in RDF or present them as such.
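The two parts of SPARQL, the query language and the HTTP protocol, can be sketched together as follows. The endpoint address is hypothetical and the query assumes the data use the Dublin Core dct:title property. The code only builds the request URL, with the query text passed in the query parameter, and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://data.example.org/sparql"  # hypothetical endpoint

# Query language part: ask for up to ten resources and their titles.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE { ?dataset dct:title ?title }
LIMIT 10
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Protocol part: encode the query as the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query})


request_url = sparql_get_url(ENDPOINT, QUERY)
print(request_url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(request_url).query)["query"][0]
print(decoded == QUERY)  # True
```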

Linked Data: Information objects that are linked by computer protocols. Using the RDF model to describe the data, and URIs (or href links on the Web) to name the data objects and expose them for access via the HTTP protocol, facilitates the interconnection of data and useful relationships between them, in a form that both people and machines can interpret.

API (Application Programming Interface): An interface for communication between software components. It provides a set of calls to programming libraries that give access to certain services, acting as an abstraction layer between the lower and higher levels of the software.

MIME (Multipurpose Internet Mail Extensions): A set of conventions or specifications aimed at the exchange over the Internet of any type of file (text, audio, video, etc.) in a way that is transparent to the user.
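One way to see MIME at work is to build a message with an attachment using Python's standard email package: each part of the message is labeled with a content type such as text/plain or text/csv, which lets the receiving software decide how to handle it. The subject, body text and CSV content below are made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Dataset export"
msg.set_content("The dataset is attached as a CSV file.")

# Attaching a second part turns the message into a multipart container.
msg.add_attachment(
    "year,amount\n2013,100\n",  # illustrative CSV content
    subtype="csv",               # labeled as text/csv
    filename="budget.csv",
)

# Walk the message tree and list the MIME type of every part.
for part in msg.walk():
    print(part.get_content_type())
# multipart/mixed
# text/plain
# text/csv
```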

Data Mining: Set of advanced techniques used to discover and extract knowledge that exists implicitly in the data and is useful for a particular field of study or business.

Ontology: A formal description of the concepts and relationships that can exist within a community or among given agents. It is an agreed specification that describes an information domain.

Semantic: Particular way of inferring facts or ideas from a word or phrase.

 

ABBREVIATIONS

 

CSV: Comma-Separated Values.

DCMI: Dublin Core Metadata Initiative.

GML: Geography Markup Language.

HTML: HyperText Markup Language.

JPEG: Joint Photographic Experts Group.

KML: Keyhole Markup Language.

MIME (Multipurpose Internet Mail Extensions): A set of conventions or specifications dedicated to the exchange over the Internet of all kinds of files (text, audio, video, etc.) in a way that is transparent to the user.

MPEG-7: Multimedia Content Description Interface, a standard of the Moving Picture Experts Group.

PNG: Portable Network Graphics.

SDMX: Statistical Data and Metadata exchange.

SPSS: Statistical Package for the Social Sciences.

SSL/TLS: Secure Sockets Layer/Transport Layer Security.

STATA: Data Analysis and Statistical Software.

SVG: Scalable Vector Graphics.

TAR: Tape Archive.

URI: Uniform Resource Identifier.

WGS84: World Geodetic System 1984.

XHTML: Extensible HyperText Markup Language.

XML: Extensible Markup Language.