36 Open Data and Open Licensing
Sachin Kalra
Objectives
Why Open Data?
What is Open Data?
What is an Open License?
What is the Government's role in Open Licensing?
Open Data in the education sector.
1. Introduction
The concept of open data is not new, but a formalized definition is relatively recent. The primary such formalization is the Open Definition, which can be summarized in the statement that “A piece of data is open if anyone is free to use, reuse, and redistribute it subject only, at most, to the requirement to attribute and/or share-alike.”
Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organizations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by a license.
2. Open Data
2.1 Understanding the Foundations of Open Data
In exploring the foundations of open data, we aim to address the following questions:
● Why do governments and people share data?
● What will citizens, businesses, scientists, and journalists do with the data?
● How can we manage it?
The Open Data Foundation (ODaF) is a non-profit organization dedicated to the adoption of global metadata standards and the development of open-source solutions promoting the use of statistical data. ODaF focuses on improving data and metadata accessibility and overall quality in support of research, policy making, and transparency, in the fields of economics, finance, healthcare, education, labor, social science, technology, agriculture, development, and the environment. While we see this information as being primarily statistical in nature, it is understood that it can be drawn from a wide variety of sources and therefore may include information not traditionally seen as such.
Data has many sources, the administration of surveys and the monitoring of transactional flows and registers being some of the most common. In order to become useful for the end-user communities, raw data commonly go through various editing, aggregation and analytical stages.
While researchers and academics may find the micro-data useful, policy and decision makers and the general public are more commonly interested in the easier-to-manage high-level aggregates. Despite the existence of tools and the emergence of open metadata specifications, it is often not possible to connect the different parts of this information chain together. Such connection, however, is critical to fully understanding the data.
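As a minimal sketch of the editing and aggregation stages described above, the Python snippet below turns survey micro-data into the kind of high-level aggregate that policy makers typically consume. The field names (`district`, `household_size`) and all records are hypothetical and purely illustrative; real statistical pipelines involve far more elaborate editing rules.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey micro-data: one record per household.
# None stands in for a missing or invalid response.
MICRO_DATA = [
    {"district": "North", "household_size": 4},
    {"district": "North", "household_size": 6},
    {"district": "South", "household_size": 3},
    {"district": "South", "household_size": None},
    {"district": "South", "household_size": 5},
]


def edit(records):
    """Editing stage: drop records whose value is missing."""
    return [r for r in records if r["household_size"] is not None]


def aggregate(records):
    """Aggregation stage: household count and mean size per district."""
    sizes = defaultdict(list)
    for r in records:
        sizes[r["district"]].append(r["household_size"])
    return {
        district: {"households": len(values), "mean_size": mean(values)}
        for district, values in sorted(sizes.items())
    }


if __name__ == "__main__":
    for district, stats in aggregate(edit(MICRO_DATA)).items():
        print(district, stats)
```

Keeping `edit` and `aggregate` as separate, documented steps is one simple way to preserve the link between the published aggregate and the micro-data it came from.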
Ideally, it should be possible for a user to easily perform tasks such as:
● Discover the existence of data
● Access the data for research and analysis
● Find detailed information describing the data and its production processes
● Access the data sources and collection instruments from which and with which the data was collected, compiled, and aggregated
● Effectively communicate with the agencies involved in the production, storage, distribution of the data
● Share knowledge with other users
The Open Data Foundation exists to help realize this vision, working in cooperation with standards initiatives and other interested parties.
2.2 Why Do Governments Share Data?
Another critical aspect of open data is why governments should share data at all. The main reasons are:
● Meet regulatory compliance
● Provide transparency into government operations
● Anticipate economic development
● Initiate innovation
3. What is Open Data?
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other “Open” movements such as open source, open hardware, open content, and open access. The philosophy behind open data has been long established, but the term “open data” itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov and Data.gov.uk.
Data is open if it satisfies both conditions below:
● Technically open: available in a machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer application.
● Legally open: explicitly licensed in a way that permits commercial and non-commercial use and re-use without restrictions.
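The two conditions above can be expressed as a simple check. This is a minimal sketch under stated assumptions: the set of machine-readable formats and the allow-list of open licenses (written as SPDX-style identifiers) are illustrative choices, not an official list.

```python
from dataclasses import dataclass

# Assumption: an illustrative, non-exhaustive set of machine-readable formats.
MACHINE_READABLE_FORMATS = {"csv", "json", "xml", "rdf", "ods", "kml", "gml"}

# Assumption: an illustrative allow-list of licenses permitting commercial and
# non-commercial reuse (SPDX-style identifiers); not an official list.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODC-By-1.0", "ODbL-1.0"}


@dataclass
class Dataset:
    title: str
    file_format: str  # e.g. "csv" or "pdf"
    license_id: str   # empty string if no license is stated


def is_technically_open(ds: Dataset) -> bool:
    """Technically open: published in a machine-readable format."""
    return ds.file_format.lower() in MACHINE_READABLE_FORMATS


def is_legally_open(ds: Dataset) -> bool:
    """Legally open: explicitly licensed for unrestricted reuse."""
    return ds.license_id in OPEN_LICENSES


def is_open(ds: Dataset) -> bool:
    """Data is open only if BOTH conditions hold."""
    return is_technically_open(ds) and is_legally_open(ds)
```

For example, a CSV file under CC-BY-4.0 passes, whereas a PDF under the same license fails the technical test and a CSV with no stated license fails the legal one.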
3.1 Open Data Paradigm
3.2 Technology Options
Open data scenarios: three levels of complexity, based on the number of datasets available and how often they are updated:
● Level 1 – fewer than 100 datasets, with fewer than 10 datasets changing weekly
● Level 2 – 100 to 1,000 datasets, with 10-100 datasets changing weekly
● Level 3 – over 1,000 datasets, with 100+ changes weekly
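The level thresholds can be sketched as a small classifier. Two assumptions are not stated in the text and are labelled in the code: when the dataset count and the weekly change rate point to different levels, the portal is planned for the more demanding level; and exactly 100 weekly changes, which falls in both the Level 2 and Level 3 ranges as written, is treated as Level 2.

```python
def level_from_count(num_datasets: int) -> int:
    """Level implied by the number of datasets alone."""
    if num_datasets < 100:
        return 1
    if num_datasets <= 1000:
        return 2
    return 3


def level_from_changes(weekly_changes: int) -> int:
    """Level implied by the number of datasets changing each week."""
    if weekly_changes < 10:
        return 1
    # Assumption: the ranges "10-100" and "100+" overlap at 100; we use Level 2.
    if weekly_changes <= 100:
        return 2
    return 3


def portal_level(num_datasets: int, weekly_changes: int) -> int:
    # Assumption: if the two measures disagree, plan for the higher level.
    return max(level_from_count(num_datasets), level_from_changes(weekly_changes))
```

For instance, a portal with 80 datasets but 40 weekly changes would be classified as Level 2 under the "higher level wins" assumption.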
3.3 Dataset Levels
3.3.1 Level 1 Open Datasets
Level 1: fewer than 100 datasets, with fewer than 10 datasets changing weekly.
Build a conventional website using a standard web server:
● Data as raw data files within web server space (or on public cloud storage)
● Metadata as microformats on HTML pages
● Site search using existing tools like Google site search
● Manual update of data, metadata and content by central IT team
● Recommended
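To illustrate "metadata embedded in HTML pages" for a Level 1 site, the sketch below generates a dataset description using schema.org microdata (`itemscope`, `itemtype`, `itemprop`). This is an assumption on our part: microdata is one common way of embedding metadata in HTML, and the text's "microformats" may refer to a different vocabulary. The function name and all example URLs are hypothetical.

```python
from html import escape


def dataset_microdata(name: str, description: str, download_url: str,
                      license_url: str) -> str:
    """Return an HTML fragment describing one dataset with schema.org microdata.

    Hypothetical helper: all values are HTML-escaped before insertion.
    """
    return (
        '<div itemscope itemtype="https://schema.org/Dataset">\n'
        f'  <h2 itemprop="name">{escape(name)}</h2>\n'
        f'  <p itemprop="description">{escape(description)}</p>\n'
        f'  <a itemprop="license" href="{escape(license_url)}">License</a>\n'
        '  <div itemprop="distribution" itemscope '
        'itemtype="https://schema.org/DataDownload">\n'
        '    <meta itemprop="encodingFormat" content="text/csv">\n'
        f'    <a itemprop="contentUrl" href="{escape(download_url)}">Download CSV</a>\n'
        '  </div>\n'
        '</div>'
    )


if __name__ == "__main__":
    print(dataset_microdata(
        "District rainfall 2012",
        "Monthly rainfall by district (illustrative)",
        "https://example.org/data/rainfall-2012.csv",
        "https://creativecommons.org/licenses/by/4.0/",
    ))
```

Because the raw data file stays on the web server and the page merely points to it, this fits the Level 1 pattern of manually maintained files plus a simple static page that site-search tools can index.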
3.3.2 Level 2 Open Datasets
Level 2: 100 to 1,000 datasets, with 10-100 datasets changing weekly.
Front end as a conventional CMS-based website:
● Data as raw, manually managed data files within web server space, public cloud storage, or individual ministry websites
● Metadata in SQL database, served through dynamically generated query pages
● Automated checking of broken links
● Community facilities essential
● Site search by text search on database, CMS search or Google site search (GSS)
● Maintained by central team
● Recommended
o Consider caching to take care of load
o Do not store metadata in the CMS itself
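A minimal sketch of the Level 2 pattern of "metadata in an SQL database, served through dynamically generated query pages", using Python's built-in `sqlite3` module. The table layout, column names and helper functions are hypothetical; the data files themselves are only referenced by URL, and the metadata lives in its own database rather than in the CMS, in line with the recommendation above.

```python
import sqlite3

# Hypothetical metadata table; the data files live elsewhere and are linked by URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dataset (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    ministry  TEXT NOT NULL,
    keywords  TEXT,
    file_url  TEXT NOT NULL
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata catalog."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_dataset(conn, title, ministry, keywords, file_url):
    """Record metadata for one dataset (parameterised to avoid SQL injection)."""
    conn.execute(
        "INSERT INTO dataset (title, ministry, keywords, file_url) VALUES (?, ?, ?, ?)",
        (title, ministry, keywords, file_url),
    )
    conn.commit()


def search(conn, term: str):
    """Simple text search over title and keywords, as a query page might run it."""
    like = f"%{term}%"
    rows = conn.execute(
        "SELECT title, file_url FROM dataset "
        "WHERE title LIKE ? OR keywords LIKE ? ORDER BY title",
        (like, like),
    )
    return rows.fetchall()
```

In SQLite, `LIKE` is case-insensitive for ASCII text, so a query for "school" also matches a title beginning with "School". A production portal would more likely use a full-text index, which this sketch does not attempt.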
3.3.3 Level 3 Open Datasets
Level 3: over 1,000 datasets, with 100+ changes weekly.
Front end needs to integrate different web services (APIs):
● Automated management of raw data files (possibly being stored on public cloud storage) (Level 3a)
● Management of data using a combination of raw data files and data stored in an Open Data System’s database (Level 3b)
● Metadata in optimized metadata repository
● Delegate submission/maintenance of datasets to individual Ministries with custom dialogues, automatic validation, and role-based access control
● Automated checking of broken links
● Community facilities essential, including the ability to raise an issue with the “owner” of each dataset (e.g. contact person, web form)
● Search by structured search on metadata repository, CMS or Google site search on main site
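Delegated submission with automatic validation and role-based access control can be sketched as follows. Everything here is hypothetical: the role names, the permission table and the list of required metadata fields are assumptions made for illustration, not requirements taken from any particular portal. The `contact_email` field stands in for the dataset "owner" contact mentioned above.

```python
from dataclasses import dataclass

# Hypothetical roles and permissions for a Level 3 portal.
PERMISSIONS = {
    "viewer": set(),
    "ministry_editor": {"submit"},
    "portal_admin": {"submit", "approve"},
}

# Hypothetical minimum metadata required for a submission.
REQUIRED_FIELDS = ("title", "ministry", "file_url", "license", "contact_email")


@dataclass
class User:
    name: str
    role: str
    ministry: str


def validate(submission: dict) -> list:
    """Automatic validation: return a list of problems (empty means it passed)."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not submission.get(f)]
    url = submission.get("file_url", "")
    if url and not url.startswith(("http://", "https://")):
        problems.append("file_url must be an http(s) URL")
    return problems


def can_submit(user: User, submission: dict) -> bool:
    """Role-based access: editors may submit only for their own ministry."""
    if "submit" not in PERMISSIONS.get(user.role, set()):
        return False
    return user.role == "portal_admin" or submission.get("ministry") == user.ministry
```

Separating the permission check from the metadata check keeps each rule easy to test, and lets the portal report validation problems back to the submitting ministry without involving the central team.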
3.4 Demand and Engagement
● Central and local governments around the world are increasingly ‘opening’ a range of data, for free, including as part of continuing global efforts to strengthen ‘open government.’
● While this has resulted in excitement from development practitioners, government sponsors, and technologists, much of the public has been left behind.
● As a result, the level of informed public debate across regions on data-driven issues from budgets to service delivery to the practical effectiveness of donor aid in ‘opened’ sectors is low.
● So, now that this data has been ‘opened’, how can it capture the attention and imaginations of the full spectrum of users?
● How can we focus on the other side, the demand side, of the open data phenomenon?
● How can we grow communities of data users, and encourage data ‘ownership’ by the media, civic hackers, community groups, NGOs, labor unions, professional associations, universities, and more?
4. India’s Open Data: www.data.gov.in
Data.gov.in (Data Portal of India) is a platform for single-point access to datasets and apps published by Ministries/Departments/Organisations of the Government of India. It combines and expands the best features of India’s “India.gov.in” and the U.S. government’s Data.gov project.
History
The site was announced in June 2011 and launched in October 2012 as part of the Open Government Initiative, in compliance with the National Data Sharing and Accessibility Policy (NDSAP) of India, which was notified in the Gazette in March 2012.
According to the preamble of NDSAP, there has been an increasing demand by the community that data collected with the deployment of public funds should be made more readily available to all, for enabling rational debate, better decision making and use in meeting civil society needs.
The policy envisages proactive dissemination of data by Government Ministries / Departments / Organizations.
Overview
The site is built on the Drupal framework and has four major modules:
- Data Management System (DMS): This facilitates publishing of Datasets/applications by authorised users from Ministries/Departments/Organisations.
- Content Management System (CMS): This module is used to update or create content and functionalities for Data Portal India.
- Visitor Relationship Management (VRM): This module facilitates collation and dissemination of feedback/suggestions received on Data Portal India.
- Communities: People with specific interest can connect through online communities.
5. National Data Sharing & Accessibility Policy (NDSAP) – 2012
The asset value and potential of data are widely recognized at all levels. When data collected or developed through public investment is made publicly available and maintained over time, its potential value can be more fully realized. There has been an increasing demand from the community that such data, collected with the deployment of public funds, be kept up to date and made more readily available to all, to enable rational debate, increased transparency, better decision making, and use in meeting civil society and government needs. Efficient sharing of data among data owners and within and between government agencies, supported by data standards and interoperable systems, is the need of the hour. Hence, there was a need to formulate the National Data Sharing and Accessibility Policy (NDSAP), which provides an enabling provision and platform for proactive and open access to data generated through public funds and held by the various ministries, departments and organizations of the Government of India.
Open Government Data
“A dataset is said to be open if anyone is free to use, reuse, and redistribute it – Open Data shall be machine readable and it should also be easily accessible.”
Government collects, processes and generates a large amount of data in its day-to-day functioning. However, a large quantum of government data remains inaccessible to citizens and civil society, although much of it may be non-sensitive and could be used by the public for social, economic and developmental purposes.
These data need to be made available in an open format to facilitate use, reuse and redistribution, free from any license or other mechanism of control. Opening up government data in open formats would enhance transparency and accountability while encouraging public engagement. Government data in open formats also has huge potential for innovation, since apps, crowdsourcing efforts, mash-ups and services can all be built around the published datasets.
Under the NDSAP, all datasets are to be published on the data portal, data.gov.in, within a year, and at least five high-value datasets are to be published within the first three months.
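The two NDSAP milestones amount to simple date arithmetic, sketched below for a department's publication plan. The start date (for example, the date a department begins implementing the policy) and the input format are assumptions; the helper names are hypothetical, and the check only covers the datasets listed in the plan.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def check_ndsap_timeline(start: date, plan: list) -> dict:
    """plan: list of (planned_publication_date, is_high_value) tuples.

    Hypothetical helper. Assumes `plan` lists every dataset the department holds.
    """
    three_months = add_months(start, 3)
    one_year = add_months(start, 12)
    early_high_value = sum(
        1 for published, high_value in plan
        if high_value and published <= three_months
    )
    return {
        "five_high_value_in_3_months": early_high_value >= 5,
        "all_within_a_year": all(published <= one_year for published, _ in plan),
    }
```

Month arithmetic is done by hand with `calendar.monthrange` so that, for example, three months after 30 November lands on the last day of February rather than on an invalid date.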
Government data is generated through the following processes and events:
- Primary data, e.g. Population Census, Education Census, Economic Survey, etc.
- Processed/value-added data, e.g. Budget, Planning, etc.
- Data generated through the delivery of government services, e.g. Income Tax collection, MNREGA wage distribution, etc.
Each government department shall have its own criteria for high-value and low-value datasets. Generally, high-value data is governed by the following principles:
- Completeness
- Primary
- Timeliness
- Ease of Physical and Electronic Access
- Machine readability
- Non-discrimination
- Use of Commonly Owned Standards
- Licensing
- Permanence
- Usage Costs
NDSAP recommends that data be published in an open, machine-readable format. Although many formats suit different categories of data, based on an analysis of the data formats currently prevalent in Government, it proposes that data be published in any of the following formats:
- CSV (Comma Separated Values)
- XLS (Excel spreadsheet)
- ODS/ODT (Open Document Formats for Spreadsheet/Text)
- XML (Extensible Markup Language)
- RDF (Resource Description Framework)
- KML (Keyhole Markup Language, used for maps)
- GML (Geography Markup Language)
- RSS/ATOM (for fast-changing data, e.g. hourly or daily updates)
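As a small illustration of publishing one table in two of the formats listed above, the sketch below serialises the same records as CSV and as XML using only Python's standard library. The column names, element names and figures are placeholders, not real government data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Placeholder records; the values are illustrative only.
ROWS = [
    {"state": "State A", "year": 2012, "schools": 120},
    {"state": "State B", "year": 2012, "schools": 85},
]


def to_csv(rows, fieldnames):
    """Serialise records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows, root_tag="dataset", row_tag="record"):
    """Serialise records as XML; keys must be valid XML element names."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(record, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_csv(ROWS, ["state", "year", "schools"]))
    print(to_xml(ROWS))
```

Both outputs can be read back by a program without human intervention, which is what makes them machine-readable in the sense NDSAP intends; note that CSV carries no type information, so numbers come back as text.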
6. Relation to other open activities
The goals of the Open Data movement are similar to those of other “Open” movements.
Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
Open knowledge is promoted by the Open Knowledge Foundation, which argues for openness in a range of issues including, but not limited to, those of open data. It covers (a) data, whether scientific, historical, geographic or otherwise; (b) content such as music, films and books; and (c) government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons’ Protocol for Implementing Open Access Data.
Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
Open source (software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
Open research/Open science/Open science data (Linked open science) means an approach to open and interconnect scientific assets like data, methods and tools with Linked Data techniques to enable transparent, reproducible and transdisciplinary research.
7. Open Data for Education Sector
(Linked Data technologies for connecting open educational data)
Educational institutions produce a lot of data. Much of this data is, or could be, publicly available, either because it is useful to communicate (e.g., the course catalogue) or because of external policies (e.g., reports to funding bodies). The case for open data in education is therefore easy to make, and many educational institutions are starting initiatives in this direction.
Linked Data, a set of state-of-the-art technologies for publishing data on the Web, can help bring the benefits of open data into education. A special focus is put on the usage and consumption of such data: how the availability of large-scale, interlinked data resources on the Web can lead to the development of a new type of services in an increasingly globalised education environment. Concretely, this will be achieved by introducing the basics of Linked Data technologies, as well as through numerous examples of concrete deployments of these technologies for open educational data and their usage.
The emerging Web of Data has produced a vast body of knowledge, containing data of explicit educational nature, as well as vast amounts of resources, for instance, from libraries, museums or encyclopedias, which are not explicitly targeted at educational purposes yet are increasingly being used in such contexts. Building on earlier initiatives, such as linkededucation.org or linkeduniversities.org, the EU-funded project LinkedUp (http://linkedup-project.eu) pushes forward the exploitation and adoption of public, open data available on the Web, in particular by educational organizations. LinkedUp conducts activities, including the establishment of the LinkedUp Challenge (http://linkedup-challenge.org) and a corresponding evaluation framework. The latter will provide a general framework for evaluating all aspects of open Web data-driven applications. These are aimed at identifying and promoting innovative success stories which exploit large-scale Web data in educational scenarios as part of robust applications and tools. Additional dataset curation activities are resulting in a repository and catalog of well-described and assessed datasets (see http://datahub.io/group/linked-education & http://data.linkededucation.org/linkedup/catalog/) and will support interested data consumers and application developers.
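To make the idea of interlinked educational data concrete, the sketch below writes a course description as RDF in the N-Triples syntax by hand. The course IRI is hypothetical, and the vocabulary choices (Dublin Core terms for title and subject, schema.org `Course` as the type, a DBpedia resource as the linked subject) are illustrative assumptions, not vocabularies prescribed by LinkedUp or the projects named above. A real deployment would more likely use an RDF library such as rdflib.

```python
# Illustrative vocabulary choices (assumptions, not prescribed by any project).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
SCHEMA_COURSE = "https://schema.org/Course"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Language-tagged literal; backslashes are escaped first, then quotes and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def triple(s: str, p: str, o: str) -> str:
    return f"{s} {p} {o} ."


def course_triples(course_iri: str, title: str, subject_iri: str) -> str:
    """Describe one course and link it to an external subject resource."""
    s = iri(course_iri)
    return "\n".join([
        triple(s, iri(RDF_TYPE), iri(SCHEMA_COURSE)),
        triple(s, iri(DCT + "title"), literal(title)),
        triple(s, iri(DCT + "subject"), iri(subject_iri)),
    ])


if __name__ == "__main__":
    print(course_triples(
        "http://data.example.edu/course/cs101",  # hypothetical course IRI
        "Introduction to Programming",
        "http://dbpedia.org/resource/Computer_programming",
    ))
```

The last triple is what makes the data "linked": it points to a resource published by someone else, so applications can follow it to related material.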
8. Open License
8.1 What is Open License
A license is a document that specifies what can and cannot be done with a work (whether sound, text, image or multimedia). It grants permissions and states restrictions. Broadly speaking, an open license is one which grants permission to access, re-use and redistribute a work with few or no restrictions. For example, a piece of writing on a website made available under an open license would be free for anyone to:
- print out and share,
- publish on another website or in print,
- make alterations or additions,
- incorporate, in part or in whole, into another piece of writing,
- use as the basis for a work in another medium such as an audio recording or a film.
Openly licensed works are hence free to be shared, improved and built upon!
The exact permissions granted depend on the full text of the open license that is applied. Different projects may require slightly different sets of permissions, or restrictions and there are a range of different licenses available to cater to these different purposes. Some open licenses stipulate that the work may be freely re-used or re-distributed as long as the original author is appropriately credited. Some licenses state that any derivative works or works that incorporate all or parts of the original work are made available under the same license as the original work.
An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared under defined terms and conditions. This allows end users to review and modify the source code, blueprint or design for their own customization, curiosity or troubleshooting needs. Open-source licensed software is mostly available free of charge, though this does not necessarily have to be the case. Licenses that permit only non-commercial redistribution, or modification of the source code for personal use only, are generally not considered open-source licenses. However, open-source licenses may have some restrictions, particularly regarding the expression of respect to the origin of the software, such as a requirement to preserve the names of the authors and a copyright statement within the code, or a requirement to redistribute the licensed software only under the same license. One popular set of open-source software licenses is the set approved by the Open Source Initiative (OSI) based on its Open Source Definition (OSD).
8.2 Why use an open license?
Works that are published without an explicit license are usually subject by default to the copyright laws of the jurisdiction they are published in. These laws typically give several exclusive rights to the copyright holder, including the right to produce copies and to produce derivative works. These rights prohibit unauthorized re-distribution and re-use by third parties, and in many jurisdictions can remain in effect until 70 years after the author’s death. While the protections offered by copyright laws are appropriate in many circumstances, there are also circumstances in which these protections may be unnecessarily restrictive.
Open licenses enable creators to allow more freedom in what others can do with their works. Benefits of this freedom include:
- allowing others to circulate the work freely, potentially giving it a wider circulation than if a single group or individual retained an exclusive right to distribute it;
- not forcing users to apply for permission every time they wish to circulate a copy of the work, which can be time-consuming, especially if the work has many authors;
- encouraging others to continuously improve and add value to a work;
- encouraging others to create new works based on or derived from the original work e.g. translations, adaptations, or works with a different scope or focus.
8.3 How can I apply an open license?
Applying an open license to a work can be very straightforward. The procedure may slightly vary depending on which license is selected, but should be more or less as follows:
- Get permission from all right holders to openly license the work.
- Decide which open license best suits your purposes.
- Display a notice somewhere prominent on your work stating that your work is made available under the open license you have chosen. Include a copy of, or a link to, the full text of your chosen license in your work.
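The three steps above can be sketched as a small helper. This is a hypothetical function with illustrative notice wording, not a form of words required by any particular license: step 1 is enforced as a check that every right holder has consented, step 2 remains a human decision passed in as arguments, and step 3 produces the notice text that links to the full license.

```python
def apply_open_license(title, right_holders, consenting, license_name, license_url):
    """Hypothetical helper returning a license notice for a work."""
    # Step 1: every right holder must agree before the work is openly licensed.
    missing = sorted(set(right_holders) - set(consenting))
    if missing:
        raise ValueError(f"no permission yet from: {', '.join(missing)}")
    # Step 2 (choosing the license) is a human decision; it arrives as arguments.
    # Step 3: build a prominent notice that links to the full license text.
    authors = ", ".join(right_holders)
    return (f'"{title}" by {authors} is made available under the '
            f"{license_name} license. Full license text: {license_url}")


if __name__ == "__main__":
    print(apply_open_license(
        "Field Guide", ["A. Rao", "B. Sen"], ["B. Sen", "A. Rao"],
        "Creative Commons Attribution 4.0 International",
        "https://creativecommons.org/licenses/by/4.0/",
    ))
```

Failing loudly when consent is missing reflects the ordering of the steps: a notice should never be produced before permission from all right holders has been obtained.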
8.4 Why Does Openness and Licensing Matter?
Why should one bother about openness and licensing for data?
Open data is crucial to progress on the fundamental items. It is crucial because open data is so much easier to break up and recombine, to use and reuse. We therefore want people and organizations to have incentives to make their data open, and open data to be easily usable and reusable, i.e. to form a ‘commons’. A good definition of openness acts as a standard that ensures different open datasets are ‘interoperable’ and therefore do form a commons. Licensing is important because it reduces uncertainty. Without a license you don’t know where you, as a user, stand: when are you allowed to use this data? Are you allowed to give it to others? To distribute your own changes? Together, a definition of openness plus a set of conformant licenses deliver clarity and simplicity. Not only is interoperability ensured, but people can know at a glance, without having to go through a whole lot of legalese, what they are free to do. Thus, licensing and definitions are important even though they are only a small part of the overall picture. If we get them wrong, they will keep on getting in the way of everything else.
9. Creative Commons license
A Creative Commons (CC) license is one of several public copyright licenses that enable the free distribution of an otherwise copyrighted work. A CC license is used when an author wants to give people the right to share, use and build upon a work that they have created. CC provides an author flexibility (for example, they might choose to allow only non-commercial uses of their own work) and protects the people who use or redistribute an author’s work, so they don’t have to worry about copyright infringement, as long as they abide by the conditions that are specified in the license by which the author distributes the work.
There are several types of CC licenses, which differ in the combination of conditions they attach to distribution. They were initially released on December 16, 2002 by Creative Commons, a U.S. non-profit corporation founded in 2001.
Work licensed under a Creative Commons license is governed by applicable copyright law. This allows Creative Commons licenses to be applied to all work falling under copyright, including: books, plays, movies, music, articles, photographs, blogs, and websites. Creative Commons does not recommend the use of Creative Commons licenses for software.
However, application of a Creative Commons license may not modify the rights allowed by fair use or fair dealing or exert restrictions which violate copyright exceptions. Furthermore, Creative Commons licenses are non-exclusive and non-revocable. Any work or copies of the work obtained under a Creative Commons license may continue to be used under that license.
9.1 Types of CC licenses
The CC licenses all grant the “baseline rights”, such as the right to distribute the copyrighted work worldwide, without changes, at no charge. The details of each license depend on the version, and each license comprises a selection from four conditions:
- Attribution (BY)
- Share-alike (SA)
- Non-commercial (NC)
- No Derivative Works (ND)
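The way the four conditions combine can be sketched as a small function that builds a license code and its deed URL. In current versions of the licenses, every combination includes Attribution (BY), and ND is never paired with SA, since share-alike only governs derivative works; that yields the six familiar licenses. The default version "4.0" is an assumption, and the function name is hypothetical.

```python
def cc_license(non_commercial=False, no_derivatives=False, share_alike=False,
               version="4.0"):
    """Return (code, deed_url) for a CC license built from the chosen conditions.

    Hypothetical helper: Attribution (BY) is always included, matching current
    license versions; ND and SA are mutually exclusive.
    """
    if no_derivatives and share_alike:
        raise ValueError("ND and SA cannot be combined")
    parts = ["by"]
    if non_commercial:
        parts.append("nc")
    if no_derivatives:
        parts.append("nd")
    if share_alike:
        parts.append("sa")
    slug = "-".join(parts)
    code = "CC " + "-".join(p.upper() for p in parts)
    return code, f"https://creativecommons.org/licenses/{slug}/{version}/"
```

For example, choosing non-commercial plus share-alike gives "CC BY-NC-SA" with the deed URL ending in `by-nc-sa/4.0/`; the ordering of the conditions in the code follows the conventional BY, NC, ND/SA order.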