This page describes what to include in a data citation and provides information about copyright and licensing.
In addition to the information below, see the LTS webpages about copyright.
What is copyright?
The U.S. Copyright Office states that “copyright is a form of protection grounded in the U.S. Constitution and granted by law for original works of authorship fixed in a tangible medium of expression. Copyright covers both published and unpublished works”. Copyright is therefore a form of intellectual property, similar to a patent or a trademark.
What data is copyrightable?
In the U.S., facts are not copyrightable. For example, the fact that 1+1=2 is not copyrightable. Applying this principle to data, individual facts or data points are not copyrightable either. For example, the statement that the water temperature at a lake measured 10 degrees on March 20th is a fact, and hence not copyrightable. What is copyrightable in the U.S. is a new or original way of selecting and arranging data. Take a phone book as an example: the information it contains, including facts such as names, addresses and phone numbers, is not copyrightable; however, an original selection and arrangement of those entries can be copyrightable. For data, this means that the original selection and arrangement of data can be copyrighted. In addition, the associated metadata, that is, the documentation and descriptions of the data or of the processing used to collect it, is also copyrightable.
Who can claim copyright?
In the U.S., copyright is typically assigned by default. Therefore, you hold the copyright in the data you produce even if you have not officially filed for a copyright. However, copyright is not necessarily assigned to the people who create the data; it may instead belong to the organizations for which they work. It is important to read and understand the copyright and intellectual property policy of Lehigh University so that you know when this might be the case for you.
(Adapted from "Mayernik, M. 2012. “Responsible Data Use: Copyright and Data.” In Data Management for Scientists Short Course, edited by Ruth Duerr and Nancy J. Hoebelheinrich, Federation of Earth Science Information Partners: ESIP Commons. doi: 10.7269/P31V5BWP")
Why do we need data citation?
Datasets generated during research are as valuable as the papers published in scientific journals, and should be treated as citable sources on par with traditional materials. Data citation enables researchers to link their academic publications to the underlying datasets, which helps keep those datasets permanently available for access and reuse.
What does a data citation contain?
|Author(s)||Creator(s) of the dataset|
|Publication date||Whichever is the later of: the date the dataset was made available, the date all quality assurance procedures were completed, and the date the embargo period expired.|
|Title||As well as the name of the cited resource itself, this may also include the name of a facility and the titles of the top collection and main parent sub-collection (if any) of which the dataset is a part.|
|Edition||The level or stage of processing of the data, indicating how raw or refined the dataset is.|
|Version||A number increased when the data changes, as the result of adding more data points or re-running a derivation process, for example.|
|Feature name and URI||The name of an ISO 19101:2002 'feature' (e.g. GridSeries, ProfileSeries) and the URI identifying its standard definition, used to pick out a subset of the data.|
|Resource type||Examples: 'database', 'dataset'.|
|Publisher||The organisation either hosting the data or performing quality assurance.|
|Unique numeric fingerprint (UNF)||A cryptographic hash of the data, used to ensure no changes have occurred since the citation.|
|Identifier||An identifier for the data, according to a persistent scheme.|
|Location||A persistent URL from which the dataset is available. Some identifier schemes provide these via an identifier resolver service.|
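As a rough illustration of how the elements in the table above might be combined, the following Python sketch assembles a citation string from a subset of them. The class name `DataCitation`, the ordering and punctuation of the output, and all example values (including the placeholder DOI) are assumptions made for this example and do not represent a prescribed citation standard. The `sha256_fingerprint` helper is a plain checksum used as a stand-in for the fingerprint idea; it is not the UNF algorithm itself.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Holds a subset of the citation elements listed in the table above."""
    authors: List[str]
    publication_year: int
    title: str
    publisher: str
    identifier: str                      # persistent identifier, e.g. a DOI
    version: Optional[str] = None
    resource_type: Optional[str] = None
    fingerprint: Optional[str] = None    # checksum of the data, if available

    def format(self) -> str:
        """Join the elements that are present into one citation string.

        The ordering and punctuation are illustrative assumptions only.
        """
        parts = [f"{'; '.join(self.authors)} ({self.publication_year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(f"[{self.resource_type}]")
        parts.append(self.identifier)
        if self.fingerprint:
            parts.append(f"SHA-256:{self.fingerprint}")
        return ". ".join(parts)


def sha256_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw bytes.

    This is a simple checksum used as a stand-in; it is NOT the UNF algorithm.
    """
    return hashlib.sha256(data).hexdigest()


# Hypothetical example values, invented for illustration.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    publication_year=2020,
    title="Example Lake Temperature Dataset",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.xxxx/example",
    version="1.2",
    resource_type="dataset",
)
print(citation.format())
```

Optional elements such as the version or resource type are simply skipped when absent, so the same structure can serve datasets that are described in more or less detail.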
What should researchers be aware of when citing a dataset?
Although standardization and consistency in research data citation are still evolving, Ball and Duke (2012) of the Digital Curation Centre have summarized some widely accepted data citation practices for researchers to use.
(Adapted from "Alex Ball and Monica Duke, 2012. How to Cite Datasets and Link to Publications. In: A Digital Curation Centre 'working level' guide. Digital Curation Centre".)
Why license research data?
A data license makes the terms of use clear, ensures that other parties understand what they are allowed to do with the data, and helps prevent infringement of the rights held in the data when it is reused.
What data licenses are available?
|License Option||General Information||Pros||Cons||License Type|
|Creative Commons||• Simple yet robust licenses for creative works.
• Widely used for most forms of original content, including data.
|Good for:
• very simple, factual datasets
• data to be used automatically
|Watch out for:
• attribution stacking
• the NC (Non-Commercial) condition: only use with dual licensing
• the SA (Share Alike) condition, as it reduces interoperability
• the ND (No Derivatives) condition, as it severely restricts reuse.
|• The most permissive way of releasing data.
• All copyrights and database rights are waived, allowing the data to be used as freely as possible.
• Infringement becomes a non-issue.
|Watch out for:
• lack of control over how database is reused
• lack of protection against unfair competition
|Open Data Commons||• Similar to Creative Commons licenses, but designed specifically for databases.
|Depending on the license type||Depending on the license type||ODC-BY
|Multiple Licensing||• Used when none of the above licenses are satisfactory
• Usually employed in licensing open source software.
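The cautions listed for the Creative Commons conditions in the table above can be expressed as a small lookup, which may be handy when checking a license code before applying it to a dataset. The sketch below is a minimal illustration: the function name `cautions_for` and the parsing approach are assumptions made for this example, and only the NC, SA and ND cautions from the table are encoded (attribution stacking is left out because the table does not tie it to a specific condition code).

```python
import re
from typing import List

# Cautions taken from the "Cons" column of the table above, keyed by the
# Creative Commons condition abbreviation they apply to.
CC_CONDITION_CAUTIONS = {
    "NC": "Non-Commercial: only use with dual licensing",
    "SA": "Share Alike: reduces interoperability",
    "ND": "No Derivatives: severely restricts reuse",
}


def cautions_for(license_code: str) -> List[str]:
    """Return the cautions that apply to a code such as 'CC BY-NC-SA 4.0'.

    The parsing is deliberately simple: the code is upper-cased and split on
    whitespace and hyphens, and each token is looked up in the table.
    """
    tokens = re.split(r"[\s\-]+", license_code.upper().strip())
    return [CC_CONDITION_CAUTIONS[t] for t in tokens if t in CC_CONDITION_CAUTIONS]


for code in ("CC BY", "CC BY-NC-SA 4.0", "CC BY-ND"):
    print(code, "->", cautions_for(code))
```

A code with no NC, SA or ND condition returns an empty list, which only means that none of the table's condition-specific cautions apply, not that the license is automatically suitable.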
How do I select a data license?
(Adapted from "Ball, A. 2011. How to license research data. In: A Digital Curation Centre and JISC Legal 'working level' guide. Digital Curation Centre".)