collection-analysis documentation & resources¶
NOTE: This document and tool are very much a work in progress. The data in the provided databases and the information in this documentation are subject to change.
The interface and included data can be found here:
Source for these docs as well as the export / import scripts and more can be found here:
General Purpose of this Resource¶
This resource documents the snapshot process for the Cincinnati & Hamilton County Public Library (CHPL) “current_collection” data set and provides other resources related to using this tool.
Using the Data Set¶
The software used to power this data set is called “Datasette” (https://datasette.io/) and is currently written and maintained by Simon Willison. Datasette documentation can be found here: https://docs.datasette.io/en/latest/
You can use this tool to run SQL queries directly against the tables in each database and explore the state of the CHPL physical collection at different points in time.
For example, to count the titles that have at least one item associated with them, you can use the following query:
select count(distinct b.bib_record_num) from bib as b join item as i on i.bib_record_num = b.bib_record_num
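The same query can also be run programmatically. Datasette exposes a JSON API in which a database page with a `.json` extension accepts a `sql` parameter. The sketch below only builds such a URL and makes no network request. The base address is a hypothetical placeholder, because the real address of the CHPL instance is not given here, and the helper name `build_query_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real address of the Datasette instance.
BASE_URL = "https://datasette.example.org"

# The query from the text above, unchanged.
QUERY = (
    "select count(distinct b.bib_record_num) "
    "from bib as b join item as i on i.bib_record_num = b.bib_record_num"
)


def build_query_url(database: str, sql: str, base_url: str = BASE_URL) -> str:
    """Build a Datasette JSON API URL that runs `sql` against `database`."""
    # _shape=array asks Datasette to return a plain JSON list of row objects.
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url}/{database}.json?{params}"


url = build_query_url("current_collection", QUERY)
print(url)
```

Fetching the resulting URL with any HTTP client should return the count as JSON, assuming the instance allows arbitrary SQL queries, which is Datasette's default behavior.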
Below are static SQL queries for reports and analysis:
- Static SQL Queries & Reports
- Bib Records Deleted Within the Last Week
- Item Records Deleted Within the Last Week
- Collection Value
- Available Items & Circulation Information By Location at Branch
- Lucky Day Leased Books and Leased DVDs Analysis
- New Books List
- Items with 0 Circulation by branch_name (including pagination)
- Item Data Consistency Report – Excluded Titles
More examples, general use-cases and miscellaneous information can be found below.
What is Included in the Data Set?¶
Snapshots of the CHPL physical collection’s metadata are taken weekly.
There are 3 “levels” of collection metadata snapshots provided as distinct databases:
current_collection: Most recent snapshot of the metadata.
collection_prev: Previous snapshot.
collection-YYYY-MM-DD: Start-of-year snapshot (e.g., collection-2021-01-04 represents the state of the collection as of the first Monday of 2021).
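As a rough illustration of the naming scheme above, the sketch below derives the start-of-year database name for a given year. It assumes the start-of-year snapshot always falls on the first Monday of the year, as in the 2021 example. The function names are made up for illustration and are not part of the export scripts.

```python
from datetime import date, timedelta


def first_monday(year: int) -> date:
    """Return the date of the first Monday of `year`."""
    jan1 = date(year, 1, 1)
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    days_until_monday = (7 - jan1.weekday()) % 7
    return jan1 + timedelta(days=days_until_monday)


def start_of_year_db_name(year: int) -> str:
    """Return the assumed database name, e.g. 'collection-2021-01-04'."""
    return f"collection-{first_monday(year).isoformat()}"


print(start_of_year_db_name(2021))
```

For 2021 this reproduces the name given in the example above. Names for other years are only a guess based on the same convention.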
Each snapshot has two primary tables (the additional tables that supplement them are described in the links below):
bib: Bibliographic metadata associated with a resource
item: Item-level metadata (such as item location, barcode, etc.)
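To make the relationship between the two tables concrete, the sketch below builds a tiny in-memory SQLite database in which one bib record can have many items. Only the table names and the `bib_record_num` join column come from this documentation. The other column names (`best_title`, `item_record_num`, `barcode`) and all of the rows are invented for illustration and may not match the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy schema: columns other than bib_record_num are assumptions.
    create table bib (bib_record_num integer primary key, best_title text);
    create table item (
        item_record_num integer primary key,
        bib_record_num integer,
        barcode text
    );
    insert into bib values (1, 'Title A'), (2, 'Title B'), (3, 'Title C');
    insert into item values
        (10, 1, 'A000001'), (11, 1, 'A000002'), (12, 2, 'A000003');
    """
)

# Titles with at least one item (an inner join drops bib records without items).
titles_with_items = conn.execute(
    """
    select count(distinct b.bib_record_num)
    from bib as b join item as i on i.bib_record_num = b.bib_record_num
    """
).fetchone()[0]

# Items per title (a left join keeps bib records that have no items).
items_per_title = conn.execute(
    """
    select b.bib_record_num, count(i.item_record_num) as item_count
    from bib as b
    left join item as i on i.bib_record_num = b.bib_record_num
    group by b.bib_record_num
    order by b.bib_record_num
    """
).fetchall()

print(titles_with_items)  # 2
print(items_per_title)    # [(1, 2), (2, 1), (3, 0)]
```

In this toy data, 'Title C' has no items, so it is excluded from the first count but appears with an item count of 0 in the second result.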
More detail about what is included in each of the database snapshots can be found below.
- Database Tables, Columns, and Definitions
- bib Table, item Table Relationship
- bib Table
- item Table
- bib_record Table
- item_status_property_myuser Table
- itype_property_myuser Table
- physical_format_myuser Table
- country_property_myuser Table
- language_property Table
- record_metadata Table
- bib_record_item_record_link Table
- location Table, location_name Table, branch Table, branch_name Table Relationship
- location Table
- location_name Table
- branch Table
- branch_name Table
- phrase_entry Table
Uh, OK. But, Like …Why?¶
Data is amazing! The ability to examine, aggregate, and transform data can give incredible and powerful insights into the large physical collection that CHPL maintains for the public.
Reports, search tools, and other really interesting and useful things can be generated from this data.
For example: