Web content, websites and online resources
Websites and intranets are used to conduct business and to provide services and information. Records of these activities are public records that you must capture and manage.
Responsibility for keeping and managing web records is shared across several staff and teams:
- records managers and CEOs
- website / content authors
- website administrators (including outsourced service providers)
- IT personnel.
What you need to capture and how will depend on the type of content, your web content strategy, and how you manage records in your agency.
Find out what to do if the management of your website, web content or online services is outsourced.
A recordkeeping strategy will help you determine:
- what web records are being created
- what records need to be kept
- at what point and how often the records need to be captured (consider the addition, modification and removal of resources and services)
- how they should be captured and managed
- how long they need to be kept.
Base your web content strategy on:
- what you use the website for (advice and resources, delivery of services, or both)
- who uses the information and services
- your legislative, accountability and business requirements
- the type of content and records created
- the site complexity
- how records are created (e.g. automatically generated from online transactions, manually created advice)
- how often resources are added, modified, and removed (i.e. frequent/infrequent, regular/irregular)
- the consequences (costs of legal/financial/corrective actions) if information is unreliable, out-dated, or unrecoverable
- what needs to be captured to adequately document business activities
- how you capture and manage records in your agency.
Your strategy should aim to build recordkeeping into the overall process rather than treat it as an additional activity.
Websites contain three types of information: content, context and structure.
Content refers to what is published on your website:
- HTML-encoded pages
- documents, forms, publications, advice
- changes to content
- services and resources provided
Web content can be:
- Static–what you publish online (e.g. advice, publications, tools, documents). Static content doesn’t change based on what the user does.
- Dynamic–the services and resources delivered online, where there is interaction between the user and the service/resource (e.g. online transactions, search queries, online forms).
Context is information about what is published:
- date and time the web content was published, amended, and/or removed
- codes or templates used to turn data into web content
- author and approval for publication of or changes to content
- information relating to the modification/administration of your website content (e.g. policies, procedures, audit logs, permissions)
- evidence of transactions or communications conducted online (e.g. submit a form, file a complaint, update details, make a payment)
- publication details of web page (e.g. URL).
Context includes administrative information about the management and operation of the website (e.g. content author permissions, page views, metadata).
Structure is information about the website as a whole.
It includes administrative information about the operation of the website such as site maps and information architecture that illustrate the arrangement of website content.
You must capture any records that document your business activities, and any that are created, received or kept to meet legal, business or community requirements and expectations. This includes:
- advice, tools, publications or information published on your website (static content)
- services, transactions or business activities that are delivered or occur online (dynamic content)
- records relating to the administration and management of the website (administrative content).
Some of these records may be automatically generated (e.g. an email from a form submitted online); others will need to be manually created.
You need to capture enough information to give a clear picture of what was published and/or what occurred. To do that you need to capture the content, context, and, in some cases, the structure of the website.
Consider whether you need to capture the whole website (the look and feel and interactivity between pages) or if capturing what was published / what has occurred is sufficient.
Ensure appropriate metadata is attached to records and that the content, context and structural information is linked. This will help provide a complete picture of what was published or what occurred, particularly for dynamic content such as online services and transactions.
You should capture automatically generated records based on the format they are in (e.g. email, PDF) and the function or activity they relate to (e.g. advice, licence renewals). Find out how to capture records in specific formats.
Note: Any records of online transactions integrated into business systems must be created and maintained. The systems will need to have sufficient recordkeeping functionality to be able to manage the records.
If your website includes search functions that users can use to access particular services and resources, consider capturing records of:
- how search functions work
- search queries and results, including:
  - the tool selected
  - the dates of use
- details about how the search facility selects and ranks results.
You can do this by recording:
- search queries and results, ensuring the links between the two are maintained
- each unique HTML file that is delivered in response to a query.
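As an illustration of the approach above, the following Python sketch keeps each search query in the same record as its ranked results, so the link between the two cannot be lost. It also stores each unique HTML file delivered in response only once, keyed by its SHA-256 digest. The `SearchRecord` structure, the `record_search` function and the in-memory `store` dictionary are hypothetical. In practice, these records would be captured into your recordkeeping system.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SearchRecord:
    """One search event: the query and the results returned for it (hypothetical structure)."""
    tool: str                 # which search tool the user selected
    query: str                # the query text as entered
    result_urls: list         # ranked results, in the order they were shown
    html_sha256: str          # fingerprint of the HTML page delivered in response
    searched_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_search(tool, query, result_urls, delivered_html, store):
    """Keep the query and its results together in one record.

    Each unique delivered HTML file is stored once, keyed by its SHA-256 digest,
    and every search record points to the file the user actually received.
    """
    digest = hashlib.sha256(delivered_html.encode("utf-8")).hexdigest()
    store["html_files"].setdefault(digest, delivered_html)
    record = SearchRecord(tool, query, list(result_urls), digest)
    store["searches"].append(record)
    return record
```

Because the delivered HTML is fingerprinted, repeated identical responses are kept only once. Each query still points to exactly what the user saw.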
How you capture web records will depend on what you need to capture, the type of content, and the complexity of your website.
You may need to use one or more of the options below to ensure you capture adequate web records.
To capture static content:
- capture snapshots
- use a combination of snapshots and change logs
- manage content and objects separately (e.g. a Word document recording the advice published and when it was published).
To capture dynamic content:
- create, update and maintain a log of the available resources and services, and the triggers for their delivery
- capture and maintain records of the HTML files that are delivered to a client, and the elements that triggered their creation
- use a combination of snapshots and activity/audit logs to capture and manage the required information
- capture the IP addresses where resources or services were provided.
To capture administrative content use a combination of snapshots, change logs and activity/audit logs to capture and manage the required information.
Snapshots (or screenshots) allow you to see what was available at the time and capture the look, feel and functionality of the website.
Snapshots can be done at regular intervals (e.g. every month) or at particular points in time (e.g. after major changes).
They can be used to capture static content, or the entire collection of resources and services.
Snapshots do not retain the hypertext functionality of the website (e.g. hyperlinks). You may need to record hyperlink destinations so the information is not lost if the destination changes or is moved.
Make sure snapshots collect:
- the text or documents displayed
- any scripts that run on the page
- any programs, browser software and plug-ins that are used.
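A snapshot only has evidential value if you can later show what was captured and when. The Python sketch below assumes the HTML of a page has already been retrieved. From that HTML it builds a simple manifest containing a capture timestamp, a SHA-256 fixity value, the scripts the page loads, and the destinations of its hyperlinks, which a snapshot otherwise loses. The `snapshot_manifest` function and its field names are hypothetical. `user_agent` stands in for whatever browser and plug-in details your capture process records. The sketch uses only the standard library's `html.parser`; a dedicated web harvesting tool would capture far more.

```python
import hashlib
from datetime import datetime, timezone
from html.parser import HTMLParser


class _SnapshotParser(HTMLParser):
    """Collects hyperlink destinations and script references from captured HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.scripts = []
        self.inline_scripts = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])       # where the hyperlink pointed at capture time
        elif tag == "script":
            if attrs.get("src"):
                self.scripts.append(attrs["src"])  # externally loaded script
            else:
                self.inline_scripts += 1           # script embedded in the page itself


def snapshot_manifest(url, html, user_agent):
    """Describe one captured page so the snapshot can later be verified and interpreted."""
    parser = _SnapshotParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "link_destinations": parser.links,
        "external_scripts": parser.scripts,
        "inline_script_count": parser.inline_scripts,
        "captured_with": user_agent,
    }
```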
When to capture snapshots will depend on your agency’s requirements and risk assessment.
Snapshots should be used in combination with other strategies to capture all required information.
Talk to your IT team about organising snapshots. They may need to modify relevant software or website functionality to ensure the snapshots are authentic.
Free open source web harvesting tools are available to capture websites in a form suitable for long term preservation. Talk to your IT department about which tool to use.
Change logs track the alterations made to resources and services over time, creating a list of online activities.
You can use change logs to provide a record of online activity and changes to resources between snapshots.
Change logs should capture changes to online resources and services, including:
- the text or documents displayed
Use change logs in combination with snapshots to capture static and dynamic content.
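A change log entry can record exactly what changed between two versions of a page. This Python sketch uses the standard library's `difflib.unified_diff` to compare the old and new text. The `change_log_entry` function and its fields are hypothetical and would need to be matched to your agency's own logging requirements.

```python
import difflib
from datetime import datetime, timezone


def change_log_entry(url, old_text, new_text, changed_by):
    """Build a change-log entry describing how a page's text changed between versions."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{url} (before)",
            tofile=f"{url} (after)",
            lineterm="",
        )
    )
    return {
        "url": url,
        "changed_at": datetime.now(timezone.utc).isoformat(),
        "changed_by": changed_by,
        "diff": diff,  # an empty list means the text did not change
    }
```

Storing the diff rather than only the new version lets the log show what was displayed between two snapshots.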
Activity and audit logs capture online transactions in response to particular events (e.g. evidence of resource or service use, tracking queries or transactions).
They can be used to document who has accessed (or attempted to access) a system, the actions performed, the number of people accessing a system, or verify the security of a system.
They should capture:
- the date and time of the event, transaction or activity
- information about who accessed the system–user profile, IP address or domain name, web browser used, user name or identification
- all actions performed online, including services accessed, any searches, queries or changes
- any resources returned to the user, including any scripts that are executed
- authentication of identities involved in an event, payments made and data security (for transactional services).
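The fields listed above could be kept as structured, append-only entries. In this Python sketch the `AuditEvent` class and its field names are hypothetical. Each event is serialised to a single JSON line so it can be appended to a log without rewriting earlier entries. Transactional details such as payments are assumed to go in the free-form `details` field.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    """One audit-log entry covering the fields listed above (field names are hypothetical)."""
    user_id: str               # user name or identification
    ip_address: str            # IP address or domain name
    user_agent: str            # web browser used
    action: str                # e.g. "search", "submit_form", "update_details"
    details: dict              # queries, changes, services accessed, payments
    resources_returned: list   # pages, documents or scripts sent back to the user
    authenticated: bool        # whether the user's identity was verified
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(event: AuditEvent) -> str:
    """Serialise one event as a single JSON line, ready to append to a log file."""
    return json.dumps(asdict(event)) + "\n"
```

Writing each line to a file opened in append mode (for example `open("audit.log", "a", encoding="utf-8")`, where the file name is hypothetical) adds new entries without altering existing ones.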
Use activity/audit logs in combination with snapshots to capture dynamic content.
Activity/audit logs may be purged or overwritten in some systems. Make sure any audit logs captured to document online activity are retained for the full applicable retention period.
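Before an activity or audit log is purged, a system could check that the retention period has fully elapsed. The Python sketch below assumes a retention period expressed in whole years from the date of the log entry. The actual trigger and period come from the retention and disposal schedule that applies. The `may_purge` and `add_years` functions are hypothetical.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Same calendar date `years` later; 29 February maps to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def may_purge(entry_date: date, retention_years: int, today: date) -> bool:
    """True only once the full retention period has elapsed since `entry_date`."""
    return today >= add_years(entry_date, retention_years)
```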
Online records logs
Online records logs can be used to maintain a log of what, where and when resources were made available. Fields can include:
- date and time
- person making the changes
- person who requested the changes
- target URL
- the source or documents relating to the changes
- person responsible for authorising the changes
- file number relating to the activity.
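An online records log can be as simple as a table with one column for each field listed above. This Python sketch writes entries as CSV using the standard library's `csv.DictWriter`. The column names in `FIELDS` and the `render_log` function are hypothetical.

```python
import csv
import io

# Column names mirror the fields listed above; the exact names are hypothetical.
FIELDS = [
    "date_time",
    "changed_by",
    "requested_by",
    "target_url",
    "source_documents",
    "authorised_by",
    "file_number",
]


def render_log(entries):
    """Write online records log entries (dicts keyed by FIELDS) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        # Missing fields are written as empty cells; unexpected keys raise ValueError.
        writer.writerow(entry)
    return buffer.getvalue()
```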
Use online records logs in combination with snapshots to capture dynamic content. Activity/audit logs can also be used to ensure all information is captured.
An online archive replicates all past and present online resources and services. It allows the reconstruction and navigation of activity for any point in time.
Note: an online archive is a data archive, not a recordkeeping application or an archive for the storage and management of permanent records.
An online archive needs to be linked to a recordkeeping system where sufficient metadata can be stored and maintained, and records can be protected and managed appropriately.
Maintaining an online archive will require:
- large amounts of storage
- technical support
- adequate funding
- careful planning of recordkeeping and system requirements
- collaboration between recordkeeping and IT staff.
Note: Online archives do not capture records of electronic transactions.
Free open source tools are available online. Talk to your IT department about which tools to use.
Resources and associated metadata can be captured separately and linked to the URL where they are or have been available. Options include:
- convert the web page and related contextual information into PDFs and capture in your recordkeeping system
- take screenshots (e.g. print screen, image) of all relevant information and context
- capture draft new content or changes to content, including structural and contextual information as text (e.g. MS Word, Apple Pages etc.)
- print the web page and related approvals and place on a physical file (if no other option available).
Make sure you capture all the required information to give a clear picture of what was published or what occurred. This includes hyperlinks to other pages and documents, approvals, and any metadata, structural or contextual information.
This approach allows you to manage static resources individually instead of as part of a more complicated set of objects.
It can be useful if resources are available as individual items (e.g. publications). However, it does not allow you to easily reconstruct the set of resources made available at a given point in time.
Records of websites are subject to the same disposal rules as other digital records.
Records should be sentenced based on the function and activity they relate to.
Most website records can be sentenced using the same activity or record class in your agency’s core schedule or the General retention and disposal schedule (GRDS) that other records documenting that activity would be sentenced under.
The GRDS does contain record classes for data administration and routine computer operations. These may apply to some website records such as activity, audit and change logs, and other administrative information.
If you are reviewing your agency retention and disposal schedule, make sure records you capture as part of any services provided online or maintaining your agency’s website are covered.
Most websites use a content management system (CMS) to manage online content–these systems automate web publishing and its associated processes.
However, some are created without automated content controls and they may not be able to support full and accurate records over their required retention periods.
If you use a CMS or other web authoring software to manage your website and online resources, ensure:
- the system has adequate recordkeeping functionality
- appropriate metadata can be attached to records
- it captures all of the information you need to keep
- the CMS is suitably integrated with a recordkeeping system
- additional recordkeeping strategies are employed.
You may need to put controls in place to ensure records are appropriately managed for as long as they are required–some CMSs can be modified to do this.
If your CMS is not a suitable recordkeeping solution, you may need to use other options as well to ensure you can capture and manage your web content records.
Records generated from agency websites are subject to the requirements of relevant legislation and evidence laws including:
- Public Records Act 2002
- Evidence Act 1977
- Electronic Transactions (Queensland) Act 2001
- Information Privacy Act 2009 (see Collecting and using personal information below)
- Copyright Act 1968 (Cth)
Principle 2 of Information Standard 26: Internet (IS26) outlines the mandatory recordkeeping requirements for Queensland Government agencies in the creation, implementation, and management of agency websites for the delivery of information and services.
You may need to capture records documenting the use of any third party copyrighted material on your website.
In Queensland, digital publications, including online material, are subject to legal deposit provisions under ss.195CA-195CJ of the Copyright Act 1968 (Cth) and part 8 of the Libraries Act 1988 (Qld).
Consider security, privacy and authentication requirements for any online services and resources that require or allow users to enter personal information. This includes:
- online e-commerce or other financial services
- services requiring clients to enter or update personal details
- lodgement of forms containing personal details
- exchanging information about clients with other agencies (e.g. changes of address).
Collecting and using personal information
You must comply with the Information Privacy Act 2009 when collecting, capturing or storing personal information. You will need to create and keep records that:
- give your agency permission to capture, use, distribute or otherwise make available personal details that have been provided online
- document the processes that ensure client privacy is maintained.
Any exchanges of information provided online by clients will need to be documented, including records of:
- how client consent was gathered
- the date, time and process of information exchange
- authorisations for the exchange
- the transaction itself.
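Before an exchange is filed, a simple completeness check can confirm that each element listed above has been captured. In this Python sketch the element names in `REQUIRED_ELEMENTS` and the `missing_elements` function are hypothetical. The values are whatever your own processes record.

```python
REQUIRED_ELEMENTS = (
    "consent_method",   # how client consent was gathered
    "exchanged_at",     # date and time of the exchange
    "process",          # how the information was exchanged
    "authorised_by",    # authorisation for the exchange
    "transaction_ref",  # reference to the record of the transaction itself
)


def missing_elements(record: dict) -> list:
    """Return the required elements that are absent or empty in an exchange record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]
```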
The Office of the Information Commissioner Queensland provides guidelines on the Information Privacy Principles relating to personal information.
Systems and processes that collect, capture or store personal information must be secure (i.e. free from as many risks as possible) and inviolate (i.e. complete and undamaged).
Find out more about the collection of information under the Information Privacy Act 2009, the Information Privacy Principles, and National Privacy Principle 1.
Authentication and authorisation
Any online system that allows users to enter personal information should have some form of authentication requirement to verify their identity and their right of access to the resources and services.
You will need to create and maintain records of:
- authentication processes and requirements
- different authorisations for resources and services
- the authorisation processes
- authorised users, including the dates of their authorisations and any relevant renewals.
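A register of authorised users could keep the original grant and every renewal, rather than overwriting dates, so you can later show whether a user was authorised on a given day. The `Authorisation` class below is a hypothetical Python sketch, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Authorisation:
    """One user's authorisation for one service, with its full renewal history."""
    user_id: str
    service: str
    granted_on: date
    original_expiry: date
    renewals: list = field(default_factory=list)  # (renewed_on, new_expiry) pairs

    def renew(self, renewed_on: date, new_expiry: date) -> None:
        # Append rather than overwrite, so earlier authorisation dates are kept.
        self.renewals.append((renewed_on, new_expiry))

    @property
    def current_expiry(self) -> date:
        return self.renewals[-1][1] if self.renewals else self.original_expiry

    def periods(self):
        """Yield each (start, end) period during which the user was authorised."""
        yield (self.granted_on, self.original_expiry)
        yield from self.renewals

    def was_authorised_on(self, day: date) -> bool:
        return any(start <= day <= end for start, end in self.periods())
```

Because gaps between an expiry and a late renewal are preserved, the register can show that a user was *not* authorised during that gap.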
Audit and activity logs can be used to document who has accessed (or attempted to access) a system, the actions performed, the number of people accessing a system, or verify the security of a system.
Audit logs containing personal information and captured as a record will need to be stored appropriately.
If you are archiving or decommissioning your entire website, you may need to capture:
- all of the information on the site
- site management records (e.g. records of routine administration)
- web modifications
- change/activity/audit logs
- technical specifications.
This will depend on how your website has been managed, whether records have already been captured, and how records were captured previously.
Look at what content and information needs to be captured to provide sufficient evidence of the resources and services available online and of your business activities.
Also consider capturing records about:
- decisions to decommission the website
- information about the replacement or new website (if applicable)
- whether content has been removed, moved or replaced
- what it is replaced by (if applicable)
- changes between the old website and the new website.
If the whole website is particularly important (e.g. a permanent record), you may need to capture and keep the entire site.
If a website is archived, make sure all contextual information is retained and that the website will remain meaningful.
If you inherit a website from another agency as part of a machinery-of-government change, your agency becomes responsible for the management of that website and ensuring all records are captured.
PANDORA (through TROVE) is an option to capture an entire website. It is capable of ensuring websites are archived and remain accessible. QSA can then work with them to manage the site as a permanent archival value record. However, PANDORA may not be able to capture and manage all elements of your website and some functionality may be lost. You will need to discuss this option with both TROVE and QSA to ensure the website can be captured and managed as a permanent archival value record.
You should also capture site management decisions about how the archived site will be accessed, and all other supporting documentation. This will ensure that you are retaining an adequate record of current management and access arrangements.
If you are decommissioning or migrating your entire website, or even just the content, you may need to capture the entire website as a record of what it was beforehand.
Look at both the decommissioning toolkit and the advice on migrating records and systems for more information. The advice they contain is designed to help decommission or migrate systems that contain records. This information can be applied to websites and may help inform your decisions about managing the records your website contains.
- Websites (IS26) policy, Queensland Government Chief Information Office
- Online standards, policies and legislation for websites
- Archiving Web Resources: Guidelines for Keeping Records of Web-based Activity in the Commonwealth Government, National Archives of Australia
- NARA Guidance on Managing Web Records, National Archives and Records Administration
- Keeping web records, State Archives & Records Authority of New South Wales
- Guideline 15: Recordkeeping Strategies for Websites and Web pages, Tasmanian Archive + Heritage Office
- Web archiving and web continuity guidance, The National Archives UK.