Vacuuming the Digital Trail – Part 1
Vacuuming the Digital Trail – Part 1
Like most database management systems (DBMS), SQLite databases have a vacuuming feature that cleans up unused pages automatically at the end of a transaction.
In this series I will breakdown how vacuuming works in SQLite databases so we can get a better understanding of the impact vacuuming can have on record recovery during a forensic examination.
In part 1 I will breakdown how vacuuming works in SQLite databases.
What is Database Vacuuming?
At a high-level, database vacuuming is the process of reclaiming space occupied by data that has been marked for deletion but has not been physically removed from the database.
Here are several reasons why vacuuming can be important:
- Reclaiming Space: When records are deleted or updated in a database, the space they occupied isn’t always immediately released back to the operating system. Instead, it’s typically marked as free within the database file but still physically occupies space within the file. Vacuuming allows the DBMS to reclaim this space by reorganizing the database file and physically removing the data that is no longer needed.
- Defragmenting the Database: As databases undergo frequent inserts, updates, and deletes, the data files can become fragmented. Fragmentation occurs when data is spread out across the database file, leading to increased disk I/O and slower data retrieval times. Vacuuming helps to defragment the database by consolidating free space and reorganizing data, which improves data locality and reduces the need for the DBMS to scan through scattered data blocks.
- Maintaining Database Health: Regular vacuuming is essential for keeping the database healthy and performing optimally. Without vacuuming, deleted data remains in the database file, causing it to grow unnecessarily larger over time. This can lead to increased storage requirements and degraded database performance as the DBMS has to manage larger data files. By vacuuming regularly, the database remains compact, organized, and efficient, which contributes to better overall performance and responsiveness for database queries and transactions.
In summary, database vacuuming is crucial for reclaiming disk space, optimizing database performance by reducing fragmentation, and maintaining the health and efficiency of the database over time. It’s a standard maintenance practice in database administration to ensure that databases operate smoothly and efficiently as they handle increasing volumes of data and transactions.
How does Vacuuming Work in SQLite Databases?
In SQLite databases, vacuuming can be triggered in 2 ways:
- Enabling auto_vacuum (Most Common)
- Executing the vacuum command
Damien Attoe
Before joining Spyder Forensics, Damien was a Managing Consultant at AccessData where he managed eDiscovery and digital forensics projects and provided services to companies in various industries including the Health Care, Energy, and Financial industries. Prior to that, Damien was a Computer Crime Specialist at the National White Collar Crime Center where he conducted digital forensic research and performed software validation studies on digital forensic software
Auto_Vacuum
Auto vacuum was first introduced to SQLite in Version 3.1.0 (2005-01-21) and can be enabled by the application developer by executing: PRAGMA database.auto_vacuum = mode. This type of vacuuming is what we commonly encounter during a forensic examination, especially with the publicly available applications that we use daily on our mobile devices.
Note: For Auto_vacuum to be enabled, the PRAMGA statement must be executed when the database is created (before the tables are created).
The auto vacuum mode determines what happens to “free pages” at the end of a transaction. There are 3 auto_vacuum modes:
- 0 or NONE – The auto_vacuum feature is disabled (Default settings)
- 1 or FULL – The auto_vacuum feature is enabled and any freelist pages that exist after a transaction has been committed will be moved to the end of the file and truncated.
- 2 or INCREMENTAL – The auto_vacuum feature is enabled; but the incremental_vacuum PRAGMA must be executed for freelist pages to be truncated. With this mode only a set number of freelist pages will be removed (more on this shortly).
Full Vacuum
When a transaction occurs that causes all records on a page to be deleted, the database will be reorganized, and a page will be added to the freelist so it can be reused when the database needs a new page to write data. Due to the database reorganization (part of the B-Tree mechanism), the resulting freelist pages may or may not be the page where the records were deleted.
In the below example, a delete transaction has caused all records on Page 2 of the main database file to be deleted. The database was reorganized causing the contents of Page 3 to overwrite Page 2 and Page 3 was added to the freelist.
With full vacuum mode enabled, at the conclusion of the transaction, the database will be reorganized again resulting in all freelist pages being automatically moved to the end of the file and then they are truncated causing the main database file to shrink.
Continuing with our example where page 3 was turned into a freelist page. With full vacuum enabled, the contents of page 4 would overwrite the contents of page 3 and then page 4 would be truncated. In this example the main database file would shrink by 1 page.
Incremental Vacuum
With incremental vacuuming mode enabled, at the conclusion of a transaction the database will be reorganized; however, rather than all freelist pages being removed from the database file, only a set number will be truncated, and the main database file will shrink accordingly. This mode is becoming more popular vs Full Vacuum as it is less i/o intensive on the database, especially when there are transactions occurring frequently. An example of a database that has incremental_vacuum enabled is the iOS KnowledgeC database. This database will only keep records for a certain period so it can accumulate a lot of freelist pages fast depending on device usage. In this instance it makes sense from an i/o perspective to incrementally truncate the freelist pages to reclaim space rather than a complete database reorganization each time.
As mentioned previously, for an Incremental Vacuum to occur, a PRAGMA statement must be executed when a connection is made to the database specifying how many freelist pages to truncated. The command is PRAGMA database.incremental_vacuum(N) where N is the number of pages to be truncated.
In the below example, a transaction deleted all the records on Pages 2 and Page 3 and the database is reorganized causing the contents of Page 4 to be moved to Page 2. Now Pages 3 and 4 are freelist pages. If incremental vacuuming is set to truncate 1 freelist page at the conclusion of the transaction, Page 4 would be truncated, and the main database file will shrink by 1 page. This would still leave page 3 on the freelist which will be used when the database needs a new page to write data. From a forensic examination perspective, Page 4 is lost, which would contain data; however, Page 3 will still exist as a freelist page and could potentially contain recoverable records.
Note: In this example, if the next transaction did not create a new freelist page or add a new page, i.e. modifications were made to Page 1 or 2, then Page 3 would be truncated and there would be no freelist pages in the database.
Vacuum Command
The vacuum command is the alternative method for ‘cleaning’ a database but must be triggered either manually by a user, or via the application source code. When executed, the database file is rebuilt to use the minimal amount of storage space possible.
With the smaller databases that we typically encounter on mobile devices the vacuuming process is fairly quick and completely transparent to the user if triggered through the application code. For larger databases that may be implemented in a corporate environment, vacuuming can take a while run and will use up more storage space as it goes through the process (Up to twice the original database size).
Vacuum
When this command is executed, Step 1 is to copy all active data in the database to a transient sqlite database (temporary). Think of this like running the sqlite dump command where the database schema, and active records are output to a file. This will not include freelist pages, freeblocks and page unallocated space which are areas in the database file where we look to recover deleted records.
Step 2 is to overwrite the original database file with the contents of the temporary file. This process is treated like a regular transaction; whereby, the journal files will be utilized. With a Rollback Journal, the new “clean” version of the database will automatically be committed to the main database file; however, if a write-ahead log is used, a WAL checkpoint must occur for the vacuum to be considered complete.
Vacuum INTO
There is also an INTO clause that can be used with the Vacuum command. When this clause is included the original database file is not modified, but rather a new “clean” database is created. This new database will contain all the active data from the original database without any of the deleted content.
This process is a little more friendly from an I/O perspective, but a little more complex as the application will need to be configured to connect to the new database file.
Final Thoughts
As I have discussed, vacuuming is a very destructive process and essentially destroys our chances of recovering deleted records from an SQLite database. From an application perspective, database vacuuming is critical for the health and longevity of the database, especially those that are constantly running in the background storing various types of information.
Over the past few years, I have seen a wider adoption of the vacuuming features in SQLite, and not only for the purposes of storage, defragmentation and health, but also to better protect the application users. The whole idea of these privacy apps is to keep your stuff private. Initially, the focus was on securing the network traffic and since that has been achieved through various mechanisms, we have seen a shift in focus on securing the app data stored on the device. With SQLite, app developers can utilize the SQLite encryption extension, vacuuming and secure_delete to make it harder for forensic examiners to access the data and recover deleted data.
When I am examining a database and the tool reports that it was unable to recover any data from freelist pages, like any competent examiner I don’t take that at face value, I validate it so I can attempt to explain why there is no deleted data. I don’t go down a rabbit hole straight away and do a deep dive hex analysis, but rather I check the signs to see if vacuuming is enabled. Does the database have freelist pages? No, next question, is auto_vacuum enabled? Yes, so we have our answer as to why there is no data, freelist pages are truncated at the end of the transaction.
In the next post in this series, I will deep dive the technical aspects of the auto vacuuming feature, by looking at the database header and pointer map pages.