Check this post out on the Arctype blog!
Intro
SQLite is an often overlooked flavor of SQL engines. Some have suggested it is the most prolific SQL engine in existence due to its highly flexible nature and ability to run on almost any platform with limited resources. Unlike SQL engines like MySQL, PostgreSQL, MSSQL, or Oracle, SQLite runs without a server. SQLite does not rely on a data directory or a constantly running daemon: a database is encapsulated in a single file.
SQLite and iMessage
iMessage is one of the most popular messaging platforms today, largely due to being built into iOS and Mac devices. Since its release it has evolved significantly and brought many new features to messaging, but at its core it is simply an instant messaging platform. iMessage uses SQLite in the background to store relational data about messages, conversions, and their participants.
As a long time Apple user, I have backed up and transferred my iPhone data since my first time using an iPhone, which was November 10, 2009. Because I have been digitally hoarding my text data for so long, my iMessage database is nearly 1GB in size.
Until a few years ago, the built-in search feature for iMessage was very limited and buggy. Although it has recently improved significantly, it is, like nearly any end-user tool, very limited in how you can query it. Those of us who frequently work with data that is trapped behind a limited front-end often wish we could get direct access to the SQL database. Fortunately, the iMessage database is not inaccessible and very easy to access.
Finding the iMessage SQL Database
On Your Mac
If you have iMessage enabled on your Mac as well as your iPhone you have 2 different databases from which to choose.
The database on your Mac is very easy to find as it is simply under ~/Library/Messages/chat.db
.
If you do not use your Mac for iMessage, or as in my case your Mac iMessages do not go as far back,
you can extract your iPhone's database by performing a backup to your Mac.
On Your iPhone
Follow these instructions to extract your iPhone's iMessage database:
- Opening Finder and selecting your iPhone under "Locations".
- Find the "Backups" section and select "Back up all of the data on your iPhone to this Mac", then press Back Up Now to immediately create a new backup. (This process may take a while.)
- Once it is complete, you will find the SQLite file under
/Users/[username]/Library/Application Support/MobileSync/Backup/[backup name]/3d/3d0d7e5fb2ce288813306e4d4636395e047a3d28
. - If you plan to open this database with Arctype, you'll want to copy and rename the file with a
.db
extension to indicate that it is a SQLite file.
Getting Started with SQLite
Unlike most SQL servers, you do not need a connection string, host, or username to connect to a SQLite database. All you need to do is point your SQL client to the database file. Arctype makes it simple and convenient to load in SQLite databases within the same workspace as your other connections.
With Arctype
- Under the Connections dropdown, select "Add new data source"
- Select "SQLite"
- Find the SQLite database file. The file must have a .sqlite3 or .db extension for Arctype to open it.
More detailed instructions can be found in the Arctype Docs.
With Command Line
From a UNIX terminal, type sqlite3 [filename]
iMessage Schema
One of my favorite parts about Arctype is how easy it is to analyze database schema. I'm a long time user of command line
tools and old-school editors, but sometimes having a more visually interactive tool is a lifesaver.
Let's dig into the schema Apple has created for iMessage.
Today we will focus on the chat
, message
, handle
tables, as well as a few join tables to connect related records.
Note that I have created a custom view called handle2
which adds a field id2
that obfuscates the phone numbers
and email addresses of my personal contacts,
and you will see this view referenced in the examples in this article.
Digging Into iMessage
Let's write some queries and makes some observations that would not be possible without direct SQL access.
Pique Your Nostalgia with Old Messages
To get started, let's begin with a simple query to view your first 50 messages. If you have chat threads that go back years and years, there is no easy way to access early messages from your iPhone or Mac.
The interface on both platforms requires you to scroll back by about 25 messages at a time. This is prohibitively time-consuming and can result in a crash or reset if the user sends you a new message while you're scrolled back.
Fortunately, we have custom SQL for the save:
select
h.id2 as sender_name,
m.text as message_body
from
message m
join handle2 h on h.rowid = m.handle_id
order by
m.date
limit
50;
handle.id
represents the readable identifier for the user. It will be either a phone number or an email address.
Rate Your Friendships with SQL
Let's use SQL to find out who our best friends are. Assuming you view quality of friendship as a function of the quantity of sent text messages, this should be very accurate!
First, let's divide the amount of messages that are from_me
by those that are not to produce a reply ratio. This query shows the top 10 people
we have been messaging by the total amount of messages, as well as the reply ratio.
Multiplication by 1.0 casts to the REAL
data type to avoid integer division, which would result in 1 or 0 instead of a decimal.
Here you can see the rules for integer division in SQLite.
select
h.id2,
count(1) as cnt,
round(
sum(
case
when m.is_from_me then 1
else 0
end
) * 1.0 / count(1) * 100.0,
2
)
from
message m
join handle2 h on h.rowid = m.handle_id
group by
h.id
order by
cnt desc
limit
10;
One issue with this analysis is that fewer sent messages does not necessarily imply fewer words sent. Let's add some more fields to get a better insight.
Here we can see the total amount of characters sent and received, the average length of text message sent and received, the total ratio of characters sent and received, and the reply ratio. In my case, people from whom I tend to receive more messages also send longer messages than I.
select
h.id,
count(1) as cnt,
sum(length(m.text)) as chars,
sum(length(m.text)) filter (where m.is_from_me) as chars_sent,
sum(length(m.text)) filter (where not m.is_from_me) as chars_received,
round(avg(length(m.text)) filter (where m.is_from_me)) as avg_length_sent,
round(avg(length(m.text)) filter (where not m.is_from_me)) as avg_length_received,
round((sum(length(m.text)) filter (where m.is_from_me) * 1.0 / sum(length(m.text)) filter (where not m.is_from_me)), 2) as characters_sent_ratio,
round((count(1) filter (where m.is_from_me)) * 1.0 / (count(1) filter (where not m.is_from_me)), 2) as reply_ratio
from
message m
join handle h on h.rowid = m.handle_id
group by
h.id
order by
cnt desc
limit
10;
This query makes heavy use of aggregate filters.
Aggregate filters allow you to use an aggregate function on only a part of the data by specifying a WHERE
clause to filter out unwanted records.
Examining iMessage Reactions
There are 2 newer iMessage features whose implementations, in the context of their schema design, are interesting to look into. Recently it was announced that Android phones will be able to show iMessage "reactions" properly. Historically, if you send an iMessage reaction to a non-Apple device, it will show up as a textual addition instead of an icon.
With the announcement of the new compatibility with Android devices, I was curious to learn how the current implmentation of the feature works.
I SELECT
ed
a few records with and without a reaction and compared the results. I discovered that the associated_message_type
column was usually set to 0, but in messages
with a reaction it was an integer value between 2000-2005. I also noticed that associated_message_guid
was present.
Apple appears to be using 2000-2005 for its 5 reaction types, 3000-3005 for when a user removed a reaction, 3 for an Apple Pay request.
From this investigation it appears that reactions are sent as iMessages with the reaction's textual equivalent appended and a foreign key relation to the parent message. This allows the messages to seamlessly be sent and received by non-Apple devices.
If the message is sent over SMS, the metadata linking the reaction to
the message it references is simply lost. If the device is iMessage capable, Apple devices will ignore the text
part of the message, find the associated
message and add the proper reaction as a visual overlay.
Note that the message
table includes both a ROWID
and a guid
. ROWID
is a typical auto-increment integer id
field which
is useful for joining on within the local database. However, the auto-incremented primary key will not be the
same for the same message across devices. The GUID
is globally unique, generated by the author of the message,
and sent to all of its recipients. This allows foreign key reference across different databases, devices, and users.
For more information about the utility of GUIDs, check out this article.
Find Your Most Popular Group Chats
Group chats are stored in the chat
table. Join tables chat_handle_join
and chat_message_join
are used to associate users and messages, respectively,
with group chats. Here's a query to find out most used group chats (chat with > 1 members) and the identities of their participants.
select
group_concat(distinct h.id2) as participants,
count(m. "ROWID") as message_count
from
chat c
join chat_handle_join chj on chj.chat_id = c."ROWID"
join handle2 h on h. "ROWID" = chj.handle_id
join chat_message_join cmj on cmj.chat_id = c."ROWID"
join message m on m. "ROWID" = cmj.message_id
group by
c."ROWID"
having
count(distinct h.id) > 1
order by
message_count desc
limit
10
The group_concat
function, which is familiar from MySQL by the same name and familiar to PostgreSQL users as string_agg
, is an aggregate function that
concatenates strings together. See more on how it can be used within SQLite here.
The HAVING
clause is similar to a WHERE
clause but operates on aggregate functions. If you've wanted to write a query
conditional on an aggregate but are not able to inside of your WHERE
clause, HAVING
is there for you.
Conclusion
SQLite is a powerful tool whose prolific reach across devices and numerous use cases make it one of the most impressive software projects around. If you're curious about what's behind the scenes, SQLite's source code is well known to be well-organized and fun (well, to some of us) to peek into.
iMessage is just one of many pieces of software that rely on SQLite and are used by million of end-users. Download Arctype for free today and start exploring the databases that power the tools you use daily!