iOS
Diffable Data Sources and Data Storage — Part 1
Matt Neuburg
Written on January 28, 2021

When I first started programming iOS (back in iOS 3.2), many of my apps revolved around table views (UITableView). At first, I was surprised by how Apple makes you supply the data to a table view. You don’t just store all the data in the table view. Instead, there’s a separate object, the data source (the table view’s dataSource
, conforming to UITableViewDataSource). Periodically, the table view comes along and asks the data source what I like to call the Three Big Questions:
numberOfSections(in:)
tableView(_:numberOfRowsInSection:)
tableView(_:cellForRowAt:)
The last one is the most important. It takes an index path, signifying what section and row of the table this is; the idea is to supply the cell and configure it to represent the data corresponding to that one spot in the table.
I soon realized that this was an ingenious architecture. A table view might consist of 1000 rows, but 1000 simultaneous cells would be terribly view-heavy, putting a massive strain on the device’s limited resources. But with the Three Big Questions, there might never be more than a dozen cells: each cell, as soon as it was scrolled off the screen, was free to be reused and reconfigured for a different index path being scrolled onto the screen.
Still, beginners were (and still are) often confused by cell reuse, and even more by the need to organize the data appropriately. I can sympathize because I myself didn’t initially understand how to build a general data structure suitable for use by a data source for a table view with more than one section.
A Revolutionary Advance
In iOS 13, Apple introduced a whole new way of supplying the data to table views (and collection views, introduced in iOS 6) — diffable data sources. The name (“diffable”) comes from one particular problem that they solve: they make it much easier to reorganize the table view (adding and removing rows) before the user’s eyes in sync with the data itself.
Diffable data sources also make for a much cleaner architecture — such as getting your table-related code out of your view controller — plus, many iOS 14 features depend upon them, including outlines and collection view cell registration objects.
Diffable data sources also make it potentially much easier just to store the data in the first place, and that’s what I want to talk about here.
An Inside Job
Let’s take a very simple case where we assume the data will never change. Typically, you might be able to hand all the data over to the diffable data source and let it worry about how to organize that data!
To illustrate, I’ll return to a favorite example of mine, a list of U.S states organized alphabetically and clumped into sections according to the first letter of the state’s name.
The data originally comes from a text file that lists the names of the states in alphabetical order. To configure the diffable data source initially, we read the text file, parse it into a dictionary that clumps the state names into arrays keyed by the first letter of the state, and then just feed that information right into the diffable data source snapshot:
let s = try! String(
contentsOfFile: Bundle.main.path(
forResource: "states", ofType: "txt")!)
let states = s.components(separatedBy:"\n")
let d = Dictionary(grouping: states) {String($0.prefix(1))}
let sections = Array(d).sorted {$0.key < $1.key}
var snap = NSDiffableDataSourceSnapshot<String,String>()
for section in sections {
snap.appendSections([section.0])
snap.appendItems(section.1)
}
self.datasource.apply(snap, animatingDifferences: false)
Once we’ve done that, the data is out of our hands. The dictionary d
and the array sections
are temporary; when our code ends, they go out of existence. We needed them only in order to construct the snapshot. We apply the snapshot to the diffable data source, and now the diffable data source holds all the data internally. We don’t know how the data source stores the data internally, and we don’t care. All that matters is that it can supply the data to configure a cell in its cell provider function:
tv, ip, s in // the table view, the index path, and the string name of the state
let cell = tv.dequeueReusableCell(withIdentifier: cellID, for: ip)
var config = cell.defaultContentConfiguration()
config.text = s
var stateName = // ... work out the image name from the state name ...
let im = UIImage(named: stateName)
config.image = im
cell.contentConfiguration = config
return cell
In that code, there is no need to fetch the data — here, the name of the state. Rather, it arrives automatically into the cell provider function (as the third parameter). There is no need to use an index path to obtain the data from some other data structure; in fact, the only reason the index path arrives into the cell provider function at all is that we still have to call dequeueReusableCell
with an index path in order to obtain the cell itself.
There Can Be Only One
There is one severe restriction on the nature of the data that a diffable data source can contain. I didn’t discuss this restriction in the preceding section, because it just so happens that our data didn’t violate it. That restriction is: all section data and all cell data must be unique.
This restriction stems from the fact that every item of data must be Hashable. The reason is that that’s how the diffable data source is able to lay its hands on the correct piece of data instantly. For this to work, no two pieces of data may be Equatable to one another. Hence my expression of the requirement in terms of uniqueness. And indeed, the runtime expresses it the same way.
To demonstrate, I’ll just pretend there are two states named California. I’ll do that by changing the name of Connecticut to California:
var s = try! String(
contentsOfFile: Bundle.main.path(
forResource: "states", ofType: "txt")!)
s = s.replacingOccurrences(of: "Connecticut", with: "California")
We run our code and boom! There’s a crash: Fatal: supplied item identifiers are not unique.
Well, suppose there really are two states named California. How can we represent that data in a table using a diffable data source?
This problem actually arises very often. There is no particular reason why the data portrayed in a table view (or a collection view) should be unique for every cell. The solution is typically to wrap the data in something that makes the data unique under the hood.
So, in our case, we’ve been using a cell data type of String; our diffable data source and our snapshot have generic resolution types <String,String>
. Now it turns out that that won’t do because the strings are not unique. So let’s create a wrapper struct with something about it that we can guarantee is unique. I know — what about a UUID? The name is right on the tin: this is a unique identifier:
struct UniqueState : Hashable {
let uuid: UUID
let name: String
}
Note the Hashable declaration! As I said earlier, every item must be Hashable. Mere declaration of Hashable adoption is sufficient here to guarantee conformance; the actual implementation of hashability is synthesized for us by Swift.
Our data source and snapshot are now of type <String,UniqueState>
. When we populate the initial snapshot, we take that fact into account:
var snap = NSDiffableDataSourceSnapshot<String,UniqueState>()
for section in sections {
snap.appendSections([section.0])
snap.appendItems(section.1.map {UniqueState(uuid:UUID(), name:$0)})
}
self.datasource.apply(snap, animatingDifferences: false)
The data source’s cell provider function now receives a UniqueState instead of a String as its third parameter. But that’s no problem; the UniqueState wraps the String. We just pull out the string and throw the rest away! The UniqueState also contains a UUID, but we are not portraying that in the cell:
tv, ip, uniq in // table view, index path, and UniqueState
let s = uniq.name // ... and the rest is as before ...
Something to Think About
So far, I’ve been handing all the data over to the diffable data source. The diffable data source is thus not only supplying the data to the table view; it is also storing the data.
But now ask yourself: If the name of the state is not a determinant of how a cell is identified, why does that part of the data need to live inside the diffable data source in the first place? The only thing that really matters here, as far as the diffable data source itself is concerned, is, in fact, the unique identifier. The state name could live anywhere.
You might want to think about that idea, because I’ll have a lot more to say about it in the second part of this article. See you then!
Read Part II here.