CloudKit: Moves like Azure

After my past travails using iCloud with Core Data, I was both interested and concerned when Apple announced CloudKit at WWDC 2014. In this post I’m going to go over what Apple has planned for CloudKit from the perspective of someone wanting to sync app data via some cloud-based means. “Planned” is a key word here, because it’s still too early to say how things work in practice.

CloudKit vs. iCloud Core Data

CloudKit makes a refreshing change from iCloud Core Data in that there’s a lot less magic going on in the framework. Using iCloud to sync Core Data is very slick, in that you can essentially treat changes from the cloud as if they had been made on a different thread. Changes get saved, you get notified, and you merge those changes and update the app state. Your code never concerns itself directly with the cloud aspects of its data. That’s when it works, of course. It does appear that iCloud with Core Data works much better than it did when I last tangled with it, but it’s still a magic box that abstracts away all of the work of dealing with the cloud.

In contrast, CloudKit is not actually a sync mechanism. Instead, it’s a transfer mechanism, where your app must explicitly initiate all data transfers. Starting with the CKDatabase class, your app gets local references to cloud-based databases. CloudKit’s API deals with that database as a specifically cloud-based entity. When new changes are available, you need to fetch CKRecord instances and convert them into something that fits your local model. When you have new outgoing changes, you need to create or update the appropriate CKRecord, again converting data as needed. It’s enough to make one suspect that there’s a REST API lurking in there somewhere, though this is not (currently?) exposed.
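As a rough illustration of that explicit conversion step, here is a minimal Swift sketch. To keep it runnable without Apple's SDK, it uses a hypothetical `RemoteRecord` struct as a stand-in for CKRecord (a record type, a record name, and key/value fields) and a hypothetical `LocalNote` model; in a real app the same reads and writes would go through a CKRecord.

```swift
// Hypothetical stand-in for CKRecord: a record type, a record name,
// and a bag of key/value fields (strings only, to keep the sketch small).
struct RemoteRecord: Equatable {
    let recordType: String
    let recordName: String
    var fields: [String: String]
}

// Hypothetical local model object.
struct LocalNote: Equatable {
    var id: String
    var title: String
    var body: String
}

extension LocalNote {
    // Outgoing: build the record the app would upload.
    func toRecord() -> RemoteRecord {
        RemoteRecord(recordType: "Note",
                     recordName: id,
                     fields: ["title": title, "body": body])
    }

    // Incoming: convert a fetched record back into the local model.
    // Returns nil if the record is of another type or is missing fields.
    init?(record: RemoteRecord) {
        guard record.recordType == "Note",
              let title = record.fields["title"],
              let body = record.fields["body"] else { return nil }
        self.init(id: record.recordName, title: title, body: body)
    }
}
```

Neither direction happens on its own: the app decides when to call `toRecord()` and upload, and when to fetch and call `LocalNote(record:)`.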

This all means that using CloudKit in an app will more closely resemble third-party cloud solutions like Azure or Parse.

As a result, using CloudKit will likely mean more code in your app than using iCloud with Core Data. On the other hand, CloudKit’s more direct and less magical approach to dealing with cloud-based data should be more reliable. There are fewer hidden moving parts. If something isn’t working right, you stand a better chance of being able to fix it in your app instead of reporting a bug and hoping for better results in the future.

Data Representation and Conversion

In Core Data you use a data model with entity types defined by NSEntityDescription, and your data is represented as instances of NSManagedObject. With CloudKit you use a CKDatabase that contains CKRecord instances. There’s no direct equivalent to the data model and entity description classes. Each CKRecord has a record type, but during development you can write arbitrary key/value pairs and CloudKit will update the back-end schema as needed. Once your app goes live, the schema becomes fixed. You effectively get defined entity types, but there’s no class that represents them.

Besides the CKRecord, CloudKit offers a CKAsset class which represents an arbitrary blob of data saved in a file. The API is nothing like UIDocument, though.

CKRecord values are limited to property list types, plus some useful additions:

  • Geographic values, without conversion, as CLLocation
  • Blob references. Create a CKAsset for a file and store it as a record value to attach that blob to the record. This is more or less the same idea as managing blobs in Core Data by saving them in files and keeping only the file names in Core Data, but with a cloud-resident file.
  • Relationships. CloudKit represents relationships with CKReference, which is an explicitly many-to-one representation. It’s more like a SQL foreign key than a Core Data relationship. A CKReference lives only on one side of the relationship.
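To make the foreign-key comparison concrete, here is a minimal Swift sketch using hypothetical plain types rather than CloudKit's classes: `Reference` stands in for CKReference and holds only the target's record name, and it lives only on the "many" side (`Note`). Finding a folder's notes means searching the notes, much as a CloudKit app would presumably query the child record type.

```swift
// Hypothetical stand-in for CKReference: just the target record's name.
struct Reference: Equatable {
    let recordName: String
}

struct Folder {
    let recordName: String
    var name: String
}

// The reference lives only on the child, like a SQL foreign key column.
struct Note {
    let recordName: String
    var title: String
    var folder: Reference?
}

// The folder holds no list of its notes; to find them, filter (or, in the
// cloud, query) the notes whose reference points at this folder.
func notes(in folder: Folder, from allNotes: [Note]) -> [Note] {
    allNotes.filter { $0.folder == Reference(recordName: folder.recordName) }
}
```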

Translating between Core Data and CloudKit is mostly straightforward for simple attributes, since CKRecord accepts the basic attribute types that NSManagedObject uses: strings, numbers, dates, and binary data. Using object properties instead of scalars (e.g. NSNumber instead of NSInteger) will help keep it easy. For binary attributes, it’s possible to just store an NSData as a CKRecord value, but the size is limited to 1MB per value. Unless your attribute sizes are strictly limited to less than that, you’ll need to convert to/from CKAsset when dealing with CloudKit.
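A minimal Swift sketch of that size check, assuming the 1MB per-value cap means 1,048,576 bytes (the exact figure is an assumption) and using a hypothetical `BinaryValue` type. Small data stays inline; larger data is written to a temporary file, which is what a CKAsset would then be created from (via `CKAsset(fileURL:)` in CloudKit).

```swift
import Foundation

// Assumed per-value limit for inline binary data (1MB; exact byte count is an assumption).
let maxInlineBytes = 1_048_576

// Hypothetical result type: either keep the bytes in the record,
// or point at a file that would back a CKAsset.
enum BinaryValue: Equatable {
    case inline(Data)
    case assetFile(URL)
}

func prepareBinaryAttribute(_ data: Data) throws -> BinaryValue {
    if data.count < maxInlineBytes {
        return .inline(data)
    }
    // Too big for a record value: write it out so it can be uploaded as an asset.
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    try data.write(to: url)
    return .assetFile(url)
}
```

On the way back down, the app would read the asset's file and put the bytes back into the managed object's binary attribute.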

Relationships get somewhat more complex, because in order to create a CKReference you need, at minimum, the CKRecordID of the target record. So you need a CKRecordID that corresponds to each managed object. That matters more generally, even without relationships, since you need it to update or delete existing records. Really, you need to be able to take any managed object and find the corresponding CKRecordID. It’s early days for CloudKit, but a likely solution is to store the CKRecordID as an attribute of the managed object (no problem, since it conforms to NSCoding) and then look it up any time you need to talk to the cloud. If putting the CKRecordID in the persistent store seems ugly (and let’s face it, it is), then a managed-object-ID-to-CKRecordID translation table could go in a property list, or really anywhere convenient. But if a managed object is going to be saved in CloudKit, you’ll need some way to easily get its CloudKit ID.
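A minimal Swift sketch of the translation-table option. It assumes each managed object ID is keyed by a string (for example, the absolute string of its URI representation) and stores only the record name, which assumes every record lives in the default zone so a full CKRecordID can be rebuilt from that name. `RecordIDTable` and its methods are hypothetical.

```swift
import Foundation

// Hypothetical translation table from managed object IDs to CloudKit record names,
// kept in a property list instead of the persistent store.
struct RecordIDTable: Codable {
    // Key: a string form of a permanent managed object ID (e.g. its URI).
    // Value: the record name used for that object in CloudKit.
    private var recordNames: [String: String] = [:]

    // Returns the record name for an object, creating one the first time it's needed.
    mutating func recordName(for objectKey: String) -> String {
        if let existing = recordNames[objectKey] { return existing }
        let fresh = UUID().uuidString
        recordNames[objectKey] = fresh
        return fresh
    }

    // Looks up a record name without creating one (e.g. before deleting a record).
    func existingRecordName(for objectKey: String) -> String? {
        recordNames[objectKey]
    }

    // Forgets an object once its record has been deleted.
    mutating func remove(_ objectKey: String) {
        recordNames[objectKey] = nil
    }

    func save(to url: URL) throws {
        let encoder = PropertyListEncoder()
        encoder.outputFormat = .xml
        try encoder.encode(self).write(to: url)
    }

    static func load(from url: URL) throws -> RecordIDTable {
        try PropertyListDecoder().decode(RecordIDTable.self, from: Data(contentsOf: url))
    }
}
```

Only permanent object IDs make sensible keys here, since Core Data's temporary IDs change when an object is first saved.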

Handling relationships this way is going to require some careful sequencing of steps, to make sure that a CKRecordID actually exists when you need it. If your entity relationships get complex, this might be a challenge, but in most apps it should be feasible.
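One generic way to get that sequencing right is a topological sort: order records so that each one comes after every record it refers to, and detect cycles, which would need special handling (for example, saving one side first and adding its reference in a second pass). The Swift sketch below uses Kahn's algorithm over hypothetical record keys; it is a standard technique, not something CloudKit itself provides.

```swift
// Orders record keys so each record comes after every record it references.
// `references[key]` lists the keys that record points to.
// Returns nil if the references form a cycle, since no such order exists then.
func saveOrder(_ references: [String: Set<String>]) -> [String]? {
    // Every record, including targets that don't reference anything themselves.
    var nodes = Set(references.keys)
    for targets in references.values { nodes.formUnion(targets) }

    // pending[n]: how many of n's targets haven't been placed yet.
    // referrers[t]: the records that point at t.
    var pending: [String: Int] = [:]
    var referrers: [String: [String]] = [:]
    for node in nodes {
        let targets = references[node] ?? []
        pending[node] = targets.count
        for target in targets { referrers[target, default: []].append(node) }
    }

    // Kahn's algorithm: start with records that reference nothing.
    // Sorting keeps the output deterministic.
    var ready = nodes.filter { pending[$0] == 0 }.sorted()
    var order: [String] = []
    while !ready.isEmpty {
        let node = ready.removeFirst()
        order.append(node)
        for referrer in (referrers[node] ?? []).sorted() {
            pending[referrer, default: 0] -= 1
            if pending[referrer] == 0 { ready.append(referrer) }
        }
    }
    return order.count == nodes.count ? order : nil
}
```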

Transfer limits and public/private data

With CloudKit you get two separate databases, public and private.

The private database is associated with the user’s iCloud account. As with other iCloud APIs, data is only available to that user, and data limits depend on how much space is available in the account.

The public database is associated with your app, and can be read and written by anyone using your app. Interestingly, the public database is readable even when your app is running on a device with no iCloud account. By default, only the person who created a record can edit it, but anyone can read it. Your app might impose further restrictions, but it doesn’t look like CloudKit will do so.

The limits on the public database depend on the number of app users. Adding 1MB of database storage and 100MB of assets for each user allows for all kinds of interesting stuff. But the transfer rates (250kB/day plus 5kB/user/day) may give app developers pause.
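To see why, here is the arithmetic as a small Swift sketch, treating the quoted figures as the app's total daily public-database transfer (an assumption about how they apply) and using a hypothetical 2kB record size. With 10,000 users the allowance is 250 + 5 × 10,000 = 50,250 kB per day, which averages out to about 5kB per user per day, or roughly two and a half such records.

```swift
// Public database transfer allowance using the figures quoted above,
// assumed to be app-wide daily totals: 250 kB/day plus 5 kB/day per user.
let baseDailyKB = 250.0
let perUserDailyKB = 5.0

func dailyTransferKB(users: Int) -> Double {
    baseDailyKB + perUserDailyKB * Double(users)
}

// Average daily allowance per user if the total were shared evenly.
func averagePerUserKB(users: Int) -> Double {
    precondition(users > 0, "need at least one user")
    return dailyTransferKB(users: users) / Double(users)
}

// How many records of a given (hypothetical) size each user could move per day.
func recordsPerUserPerDay(users: Int, recordSizeKB: Double) -> Double {
    averagePerUserKB(users: users) / recordSizeKB
}
```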

The Good

There are a few other things that look really nice about CloudKit:

  • Subscriptions. A CKSubscription is basically a fetch request that lives in the cloud and sends a push notification any time the fetch results change. This relates to the “big cloud, little phone” idea discussed at WWDC and means it’s possible for an iOS device to download only a portion of the data set (instead of iCloud’s approach of getting everything). With subscriptions an app could download and display, for example, all records created or modified in the past 30 days plus all records marked as “favorite”. The rest of the data stays up in the cloud.

  • Dog-fooding. With iCloud + Core Data it appeared that Apple was making only minimal internal use of the API. As a result they didn’t encounter problems in the same way that external developers did. But CloudKit is the underlying network API for Apple’s forthcoming iCloud Drive and iCloud Photos features. With headline Apple features depending on it, CloudKit stands a better chance of getting bugs fixed quickly.

  • User Identity. CloudKit can fetch a CKRecordID corresponding to the current iCloud account. If the user opts in, this can be resolved into a CKRecord with limited information about the user. The scheme appears to keep the user’s privacy intact, so you can’t just go snooping for details without permission, but if the user’s OK with it then CloudKit opens things up. The randomized user ID is consistent within a CloudKit container (so it’s the same on every device) but is not the same from one container to another (so multiple apps can’t identify users to aggregate data and build up a multi-app profile).

  • Friend Identity. Building on the user identity approach, if users opt in, apps can find the current user’s friends via CloudKit. I don’t think this would be enough to build a CloudKit-based social network, but it should be possible for apps to have data that relates to multiple users.

  • Apple Account Convenience. As with iCloud, you don’t ever have to think about how to set up an account for the user or how to manage (or pay for) the server side of things. It’s all taken care of. The most an app might need to do is tell a user that iCloud needs to be enabled.

  • The CloudKit Dashboard, which is strongly reminiscent of Azure. With the dashboard you can browse and edit app data (your own private data or the public data, but not other users’ private data). This is in sharp contrast to iCloud Core Data, where you can only browse files: you can see Core Data transaction log files, but that only tells you that they exist, not what data they contain. The dashboard also lets you manage subscriptions, add other users to management roles, and more.
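The subscription example in the list above (recent records plus favorites) amounts to a filter over the full cloud data set. Here is a minimal Swift sketch of that selection with a hypothetical `CloudItem` type; in CloudKit the same condition would presumably be expressed as the predicate of a CKSubscription and evaluated by the server rather than by the app.

```swift
import Foundation

// Hypothetical summary of a record living in the cloud.
struct CloudItem {
    let name: String
    let modified: Date
    let isFavorite: Bool
}

// The "little phone" subset: anything modified within `days` days of `now`,
// plus anything marked as a favorite. Everything else stays in the cloud.
func localSubset(of items: [CloudItem], now: Date, days: Double = 30) -> [CloudItem] {
    let cutoff = now.addingTimeInterval(-days * 24 * 60 * 60)
    return items.filter { $0.modified >= cutoff || $0.isFavorite }
}
```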

The Bad

CloudKit also has a few limitations to be aware of.

  • Apple-only siloing. I’m reminded of a scene from “The Blues Brothers”:

    “What kind of music do you usually have here?”

    “Oh we got both kinds. We got country and Western.”

    CloudKit is cross-platform, but with Apple, that means it’s available on iOS and on OS X. That’s cool for lots of apps, but not all. If you want your app to support Android or Windows, or to have a web interface, you’re currently out of luck with CloudKit and likely to remain that way. On the other hand, who knows, maybe there will be a REST API some day. I’m not betting on it, but I know better than to try to predict Apple’s moves.

  • No server-side code. All processing happens on the client side, in your app. You get cloud storage, but no cloud processing.

  • Limited model migration, at least compared with Core Data. If your schema needs to change, you’re currently limited to adding new fields and updating your own code to use those fields. There’s no concept of model versioning or of a migration step. You could effectively do a migration step by fetching everything from the cloud and updating all the things, but that’s potentially an awful lot of network traffic. The alternative is to update records as you encounter them, which gives a sort of gradual migration. But that’s no good if your updated app really needs the new fields. A major model refactoring would be possible by copying everything to a new container, but that’s even worse than running through the entire store in terms of data transfer, cloud storage, and complexity.

  • Apple account limitations. In contrast to the convenience described above, your data goes into the user’s iCloud account along with their device backups and email and spreadsheets and presentations and whatever else their other apps put in the account. Device backups alone can be enough to severely limit space but, since it’s iCloud, you can’t really do anything about it directly. A free 5GB worth of cloud space doesn’t go so far these days, so your app might well find that there’s nowhere to put all that data.
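The gradual migration described under "Limited model migration" could look something like this Swift sketch: each record carries a hypothetical `schemaVersion` field, records are upgraded as they are fetched, and a record is written back only if something changed. Fields are plain strings to keep it short; the field names and version numbers are illustrative.

```swift
// Hypothetical record contents: string keys and values, for brevity.
typealias Fields = [String: String]

// The schema version this build of the app expects.
let currentSchemaVersion = 2

// Upgrades a fetched record one version step at a time. Steps only add fields,
// matching the "add new fields only" constraint. Returns the upgraded fields and
// whether the record needs to be saved back to the cloud.
func migrate(_ fields: Fields) -> (fields: Fields, changed: Bool) {
    var result = fields
    // Records written before versioning existed are treated as version 1.
    var version = Int(result["schemaVersion"] ?? "") ?? 1
    var changed = false
    while version < currentSchemaVersion {
        if version == 1 {
            // v1 -> v2: add a "category" field with a default value.
            if result["category"] == nil { result["category"] = "uncategorized" }
        }
        version += 1
        changed = true
    }
    if changed { result["schemaVersion"] = String(version) }
    return (result, changed)
}
```

Records that are never fetched are never migrated, which is exactly the weakness noted above when the app really needs the new fields.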

The Verdict?

As I mentioned, it’s too soon to say for sure. But I like what I see so far, and I’m looking forward to getting into the guts of the system to see how well it works.