I am currently trying to set up a simple mirrored 2-disk zpool for archiving my files, and I’ve followed this tutorial to initialize the pool. I know I can also set up datasets and ZVOLs, but am not sure how to proceed.
Overall, I would like my ZFS system to be simple with minimal overhead. My main requirement is really just encryption over the entire pool. I would ideally have a pool that acts the same way as a single encrypted ext4 drive, of course with the added benefit of 2-disk redundancy and self-healing.
A few questions I have are:
Do I need to set up a dataset after creating the pool? Is there already a default “root” dataset? What is the benefit of multiple datasets?
I understand ZVOLs create raw block devices over the pool (still don’t fully understand block devices). Is a ZVOL an alternative to a dataset, or would it exist in conjunction with datasets? If I’m aiming to keep things simple, should I even be considering ZVOLs?
Between datasets and ZVOLs, which is more analogous to a hard drive partition?
I know these are some fairly noob questions, but I was having trouble piecing together the information on my own. Thanks!
——-
You don’t need to set up a dataset after creating the pool, but it is good practice to do so. A root dataset is created for each pool, and each additional dataset inherits most of its properties from it. So you’ll generally want to set things like compression, recordsize, etc. on the root dataset to “default” values that are reasonable for your system, and then create sub-datasets to actually store data, effectively using the root dataset as a template. Nothing bad will happen if you choose to put all of your data directly on the root dataset, though. You may just find yourself a bit limited in flexibility down the line, because snapshots, quotas, and properties like compression are applied per dataset.
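To make the “root dataset as a template” idea concrete, here is a rough sketch. The pool name `tank`, the child dataset name `archive`, and the property values are placeholders I picked for illustration, not recommendations. The `run` helper is my own addition: it only prints the commands unless you set `APPLY=1`, so you can read through them before touching a real pool.

```shell
#!/bin/sh
# Sketch only: "tank" and "archive" are placeholder names, and the
# property values are examples, not tuning advice.
POOL="${POOL:-tank}"

# Dry run by default: print each command instead of executing it.
# Set APPLY=1 (and run as root) to actually execute.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

setup_datasets() {
    # Set defaults on the root dataset; child datasets inherit them.
    run zfs set compression=lz4 "$POOL"
    run zfs set recordsize=1M "$POOL"
    # Create a child dataset to hold the actual data.
    run zfs create "$POOL/archive"
    # List the property recursively; the SOURCE column shows whether
    # a value is set locally or inherited from the parent.
    run zfs get -r compression "$POOL"
}

setup_datasets
```

If you later want different settings for one dataset, you can `zfs set` the property on that dataset alone and the rest keep inheriting from the root.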
A ZVOL is basically a way of exposing a portion of the zpool to the system as a “raw” block device. You can do pretty much anything with a ZVOL that you could with an actual physical hard drive, such as formatting it, dumping data from it with dd, etc. Of course, it’s not really raw in the sense that it is backed by ZFS (so you can set ZFS properties on it, snapshot it, and so on), but as far as the system is concerned there is no difference between a ZVOL and a physical hard drive. So if you wanted to take six 4TB drives, RAIDZ2 them, present the entire thing to the OS as a roughly 16TB hard drive, and format it as ext4, you could. A common use case is creating ZVOLs to share as iSCSI devices, so you can export them to a Windows or Mac system, for example, and use the drive in that system’s native format while still getting the benefits of ZFS. ZVOLs live in the pool alongside datasets rather than replacing them. So the short answer is, if you’re just looking to store data, don’t worry about them for now.
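For reference, this is roughly what creating and formatting a ZVOL looks like on Linux. The names `tank` and `vol1`, the 10G size, and the `/mnt` mount point are all placeholders, and the `/dev/zvol/...` path assumes a Linux OpenZFS install. As a sanity check on the 16TB figure above, the sketch also does the RAIDZ arithmetic: usable space is roughly (disks minus parity) times disk size, before ZFS overhead. The `run` helper is my own and only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: "tank/vol1", the 10G size, and /mnt are placeholders.
POOL="${POOL:-tank}"

# Dry run by default: print each command instead of executing it.
# Set APPLY=1 (and run as root) to actually execute.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Rough usable capacity of one RAIDZ vdev, ignoring ZFS overhead.
# Arguments: number of disks, size per disk in TB, parity level.
raidz_usable_tb() {
    echo $(( ($1 - $3) * $2 ))
}

make_zvol() {
    # -V makes this a volume (block device) instead of a filesystem.
    run zfs create -V 10G "$POOL/vol1"
    # The volume appears as a block device and can be formatted
    # like any other disk.
    run mkfs.ext4 "/dev/zvol/$POOL/vol1"
    run mount "/dev/zvol/$POOL/vol1" /mnt
}

echo "6 x 4 TB in RAIDZ2: about $(raidz_usable_tb 6 4 2) TB usable"
make_zvol
```

For plain file storage you would skip all of this and just use `zfs create` without `-V`, which gives you a mounted ZFS filesystem directly.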
I don’t think either of them is really analogous to a hard drive partition per se, but in the sense that a partition is a logical subset of the entire volume that is presented as a raw device and can be used for any purpose, a ZVOL is closest to that. The main difference is that it wouldn’t make much sense to divide your zpool into ZVOLs and then format those, since you would just create datasets instead. I think the closest analogy to a ZVOL is really a disk image, except that it’s a type of disk image that ZFS natively supports, so it doesn’t need a filesystem as an intermediary between the image and the storage pool.