Differences

This shows you the differences between two versions of the page.

--- software:files [2010/07/25 08:30]
cyril ext3 Reserved blocks
+++ software:files [2024/07/05 08:09] (current)
cyril [Tools]
@@ Line 1: / Line 1: @@
 ====== Files management ======
+===== Partitioning =====
+Cheatsheet for preparing a new disk and using it with:
+  * LVM (Logical Volume Manager)
+    * Principle: **physical volumes** (hardware) are merged into **volume groups**, and then split back into **logical volumes**, which can be used as a partition (or as a disk with a partition table, but not very useful?).
+    * Advantages: flexible/resizable, snapshots, raid...
+    * Cons: only supported in Linux, additional layer of complexity
+    * Cons for removable media: volume groups are automatically activated when the device is plugged, and need to be manually deactivated before removing it, even if we haven't mounted any volume. Issues with [[https://wiki.archlinux.org/title/LVM#Troubleshooting|sleep]] ?
+  * LUKS
+  * Filesystems (ext4, btrfs, ...)
+  * Subvolumes
+==== Creation ====
+This section presents how to use each tool, in a given order, but depending on the use case you may want to skip some layers, or apply them in a different order. In particular:
+  * You may not use LVM
+  * You may want to perform LVM over LUKS instead of LUKS over LVM, if you prefer having a single password and unlock for all volumes.
+  * But you DO want to use LUKS (seriously, always encrypt your disks).
+=== Partitions ===
+  * If you want to use LVM and don't need to boot on the disk, not much to do in this section, just remove all the existing partitions with ''gparted'', then go directly to the next one to create the LVM volumes directly on the raw device.
+  * Otherwise, create a ''gpt'' partition table with ''gparted''
+  * If you need to boot on the disk, with ''gparted'':
+    * create the EFI partition: set size around 200-400MB, format it to fat32, set ''boot'' and ''esp'' flags
+    * create the boot partition: set size around 300-500MB, format it to ext2
+    * mount everything: root partition to /mnt, boot partition to /mnt/boot, efi partition to /mnt/boot/efi, ''mount --bind'' /mnt/{dev,proc,sys}, ''mount -t efivarfs efivarfs /mnt/sys/firmware/efi/efivars''
+    * chroot to the new root: ''chroot /mnt''
+    * install grub: ''grub-install --root-directory=/ --boot-directory=/boot --efi-directory=/boot/efi --bootloader-id=<os-name>''
+    * you can check the EFI install with ''efibootmgr -v'' (and remove an entry with ''efibootmgr -b <0005> -B'')
+    * generate the grub configuration: ''grub-mkconfig -o /boot/grub/grub.cfg''
+  * If you want to use LVM, create a single large partition with the remaining space with ''gparted'' to create the LVM volumes on this partition.
+  * Otherwise create the required system and data partitions with ''gparted''
+=== LVM ===
+  * create physical volume: ''pvcreate <device-name>'' (device can be the whole device if not a boot device, or a partition).
+    * Check with ''pvdisplay'' or ''pvs''.
+    * if it complains with the error ''Cannot use <device-name>: device is partitioned'', you need to remove existing traces of partition table or filesystem with the command ''wipefs --all <device-name>''
+  * create volume group: ''vgcreate <vgroup-name!> <device-name>''.
+    * Check with ''vgdisplay'' or ''vgs''.
+  * create logical volume: ''lvcreate -n <lvolume-name!> [-L <absolute-size>] [-l <relative-size>] <vgroup-name>''.
+    * Check with ''lvdisplay'' or ''lvs''
+    * ''<absolute-size>'': ''200G'', ''3T'', ...
+    * ''<relative-size>'': ''+100%FREE''
+  * If using an SSD drive, [[#ssd_trim|TRIM commands]] from the layers below (eg filesystem) will be transparently forwarded without any special configuration. However if you wish that LVM issues its own TRIM commands when some space is not allocated by LVM, you can set the ''issue_discards'' option to 1 in ''/etc/lvm/lvm.conf''.
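+The three LVM steps above can be chained as follows. This is a minimal sketch: the ''run'' helper, the ''DRY_RUN'' variable, the function name and the example names (''/dev/sdX'', ''vg_data'', ''lv_work'') are illustrative assumptions. By default it only prints the commands.

```shell
# Dry-run helper (hypothetical): print the command instead of executing it.
# Set DRY_RUN=0 and run as root to really execute the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: device or partition, $2: volume group name,
# $3: logical volume name, $4: absolute size (e.g. 200G)
create_lvm_volume() {
    run pvcreate "$1"
    run vgcreate "$2" "$1"
    run lvcreate -n "$3" -L "$4" "$2"
}

# Placeholder names, to be replaced by your own.
create_lvm_volume /dev/sdX vg_data lv_work 200G
```

+Keeping the dry-run mode as the default makes it possible to review the exact commands before touching a disk.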
+=== LUKS ===
+  * Encrypt the volume/partition/device: ''cryptsetup luksFormat -c aes-xts-plain64 -h sha256 -s 512 <volume-name>''
+    * ''<volume-name>'' is the device or partition name if not using LVM, or ''/dev/mapper/<vgroup-name>-<lvolume-name>'' if using it.
+    * Choose a strong passphrase as it can be brute-forced (at least 80 bits of entropy)
+    * By default, the key derivation is configured to take about 2 seconds
+  * Open (decrypt) the volume: ''cryptsetup luksOpen <volume-name> <evolume-name!>''
+  * If using an SSD drive, you probably should enable [[#ssd_trim|TRIM-forwarding]]: ''cryptsetup --allow-discards --persistent refresh <evolume-name>'' (check [[https://wiki.archlinux.org/title/Solid_state_drive#dm-crypt|security implications]] though). Check with ''cryptsetup luksDump <volume-name>''. If you enabled it by mistake (for instance on a non-SSD), you can disable it with ''cryptsetup --persistent refresh <evolume-name>'' (it resets flags).
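+To get a feeling for the 80 bits of entropy mentioned above: a passphrase of ''L'' symbols drawn uniformly at random from an alphabet of ''N'' symbols has ''L * log2(N)'' bits of entropy. The sketch below computes it with ''awk''. The function name and the examples (Diceware words, alphanumeric characters) are illustrative assumptions, and the formula only holds for randomly generated passphrases.

```shell
# Entropy in bits of a passphrase made of $2 symbols drawn uniformly
# at random from an alphabet of $1 symbols: length * log2(alphabet size).
entropy_bits() {
    awk -v n="$1" -v len="$2" 'BEGIN { printf "%.1f\n", len * log(n) / log(2) }'
}

entropy_bits 7776 6   # 6 words from a 7776-word Diceware list -> 77.5
entropy_bits 7776 7   # 7 words from the same list             -> 90.5
entropy_bits 62 14    # 14 random characters among [A-Za-z0-9] -> 83.4
```

+With these assumptions, 6 Diceware words stay just below 80 bits, while 7 words or 14 random alphanumeric characters are above.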
+=== Filesystem ===
+  * Choose the filesystem:
+    * ''ext4'' for a default robust journaled filesystem
+    * ''btrfs'': modern filesystem based on copy-on-write (instead of journal), offering more features (integrity checks, subvolumes, snapshots / deduplication, compression, encryption, RAID, ...), but also some drawbacks (slightly less stable, requires more resources, [[https://ohthehugemanatee.org/blog/2019/02/11/btrfs-out-of-space-emergency-response/|wasted allocated data]], ...). It is a good choice for a work volume (because integrity checks and snapshots are really useful), but less obvious for backup volumes (because native deduplication is less performant with moved files and modified files, especially if not using ''btrfs-send'', so it is more efficient to use a deduplicating backup software, which will also handle integrity checks).
+    * ''zfs'': similar to ''btrfs'', using both copy-on-write and a journal (for improved performance with synchronous writes), more mature and slightly more stable, but not included in the kernel due to licensing (though easy to use).
+  * Create the filesystem: ''mkfs.<fs> <evolume-name>''
+  * Tune the filesystem:
+    * With ''ext4'', if not the system partition, you can remove the 5% reserved for ''root'': ''tune2fs -r 0 /dev/mapper/<evolume-name>''
+  * Mount the filesystem: ''mount /dev/mapper/<evolume-name> /mnt/<evolume-name>''
+  * Some filesystem tuning must be done after mount:
+    * With ''btrfs'', you can enable compression: ''btrfs property set <fs-root> compression <algo>'' with ''<algo>'' equal to ''lzo'' (fastest) or ''zstd'' (compromise). Note that this syntax does not support configuring levels, nor forcing compression to disable heuristics. For that you have to use instead a mount option in the previous step: ''compress=zstd:1'' (default is '':3'') or ''compress-force=lzo''.
+=== Sub-volumes ===
+Some filesystems, such as BTRFS and ZFS, allow creating subvolumes.
+  * BTRFS:
+    * The filesystem root is a subvolume
+    * You can create other subvolumes: ''btrfs subvolume create <subvolume-path>'' (directory must not exist, ''-p'' for creating parents).
+      * Check with ''btrfs subvolume list <fs-root>'' and ''btrfs subvolume show <subvolume-path>''
+==== Usage ====
+=== Open ===
+  * ''cryptsetup luksOpen <volume-name> <evolume-name>''
+  * ''mount /dev/mapper/<evolume-name> /mnt/<evolume-name>''
+=== Close ===
+  * ''umount /dev/mapper/<evolume-name>''
+  * ''cryptsetup luksClose <evolume-name>''
+  * ''vgchange -an <vgroup-name>'' if using LVM and the disk will be removed
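+The open and close sequences above can be wrapped in two small functions. This is only a sketch: the ''run'' helper, the ''DRY_RUN'' variable, the function names and the example names are hypothetical, and the commands are printed rather than executed unless ''DRY_RUN=0''.

```shell
# Dry-run helper (hypothetical): print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: encrypted volume (device, partition or LVM volume), $2: mapper name
open_volume() {
    run cryptsetup luksOpen "$1" "$2"
    run mount "/dev/mapper/$2" "/mnt/$2"
}

# $1: mapper name, $2: optional volume group to deactivate before unplugging
close_volume() {
    run umount "/dev/mapper/$1"
    run cryptsetup luksClose "$1"
    if [ -n "${2:-}" ]; then
        run vgchange -an "$2"
    fi
}

# Placeholder names, to be replaced by your own.
open_volume /dev/mapper/vg_data-lv_work work
close_volume work vg_data
```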
+=== Backup ===
+  * BTRFS snapshots:
+    * A snapshot is a deduplicated copy of a subvolume, using the CoW (Copy-on-Write) mechanism.
+    * They are useful for storing a history with deduplication, but also to freeze the subvolume before making a copy
+    * They can be stored inside the subvolume (because they are a subvolume themselves, and snapshots are not recursive)
+    * Create a read-only snapshot: ''btrfs subvolume snapshot -r <input-subvolume-path> <output-snapshot-path>'' (<output-snapshot-path> is typically <input-subvolume-path>/.snapshots/<date>) (just remove ''-r'' for a read-write snapshot).
+    * Delete a snapshot: ''btrfs subvolume delete <snapshot-path>''
+    * Analyze snapshot disk usage: ''btrfs quota enable <subvolume>'' and then ''btrfs qgroup show <subvolume>'' [[[https://unix.stackexchange.com/questions/188315/how-to-check-simulate-how-much-space-will-be-freed-after-i-remove-a-btrfs-sub|source]]]
+  * ZFS snapshots:
+    * They can be recursive.
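+For BTRFS, the date-based snapshot path suggested above can be built as follows. This is a sketch that only prints the command. The function name and the ''/mnt/work'' subvolume are hypothetical, and the ''.snapshots'' directory is assumed to exist already.

```shell
# Print the command creating a read-only snapshot of subvolume $1,
# named after the date $2 (defaults to today, in YYYY-MM-DD format).
snapshot_command() {
    day=${2:-$(date +%Y-%m-%d)}
    echo "btrfs subvolume snapshot -r $1 $1/.snapshots/$day"
}

snapshot_command /mnt/work 2024-07-05
```

+An ISO-like date keeps the snapshots sorted chronologically in a plain ''ls''.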
+==== Maintenance ====
+=== Renaming / Updating ===
+  * ''vgrename <old-name> <new-name>''
+  * ''lvrename <group> <old-name> <new-name>''
+  * ''cryptsetup luksChangeKey''
+=== Checks ===
+  * ''pvck <device>''
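+  * ''btrfs check <device>'' to verify the structural integrity of the filesystem (the filesystem must be unmounted)
+  * ''btrfs scrub start <mount-point>'' to verify the data integrity (follow the progress with ''btrfs scrub status <mount-point>'')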
+=== Backup ===
+  * ''cryptsetup luksHeaderBackup /dev/DEVICE --header-backup-file /path/to/backupfile''
+  * ''vgcfgbackup -f /path/to/backup/file vg01''
+=== Disk usage ===
+  * ''btrfs filesystem usage <subvolume>'' (option ''-g'' to display GB only).
+=== Defragmentation ===
+  * With ''btrfs'':
+    * ''btrfs filesystem defrag'' for **defragmenting files**.
+      * can also be used to change compression of existing files (but breaks deduplication) with option ''-czstd'' (inherits level specified at mount).
+    * ''btrfs balance start -dusage=<percentage> <mount-point>'' for **defragmenting free space** (only data chunks less full than ''<percentage>'' will be compacted).
+  * ''filefrag -v <file>'' to analyze the fragmentation of a file, and list all extents.
+=== Compression ===
+  * With ''btrfs'':
+    * ''compsize <subvolume-path>'' in order to get statistics about quantity of compressed files, and compression ratio.
+    * ''compsize <file-path>'' in order to get compression details about a specific file.
+=== SSD TRIM ===
+  * The TRIM (or discard) operation informs the SSD drive about unused blocks, so that it can perform wear leveling efficiently.
+  * Checking TRIM support: run ''lsblk --discard'', and check for non-zero values in columns DISC-GRAN (DISCard GRANularity) and DISC-MAX (DISCard MAX bytes).
+  * **Warning**: make sure that your device supports TRIM before using it, or data loss can occur.
+  * Each layer must forward the TRIM commands to the layer below, until they reach the drive. If you haven't done it persistently for LUKS as suggested in the [[#luks|create]] section, you can open it with this option: ''cryptsetup <...> --allow-discards''
+  * Then two options are available to enable it:
+    * Continuous TRIM, i.e. configuring the filesystem to notify instantly each block that is freed.
+      * It is not advised because doing it too often can reduce the lifetime of poor quality SSDs.
+    * Periodic TRIM, i.e. explicitly notifying the free blocks periodically.
+      * Using the ''fstrim'' utils from the util-linux package.
+      * Manually: run ''fstrim --verbose <mount-point>'' for a single volume, or ''fstrim --verbose -A'' for all mounted filesystems listed in ''/etc/fstab'' and the root filesystem inferred from the kernel command line.
+      * Weekly: enable the timer with ''systemctl enable --now fstrim.timer''
+Source : https://wiki.archlinux.org/title/Solid_state_drive
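+The TRIM support check described above can be scripted by filtering the ''lsblk'' columns. This is a sketch: the function name is hypothetical and the sample rows are made up for illustration, not taken from a real machine.

```shell
# Print the names of block devices that advertise TRIM support, reading
# "NAME DISC-GRAN DISC-MAX" rows (sizes in bytes) on standard input.
trim_capable() {
    awk '$2 > 0 && $3 > 0 { print $1 }'
}

# On a real system, the rows would come from:
#   lsblk --bytes --noheadings --list --output NAME,DISC-GRAN,DISC-MAX | trim_capable

# Made-up sample: one disk without TRIM, one NVMe drive with TRIM.
printf '%s\n' 'sda 0 0' 'nvme0n1 512 2199023255040' 'nvme0n1p1 512 2199023255040' | trim_capable
```

+With the sample rows, only ''nvme0n1'' and ''nvme0n1p1'' are printed, since both of their discard values are non-zero.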
+=== Resizing ===
+== Extending ==
+  * Resize the LVM logical volume: ''lvresize -L <absolute-size> <lvolume-name>''
+    * ''<absolute-size>'' can also be an increment, e.g. ''+50G''
+  * Open the volume with LUKS: ''cryptsetup luksOpen <lvolume-name> <evolume-name>''
+  * Resize the filesystem:
+    * ext4: ''e2fsck -f <evolume-name> ; resize2fs <evolume-name>''
+    * btrfs: mount the filesystem then ''btrfs filesystem resize max /mnt/<evolume-name>''
+== Reducing ==
+TODO
+===== Backup =====
+Everyone has personal data that nothing could recreate (pictures, emails, creations, ...), or global data and configuration that it would take a lot of time to recreate. However you can lose some of them or all of them in several situations: hard drive crash, hard drive corruption, computer theft, computer destruction (fire...).
+My advice:
+  * partition your hard drive to have a separate partition for system and data
+  * put important application data on the data partition (configuration, emails, ...)
+  * do a full mirror backup of the data partition regularly (eg with rsync or a deduplicating software such as Attic) on an external hard drive or a network drive. Try to keep at least one copy somewhere other than your home (network drive, or one at home and one at work).
+  * take precautions to put the odds on your side in case of problem: make copies of your disks' MBR (output of command p of fdisk), of your encrypted partitions' headers, etc.
+==== Tools ====
+=== rsync ===
+=== Borg Backup ===
+  * Create a Borg repository in the current folder: <code>borg init -e <encryption> [--append-only] .</code>
+    * ''<encryption>'' can be:
+      * ''none'' to disable it, for instance on an already encrypted volume
+      * ''repokey'' to enable it (or ''repokey-blake2'' to use Blake2 instead of Sha256, which is often faster)
+      * ''authenticated'' to disable encryption but still enable authentication (or ''authenticated-blake2'' to use Blake2)
+    * ''--append-only'' means that no data can be removed with borg, archives can only be added. It can be used to protect an online repository against malware.
+  * Create archives: <code>borg create <repo>::<!archive> <path> --stats --progress
+    --compression auto,zstd,12 --chunker-params 15,23,19,4095 --noctime -x --exclude-caches</code>
+    * ''--compression'': it can make sense to adjust the compression level depending on your computer speed and your storage speed, so that compression does not slow down the backup but still saves as much space as possible under this constraint. However it is not always easy to find a universal value (data that compresses very well is mostly limited by the input storage speed, while data that compresses less well is mostly limited by the output storage speed). You roughly have the choice between LZ4 (very quick), LZMA (very high compression ratio), and ZSTD (wide range, in between).
+    * ''--chunker-params'': this tuning is also important, but a bit complicated. The original default value created small chunks, causing huge cache and memory usage, so the default was switched to much larger chunks, which can be too large for some applications (for instance when modifying only the metadata of an image file, we still want to deduplicate the data), so I came up with this compromise: ''15,23,19,4095''.
+  * ''borg info <repo>''
+  * ''borg list <repo>''
+  * ''borg info <repo>::<archive>''
+  * ''borg diff <repo>::<archive1> <archive2>''
+  * ''borg mount <repo>::<archive> <mountpoint>''
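+To make the ''--chunker-params'' values above more concrete: the first three parameters are exponents of two (minimum chunk size, maximum chunk size, and hash mask bits, which set the statistical target size). The sketch below converts them to KiB. The function name is hypothetical, and the fourth parameter (hash window size) is left out because it is not a chunk size.

```shell
# Chunk sizes implied by the first three --chunker-params values:
# $1 = CHUNK_MIN_EXP, $2 = CHUNK_MAX_EXP, $3 = HASH_MASK_BITS
chunk_sizes() {
    echo "min=$(( (1 << $1) / 1024 ))KiB target=$(( (1 << $3) / 1024 ))KiB max=$(( (1 << $2) / 1024 ))KiB"
}

chunk_sizes 15 23 19   # compromise chosen above -> min=32KiB target=512KiB max=8192KiB
chunk_sizes 19 23 21   # Borg default            -> min=512KiB target=2048KiB max=8192KiB
```

+So the compromise keeps the same maximum chunk size as the default, but targets chunks about four times smaller.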
+=== Restic ===
+  * Create a Restic repository in the current folder: <code>restic init --repo .</code>
+    * Note that encryption **and** password are mandatory, [[https://github.com/restic/restic/issues/4326|because]]. However you can store the password in a file in the repository, or use a password file with ''--password-file''.
+  * Create snapshots: <code>restic --repo <repo> --verbose --compression auto --ignore-ctime backup <path></code>
+    * The chunker cannot be configured, contrary to Borg. It is equivalent to [[https://restic.readthedocs.io/en/stable/100_references.html#backups-and-deduplication|19,23]],[[https://restic.net/blog/2015-09-12/restic-foundation1-cdc/|21,512]], similarly to Borg's default [[https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#buzhash-chunker|19,23,21,4095]], but unlike my chosen values.
+    * ''--compression'': unlike Borg, the only choices are ''auto'', ''max'' and ''off''
+  * ''restic --repo <repo> snapshots'' to list snapshots
+=== BTRFS snapshots ===
+The BTRFS filesystem makes some sorts of backups possible:
+  * On the work disk, regularly creating snapshots keeps a history, for recovery in case of a bad manual operation
+  * It is also useful in order to "freeze" the content, so that the backup with another tool saves a consistent copy
+  * On a backup disk, snapshots can also be used to keep a history.
+  * If you update the backup with ''rsync'' for instance, then moved files and modified files will not be deduplicated (because they are sent again by rsync and won't be recognized).
+  * However if you moved or modified files on a btrfs filesystem, you can send the increment between two snapshots: ''btrfs send -p <parent-src-snapshot> <src-snapshot> | pv | btrfs receive <target-snapshot>'' (''<parent-src-snapshot>'' must have been sent already).
+  * You can also deduplicate afterwards using offline tools for out-of-band deduplication (cf [[https://wiki.tnonline.net/w/Btrfs/Deduplication|list]])
+===== Container files =====
+Sources:
+  * [[https://serverfault.com/questions/696554/creating-a-grow-on-demand-encrypted-volume-with-luks|Excellent demonstration]] for creating a growing container file with LUKS and EXT4 using sparse files.
+===== Digital Will =====
+The goal is threefold:
+  * No unauthorized person can ever access your data
+  * The trusted persons can only access your data under some conditions (deceased, coma, ...)
+  * You are alerted when your data is accessed by the trusted persons (in case the access was not legitimate)
+Different approaches:
+  - A service that gives your data to designated persons when they provide a death certificate for you (Wishbook, ...)
+    * Cons: sending a fake document, does not work for coma, service needs to remain available, price, incomplete control.
+    * Variants: store it directly in a vault at the bank, or at the notary.
+  - A service that regularly checks that you are alive by requesting a connection with your private credentials (period can be adapted to the situations), and gives your data when you fail to do it, after warning emails (can be self-hosted).
+    * Cons: there will be some delay between when you stop pinging and when your data becomes available, service needs to remain available, and if you host it yourself there is still a risk that your server crashes at the wrong time.
+  - A service that waits for a request to reveal your data with a personal password, sends you one or several emails to warn you that this request has been made, and in the absence of opposition from you in some delay (that can be adapted to the situation) sends your data (can be self-hosted).
+    * Cons: there will be some delay between when the request is made and when your data becomes available, service needs to remain available, or if you host it yourself there is still a risk that your server crashes at the wrong time (but if the service is associated with your password manager for instance, then the availability is not a problem anymore...)
+  - Split the secret between several people (cf [[https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing|Shamir's secret]], implemented for instance in [[http://point-at-infinity.org/ssss/|ssss]] or [[https://github.com/jcushman/libgfshare|libgfshare]]), so that X out of Y need to agree to obtain your data.
+    * Cons: people need to remain accessible (and not lose the information), compromise between robustness and risk of conspiracy, what data can each person access
+  - Store your key on a piece of opaque paper (eg a business card), with you (eg in your smartphone), wrapped in a unique piece of paper (eg newspaper) that is glued. The goal is that it is impossible to read the key (eg through light) without removing the wrapping paper, impossible to remove it without tearing it apart, and impossible to replace it without you noticing quickly. This base key then has to be derived a large number of times, so that it takes several days with a classical computer to derive the final key, which the designated persons can then use to decrypt their personal message. In case of an unauthorized attempt, you can change the password. A better solution could be to use a proper TLP (Time Lock Problem) / TLE (Time Lock Encryption) / VDF (Verifiable Delay Function) / Delayed encryption, which would have the big advantage of fast generation, but there don't seem to be standard implementations, even of the seminal Rivest & Shamir proposal.
+    * Cons: need to actively monitor the integrity of your artifact, doesn't work if you have an accident that destroys the artifact, need to revoke the information in case of unauthorized attempt (thus only revokable information is protected, such as passwords protecting data for which it is impossible to make an unauthorized copy beforehand), need to revoke the information in case of upgrade of the derivation scheme in order to follow hardware improvements.
+    * Variant: it seems that it is possible to make "paint to scratch" with dishwashing soap and gouache, thus it can be an alternate way to hide the key while allowing to detect unauthorized access. Even if there is an attempt to reconstruct the paint layer, if a complex color mix was chosen, with approximate random borders, and even random color gradients, it would be very difficult to replicate accurately enough so that it goes unnoticed (especially if the color changes while drying).
+Different methods could be combined, for instance 2 or 3 plus 4. But 3 managed by the password manager is probably unbeatable.
+Ideally, for increased safety, the data to be obtained is always encrypted with a key that the designated persons possess.
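+The repeated derivation of the base key mentioned in approach 5 can be illustrated with an iterated hash. This is a toy sketch, not a vetted time-lock scheme: the function name, the choice of SHA-256 through ''sha256sum'', the example base key and the iteration count are all assumptions made for illustration.

```shell
# Derive a key by hashing the base key $1 repeatedly, $2 times.
# Each round needs the previous digest, so the rounds cannot run in parallel.
derive_key() {
    key=$1
    i=0
    while [ "$i" -lt "$2" ]; do
        key=$(printf '%s' "$key" | sha256sum | cut -d ' ' -f 1)
        i=$((i + 1))
    done
    printf '%s\n' "$key"
}

# Small iteration count for the demonstration; a real delay of several
# days would need a far larger count and a much faster implementation.
derive_key 'base key written on the card' 100
```

+Note the drawback stated above: preparing the final key costs the owner as much time as it costs the designated persons, which is exactly what a proper time-lock construction would avoid.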
+What to transmit?
+  * Passwords (master password of your password manager, computer, encrypted data partitions, phone, ...)
+  * Instructions about what data you have
+Notes:
+  * Different levels of amount of information for your spouse, children, other family, friends?
+  * How to transmit data (such as pictures) to a child? Probably has to go through a tutor.
 ===== File Systems =====
-==== ext3 ====
+==== ext3, ext4 ====
 === Reserved blocks ===
-By default ext3 reserved 5% of disk space to super-user. The intent is to let to critical applications the ability to write to the disk when it is full, but it has no use for a data partition, you just waste 5% of your partition.
+By default ext3 reserves 5% of disk space for the super-user. The intent is to let critical applications write to the disk when it is full, but it is of no use on a data partition: you just waste 5% of your partition.
 You can check and remove these reserved blocks with the following commands:
@@ Line 29: / Line 298: @@
 First unmount your partition and remount it read-only.
-  * testdisk (photorec)
+  * ''extundelete --restore-file Documents/file.dat /dev/sda4'' : the easiest solution if there are only a few files and you know their names. It works without unmounting the partition, and generally works well if you run it immediately after removing the files.
-  * ext3grep <code>
+  * ''testdisk'' (photorec) is great to recover files on a mobile storage device because it works with any filesystem (it finds signatures in the data, so there is no need for a journal), and finds all deleted files on the partition.
+  * ''ext3grep'' <code>
 ext3grep <partition> --restore-file <filename> # filename => file ; works great, but only for one file at a time...
 ext3grep <partition> --restore-all --deleted --after=1270639550 # dates -> files
@@ Line 38: / Line 308: @@
 ext3grep <partition> --restore-inode <inode> # inodes => files
 </code> Notes: "restore-all" failed while building stage2 cache with error "ext3grep: init_directories.cc:535: void init_directories(): Assertion `lost_plus_found_directory_iter != all_directories.end()' failed.". However doing a "ls inode" created this stage2 cache, and afterwards "restore-all" worked... but just restored everything on the disk even not deleted files/dirs, not taking into account the "after" option... But manually editing the stage2 cache to only keep files/dirs you want to restore then "restore-all" worked perfectly!
-  * others: debugfs, foremost, http://freshmeat.net/projects/unrm/, http://freshmeat.net/projects/ext3undel
+  * ''ext4magic''
+  * others: debugfs, foremost, [[http://freshmeat.net/projects/unrm/|unrm]], [[http://freshmeat.net/projects/ext3undel|ext3undel]]
+===== Disk Recovery =====
+In case the MBR/partition table of your disk is damaged.
+==== Make a backup before ====
+You should always keep a backup of your partition table!
+The first way is to store the output of p command of ''fdisk''.
+You can also do a dump of the MBR and EBR:
+<code shell>
+dd if=/dev/sda of=sda.dd bs=512 count=1 # full MBR dump
+sfdisk -d /dev/sda > sda.sfdisk         # MBR and EBR partition tables
+</code>
+Out of curiosity, the ''file'' command is able to interpret the content of your MBR dump:
+<code shell>
+file sda.dd
+</code>
+==== Restore with a backup ====
+If you have the output of the p command of ''fdisk'', then you can manually recreate the partition table with ''fdisk'' with the same information. As long as you don't mount or format, modifying the partition table with ''fdisk'' doesn't modify the partitions data.
+If you have a full dump of MBR and EBR, you can automatically restore it:
+<code shell>
+dd if=sda.dd of=/dev/sda
+sfdisk /dev/sda < sda.sfdisk
+</code>
+To restore the MBR without the partition table:
+<code shell>
+dd if=sda.dd of=/dev/sda bs=446 count=1
+</code>
+To restore only the partition table:
+<code shell>
+dd if=sda.dd of=/dev/sda bs=1 skip=446 seek=446 count=66 # seek= is required, otherwise the table is written at offset 0
+</code>
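+Since a wrong offset overwrites the boot code, the partition table restore can first be rehearsed on throwaway image files. In the sketch below, the function name and the file names are hypothetical, and the files stand in for ''sda.dd'' and ''/dev/sda''. The 66 bytes starting at offset 446 are the four 16-byte partition entries plus the 2-byte signature, and both ''skip='' (input offset) and ''seek='' (output offset) are set to 446.

```shell
# Restore only bytes 446..511 (partition table + signature) of an MBR dump.
# $1: MBR dump, $2: target (an image file here, a disk in real use).
# conv=notrunc keeps the rest of a regular file intact.
restore_partition_table() {
    dd if="$1" of="$2" bs=1 skip=446 seek=446 count=66 conv=notrunc 2>/dev/null
}

# Rehearsal on throwaway files instead of a real disk.
workdir=$(mktemp -d)
head -c 512 /dev/urandom > "$workdir/backup.dd"   # stand-in for sda.dd
head -c 1024 /dev/zero > "$workdir/disk.img"      # stand-in for /dev/sda
restore_partition_table "$workdir/backup.dd" "$workdir/disk.img"

# Check that the table area of the image now matches the one in the backup.
dd if="$workdir/backup.dd" of="$workdir/table.bin" bs=1 skip=446 count=66 2>/dev/null
dd if="$workdir/disk.img" bs=1 skip=446 count=66 2>/dev/null \
    | cmp -s - "$workdir/table.bin" && echo "table area matches the backup"
```

+The first 446 bytes of the image (boot code area) and everything after byte 511 are left untouched.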
+==== Restore without a backup ====
+If you don't have a copy of your partition info, don't panic, some software can recover them by searching for the partitions in the disk content (but it has to be formatted as a standard filesystem, ie not encrypted):
+  * testdisk (very good) [[http://www.cgsecurity.org/wiki/TestDisk_Etape_par_Etape|howto]]
+  * gpart (didn't work very well for me, only found the NTFS partition) [[http://www.ibiblio.org/pub/linux/docs/howto/other-formats/html_single/Partition-Rescue.html|howto]]
+==== Performance optimization ====
+  * e4rat ([[http://e4rat.sourceforge.net/|Home Page]], [[http://en.gentoo-wiki.com/wiki/E4rat|Gentoo Howto]]):
+  * preload ([[http://sourceforge.net/projects/preload/|Home Page]], [[http://forums.gentoo.org/viewtopic-t-437590-start-0.html|Gentoo Howto]]):
+  * prelink  ([[http://people.redhat.com/jakub/prelink/|Home Page]], [[http://www.gentoo.org/doc/en/prelink-howto.xml|Gentoo Howto]]):
+  * verynice