Ditch legacy backup architectures that are no match for the modern threat landscape, and learn how you can improve your security posture and fortify your defenses against today's cyber threats. //
Data Resilience
Secure your data from insider threats or ransomware with air-gapped, immutable, access-controlled backups.
Data Observability
Continuously monitor and remediate data risks, including ransomware, sensitive data exposure, and indicators of compromise.
Data Remediation
Surgically and rapidly recover your apps, files, or users while avoiding malware reinfection.
In 1970, the well-heeled corporate behemoth Xerox, with a nearly perfect monopoly on the quintessential office technology of photocopying, cut the ribbon on a new and ambitious bet on its future: the Xerox Palo Alto Research Center (PARC). PARC was a large research and development organization, comprised of distinct laboratories. Several concentrated on extending Xerox’s dominance of photocopying, like the General Science and Optical Science Laboratories. Others, specifically the Computer Science and Systems Science Laboratories, were aimed at a new goal. They would develop computer hardware and software that could plausibly form the basis for the “office of the future” some ten to fifteen years hence, giving Xerox a profound head start in this arena. //
Individual Alto users could store and back up their files in several ways. Altos could store information on removable “disk packs” the size of a medium pizza. Through the Ethernet, they could also store information on a series of IFSs, “Interim File Servers.” These were Altos outfitted with larger hard drives, running software that turned them into data stores. The researchers who developed the IFS software never anticipated that their “interim” systems would be used for some fifteen years.
With the IFSs, PARC researchers could store and share copies of their innovations, but the ancient anxiety demanded the question: “But what if something happened to an IFS?!” Here again, Ethernet held a solution. The PARC researchers created a new tape backup system, this time controlled by an Alto. Now, using Ethernet connections, files from the MAXC, the IFSs, and individuals’ Altos could be backed up to 9-track magnetic tapes. //
The nearly one hundred and fifty thousand unique files - around four gigabytes of information - in the archive cover an astonishing landscape: programming languages; graphics; printing and typography; mathematics; networking; databases; file systems; electronic mail; servers; voice; artificial intelligence; hardware design; integrated circuit design tools and simulators; and additions to the Alto archive. All of this is open for you to explore today at https://info.computerhistory.org/xerox-parc-archive. Explore!
Traditional backup tools can mostly be subdivided by the following characteristics:
- file-based vs. image-based: Image-based solutions make sure everything is backed up, but they are potentially difficult to restore on other (less powerful) hardware. Additionally, creating images with traditional tools like dd requires the disk being backed up to be unmounted (to avoid consistency issues). This makes image-based backups better suited to filesystems that support advanced operations like snapshots or zfs send-style images, which contain a consistent snapshot of the data of interest (see the sketch after this list). For file-based tools there is also a distinction between tools that exactly replicate the source file structure in the backup target (e.g. rsync or rdiff-backup) and tools that use an archive format to store backup contents (tar).
- networked vs. single-host: Networked solutions allow backing up multiple hosts and, to some extent, allow for centralized administration. Traditionally, a dedicated client has to be installed on every machine that is to be backed up. Networked solutions can be pull-based (the server fetches backups from the clients) or push-based (the client sends backups to the server). Single-host solutions consist of a single tool that is invoked to back up data from the current host to a target storage. As this target storage can be a network target, the distinction between networked and single-host solutions is not entirely clear-cut.
- incremental vs. full: Traditionally, tools either make an actual 1:1 copy (full backup) or copy "just the differences", which can mean anything from "copy all changed files" to "copy changes from within files". Incremental schemes allow multiple backup states to be kept without needing much disk space. However, traditional tools require another full backup to be made in order to free the space used by previous changes.
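To make the file-based vs. image-based distinction concrete, here is a hedged sketch of the three styles mentioned above; pool, dataset, host and path names are placeholders:

# Image-style backup taken from a consistent ZFS snapshot:
zfs snapshot tank/data@nightly
zfs send tank/data@nightly | ssh backuphost "zfs receive backuppool/data"

# File-based backup that replicates the source file structure:
rsync -a /tank/data/ backuphost:/backups/data/

# File-based backup into an archive format:
tar -czf /backups/data.tar.gz -C /tank data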
Modern tools mostly advance things on the incremental vs. full front by being "incremental forever" without the drawbacks that such a scheme has when implemented with traditional tools. Additionally, modern tools mostly rely on their own custom archival format. While this may seem like a step back from tools that replicate the file structure, it brings numerous potential advantages.
This repo aims to compare the following backup solutions:
borg backup
bupstash
restic
kopia
duplicacy
your tool (PRs to support new backup tools are welcome)
The idea is to have a script that executes all backup programs on the same datasets.
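A minimal sketch of what such a driver script could look like (not the repository's actual code); it assumes the repositories have already been initialized, and the dataset path, repository locations and flags are placeholders:

#!/bin/sh
set -eu
DATASET=/data/testset

# Run a command, report its wall-clock time.
run() {
    name=$1; shift
    start=$(date +%s)
    "$@"
    printf '%s took %ss\n' "$name" "$(( $(date +%s) - start ))"
}

# Each tool backs up the same dataset.
run borg   borg create /backups/borg-repo::testrun "$DATASET"
run restic restic --repo /backups/restic-repo backup "$DATASET"
run kopia  kopia snapshot create "$DATASET"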
We'd heard of SwissDisk here at rsync.net, but they rarely showed up on our radar screen. We were reminded of their existence a few days ago when their entire infrastructure failed. It's unclear how much data, if any, was eventually lost ... but my reading of their announcement makes me think "a lot".
I'm commenting on this because I believe their failure was due to an unnecessarily complex infrastructure. Of course, this requires a lot of conjecture on my part about an organization I know little about ... but I'm pretty comfortable making some guesses.
It's en vogue these days to build filesystems across a SAN and build an application layer on top of that SAN platform that deals with data as "objects" in a database, or something resembling a database. All kinds of advantages are then presented by this infrastructure, from survivability and fault tolerance to speed and latency. And cost. That is, when you look out to the great green future and the billions of transactions you handle every day from your millions of customers are all realized, the per unit cost is strikingly low.
It is my contention that, in the context of offsite storage, these models are too complex, and present risks that the end user is incapable of evaluating. I can say this with some certainty, since we have seen that the model presented risks that even the people running it were incapable of evaluating.
This is indeed an indictment of "cloud storage", which may seem odd coming from the proprietor of what seems to be "cloud storage". It makes sense, however, when you consider the very broad range of infrastructure that can be used to deliver "online backup". When you don't have stars in your eyes, and aren't preparing for your IPO filing and the "hockey sticking" of your business model, you can do sensible things like keep regular files on UFS2 filesystems on standalone FreeBSD systems.
This is, of course, laughable in the "real world". You couldn't possibly support thousands and thousands of customers around the globe, for nearly a decade, using such an infrastructure. Certainly not without regular interruption and failure.
Except when you can, I guess:
Rsync, or Remote Sync, is a free command-line tool that lets you transfer files and directories to local and remote destinations. Rsync is used for mirroring, performing backups, or migrating data to other servers.
This tool is fast and efficient, copying only the changes from the source and offering customization options.
Follow this tutorial to learn how to use rsync with 20 command examples to cover most use-cases in Linux. //
Note: Be careful how you use the trailing slash in the source path when syncing directories. The trailing slash plays an important role. If you include the trailing slash on the source, the rsync command does not create the source folder on the destination; it only copies the directory's files. When you do not use the trailing slash, rsync also creates the original directory inside the destination directory.
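For example (directory names are placeholders):

rsync -a dir1/ dir2   # copies only the contents of dir1 into dir2
rsync -a dir1 dir2    # creates dir2/dir1 and copies the files into it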
PaperBack is a free application that allows you to back up your precious files on ordinary paper in the form of oversized bitmaps. If you have a good laser printer with 600 dpi resolution, you can save up to 500,000 bytes of uncompressed data on a single A4/Letter sheet. The integrated packer allows for much better data density - up to 3,000,000+ bytes (three megabytes) of C code per page.
You may ask - why? Why, for heaven's sake, do I need to make paper backups, if there are so many alternative possibilities like CD-Rs, DVD±Rs, memory sticks, flash cards, hard disks, streamer tapes, ZIP drives, network storage, magneto-optical cartridges, and even 8-inch double-sided floppy disks formatted for the DEC PDP-11? (I still have some.) The answer is simple: you don't. However, by looking at a CD or magnetic tape, you are not able to tell whether your data is readable or not. You must insert your medium into the drive (if you have one!) and try to read it.
Paper is different. Do you remember punched cards? EBCDIC and all that stuff. For years, cards were the main storage medium for source code. I agree that 100K+ programs were... unhandy, but hey, only real programmers dared to write applications of this size. And used cards were good as notepads, too. Punched tapes were also common. And even the weirdest encodings, like CDC or EBCDIC, were readable by humans (I mean, by real programmers).
Of course, bitmaps produced by PaperBack are also human-readable (with the small help of any decent microscope). I'm joking. What you need is a scanner attached to a PC. The current version is for Windows only, but it's free and open source, and there is nothing that prevents you from porting PaperBack to Linux or Mac, and the chances are good that it will still work under Windows XXXP or Trillenium Edition. And, of course, you can mail your printouts to recipients anywhere in the world, even if they have no Internet access or live in countries where such access is restricted by the regime.
Oh yes, a scanner. For 600 dpi printer you will need a scanner with at least 900 dpi physical (let me emphasize, physical, not interpolated) resolution.
Have I already mentioned that PaperBack is free? I release it under the GNU General Public License, version 3. This means that you pay nothing for the program, that the sources are freely available, and that you are allowed - in fact, encouraged - to modify and improve this application.
- Installation.
You don't need to install PaperBack. Copy it to any directory, if possible one with unrestricted write access (so PaperBack can save its settings to the initialization file), optionally create a shortcut on the desktop - that's all.
Backup Capacity Calculator
To evaluate the required storage capacity, enter the backup configuration details.
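If you prefer to estimate this by hand, a rough back-of-the-envelope model (my own simplification, not the calculator's exact formula) is: retained fulls times the full backup size, plus retained incrementals times the full size times the daily change rate. For example, in shell:

FULL_GB=500         # size of one full backup, in GB
CHANGE_RATE=0.05    # fraction of the data that changes per day
FULLS=2             # number of full backups retained
INCREMENTALS=30     # number of incremental backups retained
awk -v f="$FULL_GB" -v c="$CHANGE_RATE" -v nf="$FULLS" -v ni="$INCREMENTALS" \
    'BEGIN { printf "Estimated capacity: %.0f GB\n", nf*f + ni*f*c }'
# -> Estimated capacity: 1750 GB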
Further Observations on ZFS Metadata Special Device
Last quarter we discussed metadata special devices.
By default, adding a special device to a zpool causes all (new) pool metadata to be written to the device. Presumably this device is substantially faster than spinning disk vdevs in the pool. //
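For reference, adding a special vdev and controlling which blocks land on it looks roughly like this (pool, dataset and device names are placeholders; mirror the special vdev, since losing it loses the pool):

zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
# Optionally also send small file blocks (not just metadata) to the special vdev:
zfs set special_small_blocks=32K tank/dataset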
Finally, the permissions and ownership regime is re-enforced every five minutes, just in case something ever gets chowned/chmodded incorrectly. Yes, we really are gratuitously chmodding and chowning ourselves every five minutes.
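That kind of periodic re-enforcement can be as simple as a cron job; a hypothetical sketch (the paths, owner and modes here are made up, not rsync.net's actual configuration):

# /etc/cron.d/fix-perms - re-apply ownership and permissions every five minutes
*/5 * * * * root chown -R backupuser:backupuser /storage/backupuser && chmod -R u=rwX,go= /storage/backupuser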
Kyoto University in Japan has lost about 77TB of research data due to an error in the backup system of its Hewlett-Packard supercomputer.
The incident occurred between December 14 and 16, 2021, and resulted in 34 million files from 14 research groups being wiped from the system and the backup file. //
At the moment, the backup process has been stopped. To prevent data loss from happening again, the university has scrapped the backup system and plans to apply improvements and re-introduce it in January 2022.
The plan is to also keep incremental backups - which cover only files that have changed since the last backup - in addition to full backup mirrors.
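As an illustration of the difference (unrelated to the university's actual setup), an incremental scheme with plain rsync can hard-link unchanged files against the previous backup, so each run only consumes space for what changed:

rsync -a --link-dest=/backups/2021-12-14 /data/ /backups/2021-12-15/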
You may have noticed that there is a trailing slash (/) at the end of the first argument in the above commands:
rsync -a dir1/ dir2
This is necessary to mean “the contents of dir1”. The alternative, without the trailing slash, would place dir1, including the directory, within dir2. This would create a hierarchy that looks like:
~/dir2/dir1/[files]
Always double-check your arguments before executing an rsync command. Rsync provides a method for doing this by passing the -n or --dry-run options. The -v flag (for verbose) is also necessary to get the appropriate output:
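For example, to preview the earlier sync without actually transferring anything:

rsync -anv dir1/ dir2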
The -P flag is very helpful. It combines the flags --progress and --partial. The first of these gives you a progress bar for the transfers and the second allows you to resume interrupted transfers:
rsync -azP source destination
Hello folks,
For my Linux server backups I really love BorgBackup, but I'm now looking for something that natively supports S3. I've heard a few good things about Kopia and feel like it may be the way to go. I would prefer something free, so Duplicacy, which I use and like a lot for my personal home setup, is not an option. I also want encryption, compression and dedup.
What do you use for your backups that supports S3 as a backend?
On the side, I've read a few discussions regarding encryption methods. It looks like some tools use good encryption, while others use double encryption (and OpenSSL), which some people with a security background consider a bad idea, and some even use homemade encryption although good, proven algorithms already exist. I'm not that well versed in advanced security, but like most of us I try to make the best choices for long-term use.
In the past I've used BackupPC a lot. It has a useful frontend when your team is not that comfortable with SSH, you can keep the compression local to the BackupPC server even when pulling files from remote hosts, etc.
What's your input?
Thanks a lot for all your advice and for sharing the apps you use!
Encrypted backups made easy.
Backups with strict access controls, strong encryption, data deduplication, incremental uploads and offline decryption keys.
Encrypted Backup Shootout (acha.ninja)
https://acha.ninja/blog/encrypted_backup_shootout/
aDfbrtVt:
I get that performance is interesting to graph, but it's very much secondary in importance when compared to the backup solution being bulletproof. I've found encrypted Borg very difficult to get wrong and setup is very simple. I've also successfully recovered two systems with the tool without issue.
Not saying that Borg is necessarily the best solution, just that we should evaluate the important metrics. ///
Good discussion re: backup software, including comments from tarsnap author cperciva
Bupstash is a tool for encrypted backups - if you need secure backups, Bupstash is the tool for you.
Bupstash was designed to have:
- Efficient deduplication - Bupstash can store thousands of encrypted directory snapshots using a fraction of the space encrypted tarballs would require.
- Strong privacy - Data is encrypted client side and the repository never has access to the decryption keys.
- Offline decryption keys - Backups do not require the decryption key to be anywhere near an at-risk server or computer.
- Key/value tagging with search - all while keeping the tags fully encrypted.
- Great performance on slow networks - Bupstash really strives to work well on high latency networks like cellular and connections to far-off lands.
- Secure remote access controls - Ransomware, angry spouses, and disgruntled business partners will be powerless to delete your remote backups.
- Efficient incremental backups - Bupstash knows what it backed up last time and skips that work.
- Fantastic performance with low ram usage - Bupstash won't bog down your production servers.
- Safety against malicious attacks - Bupstash is written in a memory safe language to dramatically reduce the attack surface over the network.
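A minimal usage sketch based on the commands documented by Bupstash (the repository URL, key path and directory are placeholders):

bupstash new-key -o ./backups.key
export BUPSTASH_REPOSITORY=ssh://backups@example.com/home/backups/bupstash-repo
bupstash init
bupstash put --key ./backups.key ./important-files
bupstash list --key ./backups.key
# restore a snapshot by id: bupstash get --key ./backups.key id=<item-id> > restore.tar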
Stability and Backwards Compatibility
Bupstash is alpha software. While all efforts are made to keep bupstash bug free, we currently recommend using bupstash for making REDUNDANT backups where failure can be tolerated.
Recently I have been spending time on improving the performance of bupstash (my encrypted backup tool), and wanted to compare it to some existing tools to try and find its relative performance in the backup tool landscape.
This post compares bupstash, restic, borg backup and plain old tar + gzip + GPG across a series of simple benchmarks.
What do all these tools have in common?
- They encrypt data at rest.
- They compress data.
- They have some form of incremental and/or deduplicated snapshotting.
- They are all pretty great backup systems.
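For context, the tar + gzip + GPG baseline amounts to something like the following pipeline (recipient and paths are placeholders):

tar -czf - /home/user/data | gpg --encrypt --recipient backups@example.com -o data.tar.gz.gpg
# and to restore:
gpg --decrypt data.tar.gz.gpg | tar -xzf -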
Kopia uploads directories and files to remote storage called a Repository and maintains a set of historical point-in-time snapshot records based on defined policies.
Kopia uses content-addressable storage for snapshots, which has many benefits:
- Each snapshot is always incremental, no data included in previous snapshots is ever re-uploaded to the repository based on file content.
- Multiple copies of the same file will be stored once. This is known as de-duplication.
- After moving or renaming even large files, Kopia can recognize that they have the same content and won’t need to upload them again.
- Multiple users or computers can share the same repository: if users share the same files, those files are also uploaded only once.
NOTE: there is currently no access control mechanism within a repository - everybody with access to the repository can see everyone's data.
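A minimal sketch of typical Kopia usage (the repository path and source directory are placeholders; the repository create step will prompt for a password):

kopia repository create filesystem --path /backups/kopia-repo
kopia snapshot create /home/user/Documents
kopia snapshot list
# restore a snapshot by its id: kopia restore <snapshot-id> /tmp/restored-documents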
We all rely on data to help us get our job done and enhance our life. Think of business documents, personal pictures, source code for software projects, tax receipts or health records. Some of this data now lives in the cloud. You can sync your pictures or keep your source code on Github. This has left many of us with the impression that someone else is responsible for keeping our data safe.
Since you are reading this, you probably know better. Cloud accounts can be blocked or compromised. Syncing something is not a backup, because deletions are synced as well. Even well-run data centers can burn down. That's why backups are still important for any data that matters to you. Just consider this: if file or folder X were gone, would it bother you? Could you still do your job effectively? If the answer is yes for any data you keep on your MacBook, then read on.
Having a structured backup strategy will not just save your precious family pictures, but also ensure business continuity. This article has a list of practical steps and generic templates you can use to make this task as simple as possible.
Goals
After completing the steps in this article, you will have the following:
- A list of your data assets.
- Where they are located and how they are backed up.
- Identified common errors regarding backup correlations, security and frequency.
Backups vs. Archives
This article will mainly discuss Backups, rather than Archives. The main difference between the two:
- Backups are a copy of production data that is frequently updated without changing the source data. The main challenge is keeping up with changing source data and restoring quickly when needed.
- Archives are generally accessed less frequently, and the source data is deleted after the archive is created (e.g. tape storage with old accounting data). The main challenge is the longevity of the storage medium. //
Backup Strategy Template on Google Drive:
https://docs.google.com/spreadsheets/d/1cuTM849Fu6palPUG5SgUJrzw2J4z_hq71g-jZPY4hcw/edit?usp=sharing
To create a printable key, either paste the contents of your keyfile or a key export in the text field below, or select a key export file.
To create a key export, use:
borg key export /path/to/repository exportfile.txt
If you are using keyfile mode, keyfiles are usually stored in $HOME/.config/borg/keys/
You can edit the parts with a light blue border in the print preview below by clicking into them.
Key security: This print template will never send anything to remote servers. But keep in mind that printing might involve computers that can store the printed image, for example cloud printing services or networked printers.