Caution
Make sure to use scrap destination directories. The sample will UNCONDITIONALLY OVERWRITE any existing files in the destination directories whose names match the source files' names.
This sample shows the basic setup needed to copy a local file to one or more destinations with verification of the copied data. The complete sample is in Quine.Samples\CopyDirectory.cs.
Create an instance of the driver, producer and consumers. In this example, everything is "static", i.e., known upfront. The only parameter needed for the setup is the number of destinations.
// Pool and driver
private readonly TransferDriver driver;
// Producer and consumers
private readonly UnbufferedFile.Reader reader;
private readonly UnbufferedFile.Writer[] writers;
private CopyDirectory(int dstcount) {
    // 16 blocks of 2MB; should be sufficient to saturate common SSDs.
    driver = new(2 << 20, 16);
    reader = new UnbufferedFile.Reader();
    writers = Enumerable.Range(0, dstcount)
        .Select(x => new UnbufferedFile.Writer())
        .ToArray();
    driver.Producer = reader;
    driver.Consumers = writers;
    // configure hash verification
    driver.HasherFactory = () => new XX64TransferHash();
    driver.VerifyHash = true;
}
// Once the pool has been disposed, the driver becomes unusable.
public void Dispose() => driver.Dispose();

Because producer and consumer instances are meant to be reusable, we keep explicit references to them. They are also accessible through the Producer and Consumers properties of the driver instance, but these properties have weaker (interface) types.
Copying a single file takes three steps. First, the producer and consumers must be configured; in this case, that means setting the source and destination paths. Second, we execute the transfer. Third, we report any errors.
The code relies on the PathComponents helper to manipulate paths easily.
// Source and destination paths
private PathComponents srcPath;
private PathComponents[] dstPaths;
private async Task CopyFile(string srcFileName) {
    // 1: Set up reader and writers to point to source / destinations.
    // For local files, we just use the file's path.
    reader.FilePath = srcPath.Append(srcFileName).NativeString;
    foreach (var x in writers.Zip(dstPaths))
        x.First.FilePath = x.Second.Append(srcFileName).NativeString;
    // 2: Execute copy. We don't support cancellation in this program.
    await driver.ExecuteAsync(default);
    // 3: Check for errors and report.
    var anyerror = false;
    if (reader.State.Exception is not null) {
        Console.WriteLine($"ERROR: reading file {reader.FilePath}: {reader.State.Exception.Message}");
        anyerror = true;
    }
    foreach (var w in writers.Where(x => x.State.Exception is not null)) {
        Console.WriteLine($"ERROR: writing file {w.FilePath}: {w.State.Exception!.Message}");
        anyerror = true;
    }
    if (!anyerror)
        Console.WriteLine($"OK: copied {reader.FilePath} to all destinations.");
}

In this example, the destination file name is the same as the source file name, but more complex logic can also be implemented (e.g., change the destination file name if the file already exists).
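As one possible shape for the "change the destination file name if the file already exists" logic, here is a minimal sketch that uses only System.IO. The class DestinationNames and the method PickFreeName are hypothetical names introduced for illustration; they are not part of the library or the sample.

```csharp
using System;
using System.IO;

public static class DestinationNames
{
    // Hypothetical helper (not part of the library): returns fileName unchanged
    // if no file with that name exists in directory, otherwise the first free
    // name of the form "stem (1).ext", "stem (2).ext", ...
    public static string PickFreeName(string directory, string fileName, int maxAttempts = 1000)
    {
        if (!File.Exists(Path.Combine(directory, fileName)))
            return fileName;

        var stem = Path.GetFileNameWithoutExtension(fileName);
        var ext = Path.GetExtension(fileName); // includes the leading dot, or "" if none
        for (var i = 1; i <= maxAttempts; i++)
        {
            var candidate = $"{stem} ({i}){ext}";
            if (!File.Exists(Path.Combine(directory, candidate)))
                return candidate;
        }
        throw new IOException($"No free name found for {fileName} in {directory}.");
    }
}
```

In CopyFile, the returned name would be passed to x.Second.Append(...) in place of srcFileName. Note that the check and the later write are not atomic: another process can still create the file in between, so this reduces accidental overwrites but does not rule them out.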
The entry point, called from Main, ties all the parts together. First, we create an instance to hold the parts that will be reused for copying individual files (described above). Second, we perform basic validation on the paths so that we don't accidentally overwrite valuable data. Third, we enumerate (non-recursively) the individual files in the directory and copy each one to the destination(s).
public static async Task ExecuteAsync(DirectoryInfo src, DirectoryInfo[] dst) {
    // Create the "holder" for pool and driver.
    using var copier = new CopyDirectory(dst.Length);
    // Initialize paths.
    copier.srcPath = PathComponents.Make(src.FullName);
    copier.dstPaths = dst.Select(x => PathComponents.Make(x.FullName)).ToArray();
    // It is wise to allow only absolute paths for security reasons: relative paths might open up for overwriting arbitrary files.
    // (A full SMB path is also considered absolute.)
    if (!copier.srcPath.IsAbsolute || copier.dstPaths.Any(x => !x.IsAbsolute))
        throw new InvalidOperationException("All paths must be absolute.");
    // Use the same instances of driver and workers to copy many files.
    foreach (var file in src.EnumerateFiles()) {
        // IMPORTANT! The driver can copy only a single file at a time. DO NOT spawn multiple copies in parallel.
        await copier.CopyFile(file.Name);
    }
    // Driver is no longer usable after disposal.
}

Build the solution with Visual Studio. Open a command line and execute a command like the following to non-recursively copy a directory TestFiles to two destination directories. Absolute paths must be used.
Quine.Samples.exe CopyDir "C:\TEMP\TestFiles" D:\Dest1 D:\Dest2

Driver, producer and consumer instances are meant to be reused for many individual file transfers. The driver allocates native memory, which is freed when the driver is disposed.
DO expect partial failures, i.e., that the file has been copied to only a subset of destinations. The following is a list of common sources of hard errors that occurred in practice while copying large amounts of data (100+ GB):
No space left on a destination.
Permissions problem on a destination.
Bad USB cables will cause data corruption and hash verification errors. I cannot recall the OS ever detecting this kind of data corruption.
(Temporary) loss of network connectivity while copying from/to SMB-mounted drives.
(Uncommon, but confirmed with memtest86): memory corruption, which will also result in a hash verification error. Unless you have ECC memory, this type of corruption cannot be detected by the OS.
Especially when dealing with network drives or other "external" storage, DO be prepared for soft errors. Technically, the network might be present, but degraded to, for example, 1/10th of normal bandwidth. This will in turn cause a 10-fold increase in transfer time, which might be classified as "unusable" in some scenarios. Again, use the decorator pattern to introduce timeouts, cancel the transfer and report an error.
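The producer and consumer interfaces are not shown in this section, so the following is not a decorator of either. It is a simpler stand-in that puts a deadline around a whole cancellable operation, using only System.Threading types. TransferTimeout and RunAsync are hypothetical names introduced here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TransferTimeout
{
    // Hypothetical helper (not part of the library). Runs operation with a deadline.
    // Returns true if it completed in time and false if the deadline fired first.
    // Cancellation requested by the caller, and any other exception, propagate unchanged.
    public static async Task<bool> RunAsync(
        Func<CancellationToken, Task> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        // The linked source is cancelled by the caller's token or by the timer.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);
        try
        {
            await operation(cts.Token);
            return true;
        }
        catch (OperationCanceledException)
            when (cts.IsCancellationRequested && !callerToken.IsCancellationRequested)
        {
            // Only the deadline can have cancelled the operation here.
            return false;
        }
    }
}
```

In the sample, the call might look like `await TransferTimeout.RunAsync(ct => driver.ExecuteAsync(ct), TimeSpan.FromMinutes(10))`. This assumes that the argument the sample passes as `default` is a CancellationToken and that the driver stops the transfer when it is cancelled; neither is confirmed by this section. A fixed deadline is also only a rough proxy for degraded bandwidth, since large files legitimately take longer.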
Buffer pool parameters (provided to TransferDriver) and producer/consumer concurrency levels affect performance (i.e., total transfer time), but choosing them is tricky. At present, the UnbufferedFile reader and writer support only serial operation, and the buffer pool parameters in the example should be enough to saturate common SSDs.
To extract performance from media other than SSDs, two factors must be accounted for: bandwidth and latency. A larger buffer size gives better bandwidth utilization, higher concurrency hides latency better, and a higher pool capacity (more buffers) helps absorb speed differences between the producer and the consumers. Excessive concurrency may hurt transfer times (when concurrent requests exceed the available bandwidth), while too many buffers may consume excessive memory.
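The trade-offs above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not part of the library. It assumes that the pool holds one buffer per block (so its memory is block size times block count) and uses the standard bandwidth-delay product as a first estimate of how much data must be in flight to hide latency. All names are hypothetical.

```csharp
using System;

public static class PoolSizing
{
    // Memory held by the pool, assuming one buffer of blockSize bytes per block.
    public static long PoolBytes(int blockSize, int blockCount) => (long)blockSize * blockCount;

    // Bandwidth-delay product: bytes that must be in flight to keep a medium with
    // the given bandwidth busy for the duration of one request's latency.
    public static long BytesInFlight(double bytesPerSecond, TimeSpan latency) =>
        (long)Math.Ceiling(bytesPerSecond * latency.TotalSeconds);

    // Number of concurrent requests of blockSize bytes needed to cover that amount.
    public static int RequestsToHideLatency(double bytesPerSecond, TimeSpan latency, int blockSize) =>
        (int)Math.Max(1, (BytesInFlight(bytesPerSecond, latency) + blockSize - 1) / blockSize);
}
```

For the sample's settings, PoolSizing.PoolBytes(2 << 20, 16) gives 33554432 bytes, i.e., 32 MiB. The other two methods only give a starting point; measured transfer times should decide the final values.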
The library currently supports only constant concurrency levels, i.e., they may be set only before starting an individual transfer, but cannot be adjusted during the transfer (through some kind of feedback loop). For a fixed setup and huge amounts of data, it is wise to run some experiments upfront to determine the best combination of parameters. (This could also be done "online" by running some kind of ML between transfers of individual files.)
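An upfront experiment of this kind can be as simple as timing one representative transfer per candidate parameter pair. The harness below is a sketch and not part of the library: the transfer delegate is a placeholder that the caller would implement, for example by creating a driver with the given block size and block count and copying a test file.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParameterSweep
{
    // Hypothetical harness: times transfer(blockSize, blockCount) once per candidate,
    // strictly one after another, and returns the fastest combination.
    public static async Task<(int BlockSize, int BlockCount, TimeSpan Elapsed)> FindFastestAsync(
        IEnumerable<(int BlockSize, int BlockCount)> candidates,
        Func<int, int, Task> transfer)
    {
        (int, int, TimeSpan)? best = null;
        foreach (var (size, count) in candidates)
        {
            var sw = Stopwatch.StartNew();
            await transfer(size, count);
            sw.Stop();
            if (best is null || sw.Elapsed < best.Value.Item3)
                best = (size, count, sw.Elapsed);
        }
        return best ?? throw new ArgumentException("No candidates given.", nameof(candidates));
    }
}
```

A single run per candidate is noisy; repeating each measurement and comparing medians would make the result more trustworthy.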
FillAsync and DrainAsync MUST NOT "steal" a reference to ITransferBuffer, or any of its members, for later use. Doing so WILL lead to data corruption or, even worse, undefined behavior and true security problems due to use-after-free bugs. TransferDriver allocates native memory and frees it upon disposal or finalization.
Because TransferDriver implements a finalizer that deallocates native memory, you MUST hold a live reference to it during the transfer.
File copying can be destructive. UnbufferedFile reader and writer accept any path that the OS deems valid. In production, and as shown in the above sample, you should always work with absolute paths. QI also disallowed writing to the "root" path, i.e., / under UNIX and X:\ (for any drive letter) under Windows.
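The sample performs this validation with PathComponents. For code that does not use that helper, a comparable check can be sketched with System.IO alone. DestinationGuard and IsSafeDestination are hypothetical names, and the rule implemented here (fully qualified, and not a root directory) is only an approximation of the guideline above.

```csharp
using System;
using System.IO;

public static class DestinationGuard
{
    // Hypothetical check (not part of the library): true when path is fully
    // qualified and does not resolve to a root directory.
    public static bool IsSafeDestination(string path)
    {
        if (string.IsNullOrWhiteSpace(path) || !Path.IsPathFullyQualified(path))
            return false;
        // Normalize "." and ".." segments so that e.g. "/tmp/.." is seen as "/".
        var full = Path.GetFullPath(path);
        // GetDirectoryName returns null for a root such as "/" or "C:\".
        return Path.GetDirectoryName(full) is not null;
    }
}
```

Because the path is normalized first, a destination that only looks harmless, such as a directory followed by "..", is rejected when it resolves to the root.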
Apply the above guideline also when working with "external" files, such as BLOB storage or FTP servers. References to external resources can often be represented with the Uri class (perhaps with a custom scheme); DO use its IsAbsoluteUri property.
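A minimal sketch of such a check with System.Uri follows. ExternalTargets and IsAbsoluteReference are hypothetical names; the "blob" scheme in the usage note is an illustrative custom scheme, not one defined by the library.

```csharp
using System;

public static class ExternalTargets
{
    // Hypothetical check (not part of the library): true only when text parses
    // as a URI and that URI is absolute.
    public static bool IsAbsoluteReference(string text) =>
        Uri.TryCreate(text, UriKind.RelativeOrAbsolute, out var uri) && uri.IsAbsoluteUri;
}
```

With this, "ftp://example.com/a.bin" and "blob://container/a.bin" are accepted, while "data/a.bin" and "../a.bin" are rejected. On Unix-like systems, .NET may treat a rooted local path as an implicit file URI, so checking uri.Scheme against the schemes you actually expect is worth considering as an additional guard.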