Data Deduplication: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors
Depending on the type of deduplication, redundant files may be eliminated, or even identical portions of files or other data can be removed. As a simple example of file-based deduplication, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored, and each subsequent instance simply references that one saved copy. In this example, the deduplication ratio is roughly 100 to 1.
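The email example can be sketched as a tiny content-addressed store. This is a minimal illustration, not any vendor's implementation: it assumes whole-file deduplication keyed by a SHA-256 digest, and the class name `DedupStore` and its methods are hypothetical. Real systems also spend some space on reference metadata, which is why the ratio above is "roughly" 100 to 1. Here that overhead is ignored.

```python
import hashlib


class DedupStore:
    """Hypothetical file-level deduplicating store (illustration only).

    Each unique payload is stored once, keyed by its SHA-256 digest;
    every saved "file" is recorded only as a reference (the digest).
    """

    def __init__(self) -> None:
        self.blobs = {}          # digest -> the single stored copy
        self.refs = []           # one reference per saved file
        self.logical_bytes = 0   # bytes the caller asked to save

    def save(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the payload only the first time this content is seen.
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refs.append(digest)
        self.logical_bytes += len(data)
        return digest

    def load(self, digest: str) -> bytes:
        return self.blobs[digest]

    @property
    def physical_bytes(self) -> int:
        # Bytes actually kept on "disk" (metadata overhead ignored).
        return sum(len(b) for b in self.blobs.values())

    def ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes


if __name__ == "__main__":
    attachment = b"x" * (1024 * 1024)  # stand-in for a 1 MB attachment
    store = DedupStore()
    for _ in range(100):               # 100 copies of the same attachment
        store.save(attachment)
    print(store.logical_bytes, store.physical_bytes, store.ratio())
```

Saving the same attachment 100 times keeps one physical copy, so the logical-to-physical ratio comes out at 100.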
Different applications and data types naturally have different levels of data redundancy. Backup applications generally benefit the most from deduplication, because repeated full backups of an existing file system copy largely the same data each time.
Like a traditional stream-based dictionary coder, deduplication identifies identical sections of data and replaces them with references to a single copy of the data. However, whereas standard file compression algorithms such as LZ77 and LZ78 identify short repeated substrings inside single files, data deduplication takes a very large volume of data, identifies large sections (such as entire files or large portions of files) that are identical, and stores only one copy of each. This copy may additionally be compressed with single-file compression techniques.
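Sub-file deduplication followed by per-copy compression can be sketched as below. This is an assumption-laden simplification: it splits data into fixed-size chunks (the 4096-byte `CHUNK_SIZE` is a hypothetical choice), identifies chunks by SHA-256, and compresses each stored chunk with `zlib`. The function names `dedup_files` and `restore` are illustrative, not a real product API.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def dedup_files(files, chunk_size=CHUNK_SIZE):
    """Deduplicate a dict of {name: bytes} at fixed-size chunk granularity.

    Returns (chunk_store, recipes): chunk_store maps digest -> zlib-compressed
    chunk (one copy per unique chunk); recipes maps name -> ordered digests.
    """
    chunk_store = {}
    recipes = {}
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in chunk_store:
                # First occurrence: keep one copy, additionally compressed.
                chunk_store[digest] = zlib.compress(chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return chunk_store, recipes


def restore(name, chunk_store, recipes):
    """Rebuild a file by decompressing and concatenating its chunks in order."""
    return b"".join(zlib.decompress(chunk_store[d]) for d in recipes[name])


if __name__ == "__main__":
    # 16 KiB region (four distinct 4 KiB chunks) shared by both files.
    shared = b"".join(hashlib.sha256(bytes([i])).digest() * 128 for i in range(4))
    files = {
        "a.bin": shared + b"A" * CHUNK_SIZE,
        "b.bin": shared + b"B" * CHUNK_SIZE,
    }
    store, recipes = dedup_files(files)
    print(len(store), sum(len(r) for r in recipes.values()))
    assert restore("a.bin", store, recipes) == files["a.bin"]
```

The two files reference ten chunks in total but only six unique chunks are stored. Fixed-size chunking keeps the sketch short; because inserting a single byte shifts every later chunk boundary, many deduplication systems instead use content-defined (variable-size) chunking.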
This book is your ultimate resource for Data Deduplication. Here you will find the most up-to-date information, analysis, background and everything you need to know.
In easy-to-read chapters, with extensive references and links to help you learn all there is to know about Data Deduplication right away, this book covers: Data deduplication, Data compression, Adaptive compression, Audio compression (data), Binary Ordered Compression for Unicode, Bitrate peeling, Bitstream format, Calgary Challenge, Calgary Corpus, Canterbury Corpus, CMX (file format), Compressed pattern matching, Compression artifact, Context mixing, Data compaction, Data compression ratio, Data compression symmetry, Double Tools for DoubleSpace, DriveSpace, Dyadic distribution, Error exponent, Generation loss, Grammar-based code, Hutter Prize, Image compression, Krichevsky-Trofimov estimator, Lossless compression benchmarks, Lossless data compression, Lossy compression, Lossy data conversion, LZX (algorithm), Matt Mahoney, Mode Z compression, Modulo-N code, Move-to-front transform, Ocarina Networks, Precompressor, Prefix code, Progressive Graphics File, QDX, Rate-distortion theory, Recursive indexing, Robust Header Compression, Sardinas-Patterson algorithm, Self-extracting archive, Set partitioning in hierarchical trees, Shannon's source coding theorem, Signaling Compression, Silence compression, Jan Sloot, Smart Bitrate Control, Smart Data Compression, Solid compression, Sub-band coding, Time-domain harmonic scaling, Transparency (data compression), Transparency threshold, Trellis quantization, Twin vector quantization, Ultra Low Delay Audio Coder, Unary coding, Universal code (data compression), Van Jacobson TCP/IP Header Compression, Video compression, Volume (compression), White noise, WinZip, ZeoSync, Zoo (file format)
This book explains in depth the real drivers and workings of Data Deduplication. It reduces the risk of your technology, time, and resource investment decisions by enabling you to compare your understanding of Data Deduplication with the objectivity of experienced professionals.
Title: Data Deduplication: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors
Author: Kevin Roebuck