dCache is a system for storing and retrieving huge amounts of scientific data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods including NFSv4.1 (pNFS), FTP, WebDAV and xroot.
dCache is a distributed storage system providing location-independent access to data. The data are stored across multiple data servers as complete files presented to the end-user via a single rooted namespace.
Because the physical location of the data is not exposed to the user, files can be migrated from one data server to another without interrupting service. The system can therefore be expanded or contracted by adding or removing data servers at any time.
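The idea of location-independent access can be illustrated with a toy model. The sketch below is not dCache code: the `Namespace` class, its methods, and the pool and path names are all hypothetical, chosen only to show that a logical path stays stable while the server holding the file changes.

```python
# Toy illustration only. Namespace, store, migrate and locate are
# hypothetical names and do not correspond to any dCache API.
class Namespace:
    def __init__(self):
        # logical path -> set of data servers holding a complete copy
        self._locations = {}

    def store(self, path, server):
        self._locations.setdefault(path, set()).add(server)

    def migrate(self, path, source, target):
        servers = self._locations[path]
        if source not in servers:
            raise ValueError(f"{path} is not stored on {source}")
        servers.add(target)      # register the new copy first
        servers.discard(source)  # then drop the old location

    def locate(self, path):
        return sorted(self._locations[path])


ns = Namespace()
ns.store("/data/run42/events.root", "pool-a")
ns.migrate("/data/run42/events.root", "pool-a", "pool-b")
# The user-visible path is unchanged; only the internal location moved.
```

In this model a client only ever uses the logical path, so a server such as the assumed `pool-a` can be drained and removed without clients noticing.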
dCache can be configured to work as a fast disk cache in front of tertiary storage systems. Such systems typically store data on magnetic tapes, which must be loaded and unloaded by a tape robot, rather than on disks. The main reason for using tertiary storage is the cost-efficiency of archiving a very large amount of data on less expensive hardware. The slower media and limited resources (such as the number of tape drives) of tertiary storage systems lead to significantly higher access latency for archived data.
dCache supports multiple data transfer protocols. The protocols are implemented as services that can be deployed modularly, so the system can be scaled horizontally by adding front-end machines without service interruption.
Another performance feature of dCache is hot-spot data migration. When the rate of read requests for a file or a group of files is high, a single data server or a group of data servers can become "hot". dCache detects this condition and attempts to spread the load by distributing popular files to other, less busy data servers.
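A minimal sketch of this kind of hot-spot handling follows. It is an illustration under stated assumptions, not dCache's actual algorithm: the function `plan_hot_replication`, the fixed read-count threshold, and the "least loaded server" selection rule are all hypothetical.

```python
# Toy illustration only. plan_hot_replication and its parameters are
# hypothetical and do not reflect dCache's real detection logic.
def plan_hot_replication(read_counts, replicas, server_load, hot_threshold):
    """Return (path, target_server) pairs for files considered hot.

    read_counts:  path -> number of recent read requests
    replicas:     path -> set of servers already holding the file
    server_load:  server -> number of active transfers
    """
    load = dict(server_load)  # work on a copy, leave the input untouched
    plan = []
    # Handle the most requested files first.
    for path, reads in sorted(read_counts.items(), key=lambda kv: -kv[1]):
        if reads < hot_threshold:
            break  # remaining files are below the threshold
        candidates = [s for s in load if s not in replicas[path]]
        if not candidates:
            continue  # every server already holds a copy
        # Pick the least busy server; break ties by name for determinism.
        target = min(candidates, key=lambda s: (load[s], s))
        plan.append((path, target))
        load[target] += 1  # assume the new copy attracts some traffic
    return plan


plan = plan_hot_replication(
    read_counts={"/a": 120, "/b": 3, "/c": 80},
    replicas={"/a": {"p1"}, "/b": {"p2"}, "/c": {"p1"}},
    server_load={"p1": 10, "p2": 1, "p3": 0},
    hot_threshold=50,
)
```

Here the two popular files, both held only by the busy server `p1`, are assigned extra copies on the idle servers, while the rarely read file is left alone.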
The flow of data within dCache can also be carefully controlled. This is especially important for large sites, where the chaotic movement of data may lead to suboptimal use of resources. Instead, incoming and outgoing data can be marshaled so that they use designated resources, guaranteeing better throughput and improving the end-user experience.