Why is metadata important for an organization building AI and ML models?
AI and ML models generally need to train on large quantities of data, and the broad adoption of AI and ML has only been possible due to the widespread accumulation of Big Data over the past decade or so. The challenge is that it’s difficult for data scientists to navigate massive Big Data deployments to locate the data they need for their models.
Metadata is simply data about data: it describes what the data is. Metadata can tell you when a piece of data was created, who created it, and where it was created, as well as describing the content of that data in much greater detail. For example, if the data is a video clip from a news broadcast, custom metadata can specify who the newscasters are, what the segment is about, whether it’s a breaking news story or a softer feature piece, and so on. There’s really no limit to what it can describe.
Rich metadata makes it easy to search for and locate data in order to use it in AI and ML models. Without it, data scientists often find themselves looking for a needle in a haystack, making the process less efficient and weakening the accuracy of AI and ML models.
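To make the news-clip example concrete, here is a minimal sketch in Python of searching a small catalog by its metadata. The `Clip` class, the `find_clips` helper, and every field name and value are hypothetical, chosen only for illustration; a real deployment would query the storage system's own metadata index rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A stored video clip plus descriptive metadata (all names are hypothetical)."""
    key: str
    metadata: dict = field(default_factory=dict)


def find_clips(clips, **criteria):
    """Return the clips whose metadata matches every key/value pair in criteria."""
    return [
        c for c in clips
        if all(c.metadata.get(k) == v for k, v in criteria.items())
    ]


# Illustrative catalog; the values are made up for this sketch.
catalog = [
    Clip("clip-001.mp4", {"segment_type": "breaking", "topic": "weather", "anchor": "A. Rivera"}),
    Clip("clip-002.mp4", {"segment_type": "feature", "topic": "food", "anchor": "B. Chen"}),
    Clip("clip-003.mp4", {"segment_type": "breaking", "topic": "traffic", "anchor": "B. Chen"}),
]

breaking = find_clips(catalog, segment_type="breaking")
print([c.key for c in breaking])
```

Without the `segment_type` field, the only way to separate breaking news from feature pieces would be to open and inspect each clip, which is the needle-in-a-haystack problem described above.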
What’s the difference between object storage and file storage?
Object storage is a much newer storage architecture than file storage, which has been around for decades. File storage works by organizing data as files into simple hierarchies – think of a filing cabinet with drawers, folders, sub-folders and, ultimately, files. File storage works fine for smaller deployments, but it doesn’t scale well beyond a certain limit. Once that capacity limit has been hit, organizations have to deploy a new file system on top of their existing file systems, a cumbersome and costly process.
Object storage treats data as objects that are stored in a flat address space, which eliminates the scaling limitations found in file storage. Object storage uses a clustered architecture that makes it very easy to scale out by simply adding additional devices to an existing system, rather than adding new systems entirely. As a result, object storage can easily scale to petabytes and beyond.
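As a rough illustration of why a flat address space scales out cleanly, the sketch below places objects on nodes purely by hashing their keys. It uses rendezvous (highest-random-weight) hashing, which is one well-known placement technique and is an assumption made for this example, not a description of how any particular object storage product distributes data. The node names and object keys are hypothetical.

```python
import hashlib


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random score for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes, key):
    """Rendezvous hashing: the node with the highest score for this key owns it."""
    return max(nodes, key=lambda n: _score(n, key))


keys = [f"object-{i}" for i in range(1000)]

# Placement on the original three-node cluster.
nodes = ["node-a", "node-b", "node-c"]
before = {k: owner(nodes, k) for k in keys}

# Scale out by adding one node to the same cluster.
nodes_after = nodes + ["node-d"]
after = {k: owner(nodes_after, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Any key that changed owner moved to the new node; nothing shuffles between old nodes.
print(len(moved), all(after[k] == "node-d" for k in moved))
```

Because each key is just a string in one flat namespace, there is no directory tree to split or rebalance: the new node takes over a share of the keys and the rest stay where they were.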
As mentioned earlier, AI and ML require large datasets. In this respect, object storage is better than file storage for supporting AI and ML due to its much greater scalability.
How do object storage and file storage use metadata?
Object storage provides for fully customizable metadata, including the ability to accommodate limitless metadata, while file storage uses very little metadata with fixed parameters. File systems use basic metadata to record things like when data was created, where it was created and who created it, but they don’t support metadata that describes anything beyond those basic attributes. Object storage, on the other hand, allows users to customize metadata to describe anything they want.
For example, a traditional X-ray file would only have metadata describing basics like creation date, owner, location and size. An X-ray object, however, could use metadata that identifies the patient’s name, age, injury details and which area of the body was X-rayed, making it much easier to locate that X-ray data via search and use it for AI and ML models.
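The contrast can be sketched in a few lines of Python. The first half reads the fixed attributes a file system reports through `os.stat`; the second half models an object as a plain dictionary carrying free-form key/value metadata. The object layout, the `matches` helper, and all patient values are hypothetical placeholders, not a real storage API or real data.

```python
import os
import tempfile

# File-style metadata: a fixed set of attributes reported by the file system.
with tempfile.NamedTemporaryFile(suffix=".dcm", delete=False) as f:
    f.write(b"placeholder image bytes")
    path = f.name

info = os.stat(path)
file_metadata = {
    "size": info.st_size,       # bytes
    "modified": info.st_mtime,  # seconds since the epoch
    "owner_uid": info.st_uid,   # numeric owner ID (0 on Windows)
}
os.remove(path)

# Object-style metadata: free-form key/value pairs stored alongside the data.
# All values below are invented for this sketch.
xray_object = {
    "key": "xray-0001",
    "data": b"placeholder image bytes",
    "metadata": {
        "patient_name": "Jane Doe",
        "patient_age": "54",
        "body_area": "left wrist",
        "injury": "suspected fracture",
    },
}


def matches(obj, **criteria):
    """True if the object's metadata contains every requested key/value pair."""
    return all(obj["metadata"].get(k) == v for k, v in criteria.items())


print(sorted(file_metadata))
print(matches(xray_object, body_area="left wrist"))
```

The file attributes say nothing about what the image shows, whereas the object's metadata can answer a question like "which X-rays are of a left wrist?" without opening the image itself.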
Can you describe a real-world example of using object storage to support AI and ML?
Consider advanced surveillance applications deployed in smart cities. In this example, object storage supports AI-driven pattern-detection apps that recognize faces, logos, landmarks and other categories of content. These apps use metadata to describe attributes such as color, size, gender and location, so it’s easy to find the right images and videos.
Object storage provides the scalability to manage the large quantity of surveillance video data these apps capture, while also supporting the limitless metadata needed to locate specific image and video data.
How are AI and ML related to the growth of edge computing?
So much new data is being generated by sensors and software applications at the edge that storage and analysis systems are being overwhelmed. AI and ML are needed to filter and process that data at the edge, reducing the volume that has to be stored and moved. Organizations have realized that for these advanced AI and ML apps to work efficiently, they must process data at or near where it’s created, rather than sending it to the public cloud for processing and back, which is slow and expensive.
Many AI and ML use cases illustrate this trend, such as a targeted advertising app that identifies the make, model and year of cars on a highway and then displays relevant ads on a nearby billboard, or a manufacturing quality inspection app that identifies and removes products that failed to get wrapped on an automated assembly line.
In either case, there’s a great deal of data that must be analyzed rapidly. These apps have to make decisions quickly – it wouldn’t do much good to flash that targeted ad when the car’s already a mile down the road. For any of this to be possible, these apps must process the data where it’s created, so edge computing infrastructure is being deployed to support that.
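The quality inspection case can be sketched as a small edge-side filter. This is a simplified illustration under stated assumptions: the `Inspection` record, the `wrap_score` field (standing in for a model's confidence that an item is properly wrapped), and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    item_id: int
    wrap_score: float  # hypothetical model confidence that the item is wrapped


def process_at_edge(readings, threshold=0.5):
    """Decide locally which items to reject; return only the rejects for upstream reporting."""
    return [r for r in readings if r.wrap_score < threshold]


# Made-up readings from one stretch of the assembly line.
readings = [
    Inspection(1, 0.97),
    Inspection(2, 0.12),
    Inspection(3, 0.88),
    Inspection(4, 0.40),
]

rejects = process_at_edge(readings)
print([r.item_id for r in rejects])
print(f"forwarded {len(rejects)} of {len(readings)} records")
```

The reject decision is made next to the line with no round trip to a remote service, and only the records that matter are forwarded, which is the data reduction described above.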
Gary Ogasawara is Cloudian’s first CTO, responsible for setting the company’s long-term technology vision and direction.
Cloudian is the most widely deployed independent provider of object storage systems, with the industry’s most advanced S3 compatibility and an extensive partnership ecosystem. Its award-winning flagship solution, HyperStore, provides limitless scalability and cloud-like technology, flexibility and economics in the data center. Cloudian’s global data fabric architecture enables enterprises to store, find and protect object and file data seamlessly across sites, both on-premises and in public clouds, within a single, unified platform.