In a cloud environment, we provide client companies with REST API services to store and update their customers' information (phone numbers, etc.). I'm looking for a way for a newly joining client company to pass a file (or a set of files) containing all of their customers. The file(s) may contain millions of customer records.
Suppose the idea is that the file(s) may be uploaded to a certain folder, and once detected, an import process starts. Suppose also that there is a service in the cloud that can create a customer from a request containing the customer's details, and that each file is limited to something like 1 GB.
I've heard that Yarn or Kubernetes may be used, but I can't really see how they would be used here, or what the advantage of using them would be.
This import process could be done in pure Java as follows: folder-watching code can easily detect a new file in the folder and kick off a process that reads the file's records, builds a request object from each record, and calls the service that creates the customer.
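A minimal sketch of that pure-Java approach, using `java.nio.file.WatchService`; the drop-folder path, the CSV record layout, and the `CreateCustomerRequest` / `callCreateCustomerService` names are hypothetical stand-ins for the existing service:

```java
import java.io.IOException;
import java.nio.file.*;

import static java.nio.charset.StandardCharsets.UTF_8;

public class CustomerImportWatcher {

    public static void main(String[] args) throws IOException, InterruptedException {
        Path inbox = Paths.get("/data/imports");          // hypothetical drop folder

        WatchService watcher = FileSystems.getDefault().newWatchService();
        inbox.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take();                // blocks until a file-system event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                    continue;
                }
                // Note: in practice you also need to wait until the upload has finished.
                Path file = inbox.resolve((Path) event.context());
                importFile(file);
            }
            key.reset();
        }
    }

    // Reads the file line by line and calls the customer-creation service per record.
    private static void importFile(Path file) throws IOException {
        try (var lines = Files.lines(file, UTF_8)) {
            lines.map(CustomerImportWatcher::toRequest)
                 .forEach(CustomerImportWatcher::callCreateCustomerService);
        }
    }

    private static CreateCustomerRequest toRequest(String line) {
        String[] fields = line.split(",");                // assumes a simple CSV layout
        return new CreateCustomerRequest(fields[0], fields[1]);
    }

    private static void callCreateCustomerService(CreateCustomerRequest request) {
        // placeholder for the existing "create customer" REST call
    }

    record CreateCustomerRequest(String name, String phone) { }
}
```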
So what's the advantage of using Yarn or Kubernetes, over pure Java, in doing a task like this? And are there other alternative technologies that can be used for this task?
In a cloud environment, you want your Java service to be "highly available" and, when dealing with "millions of customer records" per client, even "secure." This is where Kubernetes and Yarn come in.
If you are running one VM, with a Java process saving sensitive customer data unencrypted to the local file system, what happens when that VM dies halfway through an import, the process crashes before every record has been created, or someone who shouldn't see that data gets access to the disk? You get the idea: there are an infinite number of failure and compromise scenarios.
Kubernetes and Yarn, in different ways, support architectural patterns that let you run multiple Java upload and import processes across a collection of VMs, so that the various failure cases are handled sensibly and the sensitive parts of the process have proper custodial machinery, at scale and with live data.
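To make that contrast concrete, one common pattern is to split the import into one worker per file and let the orchestrator (for example, a Kubernetes Job running several pods, or a YARN application running several containers) schedule, restart, and scale those workers. The sketch below is a hypothetical worker, not your actual service: the file argument, the `.offset` checkpoint file, and the `createCustomer` call are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.*;

// Sketch of a single-file import worker. The orchestrator assigns each worker one
// file; if a worker dies, the orchestrator restarts it and the checkpoint lets it
// resume instead of starting over.
public class ImportWorker {

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);                   // file assigned to this worker
        Path checkpoint = Paths.get(args[0] + ".offset"); // hypothetical progress marker

        long alreadyDone = Files.exists(checkpoint)
                ? Long.parseLong(Files.readString(checkpoint).trim())
                : 0L;

        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            long index = 0;
            while ((line = reader.readLine()) != null) {
                if (index >= alreadyDone) {
                    createCustomer(line);                 // call the existing REST service
                    if (index % 1_000 == 0) {             // periodic checkpoint for restarts
                        Files.writeString(checkpoint, Long.toString(index + 1));
                    }
                }
                index++;
            }
            Files.writeString(checkpoint, Long.toString(index)); // file fully imported
        }
    }

    private static void createCustomer(String record) {
        // placeholder for the "create customer" request; should be idempotent,
        // since a crash between checkpoints can cause a few records to be retried
    }
}
```

Because progress is checkpointed and the create call is idempotent, the orchestrator can simply kill and restart a failed worker, which is exactly the kind of failure handling that is awkward to bolt onto a single folder-watching process on one VM.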