4.1.1 Tackling Lack of Customizability.
A desire for more customizability is a sentiment shared by many participants. Both P7 and P9 pointed out that existing AutoML platforms are often heavily encapsulated, making it difficult for users to intervene in the automation process or perform fine-grained tuning of the generated results.
“There is actually nothing, not so much you can do in, you know, adjusting the parameters, or whatever algorithm they use.” (P7)
“The cloud-based AutoML doesn’t allow the user to export the model or download the model to deploy it on their own machines. There is no such feature now because they are using the most advanced models. Those models are corporate properties and should not be disclosed to anyone else.” (P9)
Moreover, we found that AutoML’s lack of customizability is multifaceted, and participants have to devise various strategies to tackle this challenge in different scenarios.
Workaround 1: Contextualizing Input Data. As most current AutoML platforms are generic and do not provide the flexibility to configure their inner workings, participants often find them lacking the capability to handle context-sensitive tasks. One common workaround is to contextualize the input data by adding “context hints,” so that AutoML can utilize such additional information to generate context-specific ML solutions (P6, P7). For instance, P7, who is an HCI researcher, conducted a user study to understand the user experience with a voice-based self-tracking application. Due to her limited coding experience, she chose a commercial AutoML platform to provide the natural language processing (NLP) functionality. However, she reflected that the platform was not adaptive to the self-tracking context, and she needed to add contextual information to the input data for AutoML to work accurately:
“I feel that the AutoML services are not smart enough if you don’t give them enough contextual information, they cannot accurately recognize users’ voice input. I don’t know how to improve that, so I tell my participants to give the system a little bit more contextual information. For example, “7 to 9” is often mistakenly translated into “729”, and I asked my participants to say “7 to 9 AM” or “7 to 9 in the morning” so that it can help these systems improve their performance.” (P7)
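To make this workaround concrete, the sketch below shows one programmatic form such “context hints” could take: pre-processing utterances to append a clarifying time unit before they reach an AutoML NLP service. The regular expression and the automl_transcribe_intent call are hypothetical illustrations, not features of any platform our participants used.

```python
# Minimal sketch of the "context hints" workaround: enriching raw voice-input
# text with contextual cues before passing it to an AutoML NLP service.
import re

def add_time_context(utterance: str) -> str:
    """Append a clarifying unit ("AM") to bare time ranges such as "7 to 9",
    mirroring the hint P7 asked her participants to provide verbally."""
    return re.sub(r"\b(\d{1,2}) to (\d{1,2})\b", r"\1 to \2 AM", utterance)

def query_automl(utterance: str) -> str:
    hinted = add_time_context(utterance)
    # result = automl_transcribe_intent(hinted)  # hypothetical platform call
    return hinted

print(query_automl("I slept from 7 to 9"))  # -> "I slept from 7 to 9 AM"
```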
Workaround 2: Incorporating Domain Knowledge. Another limitation related to AutoML’s customizability perceived by participants is that it does not naturally fit the needs of different industries. Participants’ workaround is to gather and incorporate domain knowledge into AutoML’s optimization objectives (P3, P9, P13, P15). For example, P13, who works in a technology company focusing on providing AI and ML solutions to traditional industries, reported that AutoML was too generic and regarded it as “a product made by obtaining the greatest common divisor among the needs of all users.” To fit AutoML to industry-specific tasks, he communicated with industry experts and transformed the experts’ domain knowledge into AutoML’s optimization objectives:
“Based on our cooperation with enterprises in traditional industries, the most difficult but valuable thing is how to convert domain knowledge of different industries into your model design. It’s actually the most valuable part, but this is definitely something I can’t do with AutoML. For example, in the optimization of the supply chain, a relatively reasonable level of inventory should be maintained, if you have no one to tell you about this kind of domain knowledge, you can not make AutoML fit into this specific task. Since either ML or data scientists are not particularly familiar with such issues, it actually requires us to have more communication with industry experts and transform this kind of domain knowledge into the objective in my model, so the bridging work is actually very important.” (P13)
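One programmatic form this bridging work could take is encoding domain knowledge as a custom objective for the automated search. The sketch below is a minimal illustration in scikit-learn rather than any participant’s actual system: it penalizes under-stocking more heavily than over-stocking when tuning a demand-forecasting model, with the cost ratio an assumed example rather than P13’s real constraint.

```python
# Minimal sketch: domain knowledge (stock-outs cost more than over-stocking)
# expressed as a custom scoring objective for a hyperparameter search.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def inventory_loss(y_true, y_pred, understock_cost=4.0, overstock_cost=1.0):
    """Penalize predicted inventory below demand more than above it."""
    err = y_pred - y_true
    return np.mean(np.where(err < 0, -err * understock_cost, err * overstock_cost))

scorer = make_scorer(inventory_loss, greater_is_better=False)

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      {"max_depth": [3, 5, None]}, scoring=scorer, cv=3)
search.fit(X, y)
print(search.best_params_)  # best configuration under the domain-aware objective
```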
Workaround 3: Building Internal AutoML Tools. AutoML’s lack of customizability also manifests in its limited support for uncommon data types. Correspondingly, participants often opt to build their own AutoML tools (P6, P11, P17). For instance, P11 explained that his company has developed AutoML tools to support tabular data, which he found missing on mainstream platforms:
“Our company has developed our own AutoML platform. The AutoML platforms provided by companies like Google and Amazon are very mature. However, the functions of their AutoML platforms support generic data such as images and text but do not support tabular data, which our company deals with.” (P11)
Similarly, P17 described that the data in his company comes in different formats and with different features, which requires refining the search space, while current AutoML platforms do not provide such configurability:
“Because the data in our field can be in many forms and has discrete features, it needs a better representation of the overall data. The process of its correction also needs to be searched. Our (internal) AutoML is designed to be more refined and can handle different kinds of input data.” (P17)
As another example, P6 works at a non-governmental organization (NGO) in Kenya. As the organization provides healthcare information and helps patients connect with local medical resources, its ML solution needs to support Swahili, the local language. Therefore, the organization is building an internal AutoML platform and plans to switch from the commercial AutoML service to its own platform, which can better support Swahili without the accuracy loss caused by translation and also “significantly saves money for the company.”
In summary, building internal AutoML tools is a strategy for handling special data types or data with unique features, or for providing better-localized solutions.
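While participants’ internal platforms are proprietary, their core idea, searching over candidate models and configurations tailored to in-house tabular data, can be sketched minimally as follows; the candidate set and dataset are illustrative stand-ins, not any participant’s actual tool.

```python
# Minimal sketch of an in-house tabular AutoML loop: evaluate a small set of
# candidate pipelines by cross-validation and keep the best performer.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for in-house tabular data

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # chosen model and its CV accuracy
```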
4.1.2 Tackling Lack of Transparency.
The lack of transparency is another major concern frequently mentioned by participants. For example, P13 and P17 emphasized that while ML is already a black box, automating ML adds another layer of “black-boxness”; thus, they perceived AutoML as a “double black-box.” The transparency issues perceived by participants have two aspects: (i) AutoML provides limited support for evaluating its outcomes; and (ii) it falls short of providing sufficient information to assess its process. Thus, participants have devised various workarounds to assess and evaluate AutoML’s outcomes and process.
Workaround 1: Validating AutoML’s Outcomes Manually. Several participants (P2, P10, P16) shared their struggles with evaluating and validating AutoML’s outcomes. For example, P16 pointed out the lack of indicative performance metrics on the AutoML platform he has used:
“There is one issue with AutoML, at least according to our experience when cooperating with the NGO. The evaluation metrics it (AutoML) gave were relatively limited. I remember that it only had one metric of ‘precision’ for classification, but other metrics such as ‘F1 score’ and ‘accuracy’ were missing.” (P16)
To cope with such transparency issues, one common workaround by participants is to manually validate AutoML’s outcomes, either by using self-selected metrics or by checking their backward compatibility with existing ML solutions:
“If it’s just for the classification, I would just use the provided evaluation metrics like accuracy and some kind of like F1 score precision and recall, some kind of provided metrics; but for tasks like regression, I usually manually check whether the results are reliable.” (P10)
“In our company, we compare AutoML’s results with the previous results. For example, when we want to run a credit score, we first use a well-trained model like our previous model to run to get a batch of results. Then we use AutoML to run the score. How much we can trust AutoML results depends on how different its results are from our previous results. If there is a big difference, there may be problems. If AutoML’s results are within our acceptable range, there should be no big problem, but we definitely do a lot of such testing.” (P2)
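The two habits quoted above can be expressed in a few lines of code. The sketch below computes self-selected metrics on AutoML’s predictions and then checks backward compatibility as an agreement rate with a previously trusted model; all prediction arrays are hypothetical placeholders.

```python
# Minimal sketch of manual outcome validation:
# (i) self-selected metrics beyond the platform's single "precision" number;
# (ii) backward-compatibility check against a previously trusted model.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
automl_preds = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])   # from the AutoML platform
legacy_preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # from the previous model

for name, fn in [("accuracy", accuracy_score), ("precision", precision_score),
                 ("recall", recall_score), ("f1", f1_score)]:
    print(f"{name}: {fn(y_true, automl_preds):.2f}")

# How often does AutoML agree with the old model? Large disagreement is a
# signal to investigate, as P2 describes.
agreement = np.mean(automl_preds == legacy_preds)
print(f"agreement with previous model: {agreement:.0%}")
```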
Workaround 2: Tracking AutoML’s Process. Further, most current platforms only provide explanations for AutoML’s outcomes (e.g., the importance of different features for the models suggested by AutoML), while its dynamic process (e.g., how the models are actually found) remains fairly vague. However, as several participants (P3, P8) indicated, they equally care about evaluating the dynamics of AutoML’s process to assess whether it performs as expected.
The reasons for this lack of process transparency on existing AutoML platforms may be manifold. For instance, commercial platforms often view the underlying AutoML techniques as proprietary intellectual property and are unwilling to disclose internal information. Also, as comprehending AutoML’s process often requires substantial expertise, providing process transparency may be deemed unnecessary on platforms targeting ordinary users. To work around this limitation, participants resort to manually tracking AutoML’s learning curves, which are significant indicators of its optimization trajectories:
“We mainly look at how the three learning curves of training, validation, and testing change during the training process and the testing process. It may be the parameter settings or hyperparameter settings chosen for each set of AutoML. We check whether these learning curves make sense. If it makes sense, we will probably trust these results.” (P3)
Another way to assess the dynamics of AutoML’s process is to compare the differences among multiple runs of the process under varying settings, as P8 described:
“Basically, I will look at the statistical results, but I could maybe randomly sample several searches, and that’s where I got the AutoML algorithms and performance metrics like accuracy or latency, and I try to measure the differences among different algorithms or different neurons. I think it is a very straightforward way to evaluate the performance of AutoML.” (P8)
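For readers who want to replicate this kind of manual tracking, the sketch below plots training and validation curves across several AutoML trials, assuming the platform exposes or logs per-epoch metrics; the synthetic curves stand in for exported trial logs.

```python
# Minimal sketch of manually tracking learning curves across AutoML trials
# to judge whether the optimization trajectories "make sense" (cf. P3).
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
epochs = np.arange(1, 21)

fig, ax = plt.subplots()
for trial in range(3):  # e.g., three hyperparameter settings tried by AutoML
    train = 1.0 / (epochs + trial + 1) + rng.normal(0, 0.01, epochs.size)
    val = train + 0.05 + rng.normal(0, 0.02, epochs.size)
    ax.plot(epochs, train, label=f"trial {trial} train")
    ax.plot(epochs, val, "--", label=f"trial {trial} val")

ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("automl_learning_curves.png")  # eyeball: do the curves make sense?
```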
Workaround 3: Creating Customized Visualizations. Many participants (P1, P2, P3, P5, P13, P15) also recognized the importance of visualization, not only for understanding AutoML’s inner workings but also for communicating AutoML’s outcomes to internal (e.g., team members and executives) and external (e.g., clients and stakeholders) parties:
“Our company’s internal AutoML platform has a function that I particularly like, it can visualize its running process, especially when there are so many tasks, it can tell us the running conditions of each task, and also tell us the overall comparisons by showing us a table that lists the differences between different tasks. This is quite useful. AutoML is no longer a black box; it can give us some insight that helps us to reduce the unnecessary search space of hyperparameters for this kind of experiment.” (P15)
“For my current job, we don’t have many new models, because companies like us are relatively stable. But if we have relatively new models or features, we will need to explain them to the clients.” (P2)
“Personally, I don’t use visualization very much. I just observe some specific numbers directly. However, if we need to report to clients, it’s best to visualize it.” (P3)
However, this functionality is often underdeveloped or even absent on many AutoML platforms. Even when the existing visualization suffices for internal communication with experts, it often falls short of conveying consumable information to external, non-expert parties who lack relevant backgrounds. To work around this limitation, participants often manually visualize AutoML’s outcomes based on a set of pre-identified requirements to facilitate communication with external parties.
“We have to visualize the explanations manually based on the results we got from AutoML, as the visualization auto-generated by AutoML is ugly, not informative, and not easy to understand … We need to make it easy to understand and look professional. There are certain requirements, such as avoiding text-heavy explanations and using more pictures. But internally, for example, within our group, we generally do not need manually created visualization, we can just use whatever features the platforms provide.” (P2)
“I manually visualize feature selection, which features are more important, the learning process, and learning curve changes in the performance of each model I trained, as the existing AutoML visualization function is simply not helpful.” (P3)
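A minimal version of such manual re-visualization might look like the following, where feature importances from a stand-in model (not an actual AutoML export) are re-plotted as a compact, client-friendly bar chart.

```python
# Minimal sketch of re-plotting feature importances in a cleaner,
# client-friendly form, as P2 and P3 describe doing by hand.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)  # stand-in for AutoML

# Keep only the top features: essential but not overwhelming (cf. P5).
top = sorted(zip(model.feature_importances_, X.columns), reverse=True)[:8]
vals, names = zip(*top)

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(names[::-1], vals[::-1])
ax.set_xlabel("feature importance")
ax.set_title("Top drivers of the model's predictions")  # plain wording for clients
fig.tight_layout()
fig.savefig("feature_importance.png")
```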
Participants also recognized that one major challenge to creating such explanations is to visualize information in an essential but not overwhelming manner, especially when communicating with external parties with limited expertise:
“Convincing people is not that easy from a technical perspective, but if you tell clients too many technical details like why a particular feature has a value of 0.7, people are going to be confused even more, so it’s important to strike a balance to provide just enough information in the right way, not too much, not too technical, or too detailed. This can also protect us by preventing other people from copying our idea.” (P5)
4.1.3 Mitigating Potential Privacy Risks.
In addition to AutoML’s functionality limitations, many participants (P1, P3, P4, P5, P6, P9, P10, P11, P13, P15, P17, P18, P19) also expressed serious concerns about potential data privacy issues in using AutoML platforms.
One major concern is whether using AutoML platforms may entail the privacy leakage of training data, which is especially consequential for critical domains (e.g., healthcare and insurance) involving sensitive information such as health history, credit history, and demographic information:
“The problem of data privacy is quite serious. Some projects I have done before were related to medical information, which involved patient data … The initial data may first be provided by the hospital itself. However, if an institution provides you with a very small amount of data, while we need to do this experiment on a large scale, we must involve patient data provided by different institutions or different hospitals. Then there are privacy risks: First of all, the patient’s information cannot be leaked. Second, each hospital may not want its data to be somehow leaked to other hospitals. So privacy is definitely a very important part to consider when it comes to whether to use AutoML or which one to use.” (P18)
Another major concern is whether AutoML-generated ML solutions are subject to potential inference attacks if disclosed to and used by unauthorized parties. For instance, the models generated by AutoML based on confidential medical data carry a significant amount of sensitive information from the original data, while malicious parties, if given access, may infer such sensitive information by reverse-engineering the models:
“If I get a parameter after training on the data of a bank or a hospital, I want to use it in the parameter space of other hospitals or other banks. Sometimes the parameter itself can be used to infer what the previously trained dataset looks like, which may cause data leakage.” (P18)
To alleviate the privacy concerns above, participants resort to various workarounds as detailed below.
Workaround 1: Limiting Privacy Leakage at the Root. One straightforward workaround by participants (P6, P10, P13, P15, P17, P18, P19) is to limit privacy leakage at its root. This strategy can be adopted at either the user or the organization level. Specifically, at the user level, participants purposely collect less sensitive data during the data collection stage, before using such data on AutoML platforms:
“We are only collecting minimal identifiable data right now such as users’ phone numbers because that’s how we engage with the users over SMS (Short Message Service). We also have information about health facilities or hospitals when users sign up. [This information] is enough for us to help the users connect to health services, but we are not collecting other information such as user names, addresses, or ages.” (P6)
In addition, participants (P13, P15, P17) also mentioned that their organizations may have already performed certain data anonymization to protect data privacy before handing over the data:
“If our company asked us to compress a model, we won’t have too many images due to user privacy, or we may have a lot of data, but we do not have sensitive information such as gender since such information has been masked by the company.” (P15)
“Basically, the data I get may already be processed in advance. The information that can be stored in the general database is basically not related to any personal information.” (P17)
“Clients in the healthcare industry, for example, a pharmaceutical company we have collaborated with before, have strong compliance requirements, so they will also do a lot of processing on their side.” (P13)
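In code, this kind of pre-upload minimization and masking can be as simple as the sketch below; the column names and masking choices are hypothetical illustrations of the practices participants described.

```python
# Minimal sketch of pre-upload data minimization and anonymization:
# drop direct identifiers, pseudonymize linkage keys, mask sensitive fields.
import hashlib
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "phone": ["555-0100", "555-0101"],
    "gender": ["F", "M"],
    "blood_pressure": [120, 135],
})

# Drop direct identifiers entirely (data minimization).
df = df.drop(columns=["name"])

# Pseudonymize what must be kept for record linkage (e.g., SMS engagement, cf. P6).
df["phone"] = df["phone"].apply(lambda s: hashlib.sha256(s.encode()).hexdigest()[:12])

# Mask sensitive attributes the task does not need (as P15's company does).
df["gender"] = "MASKED"

print(df)  # only this reduced table would be uploaded to the AutoML platform
```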
In general, only uploading non-sensitive data to AutoML platforms greatly reduces the risks of privacy breaches during AutoML’s process. On the downside, this strategy may significantly affect data authenticity and negatively impact AutoML’s performance, as noted by P11:
“Google definitely doesn’t want users to worry that their data will be leaked, so they (Google) may mask some data and may use other means, such as protecting the user’s ranking layer to protect data privacy, but such techniques actually damage the authenticity of the original data and affect performance.” (P11)
Workaround 2: Applying Privacy-Preserving Techniques. Another workaround mentioned by participants (P3, P9) is to proactively apply privacy-preserving techniques during AutoML’s process. Examples include “black-box optimization” [8], which avoids direct access to data, and “federated learning” [88], which constructs ML models using data spread across multiple parties without sharing the data between them:
“There are some black-box optimization methods that AutoML does not touch [your data in optimization]. In this case, it can be done at least a little better to guarantee privacy. Another way is through federated learning, which is equivalent to giving data to local users without uploading [the data to the server]. It relies on the local side to do some [AutoML] searches. What AutoML receives is some high-level [data] or metadata instead of data from users’ own devices.” (P3)
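To illustrate the federated idea P3 describes, the toy sketch below averages locally fitted linear-regression weights across three parties, so only parameters, never raw records, leave each site; it is a conceptual illustration, not a production federated-learning or black-box-optimization system.

```python
# Toy federated averaging: each party fits locally; only weights and sample
# counts (high-level metadata, never raw data) are shared and aggregated.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def local_fit(n):
    """Fit a least-squares model on one party's private data."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(0, 0.1, n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, n

# Three "hospitals" train locally; only parameters leave each site.
updates = [local_fit(n) for n in (50, 80, 120)]
weights = np.array([w for w, _ in updates])
counts = np.array([n for _, n in updates], dtype=float)

global_w = (weights * counts[:, None]).sum(axis=0) / counts.sum()
print(global_w)  # close to true_w, without pooling any raw records
```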
However, this workaround is not for every AutoML user, as many may lack the necessary technical expertise to apply advanced privacy-preserving techniques.
Workaround 3: Delegating to Legal Regulation. In addition, several participants (P1, P13) who use commercial AutoML platforms referred to data privacy as a legal issue that should be clearly specified in the privacy agreements:
“I think it is a legal issue between the platform and the company. Before the company decides to use these platforms, it must clearly state the privacy issue in the confidential agreement. If the AutoML platform violates the regulations, it will be a legal issue.” (P1)
“Before we collaborate with the companies that provide the AutoML services, we must first make it clear about what data can be shared, and to what extent the data can be shared. For such issues, these (AutoML) companies actually have their own standards and their own legal team will handle such issues.” (P13)
Delegating to legal regulations helps AutoML users clarify their responsibilities and secure their data from a legal perspective.
Workaround 4: Choosing Trustworthy Platforms. Participants expressed contrasting views toward the trustworthiness of cloud-based AutoML platforms in terms of privacy protection. While some participants (P2, P4, P8, P9, P17) raised concerns about the privacy risks of using cloud-based AutoML platforms, others (P5) trusted the AutoML services of renowned companies. For example, P4, from a healthcare company, explained his strategy of avoiding cloud-based AutoML platforms when private data is involved:
“If using the public dataset to try out the AutoML service, I’m not worried about data privacy; if I’m going to use some private datasets, I will probably not upload the data but run it locally. If I’m using cloud AutoML services, I will not choose to upload all the private data to the cloud server.” (P4)
P9 also echoed P4’s concerns:
“The people who really have this concern won’t even need to ask, they have very strict rules to prevent them from uploading any data to the cloud so this option was rolled out at first glance, so they wouldn’t need to ask and this definitely a concern for many companies they don’t want to disclose their data, because they have strict rules to upload data to any other servers besides their internal servers.” (P9)
On the contrary, other participants, especially those from startup companies (P5), prefer reputable cloud-based AutoML platforms (e.g., Amazon AWS) over their own infrastructure in terms of data privacy protection, believing that these large companies are better positioned to protect data privacy given their abundant infrastructure and personnel resources:
“If I’m hosting service on my own server, I’ll be very concerned about getting attacks, because as a start-up, we cannot afford to have an onsite, fully dedicated security team. But those bigger companies have teams of experts and engineers that can take care of this.” (P5)
Notably, this view contradicts that of participants who choose internal platforms for risk control, revealing the complex landscape of how users choose among different AutoML platforms, with choices shaped by perceived privacy risks, operational costs, and platform trustworthiness.