WO2019232247A1 - Biomass estimation in an aquaculture environment
- Publication number: WO2019232247A1 (PCT application PCT/US2019/034709)
- Authority: WIPO (PCT)
- Prior art keywords: fish, fin, image, origin, images
Classifications
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K61/00—Culture of aquatic animals
- A01K61/10—Culture of aquatic animals of fish
- A01K61/13—Prevention or treatment of fish diseases
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K61/00—Culture of aquatic animals
- A01K61/90—Sorting, grading, counting or marking live aquatic animals, e.g. sex determination
- A01K61/95—Sorting, grading, counting or marking live aquatic animals, e.g. sex determination specially adapted for fish
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Description
BIOMASS ESTIMATION IN AN AQUACULTURE ENVIRONMENT
TECHNICAL FIELD
[0001] The present disclosure relates to computer vision techniques for estimating fish biomass in an aquaculture environment.
BACKGROUND
[0002] The growth rate of the world's human population is applying substantial pressure on the planet's natural food resources. Aquaculture will play a significant part in feeding this growing population.
[0003] Aquaculture is the farming of aquatic organisms, including fish, in both coastal and inland areas, involving interventions in the rearing process to enhance production. Aquaculture has experienced dramatic growth in recent years. The United Nations Food and Agriculture Organization estimates that aquaculture now accounts for half of the world's fish that is used for food.
[0004] Fish farm production technology is underdeveloped compared to the state of the art in other food production processes. Techniques that improve the production processes in fish farms using new perception and prediction techniques would be appreciated by fish farmers.
[0005] Biomass estimates are important for evaluating the growth of fish during the growing cycle. The estimates help farmers determine the best time to harvest, adjust feed and medicine amounts, and detect fish loss, among other fish farming activities.
[0006] Traditional techniques to estimate fish biomass involve manual sampling and weighing. However, minimizing the handling of fish is highly desirable, not only because handling is labor intensive but also because it impacts the health of the fish. As such, non-manual techniques are preferred.
[0007] In brief, WO 2014098614 uses special lighting and a camera with depth perception filters to triangulate fish in water. An estimate of the fish dimensions is generated using the received data.
[0008] In brief, US 2004/0008259 utilizes shadows generated by illuminating a fish from multiple angles using multiple light sources to estimate the depth and size of the fish.
[0009] In brief, WO 2005025309 uses multiple cameras to create a 3D image of a fish that swims through a chute equipped with a wreath of cameras that capture footage of the fish from multiple angles.
[0010] These solutions have special lighting requirements or involve multiple irradiation devices or cameras arranged opposite each other in a transfer channel or chute. A smaller, less expensive, and easier-to-handle solution that reduces the amount of detection equipment in the net pen is preferable. Further, a solution that does not require forcing fish to individually swim through a chute or channel is also preferred.
[0011] The present invention disclosed herein addresses these and other issues.
[0012] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a schematic diagram of a biomass estimation system.
[0014] FIG. 2 depicts the positions of truss and conventional dimensions of a fish.
[0015] FIG. 3 is a schematic diagram of an image processing sub-system of the biomass estimation system of FIG. 1.
[0016] FIG. 4 depicts basic computer hardware that may be used in an implementation.
DETAILED DESCRIPTION
[0017] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
OVERVIEW
[0018] Fish farms rely on accurate information on fish biomass. Such information is used by fish farmers to adjust feeding, control pen stocking densities, and select optimal stock harvest times. As mentioned, current methods for biomass estimation may be stressful to the fish. In addition, current methods may be inaccurate.
[0019] Having accurate information on fish size is important in aquaculture, in particular for effective management of feeding regimes, grading times, and optimal harvest times. Today, many net pens are located in areas along the coast where they can take advantage of ocean currents to deliver oxygen to the fish and disperse waste. This presents challenges to effectively monitoring the fish stock, especially in bad weather.
[0020] As mentioned, the most common way of estimating fish biomass is by physically netting a sub-sample of fish and weighing them manually. This is labor intensive, may cause stress to the fish, and may cause scale damage.
[0021] The present invention disclosed herein addresses these and other issues.
[0022] The above deficiencies and other problems associated with current methods of fish biomass estimation are reduced or eliminated with a method and system to predict fish weight and biomass using a non-invasive digital stereo camera and computer vision system. The stereo camera system is immersed in a net fish pen and captures stereo images of freely moving fish from a substantially lateral perspective. The computer vision system automatically identifies and estimates specific combinations of fin-to-fin, body depth, and length dimensions detected in the stereo images. The estimates are then used to predict weight with a high degree of accuracy. The system and method have the advantage of being more automated, highly accurate, and far less stressful to the fish than current biomass estimation techniques.
BIOMASS ESTIMATION SYSTEM
[0023] FIG. 1 is a schematic diagram of fish biomass estimation system 100 for estimating the biomass of fish 102 in net pen 104. System 100 includes a pair of high-resolution, light-sensitive digital cameras 106 within a waterproof housing immersed underwater in net pen 104.
[0024] Cameras 106 may be arranged substantially horizontally with substantially no vertical disparity, although it is possible for cameras 106 to be arranged substantially vertically with substantially no horizontal disparity. In that case, references herein to vertical may be substituted with horizontal, and references herein to left and right may be substituted with top and bottom or vice versa, without loss of generality.
[0025] Each of cameras 106 may be an approximately 12-megapixel monochrome camera with a resolution of approximately 4096 pixels by 3000 pixels and a frame rate of approximately 1 to 8 frames per second, although different cameras with different capabilities, including color cameras, may be used according to the requirements of the particular implementation at hand. For example, a color camera may be used to capture images that may be processed for sea lice detection and classification as well as for biomass estimation.
[0026] Selection of the lenses for cameras 106 may be based on an appropriate baseline and focal length to capture images of a fish swimming in front of the cameras where the fish is close enough to the lenses for proper pixel resolution and feature detection in the captured image, but far enough from the lenses that the fish can fit in both the left and right frames.
- For example, 8-millimeter focal length lenses with a high line-pair count may be used so that the pixels in the left and right images can be sufficiently resolved.
- the baseline of cameras 106 may vary, for example, within a range of 6 to 12 centimeters.
- Net pen 104 may be framed by a plastic or steel cage that provides a substantially inverted conical, circular, or rectangular cage, or cage of other desired dimensions.
- Net pen 104 may hold a number of fish 102 of a particular type (e.g., salmon) depending on various factors such as the size of the net pen 104 and the maximum stocking density of the particular fish caged.
- a net pen for salmon may be 50 meters in diameter, 20-50 meters deep, and hold up to approximately 200,000 salmon assuming a maximum stocking density of 10 to 25 kg/m³.
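As a rough consistency check of these figures, the following sketch computes the implied capacity of a cylindrical pen; the 4.5 kg average fish weight is an assumed, illustrative value not taken from this disclosure:

```python
import math

# Rough consistency check of the net-pen figures above, assuming a
# cylindrical pen and an assumed ~4.5 kg average fish weight.
diameter_m = 50.0
depth_m = 30.0               # within the stated 20-50 m range
density_kg_m3 = 15.0         # mid-range of the stated 10-25 kg/m3

volume_m3 = math.pi * (diameter_m / 2.0) ** 2 * depth_m
max_biomass_kg = volume_m3 * density_kg_m3
fish_count = max_biomass_kg / 4.5

print(f"{volume_m3:,.0f} m^3, {max_biomass_kg:,.0f} kg, ~{fish_count:,.0f} fish")
# ~58,905 m^3, ~883,573 kg, ~196,350 fish, consistent with ~200,000
```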
- while the techniques for biomass estimation disclosed herein may be applied to a sea-pen environment such as net pen 104, they may also be applied to other fish farming enclosures.
- the techniques may be applied to fish farm ponds, tanks, or other like fish farm enclosures.
- Cameras 106 may be attached to a winch system that allows the cameras 106 to be relocated underwater in net pen 104 so as to capture stereo images of fish from different locations within the net pen 104.
- the winch system may allow cameras 106 to move around the perimeter of net pen 104 and at various depths within net pen 104.
- the winch system may also allow control of pan and tilt of cameras 106.
- the winch system may be operated manually by a human controller such as, for example, by directing user input to an above-water surface winch control system.
- the winch system may operate autonomously according to a winch control program configured to adjust the location of cameras 106 within net pen 104, for example, in terms of location on the perimeter of the cage and depth within net pen 104.
- the autonomous winch control system may adjust the location of cameras 106 according to a series of predefined or pre-programmed adjustments and/or according to detected signals in net pen 104 that indicate more suitable locations for capturing images of fish 102 relative to a current position and/or orientation of cameras 106.
- various signals may be used. For example, machine learning and computer vision techniques may be applied to images captured by cameras 106 to detect schools or clusters of fish currently distant from cameras 106, so that a closer location can be determined and the location, tilt, and/or pan of cameras 106 adjusted to capture more suitable images of the fish.
- the same techniques may be used to automatically determine that cameras 106 should remain or linger in a current location and/or orientation because cameras 106 are currently in a good position to capture suitable images of fish 102 for biomass estimation or other purposes.
- Net pen 104 may be configured with wireless cage access point 108A for transmitting stereo images captured by cameras 106 and other information wirelessly to barge 110 or other water vessel that is also configured with wireless access point 108B.
- Barge 110 may be where on-site fish farming process control, production, and planning activities are conducted.
- Barge 110 may house computer image processing system 112 of biomass estimation system 100.
- computer image processing system 112 generates accurate biomass estimates of fish 102 in net pen 104 using computer vision and artificial intelligence techniques applied to the stereo images captured by cameras 106. Components and operation of the image processing system 112 are discussed in greater detail below with respect to FIG. 3.
- Cameras 106 may be communicatively coupled to image processing system 112 wirelessly via wireless access points 108. However, cameras 106 may be communicatively coupled to image processing system 112 by wire such as, for example, via a wired fiber connection between net pen 104 and barge 110.
- image processing system 112 may be located remotely from cameras 106 and connected by wire or coupled wirelessly to cameras 106. However, some or all of image processing system 112 may be a component of cameras 106. In this case, cameras 106 may be configured with an on-board graphics processing unit (GPU) or other on-board processor or processors capable of executing image processing system 112, or a portion thereof.
- the output of image processing system 112, including biomass estimates, may be uploaded to the cloud or otherwise over the Internet via a cellular data network, satellite data network, or other suitable data network to an online service configured to provide the biomass estimates, or information derived therefrom, in a web dashboard or the like (e.g., in a web browser, a mobile application, a client application, or other graphical user interface).
- cameras 106 may contain image processing system 112 or be coupled by wire to a computer system that contains image processing system 112.
- the computer system may be affixed above the water surface to net pen 104 and may include wireless data communications capabilities for transmitting and receiving information over a data network (e.g., the Internet).
- barge 110 may include a mechanical feed system that is connected by physical pipes to net pen 104.
- the feed system may deliver food pellets via the pipes in doses to fish 102 in net pen 104.
- the feed system may include other components such as a feed blower connected to an air cooler which is connected to an air controller and a feed doser which is connected to a feed selector that is connected to the pipes to net pen 104.
- the accurate biomass estimates generated by image processing system 112 may be used as input to the feed system for determining the correct amount of feed in terms of dosage amounts and dosage frequency, thereby improving the operation of the feed system.
- Feed formulation includes determining the ratio of fat, protein, and other nutrients in the food pellets fed to fish 102.
- precise feed formulations for fish 102 in net pen 104 may be determined. In this way, it is also possible to have different formulations for the fish in different net pens based on the different biomass estimates determined by image processing system 112 for those net pens.
- a biomass estimate of fish 102 in net pen 104 may be generated by image processing system 112 and input to an onsite (e.g., on barge 110) food pellet mixer that uses the input biomass estimate to automatically select the ratio of nutrients to mix together in the food pellets that are delivered to fish 102 in net pen 104.
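As an illustrative sketch of how a biomass estimate could drive dosing, consider the following; the specific feeding rate and dose count are assumed placeholder parameters, not values from this disclosure:

```python
def daily_feed_dose_kg(biomass_kg: float,
                       specific_feeding_rate: float = 0.01,
                       doses_per_day: int = 8) -> float:
    """Split a day's ration into equal doses.

    specific_feeding_rate is the assumed fraction of body weight fed
    per day (1% here is illustrative only); a real feed system would
    derive it from fish size, water temperature, and feed tables.
    """
    daily_ration_kg = biomass_kg * specific_feeding_rate
    return daily_ration_kg / doses_per_day

# e.g., a 900-tonne biomass estimate from image processing system 112:
print(daily_feed_dose_kg(900_000))  # 1,125 kg per dose
```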
- the accurate biomass estimates generated by image processing system 112 are also useful for determining optimal harvest times and maximizing sale profit for fish farmers.
- fish farmers may use the biomass estimates to determine how much of different fish sizes they can harvest and bring to market.
- the different fish sizes may be distinguished in the market by 1 kilogram increments.
- accurate biomass estimates are important to fish farmers to accurately determine which market bucket (e.g., the 4 kilogram to 5 kilogram bucket, the 5 kilogram to 6 kilogram bucket, etc.) fish 102 in net pen 104 fall into. Having accurate biomass estimates also improves fish farmers' relationships downstream in the market, such as with slaughterhouse operators and fish futures markets.
- an accurate fish biomass estimate is useful for compliance with governmental regulations. For example, in Norway, a salmon farming license may impose a metric ton limit. Biomass estimates generated according to techniques disclosed herein may be useful for ensuring compliance with such licenses.
- One challenge with a stereo-vision system is the accurate estimation of biomass from the two-dimensional stereo camera images.
- a single lateral dimension of a fish, such as fork length, may not be sufficient to accurately predict the weight of the fish because of variances in fish size and feeding regimes.
- system 100 automatically detects and captures a set of one or more morphological lateral body dimensions of a fish that are useful for accurately predicting the weight of the fish.
- from the captured dimensions, the weight may be calculated, for example using a regression equation.
- the regression equation may be developed using regression analysis on known ground truth weight and dimension relationships of fish.
- the regression equation may be fish specific, reflecting the particular morphological characteristics of the fish. For example, a different regression equation or set of equations may be used for Scottish salmon than is used for Norwegian salmon, which are typically heavier than Scottish salmon.
- the weight calculated can be a discrete value representing a predicted weight of the fish or multiple values representing a probability distribution of the predicted weight of the fish.
- Multiple regression equations may be used to calculate multiple weights for a fish, and the weights averaged (e.g., a weighted average) to calculate a final weight prediction for the fish.
- a regression equation used can be a single-factor regression equation or a multi-factor regression equation.
- a single-factor regression equation can predict the weight within a degree of accuracy using only one of the dimensions.
- a multi-factor regression equation can predict the weight within a degree of accuracy using multiple of the dimensions.
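A minimal sketch of both regression styles follows, using NumPy; the dimension values and fitted coefficients are illustrative stand-ins, and a production system would fit against ground-truth weight and dimension data as described above:

```python
import numpy as np

# Hypothetical training data: ground-truth weights (kg) and two
# morphological dimensions (cm) per fish: standard length (SL) and
# body depth at the dorsal-fin origin (B). Values are illustrative only.
SL = np.array([55.0, 60.0, 65.0, 70.0, 75.0])
B  = np.array([12.0, 13.5, 15.0, 16.0, 17.5])
W  = np.array([2.1, 2.9, 3.8, 4.6, 5.9])

# Single-factor regression on the classic allometric form W = a * SL^b,
# fit as a line in log space: log W = log a + b * log SL.
b, log_a = np.polyfit(np.log(SL), np.log(W), 1)
predict_single = lambda sl: np.exp(log_a) * sl ** b

# Multi-factor linear regression on both dimensions via least squares.
X = np.column_stack([SL, B, np.ones_like(SL)])
coef, *_ = np.linalg.lstsq(X, W, rcond=None)
predict_multi = lambda sl, depth: coef @ np.array([sl, depth, 1.0])

print(predict_single(68.0))       # weight from length alone
print(predict_multi(68.0, 15.5))  # weight from length and body depth
```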
- Different regression equations or sets of regression equations may be used in different situations depending on the particular morphological lateral body dimensions image processing system 112 is able to capture from different sets of stereo images.
- Various different morphological lateral body measurements and regression equations can be used, such as those described in Beddow, T.A. and Ross, L.G., "Predicting biomass of Atlantic salmon from morphometric lateral measurements," Journal of Fish Biology 49(3): 469-482.
- System 112 is not limited to any particular set of morphological lateral body dimensions or any particular set of regression equations.
- the morphological lateral body dimensions depicted in FIG. 2 may be especially suitable for predicting the weight of an individual fish, in some cases to within plus or minus two percent (2%) of the actual weight.
- FIG. 2 depicts truss dimensions 210 and conventional dimensions 230.
- Truss dimensions 210 are established by corresponding landmark points on the fish and lines between the landmark points.
- the various landmark points include (1) the posterior-most part of the eye, (2) the posterior point of the neurocranium (where scales begin), (3) the origin of the pectoral fin, (4) the origin of the dorsal fin, (5) the origin of the pelvic fin, (6) the posterior end of the dorsal fin, (7) the origin of the anal fin, (8) the origin of the adipose fin, (9) the anterior attachment of the caudal fin to the tail, (10) the posterior attachment of the caudal fin to the tail, and (11) the base of the middle caudal rays.
- Conventional dimensions 230 include (A) the body depth at origin of the pectoral fin, (B) the body depth at origin of the dorsal fin, (C) the body depth at end of the dorsal fin, (D) the body depth at origin of the anal fin, (E) the least depth of the caudal peduncle, (POL) the post-orbital body length, and (SL) the standard body length.
- Conventional dimensions 230 correspond to various landmark areas of the fish.
- the head area (SL)-(A) is between the start of (SL) the standard body length at the anterior end of the fish and (A) the body depth at the origin of the pectoral fin.
- the pectoral area (A)-(B) is between (A) the body depth at the origin of the pectoral fin and (B) the body depth at the origin of the dorsal fin.
- the anterior dorsal area (B)-(C) is between (B) the body depth at the origin of the dorsal fin and (C) the body depth at the end of the dorsal fin.
- the posterior dorsal area (C)-(D) is between (C) the body depth at the end of the dorsal fin and (D) the body depth at the origin of the anal fin.
- the anal area (D)-(E) is between (D) the body depth at the origin of the anal fin and (E) the least depth of the caudal peduncle.
- the tail area (E)-(SL) is between (E) the least depth of the caudal peduncle and the end of (SL) the standard body length at the posterior end of the fish.
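The landmark points and areas above can be captured in simple lookup tables for downstream dimension calculations; a sketch:

```python
# Landmark points (1)-(11) and landmark areas as described above,
# encoded as lookup tables for downstream dimension calculations.
LANDMARK_POINTS = {
    1: "posterior-most part of the eye",
    2: "posterior point of the neurocranium",
    3: "origin of the pectoral fin",
    4: "origin of the dorsal fin",
    5: "origin of the pelvic fin",
    6: "posterior end of the dorsal fin",
    7: "origin of the anal fin",
    8: "origin of the adipose fin",
    9: "anterior attachment of the caudal fin",
    10: "posterior attachment of the caudal fin",
    11: "base of the middle caudal rays",
}

# Landmark areas, each bounded by two conventional dimensions.
LANDMARK_AREAS = {
    "head":             ("SL-start", "A"),
    "pectoral":         ("A", "B"),
    "anterior dorsal":  ("B", "C"),
    "posterior dorsal": ("C", "D"),
    "anal":             ("D", "E"),
    "tail":             ("E", "SL-end"),
}
```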
- Image processing system 112 may automatically detect and identify one or more or all of the landmark points and the landmark areas discussed above for purposes of predicting the weight of the fish. Image processing system 112 may do this even though the yaw, roll, and pitch angles of fish 102 captured in the stereo images may be greater than zero degrees with respect to a fish that is perfectly lateral to cameras 106. By doing so, image processing system 112 can estimate biomass from stereo images of freely swimming fish 102 in net pen 104. System 100 does not require a tube or channel in net pen 104 through which fish 102 must swim in order to accurately estimate biomass.
- FIG. 3 is a schematic diagram showing high-level components of the image processing system 112.
- System 112 includes image storage 310, image filtration system 312, object detection and image segmentation system 314, stereo matching and occlusion handling system 316, and weight predictor system 318.
- operation of the system 112 may be as follows.
- High-resolution monochrome or color rectified stereo images captured by cameras 106 in net pen 104 are transmitted to image processing system 112 via a data network (e.g., wirelessly via access points 108).
- Image filtration system 312 analyzes the input image pair, or alternatively one of the images in the pair, to make a preliminary, relatively low-cost determination of whether the image pair contains suitable images for further processing by system 112. If so, then the pair of images is input to stereo matching and occlusion handling system 316.
- Stereo matching and occlusion handling system 316 determines corresponding pairs of pixels in the stereo images and outputs a disparity map for the base image of the stereo pair.
- Object detection and image segmentation system 314 detects fish in the input base image and produces one or more image segmentation masks corresponding to one or more landmark points and/or landmark areas of the fish detected in the base image.
- Weight predictor system 318 obtains three-dimensional (3-D) world coordinates of points corresponding to pixels from the input disparity map output by the stereo matching and occlusion handling system 316.
- the 3-D world coordinates of the pixels corresponding to the landmark point(s) and/or landmark area(s) of the fish are used to calculate one or more truss dimensions and/or one or more conventional dimensions of the fish. The calculated dimensions are then used in a calculation to predict the weight of the fish.
- weight predictor 318 may generate a 3-D point cloud object from the 3-D world coordinates.
- the volume of the fish may be estimated from the 3-D point cloud, and the weight then predicted based on the estimated volume and a predetermined density or density distribution of the fish.
- the predetermined density may be a known average density or density distribution of Atlantic Salmon.
- a threshold number of individual weight predictions (e.g., 1,000) determined over a period of time (e.g., a day) may be averaged. From this, a distribution of the average daily (or other time period) fish biomass in net pen 104 over an extended period of time (e.g., the past month) may be charted (e.g., as a histogram or other visual distribution presented in a web browser) to provide a visualization of whether the total fish biomass in net pen 104 is increasing, decreasing, or staying relatively constant over that extended period according to the aggregated individual estimates. A decreasing biomass may be indicative, for example, of fish escaping net pen 104, a predator gaining access to net pen 104, fish mortality, etc.
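A minimal aggregation sketch, assuming per-fish predictions are tagged with the date they were made:

```python
from collections import defaultdict
from statistics import mean

def daily_average_weights(predictions, min_samples=1000):
    """predictions: iterable of (date, predicted_weight_kg) tuples.

    Returns {date: mean predicted weight} for days with at least
    min_samples predictions (the 1,000-per-day threshold mentioned
    above), suitable for charting as a trend line or histogram.
    """
    by_day = defaultdict(list)
    for day, weight_kg in predictions:
        by_day[day].append(weight_kg)
    return {day: mean(weights) for day, weights in by_day.items()
            if len(weights) >= min_samples}
```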
- the sampling strategy may vary depending on whether the object detection and segmentation system 314 has the capability to uniquely identify fish 102 in net pen 104. For example, one or more features of a fish detected in an image may be used to uniquely identify the fish in the image.
- a weight prediction may not be determined for the fish if a weight prediction was recently obtained for the fish within a sampling window (e.g., the same day or the same week).
- the sampling strategy in this case, may be to predict the weight of each unique fish identified during the sampling window and avoid re-predicting the weight of a fish for which a weight prediction has already been made during the sampling window. By doing so, a more accurate average biomass may be calculated because double counting is avoided.
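A sketch of such a deduplicating sampling window, assuming some upstream component assigns stable per-fish identifiers:

```python
class SamplingWindow:
    """Avoid double-counting: predict each uniquely identified fish's
    weight at most once per sampling window (e.g., per day or week)."""

    def __init__(self):
        self.seen_fish_ids = set()

    def should_predict(self, fish_id: str) -> bool:
        if fish_id in self.seen_fish_ids:
            return False          # already weighed in this window
        self.seen_fish_ids.add(fish_id)
        return True

    def reset(self):
        """Call at the start of each new sampling window."""
        self.seen_fish_ids.clear()
```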
- cameras 106 may be calibrated underwater against a target of known geometry, such as a black and white checkerboard of alternating squares that provides good contrast in underwater conditions.
- Other calibration techniques are possible and the present invention is not limited to any particular calibration techniques.
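For illustration, a typical OpenCV-based stereo calibration against such a checkerboard might look as follows; the board geometry (9x6 inner corners, 30 mm squares) is an assumption, not a value from this disclosure:

```python
import cv2
import numpy as np

PATTERN = (9, 6)      # assumed inner-corner grid of the checkerboard
SQUARE_MM = 30.0      # assumed square size

def calibrate_stereo(left_images, right_images, image_size):
    # 3-D corner positions of the board in its own coordinate frame.
    obj = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)
    obj *= SQUARE_MM

    objpoints, left_pts, right_pts = [], [], []
    for left, right in zip(left_images, right_images):
        ok_l, corners_l = cv2.findChessboardCorners(left, PATTERN)
        ok_r, corners_r = cv2.findChessboardCorners(right, PATTERN)
        if ok_l and ok_r:
            objpoints.append(obj)
            left_pts.append(corners_l)
            right_pts.append(corners_r)

    # Per-camera intrinsics first, then the stereo extrinsics R, T
    # (the magnitude of T is the baseline between the two cameras).
    _, K1, D1, _, _ = cv2.calibrateCamera(objpoints, left_pts, image_size, None, None)
    _, K2, D2, _, _ = cv2.calibrateCamera(objpoints, right_pts, image_size, None, None)
    _, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, left_pts, right_pts, K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K1, D1, K2, D2, R, T
```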
- Stereo matching and occlusion handling system 316 detects pairs of corresponding pixels in the rectified stereo pair. From this, stereo matching and occlusion handling system 316 outputs a disparity map for the base image of the stereo pair.
- the disparity map may be an array of values, one value for each pixel or each selected pixel in the base image, where each value numerically represents the disparity with respect to a corresponding pixel in the other (match) image of the stereo pair.
- the selected pixels may correspond to landmark points on fish detected in the base image.
- a depth map for the base image may be derived from the disparity map using known techniques.
- the depth of a pixel in the base image may be calculated from a known focal length of cameras 106, a known baseline distance between cameras 106, and the disparity of the pixel in the base image and its corresponding pixel in the match image.
- the depth map may also be an array of values, one value for each pixel in the image, but where each value numerically represents the distance from cameras 106 of the object in the image scene corresponding to the pixel (e.g., from the center of the lens of the camera that captured the base image).
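The standard disparity-to-depth relation for a rectified, horizontally aligned stereo pair is Z = f * B / d; a sketch with illustrative numbers:

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Depth Z of a scene point from its stereo disparity: Z = f * B / d.

    focal_length_px is the focal length expressed in pixels, and
    baseline_m is the distance between the two camera centers (e.g.,
    within the 6-12 cm range mentioned above).
    """
    return focal_length_px * baseline_m / disparity_px

# e.g., assuming f = 2400 px and an 8 cm baseline (illustrative numbers),
# a 96 px disparity puts the point 2400 * 0.08 / 96 = 2.0 m from the rig.
print(depth_from_disparity(96.0, 2400.0, 0.08))
```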
- the pair of images may be rectified, either vertically or horizontally.
- the matching task performed by the stereo matching and occlusion handling system 316 is to match pixels in one of the stereo images (the base image) to corresponding pixels in the other stereo image (the match image) according to a disparity matching algorithm (e.g., basic block matching, semi-global block matching, etc.)
- the output of the matching task is a disparity map for the stereo image pair.
- a depth map may be calculated from the disparity map using basic geometrical calculations.
- the depth map may contain per-pixel information relating to the distance of the surfaces of objects in the base image from a viewpoint (e.g., the center of the lens of the camera that captured the base image).
- a convolutional neural network may be trained to aid the stereo matching task.
- the convolutional neural network may be used to learn a similarity measure on small image patches of the base and match images.
- the network may be trained with a binary classification set having examples of similar and dissimilar pairs of patches.
- the output of the convolutional neural network may be used to initialize the stereo matching cost.
- Post-processing steps such as, for example, semi-global block matching, may follow to determine pixel or pixel region correspondence between the base and match images.
- a depth map is extracted from the input stereo image pair based on the stereo matching method described in J. Zbontar and Y. LeCun, "Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches," JMLR 17(65): 1-32, 2016.
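A minimal sketch in the spirit of that approach: a small siamese network scoring the similarity of grayscale patch pairs, trained as a binary classifier on similar and dissimilar pairs. The layer sizes here are illustrative, not the exact configuration from the cited paper:

```python
import torch
import torch.nn as nn

class PatchBranch(nn.Module):
    """Shared convolutional tower applied to each 9x9 grayscale patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )

    def forward(self, x):                      # (N, 1, 9, 9) -> (N, 64)
        return self.net(x).flatten(1)

class PatchSimilarity(nn.Module):
    """Scores whether a base-image patch and a match-image patch depict
    the same scene point; the score initializes the stereo matching cost."""
    def __init__(self):
        super().__init__()
        self.branch = PatchBranch()
        self.head = nn.Sequential(
            nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, base_patch, match_patch):
        feats = torch.cat([self.branch(base_patch),
                           self.branch(match_patch)], dim=1)
        return self.head(feats).squeeze(1)     # logit: similar vs. not

model = PatchSimilarity()
loss_fn = nn.BCEWithLogitsLoss()  # trained on similar/dissimilar pairs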
- Occlusion detection may be performed using the area of the bounding boxes drawn by a model, the shape of the segmentation masks, or tracking of the bounding boxes over time.
- the challenge in detecting occlusion lies in the absence of depth information. At least two approaches are possible: using the output of the depth mapping algorithm to determine which fish is occluding the other, or using the outputs of the detection/segmentation algorithm.
- Object detection and image segmentation system 314 identifies the pixels in an image that correspond to a landmark point or a landmark area of the fish. The output of the object detection and image segmentation system 314 may be one or more image segmentation masks.
- an image segmentation mask may be a binary mask: an array of values, one value for each pixel in the image, where a value of 1 (or 0) indicates that the corresponding pixel corresponds to a landmark point or a landmark area of the fish, and the opposite binary value indicates that it does not.
- confidence values may be used in the image segmentation mask.
- an image segmentation mask may alternatively be represented by vector coordinates that outline a landmark point or a landmark area of the fish.
- the object detection and image segmentation system 314 may output multiple image segmentation masks for the same input image, corresponding to different landmark points and different landmark areas detected in the image.
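A small sketch of the two array forms, using a confidence mask thresholded into the binary representation described above (the region coordinates are hypothetical):

```python
import numpy as np

# A stand-in segmentation mask for a 4096x3000 base image: one value per
# pixel. Confidence values in [0, 1] can be thresholded into the binary
# form described above (1 = landmark pixel, 0 = background).
confidence_mask = np.zeros((3000, 4096), dtype=np.float32)
confidence_mask[1200:1400, 1800:2200] = 0.9   # hypothetical dorsal-fin region

binary_mask = (confidence_mask >= 0.5).astype(np.uint8)
landmark_pixel_coords = np.argwhere(binary_mask == 1)  # (row, col) pairs
```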
- the object detection and image segmentation system 314 may include a convolutional neural network for analyzing an input image to perform classification on objects (fish) detected in the image.
- the convolutional neural network may be composed of an input layer, an output layer, and multiple hidden layers including convolutional layers that emulate in a computer the response of neurons in animals to visual stimuli.
- a convolutional neural network of the object detection and image segmentation system 314 may convolve the image with a number of convolution filters and apply non-linear transformations (e.g., MaxPool and ReLU).
- the convolutional neural network may perform the convolutions and the non-linear transformations multiple times.
- the output of the final layer may then be sent to a softmax layer that gives the probability of the image being of a particular class (e.g., an image of one or more fish).
- Object detection and image segmentation system 314 may discard the input image from further processing by system 112 if the probability that the image is not an image of one or more fish is above a threshold (e.g., 80%).
- the object detection and image segmentation system 314 may use a convolutional neural network to identify fish in the image via bounding boxes.
- a bounding box may also be associated with a label identifying the fish within the bounding box.
- the convolutional neural network may perform a selective search on the image through pixel windows of different sizes. For each size, the convolutional neural network attempts to group together adjacent pixels by texture, color, or intensity to identify fish in the image. This may result in a set of region proposals which may then be input to a convolutional neural network trained on images of fish with the locations of the fish within the image labeled to determine which regions contain fish and which do not.
- alternatively, a single-stage detector such as YOLO may be used. YOLO is described in J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," CVPR 2016.
- the images of fish on which a convolutional neural network is trained may include images that are representative of images containing fish from which landmark point and landmark areas can be identified.
- a convolutional neural network may be trained on images of fish that provide a full lateral view of the fish including the head and tail at various different yaw, roll, and pitch angles and at different sizes in the image representing different distances from cameras 106.
- Such training data may also be generated synthetically using a computer graphics application (e.g., a video game engine or a computer graphics animation application such as Blender) in order to generate sufficient training data.
- a final layer of a convolutional neural network may include a support vector machine (SVM) that classifies the fish in each valid region. Such classification may include whether a full lateral view of a fish including both the head and the tail of the fish is detected in the region.
- Object detection and image segmentation system 314 may tighten the bounding box around a region of a fish by running a linear regression on the region. This produces new bounding box coordinates for the fish in the region.
- a convolutional neural network, a classifier, and a bounding box linear regressor may be jointly trained for greater accuracy. This may be accomplished by replacing the SVM classifier with a softmax layer on top of a convolutional neural network to output a classification for a valid region.
- the linear regressor layer may be added in parallel to the softmax layer to output bounding box coordinates for the valid region.
- the bounding box of a full lateral view of a fish may be cropped from the image in which it is detected.
- a convolutional neural network-based object detection may then be performed again on the cropped image to detect and obtain bounding boxes or image segmentation masks corresponding to the landmark points and the landmark areas of the fish in the cropped image.
- a trained convolutional neural network may be used.
- the convolutional neural network may be trained on tight images of fish (synthetically generated or images captured in situ from a camera) with the locations of the various landmark points and the landmark areas in the images labeled in the tight images.
- the object detection and image segmentation system 314 performs pixel-level segmentation on the cropped image. This may be accomplished using a convolutional neural network that runs in parallel with a convolutional neural network for object detection, such as Mask R-CNN.
- the output of this may be image segmentation masks for the locations of the detected landmark points and landmark areas in the image.
- Mask R-CNN is described in K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," ICCV 2017.
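For illustration, an off-the-shelf Mask R-CNN from torchvision can produce boxes, labels, scores, and soft masks in one pass; a production model would instead be trained on labeled fish landmark points and areas rather than generic COCO classes:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# COCO-pretrained Mask R-CNN as a stand-in for a fish-trained model.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 800, 1067)     # stand-in for a cropped fish image
with torch.no_grad():
    output = model([image])[0]

boxes = output["boxes"]              # (N, 4) bounding boxes
labels = output["labels"]            # class label per detection
scores = output["scores"]            # confidence per detection
masks = output["masks"]              # (N, 1, H, W) soft segmentation masks
binary_masks = masks.squeeze(1) > 0.5
```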
- Weight predictor 318 combines the depth map output by stereo matching and occlusion handling system 316 with the image segmentation mask(s) generated by the object detection and image segmentation system 314. The combining is done to determine the location of the landmark point(s) and/or landmark area(s) in 3-D Cartesian space. This combining may be accomplished by superimposing the depth map on an image segmentation mask, or vice versa, on a pixel-by-pixel basis (pixelwise).
- one or more truss dimensions 210 and/or one or more conventional dimensions 230 of the fish may then be calculated using planar geometry.
- for example, the length of the truss dimension between landmark point (1), the posterior-most part of the eye, and landmark point (4), the origin of the dorsal fin, may be calculated using the Pythagorean theorem given the x, y, and z coordinates in 3-D space for each landmark point.
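A sketch of that calculation, with illustrative coordinates in meters:

```python
import numpy as np

def truss_dimension_m(p1_xyz: np.ndarray, p2_xyz: np.ndarray) -> float:
    """Length of a truss dimension between two landmark points given
    their 3-D world coordinates (from the depth map combined with the
    segmentation masks), i.e., the Pythagorean theorem in three
    dimensions."""
    return float(np.linalg.norm(p1_xyz - p2_xyz))

# e.g., landmark (1) posterior-most part of the eye and landmark (4)
# origin of the dorsal fin (coordinates are illustrative):
eye = np.array([0.12, 0.05, 2.01])
dorsal_origin = np.array([0.31, 0.11, 2.05])
print(truss_dimension_m(eye, dorsal_origin))  # ~0.20 m
```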
- Weight predictor 318 may then predict the weight of the fish according to a regression equation using the one or more truss and/or one or more conventional dimensions. In this way, weight predictions for fish 102 in net pen 104 may be made on an individual basis.
- a predicted weight for a fish detected in an image produced according to the foregoing techniques may be used as an input to a condition factor (i.e., k-factor) calculation.
- the techniques may also be applied to determine the length of the fish from the tip of the snout to the rear edge of the fork at the center of the tail fin (e.g., the length to caudal fork) for input to the condition factor calculation, regardless of whether that length determination is used to estimate the weight of the fish.
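The disclosure does not fix a particular condition-factor formulation; Fulton's condition factor is a common choice and is sketched here:

```python
def fulton_condition_factor(weight_g: float, fork_length_cm: float) -> float:
    """Fulton's condition factor K = 100 * W / L^3, a common k-factor
    formulation (the text does not specify one), with weight in grams
    and length to caudal fork in centimeters."""
    return 100.0 * weight_g / fork_length_cm ** 3

# A 4,500 g salmon with a 70 cm fork length: K ~ 1.31
print(fulton_condition_factor(4500, 70))
```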
- while the weight of a fish may be estimated based on distances between landmark points and landmark areas segmented from an image of the fish combined with a depth map, the weight of a fish may alternatively be predicted by combining an image segmentation mask of the entire fish with a depth map for the image to obtain a three-dimensional point cloud for the pixels in the image representing the fish according to the image segmentation mask.
- the volume of the fish may then be estimated from the point cloud. For example, the volume may be estimated using the technique described in the paper by W. C. Chang, C. H. Wu, Y. H.
- the weight of the fish may be estimated based on the estimated volume and a known density for the type of fish.
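A sketch of the volume-to-weight step; the convex hull is used purely for illustration (it overstates a fish's true volume and is not necessarily the cited technique), and the density value is an assumed placeholder:

```python
import numpy as np
from scipy.spatial import ConvexHull

def estimate_weight_kg(point_cloud_m: np.ndarray,
                       density_kg_m3: float = 1050.0) -> float:
    """Weight from an (N, 3) fish point cloud: estimated volume times an
    assumed density. The convex hull overestimates the true volume and
    the density figure is a placeholder, both for illustration only.
    """
    volume_m3 = ConvexHull(point_cloud_m).volume
    return volume_m3 * density_kg_m3
```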
- the techniques for biomass estimation described herein are implemented by one or more special-purpose computing devices.
- the special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
- Such special-purpose computing devices may also combine custom hard wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
- the special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard wired and/or program logic to implement the techniques.
- FIG. 4 is a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented.
- Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a hardware processor 404 coupled with bus 402 for processing information.
- Hardware processor 404 may be, for example, a general-purpose microprocessor.
- Computer system 400 also includes a main memory 406, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404.
- Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.
- Such instructions when stored in non-transitory storage media accessible to processor 404, render computer system 400 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404.
- a storage device 410 such as a magnetic disk, optical disk, or solid-state drive is provided and coupled to bus 402 for storing information and instructions.
- Computer system 400 may be coupled via bus 402 to a display 412, such as an OLED, LED or cathode ray tube (CRT), for displaying information to a computer user.
- An input device 414 is coupled to bus 402 for communicating information and command selections to processor 404.
- another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412.
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- the input device 414 may also have multiple input modalities, such as multiple 2-axis controllers and/or input buttons or a keyboard. This allows a user to provide several kinds of input to the system.
- Computer system 400 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 400 to be a special-purpose machine. According to some embodiments, the techniques herein are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another storage medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
- Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 410.
- Volatile media includes dynamic memory, such as main memory 406.
- Common forms of storage media include, for example, a hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
- Storage media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between storage media.
- transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402.
- transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution.
- the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer.
- the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- a modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402.
- Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions.
- the instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.
- Computer system 400 also includes a communication interface 418 coupled to bus 402.
- Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422.
- communication interface 418 may be a modem to provide a data communication connection to a corresponding type of telephone line.
- communication interface 418 may be a network card (e.g., an Ethernet card) to provide a data communication connection to a compatible Local Area Network (LAN).
- Wireless links may also be implemented.
- communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- a wireless link could be a Bluetooth, Bluetooth Low Energy (BLE), 802.11 WiFi connection, or the like.
- Network link 420 typically provides data communication through one or more networks to other data devices.
- network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426.
- ISP 426 in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the "Internet" 428.
- Internet 428 uses electrical, electromagnetic or optical signals that carry digital data streams.
- the signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are example forms of transmission media.
- Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418.
- a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418.
- the received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution.
Abstract
A method and system to predict fish weight and biomass using a non-invasive digital stereo camera and computer vision system. The stereo camera system is immersed in a net fish pen and captures stereo images of freely moving fish from a substantially lateral perspective. The computer vision system automatically identifies and estimates specific combinations of fin-to-fin, body depth, and length dimensions that are learned from the stereo images. The estimates are then used to predict weight with a high degree of accuracy. The system and method have the advantage of being more automated, highly accurate, and far less stressful to the fish than current biomass estimation techniques.
Description
BIOMASS ESTIMATION IN AN AQUACULTURE ENVIRONMENT
TECHNICAL FIELD
[0001] The present disclosure relates to computer vision techniques for estimating fish biomass in an aquaculture environment.
BACKGROUND
[0002] The growth rate of world human population is applying substantial pressure on the planet’s natural food resources. Aquaculture will play a significant part in feeding this growing human population.
[0003] Aquaculture is the farming of aquatic organisms, including fish, in both coastal and inland areas involving interventions in the rearing process to enhance production.
Aquaculture has experienced dramatic growth in recent years. The United Nations Food and Agriculture Organization estimates that aquaculture now accounts for half of the world’s fish that is used for food.
[0004] Fish farm production technology is underdeveloped, when compared to the state of the art in other food production processes. Techniques that improve the production processes in fish farms using new perception and prediction techniques would be appreciated by fish farmers.
[0005] Biomass estimates are important for evaluating growth of fish during the growing cycle. The estimates help farmers determine the best time to harvest, adjust feed and medicine amounts, detect fish loss, among other fish farming activities.
[0006] Traditional techniques to estimate fish biomass involve manual sampling and weighting. However, minimizing the handling of fish is highly desirable not just because it is human-labor intensive but also because it impacts the health of the fish. As such, non-manual techniques are preferred.
[0007] In brief, WO 2014098614 uses special lighting and a camera with depth perception filters to triangulate fish in water. An estimate of the fish dimensions is generated using the received data.
[0008] In brief, US 200400008259 utilizes shadows generated by illuminating a fish from multiple angles using multiple light sources to estimate the depth and size of the fish.
[0009] In brief, WO 2005025309 uses multiple cameras to create a 3D image of a fish that swims through a chute that is equipped with a wreath of cameras that capture footage of the fish from multiple angles to generate the 3D image.
[0010] These solutions have special lighting requirements or involve multiple irradiation devices or cameras arranged opposite from each other in a transfer channel or chute. A smaller, less-expensive, and easier to handle solution that reduces the amount of detection equipment in the net pen is preferable. Further, a solution that does not require forcing fish to individually swim through a chute or channel is also preferred.
[0011] The present invention disclosed herein addresses these and other issues.
[0012] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a schematic diagram of a biomass estimation system.
[0014] FIG. 2 depicts the position of truss and conventional dimensions of a fish.
[0015] FIG. 3 is a schematic diagram of an image processing sub-system of the biomass estimation system of FIG. 1.
[0016] FIG. 4 depicts basic computer hardware that may be used in an implementation.
DETAILED DESCRIPTION
[0017] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
OVERVIEW
[0018] Fish farms rely on accurate information on fish biomass. Such information is used by fish farmers to adjust feeding, control pen stocking densities, and select optimal stock harvest times. As mentioned, current methods for biomass estimation may be stressful to the fish. In addition, current methods may be inaccurate.
[0019] Having accurate information of fish size is important in aquaculture. In particular, accurate information is needed for effective management of feeding regimes, grading times, and optimal harvest times. Today, many net pens are located in areas along the coast where they can take advantage of ocean currents to deliver oxygen to the fish and disperse waste. This presents challenges to effectively monitoring the fish stock, especially in bad weather.
[0020] As mentioned, the most common way of estimating fish biomass is by physically netting a sub-sample of fish and weighing them manually. This is labor intensive, may cause stress to the fish and may cause scale damage.
[0021] The present invention disclosed herein addresses these and other issues.
[0022] The above deficiencies and other problems associated with current methods of fish biomass estimation are reduced or eliminated with a method and system to predict fish weight and biomass using a non-invasive, digital stereo-camera and computer vision system and method. The stereo camera system is immersed in a net fish pen and captures stereo images of freely moving fish from a substantially lateral perspective. The computer vision system automatically identifies and estimates specific combinations of fin-to-fin, body depth, and length dimensions that are detected in the stereo images. The estimates are then used to predict weight with a high degree of accuracy. Compared with current biomass estimation techniques, the system and method have the advantage of being more automated, highly accurate, and far less stressful to the fish.
BIOMASS ESTIMATION SYSTEM
[0023] FIG. 1 is a schematic diagram of fish biomass estimation system 100 for estimating biomass of fish 102 in net pen 104. System 100 includes pair of high-resolution, light sensitive, digital cameras 106 within a waterproof housing immersed underwater in net pen 104.
[0024] Cameras 106 may be arranged substantially horizontally with substantially no vertical disparity. However, it is also possible for cameras 106 to be arranged substantially vertically with substantially no horizontal disparity. In that case, references herein to vertical may be substituted with horizontal and references herein to left and right may be substituted with top and bottom or vice versa, without loss of generality.
[0025] Each of cameras 106 may be an approximately 12-megapixel monochrome camera with a resolution of approximately 4096 pixels by 3000 pixels and a frame rate of approximately 1 to 8 frames per second. However, different cameras with different capabilities, including color cameras, may be used according to the requirements of the particular implementation at hand. For example, a color camera may be used to capture images that may be processed for sea lice detection and classification as well as for biomass estimation.
[0026] Selection of the camera lens for cameras 106 may be based on an appropriate baseline and focal length to capture images of a fish swimming in front of a camera where the fish is close enough to the lenses for proper pixel resolution and feature detection in the captured image, but far enough away from the lenses that the fish can fit in both the left and right frames. For example, 8-millimeter focal length lenses with a high line pair count (lp/mm) can be used such that each of the pixels in the left and right images can be sufficiently resolved. The baseline of cameras 106 may vary such as, for example, within a range of 6 to 12 centimeters.
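As a rough, hypothetical illustration of these geometric constraints, the following sketch relates baseline, focal length, and pixel disparity through the standard pinhole stereo relationship Z = f·B/d; the pixel pitch and the specific numeric values are assumptions, not parameters of cameras 106.

```python
# Pinhole stereo relationship Z = f * B / d, used only to illustrate how
# baseline and focal length constrain the usable working distance.
# All numeric values are hypothetical assumptions.

PIXEL_PITCH_MM = 0.00345   # assumed sensor pixel size (3.45 micrometers)
FOCAL_LENGTH_MM = 8.0      # 8 mm lens, per the example above
BASELINE_MM = 100.0        # baseline within the 6 to 12 cm range

def depth_from_disparity_px(disparity_px: float) -> float:
    """Distance (mm) to a point observed with the given pixel disparity."""
    return FOCAL_LENGTH_MM * BASELINE_MM / (disparity_px * PIXEL_PITCH_MM)

# Disparity shrinks as the fish moves away from the cameras:
for z_mm in (500.0, 1000.0, 2000.0):
    d_px = FOCAL_LENGTH_MM * BASELINE_MM / (z_mm * PIXEL_PITCH_MM)
    print(f"Z = {z_mm / 1000:.1f} m -> disparity ~ {d_px:.0f} px")
```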
[0027] Net pen 104 may be framed by a plastic or steel cage that provides a substantially inverted conical, circular, or rectangular cage, or cage of other desired dimensions. Net pen 104 may hold a number of fish 102 of a particular type (e.g., salmon) depending on various factors such as the size of the net pen 104 and the maximum stocking density of the particular fish caged. For example, a net pen for salmon may be 50 meters in diameter, 20-50 meters deep, and hold up to approximately 200,000 salmon assuming a maximum stocking density of 10 to 25 kg/m3.
[0028] While the techniques for biomass estimation disclosed herein may be applied to a sea-pen environment such as net pen 104, the techniques may be applied to other fish farming enclosures. For example, the techniques may be applied to fish farm ponds, tanks, or other like fish farm enclosures.
[0029] Cameras 106 may be attached to a winch system that allows the cameras 106 to be relocated underwater in net pen 104 so as to capture stereo images of fish from different locations within the net pen 104. For example, the winch system may allow cameras 106 to move around the perimeter of net pen 104 and at various depths within net pen 104. The winch system may also allow control of pan and tilt of cameras 106. The winch system may be operated manually by a human controller such as, for example, by directing user input to an above-water surface winch control system. However, the winch system may operate autonomously according to a winch control program configured to adjust the location of cameras 106 within net pen 104, for example, in terms of location on the perimeter of the cage and depth within net pen 104.
[0030] The autonomous winch control system may adjust the location of cameras 106 according to a series of predefined or pre-programmed adjustments and /or according to detected signals in net pen 104 that indicate better or more optimal locations for capturing images of fish 102 relative to a current position and/or orientation of cameras 106. A variety of signals may be used such as, for example, machine learning and computer vision
techniques applied to images captured by cameras 106 to detect schools or clusters of fish currently distant from cameras 106 such that a location that is closer to the school or cluster can be determined and the location, tilt, and / or pan of cameras 106 adjusted to capture more suitable images of the fish. The same techniques may be used to automatically determine that cameras 106 should remain or linger in a current location and /or orientation because cameras 106 are currently in a good position to capture suitable images of fish 102 for biomass estimation or other purposes.
[0031] It is also possible to illuminate fish 102 in net pen 104 with ambient lighting in the blue-green spectrum (450 nanometers to 570 nanometers). This may be useful to increase the length of the daily sample period during which useful images of fish 102 in net pen 104 may be captured. For example, depending on the current season (e.g., winter), time of day (e.g., sunrise or sunset), and latitude of net pen 104, only a few hours during the middle of the day may be suitable for capturing useful images without using ambient lighting. This daily period may be extended with ambient lighting.
[0032] Net pen 104 may be configured with wireless cage access point 108A for transmitting stereo images captured by cameras 106 and other information wirelessly to barge 110 or other water vessel that is also configured with wireless access point 108B. Barge 110 may be where on-site fish farming process control, production, and planning activities are conducted. Barge 110 may house computer image processing system 112 of biomass estimation system 100. In general, computer image processing system 112 generates accurate biomass estimates of fish 102 in net pen 104 using computer vision and artificial intelligence techniques applied to the stereo images captured by cameras 106. Components and operation of the image processing system 112 are discussed in greater detail below with respect to FIG. 3.
[0033] Cameras 106 may be communicatively coupled to image processing system 112 wirelessly via wireless access points 108. However, cameras 106 may be communicatively coupled to image processing system 112 by wire such as, for example, via a wired fiber connection between net pen 104 and barge 110.
[0034] Some or all of image processing system 112 may be located remotely from cameras 106 and connected by wire or coupled wirelessly to cameras 106. However, some or all of image processing system 112 may be a component of cameras 106. In this case, cameras 106 may be configured with an on-board graphics processing unit (GPU) or other on-board processor or processors capable of executing image processing system 112, or a portion thereof.
[0035] In either case, the output of image processing system 112, including biomass estimates, may be uploaded to the cloud or otherwise over the Internet via a cellular data network, satellite data network, or other suitable data network to an online service configured to provide the biomass estimates or information derived by the online service therefrom in a web dashboard or the like (e.g., in a web browser, a mobile application, a client application, or other graphical user interface.)
[0036] One skilled in the art will recognize from the foregoing description that there is no requirement that image processing system 112 be contained on barge 110 or that barge 110 be present in the aquaculture environment. Instead, cameras 106 may contain image processing system 112 or be coupled by wire to a computer system that contains image processing system 112. The computer system may be affixed above the water surface to net pen 104 and may include wireless data communications capabilities for transmitting and receiving information over a data network (e.g., the Internet).
[0037] Although not shown in FIG. 1, barge 110 may include a mechanical feed system that is connected by physical pipes to net pen 104. The feed system may deliver food pellets via the pipes in doses to fish 102 in net pen 104. The feed system may include other components such as a feed blower connected to an air cooler which is connected to an air controller and a feed doser which is connected to a feed selector that is connected to the pipes to net pen 104. The accurate biomass estimates generated by image processing system 112 may be used as input to the feed system for determining the correct amount of feed in terms of dosage amounts and dosage frequency, thereby improving the operation of the feed system.
[0038] As well as being useful for determining the correct amount of feed, the accurate biomass estimates generated by image processing system 112 are also useful for determining a more optimal feed formulation. Feed formulation includes determining the ratio of fat, protein, and other nutrients in the food pellets fed to fish 102. Using accurate biomass estimates generated by image processing system 112 for fish 102 in net pen 104, precise feed formulations for fish 102 in net pen 104 may be determined. In this way, it is also possible to have different formulations for the fish in different net pens based on the different biomass estimates determined by image processing system 112 for those net pens. For example, a biomass estimate of fish 102 in net pen 104 may be generated by image processing system 112 and input to an onsite (e.g., on barge 110) food pellet mixer that uses the input biomass estimate to automatically select the ratio of nutrients to mix together in the food pellets that are delivered to fish 102 in net pen 104. In this way, as fish 102 grow over time, as reflected by increasing biomass estimates generated over time by image processing system 112, the nutrient composition of the feed pellets delivered to fish 102 will automatically adjust accordingly.
[0039] In addition to being useful for feed dosage optimization and feed formulation optimization, the accurate biomass estimates generated by image processing system 112 are also useful for determining optimal harvest times and maximizing sale profit for fish farmers. For example, fish farmers may use the biomass estimates to determine how much of different fish sizes they can harvest and bring to market. For example, the different fish sizes may be distinguished in the market by 1 kilogram increments. Thus, accurate biomass estimates are important to fish farmers to accurately determine which market bucket (e.g., the 4 kilogram to 5 kilogram bucket, the 5 kilogram to 6 kilogram bucket, etc.) fish 102 in net pen 104 fall into. Having accurate biomass estimates also improves fish farmers’ relationships downstream in the market, such as with slaughterhouse operators and fish futures markets.
[0040] Additionally, an accurate fish biomass estimate is useful for compliance with governmental regulations. For example, in Norway, a salmon farming license may impose a metric ton limit. Biomass estimates generated according to techniques disclosed herein may be useful for ensuring compliance with such licenses.
MORPHOLOGICAL LATERAL BODY MEASUREMENTS
[0041] One challenge with a stereo-vision system is the accurate estimation of biomass from the two-dimensional stereo camera images. For example, a single lateral dimension of a fish, such as fork length, may not be sufficient to accurately predict the weight of the fish because of variances in fish size and feeding regimes. In some embodiments, to improve the accuracy of the weight prediction, system 100 automatically detects and captures a set of one or more morphological lateral body dimensions of a fish that are useful for accurately predicting the weight of the fish.
[0042] Once the dimensions are known, the weight may be calculated. The weight may be calculated using a regression equation, for example. The regression equation may be developed using regression analysis on known ground truth weight and dimension relationships of fish. The regression equation may be fish-specific, reflecting the particular morphological characteristics of the fish. For example, a different regression equation or set of equations may be used for Scottish salmon than is used for Norwegian salmon, which are typically heavier than Scottish salmon.
[0043] The weight calculated can be a discrete value representing a predicted weight of the fish or multiple values representing a probability distribution of the predicted weight of the fish.
[0044] Multiple regression equations may be used to calculate multiple weights for a fish and the weights averaged (e.g., a weighted average) to calculate a final weight prediction for the fish.
[0045] A regression equation used can be a single-factor regression equation or a multi-factor regression equation. A single-factor regression equation can predict the weight within a degree of accuracy using only one of the dimensions. A multi-factor regression equation can predict the weight within a degree of accuracy using multiple of the dimensions.
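As a minimal sketch of how such a regression might be fitted and applied, assuming a near-cubic length-weight relationship fitted in log space; the dimensions, ground-truth weights, and choice of factors are hypothetical placeholders rather than values from this disclosure.

```python
# Fit a multi-factor weight regression from ground-truth (dimension,
# weight) pairs. All numbers below are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [fork length (cm), body depth at dorsal fin origin (cm)]
X = np.array([[55.0, 12.1], [60.0, 13.4], [65.0, 14.8], [70.0, 16.0]])
y = np.array([2.1, 2.8, 3.6, 4.5])   # ground-truth weights (kg)

# Length-weight relationships are commonly near-cubic, so fitting
# log W = b0 + b1*log(L) + b2*log(D) is one natural choice.
model = LinearRegression().fit(np.log(X), np.log(y))

new_fish = np.array([[62.0, 14.0]])  # dimensions measured from stereo images
predicted_kg = float(np.exp(model.predict(np.log(new_fish)))[0])
print(f"predicted weight: {predicted_kg:.2f} kg")
```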
[0046] Different regression equations or sets of regression equations may be used in different situations depending on the particular morphological lateral body dimensions image processing system 112 is able to capture from different sets of stereo images. Various different morphological lateral body measurements and regression equations can be used, such as those described in the paper by Beddow, T.A. and Ross, L.G., “Predicting biomass of Atlantic salmon from morphometric lateral measurements,” Journal of Fish Biology 49(3): 469-482. System 112 is not limited to any particular set of morphological lateral body dimensions or any particular set of regression equations.
[0047] According to the Beddow and Ross paper referenced above, the morphological lateral body dimensions depicted in FIG. 2 may be especially suitable for predicting the weight of an individual fish, in some cases to within plus or minus two percent (2%) of the actual weight.
[0048] FIG. 2 depicts truss dimensions 210 and conventional dimensions 230.
[0049] Truss dimensions 210 (shown as dashed lines) are established by corresponding landmark points on the fish and lines between the landmark points. The various landmark points include (1) the posterior most part of the eye, (2) the posterior point of the
neurocranium (where scales begin), (3) the origin of the pectoral fin, (4) the origin of the dorsal fin, (5) the origin of the pelvic fin, (6) the posterior end of the dorsal fin, (7) the origin of the anal fin, (8) the origin of the adipose fin, (9) the anterior attachment of the caudal fin to the tail, (10) the posterior attachment of the caudal fin to the tail and (11) the base of the middle caudal rays.
[0050] Conventional dimensions 230 include (A) the body depth at origin of the pectoral fin, (B) the body depth at origin of the dorsal fin, (C) the body depth at end of the dorsal fin,
(D) the body depth at origin of the anal fin, (E) the least depth of the caudal peduncle, (POL) the post-orbital body length, and (SL) the standard body length.
[0051] Conventional dimensions 230 correspond to various landmark areas of the fish. The head area (SL)-(A) is between the start of (SL) the standard body length at the anterior end of the fish and (A) the body depth at the origin of the pectoral fin. The pectoral area (A)-(B) is between (A) the body depth at the origin of the pectoral fin and (B) the body depth at the origin of the dorsal fin. The anterior dorsal area (B)-(C) is between (B) the body depth at the origin of the dorsal fin and (C) the body depth at the end of the dorsal fin. The posterior dorsal area (C)-(D) is between (C) the body depth at the end of the dorsal fin and (D) the body depth at the origin of the anal fin. The anal area (D)-(E) is between (D) the body depth at the origin of the anal fin and (E) the least depth of the caudal peduncle. The tail area (E)-(SL) is between (E) the least depth of the caudal peduncle and the end of (SL) the standard body length at the posterior end of the fish.
[0052] Image processing system 112 may automatically detect and identify one or more or all of the landmark points and the landmark areas discussed above for purposes of predicting the weight of the fish. Image processing system 112 may do this even though the yaw, roll, and pitch angle of fish 102 captured in the stereo images may be greater than zero degrees with respect to a fish that is perfectly lateral with cameras 106. By doing so, image processing system 112 can estimate biomass from stereo images of freely swimming fish 102 in net pen 104. System 100 does not require a tube or a channel in net pen 104 through which fish 102 must swim in order to accurately estimate biomass.
IMAGE PROCESSING SYSTEM
[0053] FIG. 3 is a schematic diagram showing high-level components of the image processing system 112. System 112 includes image storage 310, image filtration system 312, object detection and image segmentation system 314, stereo matching and occlusion handling system 316, and weight predictor system 318.
[0054] At a high-level, operation of the system 112 may be as follows.
[0055] High-resolution monochrome or color rectified stereo images captured by the cameras 106 in net pen 104 are transmitted to image processing system 112 via a
communication link established between wireless access points 108A and 108B. The images received by image processing system 112 are stored in image storage 310 (e.g., one or more non-volatile and /or volatile memory devices.) Pairs of rectified stereo images are output (read) from image storage 310 and input to image filtration system 312.
[0056] Image filtration system 312 analyzes the input image pair, or alternatively one of the images in the pair, to make a preliminary, relatively low-processing-cost determination of whether the image pair contains suitable images for further processing by system 112. If so, then the pair of images is input to stereo matching and occlusion handling system 316.
[0057] Stereo matching and occlusion handling system 316 determines corresponding pairs of pixels in the stereo images and outputs a disparity map for the base image of the stereo pair.
[0058] Object detection and image segmentation system 314 detects fish in the input base image and produces one or more image segmentation masks corresponding to one or more landmark points and / or landmark areas of the fish detected in the base image.
[0059] Weight predictor system 318 obtains three-dimensional (3-D) world coordinates of points corresponding to pixels from the input disparity map output by the stereo matching and occlusion handling system 316. The 3-D world coordinates of points corresponding to pixels corresponding to the landmark point(s) and / or landmark area(s) of the fish are used to calculate one or more truss dimensions and / or one or more conventional dimensions of the fish. The calculated dimensions are then used in a calculation to predict the weight of the fish.
[0060] As an alternative, instead of calculating the truss and / or conventional dimensions, weight predictor 318 may generate a 3-D point cloud object from the 3-D world coordinates. The volume of the fish may be estimated from the 3-D point cloud and the weight then predicted based on the estimated volume and a predetermined density or density distribution of the fish. For example, the predetermined density may be a known average density or density distribution of Atlantic salmon.
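A minimal sketch of this volume-based alternative, assuming a convex-hull volume estimate (which will somewhat overestimate a concave fish body) and an assumed average density; neither the density value nor the hull method is prescribed by this disclosure.

```python
# Estimate fish volume from a 3-D point cloud via a convex hull and
# convert to weight with an assumed density. Values are placeholders.
import numpy as np
from scipy.spatial import ConvexHull

def weight_from_point_cloud(points_m: np.ndarray,
                            density_kg_per_m3: float = 1050.0) -> float:
    """points_m: (N, 3) array of world coordinates in meters."""
    hull = ConvexHull(points_m)          # hull.volume is in cubic meters
    return hull.volume * density_kg_per_m3

# Hypothetical usage with a random blob standing in for a fish cloud:
rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3)) * np.array([0.30, 0.07, 0.05])
print(f"estimated weight: {weight_from_point_cloud(cloud):.2f} kg")
```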
[0061] A threshold number of individual weight predictions (e.g., 1,000) determined over a period of time (e.g., a day) may be averaged. From this, a distribution of the average daily (or other time period) fish biomass in net pen 104 over an extended period of time (e.g., the past month) may be charted (e.g., as a histogram or other visual distribution presented in a web browser) to provide a visualization of whether the total fish biomass in net pen 104 is increasing, decreasing, or staying relatively constant over that extended period according to the aggregated individual estimates. A decreasing biomass may be indicative, for example, of fish that escaped net pen 104, a predator that gained access to net pen 104, fish mortality, etc. An increasing biomass may indicate that the fish are still growing and are not yet ready for harvest, while a steady biomass distribution may indicate that the fish are ready for harvest. Other applications of the individual biomass estimates are possible and the present invention is not limited to any particular application of the individual biomass estimates.
[0062] It should also be noted that the sampling strategy may vary depending on whether the object detection and segmentation system 314 has the capability to uniquely identify fish 102 in net pen 104. For example, one or more features of a fish detected in an image may be used to uniquely identify the fish in the image. In that case, a weight prediction may not be determined for the fish if a weight prediction was recently obtained for the fish within a sampling window (e.g., the same day or the same week.) The sampling strategy, in this case, may be to predict the weight of each unique fish identified during the sampling window and avoid re-predicting the weight of a fish for which a weight prediction has already been made during the sampling window. By doing so, a more accurate average biomass may be calculated because double counting is avoided.
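A minimal sketch of such a de-duplicating sampling strategy, assuming a unique per-fish identifier is available from the detection features; the identifier source and the window length are assumptions.

```python
# Predict each uniquely identified fish at most once per sampling window
# to avoid double counting in the average biomass.
from datetime import date

last_predicted: dict[str, date] = {}   # fish_id -> date of last prediction

def should_predict(fish_id: str, today: date, window_days: int = 7) -> bool:
    prev = last_predicted.get(fish_id)
    if prev is not None and (today - prev).days < window_days:
        return False   # already counted within this sampling window
    last_predicted[fish_id] = today
    return True

print(should_predict("fish-42", date(2019, 5, 30)))   # True
print(should_predict("fish-42", date(2019, 5, 31)))   # False (same window)
```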
STEREO MATCHING AND OCCLUSION HANDLING
[0063] To produce rectified stereo images for input to stereo matching and occlusion handling system 316, cameras 106 may be calibrated underwater against a target of a known geometry, such as a black and white checkerboard of alternating squares that provides good contrast in underwater conditions. Other calibration techniques are possible and the present invention is not limited to any particular calibration technique.
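A hedged sketch of such a calibration using OpenCV's checkerboard detection; the board geometry, square size, and the source of the image pairs are assumptions, and calibrating in situ matters because refraction at the housing changes the effective camera parameters.

```python
# Collect checkerboard corners from underwater stereo frames; the corner
# lists feed cv2.stereoCalibrate / cv2.stereoRectify. Board geometry and
# square size are assumptions.
import cv2
import numpy as np

BOARD = (9, 6)       # inner corners per row / column (assumed)
SQUARE_MM = 40.0     # printed square size in millimeters (assumed)

objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_MM

def collect_corners(pairs):
    """pairs: iterable of (left_gray, right_gray) frames of the target."""
    obj_pts, left_pts, right_pts = [], [], []
    for left, right in pairs:
        ok_l, c_l = cv2.findChessboardCorners(left, BOARD)
        ok_r, c_r = cv2.findChessboardCorners(right, BOARD)
        if ok_l and ok_r:
            obj_pts.append(objp)
            left_pts.append(c_l)
            right_pts.append(c_r)
    return obj_pts, left_pts, right_pts

# With enough detected boards, cv2.calibrateCamera recovers per-camera
# intrinsics, cv2.stereoCalibrate recovers the rotation and translation
# between the cameras, and cv2.stereoRectify yields the rectification
# transforms used to produce rectified stereo pairs.
```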
[0064] Stereo matching and occlusion handling system 316 detects pairs of corresponding pixels in the rectified stereo pair. From this, stereo matching and occlusion handling system 316 outputs a disparity map for the base image of the stereo pair. The disparity map may be an array of values, one value for each pixel or each selected pixel in the base image, where each value numerically represents the disparity with respect to a corresponding pixel in the other (match) image of the stereo pair. For example, the selected pixels may correspond to landmark points on fish detected in the base image. A depth map for the base image may be derived from the disparity map using known techniques. For example, the depth of a pixel in the base image may be calculated from a known focal length of cameras 106, a known baseline distance between cameras 106, and the disparity of the pixel in the base image and its corresponding pixel in the match image. The depth map may also be an array of values, one value for each pixel in the image, but where each value numerically represents the distance of the object in the image scene corresponding to the pixel from the cameras 106 (e.g., from the center of the lens of the camera that captured the base image.)
[0065] The pair of images may be rectified, either vertically or horizontally. The matching task performed by the stereo matching and occlusion handling system 316 is to match pixels in one of the stereo images (the base image) to corresponding pixels in the other stereo image (the match image) according to a disparity matching algorithm (e.g., basic block
matching, semi-global block matching, etc.) The output of the matching task is a disparity map for the stereo image pair. A depth map may be calculated from the disparity map using basic geometrical calculations. The depth map may contain per-pixel information relating to the distance of the surfaces of objects in the base image from a viewpoint (e.g., the center of the lens of the camera that captured the base image.)
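A minimal sketch of this matching-then-depth step with OpenCV's semi-global block matcher; the matcher parameters and the calibration constants are illustrative assumptions.

```python
# Compute a disparity map with semi-global block matching, then convert
# it to a depth map with Z = f * B / d. Constants are assumptions.
import cv2
import numpy as np

FOCAL_PX = 2300.0    # focal length in pixels, assumed from calibration
BASELINE_M = 0.10    # camera baseline in meters, assumed

def depth_map(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5)
    # OpenCV returns fixed-point disparities scaled by 16.
    disp = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disp[disp <= 0] = np.nan              # unmatched or occluded pixels
    return FOCAL_PX * BASELINE_M / disp   # per-pixel depth in meters
```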
[0066] One of the challenges to accurately estimating the weight of a fish in an image is occlusions. This challenge is magnified in the fish biomass estimation context because freely swimming fish 102 in net pen 104 may swim close to each other or in schools such that one fish occludes another as they swim in front of cameras 106. In this case, it can be difficult to stereo match corresponding pixels in the stereo image pair because a portion of a fish that is visible in one of the images may not be visible in the other of the images because of occlusion.
[0067] To address this occlusion problem, a convolutional neural network may be trained to aid the stereo matching task. In particular, the convolutional neural network may be used to learn a similarity measure on small image patches of the base and match images. The network may be trained with a binary classification set having examples of similar and dissimilar pairs of patches. The output of the convolutional neural network may be used to initialize the stereo matching cost. Post-processing steps such as, for example, semi-global block matching, may follow to determine pixel or pixel region correspondence between the base and match images. A depth map is extracted from the input stereo image pair based on the stereo matching method described in the paper by J. Zbontar and Y. LeCun, “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches,” JMLR 17(65): 1-32, 2016.
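A minimal PyTorch sketch in the spirit of that approach: a small siamese network scoring the similarity of two patches. The patch size, layer widths, and head are illustrative assumptions, not the published architecture.

```python
# Siamese patch-similarity network: a shared convolutional encoder for
# each patch, and a small head producing a similarity score in [0, 1].
import torch
import torch.nn as nn

class PatchSimilarity(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # shared patch encoder
            nn.Conv2d(1, 32, 3), nn.ReLU(),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Sequential(             # similarity decision
            nn.Linear(2 * 64 * 5 * 5, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, patch_a, patch_b):       # patches: (N, 1, 9, 9)
        feats = torch.cat([self.encoder(patch_a),
                           self.encoder(patch_b)], dim=1)
        return torch.sigmoid(self.head(feats))

# Trained on binary similar / dissimilar patch-pair labels, the output
# score can be used to initialize the stereo matching cost.
net = PatchSimilarity()
score = net(torch.rand(4, 1, 9, 9), torch.rand(4, 1, 9, 9))  # (4, 1)
```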
[0068] Occlusion detection may be performed using the area of the bounding boxes drawn by a model, the shape of the segmentation masks, or a tracking of the bounding boxes over time. The challenge in detecting occlusion lies in the absence of depth information. At least two approaches are possible: using the output of the depth mapping algorithm to determine which fish is occluding the other, or using the outputs of the detection/segmentation algorithm. In the second case, the areas of the bounding boxes may be tracked over time to find which fish is occluding the other (for example, the bounding box of the occluded fish should increase as the occluding fish moves away). Another possible approach uses the shape of the segmentation mask, since occluded and occluding fish will have differently shaped masks.
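A minimal sketch of the bounding-box-tracking heuristic just described, assuming axis-aligned (x1, y1, x2, y2) boxes from a tracker; the growth threshold is an illustrative assumption.

```python
# If a tracked fish's visible bounding box grows substantially over a
# short window, it was likely partially occluded earlier in the window.
def box_area(box):   # box: (x1, y1, x2, y2)
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def likely_was_occluded(box_history, growth_threshold=1.3):
    """box_history: one fish's boxes over consecutive frames."""
    areas = [box_area(b) for b in box_history]
    return areas[-1] > growth_threshold * min(areas)

history = [(100, 100, 160, 140), (98, 100, 170, 144), (95, 101, 190, 150)]
print(likely_was_occluded(history))   # True: the box grew by more than 30%
```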
OBJECT DETECTION AND IMAGE SEGMENTATION
[0069] Object detection and image segmentation system 314 identifies the pixels in an image that correspond to a landmark point or a landmark area of the fish. The output of the object detection and image segmentation system 314 may be one or more image
segmentation masks for the image. For example, an image segmentation mask may be a binary mask where the binary mask is an array of values, each value corresponding to one of the pixels in the image, the value being 1 (or 0) to indicate that the corresponding pixel does correspond to a landmark point or a landmark area of the fish, or being the opposite binary value to indicate that the corresponding pixel does not correspond to a landmark point or a landmark area of the fish. Instead of binary values, confidence values may be used in the image segmentation mask. In addition, or as an alternative to representing an image segmentation using raster coordinates, an image segmentation may be represented by vector coordinates that indicate the outline area of a landmark point or a landmark area of the fish.
[0070] The object detection and image segmentation system 314 may output multiple image segmentation masks for the same input image corresponding to different landmark points and different landmark features detected in the image.
[0071] The object detection and image segmentation system 314 may include a convolutional neural network for analyzing an input image to perform classification on objects (fish) detected in the image. The convolutional neural network may be composed of an input layer, an output layer, and multiple hidden layers including convolutional layers that emulate in a computer the response of neurons in animals to visual stimuli.
[0072] Given an image, a convolutional neural network of the object detection and image segmentation system 314 may convolve the image by a number of convolution filters. Non-linear transformations (e.g., MaxPool and ReLU) may then be applied to the output of the convolution. The convolutional neural network may perform the convolution and the non-linear transformations multiple times. The output of the final layer may then be sent to a softmax layer that gives the probability of the image being of a particular class (e.g., an image of one or more fish). Object detection and image segmentation system 314 may discard the input image from further processing by system 112 if the probability that the image is not an image of one or more fish is above a threshold (e.g., 80%).
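A minimal sketch of this convolve / non-linearity / softmax gating step; the layer sizes are illustrative, the class ordering is an assumption, and only the 80% discard threshold follows the description above.

```python
# Gate images before further processing: classify "fish" vs. "no fish"
# and discard when P(no fish) exceeds the threshold.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(2))   # two classes; index 0 = fish (assumed ordering)

def keep_for_processing(image: torch.Tensor, threshold: float = 0.8) -> bool:
    """image: (1, 1, H, W) tensor scaled to [0, 1]."""
    with torch.no_grad():
        probs = torch.softmax(classifier(image), dim=1)
    return probs[0, 1].item() <= threshold   # index 1 = "no fish" (assumed)

print(keep_for_processing(torch.rand(1, 1, 64, 64)))
```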
[0073] Assuming the input image is not discarded, object detection and image segmentation system 314 may use a convolutional neural network to identify fish in the image via a bounding box. A bounding box may also be associated with a label identifying the fish within the bounding box. The convolutional neural network may perform a selective search on the image through pixel windows of different sizes. For each size, the convolutional neural
network attempts to group together adjacent pixels by texture, color, or intensity to identify fish in the image. This may result in a set of region proposals which may then be input to a convolutional neural network trained on images of fish with the locations of the fish within the image labeled to determine which regions contain fish and which do not.
[0074] As an alternative, it is also possible to detect fish, landmark points and / or landmark areas of the fish in the image without region proposals using a single stage method (e.g., YOLO, SSD, SSD with recurrent rolling convolution, deconvolutional single shot detector, or the like.) YOLO is described in the paper by J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” arXiv:1506.02640v5, May 9, 2016. SSD is described in the paper by W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, Cheng-Yang Fu and A. C. Berg, “SSD: Single Shot MultiBox Detector,” arXiv:1512.02325v5, December 19, 2016. SSD with recurrent rolling convolution is described in the paper by J. Ren, X. Chen, J. Liu, W. Sun, J. Pang, Q. Yan, Y. Tai and L. Xu, “Accurate Single Stage Detector Using Recurrent Rolling Convolution,” arXiv:1704.05776v1, April 19, 2017. Deconvolutional single shot detector is described in the paper by C. Fu, W. Liu, A. Ranga, A. Tyagi, A. Berg, “DSSD: Deconvolutional Single Shot Detector,” arXiv:1701.06659v1, January 23, 2017.
[0075] The images of fish on which a convolutional neural network is trained may include images that are representative of images containing fish from which landmark points and landmark areas can be identified. For example, a convolutional neural network may be trained on images of fish that provide a full lateral view of the fish, including the head and tail, at various different yaw, roll, and pitch angles and at different sizes in the image representing different distances from cameras 106. Such training data may also be generated synthetically using a computer graphics application (e.g., a video game engine or a computer graphics animation application such as Blender) in order to generate sufficient training data.
[0076] A final layer of a convolutional neural network may include a support vector machine (SVM) that classifies the fish in each valid region. Such classification may include whether a full lateral view of a fish including both the head and the tail of the fish is detected in the region. Object detection and image segmentation system 314 may tighten the bounding box around a region of a fish by running a linear regression on the region. This produces new bounding box coordinates for the fish in the region.
[0077] In addition, or alternatively, a convolutional neural network, a classifier, and a bounding box linear regressor may be jointly trained for greater accuracy. This may be accomplished by replacing the SVM classifier with a softmax layer on top of a convolutional
neural network to output a classification for a valid region. The linear regressor layer may be added in parallel to the softmax layer to output bounding box coordinates for the valid region.
[0078] To speed up object detection, it is also possible to leverage the convolutional feature maps used by a region-based detector, such as Fast R-CNN, to generate region proposals, as is done with Faster R-CNN. For example, a single stage region-based detector may be used. Fast R-CNN is described in the paper by R. Girshick, “Fast R-CNN,” arXiv:1504.08083v2, September 27, 2015. Faster R-CNN is described in the paper by S. Ren, K. He, R. Girshick and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv:1506.01497v3, January 6, 2016.
[0079] The bounding box of a full lateral view of a fish may be cropped from the image in which it is detected. A convolutional neural network-based object detection may then be performed again on the cropped image to detect and obtain bounding boxes or image segmentation masks corresponding to the landmark points and the landmark areas of the fish in the cropped image. For this, a trained convolutional neural network may be used. The convolutional neural network may be trained on tight images of fish (synthetically generated or captured in situ by a camera) with the locations of the various landmark points and landmark areas labeled in the tight images.
[0080] Once a suitable cropped image is obtained, the object detection and image segmentation system 314 performs pixel level segmentation on the cropped image. This may be accomplished using a convolutional neural network that runs in parallel with a
convolutional neural network for object detection such as Mask R-CNN. The output of this may be image segmentation masks for the locations of the detected landmark points and landmark areas in the image. Mask R-CNN is described in the paper by K. He, G. Gkioxari,
P. Dollar and R. Girshick, “Mask R-CNN,” arXiv:1703.06870v3, January 24, 2018.
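A minimal sketch of obtaining per-fish masks with an off-the-shelf Mask R-CNN from torchvision. A deployed system would be fine-tuned on labeled fish images; the pretrained COCO model here only illustrates the interface (the weights argument assumes torchvision 0.13 or later).

```python
# Run a pretrained Mask R-CNN and threshold its soft masks into binary
# segmentation masks. The input tensor is a stand-in for a cropped image.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 800, 1200)       # RGB tensor scaled to [0, 1]
with torch.no_grad():
    output = model([image])[0]          # dict: boxes, labels, scores, masks

soft_masks = output["masks"]            # (N, 1, H, W) values in [0, 1]
binary_masks = soft_masks > 0.5         # threshold to binary segmentations
```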
WEIGHT PREDICTION
[0081] Weight predictor 318 combines the depth map output by stereo matching and occlusion handling system 316 with the image segmentation mask(s) generated by the object detection and image segmentation system 314. The combining is done to determine the location of the landmark point(s) and / or landmark area(s) in 3-D Cartesian space. This combining may be accomplished by superimposing the depth map on an image segmentation mask, or vice versa, on a pixel-by-pixel basis (pixelwise).
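A minimal sketch of this pixelwise combination, assuming a dense depth map and a binary landmark mask at the same resolution; the array shapes and the mask location are placeholders.

```python
# Use a binary segmentation mask to select the depth values belonging to
# a landmark point or landmark area. Shapes and values are placeholders.
import numpy as np

depth = np.random.uniform(0.5, 2.0, size=(480, 640))   # depth map (meters)
mask = np.zeros((480, 640), dtype=bool)                 # landmark mask
mask[200:220, 300:330] = True    # e.g., pixels of the dorsal fin origin

landmark_depths = depth[mask]                # depths where the mask is set
landmark_depth = float(np.median(landmark_depths))   # robust representative
```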
[0082] One or more truss dimensions 210 and / or one or more conventional dimensions 230 of the fish may then be calculated using planar geometry. For example, the distance of the truss dimension between landmark point (1) the posterior-most part of the eye and landmark point (4) the origin of the dorsal fin may be calculated using the Pythagorean theorem given the x, y, and z coordinates in the 3-D space for each landmark point. Weight predictor 318 may then predict the weight of the fish according to a regression equation using the one or more truss and / or one or more conventional dimensions. In this way, weight predictions for the fish 102 in the net pen 104 may be made on an individual basis.
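A minimal sketch of that distance calculation, with hypothetical 3-D world coordinates for the two landmark points:

```python
# Truss dimension as the Euclidean distance between two landmark points
# in 3-D world coordinates (the Pythagorean theorem in three dimensions).
import numpy as np

eye = np.array([0.12, 0.05, 1.10])                # (x, y, z) in meters
dorsal_fin_origin = np.array([0.38, 0.11, 1.14])  # hypothetical coordinates

truss_dim_m = float(np.linalg.norm(dorsal_fin_origin - eye))
print(f"truss dimension (1)-(4): {truss_dim_m:.3f} m")
```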
[0083] A predicted weight for a fish detected in an image produced according to the foregoing techniques may be used as an input to a condition factor (i.e., k-factor) calculation. The techniques may also be applied to determine the length of the fish from the tip of the snout to the rear edge of the fork at the center of the tail fin (e.g., the length to caudal fork) for input to the condition factor calculation, regardless of whether that length determination is used to estimate the weight of the fish.
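As an illustration, Fulton's condition factor is commonly computed as K = 100·W/L³, with weight W in grams and length L in centimeters; the values below are hypothetical.

```python
# Fulton condition factor from a predicted weight and the length to
# caudal fork. Example values are hypothetical.
def condition_factor(weight_g: float, fork_length_cm: float) -> float:
    return 100.0 * weight_g / fork_length_cm ** 3

print(condition_factor(weight_g=3500.0, fork_length_cm=65.0))  # ~1.27
```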
[0084] In addition to estimating the weight of a fish based on distances between landmark points and landmark areas segmented from an image of the fish combined with a depth map, the weight of a fish may be predicted by combining an image segmentation mask of the entire fish with a depth map for the image to obtain a three-dimensional point cloud for the pixels representing the fish in the image according to the image segmentation mask. The volume of the fish may then be estimated from the point cloud. For example, the volume may be estimated using the technique described in the paper by W. C. Chang, C. H. Wu, Y. H.
Tsai and W. Y. Chiu, "Object volume estimation based on 3D point cloud," 2017
International Automatic Control Conference (CACS), Pingtung, 2017, pp. 1-5. Once the volume is estimated, the weight of the fish may be estimated based on the estimated volume and a known density for the type of fish.
BASIC COMPUTING SYSTEM
[0085] According to some embodiments, the techniques for biomass estimation described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
[0086] For example, FIG. 4 is a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented. Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a hardware processor 404 coupled with bus 402 for processing information. Hardware processor 404 may be, for example, a general-purpose microprocessor.
[0087] Computer system 400 also includes a main memory 406, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. Such instructions, when stored in non-transitory storage media accessible to processor 404, render computer system 400 into a special-purpose machine that is customized to perform the operations specified in the instructions.
[0088] Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk, optical disk, or solid-state drive is provided and coupled to bus 402 for storing information and instructions.
[0089] Computer system 400 may be coupled via bus 402 to a display 412, such as an OLED, LED or cathode ray tube (CRT), for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. The input device 414 may also have multiple input modalities, such as multiple 2-axes controllers, and / or input buttons or keyboard. This allows a user to input along more than two dimensions simultaneously and / or control the input of more than one type of action.
[0090] Computer system 400 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 400 to be a special-purpose machine. According to some embodiments, the techniques herein are
performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another storage medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
[0091] The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 410. Volatile media includes dynamic memory, such as main memory 406. Common forms of storage media include, for example, a hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, and any other memory chip or cartridge.
[0092] Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
[0093] Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402. Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions. The instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.
[0094] Computer system 400 also includes a communication interface 418 coupled to bus 402. Communication interface 418 provides a two-way data communication coupling to a
network link 420 that is connected to a local network 422. For example, communication interface 418 may be a modem to provide a data communication connection to a
corresponding type of telephone or coaxial line. As another example, communication interface 418 may be a network card (e.g., an Ethernet card) to provide a data communication connection to a compatible Local Area Network (LAN). Wireless links may also be implemented. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information. Such a wireless link could be a Bluetooth, Bluetooth Low Energy (BLE), 802.11 WiFi connection, or the like.
[0095] Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426. ISP 426 in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the“Internet” 428. Local network 422 and Internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are example forms of transmission media.
[0096] Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418.
[0097] The received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution.
Claims
1. A system for estimating biomass of freely swimming fish in a net pen in an aquaculture environment, the system comprising:
a stereo camera for immersion underwater in the net pen and for capturing stereo images of freely swimming fish in the net pen; and
an image processing system of, or operatively coupled to, the stereo camera, the image processing system comprising one or more processors, storage media, and one or more programs stored in the storage media and configured for execution by the one or more processors, the one or more programs comprising instructions configured for:
based on obtaining and storing a pair of digital images captured by, or derived from digital images captured by, the stereo camera, using a trained convolutional neural network to classify an image region of a digital image, of the pair of digital images, as containing an image of a fish;
cropping the digital image to the image region to generate a cropped image;
identifying a plurality of landmark points on the fish within the cropped image;
determining a plurality of disparity map values corresponding to at least the plurality of landmark points;
based on the plurality of disparity map values, calculating one or more morphological body dimensions of the fish; and
computing a weight of the fish based on the one or more morphological body dimensions.
2. The system of claim 1, wherein the one or more programs comprise instructions configured for:
computing the weight of the fish by using one or more regression
equations for predicting the weight of the fish based on the one or more morphological body dimensions.
3. The system of claim 1, wherein the one or more morphological body dimensions of the fish calculated based on the plurality of disparity map values
comprises at least one of the following morphological body dimensions of the
fish:
a posterior most part of an eye,
a posterior point of a neurocranium,
an origin of a pectoral fin,
an origin of a dorsal fin,
an origin of a pelvic fin,
a posterior end of a dorsal fin,
an origin of an anal fin,
an origin of an adipose fin,
an anterior attachment of a caudal fin to a tail,
a posterior attachment of a caudal fin to a tail,
a base of the middle caudal rays,
a body depth at origin of a pectoral fin,
a body depth at origin of a dorsal fin,
a body depth at end of a dorsal fin,
a body depth at origin of an anal fin,
a least depth of a caudal peduncle,
a post-orbital body length, and
a standard body length.
4. The system of Claim 1, wherein the one or more programs comprise
instructions configured for:
training the convolutional neural network based on representative images of fish that each provide a lateral view of a fish.
5. The system of Claim 4, wherein the representative images are synthetic images generated using a computer graphics application.
6. The system of Claim 1, wherein the one or more programs comprise instructions configured for:
using a trained convolutional neural network to identify the plurality of
landmark points on the fish within the image region.
7. The system of Claim 1, wherein the one or more programs comprise instructions configured for:
based on the plurality of disparity map values, determining a plurality of three-dimensional world coordinates for the plurality of landmark points; and
calculating the one or more morphological body dimensions of the fish based on the plurality of three-dimensional world coordinates.
8. A method comprising:
based on obtaining and storing a pair of digital images captured by, or derived from digital images captured by, a stereo camera, using a trained convolutional neural network to classify an image region of a digital image, of the pair of digital images, as containing an image of a fish;
cropping the digital image to the image region to generate a cropped image;
identifying a plurality of landmark points on the fish within the cropped image;
determining a plurality of disparity map values corresponding to at least the plurality of landmark points;
based on the plurality of disparity map values, calculating one or more morphological body dimensions of the fish;
computing a weight of the fish based on the one or more morphological body dimensions;
wherein the stereo camera is immersed underwater in a net pen and captures images of freely swimming fish in the net pen; and wherein the method is performed by an image processing system of, or operatively coupled to, the stereo camera, the image processing system comprising one or more processors, storage media, and one or more programs stored in the storage media and configured for execution by the one or more processors, the one or more programs comprising instructions executed by the one or more processors to perform the method.
9. The method of claim 8, further comprising:
computing the weight of the fish by using one or more regression
equations for predicting the weight of the fish based on the one or more morphological body dimensions.
10. The method of claim 8, wherein the one or more morphological body dimensions of the fish calculated based on the plurality of disparity map values comprises at least one of the following morphological body dimensions of the fish:
a posterior most part of an eye,
a posterior point of a neurocranium,
an origin of a pectoral fin,
an origin of a dorsal fin,
an origin of a pelvic fin,
a posterior end of a dorsal fin,
an origin of an anal fin,
an origin of an adipose fin,
an anterior attachment of a caudal fin to a tail,
a posterior attachment of a caudal fin to a tail,
a base of the middle caudal rays,
a body depth at origin of a pectoral fin,
a body depth at origin of a dorsal fin,
a body depth at end of a dorsal fin,
a body depth at origin of an anal fin,
a least depth of a caudal peduncle,
a post-orbital body length, and
a standard body length.
11. The method of claim 8, further comprising:
training the convolutional neural network based on representative images
of fish that each provide a lateral view of a fish.
12. The method of claim 11, wherein the representative images are synthetic images generated using a computer graphics application.
13. The method of claim 8, further comprising:
using a trained convolutional neural network to identify the plurality of
landmark points on the fish within the image region.
14. The method of Claim 8, further comprising:
based on the plurality of disparity map values, determining a plurality of
three-dimensional world coordinates for the plurality of landmark points; and
calculating the one or more morphological body dimensions of the fish based on the plurality of three-dimensional world coordinates.
15. One or more non-transitory computer-readable media storing one or more programs comprising instructions configured to perform a method as recited in any one of Claims 8-14.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862679654P | 2018-06-01 | 2018-06-01 | |
US62/679,654 | 2018-06-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019232247A1 true WO2019232247A1 (en) | 2019-12-05 |
Family
ID=67002386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2019/034709 WO2019232247A1 (en) | 2018-06-01 | 2019-05-30 | Biomass estimation in an aquaculture environment |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019232247A1 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040008259A1 (en) | 2002-04-10 | 2004-01-15 | Gokturk Salih Burak | Optical methods for remotely measuring objects |
WO2005025309A1 (en) | 2003-08-11 | 2005-03-24 | Kristian Lillerud | Method and device for recording and determining the weight of fish |
WO2014098614A1 (en) | 2012-12-20 | 2014-06-26 | Ebtech As | System and method for calculating physical dimensions for freely movable objects in water |
WO2017204660A1 (en) * | 2016-05-24 | 2017-11-30 | Itecsolutions Systems & Services As | Arrangement and method for measuring the biological mass of fish, and use of the arrangement |
WO2019147346A1 (en) * | 2018-01-25 | 2019-08-01 | X Development Llc | Fish biomass, shape, and size determination |
Non-Patent Citations (13)
Title |
---|
BEDDOW, T.A.; ROSS, L.G.: "Predicting biomass of Atlantic salmon from morphometric lateral measurements", JOURNAL OF FISH BIOLOGY, vol. 49, no. 3, pages 469-482
C. FU; W. LIU; A. RANGA; A. TYAGI; A. BERG: "DSSD: Deconvolutional Single Shot Detector", ARXIV:1701.06659V1, 23 January 2017 (2017-01-23)
J. REDMON; S. DIVVALA; R. GIRSHICK; A. FARHADI: "You Only Look Once: Unified, Real-Time Object Detection", ARXIV:1506.02640V5, 9 May 2016 (2016-05-09)
J. REN; X. CHEN; J. LIU; W. SUN; J. PANG; Q. YAN; Y. TAI; L. XU: "Accurate Single Stage Detector Using Recurrent Rolling Convolution", ARXIV:1704.05776V1, 19 April 2017 (2017-04-19)
J. ZBONTAR; Y. LECUN: "Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches", JMLR, vol. 17, no. 65, 2016, pages 1-32
K. HE; G. GKIOXARI; P. DOLLAR; R. GIRSHICK: "Mask R-CNN", ARXIV:1703.06870V3, 24 January 2018 (2018-01-24)
MARK R. SHORTIS ET AL.: "A review of techniques for the identification and measurement of fish in underwater stereo-video image sequences", PROCEEDINGS OF SPIE, vol. 8791, 23 May 2013 (2013-05-23), pages 87910G, XP055554211, ISBN: 978-1-5106-2099-5, DOI: 10.1117/12.2020941 *
NAVAL, PROSPERO C. ET AL.: "FishDrop: Estimation of reef fish population density and biomass using stereo cameras", 2016 TECHNO-OCEAN (TECHNO-OCEAN), IEEE, 6 October 2016 (2016-10-06), pages 527-531, XP033083022, DOI: 10.1109/TECHNO-OCEAN.2016.7890710 *
R. GIRSHICK: "Fast R-CNN", ARXIV:1504.08083V2, 27 January 2015 (2015-01-27)
S. REN; K. HE; R. GIRSHICK; J. SUN: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", ARXIV:1506.01497V3, 6 January 2016 (2016-01-06)
T. BEDDOW: "Predicting salmon biomass remotely using a digital stereo-imaging technique", AQUACULTURE, vol. 146, no. 3-4, 10 November 1996 (1996-11-10), pages 189-203, XP055010673, ISSN: 0044-8486, DOI: 10.1016/S0044-8486(96)01384-1 *
W. C. CHANG; C. H. WU; Y. H. TSAI; W. Y. CHIU: "Object volume estimation based on 3D point cloud", 2017 INTERNATIONAL AUTOMATIC CONTROL CONFERENCE (CACS), 2017, pages 1-5, XP033317549, DOI: 10.1109/CACS.2017.8284244
W. LIU; D. ANGUELOV; D. ERHAN; C. SZEGEDY; S. REED; CHENG-YANG FU; A. C. BERG: "SSD: Single Shot MultiBox Detector", ARXIV:1512.02325V5, 19 December 2016 (2016-12-19)
Cited By (91)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11232297B2 (en) | 2018-01-25 | 2022-01-25 | X Development Llc | Fish biomass, shape, and size determination |
US12056951B2 (en) | 2018-01-25 | 2024-08-06 | X Development Llc | Fish biomass, shape, and size determination |
US11688196B2 (en) | 2018-01-25 | 2023-06-27 | X Development Llc | Fish biomass, shape, and size determination |
US11113539B2 (en) | 2018-05-03 | 2021-09-07 | X Development Llc | Fish measurement station keeping |
US12190592B2 (en) | 2018-05-03 | 2025-01-07 | Tidalx Ai Inc. | Fish measurement station keeping |
US12223723B2 (en) | 2018-05-03 | 2025-02-11 | Tidalx Ai Inc. | Fish measurement station keeping |
US11659819B2 (en) | 2018-10-05 | 2023-05-30 | X Development Llc | Sensor positioning system |
US12193418B2 (en) | 2018-10-05 | 2025-01-14 | Tidalx Ai Inc. | Sensor positioning system |
US11594058B2 (en) | 2019-11-12 | 2023-02-28 | X Development Llc | Entity identification using machine learning |
US11983950B2 (en) | 2019-11-12 | 2024-05-14 | X Development Llc | Entity identification using machine learning |
US11475689B2 (en) | 2020-01-06 | 2022-10-18 | X Development Llc | Fish biomass, shape, size, or health determination |
US11756324B2 (en) | 2020-01-06 | 2023-09-12 | X Development Llc | Fish biomass, shape, size, or health determination |
US11877062B2 (en) | 2020-02-07 | 2024-01-16 | X Development Llc | Camera winch control for dynamic monitoring |
JP2023512412A (en) * | 2020-02-07 | 2023-03-27 | エックス デベロップメント エルエルシー | Camera winch control for dynamic surveillance |
US11089227B1 (en) | 2020-02-07 | 2021-08-10 | X Development Llc | Camera winch control for dynamic monitoring |
WO2021158644A1 (en) * | 2020-02-07 | 2021-08-12 | X Development Llc | Camera winch control for dynamic monitoring |
JP7350181B2 (en) | 2020-02-07 | 2023-09-25 | エックス デベロップメント エルエルシー | Camera winch control for dynamic surveillance |
US20240348926A1 (en) * | 2020-02-07 | 2024-10-17 | X Development Llc | Camera winch control for dynamic monitoring |
US11659820B2 (en) | 2020-03-20 | 2023-05-30 | X Development Llc | Sea lice mitigation based on historical observations |
US12175672B2 (en) | 2020-04-10 | 2024-12-24 | Tidalx Ai Inc. | Multi-chamber lighting controller for aquaculture |
US11657498B2 (en) | 2020-04-10 | 2023-05-23 | X Development Llc | Multi-chamber lighting controller for aquaculture |
US11170209B1 (en) | 2020-04-21 | 2021-11-09 | InnovaSea Systems, Inc. | Systems and methods for fish volume estimation, weight estimation, and analytic value generation |
WO2021216343A1 (en) * | 2020-04-21 | 2021-10-28 | InnovaSea Systems, Inc. | Systems and methods for fish volume estimation, weight estimation, and analytic value generation |
CN111597937B (en) * | 2020-05-06 | 2023-08-08 | 京东科技信息技术有限公司 | Fish posture recognition method, device, equipment and storage medium |
CN111597937A (en) * | 2020-05-06 | 2020-08-28 | 北京海益同展信息科技有限公司 | Fish posture recognition method, device, equipment and storage medium |
US11825816B2 (en) | 2020-05-21 | 2023-11-28 | X Development Llc | Camera controller for aquaculture behavior observation |
US20210368747A1 (en) * | 2020-05-28 | 2021-12-02 | X Development Llc | Analysis and sorting in aquaculture |
US20210368748A1 (en) * | 2020-05-28 | 2021-12-02 | X Development Llc | Analysis and sorting in aquaculture |
US11688154B2 (en) | 2020-05-28 | 2023-06-27 | X Development Llc | Analysis and sorting in aquaculture |
US12051231B2 (en) | 2020-05-28 | 2024-07-30 | X Development Llc | Analysis and sorting in aquaculture |
WO2021242368A1 (en) * | 2020-05-28 | 2021-12-02 | X Development Llc | Analysis and sorting in aquaculture |
US20220000079A1 (en) * | 2020-07-06 | 2022-01-06 | Ecto, Inc. | Acoustics augmentation for monocular depth estimation |
WO2022010815A1 (en) * | 2020-07-06 | 2022-01-13 | Ecto, Inc. | Acoustics augmentation for monocular depth estimation |
CN112215874B (en) * | 2020-09-02 | 2022-05-31 | 浙江省海洋水产研究所 | Behavior monitoring system for bay area fry |
CN112215874A (en) * | 2020-09-02 | 2021-01-12 | 浙江省海洋水产研究所 | Behavior monitoring system for bay area fry |
WO2022103604A1 (en) * | 2020-11-10 | 2022-05-19 | X Development Llc | Image processing-based weight estimation for aquaculture |
US12131568B2 (en) | 2020-11-10 | 2024-10-29 | Tidalx Ai Inc. | Image processing-based weight estimation for aquaculture |
US11615638B2 (en) | 2020-11-10 | 2023-03-28 | X Development Llc | Image processing-based weight estimation for aquaculture |
WO2022104210A1 (en) * | 2020-11-16 | 2022-05-19 | Woods Hole Oceanographic Institution | Aquaculture monitoring system and method |
US20230217906A1 (en) * | 2020-11-16 | 2023-07-13 | Woods Hole Oceanographic Institution | Aquaculture monitoring system and method |
US11516997B2 (en) | 2020-11-24 | 2022-12-06 | X Development Llc | Escape detection and mitigation for aquaculture |
WO2022115142A1 (en) * | 2020-11-24 | 2022-06-02 | X Development Llc | Escape detection and mitigation for aquaculture |
JP7494392B2 (en) | 2020-11-24 | 2024-06-03 | エックス デベロップメント エルエルシー | Escape detection and mitigation for aquaculture |
US11778991B2 (en) | 2020-11-24 | 2023-10-10 | X Development Llc | Escape detection and mitigation for aquaculture |
US11690359B2 (en) | 2020-12-23 | 2023-07-04 | X Development Llc | Self-calibrating ultrasonic removal of ectoparasites from fish |
US12185701B2 (en) | 2020-12-23 | 2025-01-07 | Tidalx Ai Inc. | Self-calibrating ultrasonic removal of ectoparasites from fish |
US11490601B2 (en) | 2020-12-23 | 2022-11-08 | X Development Llc | Self-calibrating ultrasonic removal of ectoparasites from fish |
WO2022171267A1 (en) * | 2021-02-09 | 2022-08-18 | Aquaeasy Pte. Ltd. | System, method, and computer executable code for organism quantification |
US12185679B2 (en) | 2021-04-16 | 2025-01-07 | Tidalx Ai Inc. | Control systems for autonomous aquaculture structures |
US11533861B2 (en) | 2021-04-16 | 2022-12-27 | X Development Llc | Control systems for autonomous aquaculture structures |
US12192634B2 (en) | 2021-05-03 | 2025-01-07 | Tidalx Ai Inc. | Automated camera positioning for feeding behavior monitoring |
US11711617B2 (en) | 2021-05-03 | 2023-07-25 | X Development Llc | Automated camera positioning for feeding behavior monitoring |
CN113197145A (en) * | 2021-05-08 | 2021-08-03 | 浙江大学 | Fish biomass estimation system based on recurrent neural network and infrared measurement grating |
CN113197145B (en) * | 2021-05-08 | 2022-02-08 | 浙江大学 | Fish biomass estimation system based on recurrent neural network and infrared measurement grating |
US11778127B2 (en) | 2021-05-10 | 2023-10-03 | X Development Llc | Enhanced synchronization framework |
US12081894B2 (en) | 2021-05-10 | 2024-09-03 | Tidalx Ai Inc. | Enhanced synchronization framework |
US11611685B2 (en) | 2021-05-10 | 2023-03-21 | X Development Llc | Enhanced synchronization framework |
US11864536B2 (en) | 2021-05-14 | 2024-01-09 | X Development Llc | State-specific aquaculture feeder controller |
US12078533B2 (en) | 2021-06-02 | 2024-09-03 | Tidalx Ai Inc. | Underwater camera as light sensor |
US12077263B2 (en) | 2021-06-14 | 2024-09-03 | Tidalx Ai Inc. | Framework for controlling devices |
US11821158B2 (en) | 2021-07-12 | 2023-11-21 | X Development Llc | Autonomous modular breakwater system |
US12051222B2 (en) | 2021-07-13 | 2024-07-30 | X Development Llc | Camera calibration for feeding behavior monitoring |
US11737434B2 (en) | 2021-07-19 | 2023-08-29 | X Development Llc | Turbidity determination using computer vision |
US11700839B2 (en) | 2021-09-01 | 2023-07-18 | X Development Llc | Calibration target for ultrasonic removal of ectoparasites from fish |
US11623536B2 (en) | 2021-09-01 | 2023-04-11 | X Development Llc | Autonomous seagoing power replenishment watercraft |
NO20211066A1 (en) * | 2021-09-06 | 2023-03-07 | Nornet As | A guiding device, system and method for sorting fish |
US20230084807A1 (en) * | 2021-09-16 | 2023-03-16 | Intrinsic Innovation Llc | Systems and methods for 3-d reconstruction and scene segmentation using event cameras |
US12189353B2 (en) | 2021-11-19 | 2025-01-07 | Tidalx Ai Inc. | Watercraft servicing system |
US11877549B2 (en) | 2021-11-22 | 2024-01-23 | X Development Llc | Controller for seaweed farm |
CN114418930A (en) * | 2021-11-23 | 2022-04-29 | 吉林大学 | Underwater whale target detection method based on lightweight YOLOv4 |
US12137674B2 (en) | 2021-12-02 | 2024-11-12 | Tidalx Ai Inc. | Underwater feed movement detection |
WO2023101746A1 (en) * | 2021-12-02 | 2023-06-08 | X Development Llc | Underwater camera biomass prediction aggregation |
US11842473B2 (en) | 2021-12-02 | 2023-12-12 | X Development Llc | Underwater camera biomass prediction aggregation |
US11864535B2 (en) | 2021-12-21 | 2024-01-09 | X Development Llc | Mount for a calibration target for ultrasonic removal of ectoparasites from fish |
CN114241031B (en) * | 2021-12-22 | 2024-05-10 | 华南农业大学 | Fish body size measurement and weight prediction method and device based on dual-view fusion |
CN114241031A (en) * | 2021-12-22 | 2022-03-25 | 华南农业大学 | Method and device for fish body size measurement and body weight prediction based on dual-view fusion |
CN114626952A (en) * | 2022-01-28 | 2022-06-14 | 海南大学 | Novel method for accurate measurement of fish morphological characteristics |
US12100201B2 (en) * | 2022-02-23 | 2024-09-24 | Aquabyte, Inc. | Multi-modal aquatic biomass estimation |
US20230267731A1 (en) * | 2022-02-23 | 2023-08-24 | Aquabyte, Inc. | Multi-modal aquatic biomass estimation |
GR1010400B (en) * | 2022-04-06 | 2023-02-03 | Ελληνικο Κεντρο Θαλασσιων Ερευνων (Ελ.Κε.Θε.) | Method and system for non-invasive fish size measurement in fish farming |
US20230326226A1 (en) * | 2022-04-08 | 2023-10-12 | X Development Llc | Monocular underwater camera biomass estimation |
WO2023215175A1 (en) * | 2022-05-04 | 2023-11-09 | X Development Llc | Underwater camera biomass distribution forecast |
CN115114867A (en) * | 2022-07-11 | 2022-09-27 | 中国水利水电科学研究院 | Method for predicting biomass of emergent aquatic plant |
CN115114867B (en) * | 2022-07-11 | 2023-07-11 | 中国水利水电科学研究院 | Emergent aquatic plant biomass prediction method |
US20240037917A1 (en) * | 2022-07-28 | 2024-02-01 | Softbank Corp. | Information processing method, non-transitory computer-readable storage medium, and information processing device |
WO2024149842A1 (en) * | 2023-01-13 | 2024-07-18 | WINGS ICT Solutions Private Company | Automated system and method for estimating average fish weight in an aquaculture environment |
WO2024163344A1 (en) * | 2023-01-30 | 2024-08-08 | X Development Llc | End-to-end differentiable fin fish biomass model |
WO2024237795A1 (en) * | 2023-05-16 | 2024-11-21 | Fiskher As | A method for estimating a weight of a fish |
CN116883828A (en) * | 2023-08-22 | 2023-10-13 | 中国科学院水生生物研究所 | Intelligent fish growth performance identification method and analysis system |
CN116883828B (en) * | 2023-08-22 | 2023-11-24 | 中国科学院水生生物研究所 | Intelligent fish growth performance identification method and analysis system |
US12229937B2 (en) | 2023-10-19 | 2025-02-18 | Tidalx Ai Inc. | Underwater camera biomass prediction aggregation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019232247A1 (en) | Biomass estimation in an aquaculture environment | |
EP3843542B1 (en) | Optimal feeding based on signals in an aquaculture environment | |
WO2020046524A1 (en) | Automatic feed pellet monitoring based on camera footage in an aquaculture environment | |
US11113539B2 (en) | Fish measurement station keeping | |
WO2020023467A1 (en) | Unique identification of freely swimming fish in an aquaculture environment | |
Yu et al. | Segmentation and measurement scheme for fish morphological features based on Mask R-CNN | |
Costa et al. | Extracting fish size using dual underwater cameras | |
WO2019245722A1 (en) | Sea lice detection and classification in an aquaculture environment | |
Biskup et al. | A stereo imaging system for measuring structural parameters of plant canopies | |
Zhang et al. | Intelligent fish feeding based on machine vision: A review | |
JP7350181B2 (en) | Camera winch control for dynamic surveillance | |
Voskakis et al. | Deep learning based fish length estimation. An application for the Mediterranean aquaculture | |
CN115797844A (en) | Fish body fish disease detection method and system based on neural network | |
WO2023163886A1 (en) | Multi-modal aquatic biomass estimation | |
WO2022075853A1 (en) | Generating three-dimensional skeleton representations of aquatic animals using machine learning | |
Li et al. | Advances in the application of stereo vision in aquaculture with emphasis on fish: A review | |
EP4482301A1 (en) | Forecasting growth of aquatic organisms in an aquaculture environment | |
CN116912671A (en) | Intelligent fish feeding method and device | |
Napier et al. | Using mobile-based augmented reality and object detection for real-time Abalone growth monitoring | |
Zhao et al. | A review of deep learning-based stereo vision techniques for phenotype feature and behavioral analysis of fish in aquaculture | |
Sravanthi et al. | RETRACTED ARTICLE: Efficient image-based object detection for floating weed collection with low cost unmanned floating vehicles | |
Cao et al. | Learning-based low-illumination image enhancer for underwater live crab detection | |
JP2021152782A (en) | Individual detection system, photography unit, individual detection method, and computer program | |
EP4395539A1 (en) | Artificial intelligence and vision-based broiler body weight measurement system and process | |
EP4008179A1 (en) | Method and system for determining biomass of aquatic animals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19732805; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 19732805; Country of ref document: EP; Kind code of ref document: A1 |
|