Please use this identifier to cite or link to this item: http://hdl.handle.net/1947/9122

Title: Automatic extraction of 3D models from an airborne video sequence.
Report number: DSTO-TR-2095
AR number: AR-014-090
Classification: Unclassified
Report type: Technical Report
Authors: Cooke, T.
Issue Date: 2008-01
Division: Intelligence, Surveillance and Reconnaissance Division
Abbreviation: ISRD
Release authority: Chief, Intelligence, Surveillance and Reconnaissance Division
Task sponsor: ASC (DIGO)
Task number: INT 04/028
Pages or format: 54
References: 40
DSTORL/DEFTEST terms: Airborne equipment
Air surveillance
Algorithms
Image recognition
Geolocation
Error analysis
Abstract: One method for accurately georegistering a video sequence from an airborne platform is to transform the video to the same coordinate system as some reference imagery that is already georeferenced. This transformation will be dependent upon the 3D structure within the scene, which is not known a priori. The current report examines several aspects of the construction of a 3D model from a video sequence, which may then be used for registration. The topics examined include: extraction of useful features (points, lines, or planes) from the images, determination of a sparse 3D model and camera motion model in cases where data may be missing, a method for estimating the depth at every pixel within a video frame, and finally an analysis of the errors at each step of the model construction process.
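The feature-extraction step mentioned in the abstract relies on detecting stable point features in each video frame. As an illustrative sketch only (the report evaluates several corner detectors, and this is not necessarily the detector or tuning used there), the classic Harris corner response can be computed as follows; the function name and parameter values are hypothetical:

    # Minimal Harris corner response sketch -- illustrative only,
    # not the report's exact detector or parameter settings.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def harris_response(img, sigma=1.5, k=0.05):
        """img: 2D float array (greyscale frame). Returns a corner response map."""
        Iy, Ix = np.gradient(img.astype(float))
        # Smooth the entries of the local structure tensor.
        Sxx = gaussian_filter(Ix * Ix, sigma)
        Syy = gaussian_filter(Iy * Iy, sigma)
        Sxy = gaussian_filter(Ix * Iy, sigma)
        det = Sxx * Syy - Sxy ** 2
        trace = Sxx + Syy
        # Large positive responses mark corners; features are taken
        # at local maxima of this map and tracked between frames.
        return det - k * trace ** 2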
Executive summary: Information from airborne video can only be exploited when the scene imaged by the sensor can be associated with a point on the ground. While the video stream may contain embedded metadata providing information such as platform position and camera orientation, this is sufficient to give only a crude geolocation for the imagery. More precise positioning can be obtained by matching the video imagery against an image (such as from an aerial survey) which has already been accurately georeferenced. Automatic matching is hampered by the fact that the video and the reference image were likely taken from different camera positions. Most existing image registration algorithms assume that the two images to be registered differ by a simple linear, quadratic, or perspective transformation. Images taken from different perspectives will also have a dependence on the depth of the scene, which is not easily parameterised. One method for dealing with this is to automatically construct a 3D model from the video data, and then reproject it as it would be seen from the point of view of the reference image. This would leave only a simple translation between the images.

The current report describes new and existing methods which may be used for the automatic extraction of a 3D model from airborne video imagery. An overview of the system which uses such a model for the georegistration of video imagery is the subject of a separate report [8]. There are three key steps to the construction of a 3D model, as described in this report.

The first is the detection and tracking of features through the video sequence. Corner detectors have mostly been described previously in DSTO-TR-1759, so this report provides some additional information on a few corner detectors not previously tested, together with a short section on line detection.

The second step is to estimate the camera pose for each frame and the 3D position of each tracked feature. When all points are successfully tracked over all frames, the factorisation method can be used to estimate all of the required parameters. This report considers several methods for dealing with the more realistic case where large amounts of data are missing. The most successful of these appears to be a robust modification of Tomasi and Kanade's hallucination algorithm, followed by iterations of Shum's method. A factorisation method based on tracked lines, instead of tracked points, is also mentioned.

The output of the second step is a sparse 3D model. To successfully model the entire scene, the depth of the scene at each image pixel is required. This report considers several approaches, including segmenting the corner points into facets, and dense stereo matching algorithms. Dense matching using graph cuts was by far the most successful of the tested techniques and produces a model which is expected to be useful for any subsequent 2D image registration step.

Following the descriptions of how to obtain a 3D model from a video sequence, this report also describes a framework for quantifying the errors in the construction of the model. Simulations of this framework using real airborne video imagery indicate that the resulting error estimates will be similar to the actual errors.
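For the second step above, the following is a minimal sketch of the basic Tomasi-Kanade factorisation under an orthographic camera model, assuming complete tracks with no missing data; the robust, missing-data variants discussed in the report build on this step. Names and array shapes are illustrative, and the affine-to-metric upgrade is omitted for brevity:

    # Minimal Tomasi-Kanade factorisation sketch (orthographic camera,
    # no missing data) -- illustrative of the basic method only.
    import numpy as np

    def factorise(tracks):
        """tracks: array of shape (F, P, 2) -- P points tracked over F frames."""
        F, P, _ = tracks.shape
        # Stack x then y coordinates into the 2F x P measurement matrix W.
        W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
        # Register each row to its centroid; this removes camera translation.
        W = W - W.mean(axis=1, keepdims=True)
        # Under orthography the centred W has rank at most 3, so a rank-3
        # SVD separates camera motion (M) from 3D shape (S): W ~ M @ S.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        M = U[:, :3] * np.sqrt(s[:3])         # 2F x 3 camera rows
        S = np.sqrt(s[:3])[:, None] * Vt[:3]  # 3 x P sparse point cloud
        # A full implementation would next apply a metric upgrade to
        # resolve the remaining affine ambiguity in M and S.
        return M, S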
Appears in Collections:DSTO Formal Reports

Files in This Item:

File: DSTO-TR-2095 PR.pdf
Size: 6.34 MB
Format: Adobe PDF

Items in DSTO Publications Online are protected by copyright, with all rights reserved, unless otherwise indicated.
