Abstract
This paper details the common algorithms, processes, and research applications that make up scene understanding, a subfield of computer vision. Scene understanding focuses on semantic segmentation and object recognition. Its objective is to enable a computer to understand images much as the human brain does; image understanding, in turn, allows computers to perform certain autonomous activities such as self-driving. The paper delves into how semantic segmentation enables computers to differentiate and understand images, touching on the computations that segmentation uses to select and differentiate image regions. Further, it examines the process of indoor-outdoor scene classification to show how computers differentiate and classify scenes.
Keywords
Semantic segmentation, convolutional neural networks, indoor-outdoor classification, computer vision
Introduction
Computer vision is a field of computer science involving artificial intelligence. It gives computers a visual understanding of the world by enabling them to acquire, process, and interpret images much as humans do. From processing those images, computer vision enables computers to provide appropriate feedback by interacting with them. In practice, though, imparting human visual knowledge to computers is a difficult task, and scene understanding and image processing remain challenging precisely because they require programming computers to interpret scenes the way humans do. For instance, autonomous driving is a human task involving numerous reactions to different stimuli; getting a computer to emulate human behaviour therefore requires extensive programming to make it react appropriately to each stimulus.
The goal of computer vision is not only to see images but also to process them and provide useful results based on the observations. For instance, computer vision allows computers to infer 3D structure from 2D images for interaction. In the case of autonomous driving, a self-driving car needs to know when to stop, when to go, where the road is, and where pedestrians are walking along it. As a scientific field, computer vision is concerned with the principles and expertise of creating artificial systems that extract information from multi-faceted data.
History of Computer Vision
Computer vision was founded in the early 1960s, when Larry Roberts attempted to extract 3D geometric information from 2D perspective views of blocks. Ever since, scientists studying artificial intelligence have followed his work, studying computer vision from the perspective of this block world. The block-world perspective was considered low level, and researchers needed to conduct further research on low-level vision tasks in order to process pictures of the real world; subsequent studies in the discipline involved edge detection and segmentation. In 1978, David Marr proposed a bottom-up approach that gave rise to scene understanding. In this approach, low-level image processing procedures are applied to 2D pictures to obtain primal sketches, from which oriented segments produce 2.5D sketches of the processed scenes using cues such as binocular stereo. With the emergence of high-level techniques, processes such as structural analysis appeared. Structural analysis techniques, which draw on a priori understanding, were used to obtain 3D model representations of objects. These high-level techniques are considered perhaps the most instrumental work ever done in computer vision.
Nonetheless, more recent research has uncovered limitations of this approach to scene understanding. It is challenging to carry out, and, more importantly, researchers agree that obtaining complete 3D object models is often unnecessary. For instance, in autonomous driving it is only necessary to identify whether an object is moving towards or away from the car; the exact 3D motion of the object might not be essential for navigation. This newer model is sometimes called purposive vision, suggesting that its processes are more goal-driven and can be qualitative in most cases. Historically, the trend has been to merge computer vision with closely associated fields such as image processing and photogrammetry. Image processing involves the processing of raw images before additional examination, while photogrammetry involves calibrating the cameras used in imaging.
By intending to emulate human vision, computer vision faces considerable challenges, since the human visual system proves far more capable at many tasks, such as face identification. By comparison, computer vision has fallen short of solving such research problems satisfactorily. Human vision can identify faces under all sorts of variation: changes in viewpoint, illumination, and appearance. One strength of human vision is the brain's vast capacity to store and recall face images. Researchers are still trying to figure out how to represent and distill this extensive human understanding in computers in a way that is easy to retrieve, and how to carry out the numerous computations required by tasks such as face identification so that they can be performed in real time.
Semantic Segmentation
Semantic segmentation involves dividing a digital picture into various segments, each comprising a set of pixels sometimes referred to as super-pixels. Segmentation is used to streamline the representation of an image: it simplifies the representation and gives it more meaning that is easier to analyze and understand. It allows identification of boundaries between objects by identifying lines and curves in the image. After the boundaries are defined, pixels that share similar characteristics are assigned the same label to form a segment; the image then consists of sections that collectively cover the entire picture. The classification of pixels depends on a computed property such as texture or color intensity, and these properties differentiate segments from one another. After segmentation, the resulting contours can be used to build 3D reconstructions through interpolation algorithms.
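As a minimal illustration of this idea, not tied to any particular library, a segmentation can be stored as an integer label map with the same shape as the image; the sketch below fabricates a tiny label map (the values are arbitrary) and extracts the binary mask for one segment.

```python
import numpy as np

# A segmentation is just a label map: one integer label per pixel.
# This tiny 4x4 label map stands in for a segmented image.
labels = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
])

segment_id = 1
mask = labels == segment_id          # boolean mask of that segment's pixels
print("pixels in segment", segment_id, ":", mask.sum())
```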
Types of Semantic Segmentation
Thresholding
Thresholding is the simplest method of semantic segmentation. It depends on a threshold value that turns a gray-scale image into a binary picture. The key step is selecting the right threshold, since thresholds that can be chosen automatically make segmentation robust. Automatic threshold selection depends on knowledge about the objects, the application, and the environment. Other characteristics that can help include the intensity characteristics of the objects, their sizes, the fraction of the image they occupy, and the number of different objects appearing in the picture.
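One standard way to choose the threshold automatically is Otsu's method, which picks the gray level that best separates the histogram into two classes. The sketch below is a plain-NumPy illustration of that idea on a synthetic image (the image and all names are invented for the example), not a reference implementation.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick a global threshold by maximizing between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic gray-scale image: dark background with a brighter object.
rng = np.random.default_rng(0)
gray = rng.integers(0, 80, (64, 64))
gray[16:48, 16:48] = rng.integers(150, 255, (32, 32))
gray = gray.astype(np.uint8)

t = otsu_threshold(gray)
binary = gray >= t                   # the binary segmentation
print("chosen threshold:", t)
```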
Clustering Method
Clustering involves grouping objects into clusters: objects that are similar to one another are grouped together and separated from those that are dissimilar. It is deemed the most significant unsupervised discovery problem, as it deals with finding structure in a collection of unlabeled data. K-means is the simplest algorithm among all clustering methods. It follows a simple procedure for classifying a given data set into a given number of clusters, minimizing an objective function based on squared error. The algorithm involves the following steps: first, k points are placed in the space spanned by the objects to be grouped, and these points serve as the initial cluster centroids. Secondly, each object is assigned to the cluster with the closest centroid. Lastly, when all objects have been assigned, the positions of the k centroids are recalculated. The procedure is repeated until the centroids no longer move. It produces a partition of the objects into clusters from which the metric to be minimized can be computed.
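The sketch below follows those three steps directly in plain NumPy, clustering the RGB values of a synthetic image into k groups so that each pixel receives a segment label. It is a minimal illustration under those assumptions (random initialization, synthetic data), not a production implementation.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: seed centroids, assign each point to the
    nearest centroid, recompute centroids, repeat until stable."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        new = np.array([points[assign == j].mean(axis=0) if (assign == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # centroids stopped moving
            break
        centroids = new
    return assign, centroids

# Segment a synthetic color image by clustering its RGB values.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, (32, 32, 3)).astype(float)
labels, centers = kmeans(image.reshape(-1, 3), k=3)
label_map = labels.reshape(32, 32)     # per-pixel segment labels
```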
Motion and Interactive Segmentation
Motion segmentation denotes clustering pixels that undergo a common motion, segmenting a picture into moving objects. It is a powerful cue for image comprehension and scene examination: in semantic segmentation, interpreting motion across successive frames is as essential as identifying texture or color. Motion segmentation is mostly used in video indexing and tracking, where it divides images into foreground and background objects on which separate motion analysis can be performed. Another application of motion segmentation is video compression. Several encoding standards, such as MPEG, represent an image as a succession of objects in a chain of layers; in such encoding, an object must be identified before it can be encoded.
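The simplest form of this foreground/background split is frame differencing: pixels whose intensity changes between consecutive frames are tagged as moving. The sketch below illustrates that idea on a hypothetical pair of frames (the data and threshold are invented for the example); real motion segmentation is considerably more involved.

```python
import numpy as np

# Hypothetical consecutive gray-scale frames: a bright block
# shifts a few pixels to the right between frame 0 and frame 1.
frame0 = np.zeros((48, 48))
frame1 = np.zeros((48, 48))
frame0[10:20, 10:20] = 1.0
frame1[10:20, 14:24] = 1.0           # same object, moved right

# Pixels whose intensity changed beyond a tolerance are tagged
# as "moving"; everything else is treated as static background.
motion_mask = np.abs(frame1 - frame0) > 0.5
print("moving pixels:", motion_mask.sum())
```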
Compression-Based Methods
Compression-based methods posit that the optimal segmentation is the one that minimizes the coding length of the data over all possible segmentations. The connection is that segmentation exploits patterns in images to compress them: each segment in the image is described by its color, texture, and boundary shape, and a probability model describes each of these components. When computing the coding length, the method assumes that boundary encoding exploits the fact that regions in natural images tend to have smooth contours; therefore, the smoother a boundary is, the shorter its coding length becomes. The method also assumes that texture is encoded by lossy compression, in line with the minimum description length principle.
Histogram-Based Methods
Histogram-based segmentation methods require only one pass through the pixels, which makes them among the most efficient segmentation procedures. In this method, a histogram is computed from all of the pixels in the image, and the clusters are located using the peaks and valleys of the histogram, with the measure taken from color or intensity. The method can also be applied recursively, dividing the clusters in the image into smaller groups for a more refined segmentation; the procedure is repeated with each smaller cluster until no more groups are formed. Identifying peaks and valleys can sometimes be difficult, however, which makes the method hard to use.
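A minimal sketch of the peak-and-valley idea follows, assuming a bimodal synthetic image: smooth the gray-level histogram, locate its two dominant peaks, and split at the deepest valley between them. All names and data here are invented for illustration.

```python
import numpy as np

def valley_threshold(gray, smooth=5):
    """Smooth the gray-level histogram, find its two dominant peaks,
    and return the deepest valley between them as a split point."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist, kernel, mode="same")    # light smoothing
    peaks = [i for i in range(1, 255)
             if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]]
    if len(peaks) < 2:
        return None                                  # no clear bimodality
    p0, p1 = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])
    return p0 + int(np.argmin(hist[p0:p1 + 1]))      # valley between peaks

# Bimodal synthetic image: two intensity populations around 60 and 180.
rng = np.random.default_rng(2)
gray = np.concatenate([rng.normal(60, 10, 2000),
                       rng.normal(180, 10, 2000)])
gray = np.clip(gray, 0, 255).astype(np.uint8).reshape(40, 100)
print("valley at:", valley_threshold(gray))
```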
Edge Detection
Edge detection is one of the well-developed fields in image processing. It is often used as the basis of a segmentation technique, since region boundaries and edges are closely related: region boundaries are often marked by sharp changes in color intensity. The edges detected during segmentation are frequently disconnected, yet object segmentation requires closed regions, so the desired edges form the boundaries between spatial-taxons. Spatial-taxons are granules of information consisting of definite pixel regions positioned at levels of abstraction within a hierarchically nested scene architecture. They are similar to the Gestalt psychological notion of figure-ground, though the notion is extended to include the foreground and other objects.
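As one concrete instance of edge detection, the sketch below computes a Sobel gradient magnitude in plain NumPy: pixels where intensity changes sharply get large responses, which is exactly the cue edge-based segmentation builds on. The image and loop-based convolution are simplified for illustration.

```python
import numpy as np

def sobel_edges(gray):
    """Gradient magnitude via 3x3 Sobel filters; large values mark
    the sharp intensity changes used as region boundaries."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = gray[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

# A step edge: left half dark, right half bright.
gray = np.zeros((16, 16))
gray[:, 8:] = 1.0
edges = sobel_edges(gray)
print("strongest response on column:", edges.sum(axis=0).argmax())
```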
Dual Clustering Method
Dual clustering draws on three image characteristics: an image partition based on histogram analysis, verified by high cluster density and by high gradients at segment borders. Two spaces are therefore introduced: the one-dimensional histogram of brightness and the dual 3-dimensional space of the original image. The first space measures how compactly the brightness of the image is distributed, computed by minimal clustering; a brightness threshold then defines a binary (black-and-white) image. The goal of dual clustering is to find objects with definitive borders.
Region-Growing Method
In the region-growing method, the supposition is that neighboring pixels within one region share similar values. Starting from one or more seed pixels, a region grows by absorbing adjacent pixels whose values are sufficiently close to those of the region, and it stops when no neighboring pixel satisfies the similarity criterion.
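The sketch below illustrates that supposition directly, assuming a single seed, 4-connectivity, and a fixed intensity tolerance (all choices invented for the example): the region floods outward from the seed as long as neighbors stay within the tolerance.

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=0.1):
    """Grow a region from a seed pixel, absorbing 4-connected neighbors
    whose intensity is within `tol` of the seed's intensity."""
    h, w = gray.shape
    region = np.zeros((h, w), bool)
    region[seed] = True
    frontier = deque([seed])
    base = gray[seed]
    while frontier:
        i, j = frontier.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (0 <= ni < h and 0 <= nj < w and not region[ni, nj]
                    and abs(gray[ni, nj] - base) <= tol):
                region[ni, nj] = True
                frontier.append((ni, nj))
    return region

# Synthetic image: one bright square on a dark background.
gray = np.zeros((20, 20))
gray[5:15, 5:15] = 0.9
mask = region_grow(gray, seed=(10, 10))
print("region size:", mask.sum())    # the 100 pixels of the square
```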