AOMedia Video 1 (AV1) is an open, royalty-free video coding format designed for video transmissions over the Internet. It is being developed by the Alliance for Open Media (AOMedia), a consortium founded in 2015 of firms from the semiconductor industry, video on demand providers, and web browser developers.
AV1 is meant to succeed its predecessor VP9 and compete with HEVC/H.265 from the Moving Picture Experts Group. It is the primary contender for standardization by the video standard working group NetVC of the Internet Engineering Task Force (IETF). The group has put together a list of criteria to be met by the new video standard.
AV1 is intended to be used together with the audio format Opus in a future version of the WebM container format for HTML5 web video and WebRTC.
History
The first official announcement of the project came with the press release on the formation of the Alliance on 1 September 2015. The increased usage of its predecessor VP9 is attributed to confidence in the Alliance and in the development of AV1, as well as to the costly and complicated licensing situation of HEVC (High Efficiency Video Coding).
The roots of the project precede the Alliance, however. Individual contributors started experimental technology platforms years before: Xiph's/Mozilla's Daala already published code in 2010, VP10 was announced on 12 September 2014, and Cisco's Thor was published on 11 August 2015. The first version 0.1.0 of the AV1 reference codec was published on 7 April 2016.
A soft feature freeze took place at the end of October 2017, but it was decided to continue developing a few significant features beyond it. The bitstream format was projected to be frozen in January 2018; however, this was delayed due to unresolved critical bugs as well as final changes to transformations, syntax, the prediction of motion vectors, and the completion of legal analysis. The Alliance announced the release of the AV1 bitstream specification on 28 March 2018, along with a reference, software-based encoder and decoder. However, as of 29 March 2018, the specification is still being edited, and is marked "draft" until editing finishes.
Martin Smole from AOM member Bitmovin admits that the computational efficiency of the reference encoder is the greatest remaining challenge after the bitstream format freeze. While the format was still being worked on, the encoder was not targeted at production use and did not receive any speed optimizations. It therefore works orders of magnitude slower than, for example, existing HEVC encoders, and development is planned to shift its focus towards maturing the reference encoder after the freeze.
Purpose
AV1 aims to be a video format for the web that is both state of the art and royalty free. The mission of the Alliance for Open Media remains the same as the mission of the WebM project.
To fulfill the goal of being royalty free, the development process is such that no feature is adopted before it has been independently double-checked that it does not infringe on patents of competing companies. This contrasts with its main competitor HEVC, for which IPR review was not part of the standardization process. The latter practice is stipulated in ITU-T's definition of an open standard. The case of HEVC's independent patent pools has been characterized by critical observers as a failure of price management.
Under patent rules adopted from the World Wide Web Consortium (W3C), technology contributors license their AV1-connected patents to anyone, anywhere, anytime based on reciprocity, i.e. as long as the user does not engage in patent litigation. As a defensive condition, anyone engaging in patent litigation loses the right to the patents of all patent holders.
The performance goals include "a step up from VP9 and HEVC" in efficiency for a low increase in complexity. NETVC's efficiency goal is 25% improvement over HEVC. The primary complexity concern is for software decoding, since hardware support will take time to reach users. However, for WebRTC, live encoding performance is also relevant, which is Cisco's agenda: Cisco is a manufacturer of videoconferencing equipment, and their Thor contributions aim at "reasonable compression at only moderate complexity".
Feature-wise, it is specifically designed for real-time applications (especially WebRTC) and for higher resolutions (wider color gamuts, higher frame rates, UHD) than the typical usage scenarios of the current generation of video formats (H.264), which is where it is expected to achieve its biggest efficiency gains. It is therefore planned to support the color space from ITU-R Recommendation BT.2020 and 10 and 12 bits of precision per color component. AV1 is primarily intended for lossy encoding, although lossless compression is supported as well.
AV1-based containers have also been proposed as a replacement for JPEG, similar to Better Portable Graphics and High Efficiency Image File Format which wrap HEVC.
Technology
AV1 is a traditional block-based frequency transform format featuring new techniques, several of which were developed in experimental formats that have been testing technology for a next-generation format after HEVC and VP9. Based on Google's experimental VP9 evolution project VP10, AV1 incorporates additional techniques developed in Xiph's/Mozilla's Daala and Cisco's Thor.
The Alliance publishes a reference implementation written in C and assembly language (aomenc, aomdec) as free software under the terms of the BSD 2-Clause License. Development happens in public and is open for contributions, regardless of AOM membership.
The development process is such that coding tools are added to the reference codebase as experiments, controlled by flags that enable or disable them at build time, for review by other group members as well as specialized teams that help with and ensure hardware friendliness and compliance with intellectual property rights (TAPAS). Once the feature gains some support in the community, the experiment can be enabled by default, and ultimately have its flag removed when all of the reviews are passed. Experiment names are lowercased in the configure script and uppercased in conditional compilation flags.
Data transformation
To transform pixel data to the frequency domain, AV1 includes a range of specialized frequency transforms like rectangular versions of the DCT and asymmetric versions of the DST for edge blocks.
It can combine two one-dimensional transforms in order to use different transforms for the horizontal and the vertical dimension (ext_tx).
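The idea of pairing different one-dimensional transforms per direction can be illustrated with a minimal sketch. It assumes floating-point, orthonormal textbook kernels (a DCT-II and a DST-VII as a stand-in for an asymmetric sine transform); the function names are hypothetical, and the integer kernels actually used by the codec may differ.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (textbook form, not AV1's integer kernel)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dst7(x):
    """Orthonormal DST-VII; its first basis function grows away from the block edge."""
    n = len(x)
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [
        scale * sum(
            x[i] * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
            for i in range(n)
        )
        for k in range(n)
    ]

def transform_2d(block, horizontal, vertical):
    """Apply `horizontal` to every row, then `vertical` to every column."""
    rows = [horizontal(list(row)) for row in block]
    cols = [vertical(list(col)) for col in zip(*rows)]
    # cols is indexed [kx][ky]; transpose back to [ky][kx].
    return [list(r) for r in zip(*cols)]

block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mixed = transform_2d(block, dct2, dst7)  # DCT horizontally, DST vertically
```

Because both kernels are orthonormal, any pairing preserves the signal energy, so an encoder is free to pick the combination that compacts a given residual best.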
Partitioning
Prediction can happen for bigger units (up to 128×128), and they can be subpartitioned in more ways. "T-shaped" partitioning schemes for coding units are introduced, a feature developed for VP10. Two separate predictions can now be used on spatially different parts of a block using a smooth, wedge-shaped transition line (wedge-partitioned prediction). This enables more accurate separation of objects without the traditional staircase lines along the boundaries of square blocks.
More encoder parallelism is possible thanks to configurable prediction dependency between tile rows.
Prediction
AV1 performs internal processing in higher precision (10 or 12 bits per sample), which leads to compression improvement due to smaller rounding errors in reference imagery.
Predictions can be combined in more advanced ways (than a uniform average) in a block (compound prediction), including smooth and sharp transition gradients in different directions (wedge-partitioned prediction) as well as implicit masks that are based on the difference between the two predictors. This allows combination of either two inter predictions or an inter and an intra prediction to be used in the same block.
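A masked blend of two predictors can be sketched as follows. This is only an illustration: the 0..64 weight range, the linear ramp, and the function names are assumptions made for the example, not the codec's actual mask tables.

```python
def blend_masked(pred_a, pred_b, mask, max_weight=64):
    """Per-pixel weighted average; mask holds the weight of pred_a (0..max_weight)."""
    out = []
    for row_a, row_b, row_m in zip(pred_a, pred_b, mask):
        out.append([
            # Integer blend with rounding offset, as is common in codec arithmetic.
            (m * a + (max_weight - m) * b + max_weight // 2) // max_weight
            for a, b, m in zip(row_a, row_b, row_m)
        ])
    return out

def vertical_wedge_mask(width, height, edge_x, ramp, max_weight=64):
    """Hypothetical wedge: full weight left of edge_x, fading to zero over `ramp` pixels."""
    row = []
    for x in range(width):
        t = (edge_x - x) / ramp + 0.5
        t = min(1.0, max(0.0, t))
        row.append(int(t * max_weight + 0.5))
    return [list(row) for _ in range(height)]

pred_a = [[100] * 8 for _ in range(2)]   # e.g. first inter prediction
pred_b = [[50] * 8 for _ in range(2)]    # e.g. second inter or an intra prediction
mask = vertical_wedge_mask(8, 2, edge_x=4, ramp=4)
blended = blend_masked(pred_a, pred_b, mask)
```

A uniform average is the special case where every mask entry is half of max_weight; a sharper or smoother transition only changes the ramp length.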
A frame can reference 6 instead of 3 of the 8 available frame buffers for temporal (inter) prediction.
The Warped Motion (warped_motion) and Global Motion (global_motion) tools in AV1 aim to reduce redundant information in motion vectors by recognizing patterns arising from camera motion. They implement ideas that preceding formats such as MPEG-4 ASP tried to exploit, albeit with a novel approach that works in three dimensions. There can be a set of warping parameters for a whole frame offered in the bitstream, or blocks can use a set of implicit local parameters that get computed based on surrounding blocks.
For intra prediction, there are 56 (instead of 8) angles for directional prediction and weighted filters for per-pixel extrapolation. The "TrueMotion" predictor was replaced with a Paeth predictor, which looks at the difference from the known pixel in the above-left corner to the pixels directly above and directly left of the new one and then chooses the one that lies in the direction of the smaller gradient as predictor. A palette predictor is available for blocks with very few colors, as in some computer screen content. Correlations between the luminosity and the color information can now be exploited with a predictor for chroma blocks that is based on samples from the luma plane (cfl). In order to reduce discontinuities along borders of inter-predicted blocks, predictors can be overlapped and blended with those of neighbouring blocks (overlapped block motion compensation).
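The Paeth selection rule mentioned above can be written compactly. The sketch below uses the classic formulation (estimate left + above - above_left, then pick the closest neighbour); the function names and the block helper, which predicts each pixel from the edge pixels in its row and column, are illustrative assumptions rather than the reference implementation.

```python
def paeth_predict(left, above, above_left):
    """Return whichever neighbour is closest to the gradient estimate."""
    base = left + above - above_left
    # min() keeps the first candidate on ties: left, then above, then above_left.
    return min((left, above, above_left), key=lambda c: abs(base - c))

def paeth_block(top_row, left_col, top_left):
    """Predict a block from the row above, the column to the left, and the corner."""
    return [
        [paeth_predict(left, above, top_left) for above in top_row]
        for left in left_col
    ]

predicted = paeth_block(top_row=[10, 20], left_col=[10, 30], top_left=10)
```

For example, with left 10, above 20 and above-left 10, the estimate is 20, so the pixel above is chosen: the values change horizontally, and the predictor follows the direction in which they stay constant.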
Quantization
AV1 has new optimized quantization matrices.
Filters
For the in-loop filtering step, the integration of Thor's constrained low-pass filter and Daala's directional deringing filter has been fruitful: the combined Constrained Directional Enhancement Filter (cdef) exceeds the results of using the original filters separately or together. It is an edge-directed conditional replacement filter that smooths blocks with configurable (signaled) strength roughly along the direction of the dominant edge to eliminate ringing artifacts.
There is also the loop restoration filter (loop_restoration) to remove blur artifacts due to block processing.
Film grain synthesis (film_grain) improves coding of noisy signals using a parametric video coding approach. Due to the randomness inherent to film grain noise, this signal component is traditionally either very expensive to code or prone to being damaged or lost, possibly leaving serious coding artefacts as residue. This tool circumvents these problems using analysis and synthesis, replacing parts of the signal with a visually similar synthetic texture, based solely on subjective visual impression instead of objective similarity. It removes the grain component from the signal, analyzes its non-random characteristics, and instead transmits only descriptive parameters to the decoder, which adds back a synthetic, pseudorandom noise signal that is shaped after the original component.
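The analysis and synthesis split can be illustrated with a deliberately simplified sketch. It assumes a single "strength" parameter and seeded Gaussian noise on a flat list of samples; the function names are hypothetical, and the real tool transmits a richer parameter set than this.

```python
import random
import statistics

def analyze_grain(noisy, denoised):
    """Encoder side: reduce the removed grain layer to one descriptive parameter."""
    residual = [n - d for n, d in zip(noisy, denoised)]
    return {"strength": statistics.pstdev(residual)}

def synthesize_grain(denoised, params, seed, bit_depth=8):
    """Decoder side: add back pseudorandom noise shaped by the parameters."""
    rng = random.Random(seed)  # same seed gives the same grain on every decode
    peak = (1 << bit_depth) - 1
    return [
        min(peak, max(0, round(p + rng.gauss(0.0, params["strength"]))))
        for p in denoised
    ]

noisy = [12, 8, 12, 8]
denoised = [10, 10, 10, 10]
params = analyze_grain(noisy, denoised)        # only this is transmitted
output = synthesize_grain(denoised, params, seed=1234)
```

The output is not sample-identical to the noisy source, which is the point: only the statistical character of the grain is preserved, at the cost of a few parameters instead of the noise itself.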
Entropy coding
Daala's entropy coder (daala_ec), a non-binary arithmetic coder, was selected for replacing VP9's binary entropy coder. The use of non-binary arithmetic coding helps evade patents, but also adds bit-level parallelism to an otherwise serial process, reducing clock rate demands on hardware implementations. In other words, it approaches the compression effectiveness of modern binary arithmetic coding such as CABAC while using a larger alphabet than binary and hence running faster, as Huffman coding does (though it is not as simple and fast as Huffman coding). AV1 also gained the ability to adapt the symbol probabilities in the arithmetic coder per coded symbol instead of per frame (ec_adapt).
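The benefit of adapting probabilities after every coded symbol can be shown with a toy model. The sketch below uses simple frequency counts over a multi-symbol alphabet and measures the ideal code length; the class name, the count-based update, and the helper are assumptions for illustration and are not the update rule used by the codec.

```python
import math

class AdaptiveSymbolModel:
    """Toy multi-symbol probability model driven by running counts."""

    def __init__(self, alphabet_size):
        self.counts = [1] * alphabet_size  # start from a uniform distribution

    def probability(self, symbol):
        return self.counts[symbol] / sum(self.counts)

    def update(self, symbol, step=1):
        self.counts[symbol] += step

def ideal_cost_bits(symbols, alphabet_size, adapt=True):
    """Sum of -log2(p) over the stream, i.e. what an ideal arithmetic coder would spend."""
    model = AdaptiveSymbolModel(alphabet_size)
    bits = 0.0
    for s in symbols:
        bits += -math.log2(model.probability(s))
        if adapt:
            model.update(s)  # per-symbol adaptation
    return bits

stream = [0] * 20 + [3]
static_bits = ideal_cost_bits(stream, alphabet_size=4, adapt=False)
adaptive_bits = ideal_cost_bits(stream, alphabet_size=4, adapt=True)
```

On this skewed stream the adaptive model spends far fewer bits than the static uniform one, because the probability of the dominant symbol rises as soon as it has been seen a few times rather than only at the next frame.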
Former experiments that have been fully integrated
This list may or may not be complete.
Current experiments
Only explained experiments are listed.
Notable features not included
Daala Transforms implements discrete cosine and sine transforms that its authors describe as "better in every way" than the txmg set of transforms that prevailed in AV1. Both the txmg and daala_tx experiments have merged high and low bitdepth code paths (unlike VP9), but daala_tx achieved full embedding of smaller transforms within larger, as well as using fewer multiplies, which could have further reduced the cost of hardware implementations. The Daala transforms were kept as optional in the experimental codebase until late January 2018, but changing hardware blocks at a late stage was a general concern for delaying hardware availability.
The encoding complexity of Daala's Perceptual Vector Quantization was too much within the already complex framework of AV1. The Rate Distortion dist_8x8 heuristic aims to speed up the encoder by a sizable factor, PVQ or not, but PVQ was ultimately dropped.
ANS was the other non-binary arithmetic coder, developed in parallel with Daala's entropy coder. Of the two, Daala EC was the more hardware friendly, but ANS was the fastest to decode in software.
Quality and efficiency
A first comparison from the beginning of June 2016 found AV1 roughly on par with HEVC, as did one using code from late January 2017.
In April 2017, using the 8 enabled experimental features at the time (of 77 total), Bitmovin was able to demonstrate favorable objective metrics, as well as visual results, compared to HEVC on the Sintel and Tears of Steel animated films. A follow-up comparison by Jan Ozer of Streaming Media Magazine confirmed this, and concluded that "AV1 is at least as good as HEVC now".
Ozer noted that his and Bitmovin's results contradicted a comparison by Fraunhofer Institute for Telecommunications from late 2016 that had found AV1 38.4% less efficient than HEVC, underperforming even H.264/AVC. He attributed this discrepancy to his use of encoding parameters endorsed by each encoder vendor and to the newer AV1 encoder having more features.
Tests from Netflix showed that, based on measurements with PSNR and VMAF at 720p, AV1 could be about 25% more efficient than VP9 (libvpx), at the expense of a 4- to 10-fold increase in encoding complexity. Similar conclusions with respect to quality were drawn from a test conducted by Moscow State University researchers, where VP9 was found to require 31% and HEVC 22% more bitrate than AV1 for the same level of quality. The researchers found that the AV1 encoder they used was operating at a speed "2500-3500 times lower than competitors", while admitting that it had not been optimized yet.
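The two ways of stating efficiency in these results ("X% more bitrate needed by the competitor" versus "Y% bitrate saved by AV1") are not interchangeable. A small sketch of the conversion, with a hypothetical helper name, makes the figures comparable:

```python
def bitrate_saving(extra_bitrate_percent):
    """Convert 'competitor needs N% more bitrate' into 'percent saved relative to competitor'."""
    return 100.0 * (1.0 - 1.0 / (1.0 + extra_bitrate_percent / 100.0))

saving_vs_vp9 = bitrate_saving(31)   # about 23.7
saving_vs_hevc = bitrate_saving(22)  # about 18.0
```

Read this way, the Moscow State University figures correspond to savings of roughly 24% relative to VP9 and 18% relative to HEVC, which is in the same range as the roughly 25% reported by Netflix against VP9.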
AOMedia provides a list of test results on their website.
Adoption
Like its predecessor VP9, AV1 can be used inside WebM container files alongside the Opus audio format. These formats are well supported among web browsers, with the exception of Safari (only has Opus support) and the discontinued Internet Explorer (prior to Edge) (see VP9 in HTML5 video).
From November 2017 onwards, nightly builds of the Firefox web browser contained preliminary support for AV1. Upon its release on 9 February 2018, version 3.0.0 of the VLC media player shipped with an experimental AV1 decoder.
It is expected that Alliance members have interest in adopting the format, in respective ways, once the bitstream is frozen. The member companies represent several industries, including browser vendors (Apple, Google, Mozilla, Microsoft), content distributors (Apple, Amazon, Facebook, Google, Hulu, Netflix) and hardware designers (AMD, Apple, ARM, Broadcom, Intel, Nvidia). Video streaming service YouTube declared intent to transition to the new format as fast as possible, starting with highest resolutions within six months after the finalization of the bitstream format. Netflix "expects to be an early adopter of AV1".
According to Mukund Srinivasan, chief business officer of AOM member Ittiam, early hardware support will be dominated by software running on non-CPU hardware (such as GPGPU, DSP or shader programs, as is the case with some VP9 hardware implementations), as fixed-function hardware will take 12-18 months after bitstream freeze until chips are available, plus 6 months for products based on those chips to hit the market.
Software
- Firefox Nightly
- VLC media player (since 3.0)
- GStreamer (since 1.14)
External links
- Overview of the decoding process (not up to date)
- Bitstream specification
- Source code repository
- Source code review
- Issue tracker
- Requirements to be met for the IETF NetVC
Source of the article : Wikipedia