The explosion of visual data has brought considerable opportunities and unprecedented challenges to visual technology. The forum offers researchers and practitioners from academia and industry a platform to share the latest theoretical and technological developments in computer vision. It helps explore the potential value of, and new development models for, big visual data, which is essential for promoting theoretical innovation, industry development, and exchanges among different fields. The first forum was held at the Institute of Automation of the Chinese Academy of Sciences in Beijing, and the second at the Guangzhou Baiyun International Convention Center. Nearly 300 researchers from universities, institutes, and enterprises across the country attended the event, which received wide attention from all walks of life.
Title: Person Re-Identification in Images and Videos
Abstract: In recent years, as the coverage of surveillance cameras has significantly increased, a huge amount of video data is being produced, making it impossible to manually locate and track specific targets of interest. Person Re-Identification (ReID) has thus been proposed as an important component of intelligent video surveillance systems to address this problem. Person ReID refers to the procedure of identifying a probe person in a camera network by matching his/her images or video sequences. Despite decades of study, person ReID is still far from solved. This is mainly because pedestrian images may be taken in unconstrained environments, making their appearance easily affected by many factors such as detection errors, occlusion, and pose variations.
In the past five years, we have made many efforts to extract robust semantic attribute features, learn informative spatial-temporal features from pedestrian tracklets, build efficient person-ReID-oriented deep models, design training data transfer models, and index large-scale pedestrian images. Thanks to those efforts, we have achieved significant progress in person ReID. This talk will share our latest work on training data transfer, pedestrian spatial-temporal feature learning, and robust visual feature extraction.
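At its core, the matching step described in the abstract reduces to ranking a gallery of stored pedestrian features by similarity to a probe feature. The following is only a minimal illustrative sketch of that ranking step (using cosine similarity, a common but here assumed choice), not the feature-learning methods presented in the talk:

```python
import numpy as np

def rank_gallery(probe_feat, gallery_feats):
    """Rank gallery entries by cosine similarity to the probe feature.

    probe_feat:    1-D feature vector of the query person.
    gallery_feats: 2-D array, one feature vector per gallery image.
    Returns gallery indices sorted best match first.
    """
    p = probe_feat / np.linalg.norm(probe_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ p                 # cosine similarity to each gallery item
    return np.argsort(-sims)     # descending similarity order
```

In a real system the feature vectors would come from a trained deep model; here they are just placeholders for the ranking logic.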
Biography: Shiliang Zhang received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2012. He was a Post-Doctoral Scientist with NEC Laboratories America and a Post-Doctoral Research Fellow with The University of Texas at San Antonio. His research interests include person re-identification, fine-grained visual categorization, and image retrieval. Over the past decade, Shiliang Zhang has contributed consistently to three key issues in Multimedia Information Retrieval, i.e., visual feature extraction, image content understanding, and indexing model construction.
Shiliang Zhang has published over 50 journal and conference papers as the first or second author. More than 30 of them appeared in top-tier Multimedia and Computer Vision journals and conferences such as IEEE T-PAMI, IEEE T-IP, IEEE T-MM, ICCV, CVPR, ECCV, and ACM Multimedia. His work has received extensive citations and follow-up research, and has won many important awards and honors, including the 2016 First Prize of Technical Innovation from the Ministry of Education of China, the National 1000 Young Talents Program of China, the Excellent Doctoral Dissertation Award from both the Chinese Academy of Sciences and the China Computer Federation, the 2011 IEEE MMSP Conference Top 10% Paper Award, the 2010 Microsoft Fellowship, the NVIDIA Pioneering Research Award, and the NEC Labs America Spot Recognition Award.
Title: Deep High-Resolution Representation Learning for Visual Recognition
Abstract: Classification networks have been dominant in visual recognition, from image-level classification to region-level classification (object detection) and pixel-level classification (semantic segmentation, human pose estimation, and facial landmark detection). We argue that the classification network, formed by connecting high-to-low convolutions in series, is not a good choice for region-level and pixel-level classification because it only leads to rich low-resolution representations or poor high-resolution representations obtained with upsampling processes.
We propose the High-Resolution Network (HRNet). HRNet maintains high-resolution representations by connecting high-to-low-resolution convolutions in parallel, and strengthens them by repeatedly performing multi-scale fusions across the parallel convolutions. We demonstrate its effectiveness on pixel-level classification, region-level classification, and image-level classification. HRNet turns out to be a strong replacement for classification networks (e.g., ResNets, VGGNets) in visual recognition.
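The multi-scale fusion idea can be sketched in plain numpy as a toy two-branch exchange (an assumed simplification for illustration, not the actual HRNet implementation): each resolution branch receives the resampled feature map of the other branch added to its own, so high-resolution detail and low-resolution context flow both ways.

```python
import numpy as np

def fuse(high, low):
    """Toy HRNet-style multi-scale fusion for two parallel branches.

    high: 2-D feature map at full resolution (even height/width).
    low:  2-D feature map at half resolution.
    Each branch is returned with the other branch's resampled map added.
    """
    h, w = high.shape
    # high -> low: 2x2 average pooling
    down = high.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # low -> high: nearest-neighbour 2x upsampling
    up = low.repeat(2, axis=0).repeat(2, axis=1)
    return high + up, low + down
```

Repeating such exchanges across more branches and stages is what lets the network keep a rich high-resolution representation throughout, rather than recovering it by upsampling at the end.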
Biography: Jingdong Wang is a Senior Researcher with the Visual Computing Group, Microsoft Research, Beijing, China. His current areas of interest include CNN architecture design, human pose estimation, semantic segmentation, person re-identification, large-scale indexing, and salient object detection. He has authored one book and 100+ papers in top conferences and prestigious international journals in computer vision, multimedia, and machine learning. He authored a comprehensive survey on learning to hash in TPAMI. One of his papers was a Best Paper Finalist at ACM MM 2015. Dr. Wang is an Associate Editor of IEEE TPAMI, IEEE TCSVT, and IEEE TMM. He has served as an Area Chair or Senior Program Committee member for top conferences such as CVPR, ICCV, ECCV, AAAI, IJCAI, and ACM Multimedia. He is an ACM Distinguished Member and a Fellow of the IAPR. His homepage is https://jingdongwang2017.github.io
Title: Compression and Acceleration of Deep Learning Models
Abstract: Deep Convolutional Neural Networks (CNNs) have recently achieved great success in many computer vision tasks. However, the high computational and storage requirements of deep CNNs such as VGG or ResNet make it difficult to apply them to real-time applications on mobile and embedded devices. Model compression techniques have been proposed to reduce both storage and computational costs and to accelerate inference. In this talk, I will first give a review of model compression methods. Then I will introduce a novel automatic structured pruning framework, AutoSlim. AutoSlim performs structured pruning in an automatic process, leveraging a simple yet effective heuristic search for hyper-parameters as well as an additional purification step for further weight reduction without any accuracy loss. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoSlim achieves ultra-high compression rates. For example, AutoSlim outperforms prior work on automatic model compression by up to 33× in terms of the pruning rate at the same accuracy.
Biography: Dr. Jian Tang is the Chief Scientist of Intelligent Control at Didi Chuxing and an IEEE Fellow. He leads the research and development on Embedded AI and Computer Vision. Before joining Didi, he was a Full Professor in the Department of Electrical Engineering and Computer Science at Syracuse University. He received his Ph.D. degree in Computer Science from Arizona State University in 2006. His research interests lie in the areas of Machine Learning, IoT, Computer Vision, Wireless Networking, Big Data Systems, and Cloud Computing. Dr. Tang has published over 140 papers in premier journals and conferences. He received an NSF CAREER award in 2009. He has also received several Best Paper Awards, including the 2019 William R. Bennett Prize and the 2019 TCBD (Technical Committee on Big Data) Best Journal Paper Award from the IEEE Communications Society (ComSoc), the 2016 Best Vehicular Electronics Paper Award from the IEEE Vehicular Technology Society (VTS), and Best Paper Awards from the 2014 IEEE International Conference on Communications (ICC) and the 2015 IEEE Global Communications Conference (Globecom). He has served as an editor for several IEEE journals, including IEEE Transactions on Big Data and IEEE Transactions on Mobile Computing. In addition, he has served as a TPC co-chair for several international conferences, including IEEE/ACM IWQoS’2019, MobiQuitous’2018, and IEEE iThings’2015; as the TPC vice chair for IEEE INFOCOM’2019; and as an area TPC chair for IEEE INFOCOM 2017-2018. He is also an IEEE VTS Distinguished Lecturer, and the Vice Chair of the Communications Switching and Routing Committee of IEEE ComSoc.
Title: Security AI at Alibaba
Abstract: In recent years, content security (multimedia risk detection), intellectual property protection, and person risk identification have become more and more important in China. For this reason, Alibaba has proposed to solve these problems with its "Security AI" technology. Based on visual big data and Security AI technology, this report will introduce Alibaba's technical practices and achievements in content security, intellectual property protection, and person risk identification. Content security detection covers porn detection, portrait rights, illegal advertisements, prohibited and restricted products, and privacy information in images and videos. Intellectual property protection includes the original protection of images and videos. Person risk identification includes real-person authentication, identity verification, anti-theft/anti-damage in retail stores, etc. These businesses involve many general AI technologies, such as big data mining, porn image detection, face recognition, OCR, object recognition, logo detection, video fingerprinting, and video/image retrieval, as well as Security AI technologies such as few-shot learning, adversarial learning, model security, and model interpretability.
Biography: Wang Yan (Ph.D.) is a senior algorithm expert in the Alibaba Group Security Department, a council member of the China Society of Image and Graphics (CSIG), and an expert member of the first Chinese Artificial Intelligence-Multimedia Recognition Competition in 2019. He has drafted 2 international standards, 1 Chinese standard, and 4 industry standards, and has applied for more than 40 patents. He worked at Tsinghua University as a teacher and at Samsung R&D Institute (China) as a researcher before joining Alibaba in 2014, and has focused on person recognition and multimedia recognition for more than 15 years, covering face recognition, liveness recognition, ReID, character recognition (OCR/handwriting), speaker recognition, voice recognition, and video/image retrieval. At Samsung, he won the Samsung Electronics R&D Award "The Person of Samsung (overseas)" (individual) and the "Samsung (China) R&D Gold Award". His technologies have been widely used in Samsung mobile phones, printers, and Internet smart TV systems worldwide, and have been widely and successfully applied to identity verification, multimedia content security detection, intellectual property protection, and smart retail security at Alibaba, generating great business value.