{"id":"alexander-kolesnikov","title":"Alexander Kolesnikov","content":"**Alexander Kolesnikov** is an artificial intelligence researcher specializing in computer vision, deep representation learning, and transfer learning. He is noted for his contributions to influential models such as the Vision Transformer (ViT) and his work at major AI labs, including Google, OpenAI, and [Meta Superintelligence Labs](https://iq.wiki/wiki/meta-superintelligence-team).\n$$widget0 [YOUTUBE@VID](https://youtube.com/watch?v=nvY6F6GZpek)$$\n\n## Education\n\nKolesnikov pursued his doctoral studies at the Institute of Science and Technology (IST) Austria, where he was enrolled as a PhD student from 2013 to 2018. Under the supervision of Christoph H. Lampert, his research focused on computer vision, transfer learning, and deep representation learning, areas that have remained central to his subsequent career. [\\[1\\]](#cite-id-kTL2EFYsC8) [\\[9\\]](#cite-id-2hcl0OxyS8)\n\n## Career\n\nAfter completing his PhD in 2018, Kolesnikov joined Google as a researcher, working within its Google Brain and DeepMind divisions for approximately seven years. During this time, he was involved in the development of several significant projects in the field of computer vision. His work at Google included contributions to the Vision Transformer (ViT), MLP-Mixer, and the `big_vision` open-source codebase, which became a platform for large-scale vision research.\n\nIn December 2024, Kolesnikov announced his departure from Google to join OpenAI. He, along with colleagues Xiaohua Zhai and Lucas Beyer, was tasked with establishing a new OpenAI office in Zurich, Switzerland.\n\nHis tenure at OpenAI was brief. In June 2025, it was reported that Meta Platforms had hired Kolesnikov, [Lucas Beyer](https://iq.wiki/wiki/lucas-beyer), and [Xiaohua Zhai](https://iq.wiki/wiki/xiaohua-zhai) from OpenAI's Zurich office. 
The team was recruited to join Meta's efforts in developing [Superintelligence](https://iq.wiki/wiki/meta-superintelligence-team). [\\[1\\]](#cite-id-kTL2EFYsC8) [\\[3\\]](#cite-id-l44oSczIA1) [\\[9\\]](#cite-id-2hcl0OxyS8) [\\[10\\]](#cite-id-pCqSnsLXha) [\\[11\\]](#cite-id-p3D8my6JPu)\n\n## Major Works\n\nKolesnikov has been a key author and contributor to numerous influential research papers and open-source projects that have advanced the field of computer vision and AI.\n\n### Vision Transformer (ViT)\n\nKolesnikov was part of the Google research team that developed the Vision Transformer (ViT), an architecture that applied the Transformer model, originally successful in natural language processing, to computer vision tasks. The ViT model processes images by splitting them into patches and treating them as a sequence, similar to how words are handled in a sentence. This approach demonstrated that a pure Transformer architecture could achieve state-of-the-art results on image classification tasks, challenging the long-standing dominance of convolutional neural networks (CNNs). In October 2020, Kolesnikov announced the public release of pre-trained ViT models and the corresponding code for fine-tuning and inference, which facilitated widespread adoption and further research by the AI community. [\\[4\\]](#cite-id-qLXgX5yrox)\n\n### MLP-Mixer\n\nIn May 2021, Kolesnikov was involved in the introduction of MLP-Mixer, a novel vision architecture based exclusively on multi-layer perceptrons (MLPs). The model, often referred to as \"Mixer,\" avoids the use of convolutions and self-attention mechanisms, which were standard in leading vision models at the time. Instead, it operates by repeatedly applying MLPs across either spatial locations (mixing per-location features) or feature channels (mixing per-patch features). 
The research demonstrated that complex, specialized architectural components were not strictly necessary to achieve strong performance on vision benchmarks. The code and pre-trained models for MLP-Mixer were also made publicly available. [\\[5\\]](#cite-id-4egFoJpLZB)\n\n### `big_vision` Codebase\n\nKolesnikov was a primary developer of `big_vision`, a Google research codebase designed for large-scale pre-training and transfer learning in computer vision. The repository served as the original development home for models like ViT, MLP-Mixer, and LiT (Locked-image Tuning). He announced its public release in May 2022, highlighting its utility for conducting research with an emphasis on training large models and evaluating their transfer capabilities across various downstream tasks. The codebase has been used to develop and release other models, including PaliGemma. [\\[6\\]](#cite-id-Yg61NiPXRL)\n\n### Vision-Language Models\n\nKolesnikov has contributed to the development of vision-language models (VLMs), which are designed to understand and process information from both images and text. In May 2024, he announced the release of PaliGemma-3B, a VLM based on Google's Gemma architecture. The model was made available through various platforms, including GitHub, Google Colab, Kaggle, [Hugging Face](https://iq.wiki/wiki/hugging-face), and Vertex AI, to encourage fine-tuning for specific applications. His work in this area also includes contributions to PaLI-3, another line of vision-language models. [\\[7\\]](#cite-id-P27jaPWTTW) [\\[1\\]](#cite-id-kTL2EFYsC8)\n\n### Reward-Based Model Tuning\n\nIn 2023, Kolesnikov co-authored research exploring the use of policy gradient methods, a technique from reinforcement learning (RL), to fine-tune computer vision models. 
The study, titled \"Tuning Computer Vision Models With Task Rewards,\" demonstrated that this approach could directly optimize for complex, non-differentiable metrics such as mean Average Precision (mAP) or Panoptic Quality (PQ). This method led to significant performance improvements on tasks like object detection and panoptic segmentation, offering an alternative to traditional loss-based training. [\\[8\\]](#cite-id-25quF3kUl7) [\\[10\\]](#cite-id-pCqSnsLXha)\n\n## Interviews\n\n### New Vision Architectures Beyond CNNs #01\n\nIn a presentation for the IARAI Research channel on October 4, 2021, Alexander Kolesnikov discussed alternative architectures to Convolutional Neural Networks (CNNs), which have been widely used in computer vision for nearly a decade.\n$$widget0 [YOUTUBE@VID](https://youtube.com/watch?v=kD3LqIFzzY8)$$\nHe outlined two models introduced in recent research: the Vision Transformer (ViT) and the MLP-Mixer. The Vision Transformer applies the Transformer framework, originally developed for natural language processing, to image analysis by dividing images into patches. This structure removes the locality constraint inherent to CNNs and enables global attention from the earliest layers.\n\nThe MLP-Mixer was presented as a simpler design, based solely on multilayer perceptron (MLP) layers. It alternates between mixing information across image patches and across channels, without using convolution or self-attention mechanisms. Despite its simplified structure, it achieved competitive results in several vision tasks.\n\nAccording to Kolesnikov, these models suggest that strict locality is not a necessary condition for effective vision architectures. He emphasized the role of large-scale pretraining, the adaptability of models such as ViT and MLP-Mixer, and the potential application of these approaches to tasks beyond image classification. 
He also noted that ongoing research continues to explore architectural design, regularization strategies, self-supervised learning, and extensions to tasks such as segmentation and detection. [\\[12\\]](#cite-id-5ByJ0rM4yQ)","summary":"Alexander Kolesnikov is an AI researcher specializing in computer vision, deep representation learning, and transfer learning. He has worked at Google Brain/DeepMind and OpenAI. In June 2025, he joined Meta's superintelligence team.","images":[{"id":"QmQwZSDQnyR4KWcaiipy8qTaPFUqEjMg8MmLUmpwoGWDdm","type":"image/jpeg, image/png"}],"categories":[{"id":"people","title":"people"}],"tags":[{"id":"PeopleInDeFi"},{"id":"AI"},{"id":"Developers"},{"id":"Organizations"},{"id":"Venture"}],"media":[{"id":"QmafBxbFJoXiM8NaSyJrKN4kwJTmBk8XCgT23fVxRofHbj","name":"47025656.jpeg","caption":"","thumbnail":"QmafBxbFJoXiM8NaSyJrKN4kwJTmBk8XCgT23fVxRofHbj","source":"IPFS_IMG"},{"id":"QmU2yCfjii9QzgtTCsvwJs5uwrxudguuBx49Ry3QjQkCQ6","name":"1600967876294.jpeg","caption":"","thumbnail":"QmU2yCfjii9QzgtTCsvwJs5uwrxudguuBx49Ry3QjQkCQ6","source":"IPFS_IMG"},{"id":"QmU9YPHXaBz5fTTbRYyeN6Nqu2AwnzNnN7By3pq3Uhpbt9","name":"1687770905500.jpeg","caption":"","thumbnail":"QmU9YPHXaBz5fTTbRYyeN6Nqu2AwnzNnN7By3pq3Uhpbt9","source":"IPFS_IMG"},{"id":"QmbE5nvYFVYsygS5s3Ww3sh4Gk3igbsxnMuir44eUSQ5d5","name":"Alexander-Kolesnikov-1.jpg","caption":"","thumbnail":"QmbE5nvYFVYsygS5s3Ww3sh4Gk3igbsxnMuir44eUSQ5d5","source":"IPFS_IMG"},{"id":"QmepY6ASYqcfNRUNGjW6Z44X4WhCPU1WD2iYbtb8ZszNFM","name":"citations.jpeg","caption":"","thumbnail":"QmepY6ASYqcfNRUNGjW6Z44X4WhCPU1WD2iYbtb8ZszNFM","source":"IPFS_IMG"},{"id":"https://www.youtube.com/watch?v=nvY6F6GZpek","name":"nvY6F6GZpek","caption":"","thumbnail":"https://www.youtube.com/watch?v=nvY6F6GZpek","source":"YOUTUBE"},{"id":"https://www.youtube.com/watch?v=kD3LqIFzzY8","name":"kD3LqIFzzY8","caption":"","thumbnail":"https://www.youtube.com/watch?v=kD3LqIFzzY8","source":"YOUTUBE"}],"metadata":[{"id":"references","value":"[{\"id\":\"kTL
2EFYsC8\",\"url\":\"https://openreview.net/profile?id=~Alexander_Kolesnikov2\",\"description\":\"Alexander Kolesnikov's professional profile on OpenReview\",\"timestamp\":1756113108249},{\"id\":\"bue2VxJn36\",\"url\":\"https://x.com/__kolesnikov__\",\"description\":\"Alexander Kolesnikov's professional X profile\",\"timestamp\":1756113108249},{\"id\":\"l44oSczIA1\",\"url\":\"https://www.wsj.com/tech/ai/meta-poaches-three-openai-researchers-eb55eea9\",\"description\":\"WSJ report on Meta hiring OpenAI researchers\",\"timestamp\":1756113108249},{\"id\":\"qLXgX5yrox\",\"url\":\"https://x.com/__kolesnikov__/status/1319602923001831425\",\"description\":\"Announcement of Vision Transformer code release\",\"timestamp\":1756113108249},{\"id\":\"4egFoJpLZB\",\"url\":\"https://x.com/__kolesnikov__/status/1390006566796107777\",\"description\":\"Announcement of MLP-Mixer code and model release\",\"timestamp\":1756113108249},{\"id\":\"Yg61NiPXRL\",\"url\":\"https://x.com/__kolesnikov__/status/1521763706706747393\",\"description\":\"Announcement of the big_vision codebase release\",\"timestamp\":1756113108249},{\"id\":\"P27jaPWTTW\",\"url\":\"https://x.com/__kolesnikov__/status/1790464234330972239\",\"description\":\"Announcement of PaliGemma-3B release\",\"timestamp\":1756113108249},{\"id\":\"25quF3kUl7\",\"url\":\"https://x.com/__kolesnikov__/status/1626546150579879936\",\"description\":\"Announcement of research on tuning vision models with RL\",\"timestamp\":1756113108249},{\"id\":\"2hcl0OxyS8\",\"description\":\"LinkedIn: Alexander Kolesnikov\\n\",\"timestamp\":1756113317219,\"url\":\"https://www.linkedin.com/in/alexaderkolesnikov/\"},{\"id\":\"pCqSnsLXha\",\"description\":\"Google Scholar: Alexander Kolesnikov\\n\",\"timestamp\":1756113454866,\"url\":\"https://scholar.google.com/citations?user=H9I0CVwAAAAJ\"},{\"id\":\"p3D8my6JPu\",\"description\":\"Meta Superintelligence: Alexander 
Kolesnikov\",\"timestamp\":1756113516166,\"url\":\"https://openreview.net/profile?id=~Alexander_Kolesnikov2\"},{\"id\":\"5ByJ0rM4yQ\",\"description\":\"New vision architectures beyond CNNs - Dr Alexander Kolesnikov\\n\",\"timestamp\":1756113784174,\"url\":\"https://www.youtube.com/watch?v=kD3LqIFzzY8\"}]"},{"id":"linkedin_profile","value":"https://www.linkedin.com/in/alexaderkolesnikov/"},{"id":"twitter_profile","value":"https://x.com/__kolesnikov__"},{"id":"github_profile","value":"https://github.com/akolesnikoff"},{"id":"previous_cid","value":"\"https://ipfs.everipedia.org/ipfs/QmZsQhbNLQo8es6CHKBAZFcvitG7ramLfmc3BSr2j1CHyL\""},{"id":"commit-message","value":"\"Republishing wiki\""},{"id":"previous_cid","value":"QmZsQhbNLQo8es6CHKBAZFcvitG7ramLfmc3BSr2j1CHyL"}],"events":[{"id":"ddefa415-6c98-455c-96a7-410d47e4a89c","date":"2013-01","title":"Began PhD at IST Austria","type":"DEFAULT","description":"Started his PhD studies at the Institute of Science and Technology (IST) Austria, focusing on computer vision and deep representation learning under advisor Christoph H. Lampert.","multiDateStart":null,"multiDateEnd":null},{"id":"aef2ee16-cdc1-4e6f-bbd1-9136efecb8e0","date":"2018-01","title":"Joined Google as a Researcher","type":"DEFAULT","description":"Began his career as a researcher at Google, working with the Google Brain and DeepMind teams. 
He contributed to key projects like Vision Transformer (ViT) and MLP-Mixer.","multiDateStart":null,"multiDateEnd":null},{"id":"60d4426d-a34e-41ed-9c09-82fcb160c3c9","date":"2024-12","title":"Joined OpenAI","type":"DEFAULT","description":"Joined OpenAI to co-establish its Zurich office alongside colleagues Xiaohua Zhai and Lucas Beyer, continuing his research in advanced AI models.","multiDateStart":null,"multiDateEnd":null},{"id":"2fc2031a-79f0-445a-a04d-8712f0adcacf","date":"2025-06","title":"Joined Meta's Superintelligence Team","type":"DEFAULT","description":"Was hired by Meta Platforms to join its superintelligence efforts, moving from OpenAI's Zurich office along with researchers Lucas Beyer and Xiaohua Zhai.","multiDateStart":null,"multiDateEnd":null}],"user":{"id":"0x8af7a19a26d8fbc48defb35aefb15ec8c407f889"},"author":{"id":"0x8af7a19a26d8fbc48defb35aefb15ec8c407f889"},"language":"en","version":1,"linkedWikis":{"blockchains":[],"founders":[],"speakers":[]}}