Senior Research Scientist
State Key Laboratory of ASIC
Chinese Academy of Sciences, Beijing, China, 100190

Visiting Scholar
Computer Science and Engineering
University of California, San Diego 92093

email: shawnless.xie@gmail.com

linkedin: https://cn.linkedin.com/in/shawnless

Bio

I received my bachelor's degree in 2004 and my master's degree in 2006 from Zhejiang University, China, majoring in electrical engineering. I then received my PhD in computer science from the Chinese Academy of Sciences, China, in 2009.

I have been a research scientist at the Chinese Academy of Sciences since graduation. My research focuses on architecture exploration for innovative digital signal microprocessors. I specialize in VLIW processor micro-architecture and in digital VLSI design and timing optimization, and I have extensive engineering experience in full-flow chip tape-out.

Currently, I am very interested in open-source processor architecture (RISC-V), configurable computing (CGRA), and hardware accelerators (deep learning, data center, IoT).

Experience
Visiting Scholar
July 2016 -- Present
School of Computer Science and Engineering, UCSD, San Diego, California
  • Work with Michael Taylor, the author of the MIT RAW processor and the HPCA 2015 chair
  • Focus on manycore and compiler directed accelerator architecture
Research Scientist
Jan 2010 -- July 2016
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Project: MaPU -- Mathematical Computing Processor
  • SoC chip with 1 ARM core and 4 novel MaPU cores running at 1GHz, targeting computation-intensive applications
  • Supported by the Strategic Priority Research Program of Chinese Academy of Sciences with 50 million CNY
  • Successfully taped out in a 40nm process. Power efficiency is about 10x that of traditional CPUs and GPUs
  • Results were published at a top-tier computer science conference (HPCA)
Project Responsibility:
  • Proposed and defined the MaPU Instruction Set Architecture and the detailed micro-architecture
  • Took charge of the full processor design flow, including architecture modelling, micro-architecture design, digital circuit implementation and processor tool chain development
Project Related Skills:
  • Architecture modelling: Both the architecture description language LISA2.0 (now acquired by Synopsys) and SystemC were used for architecture modelling at the early stage. Gem5 was used for the final cycle-accurate simulator.
  • Micro-architecture design: instruction fetch & decoding & dispatch, pipeline control, arithmetic unit, memory system, bus matrix, DMA, etc.
  • Benchmark optimization: Algorithm-level and VLIW assembly-level optimization, fixed point & floating point, FFT, matrix multiplication, FIR, table lookup, etc.
  • Tool chain construction for innovative architecture: Took charge of simulator, linker, assembler & disassembler
  • Verification Environment: Functional verification plan, SystemVerilog verification library, automatic run-check test cases (3000+), regression scripts
  • STA & circuit optimization: Timing constraints specification, static timing analysis and debug, micro-architecture & circuit & layout optimization (frequency improved from 500MHz to 1GHz)
  • Power Estimation & Analysis: Switching activity setup, library setup, power group definition. Difference between estimated power and real chip power < 8%
Publications
  • Celerity: An Open Source RISC-V Tiered Accelerator Fabric
    T. Ajayi, K. Al-Hawaj, et al. (HotChips'17). (PDF)
  • MaPU: A novel mathematical computing architecture
    D. Wang, S. Xie, et al. IEEE International Symposium on High Performance Computer Architecture (HPCA'16). IEEE, 2016. (PDF, PPT)
US Patents:
  • S. Xie, D. Wang, J. Hao, T. Wang, and L. Yin, "Parallel bit reversal devices and methods," U.S. Patent No. 9,268,744. Issued Feb 23, 2016. Available: Google Patent
  • D. Wang, T. Wang, S. Xie, J. Hao, and L. Yin, "Methods and devices for multi-granularity parallel FFT butterfly computation," U.S. Patent No. 9,262,378. Issued Feb 16, 2016. Available: Google Patent
  • D. Wang, S. Xie, J. Hao, X. Lin, T. Wang, and L. Yin, "Multi-granularity parallel FFT computation device," U.S. Patent No. 9,176,929. Issued Nov 3, 2015. Available: Google Patent
  • D. Wang, S. Xie, X. Xue, Z. Liu, and Z. Zhang, "Multi-granularity parallel storage system and storage," U.S. Patent No. 9,146,696. Issued Sep 29, 2015. Available: Google Patent
  • D. Wang, Z. Liu, X. Xue, X. Zhang, Z. Zhang, and S. Xie, "Multi-granularity parallel storage system," U.S. Patent No. 9,171,593. Issued Oct 27, 2015. Available: Google Patent
  • D. Wang, S. Xie, Y. Yang, L. Yin, L. Wang, Z. Liu, T. Wang, et al., "Processor with Polymorphic Instruction Set Architecture," U.S. Patent App. 14/785,385. Published Jun 09, 2016. Available: Google Patent
  • S. Xie, X. Lin, J. Hao, X. Xue, T. Wang, et al., "Data access method and device for parallel FFT calculation," U.S. Patent App. 14/117,375. Published Jul 4, 2013. Available: Google Patent
Selected (total 17) China Patents: (Issued)
  • D. Wang, S. Xie, X. Xue, Z. Liu, and Z. Zhang, "Multi-granularity parallel storage system and storage." CHN. Patent, Patent No. CN201110460585.1, issued Feb 4, 2015. Available: Google Patent
  • D. Wang, S. Xie, Z. Yin, X. Lin, Z. Zhang, H. Yan, J. Xue, "Method for generating vector processing instruction set architecture in high performance computing system." CHN. Patent, Patent No. CN201010162391.9, issued May 8, 2013. Available: Google Patent
  • D. Wang, S. Xie, Z. Yin, X. Lin, Z. Zhang, H. Yan, J. Xue, "Parallel vector processing engine structure." CHN. Patent, Patent No. CN201010162350.X, issued Feb 13, 2013. Available: Google Patent
  • D. Wang, J. Hao, S. Xie, X. Du, X. Lin, "Heterogeneous multi-core processor of two-stage computing architecture." CHN. Patent, Patent No. CN201110435859.1, issued Sep 17, 2014. Available: Google Patent
  • D. Wang, Z. Liu, X. Zhang, S. Xie, "Multi-dimensional DMA (direct memory access) transmitting device and method." CHN. Patent, Patent No. CN201110449966.X, issued Sep 17, 2014. Available: Google Patent