Using MIPS microAptiv UP Processor CorExtend UDI interface

CorExtend is a feature of the MIPS32 microAptiv microprocessor, which is provided in the MIPSfpga project as real, unobfuscated industrial RTL. The MIPSfpga sources can be downloaded after joining the Imagination University Programme. CorExtend allows system designers to define and add their own instructions that operate on data in the general-purpose registers in the same manner as standard MIPS instructions.

This post describes the CorExtend, or User-Defined Instructions (UDI), interface protocol. The interface makes it possible to connect a custom CorExtend UDI block directly to the MIPS32 microAptiv UP processor core.

This post can also be downloaded as a PDF file: MIPS microAptiv UP Processor CorExtend UDI interface protocol guide.

An example project with simulation sources can be downloaded from GitHub. The project is described further in this post.

The position of CorExtend in the top-level RTL hierarchy of the m14k microAptiv processor core is shown below.
CorExtend RTL Hierarchy
All core signals at the m14k_cpu level, including the CorExtend UDI signals, are listed in the MIPS32 microAptiv UP Processor Core Family Integrator's Guide (Table 2.3 Signal Descriptions for m14k cpu Level). The table below presents only the signals related to CorExtend UDI.

Description of CorExtend signals connected to m14k_cpu (Type is given from the core's point of view)
Signal Name Type Description
UDI_ir_e[31:0] Out This is the complete instruction word. Although the module also gets rs and rt source operands, the full instruction is provided so all or part of the source register fields may be used to hold immediate values. Note that the implementer is responsible for decoding the Opcode and Function fields.
UDI_irvalid_e Out Indicates whether the value of the instruction word (UDI_ir_e) is valid or not.
UDI_rs_e[31:0] Out Source operand rs after the bypass mux.
UDI_rt_e[31:0] Out Source operand rt after the bypass mux.
UDI_endianb_e Out Indicates that this instruction is executing in Big Endian mode. This signal is generally not needed unless a) the UDI instruction works on sub-word data that is endian dependent, and b) the UDI block is designed to be bi-endian.
UDI_kd_mode_e Out Indicates that the instruction is executing in kernel or debug mode. This can be used to prevent certain UDI instructions from being executed in user mode.
UDI_kill_m Out Late arriving kill signal due to an exception generated by an earlier instruction. This signal may optionally be used to deassert the UDI_stall_m output for improved interrupt latency on multi-cycle UDIs whose results won't be used.
UDI_start_e Out This is the mpc_run_ie signal coming from the core pipeline control logic.
UDI_run_m Out This is the mpc_run_m signal used to qualify UDI_kill_m.
UDI_greset Out Reset signal to be used to reset any state machines.
UDI_gclk Out Clock input.
UDI_gscanenable Out Global scan enable.
UDI_ri_e In A one-bit signal which when high indicates that the SPECIAL2 instruction currently being executed is illegal (i.e., reserved). This signal is used by the Master Pipeline Control (MPC) block within the core to signal an illegal instruction, however, this signal is sampled by MPC only if the current instruction is within the SPECIAL2 range of user-defined instructions (bits [5:4] of the instruction are 2'b01).
UDI_rd_m[31:0] In The 32-bit result of the executed instruction available in the M stage.
UDI_wrreg_e[4:0] In Register to write the result from the execution of this user-defined instruction. This value is also passed on to mpc.
UDI_stall_m In Signals that the UDI block is processing a multicycle instruction and needs to stall the pipeline since the outputs need to be written into the register file. Should be set to 0 for single cycle instructions. This is an M stage signal.
UDI_present In Static signal that denotes whether any UDI support is available.
UDI_honor_cee In Indicates whether the core should honor the CorExtend Enable (CEE) bit contained in the Status register. When this signal is asserted, Status.CEE is deasserted, and a UDI operation is attempted, the core will take a CorExtend Unusable Exception.

In addition to the signals connected to m14k_cpu, a custom CorExtend block has variable-width external signals (see the table below) that are propagated out of m14k_top.

Description of external CorExtend signals
Signal Name Type Description
UDI_toudi[x-1:0] In Variable-width external input to a custom CorExtend block.
UDI_fromudi[x-1:0] Out Variable-width external output from a custom CorExtend block.

To implement a custom CorExtend block, m14k_edp_buf_misc and m14k_udi_stub should be modified. The input and output signals of m14k_edp_buf_misc should be connected to each other, for example, like this.

The actual custom CorExtend block should replace m14k_udi_stub. An example of the interaction between CorExtend and the microAptiv UP core is shown in the waveform below.
CorExtend interface protocol waveform
The UDI_present signal must be tied high. UDI_honor_cee can be tied low. If it is tied high, the Status.CEE bit must be set using the mtc0 instruction before any attempt to execute a CorExtend instruction. Otherwise the CorExtend Unusable exception will occur, and UDI_kill_m will be asserted for two clock cycles, starting on the clock cycle after UDI_start_e is asserted.

Every instruction word executed by the core arrives on UDI_ir_e[31:0] together with the UDI_irvalid_e signal. UDI_start_e indicates the execution (E) stage of the microAptiv UP core pipeline. If the instruction has RS and/or RT operands, they arrive on UDI_rs_e[31:0] and UDI_rt_e[31:0] respectively, in the same cycle as UDI_start_e.

Some parts of an instruction must be decoded in the same cycle in which UDI_start_e arrives. This is crucial for forming UDI_ri_e, which must be asserted in the same cycle as UDI_start_e if the instruction is illegal. If the instruction has to write its result to one of the processor's general-purpose registers, the address of RD must be presented on UDI_wrreg_e[4:0] in the same cycle as UDI_start_e. Other fields of the instruction may be registered and decoded later.

UDI_wrreg_e[4:0] can address 31 of the processor's general-purpose registers; the value 5'd0 means that no register is written.
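The same-cycle decode described above can be sketched as a small behavioural model in Python. This is only an illustration, not RTL: the function name, the set of supported function codes, and the placement of RD in bits 15..11 are assumptions (the interface itself only requires that the register address appear on UDI_wrreg_e in the E stage).

```python
SPECIAL2 = 28  # major opcode shared by all UDI instructions

# Hypothetical set of function codes that this block implements.
SUPPORTED_FUNCTIONS = {0b010000, 0b010001}


def e_stage_decode(ir, irvalid, start):
    """Return (UDI_ri_e, UDI_wrreg_e) for the cycle in which UDI_start_e is high."""
    if not (irvalid and start):
        return (0, 0)
    opcode = (ir >> 26) & 0x3F
    function = ir & 0x3F
    in_udi_range = opcode == SPECIAL2 and (function >> 4) == 0b01
    if not in_udi_range:
        # The core samples UDI_ri_e only inside the UDI range, so drive zeros.
        return (0, 0)
    if function not in SUPPORTED_FUNCTIONS:
        # Reserved UDI encoding: flag it as illegal and write no register.
        return (1, 0)
    rd = (ir >> 11) & 0x1F  # assumption: RD lives in bits 15..11
    return (0, rd)
```

Returning 5'd0 on UDI_wrreg_e for every case that must not write a register keeps the "no write" convention in one place.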

The result of the UDI instruction to be written to the register file must be presented on UDI_rd_m[31:0] in the cycle after UDI_start_e. If the result will only be ready later, UDI_stall_m must be asserted in the clock cycle after UDI_start_e. UDI_stall_m must be deasserted in the clock cycle before the result is present on UDI_rd_m[31:0].
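The timing rule above can be written out cycle by cycle. The Python sketch below is one literal reading of that rule, under the assumption that the first cycle with UDI_stall_m low immediately precedes the cycle carrying the result; it should be checked against the waveforms and the Integrator's Guide before being relied on. The function name and the `UDI_rd_m_valid` key are illustrative, not real signals.

```python
def udi_timeline(stall_cycles):
    """List per-cycle signal values, with cycle 0 being the UDI_start_e cycle.

    stall_cycles is the number of cycles for which UDI_stall_m is held high;
    0 models a single-cycle instruction.
    """
    if stall_cycles < 0:
        raise ValueError("stall_cycles must be non-negative")
    # Single-cycle: result in the cycle after start.
    # Multi-cycle (assumed reading): stall high, one cycle low, then the result.
    result_cycle = 1 if stall_cycles == 0 else stall_cycles + 2
    timeline = []
    for cycle in range(result_cycle + 1):
        timeline.append({
            "cycle": cycle,
            "UDI_start_e": int(cycle == 0),
            "UDI_stall_m": int(1 <= cycle <= stall_cycles),
            "UDI_rd_m_valid": int(cycle == result_cycle),
        })
    return timeline
```

For example, `udi_timeline(2)` describes a start cycle, two stalled cycles, one cycle with the stall released, and then the cycle in which the result is valid.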

Figure 3 shows the UDI instruction format. The major opcode of a UDI is SPECIAL2, which equals 6'd28. The RS and RT fields address the source operand registers. Bits 15..6 may be used for the purposes of the custom CorExtend block; for example, the address of the destination register can be placed there. The function field consists of bits 5..4, which must equal 2'b01, and bits 3..0, which can encode up to 16 UDI instructions.
UDI instruction
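The field layout can be made concrete with a small Python encoder and decoder. The function names and the `user` argument (standing for the free bits 15..6) are illustrative choices, not part of any MIPS tool.

```python
SPECIAL2 = 28  # major opcode, bits 31..26


def encode_udi(rs, rt, user, udi_index):
    """Pack a UDI instruction word from its fields."""
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("rs and rt are 5-bit register numbers")
    if not 0 <= user < 1024:
        raise ValueError("bits 15..6 hold a 10-bit value")
    if not 0 <= udi_index < 16:
        raise ValueError("bits 3..0 select one of 16 UDI instructions")
    function = (0b01 << 4) | udi_index  # bits 5..4 are fixed to 2'b01
    return (SPECIAL2 << 26) | (rs << 21) | (rt << 16) | (user << 6) | function


def decode_udi(word):
    """Split an instruction word into the fields described above."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "user": (word >> 6) & 0x3FF,
        "function": word & 0x3F,
    }


def is_udi(word):
    """True if the word is SPECIAL2 with function bits 5..4 equal to 2'b01."""
    fields = decode_udi(word)
    return fields["opcode"] == SPECIAL2 and (fields["function"] >> 4) == 0b01
```

As a check, `encode_udi(4, 5, 0, 0)` gives 0x70850010, while the standard `mul $2, $4, $5` encoding 0x70851002 is SPECIAL2 but falls outside the UDI range.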
Implementation of a custom CorExtend block is illustrated by the following example of a DSP accelerator block.
The block performs several closely related operations. It calculates the instantaneous power of a quadrature signal, P(t), which is defined as
P(t) = a^2(t) + b^2(t)
where a(t) and b(t) are the real and imaginary parts of the quadrature signal, respectively.
This operation is useful for signal detection by comparison with a threshold.
The table below lists the implemented UDI instructions.

List of implemented UDI instructions
Instruction Explanation Function field
UDI0 RD; RS; RT RD = RS[31:16]^2 + RT[31:16]^2 6'b010000
UDI1 RD; RS; RT RD = (RS[31:16]^2 + RT[31:16]^2) >> 1 6'b010001
UDI2 RD; RS RD = RS[31:16]^2 6'b010010
UDI3 RS stored_threshold = RS 6'b010011
UDI4 RD; RS; RT RD = ( (RS[31:16]^2 + RT[31:16]^2) > stored_threshold ) ? 1:0 6'b010100
UDI5 RD; RS; RT RD = ( ((RS[31:16]^2 + RT[31:16]^2) >> 1) > stored_threshold ) ? 1:0 6'b010101
UDI6 RD; RS; RT RD = ( RS[31:16]^2 > stored_threshold ) ? 1:0 6'b010110

UDI0 calculates the instantaneous power. RS and RT are source operands that contain the 16-bit real and imaginary parts of a quadrature signal in their upper halves (bits 31..16). The 32-bit result is placed in the RD destination register.

UDI1 performs essentially the same operation as UDI0. The difference is that UDI1 shifts the result right by one bit to prevent overflow.

UDI2 calculates the instantaneous power using only the real part of a quadrature signal. The RT operand is not used.

UDI3 stores a 32-bit threshold value in an internal register of the CorExtend block. No result is returned.

UDI4, UDI5, and UDI6 perform the UDI0, UDI1, and UDI2 operations, respectively, and compare the result with the stored threshold value. If the threshold is exceeded, a value of 32'd1 is returned. Otherwise, a value of 32'd0 is returned.
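A software reference model of these seven operations is handy for checking simulation results. The Python sketch below rests on several assumptions that the text does not settle: the 16-bit samples are treated as signed, the sum is computed at full precision and truncated to 32 bits after any shift, and the threshold comparison is unsigned. The class and function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def hi16(word):
    """Bits 31..16 of a register value, read as a signed 16-bit sample (assumption)."""
    value = (word >> 16) & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class UdiModel:
    """Behavioural model of UDI0..UDI6 with the internal threshold register."""

    def __init__(self):
        self.stored_threshold = 0

    def execute(self, index, rs=0, rt=0):
        """Return the 32-bit value destined for RD, or None for UDI3."""
        if index == 3:
            self.stored_threshold = rs & MASK32
            return None
        # UDI4..UDI6 reuse the arithmetic of UDI0..UDI2.
        base = index if index < 3 else index - 4
        a2 = hi16(rs) ** 2
        b2 = hi16(rt) ** 2
        if base == 0:
            power = a2 + b2
        elif base == 1:
            power = (a2 + b2) >> 1
        elif base == 2:
            power = a2
        else:
            raise ValueError("unsupported UDI index")
        power &= MASK32
        if index < 3:
            return power
        # Unsigned comparison against the stored threshold (assumption).
        return int(power > self.stored_threshold)
```

With samples 3 and 4 in the upper halves of RS and RT, UDI0 returns 25, UDI1 returns 12, and UDI2 returns 9; after UDI3 stores a threshold of 10, UDI4 and UDI5 return 1 and UDI6 returns 0.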

All instructions, except UDI3, write results to the register file and, therefore, require the address of the destination register. To that end, field RD was included in the instruction word structure, as shown in figure below.
Example of custom UDI instruction
The code listing below shows the program written in MIPS assembler for testing all developed UDI instructions.
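Standard assemblers have no mnemonics for user-defined instructions, so one common way to place them in an assembler program is as raw `.word` directives. The Python helper below is a sketch for generating such lines; it assumes that RD sits in bits 15..11 (the usual MIPS rd position), and the function names and comment format are illustrative.

```python
SPECIAL2 = 28  # major opcode of all UDI instructions


def udi_word(index, rd=0, rs=0, rt=0):
    """Build the 32-bit encoding of UDI<index> with the given register numbers."""
    for name, reg in (("rd", rd), ("rs", rs), ("rt", rt)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a register number 0..31")
    if not 0 <= index < 16:
        raise ValueError("index must be 0..15")
    function = 0b010000 | index  # bits 5..4 = 2'b01, bits 3..0 = index
    # Assumption: RD is placed in bits 15..11.
    return (SPECIAL2 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | function


def udi_directive(index, rd=0, rs=0, rt=0):
    """Format the encoding as a .word line with a descriptive comment."""
    word = udi_word(index, rd, rs, rt)
    return f".word 0x{word:08x}  # UDI{index} rd=${rd}, rs=${rs}, rt=${rt}"
```

For example, `udi_directive(0, rd=2, rs=4, rt=5)` produces a line starting with `.word 0x70851010`.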

An example project that implements the custom CorExtend block from the example above in Verilog can be downloaded with the link

The project includes all sources needed for simulation except the files from the rtl_up directory. To obtain them, you need to register in the Imagination University Programme and make a download request. You may also need XilinxCorelib for simulation. It can be compiled in Vivado using the Tcl command compile_simlib.

The example project has two variants of the custom CorExtend block. The first one performs all UDI instructions in one cycle. The second one has additional pipelining and requires more cycles for some instructions. It was made specifically to exercise the UDI_stall_m signal.

The waveforms below show the simulation of the assembler program above.
In the first, unpipelined variant, the first three instructions UDI0, UDI1, and UDI2 are executed as shown in the figure below.
It can be seen that the instructions arrive on UDI_ir_e together with the signals UDI_irvalid_e and UDI_start_e. The operands are valid in the same cycle on UDI_rs_e and UDI_rt_e. The address of the GPR that receives the result is also formed in this cycle. In the next cycle the result is valid on UDI_rd_m.
The signals from the register file (rf) are shown in the waveform as well. The result is written to the GPR whose address is displayed on mpc_dest_w. The data value can be seen on edp_wrdata_w, with the strobe on mpc_rfwrite_w. The addresses of the operands being read from the GPRs are presented on mpc_rega_i and mpc_regb_i.

The figure below shows the instructions UDI3, UDI4, UDI5, and UDI6.
As can be seen from the code listing, UDI3 writes the value 0xbeafdead to stored_threshold. UDI4, UDI5, and UDI6 then return zero, since none of the computed values exceeds the threshold.

In the next waveform, the instructions UDI3, UDI4, UDI5, and UDI6 are executed again after a conditional jump is taken. Here the threshold value is lower than the computed values, and thus the results of these instructions are 0x000001.

The next three waveforms show the simulation of the pipelined UDI block.

In the figure below, the instructions UDI0, UDI1, and UDI2 are executed. UDI_stall_m is asserted while the computation is in progress in the UDI block. The result arrives on UDI_rd_m in the cycle after UDI_stall_m is deasserted. In the following cycle the result is written to the GPR.
In the waveform below instructions UDI4, UDI5, and UDI6 are executed with signal UDI_stall_m.
In the waveform below instructions UDI4, UDI5, and UDI6 are executed. The result value is different from the figure above.

Reviewed by kirillzats on 3:07 PM Rating: 5


  1. Hi, could you describe in more detail, step by step, how to run your custom command using ModelSim? I have downloaded your files from the repository, and I think something in the readme file is mistyped. I would be very grateful.

  2. Hi, could you please be more specific? What steps did you perform? Running the example should be fairly easy. Just run the script from the /tb folder as described here. You do not even need Vivado; all necessary files are already in the repository. The only important thing is copying the files from rtl_up. I could not put them on GitHub; the only way to get them is to download them from the Imagination site.

    If someone is interested in such articles, I can continue writing them. I have a similar, half-finished article about using a co-processor. I have just been a little busy lately and set it aside.

  3. I thought back about my comment and realized that you will actually need the Xilinx Simulation Libraries, so make sure that you have compiled them and that ModelSim can locate them. I used Vivado 2012.4, but you can easily pick a newer version and replace those multipliers with newer ones.

    1. Hi, could you describe the steps to compile xilinxcorelib? Is it possible to compile it from the command line? Thank you for your time.

  4. Hi, thank you for your answer. I have already got the rtl_up files and copied them to the root directory, and I have a few questions. First, there are two files in rtl_up_changed, but according to the instructions we use only one of them. Second, about launching the simulation: I opened simMIPSfpga.tcl in ModelSim but got an error, and I couldn't work out why. I got the following output:
    # do simMIPSfpga.tcl
    # Model Technology ModelSim PE vmap 10.4a Lib Mapping Utility 2015.03 Apr 7 2015
    # vmap -modelsim_quiet work work
    # Copying C:/Modeltech_pe_edu_10.4a/win32pe_edu/../modelsim.ini to modelsim.ini
    # Modifying modelsim.ini
    # ** Warning: Copied C:/Modeltech_pe_edu_10.4a/win32pe_edu/../modelsim.ini to modelsim.ini.
    # Updated modelsim.ini.
    # Model Technology ModelSim PE vmap 10.4a Lib Mapping Utility 2015.03 Apr 7 2015
    # vmap -modelsim_quiet mult_gen_v11_2 mult_gen_v11_2
    # Modifying modelsim.ini
    # Model Technology ModelSim PE Student Edition vcom 10.4a Compiler 2015.03 Apr 7 2015
    # Start time: 17:55:53 on Jun 05,2016
    # vcom -reportprogress 300 -work xilinxcorelib ../core_project/core_project/core_project.srcs/sources_1/ip/mult_16_x_16_res_32/mult_gen_v11_2/simulation/mult_gen_pkg_v11_2.vhd
    # ** Error: (vcom-66) Execution of vlib failed. Please check the error log for more details.
    # End time: 17:55:53 on Jun 05,2016, Elapsed time: 0:00:00
    # Errors: 1, Warnings: 0
    # ** Error: C:/Modeltech_pe_edu_10.4a/win32pe_edu/vcom failed.
    # Error in macro ./simMIPSfpga.tcl line 13
    # C:/Modeltech_pe_edu_10.4a/win32pe_edu/vcom failed.
    # while executing
    # "vcom -work xilinxcorelib ../core_project/core_project/core_project.srcs/sources_1/ip/mult_16_x_16_res_32/mult_gen_v11_2/simulation/mult_gen_pkg_v11_2..."

  5. I don't know, looks like it cannot find path to mult_gen_pkg_v11_2.vhd. If you run it from the GUI and not in command-line mode make sure you changed directory to the tcl file location. You can also try to read some manuals (e.g.,

    1. Thank you for the answer, but I ran all these commands from the command line. Do I need to change the path in simMIPSfpga.tcl or not?

  6. If you ran it from the command line, then you must have been in the right directory already. You do not need to change anything in this case; it should be fine. Just make sure that all paths in the tcl files correspond to the locations of the files on your computer. It should be OK if you just copy all files from the repository into some folder and leave everything intact. Relative addressing will do the rest.

  7. Could you specify where xilinxcorelib should be placed? I put it in the root directory; is that correct? And how do I point ModelSim to the xilinxcorelib path?

    1. Usually it is specified in modelsim.ini in the ModelSim folder (like this: xilinxcorelib = path_to_xilinxcorelib). It is convenient to just copy those paths from the modelsim.ini that Vivado generates after executing compile_simlib with the -simulator key specified.

  8. For generating xilinxcorelib I use the following command: "compile_simlib -language all -dir C:/xilinxcorelib -simulator modelsim -simulator_exec_path C:/Modeltech_pe_edu_10.4a/win32pe_edu -library all -family all". Is that correct, or is there another command?

    1. That should be fine. You can then use the modelsim.ini that will be generated in C:/xilinxcorelib in your case. Replace the original modelsim.ini with it, or just add the lines with the library paths. Backing up the original modelsim.ini is also a good idea.

  9. Hi, I am using Vivado 2014.1, and as I understand it, in this version the compile_simlib command does not create what I need. It creates only the following directory with files, for all directories in xilinxcorelib.

    1. Vivado 2014.1 works fine, you did something wrong, sorry.