
CSE x25 Lab 4

Welcome to CSE 125/225! Each lab is formed from one or more parts. Each part is relatively independent and parts can normally be completed in any order. Each part will teach a concept by implementing multiple modules.

Each lab will be graded on multiple categories: correctness, style/lint, git hygiene, and demonstration. Correctness will be assessed by our autograder. Lint will be assessed by Verilator (make lint). Style and hygiene will be graded by the TAs.

To run the test scripts in this lab, run make test from one of the module directories. This will run (many) tests against your solution in two simulators: Icarus Verilog and Verilator. Both will generate waveform files (.fst) in a directory: run/

values>/. You will need to run make extraclean after make test to clean your previous output files.

You may use any waveform viewer you would like. The codespace has Surfer installed. It is also an excellent web-based viewer. You can also download GTKWave, which is a bit more finicky.

Each part will have a demonstration component that must be shown to a TA or instructor for credit. We may manually grade style/lint after the assignment deadline. Style checking and linting feel pedantic, but they are the gold standard in industry.

At any time you can run make help from inside one of the module directories to find all of the commands you can run.

When you have questions, please ask. Otherwise, happy hardware hacking!

Assignment 4 Link: https://classroom.github.com/a/J8m2GxTt

Part 1

Part 2

Part 3

Part 4

Part 1: Synchronous Memories

In this lab, our goal is to demonstrate synchronous memories, which are a bit trickier to use than asynchronous memories. However, synchronous memories are much denser (on FPGAs).

To this end, you will write a 1R1W Synchronous Memory (Read-priority, 1 Write Port, 1 Read Port, synchronous memory). You will need to pass simulation and then verify that the code you have written synthesizes to an FPGA memory using Yosys.

You may use whatever operators and behavioral description you prefer, except you may not use always@(*). You are encouraged to reuse whatever modules you see fit from previous labs or this lab.

ram_1r1w_sync: Using behavioral SystemVerilog, create a read-priority (aka read-first) synchronous memory with 1 write port and 1 read port. It must be parameterizable in depth (number of elements) and width.

Note: You do not need to use $readmemh for this module. Ignore the memory initialization parameter.
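To make "read-priority" concrete, here is a small Python behavioral model (it is not SystemVerilog and not a solution). Treat it as a sketch under assumptions: the port names (wr_valid, wr_addr, wr_data, rd_valid, rd_addr) are hypothetical, and the model zero-initializes the memory for convenience, whereas a real memory starts out unknown.

```python
class ReadFirstRam:
    """Cycle-level model of a 1R1W synchronous, read-first memory."""

    def __init__(self, depth, width):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth   # assumption: zero-initialized for the model
        self.rd_data = 0         # registered read output

    def clock(self, wr_valid, wr_addr, wr_data, rd_valid, rd_addr):
        """Advance one clock edge and return the registered read data."""
        # Read first: the read samples the OLD contents, even when the
        # write targets the same address on the same edge.
        if rd_valid:
            self.rd_data = self.mem[rd_addr]
        if wr_valid:
            self.mem[wr_addr] = wr_data & self.mask
        return self.rd_data


ram = ReadFirstRam(depth=4, width=8)
ram.clock(True, 2, 0xAB, False, 0)        # write 0xAB to address 2
old = ram.clock(True, 2, 0xCD, True, 2)   # read and write address 2 together
new = ram.clock(False, 0, 0, True, 2)     # read address 2 one cycle later
print(hex(old), hex(new))
```

In this model the colliding read returns the old value (0xAB) and the new value (0xCD) only shows up on the following read. That ordering is what read-first means here.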

A portion of the grade for this part is to demonstrate that Yosys is correctly interpreting your memory as an FPGA memory. Use the following command to generate the schematic:

$ make mapped.pdf

You should see a single RAM instance in the schematic. If you do not, your design is not being recognized as a memory; do not continue to the next part until this is fixed.

There will be additional circuitry in the schematic, but you are specifically looking for a memory that looks like this somewhere in the schematic.

fifo_1r1w_sync: Using behavioral SystemVerilog, create a synchronous FIFO with 1 write port and 1 read port using your ram_1r1w_sync. It must be parameterized in depth_log2_p (log base 2 of the number of elements) and width_p (number of input/output bits).

If you get stuck on the fifo, feel free to move on to Parts 2 and 3 and come back. Don’t spend all of your time here.

There are two challenges you must address:

First, because reads are synchronous, this is slightly trickier than the previous FIFO.

Using a synchronous read pointer means data will take two cycles to be read from memory (one to update the counter, one to read), compared to one cycle in the previous FIFO (one to update, followed by an asynchronous read). As a hint: you should consider how your counter module must be modified to make it work for this solution, so that it doesn’t take a cycle for the RAM to get an updated address. How do Mealy machines help?

Solving the challenge above will get ~60% of the autograder points for the FIFO.
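The latency difference described in the first challenge can be seen in a tiny Python cycle model. This is only a sketch of one possible reading of the hint, not the intended solution: the function name simulate, the flag use_next_pointer, and the memory contents are all hypothetical. The model compares addressing the RAM with the registered read pointer against addressing it with the pointer's next value.

```python
def simulate(mem, pops, use_next_pointer):
    """Return the RAM read data visible in the cycle after each clock edge."""
    ptr = 0            # registered read pointer
    trace = []
    for pop in pops:
        # Combinational next-state value of the pointer for this cycle.
        ptr_next = (ptr + 1) % len(mem) if pop else ptr
        # Address presented to the synchronous RAM during this cycle.
        addr = ptr_next if use_next_pointer else ptr
        # Clock edge: the RAM registers its read data, the pointer updates.
        rdata = mem[addr]
        ptr = ptr_next
        trace.append(rdata)
    return trace


mem = [10, 11, 12, 13]
pops = [0, 1, 1, 0]
print(simulate(mem, pops, use_next_pointer=False))
print(simulate(mem, pops, use_next_pointer=True))
```

With the registered pointer as the address, the element after a pop arrives one cycle late (the stale 10 is still visible after the first pop). Presenting the next pointer value removes that extra cycle in this model.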

Second: when the FIFO is empty, or the read pointer trails the write pointer by one slot, it will take two cycles to get the data out of the memory. For full points, this case must be handled. Two tips: What additional circuitry can you write that forwards the data without passing through the memory? The solution feels very similar to hazard forwarding, and sort of like an express lane on a highway.

Solving the second challenge will get 100% of the autograder points for the FIFO.

Demonstration:

First, demonstrate to the TAs that your ram_1r1w_sync produces a schematic like shown above.

Second, demonstrate your working FIFO by using it to connect between audio input and output on your FPGA board. You will plug the PMOD I2S2 into PMOD Port B on your board, and then use 3.5mm cables to connect to the Audio I/O ports to/from your computer/speaker. You can set your FIFO to a larger depth in this implementation.

What is the maximum value for depth_p that you can use on your FPGA, before the toolchains fail to compile? This fifo can be much less efficient for small width_p and small depth_p values. However, it is much more space efficient for large width_p and large depth_p values. Why? Your answer may be in the schematic viewer for both labs.

Part 2: Delay Buffers

You may use whatever operators and behavioral description you prefer, except you may not use always@(*). You are encouraged to reuse whatever modules you see fit from previous labs or this lab.

regdelaybuffer: Using behavioral SystemVerilog, create a multi-bit delay buffer with a ready/valid interface on both sides. The module must be parameterized in width_p and delay_p. The module must consume an input data item, and provide that item delay_p ready/valid handshakes later.

delay_p is the number of handshakes that must pass before an input reappears. So if delay_p is 2, and the sequence is:

Input: A B C D

Where A B C and D are data that was present at the handshake, the output is:

Output: x x A B C D

Where x is “don’t care”. It will be x in your simulation when simulation starts. The following is roughly the Python code I use to test your module:

import numpy as np
buf = [0] * delay_p  # created once, before the first handshake
# then, on every handshake:
expected = buf[0]
buf[0] = input_data
buf = np.roll(buf, 1)

Note: This is basically an elastic pipeline stage (from Lab 3) but with extra datapath registers to delay the output data. There will be one valid_o state bit and the logic is exactly the same as the elastic stage. It is OK to assert valid_o when the data is x/unknown for the first few cycles (I don’t check this data). With ramdelaybuffer, you are replacing these registers with a circular buffer in memory.
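If you want to play with the expected behavior outside the testbench, the test snippet above can be wrapped in a small standalone function. This is a sketch, not the actual grader: the name delay_model is hypothetical, collections.deque.rotate stands in for np.roll so nothing beyond the standard library is needed, the string "x" stands in for unknown data, and delay_p is assumed to be at least 1. The extra inputs E and F are only there to push C and D out.

```python
from collections import deque


def delay_model(inputs, delay_p, unknown="x"):
    """Return what the buffer emits at each handshake (assumes delay_p >= 1)."""
    buf = deque([unknown] * delay_p)
    outputs = []
    for item in inputs:
        outputs.append(buf[0])   # expected = buf[0]
        buf[0] = item            # buf[0] = input_data
        buf.rotate(1)            # same effect as np.roll(buf, 1)
    return outputs


print(" ".join(delay_model(list("ABCDEF"), delay_p=2)))
```

For delay_p = 2 this prints x x A B C D, matching the sequence in the text.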

Note: The following may be helpful, to show the contents of your buffer in the waveform.

logic [width_p-1:0] buffer [delay_p:0];
generate
  for (genvar i = 0; i <= delay_p; i++)
    wire [width_p-1:0] temp = buffer[i];
endgenerate

srldelaybuffer: Shift registers are so common in digital logic that modern FPGAs have specific primitives to support them. (Plot twist: they’re really LUTs.) The following is the instantiation template for an SRL16E. The documentation is here.

module SRL16E (Q, A0, A1, A2, A3, CE, CLK, D);
  parameter INIT = 16'h0000;
  output Q;
  input A0, A1, A2, A3, CE, CLK, D;

Using the SRL16E, create a multi-bit delay buffer with a ready/valid interface on both sides. You may write behavioral SystemVerilog for the ready/valid logic, but not the actual delay buffer.

You do not need to handle delay_p > 16. The key sentence from the documentation is:

The inputs A3, A2, A1, and A0 select the depth of the shift register.

counter: Using behavioral SystemVerilog, create a counter module that rolls over at max_val_p, and rolls under from 0 to max_val_p. All other behavior should remain the same.

The fact that this counter is here should be a massive hint for the next module.
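The rollover and rollunder rule can be written down as a next-state function in Python. This is only a sketch under assumptions: the inputs up and down are hypothetical names for the count-up and count-down controls, and holding the value when both or neither are asserted is an assumption about the "other behavior" carried over from the earlier counter.

```python
def counter_step(count, up, down, max_val_p):
    """Next counter value for one clock edge."""
    if up and not down:
        # Roll over: max_val_p wraps to 0.
        return 0 if count == max_val_p else count + 1
    if down and not up:
        # Roll under: 0 wraps to max_val_p.
        return max_val_p if count == 0 else count - 1
    # Assumption: hold when both or neither input is asserted.
    return count


count = 0
history = []
for _ in range(7):
    count = counter_step(count, up=True, down=False, max_val_p=5)
    history.append(count)
print(history)
```

Note that max_val_p does not have to be one less than a power of two, which is the difference from a plain wrapping binary counter.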

ramdelaybuffer: Using behavioral SystemVerilog, create a multi-bit delay buffer with a ready/valid interface on both sides. The module must be parameterized in width_p and delay_p. You may not use the regdelaybuffer. You may use always_ff, but you must use your ram_1r1w_sync module (above).

You are highly encouraged to use the counter module that you have written. The counter should count up when data is read (ready_i & valid_o).

Note: This looks like a FIFO, but has slightly different behavior. A FIFO has latency that is dynamic and depends on how full it is, but the latency of the delay buffer is only dependent on delay_p.
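The fixed-latency behavior can be sketched in Python with a circular buffer and a single wrapping pointer. This is one possible reading of the description, not the reference design: the name ram_delay_model is hypothetical, the memory depth is assumed to be delay_p entries, the read is assumed to happen before the write to the same slot (read-first), and ready/valid timing is ignored so that each loop iteration is one handshake.

```python
def ram_delay_model(inputs, delay_p, unknown="x"):
    """Emit, per handshake, the item written delay_p handshakes earlier."""
    mem = [unknown] * delay_p    # assumed depth: delay_p entries
    ptr = 0                      # one pointer, shared by read and write
    outputs = []
    for item in inputs:
        outputs.append(mem[ptr])  # read the old contents first
        mem[ptr] = item           # then overwrite the same slot
        # Wrap at delay_p - 1, which need not be a power of two.
        ptr = 0 if ptr == delay_p - 1 else ptr + 1
    return outputs


print(ram_delay_model(list("ABCDEF"), delay_p=2))
print(ram_delay_model(list(range(8)), delay_p=3))
```

Unlike a FIFO model, nothing here depends on occupancy: every item comes out exactly delay_p handshakes after it went in.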

Demonstration:

Show the TAs your post-synthesis schematics (make mapped.pdf) for the regdelaybuffer and the ramdelaybuffer modules. Then answer these questions for the TA:

Which implementation of the delay buffer (above) actually synthesizes to the FPGA? If they both synthesize, which one takes less time to compile? Which one takes less area? Be prepared to explain why to the TAs.

Part 3: Using Optimized IP

Writing behavioral Verilog is the most straightforward way to describe circuits. When doing chip design, or even FPGA design, it is uncommon to use a Verilog arithmetic operator for high-performance logic. Why? There are many different circuits for implementing arithmetic operations. Some projects provide highly optimized implementations of floating-point hardware, for example, Berkeley HardFloat. Sometimes there are options (e.g. rounding modes) that are not inferred via synthesis. Some IP libraries provide high-performance modules that cannot be directly synthesized. Finally, the tools occasionally do not infer FPGA DSP blocks, or fail to correctly infer particular features.

As demonstrated in class, you can use Yosys to get parts of the solution here. The command I used in class was:

yosys -p 'synth_xilinx -top multiplier -family xc6s; write_verilog xilinx_synth.sv; write_json synth.json' multiplier.sv

adder: Using the provided Xilinx DSP48A1 primitive, implement a pipelined addition operation with a ready/valid interface. Here is the DSP48A1 Datasheet: link. The simulation module itself is in the provided folder (you are encouraged to read it). Your solution must have a latency of at least 1 cycle and use the DSP48A1 module. I recommend following this path to getting your adder working:

1. Implement a pipelined adder using the Verilog operator and former elastic module, confirming that your elastic pipeline works. (This is your behavioral model)

2. Replace the Verilog addition operator with the Xilinx DSP48A1 primitive, and determine all of the parameter/signal configurations needed to implement a combinational-only addition. (This is your intermediate model.)

3. Use the internal pipeline registers instead of your elastic pipeline module. The control logic will be similar, if not the same.

multiplier: Using the provided Xilinx DSP48A1 primitive, implement a pipelined multiplication operation with a ready/valid interface.

Your solution must have a latency of at least 1 cycle and use the Xilinx DSP48A1 DSP.

multiply-accumulate: Using only the provided Xilinx DSP48A1 primitive, implement a pipelined multiply-accumulate operation with a ready/valid interface. You may use as many or as few pipeline registers as you would like, but you will need at least one.

A multiply-accumulate is a common operation that is performed in just about … everything. Most processor ISAs provide fused multiply-accumulate instructions. Dot products are multiply-accumulates of two vectors. Matrix multiplications are just structured dot products. Matrix multiplications are used in everything. This core is a streaming dot product.
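As a software reference for what "streaming dot product" means, here is a minimal Python sketch. It is not a model of the DSP48A1: the name streaming_mac is hypothetical, Python integers never overflow so bit widths are ignored, and there is no pipelining or ready/valid handshake.

```python
def streaming_mac(pairs):
    """Yield the running sum of products, one value per (a, b) pair."""
    acc = 0
    for a, b in pairs:
        acc += a * b    # multiply, then accumulate
        yield acc


a_vec = [1, 2, 3]
b_vec = [4, 5, 6]
partials = list(streaming_mac(zip(a_vec, b_vec)))
print(partials)        # running partial sums
print(partials[-1])    # the dot product of a_vec and b_vec
```

The last partial sum is the dot product (1*4 + 2*5 + 3*6 = 32). The hardware version produces the same sequence one handshake at a time.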

You can use this drawing as a reference:

Your final solution must use a maximum of 1 DSP. To accomplish this, you must use the internal DSP48A1 accumulate path. See the documentation for more information: link.

Notes:

1. Like above, we recommend writing the Behavioral Verilog first, and then incrementally adding the DSP.

2. You can reuse the logic from your elastic pipeline stage, but be careful. There is one parameter in your elastic pipeline that is critical for correctness.

Demonstration

Demonstrate that your solutions use only one DSP, and no external pipeline registers.

Part 4: Fixed-Point Practice

Have you ever heard anyone complain about how complicated IEEE 754 floating point is? The problem is that it’s easy to use (in software), until it isn’t: List of Failures from IEEE 754. For this reason, floating point arithmetic isn’t used in many safety critical applications and signal processing. Fixed point operations are vastly less complicated than floating point operations, require vastly less area, and are numerically stable. This part is a brief intro to fixed point computations, in the guise of implementing pipelined arithmetic.

Fixed point arithmetic follows the same rules as normal two’s-complement arithmetic. In that sense, you already know the basics. The difference is that when two fixed-point numbers are multiplied, the number of fractional bits increases. For example, .5 * .5, where each operand is representable with one fractional bit (binary: 0.1), produces .25, which needs two fractional bits to represent (binary: 0.01). So, when you multiply binary 0.1 by binary 0.1, the result has two fractional bits: 0.01.

I like to handle fractional bits by declaring the fractional bits in the negative range of the bus. For example, wire [11:-4] foo has 12 integer bits and 4 fractional bits. When foo is multiplied by itself, it produces 24 integer bits and 8 fractional bits, or [23:-8]. However, if a [-1:-4] bus is multiplied by a [11:-4] bus, the result is only a [11:-8] bus. The widths declared here are a hint for the multiply-accumulate module.
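The bookkeeping above is just integer arithmetic, which a few lines of Python can check. This is a sketch with hypothetical helper names (fixed_mul, to_real, product_range). A fixed-point value is stored as a raw integer plus a count of fractional bits, and only unsigned values are shown.

```python
def fixed_mul(a_raw, a_frac, b_raw, b_frac):
    """Multiply raw integers; the fractional bit counts add."""
    return a_raw * b_raw, a_frac + b_frac


def to_real(raw, frac_bits):
    """Interpret a raw integer with frac_bits fractional bits."""
    return raw / (1 << frac_bits)


def product_range(msb_a, lsb_a, msb_b, lsb_b):
    """Bus range [msb:lsb] of a product, using negative fractional indices."""
    int_bits = (msb_a + 1) + (msb_b + 1)    # integer bit counts add
    frac_bits = (-lsb_a) + (-lsb_b)         # fractional bit counts add
    return int_bits - 1, -frac_bits


# 0.5 is binary 0.1: raw value 1 with one fractional bit.
raw, frac = fixed_mul(1, 1, 1, 1)
print(raw, frac, to_real(raw, frac))     # raw 1 with 2 fractional bits is 0.25
print(product_range(11, -4, 11, -4))     # [11:-4] times [11:-4]
print(product_range(-1, -4, 11, -4))     # [-1:-4] times [11:-4]
```

The two ranges come out as (23, -8) and (11, -8), matching the [23:-8] and [11:-8] buses described in the text.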

Here are a few good tutorials:

From Berkeley: https://inst.eecs.berkeley.edu/~cs61c/sp06/handout/fixedpt.html

From UW: https://courses.cs.washington.edu/courses/cse467/08au/labs/l5/fp.pdf

One final note: when using signed numbers in Verilog it is useful to use the signed keyword. The biggest benefit is that the number will be sign-extended when you increase the number of bits, but it also handles other edge cases.
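To see why sign extension matters, here is a Python sketch that widens a 4-bit two’s-complement value both ways. The helper names (sign_extend, zero_extend, as_signed) are hypothetical. This illustrates the arithmetic only, not the exact rules Verilog applies to mixed signed and unsigned expressions.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a two's-complement bit pattern by copying its sign bit."""
    value &= (1 << from_bits) - 1
    if (value >> (from_bits - 1)) & 1:
        # Fill every new upper bit with 1.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zero_extend(value, from_bits, to_bits):
    """Widen a bit pattern by filling the new upper bits with 0."""
    return value & ((1 << from_bits) - 1)


def as_signed(value, bits):
    """Interpret a bit pattern as a two's-complement integer."""
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


minus_three = 0b1101    # -3 in 4 bits
print(as_signed(sign_extend(minus_three, 4, 8), 8))
print(as_signed(zero_extend(minus_three, 4, 8), 8))
```

Sign extension keeps the value at -3, while zero extension silently turns it into 13.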

Here are a few good overviews of the signed keyword:

From Utah: Link

From UW: Link

rgb2gray: Using SystemVerilog, implement an RGB to Grayscale Colorspace Conversion module in rgb2gray.sv. The dataflow graph below shows the computation you need to implement.

The computation for RGB to Grayscale with fixed point arithmetic does not use a divide. We provide it as a hint.

Important Notes:

-    Your solution must use at least two elastic pipeline stages for credit. Some tests only check for correctness. Some tests only check latency.

-    You must use fixed point arithmetic in your implementation, and determine the multiplication constants by hand. Using automatic conversions in Verilog is error-prone, so any decimal/floating point constants (e.g. .002) will result in a 0.
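Working out a constant "by hand" amounts to scaling by a power of two and rounding. The Python sketch below shows the arithmetic under stated assumptions: the weight 0.3 and the choice of 8 fractional bits are hypothetical and are not the lab's actual coefficients, and the helper names to_fixed and scale are made up for illustration.

```python
def to_fixed(value, frac_bits):
    """Nearest integer constant for value with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def scale(pixel, weight_fixed, frac_bits):
    """Multiply by a fixed-point weight, then drop the fractional bits."""
    return (pixel * weight_fixed) >> frac_bits


FRAC_BITS = 8                    # assumption, chosen for illustration
weight = 0.3                     # hypothetical weight, not from the lab
weight_fixed = to_fixed(weight, FRAC_BITS)

print(weight_fixed)                           # integer constant to hard-code
print(scale(200, weight_fixed, FRAC_BITS))    # approximates 0.3 * 200
print(int(0.002))                             # a truncated decimal becomes 0
```

Here 0.3 becomes the integer 77 (77/256 is about 0.3008), and scaling a pixel value of 200 gives 60. The shift replaces the divide by 256, and more fractional bits give a closer approximation at the cost of a wider multiplier.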



