Reading: Chapter 2 in the text covers scanners, regular expressions, NFAs and DFAs. Read sections 2.1 through 2.3. Sections 2.4 and 2.5 provide implementation details that you can skim or skip for now. Section 2.6 is on advanced topics that you can skip.
In this assignment you will implement a DFA that decodes the value of Augusta integer literals. To do this, you will create a small, independent Scala project. You won't be using the Augusta codebase directly for this assignment.
For this assignment you will work inside the Augusta code base. However, to ensure the smoothest experience, it is best to work on a personal branch.
Proceed as follows:
Update your clone of the Augusta project using:
$ git pull
Create a branch in your clone using:
$ git branch homework-02
You can name the branch anything you want, but in anticipation of creating separate branches for each assignment, it makes sense to use something like homework-02, homework-03, etc.
Now, check out your new branch to switch to it:
$ git checkout homework-02
You are now working on a personal branch. You can commit to that branch freely without causing interference with any future changes in the main branch. If you want to go back to the original main branch, do:
$ git checkout main
Run the usual SBT commands to ensure everything is working as expected. Note that initially the tests will fail since you haven't implemented anything yet.
$ sbt compile $ sbt test
Open the project folder using your development tool of choice (VS Code, or IntelliJ IDEA).
You are now ready to get started programming!
Integer literals in Augusta are intended to follow the rules for literals in Ada as outlined in Section 2.4 of the Ada Reference Manual. A few important points are mentioned here:
Underscore characters may be embedded in an integer literal. If they are used,
they do not affect the value of the literal. They are only used to enhance readability. For
example: 12_345_678
.
When underscores are used, they never appear at the start or end of the literal: _123
,
123_
. Underscores are never doubled up: 123__456
.
Unlike most languages, integer literals in Ada can have exponents: 1E5
. The
exponent is introduced by either an upper or lower case 'e'. The exponent can optionally
include a '+' sign: 1E+5
. Negative exponents are not allowed for integer
literals.
There are no negative integer literals. Expressions of the form -10
are
taken to be the application of the unary minus operator to the integer literal 10
.
Based integer literals take the following form: 16#0A3C#
. The leading number
is the base and is always in decimal. The value between the hash marks is the literal, written
in the indicated base. The letters 'a' through 'f' (in either casing) are used in the expected
way.
The base must be in the range 2 through 16 inclusive. No other bases are permitted. All the digits of the literal must have a value that is less than the base (with the usual interpretations for the values of the letters).
Based integer literals can have exponents. The exponent is always in base 10: 16#2A#E+2
This is 42 * 162
= 10752.
Integer literals never overflow. They always have the value they appear to have regardless of the ranges on the target system's primitive types. For example:
16#7FFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF# = 2127 - 1
Of course, such large literals might be out of range for the context in which they are used, but that's a problem for elsewhere in the compiler (semantic analysis). In any case, the compiler is never confused about the meaning and value of an integer literal regardless of how large it is.
This assignment is NOT about checking for the validity of an integer literal. You can assume
that has already been done by the scanner. This assignment is about extracting the value of an
integer literal, which you can assume is in a valid format. The value should be returned as a BigInt
so that it can handle literals of any size.
Here are some examples of valid integer literals:
42
5E6
(5 million)16#2A#
(42 in hex)16#2A#E+2
(10752)16#7FFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF#
(2127 - 1)2#1000_0001#
(129)1_6#2A#
(42, but perversely with an underscore in the base)1E10_000
(1010000)8#52#
(5*8 + 2 = 42 in octal)13#2C#
(2*13 + 12 = 38 in base 13)Write a function decodeIntegerLiteral
with the following signature:
def decodeIntegerLiteral(literal: String): BigInt
Start by designing a deterministic finite automaton (DFA) that can process integer literals and extract their value. You can assume the format of the literal has been previously checked and is known to be valid. This allows some simplification. Once you are confident that your DFA will work, the implementation should go smoothly.
Write appropriate test cases so that the command sbt test exercises your function in a meaningful way.
HINT! Don't try to get everything working all at once. Build up your solution in stages:
After each step, set aside your code so you have something working in reserve before embarking on the next step. This will be useful if you run out of time before completing everything.
Submit your project folder as a ZIP file. Include all the source code, test code, and any other files needed to build and run your project.
Last Revised: 2025-02-03
© Copyright 2025 by Peter Chapin <peter.chapin@vermontstate.edu>